OpenAI Models & GitHub Copilot

This report explores the integration of Artificial Intelligence (AI) into existing technologies, using GitHub Copilot as a case study of current implementations. Recent advances in AI have produced large language models (LLMs) such as OpenAI's GPT-3 and Codex. These models can process and analyze vast amounts of data, generate human-like responses, and even complete complex software engineering tasks. With such progress, LLMs are now mature enough for integration into business applications, offering significant benefits to those who can leverage the technology effectively.

The purpose of this report is to examine the impact of LLMs on existing technologies and their potential implications for productivity in business. Specifically, this report will investigate the implementation of GitHub Copilot, an AI-powered code completion tool that assists developers in building application components. By analyzing GitHub Copilot's implementation, we can gain insight into the challenges and opportunities associated with integrating LLMs into existing technologies.

Evolution of AI

The concept of artificial intelligence has been a topic of interest for decades. One of the earliest pioneers to explore its potential was Alan Turing. In his 1950 paper, he posed the question: “Can machines think?” To answer it, he proposed a test, which he called the imitation game, in which a human evaluator converses with a machine and another human without knowing which is which. If the evaluator cannot reliably distinguish the machine from the human, the machine is considered to have passed the test. Turing also anticipated objections to such a test, including the argument that machines lack the intuition and creativity that distinguish human thinking from machine thinking. Nonetheless, the Turing test is regarded as a foundational work in the development of modern artificial intelligence.

Research on the topic persisted, but the fundamental limitations of early computers hindered further development. Researchers tried various techniques to address the broad and extensive problems they were trying to solve. One of the earliest AI systems was the General Problem Solver (GPS), developed by Allen Newell, Herbert Simon, and J. C. Shaw and later analyzed by Ernst & Newell. While the system was a breakthrough at the time, its functionality was severely limited, primarily because of combinatorial explosion: for any realistic problem, there were too many possible inputs and outcomes for the computers of the time to evaluate.
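To make combinatorial explosion concrete, here is a minimal Python sketch (our own illustration, not part of GPS): a program that looks `depth` steps ahead with `branching_factor` choices per step must examine `branching_factor ** depth` sequences. The function name and the branching factors and depths printed are arbitrary assumptions chosen only to show how fast the count grows.

```python
# Minimal sketch of combinatorial explosion (illustrative only, not GPS itself):
# a search that looks `depth` steps ahead with `branching_factor` choices per
# step must examine branching_factor ** depth distinct sequences.


def sequences_to_search(branching_factor: int, depth: int) -> int:
    """Number of distinct step sequences in a full search tree of this depth."""
    return branching_factor ** depth


if __name__ == "__main__":
    # Branching factors and depths are arbitrary assumptions chosen to show growth.
    for b in (2, 10, 30):
        for d in (5, 10, 20):
            print(f"b={b:>2}, depth={d:>2}: {sequences_to_search(b, d):.2e} sequences")
```

Even at an assumed rate of one billion sequences evaluated per second, 10^20 sequences would take 10^11 seconds, roughly 3,000 years, which is why exhaustive search quickly became impractical.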

From the 1960s to the 1980s, numerous doctoral researchers published papers outlining new methods and advances in AI. Despite this progress, few commercially viable AI products made their way into markets. The 1990s marked a turning point, as AI began to trickle into certain industries: the first prototypes of autonomous vehicles were produced, and other industries explored AI implementations. Machine learning emerged as the prevailing approach to AI, but limited computing power hindered its progress.

In the 2000s, machine learning saw significant breakthroughs. In 2007, researchers solved the game of checkers, a major achievement given that the game has roughly 500 billion billion (5 × 10^20) possible board positions. Shortly afterward, in 2010, Microsoft released the Kinect for Xbox 360, which implemented one of the most advanced machine learning computer-vision algorithms of its time and provided a commercial application for machine learning models. The 2010s saw an explosion of advances, from Apple’s release of Siri to Google DeepMind's AlphaGo defeating top Go players, a major milestone given the game's complexity.

Recent developments have been led by OpenAI, which debuted a machine learning bot at a Dota 2 tournament in 2017. Three years later, in 2020, it released GPT-3, considered one of the most advanced machine learning models of its time. The most recent surge of attention to the industry is due to the public release of ChatGPT, a chatbot built on GPT-3.5 and available to the public free of charge.

The history of AI development points to a technological inflection point. AI models have been advancing rapidly, and the industry currently sits on the steep, fast-growth portion of its S-curve. Over the past two years, commercial applications have multiplied rapidly. As a result, companies now face the challenge of leveraging these advances and implementing them across industries.

Introduction

GitHub Copilot is a powerful tool that is revolutionizing the way programmers write code. As an AI-powered code suggestion engine, it uses cutting-edge deep learning to provide developers with intelligent code completion suggestions and even some novel ideas. In essence, it adds a more powerful, adaptive, and intelligent layer to the integrated development environments (IDEs) that programmers and engineers already use. While tools to assist in writing code existed before, programmers historically had to write most of a working program manually, a tedious and time-consuming process that was even harder for inexperienced developers. And with the plethora of programming languages, even experienced programmers struggled to branch out into different kinds of projects and acquire the knowledge needed to work on them. With GitHub Copilot, developers can now rely on context-aware, relevant, AI-powered code suggestions, saving time and effort and allowing them to focus on higher-level tasks.

The creation of GitHub Copilot is the result of technology that has been developing over the past two decades, and it marks an important milestone in software development: a machine that helps improve the process of writing code itself. The primary technology behind GitHub Copilot is OpenAI’s GPT family of deep learning models, whose functionality and flexibility have recently taken the world by storm. Artificial intelligence has made enormous strides in many areas over the years, and large language models of this kind are a testament to the time and effort historically invested in natural language processing and machine learning.

Crucial developments that led to AI-powered code suggestion tools such as GitHub Copilot include deep learning and cloud computing. Deep learning made neural networks mainstream: models whose accuracy improves as they are trained on more data. Deep learning algorithms excel at processing immense quantities of data, far more than any human could analyze, and at identifying the patterns and correlations within it, which makes them the foremost tool for large language models and natural language processing. Cloud computing made vast amounts of computing power accessible over shared networks, allowing developers to store and process large amounts of data securely and efficiently. Without cloud computing, developing AI tools at this scale would be far more difficult, because each organization would need to own and operate the storage and processing capacity for enormous datasets itself.

GitHub Copilot is the culmination of many cutting-edge technologies. It is revolutionizing the way developers write code and is likely to become a significant tool in the arsenal of programmers and coding professionals.

S-Curves

Currently, AI as a whole is in the early-middle stage of its S-curve, nearing the “takeoff” phase, where investment in AI research begins to pay off on an exponential scale. We are entering the stage where products are fit for mass-market consumers rather than only early adopters, although immense effort is still needed to sustain the industry's growth. The number of venture capital deals in generative AI has grown rapidly in recent years, and increasing capital flows are likely to push AI further along its S-curve, bringing faster development and more competition in the coming years.
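The S-curve is commonly modeled with the logistic function. As a minimal sketch, with symbols that are our own labels rather than values fitted to any AI market data: P(t) is adoption or capability at time t, K is the ceiling the technology eventually approaches, r is the growth rate, and t_0 is the inflection point where growth is fastest.

```latex
\begin{aligned}
P(t) &= \frac{K}{1 + e^{-r\,(t - t_0)}} \\
\frac{dP}{dt} &= r\,P(t)\left(1 - \frac{P(t)}{K}\right) \\
P(t) &\approx K\,e^{\,r\,(t - t_0)} \quad \text{for } t \ll t_0
\end{aligned}
```

Well before t_0 the curve is approximately exponential, which corresponds to the “takeoff” phase described above; at t_0 the value is K/2 and the growth rate peaks at rK/4, after which growth slows toward the ceiling.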

GitHub Copilot may represent a new S-curve in the integrated development environment space. In the early days of computing, programs were entered as 0s and 1s on punched cards and paper tape, and some “bugs” were literal insects stuck inside the machines. Because this required a great deal of effort and resources, programming languages were developed to make communicating with computers easier. Later came Integrated Development Environments (IDEs): user interfaces with more functions that help programmers focus on the code itself. GitHub Copilot, with its machine-learning-driven ability to write code, may mark the jump from the IDE S-curve to a new one.

Using a Technology-Organization-Environment (TOE) analysis, we can examine the future potential of GitHub Copilot and how this jump will affect the current state of programming. From a technological standpoint, OpenAI is on the leading edge of AI model development; Codex and GPT continue to improve rapidly in both model capability and code output. Organizationally, the advantages of Microsoft's suite of products are immense and are discussed further in the following section. However, integrating technologies such as Copilot into work processes also has a large organizational impact on the adopting companies. Benefits include increased developer efficiency, less repetitive work, and greater job satisfaction. In GitHub's own research, developers using Copilot completed a coding task 55% faster than those without it, 88% of surveyed users reported feeling more productive, and 74% said Copilot let them focus on more satisfying work. This can improve employee retention and create better working environments. Environmentally, GitHub Copilot is ahead of most competitors because the technology is new and has considerable room to grow, and we can expect rapid growth in interest and market size for AI programming assistants. One environmental drawback is that OpenAI's technology is also ahead of regulatory frameworks such as government regulations and copyright law. As explained in the Obstacles section of this paper, the inner workings of GitHub Copilot are not transparent to the public, which is raising ethical and ownership concerns. Overall, we expect the jump from the IDE curve to the GitHub Copilot curve to be smooth.

Revenue Models

GitHub Copilot is one of the first commercially successful AI products with a large total addressable market. Microsoft's exclusive licensing agreement with OpenAI highlights its commitment to this market. Its January 2023 quarterly filing (Form 10-Q) states this intent directly: “We are building AI into many of our offerings, including our productivity services… We expect these elements of our business to grow. We envision a future in which AI operating in our devices, applications, and the cloud helps our customers be more productive in their work and personal lives.” Copilot is one of the first of many potential integrations of AI into a suite of existing products.

Microsoft's organizational environment is well suited to integrating such a product. Its Visual Studio Code editor already holds an estimated 75% share among developers, with millions of users, and Microsoft's ownership of GitHub provided existing technical integrations, making Copilot an excellent opportunity for monetization. Copilot currently has two pricing models: $10 per month or $100 per year for individual developers, and $19 per user per month for business plans.
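The two price points are easier to compare on an annual basis. The minimal Python sketch below uses only the list prices stated in this report ($100 per year individual, $19 per user per month business); the team sizes and function names are hypothetical, and the sketch ignores discounts, taxes, and any features that differ between plans.

```python
# Annual Copilot cost for a team under the two list prices quoted in this report.
# Team sizes are hypothetical; discounts and plan features are not modeled.

INDIVIDUAL_PER_YEAR = 100     # USD per developer per year (individual plan)
BUSINESS_PER_USER_MONTH = 19  # USD per user per month (business plan)


def annual_cost_individual(developers: int) -> int:
    """Yearly cost if every developer buys the individual annual plan."""
    return developers * INDIVIDUAL_PER_YEAR


def annual_cost_business(users: int) -> int:
    """Yearly cost of business seats billed monthly."""
    return users * BUSINESS_PER_USER_MONTH * 12


if __name__ == "__main__":
    for team in (1, 10, 50):
        print(f"{team:>3} devs: individual ${annual_cost_individual(team):,}/yr, "
              f"business ${annual_cost_business(team):,}/yr")
```

At these list prices a business seat costs $228 per user per year, $128 more than the individual annual plan; for a hypothetical 50-person team, the business plan comes to $11,400 per year.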

Copilot has already proven to be an immense success: Microsoft CEO Satya Nadella stated that it has surpassed one million paid users, yet its potential market is much larger. Key factors in Microsoft's execution were its ability to identify integration areas that create synergy, improve the end product for consumers, and even turn a free product into a subscription. The success of this implementation depended on the substantial value customers receive from the product, and it offers valuable managerial insight for businesses looking to integrate AI into their own products and services.

Obstacles

If an AI creates a piece of work from information gathered from other source materials, who owns the work? Because generative AI such as OpenAI's models is in its early stages and copyright standards for it have not been established, any of the following could plausibly claim copyright to the output: the AI itself, the creators of the original sources, the inventor of the AI, or the person who instructed the AI to create the piece.

There have already been concerns about copyright infringement by GitHub Copilot. On November 3, 2022, the Joseph Saveri Law Firm filed a class action lawsuit over GitHub Copilot. The main concern is that, instead of generating new code from the public open-source code on GitHub that Copilot was trained on, it can output near-verbatim copies of the source material. Neither Codex nor Copilot credits the authors of the code they were trained on or keeps track of them. It is therefore crucial to determine whether training on source code and producing near-identical outputs falls within fair use. Until courts decide that OpenAI's current practices are protected by fair use, we can expect reluctance and pushback from potential users in the developer community.

OpenAI's technology raises ethical questions as well. Where will programmers and artists stand if AI trained on their work without authorization replaces their jobs? How will we know which authors the AI drew from, legitimately or illegitimately, when there is no attribution system? Microsoft recently disbanded its ethics and society team, which was responsible for ensuring that the AI principles Microsoft claims are actually applied in the AI products it builds. When industry leaders prioritize profitability over ethical practices, it may take a long period of government intervention before this technology is accepted as a fair, ethical means of creation, and this may hinder its adoption by major potential customers.

Managerial Implications

OpenAI has made significant contributions to the field of AI, including breakthroughs in natural language processing, reinforcement learning, and computer vision, and GitHub Copilot is one of the most visible commercial applications of that work.

From a managerial standpoint, OpenAI is succeeding at research and development within the field of AI. Its researchers and engineers have completed, and continue to work on, cutting-edge projects with impressive results. OpenAI has also secured funding from both private and public sources, which in turn has allowed further investment in research and development. Another managerial success is that GitHub Copilot has quickly become a widely accepted and well-regarded extension among professional developers.

Although OpenAI and GitHub Copilot have produced many positives, there is still considerable room for improvement from a managerial standpoint.

OpenAI and GitHub need to be more transparent about how their systems work, including their training data, biases, and limitations. Greater transparency would help developers understand the platform, extend their work with it, and avoid conflicts they were previously unaware of. It is also important for these companies to address the biases associated with their models. Because it is nearly impossible for developers to avoid introducing some bias unintentionally, the companies should work to identify the biases within their systems and disclose them publicly to create a fairer platform. Privacy is another managerial concern. Because GitHub Copilot sends code context to its service in real time to generate suggestions, there is significant concern about privacy and data protection. Steps should be taken to ensure that user data is safe, secure, and collected only when necessary for the program to function properly. Once these issues are resolved, the already booming adoption of OpenAI's models and GitHub Copilot is likely to expand further.

Tech Integration

How does Copilot integrate into the stacks developers already use? How could a company incorporate Copilot into its existing developer tools to boost productivity? These questions must be addressed to understand the impact that predictive, context-aware code completion tools will have on the software development industry.

Copilot is currently available at both the individual and the corporate level, with a price point suited to each. Git is the dominant version control system across organizations, and many of them already host their code on GitHub; because Copilot is distributed as an extension for popular editors such as Visual Studio Code, Visual Studio, JetBrains IDEs, and Neovim, integrating it into an existing code editor is as simple as installing a plugin and signing in with a GitHub account. Organizations license the toolkit through the subscription model and build on top of their existing tool stack.

Copilot is built on OpenAI’s Codex, the model that forms the backbone of the service. Codex is a descendant of GPT-3 that was further trained on source code, so unlike a model trained mainly on English text, it has learned many programming languages, which makes it versatile across coding environments. Codex is the core engine of Copilot and the main reason the service can improve productivity so substantially.

In addition to covering many programming languages, Codex's training data includes code from a large number of public GitHub repositories, and Copilot builds on these preexisting code snippets to predict code in diverse contexts.
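To give a rough intuition for what predicting code from preexisting snippets means, the following Python sketch trains a toy bigram model: it counts which token followed each token in a tiny, made-up corpus and then greedily suggests the most frequent continuation. This is an analogy only; Codex is a large transformer network, and nothing here reflects Copilot's actual implementation. The corpus, the whitespace tokenization, and the function names `train_bigrams` and `suggest` are all hypothetical.

```python
# Toy next-token predictor (an analogy, NOT how Codex or Copilot work):
# learn which token most often follows each token in a tiny code corpus,
# then greedily extend a prefix with the most frequent continuation.
from collections import Counter, defaultdict

# Hypothetical "training data": code pre-split into space-separated tokens.
CORPUS = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( len ( items ) ) :",
]


def train_bigrams(snippets: list[str]) -> dict[str, Counter]:
    """Count, for every token, which tokens followed it in the snippets."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for snippet in snippets:
        tokens = snippet.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def suggest(follows: dict[str, Counter], prefix: str, max_tokens: int = 5) -> list[str]:
    """Greedily append the most frequent next token, up to max_tokens."""
    tokens = prefix.split()
    suggestion: list[str] = []
    for _ in range(max_tokens):
        # Stop when the prefix is empty or its last token never appeared in training.
        if not tokens or tokens[-1] not in follows:
            break
        next_token = follows[tokens[-1]].most_common(1)[0][0]
        suggestion.append(next_token)
        tokens.append(next_token)
    return suggestion


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(" ".join(suggest(model, "for i", max_tokens=3)))  # → in range (
```

A real language model conditions on far more context than the single previous token, which is part of why Copilot's suggestions can adapt to the surrounding file.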

Fundamentally, Copilot does not change a team's base technology toolkit; it works as an extension to a preexisting integrated development environment. Building on top of existing tools preserves its total addressable market, since any developer using a supported editor is a potential customer, and it delivers value directly out of the box.

Understanding the evolution of AI, from its early beginnings to its current state of rapid advancement and commercial application, is essential to understanding its future trajectory. GitHub Copilot is a powerful tool that uses cutting-edge machine learning to provide developers with intelligent code completion suggestions and creative ideas, and its development marks a significant milestone in AI-powered code writing. While the underlying technology is not entirely groundbreaking, Copilot is a testament to the significant progress made over the past two decades.

Resources

Bellan, Rebecca. “Microsoft Lays off an Ethical AI Team as It Doubles down on OpenAI.” TechCrunch, 13 Mar. 2023, https://techcrunch.com/2023/03/13/microsoft-lays-off-an-ethical-ai-team-as-it-doubles-down-on-openai/.

Dastin, Jeffrey. “Microsoft Attracting Users to Its Code-Writing, Generative AI Software.” Reuters, Thomson Reuters, 25 Jan. 2023, https://www.reuters.com/technology/microsoft-attracting-users-its-code-writing-generative-ai-software-2023-01-25/.

Dickmanns, Ernst. “Vehicles Capable of Dynamic Vision.” ResearchGate, Jan. 1997, https://www.researchgate.net/publication/220814662_Vehicles_Capable_of_Dynamic_Vision.

Kalliamvakou, Eirini. “Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness.” The GitHub Blog, 7 Sept. 2022, https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/.

MSFT. “Microsoft Quarterly Earnings Report (10-Q).” SEC.gov, 24 Jan. 2023, https://www.sec.gov/ix?doc=%2FArchives%2Fedgar%2Fdata%2F789019%2F000156459023000733%2Fmsft-10q_20221231.htm#INCOME_STATEMENTS.

Schaeffer, Jonathan, et al. “Checkers Is Solved.” Science, vol. 317, 2007, p. 1518, https://www.cs.mcgill.ca/~dprecup/courses/AI/Materials/checkers_is_solved.pdf.

“Stack Overflow Developer Survey 2021.” Stack Overflow, https://insights.stackoverflow.com/survey/2021#section-most-popular-technologies-integrated-development-environment.

Turing, Alan M. “Computing Machinery and Intelligence.” Cs.umbc.edu, 1950, https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf.

Vaughan-Nichols, Steven J. “GitHub's Copilot Faces First Open Source Copyright Lawsuit.” The Register, 11 Nov. 2022, https://www.theregister.com/2022/11/11/githubs_copilot_opinion/#:~:text=GitHub%20now%20has%20an%20annual,million%20when%20it%20was%20acquired.

jkw1
