7. We could participate, with the right structure, with people who care
deeply about developing AI in a way that is safe and beneficial to
humanity.
The best defense is to empower as many people as possible to
have AI. If everyone has AI powers, then no single person or
small group of individuals can hold an AI superpower.
8. GPT - Generative Pre-Trained
Transformer
• GPT - an innovation in the
Natural Language Processing
(NLP) space
• Takes an input such as a sentence
and tries to generate an
appropriate response.
• Unsupervised and Pre-trained
9. • A machine learning model that can look at
part of a sentence and predict the next
word.
• GPT-2 was trained on a massive 40 GB
dataset called WebText
• GPT-2 is open-sourced
12. Sequential Text Prediction Model
• Has been known to be the
most advanced of its kind
• Can understand the
meaning of a sentence
and try to output a
meaningful sentence
• Public can use OpenAI
APIs to make use of the
GPT-3 model.
16. Reviews
• Big time-saver. It built out entire
React components for me.
• Copilot can autofill repetitive
code if it senses a pattern
• Besides providing code-completion
suggestions, it is also a very
good spell checker.
18. • Copilot, Kite and TabNine analyze code in the context of the current file only
• Copilot's Codex model uses only 12 billion parameters vs. 175 billion in GPT-3
• Inference gets slower as model size increases
• The models are relatively new and still need more training
19. All the products are built to learn from our preferences and make better code
suggestions. So the more we use them, the better they will become.
20. Security risk - If an adversary uploads malicious code to
GitHub in enough abundance, targeted at a
specific type of prompt, Codex or GPT-2 might pick up
those patterns during training and then output them in
response to user instructions.
Licensing - what happens when the tool reproduces
code snippets that are licensed and under copyright
protection? GitHub has said there is a 0.1 percent
chance of Copilot replicating a learned snippet of
code verbatim.
Vulnerabilities & Bugs - Code often contains bugs—and
so, given the vast quantity of unvetted code that
Copilot has processed, it is certain that the language
model will have learned from exploitable, buggy code.
21. DevOps and AI
operate together
• Code reviews
• Software testing
• Monitor systems
• Resource management
• Anomaly detection & AIOps
Editor's Notes
I am very excited to be here today.
Let me first introduce myself. My name is Meirav, and I am a Director of Engineering at GitHub, owning npm, the public registry of Node packages. Today I am not going to talk about the past or what my teams have been doing, even though it's pretty interesting.
Today I am going to talk about the future. Specifically the future of developer tools.
But Before I talk about the future we should probably talk about the present and what is the problem with it.
As developers we have to do a lot of repetitive and sometimes even boring tasks, like creating authentication models, HTTP clients, or implementing CRUD operations. As developers we don't want to reinvent the wheel over and over again, so
We go to the community for help. There are a bunch of platforms for this; the best known is Stack Overflow. With a very basic string search we look for what we need, and usually the search engine will give us back some Stack Overflow solution. In most cases we will copy and paste it (or some similar version of it) into our IDE.
This process is time consuming, error prone, and distracting: we lose our focus and context every time we leave the IDE for the browser, and we make decisions about our code that might be risky. However, relying on the knowledge of the developer community is important and very helpful for all of us. So instead of searching the web for solutions, it seems that integrating something similar to Stack Overflow inside our IDE would make developers more efficient and less likely to make mistakes.
So with the continued growth of technology, prediction tools such as IntelliJ, and AI systems, a new line of developer tools is emerging, such as Copilot, Kite, and TabNine, which I think are going to shape the future of development. These tools have an AI engine that can suggest whole lines or entire functions right inside the IDE, based on simple sentences.
Today I am going to share a few more details about these tools, how they work, and how we can all benefit from them.
I used a lot of big terms such as GPT-2, GPT-3, Codex, and more; let me explain a bit more what they mean.
OpenAI is an AI research lab company.
They are the ones who created the generative pre-training (GPT) language models.
They deliver general-purpose APIs: you input some text, and the model will generate a text completion that attempts to match whatever context or pattern you gave it. The reason they develop such APIs is their vision of making AI accessible to everyone; they believe that if everyone has the power to use AI, it will ensure that no single person or small group of individuals can have an AI superpower.
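The completion loop just described can be sketched as an HTTP request. This is a minimal sketch, assuming OpenAI's publicly documented `/v1/completions` endpoint and its `model`, `prompt`, and `max_tokens` fields; the exact model name "davinci" is an assumption, and the helper below only builds the payload (actually sending it would require an API key):

```python
import json

# The endpoint a client would POST the payload to (not called here).
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt, max_tokens=32):
    """Construct the JSON body for a text-completion call."""
    return {
        "model": "davinci",        # base GPT-3 model (assumed name)
        "prompt": prompt,          # the text the model should continue
        "max_tokens": max_tokens,  # cap on how much text is generated
        "temperature": 0.7,        # > 0 allows varied completions
    }

payload = build_completion_request("def fibonacci(n):")
print(json.dumps(payload, indent=2))
```

Because sampling with a non-zero temperature is random by design, the same prompt can yield different completions on each call.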
GPT, generative pre-trained transformers, is an innovation in the Natural Language Processing (NLP) space.
NLP models aim to make computers understand the unstructured language humans speak and retrieve meaningful pieces of information from it.
The groundbreaking change with GPT is that, unlike previous NLP models, it wasn't trained for a specific task: it's general, and it uses the unsupervised approach to machine learning.
There are two types of machine learning algorithms: supervised and unsupervised. Supervised learning includes all those algorithms that need labeled data and can verify what they have learned; in other words, the algorithm can identify whether an answer is right or wrong.
Supervised learning isn't really how humans learn. Rather, most of the time, we collect knowledge based on our experience or intuitions. That's roughly what you can regard as unsupervised learning: the algorithm is not provided with any pre-assigned labels or scores for the training data.
In unsupervised learning, an AI system will group unsorted information according to similarities and differences, even though no categories are provided.
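The idea of grouping unsorted data by similarity, with no labels provided, can be made concrete with a toy example. This is a minimal sketch of 1-D k-means in plain Python (nothing to do with how GPT itself is trained), just to show an algorithm discovering structure on its own:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group numbers by nearest centroid, then move each centroid
    to the mean of its group, and repeat. No labels are needed."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update each centroid to the mean of the points assigned to it.
        centroids = [sum(g) / len(g) for g in clusters.values() if g]
    return sorted(centroids)

# Two obvious groups (around 2 and around 100); the algorithm finds
# them without ever being told there are "small" and "large" numbers.
data = [1, 2, 3, 99, 100, 101]
print(kmeans_1d(data, centroids=[0.0, 50.0]))  # → [2.0, 100.0]
```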
GPT-2 is a machine learning model that can look at part of a sentence and predict the next word. The most familiar language models we all know are smartphone keyboards, which suggest the next word based on what you've typed so far.
GPT-2 is open-sourced and a direct scale-up of GPT, with more than 10X the parameters, trained on more than 10X the amount of data. The model uses 1.5 billion parameters and was trained on a dataset of 8 million web pages. It is trained with a simple objective: predict the next word, given all of the previous words within some text.
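The "predict the next word, given all previous words" objective can be illustrated with a toy counting model. This bigram sketch is vastly simpler than GPT-2's transformer, but the shape of the task is the same:

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the word most often seen after `word`, or None if unseen."""
    return counts[word].most_common(1)[0][0] if counts[word] else None

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # → cat  ("cat" followed "the" twice)
```

GPT-2 does conceptually the same thing, but instead of raw counts over one preceding word, it conditions on the entire preceding text through 1.5 billion learned parameters.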
First I will talk about TabNine; the first version of their product was published in 2018. It works with 21 IDEs and 30 programming languages. Their AI engine, called Deep TabNine, is based on a GPT-2 model, which I will explain a bit later; in short, it can predict the next word, given all of the previous words within some text. Deep TabNine was trained on 2 million of GitHub's open-source repositories.
As you can see in this diagram, the plugin listens to the keyboard and uses the file you are working on as the context for the input, sending that information to the Deep TabNine model, which suggests solutions. The plugin registers the suggestion you choose in order to improve its suggestions the next time. TabNine runs locally on your machine, installing its models once you register.
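That loop (listen to keystrokes, send file context, rank suggestions, learn from accepted choices) can be sketched in a few lines. Everything here is invented for illustration; TabNine's real protocol and ranking logic are different:

```python
class SuggestionPlugin:
    """Hypothetical editor plugin: forwards file context to a model and
    re-ranks results using the user's previously accepted suggestions."""

    def __init__(self, model):
        self.model = model    # callable: file_context -> list of suggestions
        self.accepted = []    # choices fed back to bias future ranking

    def on_keystroke(self, file_context):
        suggestions = self.model(file_context)
        # Suggestions the user accepted before sort first (False < True).
        return sorted(suggestions, key=lambda s: s not in self.accepted)

    def on_accept(self, suggestion):
        self.accepted.append(suggestion)

# Dummy stand-in "model" that always proposes the same two completions.
plugin = SuggestionPlugin(lambda ctx: ["console.log(", "print("])
plugin.on_accept("print(")
print(plugin.on_keystroke("def main():"))  # → ['print(', 'console.log(']
```

The design choice worth noticing is the feedback edge: the plugin is not just a dumb pipe to the model, it keeps local state about your preferences, which is why the talk says the suggestions improve the more you use the tool.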
The pro of having everything local is that your code stays secure and the suggestion mechanism becomes better suited to your preferences. However, it's known that the GPT-2 model requires a lot of computing power, so if you don't have a strong machine you might find the plugin slow or unresponsive. Another downside is that with a local configuration you lose the tool's improvements coming from public usage.
TabNine prefers the local configuration because this way your code never leaves the local machine. However, they recently published a cloud version of their tool, but you need to opt in to use it.
This is a short demo of TabNine using TypeScript. As you can see, it looks very similar to any autocomplete plugin. However, you can see that the suggestions use past context, such as a parameter name.
GPT-3 could be called a sequential text prediction model. It's the third release and the upgraded version of GPT-2. Version 3 takes the GPT model to a whole new level, as it's trained on 175 billion parameters (over 10x the size of GPT-2).
GPT-3 can now go further with tasks such as answering questions, writing essays, text summarization, language translation, and generating computer code.
The algorithmic structure of GPT-3 has been known to be the most advanced of its kind thanks to the vast amount of data used to pre-train it.
To generate sentences after taking an input, GPT-3 uses the field of semantics to understand the meaning of language and tries to output a meaningful sentence for the user. The model does not learn what is correct or incorrect, as it does not use labelled data or supervised learning.
OpenAI Codex is a direct descendant of GPT-3 that has been trained for programming tasks.
It's significantly more capable than GPT-3 in code generation, because it was trained on a data set that includes a much larger concentration of public source code.
Due to memory and data limitations, Codex uses only 12 billion parameters, unlike the original GPT-3 model, which uses 175 billion, making it less accurate than GPT-3 on general language tasks.
GitHub recently launched Copilot, the newest AI auto-completion tool.
It currently works on only 3 IDEs and 2 programming languages and is in a beta phase. What's interesting about Copilot is that it's based on GPT-3, which can generate sequences of text, not only a single word like GPT-2.
Similarly to TabNine, the Copilot plugin communicates with the IDE, sending the context of the current file to the AI model, Codex. The model responds with text suggestions that are then displayed in the editor. Once a suggestion has been chosen, the plugin sends back telemetry to improve future suggestions.
Unlike TabNine, Codex is hosted in the cloud and shared with all users, making the community a significant player in the product.
Here is a short demo of Copilot and how, by reading a comment, it is able to suggest a full function.
There are a lot of positive reviews on how these tools are efficient. Let me read them.
There are also some critics….
There are many reasons why these products are still not that great; here are a few.
For now, all the products work on the context of a single file, which does not work well on big projects, where, for instance, you define functions in different files.
It was also found that GPT models' efficiency decreases as we increase the number of parameters: adding parameters might give us more accurate results, but it will take more time to get a result. We also need more data to train models with more parameters, which would require scanning private repositories, not an easy task. If you recall, I mentioned that Codex was trained with 12 billion parameters, not 175 billion like the original GPT-3 model.
The tools are relatively new, they need to be trained and used to become better.
For most of these reasons there isn't much that we can actually do; we just need to wait for the next version's improvements. However,
We can help train products by using them.
This way the developers community can help shape the future of these tools.
Now, I am sure that some of you might be asking yourselves: should I be looking for a new job? Are these tools going to replace us developers?
My short and simple answer is NO. It might change it though.
Replacing developers isn't the aim, nor something that I think would ever happen. Currently the engine can't understand a real-world problem, plan a solution, build it, and show it off to the world; these tasks are what developers are good at, and that probably won't change.
However, with the power of the developer community, tools like Copilot and TabNine can be game changers in the programming industry, not by stealing jobs, but by making developers more productive. We've been improving the developer experience (code editors, debugging tools, etc.) since the last century, and now, with the rise of AI technology, we can expect the creation of many more tools that use it. New technologies usually create new jobs!
These really cool products also come with challenges yet to be solved. The most straightforward one is security: a sophisticated attacker can target malicious code at a specific prompt so that it is picked up by the models, causing users to adopt malware-laden code.
Licensing issues: what happens when the tool reproduces code snippets that are licensed and under copyright protection?
Vulnerabilities & bugs: code often contains bugs, and given the vast quantity of unvetted code that Copilot and TabNine have processed, it is certain that the language models have learned from exploitable, buggy code and might suggest it.
So what can we do about it?
Don't blindly accept the tool's recommendations, just as you would never blindly copy-paste a solution from Stack Overflow.
Don't let inexperienced developers use the tool without proper guidance; these tools are not a way to learn how to code properly.
Use automated tools for vulnerability scans, for instance Dependabot or Component Governance.
Improve your code-reviewing skills, as review is required before accepting any suggestion from these tools.
Today I spoke briefly about one part of the development cycle, but I believe the future of DevOps will be AI-driven. Humans are not equipped to handle the massive volumes of data and computation in the daily operations of high-traffic products; artificial intelligence will become a critical tool for computing and analysis, transforming how teams develop, deliver, deploy, and manage applications.
DevOps and AI can become interdependent.
DevOps is a business-driven approach to deliver software, and AI is the technology that can be integrated into the system for enhanced functionality. With the help of AI, DevOps teams can test, code, release, and monitor software more efficiently.
Now that you know about these tools and the algorithms they use, I hope you will not be afraid of AI being an integrated part of the developer's life cycle, but rather excited about it, and maybe you will even take action and help shape it.
Thank you!