A Comprehensive Review of Large Language Models for Code Generation
Presented By: Sai Pragna Kancheti
INTRODUCTION:
 ChatGPT-like chatbots have become popular in recent times. These chatbots are natural language processing tools developed for general-purpose use; they employ artificial intelligence to generate text after a user enters a prompt.
 Although these chatbots are built for general purposes, they are also good at generating code from user prompts using Large Language Models.
 In this presentation, we systematically review Large Language Models for code generation based on user prompts.
 At the end, based on the results, we present some insights for further research in this direction.
What are LLMs?
 A large language model is an advanced type of language model that is trained on vast volumes of text data using deep learning techniques.
 These models can generate human-like text and perform a variety of natural language processing tasks.
 The complexity of a language model can range from simple n-gram models to more complex neural network models.
 Examples: GPT-3 (Generative Pretrained Transformer 3), BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Approach), etc.
LLMs for code generation
 Recent models excel at tasks like code completion and code synthesis from natural language descriptions.
 One promising recent model is that of Austin et al. (2021), which has demonstrated significant progress toward AI-based programming assistance.
 One of the largest of these models, Codex (Chen et al., 2021), has been deployed as an in-IDE developer assistant that automatically generates code based on the user's context, in the real-world production tool GitHub Copilot.
 Despite the enormous success of large language models of code, the most powerful models are not publicly accessible.
LLMs for code generation
Some existing models of code, their sizes, and their availability (open-source or not) are shown in the figure.
Challenges with the Available LLMs for Code Generation
 Although these models can show good performance for code generation from user prompts, the following challenges need to be addressed for further development in this area:
 There has been no large open-source language model trained almost exclusively on code from multiple programming languages.
 Powerful models are not publicly accessible.
 Access to the models' internals is unavailable.
 This prevents these models from being applied to code generation tasks and inhibits research in this field by low-resource organizations.
PRETRAINING METHODS
Types of Pretraining Methods
Left-to-Right Language Models
 Auto-regressive, left-to-right language models predict the likelihood of each token conditioned on the sequence of tokens that precede it.
 Their sequential, left-to-right operation is especially useful for program-generation tasks such as code auto-completion (a minimal completion sketch follows this slide).
 However, because code is not usually written in a single left-to-right pass, utilizing context that appears "after" the point of generation is difficult.
 Examples: CodeParrot, GPT-Neo, GPT-J (6B), Codex (12B), GPT-NeoX (20B), and Google's 137B model (Austin et al., 2021).
 These types of models are the ones considered in this review.
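The sketch below illustrates left-to-right code completion with an open-source causal LM via the Hugging Face transformers library. It is a minimal illustration, not the paper's evaluation harness; the checkpoint name and sampling settings are assumptions.

    # Minimal sketch: left-to-right code completion with a small GPT-Neo.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neo-125M"  # assumed small open checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "def fibonacci(n):\n    "
    inputs = tokenizer(prompt, return_tensors="pt")

    # Autoregressive decoding: each new token conditions only on the
    # tokens to its left, which is why these models suit auto-completion.
    output = model.generate(
        **inputs,
        max_new_tokens=48,
        do_sample=True,
        temperature=0.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))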
Masked Language Models
 While auto-regressive language models are powerful for modeling the probability of sequences, their unidirectional nature makes them less suitable for producing effective whole-sequence representations for downstream tasks such as classification.
 One popular bidirectional objective function widely used in representation learning is masked language modeling, where the aim is to predict masked text pieces based on the surrounding context (a minimal sketch follows this slide).
 Examples: CodeBERT (125M) and CuBERT (345M).
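As a minimal sketch of the masked objective, the snippet below asks an MLM-trained code model to fill in a masked token using context on both sides; the checkpoint name is an assumption.

    # Minimal sketch: bidirectional masked-token prediction.
    from transformers import pipeline

    # Assumed MLM checkpoint of CodeBERT on the Hugging Face Hub.
    fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

    # The model predicts the masked token from context on BOTH sides.
    for candidate in fill("if x <mask> 0: return -x"):
        print(candidate["token_str"], candidate["score"])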
Encoder-decoder Models
 An encoder-decoder model first uses an encoder to encode an input sequence, and then uses a left-to-right LM to decode an output sequence conditioned on the input sequence.
 Popular pretraining objectives include masked span prediction, where the input sequence is randomly masked with multiple mask tokens and the output sequence is the masked contents in order, and denoising sequence reconstruction, where the input is a corrupted sequence and the output is the original sequence (a span-prediction sketch follows this slide).
 These pretrained models are useful in many sequence-to-sequence tasks.
 Examples: CodeT5 (220M) and PLBART (406M)
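The snippet below sketches masked span prediction in the T5 style with a CodeT5 checkpoint: a sentinel token marks a masked span in the input, and the decoder emits the span contents. The checkpoint name and the sentinel-token format are assumptions carried over from T5.

    # Minimal sketch: masked span prediction with an encoder-decoder model.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    name = "Salesforce/codet5-base"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = T5ForConditionalGeneration.from_pretrained(name)

    # <extra_id_0> marks the masked span; the decoder predicts its contents.
    source = "def add(a, b): <extra_id_0> a + b"
    input_ids = tokenizer(source, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=8)
    print(tokenizer.decode(output[0], skip_special_tokens=True))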
COMPARED MODELS
Existing Models
 Codex: Codex is a large language model (LLM) that has been fine-tuned on publicly available Python code from GitHub.
 The model builds on GPT-3, chosen for its substantial proficiency at producing Python programs. Despite being considerably smaller than GPT-3, at 12 billion parameters, Codex still exhibits remarkable performance.
 GPT-Neo: GPT-Neo is a series of large language models trained on the Pile dataset.
 These models, similar in spirit to GPT-3, are available in several sizes, including 125M, 1.3B, and 2.7B parameter versions.
 The GPT-Neo 2.7B version, in particular, is a transformer model based on EleutherAI's recreation of the GPT-3 architecture.
Existing Models
 GPT-J: GPT-J, developed by EleutherAI, is an open-source model with 6 billion parameters, trained on the Pile dataset.
 It largely adheres to the GPT-2 architecture and stands out as the highest-performing publicly available transformer language model in terms of zero-shot performance on a range of downstream tasks.
 CodeParrot: CodeParrot is a GPT-2-based model with 1.5 billion parameters that has been fine-tuned on publicly accessible code from GitHub for the purpose of generating Python code.
Introduced model - PolyCoder
 To overcome the challenges of the available LLMs for code generation, a new model, PolyCoder, is introduced. It has 2.7 billion parameters and is trained on a diverse range of repositories sourced from GitHub, spanning 12 distinct programming languages, as shown in the table.
PolyCoder’s Training
 PolyCoder uses the GPT-2 model architecture.
 To investigate the effect of model size scaling, it was trained at three different sizes: 2.7 billion, 400 million, and 160 million parameters, with the largest 2.7B model matching GPT-Neo's capacity to allow a fair comparison.
 The 2.7 billion parameter model is a 32-layer, 2,560-dimensional Transformer with a maximum context window of 2,048 tokens, trained with a batch size of 128 sequences (262K tokens) for a total of 150K steps (a configuration sketch follows this slide).
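To make these hyperparameters concrete, the sketch below instantiates a GPT-2 configuration with the sizes quoted above. The head count and the default vocabulary are assumptions, so the parameter count only roughly approximates PolyCoder's 2.7B.

    # Rough sketch of the largest PolyCoder configuration (GPT-2 family).
    from transformers import GPT2Config, GPT2LMHeadModel

    config = GPT2Config(
        n_layer=32,        # 32 transformer layers
        n_embd=2560,       # 2,560-dimensional hidden states
        n_head=32,         # assumption: 2560 / 32 = 80-dim heads
        n_positions=2048,  # maximum context window of 2,048 tokens
    )
    model = GPT2LMHeadModel(config)
    print(f"{model.num_parameters() / 1e9:.2f}B parameters")  # roughly 2.7B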
PolyCoder’s Training
 The following table compares the design decisions and hyperparameters used in training different models of code.
PolyCoder’s Training
 The following figure shows the training and validation loss during the 150K-step training process.
Results
Results of Extrinsic evaluations:
 Among the current models, PolyCoder performs less effectively than the comparably sized GPT-Neo and even the smaller Codex 300M. Overall, PolyCoder ranks behind Codex and GPT-Neo/J but outperforms CodeParrot.
 Despite being trained exclusively on code, PolyCoder lags behind a model of similar size, GPT-Neo 2.7B, which was trained on the Pile, a mix of both code and natural language texts.
 This finding implies that future studies could profit from mixing code from diverse programming languages with natural language text.
Results of Extrinsic evaluations:
 The following table shows the results of different models on the HumanEval benchmark, along with the number of different types of tokens seen during training (a pass@k sketch follows this slide).
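HumanEval scores are reported as pass@k. As a reference point, the snippet below is a minimal sketch of the unbiased pass@k estimator from Chen et al. (2021), where n samples are generated per problem and c of them pass the unit tests; the numbers in the usage line are illustrative, not results from the paper.

    # Minimal sketch of the unbiased pass@k estimator (Chen et al., 2021).
    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Probability that at least one of k samples drawn (without
        replacement) from n is correct, given c correct samples."""
        if n - c < k:
            return 1.0  # every draw of k samples includes a correct one
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Illustrative numbers only:
    print(pass_at_k(n=100, c=7, k=10))  # ~0.53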
Results of Intrinsic Evaluations
 Interestingly, PolyCoder surpasses Codex and all other models on the C language. Considering only open-source models, PolyCoder outperforms the similarly sized GPT-Neo 2.7B in C, JavaScript, Rust, Scala, and TypeScript.
 In the remaining 11 languages apart from C, all other open-source models, including the newly introduced PolyCoder, exhibit significantly lower performance (higher perplexity) than Codex (a perplexity sketch follows this slide).
 This observation could imply that for languages where larger models do not yield extra benefits, training solely on code might be sufficient, or even slightly more advantageous, than training on a combination of natural language and code.
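Intrinsic evaluation here means perplexity on held-out code (lower is better). The snippet below sketches how a causal LM's perplexity on a code snippet can be computed; the checkpoint name is an assumption.

    # Minimal sketch: perplexity of a causal LM on a code snippet.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neo-125M"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    code = "int main(void) { return 0; }"
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids returns the mean token-level cross-entropy.
        loss = model(ids, labels=ids).loss
    print(f"perplexity = {torch.exp(loss).item():.1f}")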
Conclusions
 We have presented the results of a systematic evaluation of large language models for code. The findings generally indicate that performance improves with larger models and longer training.
 Based on the results, GPT-Neo's superior performance over PolyCoder in certain languages suggests that training on both natural language text and code can enhance code modeling.
 However, it is noteworthy that for the C programming language, PolyCoder outperforms all models, including Codex, achieving a lower perplexity.
Thank You