With so many external APIs (OpenAI, Bard, …) and open-source models (LLAMA, Mistral, …), building a user-facing application must be easy! What could go wrong? What do we have to think about before creating these experiences?
Here is a short glimpse of some of the things you need to think about when building your own application:
Finetuning or using pre-trained models
Token optimizations: every word costs time and money
Building small ML models vs using prompts for all tasks
Prompt Engineering
Prompt versioning
Building an evaluation framework
Engineering challenges for streaming data
Moderation & safety of LLMs
.... and the list goes on.
4. What keeps me up at night?
• LLM Models: FineTune vs External API
• Token Optimizations & Latency
• Building a robust evaluations framework
• Prompt Engineering
• Engineering challenges
• Building small LMs vs using prompts for most ML tasks
• Prompt versioning
• When should we use RAG?
• Moderation and safety guardrails
• A/B testing prompt versions, Agent versions, LLM models: what creates the best consumer experience?
5. LLM Models: To finetune or not?

External API (OpenAI, Claude, Bard, …)
A great place to start building your first consumer-facing applications.
Pros
• Hosted by a third party: reliable uptime
• Wide range of use cases
• Prompts are developed by the community
• Should have good data privacy and safety measures
Cons
• Models are not trained on your specific use case, which could produce lower-quality results.
• Paying an external vendor (example: OpenAI) can be expensive.

Finetuned Open Source Models (LLAMA, Falcon, T5, …)
• Full Finetuning
• PEFT Finetuning
Pros
• Smaller models
• Data is not sent to an external API
• Transparency: you can investigate the code
• Scope for innovation and collaboration
Cons
• Self-hosting can be expensive
• Since the code is open, it's vulnerable to hacking
• Full finetuning: the model can lose its ability to handle general behaviors, resulting in poor performance on tasks it wasn't originally trained for.

Finetuned GPT-3.5
Once you have collected data and gathered expertise in LLMs, it's time to finetune. If your application is built on GPT-3.5, finetuning it improves performance.
Pros
• An application/agent built with GPT-3.5 can have performance similar to GPT-4.
• Less expensive.
• The pipeline for training is available & documented.
• Use prompting & develop on already available resources.
Cons
• Tied to OpenAI.
• Could get more expensive in the future.
• Code is a black box.
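To make the API-vs-self-hosted tradeoff concrete, a back-of-the-envelope cost comparison helps. The sketch below is a minimal illustration; the request volumes, per-token rate, and GPU hourly rate are made-up assumptions, not real quotes:

```python
def monthly_api_cost(requests_per_day: int, tokens_per_request: int,
                     price_per_1k_tokens: float) -> float:
    """Estimate monthly spend on a pay-per-token external API (30-day month)."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

def monthly_self_host_cost(gpu_hourly_rate: float, gpus: int = 1) -> float:
    """Estimate monthly spend on always-on GPU instances for self-hosting."""
    return gpu_hourly_rate * gpus * 24 * 30

# Hypothetical numbers: 50k requests/day, 2k tokens each, $0.002 per 1k tokens
api = monthly_api_cost(50_000, 2_000, 0.002)
hosted = monthly_self_host_cost(gpu_hourly_rate=2.5, gpus=2)
print(f"API: ${api:,.0f}/month, self-hosted: ${hosted:,.0f}/month")
```

At higher traffic the per-token API bill grows linearly while the self-hosting bill stays roughly flat, which is one reason the finetuned open-source path becomes attractive at scale.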
6. Token Optimization & Latency
Every word costs money and takes time!

Model parameter counts:
• GPT-4: 1.76 T
• GPT-3.5: 175 B
• Claude: 93-137 B
• LLAMA: 7-70 B
Optimization Techniques
• Use smaller LMs for classification, NER & other relevant tasks
• Context summarization
• Stop-word removal
• Make fewer calls to LLMs
• Optimize prompt sizes & combine prompts
• Specify a token limit for content generated by LLMs
• Finetuning: use smaller models with task-specific data to achieve similar performance without prompts
• Queue requests to stay within TPM (tokens-per-minute) limits
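Several of these techniques boil down to enforcing a token budget before calling the model. A minimal sketch, using a naive whitespace split as a stand-in for a real tokenizer such as tiktoken:

```python
def truncate_context(context: str, max_tokens: int) -> str:
    """Trim context to a token budget, keeping the most recent tokens.

    A whitespace split stands in for a real tokenizer here; production
    code would count tokens with the model's own tokenizer.
    """
    tokens = context.split()
    if len(tokens) <= max_tokens:
        return context
    return " ".join(tokens[-max_tokens:])  # keep the tail (most recent turns)

def build_prompt(instructions: str, context: str, budget: int) -> str:
    """Combine a fixed instruction block with as much context as fits."""
    remaining = budget - len(instructions.split())
    return instructions + "\n" + truncate_context(context, remaining)

history = "turn1 turn2 turn3 turn4 turn5 turn6"
print(truncate_context(history, 3))  # -> "turn4 turn5 turn6"
```

The same budget-first discipline applies to combined prompts and to capping generated output: decide the limit up front, enforce it in code, and latency and cost become predictable.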
8. Building a robust Evaluation Framework
Constantly evolves: needs versioning
• Offline evaluation
• Online evaluation
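An offline evaluation loop can start as small as scoring model outputs against a versioned set of expected answers. A minimal sketch; the dataset format, toy model, and exact-match metric are illustrative assumptions:

```python
import json

def exact_match(prediction: str, expected: str) -> bool:
    """Simplest possible metric; real frameworks add fuzzy or LLM-graded scoring."""
    return prediction.strip().lower() == expected.strip().lower()

def evaluate(model_fn, dataset, prompt_version: str) -> dict:
    """Run model_fn over an eval set and tie the score to a prompt version."""
    hits = sum(exact_match(model_fn(ex["input"]), ex["expected"]) for ex in dataset)
    return {
        "prompt_version": prompt_version,  # versioning: scores are meaningless without it
        "accuracy": hits / len(dataset),
        "n": len(dataset),
    }

# Toy model and dataset for illustration
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
toy_model = lambda q: {"2+2": "4", "capital of France": "paris"}.get(q, "")
report = evaluate(toy_model, dataset, prompt_version="v1.2")
print(json.dumps(report))
```

Because the framework constantly evolves, storing each report alongside its prompt version is what makes offline scores comparable over time; online evaluation then layers A/B tests on live traffic on top of this.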
9. Engineering challenges
Streaming output gives a better user experience
• Text is broken into chunks; chunks need to be re-processed to create the output, which increases compute requirements & needs real-time processing.
• Use coroutines when building a FastAPI endpoint to ensure concurrent request handling.
• Use a singleton design to make sure the same function is not instantiated multiple times.
• As systems are built by stacking multiple layers for intelligent decision making, latency can increase with high traffic. This can lead to timeouts. Building a queuing system can help with timeouts and a sub-optimal user experience.
• LLM results are not deterministic: they are ML models!
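The coroutine and singleton points above can be sketched together. This is an illustrative asyncio-only example; the hard-coded chunk generator stands in for a real streaming LLM response:

```python
import asyncio
from functools import lru_cache

class LLMClient:
    """Expensive-to-create client; the singleton guard avoids re-instantiation."""
    def __init__(self):
        self.calls = 0

    async def stream(self, prompt: str):
        """Yield output in chunks, simulating a streaming LLM response."""
        self.calls += 1
        for chunk in ["Hello", ", ", "world", "!"]:
            await asyncio.sleep(0)  # yield control so other requests can run
            yield chunk

@lru_cache(maxsize=1)
def get_client() -> LLMClient:
    """Singleton accessor: every caller shares one client instance."""
    return LLMClient()

async def handle_request(prompt: str) -> str:
    """A coroutine endpoint handler that re-assembles streamed chunks."""
    client = get_client()
    return "".join([chunk async for chunk in client.stream(prompt)])

async def main():
    # Concurrent requests interleave via coroutines and share one client.
    results = await asyncio.gather(handle_request("a"), handle_request("b"))
    print(results, get_client() is get_client())

asyncio.run(main())
```

In a FastAPI service the same pattern applies: `async def` path operations play the role of `handle_request`, and a cached accessor (or a module-level instance) keeps one client alive across requests instead of rebuilding it per call.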
11. Thank You
Taranveer Singh, Snir Orlanczyk, Hardik Nahata, Bonaventure Raj
A huge shout out to my team!
https://www.linkedin.com/in/sanghamitra-deb-ml/