Open Source LLMs:
Viable for Production or
a Low-Quality Toy?
M Waleed Kadous
Chief Scientist, Anyscale
What we’ll cover
- Proprietary vs Open LLMs
- Examples of people using Open LLMs in production
- Why people use Open LLMs (with supporting experiments)
- Cost
- Deployment Flexibility
- Fine-tuning options
- Where Open LLMs are lagging
- Quality
- Instruction following
- Missing features
- Function Templates
- Big context windows
2
Summary
Open Models are viable in production – people are using them already
It is often possible to get close to commercial LLM quality
Small fine-tuned models outperform giant general models (sometimes)
It is often radically cheaper (e.g. 30x)
Usually takes a bit of extra work e.g. prompt tuning, post-processing
OS Models still missing key features (but being worked on)
3
Being used already!
endpoints.anyscale.com – right now, use an open LLM in 2 minutes
4 models:
- Llama 2 7B, 13B, 70B
- Code Llama 34B Instruct
$0.15 per million tokens to $1 per million tokens
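As a concrete illustration, here is a minimal sketch of calling one of these models. It assumes Anyscale Endpoints exposes an OpenAI-compatible chat API; the base URL and model identifier below are illustrative assumptions, not details taken from the slides.

# Minimal sketch: querying an open LLM through an OpenAI-compatible endpoint.
# Base URL and model name are assumptions for illustration.
import openai

openai.api_base = "https://api.endpoints.anyscale.com/v1"  # assumed endpoint
openai.api_key = "YOUR_ANYSCALE_API_KEY"

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Ray Serve in two sentences."},
    ],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])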
Some quotes from our customers
4
5
Merlin
“We use Anyscale Endpoints to power
consumer-facing services that have
reach to millions of users … Anyscale
Endpoints gives us 5x-8x cost
advantages over alternatives, making
it easy for us to make Merlin even more
powerful while staying affordable for
millions of users.”
Some quotes from our customers
Realchar.ai
“Realchar.ai is about delivering
immersive, realistic experiences for our
users, not fighting infrastructure or
upgrading open source models.
Endpoints made it possible for us to
introduce new services in hours, instead
of weeks, and for a fraction of the cost of
proprietary services. It also enables us
to seamlessly personalize user
experiences at scale.”
We are using Open LLMs: docs.ray.io
6
Endless possibilities for AI innovation.
Anyscale AI Platform: AI app serving & routing, model training & continuous tuning,
Python-native Workspaces, GPU/CPU optimizations, multi-cloud, auto-scaling
Anyscale Endpoints: LLMs served via API, LLMs fine-tuned via API
Anyscale Private Endpoints: serve your LLMs from your cloud, fine-tune & customize in your cloud
Ray Open Source: Ray AI Libraries, Ray Core
Your options for LLMs
Proprietary
OpenAI, Anthropic, Cohere
Managed Open Source
Anyscale Endpoints, Hugging Face, etc
Self Hosted
Run and maintain your own Open Source models
- Won’t dive into today, more details: walee.dk/selfhost
- TL;DR: Doable but harder than it looks (and maybe more expensive)
- Aviary: easy serving of LLMs using Ray Serve.
8
The Most Popular “Open” Models
Llama 2 (99% open)
Released in July
3 sizes: 7B, 13B, 70B
Permissive licence
- Can be used commercially
- Can’t be used to train other models
Code Llama (99% open)
Released in August
Specifically for generating code
3 sizes: 7B, 13B, 34B
3 “tunes”: Base, Python and Instruct
9
Falcon (90% open)
In June, released 7B, 40B
In September, released 180B model
Need a license for managed hosting
Very Dynamic Space
No LLM has stayed “most popular” for more than 2 months
Keep an eye on this!
Direct comparisons
Open vs Proprietary
Comparing quality: Factuality eval
Summary ranking established in the literature:
“insiders say the row brought simmering tensions between the starkly contrasting pair -- both rivals for miliband's ear -- to a head.”
A: insiders say the row brought tensions between the contrasting pair.
B: insiders say the row brought simmering tensions between miliband's ear.
11
12
Comparing Cost: Summarization
GPT-4 is expensive – 30x the cost of Llama 2 70B for similar performance
30x!
13
Can mean the difference between a product being viable or not
Ray Assistant numbers (approx), worked through in the sketch below:
2,000 tokens in, 500 tokens out, 1,000 questions/day
GPT-4: ~10c per question, ~$35,000/year (VP approval?)
Llama 2 70B: ~0.25c per question, ~$900/year (credit card?)
30x is radically cheaper
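A quick back-of-the-envelope check of the numbers above. The Llama 2 70B price (~$1 per million tokens) comes from the Endpoints slide; the GPT-4 prices are an assumption based on the circa-2023 list prices ($0.03 / $0.06 per 1K input/output tokens).

# Back-of-the-envelope yearly cost for Ray Assistant-style traffic.
TOKENS_IN, TOKENS_OUT, QUESTIONS_PER_DAY = 2000, 500, 1000

def yearly_cost(price_in_per_1k, price_out_per_1k):
    per_question = (TOKENS_IN / 1000) * price_in_per_1k + (TOKENS_OUT / 1000) * price_out_per_1k
    return per_question, per_question * QUESTIONS_PER_DAY * 365

gpt4 = yearly_cost(0.03, 0.06)     # assumed GPT-4 list prices -> ~$0.09/question, ~$33k/year
llama = yearly_cost(0.001, 0.001)  # ~$1 per million tokens   -> ~$0.0025/question, ~$900/year

print(f"GPT-4:       ${gpt4[0]:.3f}/question, ${gpt4[1]:,.0f}/year")
print(f"Llama 2 70B: ${llama[0]:.4f}/question, ${llama[1]:,.0f}/year")
print(f"Ratio: ~{gpt4[1] / llama[1]:.0f}x")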
A small fine-tuned open source model
can outperform the best available general model
in some cases
The Power of Fine-tuning
Natural Language to SQL (accuracy)
- Llama-2-7B, general: 3%
- Llama-2-7B, fine-tuned: 86%
- GPT-4 (~1.4T?), general: 78%
Fine tuning is for form, not facts
17
18
What do you do for facts?
Retrieval Augmented Generation
Vector DB does a lot of the heavy lifting
LLM mostly just has to synthesize the context
A much easier problem
Open LLMs like Llama 2 70B work well – we don’t see as big a difference vs GPT-4
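To make the division of labor concrete, here is a hedged sketch of the RAG pattern described above: retrieval does the heavy lifting, the LLM only synthesizes from the retrieved context. The retriever here is a toy keyword-overlap lookup standing in for a real vector DB, and the endpoint and model names are the same assumptions as in the earlier sketch.

# Minimal RAG sketch: retrieve context, then ask the LLM to synthesize from it.
import openai

openai.api_base = "https://api.endpoints.anyscale.com/v1"  # assumed
openai.api_key = "YOUR_ANYSCALE_API_KEY"

DOCS = [
    "Ray Serve is a scalable model serving library built on Ray.",
    "Anyscale Endpoints serves open LLMs such as Llama 2 via an API.",
]

def retrieve(query, k=2):
    # Stand-in for a vector DB lookup: rank docs by naive word overlap.
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

def answer(query):
    context = "\n".join(retrieve(query))
    resp = openai.ChatCompletion.create(
        model="meta-llama/Llama-2-70b-chat-hf",  # assumed model identifier
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("What is Ray Serve?"))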
19
Open model challenges
- Quality
- Instruction following
- Function Templates
- Large Context Windows
No. The Right tool for the Right job
High End Proprietary APIs (esp GPT-4 and Claude 2)
are the best quality:
- Better logical & analogical reasoning
- Better “general knowledge”
- More refined answers
Open LLMs are “good enough” for (blog post forthcoming):
- Summarization
- Generation stage of RAG
Quality
Hybrids make a lot of sense
For evaluations, we still use GPT-4:
“Is answer A better or answer B better?”
We still send ~5% of queries to GPT-4 for Ray Assistant
(costs 150% more: $900 → $2250)
We still use proprietary LLMs
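One way such a hybrid could be wired up is sketched below. The escalation rule (fall back to GPT-4 when the open model abstains or the query looks hard) is purely an illustrative assumption; the slides only say that roughly 5% of Ray Assistant queries go to GPT-4.

# Hedged sketch of a hybrid: default to the open model, escalate a small fraction.
def query_open_llm(prompt: str) -> str:
    # placeholder: in practice, call Llama 2 70B (e.g. via the earlier Endpoints sketch)
    return "I don't know."

def query_gpt4(prompt: str) -> str:
    # placeholder: in practice, call the OpenAI API
    return "GPT-4 answer"

def needs_escalation(prompt: str, draft_answer: str) -> bool:
    # Illustrative heuristic, not the rule used by Ray Assistant.
    return "i don't know" in draft_answer.lower() or len(prompt) > 4000

def answer(prompt: str) -> str:
    draft = query_open_llm(prompt)
    if needs_escalation(prompt, draft):
        return query_gpt4(prompt)  # the ~5% of traffic in the Ray Assistant case
    return draft

print(answer("A hard multi-step reasoning question"))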
Proprietary LLMs are better at following instructions; open LLMs tend not to be as refined
Hypothesis: OS LLMs don’t have the same level of human
review and RLHF.
Instruction following
What we asked for: Please give an A or a B.
What we got from GPT-4:
A
What we got from Llama 2 70b:
‘The correct answer is A: those who receive centrelink payments made up half of radio rental's
income last year. Explanation: Summary A accurately summarizes the article sentence by
mentioning that those who receive centrelink payments made up half of radio rental's income
last year. It maintains the same meaning and information as the original sentence. On the other
hand, Summary B is inconsistent with the article sentence. It suggests that the ABC's report
only mentioned that those who receive centrelink payments made up radio rental's income last
year, which is not entirely accurate. The article sentence explicitly states that the ABC reported
that those who receive centrelink payments made up half of radio rental's income last year.
Therefore, Summary A is the better choice’.
Example of instruction following
Easily fixed by another LLM
System prompt
You are a helpful assistant that carefully follows instructions. You
provide only answers, no explanations.
User prompt
Determine if the following text says whether the answer is A, B or other.
Only output a single word, either: A, B or other
Text: {query}
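In code, this “fix it with another LLM” step is just a second, cheap call that collapses the verbose answer down to A, B or other, using the prompts above. As before, the endpoint and model identifier are assumptions for illustration.

# Sketch of the post-processing call that extracts the final A/B/other label.
import openai

openai.api_base = "https://api.endpoints.anyscale.com/v1"  # assumed
openai.api_key = "YOUR_ANYSCALE_API_KEY"

SYSTEM = ("You are a helpful assistant that carefully follows instructions. "
          "You provide only answers, no explanations.")
USER_TEMPLATE = ("Determine if the following text says whether the answer is A, B or other. "
                 "Only output a single word, either: A, B or other\n"
                 "Text: {query}")

def extract_choice(verbose_answer: str) -> str:
    resp = openai.ChatCompletion.create(
        model="meta-llama/Llama-2-70b-chat-hf",  # assumed model identifier
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": USER_TEMPLATE.format(query=verbose_answer)},
        ],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip()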
26
Function Templates
Convert the text below into one that calls a Python function.
The function is find_flights(departure_city, arrival_city, time, date, class)
Convert to the appropriate city code using another function
city_code(str) that returns the city code for a given city.
“Hi. I'd like to book a flight to SF from Boston on Wednesday 20
September in the evening. Business class.”
27
Llama 13B output:
find_flights(Boston,
San_Francisco,
“2023-09-20”,
“18:00”,
“business”)
Does this parse?
- No, the first two arguments are bare identifiers – they should be quoted strings
- Didn’t use city_code function
- Decided 6pm was evening
28
Vs OpenAI’s strictly defined templates
"functions": [{
"name": "find_flights",
"description": "template to find flights.",
"parameters": {
"type": "object",
"properties": {
"from_city_code": {
"type": "string",
"description": "Three letter code for the city"
}, ...
29
vs Proprietary (OpenAI)
find_flights(city_code(“Boston”),
city_code(“San Francisco”),
“2023-09-20”,
“evening”,
“business”)
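For reference, a hedged sketch of how such a template is used with OpenAI’s function-calling API (2023-era openai library). The schema mirrors the snippet two slides back; the fields beyond from_city_code and the model version are illustrative assumptions.

# Sketch: function calling with a strictly defined template.
import json
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"

functions = [{
    "name": "find_flights",
    "description": "template to find flights.",
    "parameters": {
        "type": "object",
        "properties": {
            "from_city_code": {"type": "string",
                               "description": "Three letter code for the city"},
            "to_city_code": {"type": "string",
                             "description": "Three letter code for the city"},
            "date": {"type": "string"},
            "time_of_day": {"type": "string"},
            "travel_class": {"type": "string"},
        },
    },
}]

resp = openai.ChatCompletion.create(
    model="gpt-4-0613",  # a model version that supports function calling
    messages=[{"role": "user",
               "content": "Book a flight to SF from Boston on Wednesday 20 "
                          "September in the evening. Business class."}],
    functions=functions,
    function_call="auto",
)
call = resp["choices"][0]["message"]["function_call"]
print(call["name"], json.loads(call["arguments"]))  # structured, parseable output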
30
Large context windows
Bigger context windows are useful for retrieval augmented generation
From Ray Assistant Blog:
Increasing our number of chunks improves our retrieval and quality scores. We
had to stop testing at 7 chunks since Llama-2-70b's maximum context length is
4096 tokens. This is a compelling reason to invest in extending context size.
31
Current status
Anthropic: 100K context window
GPT-4: 32K context window (8K by default)
Llama 2: 4K context window
CodeLlama: 16K context window
OSS
- Actively being worked on (e.g. RoPE)
- Larger context windows also need more GPU resources
- GPT-4 charges 2x for 32K context (vs 8K)
32
Status of Open LLM Weaknesses
Quality
- Larger and larger open models (180B now largest)
- Will likely be a moving target (e.g. Google’s Gemini)
Instruction following
- RLHF is pretty expensive and hard to do – may have to live with this
Expanded context window is actively being developed
- RoPE, YaRN, Hyena
Function templates being actively worked on
- Guidance, JSONFormer, LMQL
33
Best place to run Open LLMs?
endpoints.anyscale.com – right now, use an Open LLM in 2 minutes
4 models:
- Llama 2 7B, Llama 2 13B, Llama 2 70B
- Code Llama 34B Instruct
$0.15 per million tokens to $1 per million tokens
Fine-tuning in Preview – super easy
34
One more thing …
$50 credit for Anyscale Endpoints if you sign up today
35
Summary
Open Models are viable in production – people are using them already
It is often possible to get close to proprietary LLM quality
Small fine-tuned models outperform giant general models (sometimes)
Use RAG for factual information
Open models are often radically cheaper (e.g. 30x)
Usually takes a bit of extra work e.g. prompt tuning, post-processing
Open Models still missing key features (but being worked on)
36
Thank you.
mwk@anyscale.com