LLMs in Production:
Tooling, Process, and Team Structure
December 6th, 2023
10:30am PST
Have a question or comment for our
panelists?
Use this QR code to engage with our
speakers, or visit the link in the chat!
Having an audio issue?
Try dialing in by phone!
Dial: +1 312 626 6799
Webinar ID: 819 1469 6007
Passcode: 385318
Closed Captioning is available
for this webinar!
Our Panelists
Tony Karrer
Founder & CEO TechEmpower,
Founder & CTO Aggregage
Greg Loughnane
Founder & CEO of
AI Makerspace
Chris Alexiuk
Co-Founder & CTO at
AI Makerspace
🎯ALIGNING OUR AIM
BY THE END OF TODAY...
Understand processes for building and
improving production LLM applications
Overview of industry-standard tooling
How to leverage LangSmith
OVERVIEW
LLM Ops, LLM OS, and “The New Stack”
Leading Tooling
Meet LangSmith
Conclusions, Q&A
LLM OPS
What is LLM Ops?
BUILDING LLM APPLICATIONS
ENTERPRISE BUILDS
1. Baseline performance
   a. Synthetic data, closed-source models
2. Add your private data
   a. Open-source models
3. Iterate
   a. Optimize models, data, metrics, inference, efficiency
NEXT-LEVEL
❓Some additional questions:
On premise hardware?
What scale and speed?
Training proprietary LLM?
Small Language Models (SLMs)
Efficiency
Transparency
Accuracy
Security
PROTOTYPING
🧩PROTOTYPING LLM APPS
1. Prompt Engineering
2. Question Answering Systems
3. Fine-Tuning Models
“THE NEW STACK AND OPS FOR AI”
#LLM OPS
USER EXPERIENCE
Control for uncertainty (PE)
Build guardrails for steerability
and safety (Harm/Help)
MODEL CONSISTENCY
Constrain model behavior (FT)
Ground the model (RAG)
EVALUATING PERF
Create evaluation suites (RAGAS)
Use model-graded evals (GPT-4)
LATENCY AND COST
Use semantic caching (Prompts)
Route to cheaper models (FT)
RAG, FROM LANGCHAIN
LEADING TOOLING
OUR LLM OPS CURRICULUM
1. 🧑‍💻 Building LLM Applications in Pure Python
2. 🔗 LangChain Powered RAG and Advanced Retrieval
3. 🦙 Open-Source Production RAG with LlamaIndex
4. 🕴️ Agents, 🧑‍💻 Hackathon, and 🧑‍🏫 Demo Day!
OUR LLM OPS CURRICULUM
LLMs: OpenAI GPT-4-Turbo, Mistral 7B
Evaluation: RAGAS, built-in metrics
Visibility: Weights and Biases
Infrastructure: LangChain, LlamaIndex
Vector Database: Pinecone, FAISS
Embedding Models: OpenAI Ada (see the MTEB Leaderboard)
User Interface: Chainlit
Deployment: Hugging Face, Amazon Bedrock
OPEN LEADERBOARDS
BASIC RAG +
ADVANCED RAG W/
LANGSMITH
HALLUCINATIONS
Confident responses that are false.
FACT CHECKING
RETRIEVAL AUGMENTED GENERATION
Retrieval
Finds references in your documents
Augmented
Adds references to your prompts
Generation
Improves answers to questions!
SPECIALIZED DOMAINS
Jargon, e.g., legal, healthcare, financial,
insurance, government, research
Alignment
With common language that humans use
RETRIEVAL AUGMENTED
GENERATION
OVERVIEW
RAG = DENSE VECTOR RETRIEVAL (R) + IN-CONTEXT LEARNING (AG)
🧩3 EASY PIECES TO RETRIEVAL
1. Ask a question
2. Search database for stuff similar to question
3. Return the stuff
📇INDEX (THE DATABASE)
1. Split docs into chunks
2. Create embeddings for each chunk
3. Store embeddings in vector store index
(Diagram: Raw Source Documents → Chunked Documents → Embeddings, e.g. [0.1, 0.4, -0.6, ...] → Vector Store Index)
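The three indexing steps above can be sketched in pure Python. `chunk()` is a naive fixed-size splitter (real splitters, e.g. LangChain's `RecursiveCharacterTextSplitter`, are smarter), and `embed()` is a hypothetical letter-frequency stand-in for a real embedding model such as OpenAI Ada.

```python
import math

def chunk(text: str, size: int = 48) -> list[str]:
    # 1. Split docs into fixed-size character chunks (toy splitter).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    # 2. Hypothetical embedding: normalized letter frequencies (toy only).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# 3. Store each embedding alongside its chunk in a (toy) vector store index.
docs = ["Retrieval finds references in your documents. " * 3]
index = [(embed(c), c) for doc in docs for c in chunk(doc)]
len(index)  # one entry per chunk
```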
🐕RETRIEVERS
(Diagram: an INPUT query is embedded, e.g. [0.1, 0.4, -0.6, ...], the Vector Store Index finds its nearest neighbors among the chunked document embeddings, and those chunks are returned as context: "Context: From source 1", "Context: From source 2", ...)
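A minimal retriever sketch matching the diagram: embed the query, rank indexed chunks by cosine similarity, return the top k. `embed()` is a hypothetical bag-of-words stand-in for a real embedding model; the chunks are toy examples.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Hypothetical bag-of-words "embedding" (illustration only).
    vec: dict[str, float] = {}
    for w in (w.strip(".,?!").lower() for w in text.split()):
        vec[w] = vec.get(w, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(v * b.get(w, 0.0) for w, v in a.items())

chunks = [
    "RAG adds retrieved references to your prompts.",
    "Fine-tuning constrains model behavior.",
    "Semantic caching reduces latency and cost.",
]
index = [(embed(c), c) for c in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Find nearest neighbours by cosine similarity, return their text.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

retrieve("Which technique adds retrieved references to your prompts?", k=1)
# → ['RAG adds retrieved references to your prompts.']
```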
(Diagram: INPUT "Query..." → Embedding Model → [0.1, 0.4, -0.6, ...] → Vector Database: find nearest neighbours (cosine similarity) → App Logic)
Use the provided context to answer the user's query.
You may not answer the user's query unless there is specific
context in the following text.
If you do not know the answer, or cannot answer, please respond
with "I don't know".
Context:
{context}
User Query:
{user_query}
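The template above is filled in app logic before calling the chat model. A minimal sketch using Python's built-in `str.format`; the retrieved chunks and query below are toy values.

```python
# The slide's prompt template, with {context} and {user_query} placeholders.
PROMPT_TEMPLATE = """Use the provided context to answer the user's query.
You may not answer the user's query unless there is specific
context in the following text.
If you do not know the answer, or cannot answer, please respond
with "I don't know".
Context:
{context}
User Query:
{user_query}"""

# Hypothetical retrieved chunks; in practice these come from the retriever.
retrieved = ["Context: ref 1", "Context: ref 2"]
prompt = PROMPT_TEMPLATE.format(
    context="\n".join(retrieved),
    user_query="What is RAG?",
)
```

The filled `prompt` string is what gets sent to the chat model.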
(Diagram, built up across several slides, of the full RAG pipeline:
INPUT "Query" → Embedding Model → [0.1, 0.4, -0.6, ...] → Vector Store: find nearest neighbours (cosine similarity) → return document(s) from nearest neighbours → App Logic fills the prompt template with the retrieved context (Context: ref 1 ... ref 4) and the user query → Chat Model → OUTPUT: Answer.
The retrieval half, Embedding Model + Vector Database, is Dense Vector Retrieval; the generation half, Prompt Templates + Chat Model, is In-Context Learning.)
TODAY’S BUILD
DATA
Rich Content
Hard to search
OUR MODELS
EMBEDDING & CHAT (LLM) MODEL
Chat Model (i.e., LLM): e.g., OpenAI GPT-4
Embedding Model: e.g., Cohere V3
OUR TOOLING
Vector Database: Qdrant
Infrastructure & Evaluation: LangChain, LangSmith
DATABASE AND INFRASTRUCTURE
Vector Database
🔗LANGCHAIN
“The real power
comes when you can
combine them with
other sources of
computation or
knowledge.”
~ Harrison Chase
Creator of LangChain
WHAT IS IT?
1. Search OpenAI blog for top k resources, rerank
2. Ask specific questions related to content
3. Return answers to questions with sources
OpenAI RAG Flow
OPENAI RAG 👨‍💻
Presented By
Chris Alexiuk, LLM Wizard 🪄
HOW DO I IMPROVE?
PROMPT ENGINEERING
Check system-level prompting
and one-shot/few-shot examples
for alignment with your task
e.g., varies
RAG
Are you pulling the right
references?
e.g., context recall
FINE-TUNING EMBEDDINGS
Is your model understanding
domain-specific language?
e.g., hit rate
AGENTS
Is your model reasoning the way
a human would?
e.g., ???
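Of these metrics, hit rate for fine-tuned embeddings is the easiest to sketch: the fraction of test queries whose known-relevant chunk appears in the top-k retrieved results. The chunk ids and results below are hypothetical.

```python
def hit_rate(results: list[list[str]], expected: list[str]) -> float:
    """Fraction of queries whose gold chunk appears in its top-k results."""
    hits = sum(1 for retrieved, gold in zip(results, expected) if gold in retrieved)
    return hits / len(expected)

# Hypothetical top-3 retrieved chunk ids per query, and gold labels.
results = [["c1", "c4", "c7"], ["c2", "c9", "c3"], ["c5", "c6", "c8"]]
expected = ["c4", "c1", "c5"]
hit_rate(results, expected)  # → 0.666...: queries 1 and 3 hit, query 2 misses
```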
THE RISE OF THE AI
ENGINEER
THE AGE OF THE AI ENGINEER
“A wide range of AI tasks that used to
take 5 years and a research team to
accomplish in 2013, now just require API
docs and a spare afternoon in 2023.”
It is now possible to build what used
to take months in a single day!
DATA SCIENTISTS!
Enhance Retrieval (and thus Generation!)
Fine-Tuning
Embeddings
Chat Models
Evaluation
CONCLUSIONS
Best-practice tools are out there!
LangSmith-like tooling is the most comprehensive
Building
Prompt Engineering, RAG, Fine-Tuning
Improvement
Depends on Building!
Eval varies
Lots of work for data scientists and AI Engineers!
Q&A
Tony Karrer
Founder & CEO TechEmpower,
Founder & CTO Aggregage
Dr. Greg Loughnane
Founder & CEO of
AI Makerspace
Chris Alexiuk
Co-Founder & CTO at
AI Makerspace
Tara Dwyer
Webinar Manager
/in/tonykarrer/
aggregage.com
/in/gregloughnane/
aimakerspace.io
/in/csalexiuk/
aimakerspace.io
/in/taradwyer/
artificialintelligencezone.com
JOIN THE GENERATIVE AI FOR TECHNOLOGY LEADERS LINKEDIN GROUP
FOR THOUGHTFUL DISCUSSION AND Q&A! VISIT THE LINK OR SCAN THE QR CODE!
bit.ly/genaitechleaders
