LLMs in Production:
Tooling, Process, and Team Structure
December 6th, 2023
10:30am PST
Have a question or comment for our
panelists?
Use this QR code to engage with our
speakers, or visit the link in the chat!
Having an audio issue?
Try dialing in by phone!
Dial: +1 312 626 6799
Webinar ID: 819 1469 6007
Passcode: 385318
Closed Captioning is available
for this webinar!
Our Panelists
Tony Karrer
Founder & CEO TechEmpower,
Founder & CTO Aggregage
Greg Loughnane
Founder & CEO of
AI Makerspace
Chris Alexiuk
Co-Founder & CTO at
AI Makerspace
🎯ALIGNING OUR AIM
BY THE END OF TODAY...
Understand processes for building and
improving production LLM applications
Overview of industry-standard tooling
How to leverage LangSmith
OVERVIEW
LLM Ops, LLM OS, and “The New Stack”
Leading Tooling
Meet LangSmith
Conclusions, Q&A
LLM OPS
What is LLM Ops?
BUILDING LLM APPLICATIONS
ENTERPRISE BUILDS
1. Baseline performance
   a. Synthetic data, closed-source models
2. Add your private data
   a. Open-source models
3. Iterate
   a. Optimize models, data, metrics, inference, efficiency
NEXT-LEVEL
❓Some additional questions:
On premise hardware?
What scale and speed?
Training proprietary LLM?
Small Language Models (SLMs)
Efficiency
Transparency
Accuracy
Security
PROTOTYPING
🧩PROTOTYPING LLM APPS
1. Prompt Engineering
2. Question Answering Systems
3. Fine-Tuning Models
“THE NEW STACK AND OPS FOR AI”
#LLM OPS
USER EXPERIENCE
Control for uncertainty (PE)
Build guardrails for steerability
and safety (Harm/Help)
MODEL CONSISTENCY
Constrain model behavior (FT)
Ground the model (RAG)
EVALUATING PERF
Create evaluation suites (RAGAS)
Use model-graded evals (GPT-4)
LATENCY AND COST
Use semantic caching (Prompts)
Route to cheaper models (FT)
RAG, FROM LANGCHAIN
LEADING TOOLING
OUR LLM OPS CURRICULUM
1. 🧑‍💻 Building LLM Applications in Pure Python
2. 🔗 LangChain Powered RAG and Advanced Retrieval
3. 🦙 Open-Source Production RAG with LlamaIndex
4. 🕴️ Agents, 🧑‍💻 Hackathon, and 🧑‍🏫 Demo Day!
OUR LLM OPS CURRICULUM
LLMs: OpenAI GPT-4-Turbo, Mistral 7B
Evaluation: RAGAS, built-in metrics
Visibility: Weights and Biases
Infrastructure: LangChain, LlamaIndex
Vector Database: Pinecone, FAISS
Embedding Models: OpenAI Ada (see the MTEB Leaderboard)
User Interface: Chainlit
Deployment: Hugging Face, Amazon Bedrock
OPEN LEADERBOARDS
BASIC RAG +
ADVANCED RAG W/
LANGSMITH
HALLUCINATIONS
Confident responses that are false.
FACT CHECKING
RETRIEVAL AUGMENTED GENERATION
Retrieval
Finds references in your documents
Augmented
Adds references to your prompts
Generation
Improves answers to questions!
SPECIALIZED DOMAINS
Jargon, e.g., legal, healthcare, financial,
insurance, government, research
Alignment
With common language that humans use
RETRIEVAL AUGMENTED
GENERATION
OVERVIEW
RAG = DENSE VECTOR RETRIEVAL (R) + IN-CONTEXT LEARNING (AG)
🧩3 EASY PIECES TO RETRIEVAL
1. Ask a question
2. Search database for stuff similar to question
3. Return the stuff
📇INDEX (THE DATABASE)
1. Split docs into chunks
2. Create embeddings for each chunk
3. Store embeddings in vector store index
(Diagram: Raw Source Documents → Chunked Documents → Embeddings, e.g. [0.1, 0.4, -0.6, ...] → Vector Store Index)
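The three indexing steps above can be sketched in pure Python. `chunk()` is a naive fixed-size splitter (real splitters, e.g. LangChain's `RecursiveCharacterTextSplitter`, are smarter), and `embed()` is a hypothetical letter-frequency stand-in for a real embedding model such as OpenAI Ada.

```python
import math

def chunk(text: str, size: int = 48) -> list[str]:
    # 1. Split docs into fixed-size character chunks (toy splitter).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    # 2. Hypothetical embedding: normalized letter frequencies (toy only).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# 3. Store each embedding alongside its chunk in a (toy) vector store index.
docs = ["Retrieval finds references in your documents. " * 3]
index = [(embed(c), c) for doc in docs for c in chunk(doc)]
len(index)  # one entry per chunk
```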
🐕RETRIEVERS
(Diagram: an INPUT query is embedded, e.g. [0.1, 0.4, -0.6, ...], the Vector Store Index finds its nearest neighbors among the chunked document embeddings, and those chunks are returned as context: "Context: From source 1", "Context: From source 2", ...)
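A minimal retriever sketch matching the diagram: embed the query, rank indexed chunks by cosine similarity, return the top k. `embed()` is a hypothetical bag-of-words stand-in for a real embedding model; the chunks are toy examples.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Hypothetical bag-of-words "embedding" (illustration only).
    vec: dict[str, float] = {}
    for w in (w.strip(".,?!").lower() for w in text.split()):
        vec[w] = vec.get(w, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(v * b.get(w, 0.0) for w, v in a.items())

chunks = [
    "RAG adds retrieved references to your prompts.",
    "Fine-tuning constrains model behavior.",
    "Semantic caching reduces latency and cost.",
]
index = [(embed(c), c) for c in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Find nearest neighbours by cosine similarity, return their text.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

retrieve("Which technique adds retrieved references to your prompts?", k=1)
# → ['RAG adds retrieved references to your prompts.']
```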
(Diagram: INPUT "Query..." → Embedding Model → [0.1, 0.4, -0.6, ...] → Vector Database: find nearest neighbours (cosine similarity) → App Logic)
Use the provided context to answer the user's query.
You may not answer the user's query unless there is specific
context in the following text.
If you do not know the answer, or cannot answer, please respond
with "I don't know".
Context:
{context}
User Query:
{user_query}
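The template above is filled in app logic before calling the chat model. A minimal sketch using Python's built-in `str.format`; the retrieved chunks and query below are toy values.

```python
# The slide's prompt template, with {context} and {user_query} placeholders.
PROMPT_TEMPLATE = """Use the provided context to answer the user's query.
You may not answer the user's query unless there is specific
context in the following text.
If you do not know the answer, or cannot answer, please respond
with "I don't know".
Context:
{context}
User Query:
{user_query}"""

# Hypothetical retrieved chunks; in practice these come from the retriever.
retrieved = ["Context: ref 1", "Context: ref 2"]
prompt = PROMPT_TEMPLATE.format(
    context="\n".join(retrieved),
    user_query="What is RAG?",
)
```

The filled `prompt` string is what gets sent to the chat model.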
(Diagram, built up across several slides, of the full RAG pipeline:
INPUT "Query" → Embedding Model → [0.1, 0.4, -0.6, ...] → Vector Store: find nearest neighbours (cosine similarity) → return document(s) from nearest neighbours → App Logic fills the prompt template with the retrieved context (Context: ref 1 ... ref 4) and the user query → Chat Model → OUTPUT: Answer.
The retrieval half, Embedding Model + Vector Database, is Dense Vector Retrieval; the generation half, Prompt Templates + Chat Model, is In-Context Learning.)
TODAY’S BUILD
DATA
Rich Content
Hard to search
OUR MODELS
EMBEDDING & CHAT (LLM) MODEL
Chat Model (i.e., LLM): e.g., OpenAI GPT-4
Embedding Model: e.g., Cohere V3
OUR TOOLING
Vector Database: Qdrant
Infrastructure & Evaluation: LangChain, LangSmith
DATABASE AND INFRASTRUCTURE
Vector Database
🔗LANGCHAIN
“The real power
comes when you can
combine them with
other sources of
computation or
knowledge.”
~ Harrison Chase
Creator of LangChain
WHAT IS IT?
1. Search OpenAI blog for top k resources, rerank
2. Ask specific questions related to content
3. Return answers to questions with sources
OpenAI RAG Flow
OPENAI RAG 👨‍💻
Presented By
Chris Alexiuk, LLM Wizard 🪄
HOW DO I IMPROVE?
PROMPT ENGINEERING
Check system-level prompting
and one-shot/few-shot examples
for alignment with your task
e.g., varies
RAG
Are you pulling the right
references?
e.g., context recall
FINE-TUNING EMBEDDINGS
Is your model understanding
domain-specific language?
e.g., hit rate
AGENTS
Is your model reasoning the way
a human would?
e.g., ???
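Of these metrics, hit rate for fine-tuned embeddings is the easiest to sketch: the fraction of test queries whose known-relevant chunk appears in the top-k retrieved results. The chunk ids and results below are hypothetical.

```python
def hit_rate(results: list[list[str]], expected: list[str]) -> float:
    """Fraction of queries whose gold chunk appears in its top-k results."""
    hits = sum(1 for retrieved, gold in zip(results, expected) if gold in retrieved)
    return hits / len(expected)

# Hypothetical top-3 retrieved chunk ids per query, and gold labels.
results = [["c1", "c4", "c7"], ["c2", "c9", "c3"], ["c5", "c6", "c8"]]
expected = ["c4", "c1", "c5"]
hit_rate(results, expected)  # → 0.666...: queries 1 and 3 hit, query 2 misses
```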
THE RISE OF THE AI
ENGINEER
THE AGE OF THE AI ENGINEER
“A wide range of AI tasks that used to
take 5 years and a research team to
accomplish in 2013, now just require API
docs and a spare afternoon in 2023.”
It is now possible to build what used
to take months in a single day!
DATA SCIENTISTS!
Enhance Retrieval (and thus Generation!)
Fine-Tuning
Embeddings
Chat Models
Evaluation
CONCLUSIONS
Best-practice tools are out there!
LangSmith-like tooling is the most comprehensive
Building
Prompt Engineering, RAG, Fine-Tuning
Improvement
Depends on Building!
Eval varies
Lots of work for data scientists and AI Engineers!
Q&A
Tony Karrer
Founder & CEO TechEmpower,
Founder & CTO Aggregage
Dr. Greg Loughnane
Founder & CEO of
AI Makerspace
Chris Alexiuk
Co-Founder & CTO at
AI Makerspace
Tara Dwyer
Webinar Manager
/in/tonykarrer/
aggregage.com
/in/gregloughnane/
aimakerspace.io
/in/csalexiuk/
aimakerspace.io
/in/taradwyer/
artificialintelligencezone.com
JOIN THE GENERATIVE AI FOR TECHNOLOGY LEADERS LINKEDIN GROUP
FOR THOUGHTFUL DISCUSSION AND Q&A! VISIT THE LINK OR SCAN THE QR CODE!
bit.ly/genaitechleaders
