Supercharge Your AI Development
with Local LLMs
Speaker
Francesco Corti
Principal Product Manager, Docker
This slide deck is developed by DataCouch
About Speaker
Francesco Corti
Principal Product Manager @ Docker
Open Source enthusiast
Developer at heart (ex-DevRel)
Speaker & Author
What you should expect
The Current AI Development Landscape (Hands-On)
The Core Challenges
What Are Local LLMs?
Development Guide For Local LLMs (Hands-On)
Integration Patterns And Tips for Developers
Questions & Answers
Demo… Demo… Demo…
Disclaimer
I’m not here to sell
We are going to focus on GenAI (subset of AI)
What you are going to see today is evolving fast*
(*) Insanely fast
Call To Action: Let me know your experience!
Developing GenAI Applications is not necessarily hard
[Diagram: the User Application submits a prompt to the LLM API (a SaaS service such as the OpenAI API), which sends the request to the Large Language Model]
Demo
Asking Cursor to create a GenAI Application:
Create a simple web application in Python running on port 8080.
The web application provides a chat with the end user.
The requirements are: Flask==2.3.3, requests==2.31.0, python-dotenv==1.0.0.
Use the OpenAI client because I'm going to use it.
Use the OpenAI base URL https://api.openai.com/v1/ to interact with the OpenAI API.
The OpenAI API key is sk-proj-XXX-XXX.
Then execute:
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
python3 app.py
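For reference, here is a minimal sketch of the app that prompt describes. It is an assumption of what the generated code might look like (OpenAI Python client v1.x, API key loaded from a .env file; the model name is only an example), not the exact demo output:

# app.py - minimal sketch of the chat app described in the prompt above.
# Assumes the openai v1.x client and OPENAI_API_KEY set in a .env file.
from dotenv import load_dotenv
from flask import Flask, jsonify, request
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from the .env file

app = Flask(__name__)
client = OpenAI(base_url="https://api.openai.com/v1/")  # key taken from env

@app.route("/chat", methods=["POST"])
def chat():
    # Forward the user's message to the LLM and return its reply.
    user_message = request.json["message"]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": user_message}],
    )
    return jsonify({"reply": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=8080)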
Developing GenAI Applications is not necessarily easy
Example: Modular Monolith LLM Approach With Composability
https://medium.com/data-science/generative-ai-design-patterns-a-comprehensive-guide-41425a40d7d0
Real-life scenarios are never easy (as they should be)
Developing GenAI Applications Using RAG
[Diagram: the User Application submits a prompt; the RAG (Retrieval Augmented Generation) layer uses a retrieval model to create a vector (embedding) for the prompt and find relevant information in a vector database; the enriched request is then sent via the LLM API to the Large Language Model (a SaaS service)]
Ingestion is not trivial
The Business Logic is more complex
(nothing crazy)
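To make the flow concrete, here is a minimal RAG sketch, assuming the OpenAI client for embeddings and chat plus numpy; the in-memory list stands in for a real vector database, and the documents, model names, and helper names are illustrative only:

# rag_sketch.py - ingestion: embed documents; query: retrieve + generate.
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = ["Local LLMs run on your own hardware.",
        "RAG enriches prompts with retrieved context."]

def embed(text):
    # Create the vector (embedding) for a piece of text.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

doc_vectors = [embed(d) for d in docs]  # the "ingestion is not trivial" step

def answer(prompt):
    # Find the most relevant document via cosine similarity...
    q = embed(prompt)
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
              for v in doc_vectors]
    context = docs[int(np.argmax(scores))]
    # ...then send the enriched request to the LLM.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Context: {context}\n\nQuestion: {prompt}"}],
    )
    return resp.choices[0].message.content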
Developing GenAI Applications Using Tools (MCP)
[Diagram: the User Application submits a prompt to the LLM API (a SaaS service), which sends the request to the Large Language Model; via tool/function calling, the model invokes tools (either bundled or remote) and receives their responses; RAG (Retrieval Augmented Generation) can sit alongside]
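Here is a minimal sketch of the tool/function-calling loop, assuming the OpenAI client; get_weather is a hypothetical local tool used only for illustration:

# tools_sketch.py - the model decides to call a tool; we invoke it and
# feed the result back so the model can produce the final answer.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city):
    # Hypothetical tool implementation (could also be a remote MCP tool).
    return f"Sunny in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Milan?"}]
first = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]

# Invoke the tool and return its response to the model.
result = get_weather(**json.loads(call.function.arguments))
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
final = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)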
Developing GenAI Applications Using A Custom Model
[Diagram: the User Application submits a prompt to the LLM API, now served by an Inference Engine hosting a custom LLM (built on top of one or more base LLMs), with RAG (Retrieval Augmented Generation) and tools alongside]
It can be complex
It cannot be consumed as SaaS
The Increasing Complexity of Developing GenAI Applications
[Spectrum, from easy to complex: GenAI App Leveraging SaaS Services → GenAI App Leveraging RAG and SaaS Services (Optionally Using Tools) → GenAI App Using Custom Models → …]
Major Concerns, for both business and developers:
Security & Privacy, Latency, Cost At Scale
Introducing Local Inference Engines + LLMs
A local Large Language Model (LLM) is an AI model deployed and executed entirely on your own hardware (a personal computer, workstation, or server), without relying on external cloud services.
[Diagram: the User Application submits a prompt to the Large Language Model API (OpenAI API), now exposed by a local Inference Engine rather than a SaaS service; the engine routes the request to one of the locally hosted Large Language Models (LLMs)]
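The practical consequence: the same OpenAI client works against a local inference engine by changing only the base URL. The endpoint and model name below assume Docker Model Runner and are illustrative; other engines (Ollama, llama.cpp server, vLLM, ...) expose similar OpenAI-compatible endpoints on their own ports:

# local_sketch.py - same OpenAI client, local endpoint instead of SaaS.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint
    api_key="not-needed",  # local engines typically ignore the key
)
response = client.chat.completions.create(
    model="ai/smollm2",  # a model already pulled into the local engine
    messages=[{"role": "user", "content": "Hello from a local LLM!"}],
)
print(response.choices[0].message.content)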
Are Local Inference Engines New?
No, they have existed for some time*
(*) But these days they are getting more attention
Are LLMs Hard To Find And Use?
No, almost every product has its own registry… but not always the same format.
GGUF
GPTQ
AWQ
EXL2
HQQ
ONNX
Safetensors
PyTorch (.pt/.bin)
TensorFlow (.pb/.ckpt)
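As an example of consuming one of these formats directly, here is a minimal sketch that loads a GGUF file with the llama-cpp-python package (the model path is a placeholder for a file you have already downloaded):

# gguf_sketch.py - run a GGUF model in-process via the llama.cpp bindings.
from llama_cpp import Llama

llm = Llama(model_path="./models/some-model.gguf")  # placeholder path
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(output["choices"][0]["message"]["content"])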
Demo
● Local LLM in Action
● Managing Local LLMs
● Same Power But Local
Strengths And Weaknesses
+
○ Data are managed locally (Security & Privacy)
○ Zero network latency
○ Zero cost (for execution)
-
○ Local resources (you may need a bigger machine)
○ Not production-like environments (despite the OpenAI-compatible API)
○ Security of models (emerging) and of MCP tools
Why Are They Getting More Attention These Days?
Models Are Getting “Smaller”
Emerging Use Cases:
Testing of GenAI Applications
“Light” Chatbots
Models Experimentation
Models Comparison
…
Call To Action: Are you aware of more?
AI is going to be disruptive
A lot of developers are using AI for development
A lot of developers are playing with AI development
Very few are developing AI applications (for production)
Now it’s time to learn!
Now it’s time to build the DevEx of the future!
Francesco Corti
Principal Product Manager at Docker Inc.
https://www.linkedin.com/in/fcorti/

Supercharge Your AI Development with Local LLMs