TechDayPakistan-Slides RAG with Cosmos DB.pptx
1. Unleashing the Power of RAG (Retrieval Augmented Generation) with Cosmos DB and LLMs
Usama Wahab Khan
MVP, MCT, AI Champ award winner
CTO @ Evolution Technologies
#TechdayPakistan | @TechDayP | TechDayPakistan.com
5. Introduction to Generative AI
Examples: image or video generation, code generation
Generative AI refers to a type of artificial intelligence that has the ability
to generate content that is, in many cases, indistinguishable from content
created by humans. This AI can produce text, images, audio, or even
video, often in response to a given input or prompt.
Generative AI operates by learning patterns and structures from large
datasets and then using that knowledge to produce new content that fits
within those learned patterns. It's a type of machine learning where the AI
model learns to understand and mimic the characteristics of the data it
has been trained on.
6. Generative AI capabilities
Limitless generation with a few lines of input:
Essay outlines, summarizing text, virtual assistants, extracting insights, classifying text, language translation, poem creation, code refactoring, writing assistance, code generation, answering questions, creative ideation, subject research, dialog agents, comments from code, semantic search, image generation
7. What are large language models?
Large Language Models (i.e., “language calculators”)
Large: More data than can be manually labeled
Language: Match context and words (e.g., word prediction, creative writing)
Model: Semi-supervised learning
A large language model (LLM) is a type of AI that can
process and produce natural language text. It learns from
a massive amount of data gathered from sources like
books, articles, webpages, and images to discover
patterns and rules of language.
8. Foundation Models
Data (text, images, speech, structured data, 3D signals) → Training → Foundation Model (a Transformer model) → Adaptation (instruction following) → Tasks (question answering, sentiment analysis, information extraction, image captioning, object recognition)
13. Understanding prompts, completions, and tokens
Like a person writing an essay, an AI
model takes a prompt and continues
writing based on the text in the prompt.
The new text that the model outputs is
called the completion. An example task
might be to write a Python program to
add two numbers. If you write out the
task as a Python comment like so:
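For example, a minimal sketch (the exact prompt and completion shown here are illustrative, not taken from the slide deck):

```python
# Prompt sent to the model, written as a Python comment:
# Write a Python program to add two numbers.

# A plausible completion returned by the model:
def add(a, b):
    return a + b

print(add(2, 3))  # prints 5
```

The comment is the prompt; the code the model writes after it is the completion, and both are processed by the model as sequences of tokens.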
Prompt engineering
16. What is OpenAI?
OpenAI is a private research laboratory that aims to
develop and direct artificial intelligence (AI) in ways that
benefit humanity as a whole. The company was founded
by Elon Musk, Sam Altman and others in 2015 and is
headquartered in San Francisco.
Introduction to OpenAI
18. Hallucinations
While LLMs like the ChatGPT model have proven to have extensive knowledge, they can still be wrong at times. It's important to understand this limitation and apply mitigations for your scenario.
Key concerns with LLMs: fine-tuning is expensive, they have no real-time knowledge, and they need enhanced contextual relevance.
19. Fine Tuning
This is the process of taking a pre-trained LLM, such as Llama or OpenAI's GPT models, and further training it on a smaller, specific dataset to adapt it for a particular task or to improve its performance. By fine-tuning, we are adjusting the model's weights based on our data, making it more tailored to our application's unique needs.
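For illustration only, a fine-tuning run with the Hugging Face transformers Trainer might look roughly like the sketch below; the base model name, the data file, and the hyperparameters are placeholders, not values from the talk.

```python
# Minimal fine-tuning sketch (illustrative, not a production recipe).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder; substitute the base LLM you are adapting
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "my_domain_data.jsonl" is a hypothetical file of task-specific text examples.
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adjusts the model's weights on the domain-specific data
```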
20. What is RAG?
RAG, or retrieval augmented generation, is a method
introduced by Meta AI researchers that combines an
information retrieval component with a text generator
model to address knowledge-intensive tasks
Large language models (LLMs) like ChatGPT are trained
on public internet data which was available at the point in
time when they were trained. They can answer questions
related to the data they were trained on. This public data
might not be sufficient to meet all your needs. You might
want questions answered based on your private data. Or,
the public data might simply have gotten out of date. The
solution to this problem is Retrieval Augmented
Generation (RAG), a pattern used in AI that uses an LLM to generate answers grounded in your own data.
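A minimal sketch of the pattern, assuming an Azure OpenAI chat deployment and the openai v1 Python SDK (the endpoint, deployment name, and the stub retriever are placeholders): retrieve the most relevant chunks of your own data, add them to the prompt, and let the LLM answer from that context.

```python
# Illustrative RAG flow: retrieve -> augment the prompt -> generate.
from openai import AzureOpenAI

client = AzureOpenAI(azure_endpoint="https://<your-resource>.openai.azure.com",
                     api_key="<key>", api_version="<api-version>")

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder retriever: in a real app this queries your vector store."""
    return ["<chunk 1 of your private data>", "<chunk 2>", "<chunk 3>"][:k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    messages = [
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    resp = client.chat.completions.create(model="<your-gpt-deployment>", messages=messages)
    return resp.choices[0].message.content

print(answer("What does our return policy say?"))
```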
22. Why use RAG?
• Fine-tuning is suitable for continuous domain adaptation, enabling significant improvements in
model quality but often incurring higher costs. Conversely, RAG offers an alternative approach,
allowing the use of the same model as a reasoning engine over new data provided in a prompt.
This technique enables in-context learning without the need for expensive fine-tuning,
empowering businesses to use LLMs more efficiently.
• RAG allows businesses to achieve customized solutions while maintaining data relevance and
optimizing costs. By adopting RAG, companies can use the reasoning capabilities of LLMs, utilizing
their existing models to process and generate responses based on new data. RAG facilitates
periodic data updates without the need for fine-tuning, thereby streamlining the integration of
LLMs into businesses.
1. Provide supplemental data as a directive or a prompt to the LLM
2. Adds a fact-checking component to your existing models
3. Use up-to-date data without incurring the extra time and costs associated with fine-tuning
4. Ground answers in your business-specific data
25. Anatomy of a RAG app
App UX → Orchestrator → Retriever over Knowledge Base (query in, knowledge out) → Large Language Model (prompt + knowledge in, response out)
Build your own experience: UX, orchestration, calls to the retriever and LLM (e.g., copilots, in-app chat)
Extend other app experiences: plugins for retrieval, symbolic math, app integration, etc. (e.g., plugins for OpenAI ChatGPT)
26. RAG Components
Data sources (unstructured and structured, or cloud storage)
Indexing pipeline (a simplified sketch follows below)
– Data cleaning or OCR
– Data chunking
– Indexing
Embedding model
Vector database
Vector index
Vector retrieval query
Application AI orchestrator
LLM to generate the response
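The indexing steps above can be sketched as follows; the cleaning rule, chunk size, and stubbed embedding call are simplifying assumptions, not the implementation from the talk.

```python
# Illustrative indexing pipeline: clean -> chunk -> embed -> index.
import re

def clean(text: str) -> str:
    """Very light cleaning: collapse whitespace left over from OCR/extraction."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with a small overlap between chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(texts: list[str]) -> list[list[float]]:
    """Placeholder: a real pipeline calls an embedding model here
    (e.g., text-embedding-ada-002) and returns one vector per chunk."""
    return [[0.0] * 1536 for _ in texts]

document = clean(open("policy.txt", encoding="utf-8").read())  # hypothetical source file
chunks = chunk(document)
vectors = embed(chunks)

# The (chunk, vector) pairs would then be written to a vector database / index.
index = list(zip(chunks, vectors))
```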
27. Embedding Model
• “Embeddings are vectors or arrays of numbers
that represent the meaning and context of the
tokens processed by the model. They are used to
encode and decode input and output texts, and
can vary in size and dimension. Embeddings can
help the model understand the relationships
between tokens, and generate relevant and
coherent texts.”
• Examples of LLM embedding models:
• text-embedding-ada-002 by Azure OpenAI
• The Hugging Face MTEB leaderboard ranks embedding models:
https://huggingface.co/spaces/mteb/leaderboard
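For example, a single embedding call with the Azure OpenAI Python SDK might look roughly like this (the endpoint, key, API version, and deployment name are placeholders):

```python
from openai import AzureOpenAI  # assumes the openai v1 SDK

client = AzureOpenAI(azure_endpoint="https://<your-resource>.openai.azure.com",
                     api_key="<key>", api_version="<api-version>")

resp = client.embeddings.create(
    model="<text-embedding-ada-002-deployment>",  # your Azure deployment name
    input="Azure Cosmos DB supports vector search.",
)
vector = resp.data[0].embedding
print(len(vector))  # text-embedding-ada-002 returns 1536-dimensional vectors
```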
29. Vector databases or stores
Vector databases or stores are used in numerous domains and situations across analytical and generative AI, including natural language processing, video and image recognition, recommendation systems, search, etc. They are the most popular retrieval option for RAG. A vector database stores text embeddings (the outputs of embedding models) in a queryable database.
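Conceptually, a vector store answers the question "which stored embeddings are closest to the query embedding?". A tiny in-memory sketch using cosine similarity (toy 3-dimensional vectors for readability; real stores use approximate nearest-neighbor indexes over much higher dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these vectors came from an embedding model.
store = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.2]),
    "office address": np.array([0.0, 0.2, 0.9]),
}

query_vector = np.array([0.85, 0.2, 0.05])  # e.g., the embedding of "how do I get a refund?"
best = max(store, key=lambda key: cosine_similarity(store[key], query_vector))
print(best)  # -> "refund policy"
```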
30. Vector databases or stores
Popular vector databases / stores
There are several vector databases and stores available for AI-native embedding. Here
are some examples:
Pinecone: A cloud-native vector database that offers a seamless API and hassle-free
infrastructure. It is now available on Azure Marketplace.
Chroma DB: An open-source vector database tailored for AI-native embedding.
Faiss: An open-source library developed by Facebook AI Research for fast, dense vector
similarity search and grouping.
Azure Cosmos DB: A globally distributed, multi-model database service that supports
document, key-value, graph, and column-family data models. It also supports vector
database extensions. Use the native vector search feature in Azure Cosmos DB for
MongoDB vCore, which offers an efficient way to store, index, and search high-
dimensional vector data directly alongside other application data. Use the native vector
search feature in Azure Cosmos DB for PostgreSQL, which offers an efficient way to
store, index, and search high-dimensional vector data directly alongside other
application data.
31. Azure Cosmos DB vector store options
Azure Cosmos DB for MongoDB vCore: store your application data and vector embeddings together in a single MongoDB-compatible service featuring native support for vector search (see the sketch below).
Azure Cosmos DB for PostgreSQL: store your data and vectors together in a scalable PostgreSQL offering with native support for vector search.
Azure Cosmos DB for NoSQL with Azure AI Search: augment your Azure Cosmos DB data with the semantic and vector search capabilities of Azure AI Search.
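As an illustration of the MongoDB vCore option, creating a vector index and running a vector query with pymongo could look roughly like the sketch below; the connection string, database, collection, field names, and dimensions are assumptions layered on the documented cosmosSearch syntax.

```python
from pymongo import MongoClient

client = MongoClient("<your-cosmosdb-for-mongodb-vcore-connection-string>")
collection = client["ragdb"]["docs"]

# Create an IVF vector index over the "contentVector" field (1536 dims for ada-002).
client["ragdb"].command({
    "createIndexes": "docs",
    "indexes": [{
        "name": "vectorSearchIndex",
        "key": {"contentVector": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-ivf",
            "numLists": 100,
            "similarity": "COS",
            "dimensions": 1536,
        },
    }],
})

# Store a chunk of application data together with its embedding.
collection.insert_one({"text": "<chunk of your data>", "contentVector": [0.0] * 1536})

# Retrieve the k most similar documents to a query embedding.
query_vector = [0.0] * 1536  # in practice, the embedding of the user's question
results = collection.aggregate([{
    "$search": {
        "cosmosSearch": {"vector": query_vector, "path": "contentVector", "k": 3},
        "returnStoredSource": True,
    }
}])
for doc in results:
    print(doc["text"])
```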
33. Optional Slides
The slides that follow this slide are optional
Note: This slide is just for Informational Purposes
Editor's Notes
Overview of generative AI capabilities for inspiration.
This technique uses a large language model (LLM) to generate text based on information retrieved from external sources. The process involves the following steps:
Getting data: The data can be any text-based document or database that contains relevant facts or knowledge for the task.
Splitting it into small chunks: The data is divided into smaller pieces of text, such as sentences or paragraphs, that can be easily processed by the LLM.
Using a specific type of LLM embedding model: The embedding model is a component of the LLM that converts each text chunk into a numerical vector in a high-dimensional space. The vector represents the meaning and context of the text chunk.