TechDayPakistan-Slides RAG with Cosmos DB.pptx
1. Unleashing the Power of RAG (Retrieval Augmented Generation) with Cosmos DB and LLMs
Usama Wahab Khan
MVP, MCT, AI Champ award winner
CTO @ Evolution Technologies
#TechdayPakistan | @TechDayP | TechDayPakistan.com
5. Introduction to Generative AI
Examples: image or video generation, code generation
Generative AI refers to a type of artificial intelligence that has the ability
to generate content that is, in many cases, indistinguishable from content
created by humans. This AI can produce text, images, audio, or even
video, often in response to a given input or prompt.
Generative AI operates by learning patterns and structures from large
datasets and then using that knowledge to produce new content that fits
within those learned patterns. It's a type of machine learning where the AI
model learns to understand and mimic the characteristics of the data it
has been trained on.
6. Generative AI capabilities
Limitless generation with a few lines of input:
Essay outlines, summarizing text, virtual assistants, extracting insights, classifying text, language translation, poem creation, code refactoring, writing assistance, code generation, answering questions, creative ideation, subject research, dialog agents, comments from code, semantic search, image generation
7. What are large language models?
Large Language Models (i.e., “language calculators”)
Large: More data than can be manually labeled
Language: Match context and words (e.g., word prediction, creative writing)
Model: Semi-supervised learning
A large language model (LLM) is a type of AI that can
process and produce natural language text. It learns from
a massive amount of data gathered from sources like
books, articles, webpages, and images to discover
patterns and rules of language.
8. Foundation Models
Data (text, images, speech, structured data, 3D signals) → Training → Foundation Model (a Transformer model) → Adaptation (instruction following) → Tasks (question answering, sentiment analysis, information extraction, image captioning, object recognition)
13. Understanding prompts, completions, and tokens
Like a person writing an essay, an AI
model takes a prompt and continues
writing based on the text in the prompt.
The new text that the model outputs is
called the completion. An example task
might be to write a Python program to
add two numbers. If you write out the
task as a Python comment like so:
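For example, a minimal sketch (the exact prompt and completion shown here are illustrative, not taken from the slide deck):

```python
# Prompt sent to the model, written as a Python comment:
# Write a Python program to add two numbers.

# A plausible completion returned by the model:
def add(a, b):
    return a + b

print(add(2, 3))  # prints 5
```

The comment is the prompt; the code the model writes after it is the completion, and both are processed by the model as sequences of tokens.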
Prompt engineering
16. What is OpenAI?
OpenAI is a private research laboratory that aims to
develop and direct artificial intelligence (AI) in ways that
benefit humanity as a whole. The company was founded
by Elon Musk, Sam Altman and others in 2015 and is
headquartered in San Francisco.
Introduction to OpenAI
18. Hallucinations
While LLMs like the ChatGPT model have proven to have extensive knowledge, they can still be wrong at times. It's important to understand this limitation and apply mitigations for your scenario.
Key concerns with LLMs: fine-tuning is expensive, they have no real-time knowledge, and they need enhanced contextual relevance.
19. Fine Tuning
This is the process of taking a pre-trained LLM, such as Llama or OpenAI's GPT models, and further training it on a smaller, specific dataset to adapt it for a particular task or to improve its performance. By fine-tuning, we are adjusting the model's weights based on our data, making it more tailored to our application's unique needs.
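For illustration only, a fine-tuning run with the Hugging Face transformers Trainer might look roughly like the sketch below; the base model name, the data file, and the hyperparameters are placeholders, not values from the talk.

```python
# Minimal fine-tuning sketch (illustrative, not a production recipe).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder; substitute the base LLM you are adapting
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "my_domain_data.jsonl" is a hypothetical file of task-specific text examples.
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adjusts the model's weights on the domain-specific data
```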
20. What is RAG?
RAG, or retrieval augmented generation, is a method
introduced by Meta AI researchers that combines an
information retrieval component with a text generator
model to address knowledge-intensive tasks
Large language models (LLMs) like ChatGPT are trained
on public internet data which was available at the point in
time when they were trained. They can answer questions
related to the data they were trained on. This public data
might not be sufficient to meet all your needs. You might
want questions answered based on your private data. Or,
the public data might simply have gotten out of date. The
solution to this problem is Retrieval Augmented
Generation (RAG), a pattern used in AI that uses an LLM to generate answers grounded in your own data.
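A minimal sketch of the pattern, assuming an Azure OpenAI chat deployment and the openai v1 Python SDK (the endpoint, deployment name, and the stub retriever are placeholders): retrieve the most relevant chunks of your own data, add them to the prompt, and let the LLM answer from that context.

```python
# Illustrative RAG flow: retrieve -> augment the prompt -> generate.
from openai import AzureOpenAI

client = AzureOpenAI(azure_endpoint="https://<your-resource>.openai.azure.com",
                     api_key="<key>", api_version="<api-version>")

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder retriever: in a real app this queries your vector store."""
    return ["<chunk 1 of your private data>", "<chunk 2>", "<chunk 3>"][:k]

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    messages = [
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    resp = client.chat.completions.create(model="<your-gpt-deployment>", messages=messages)
    return resp.choices[0].message.content

print(answer("What does our return policy say?"))
```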
22. Why use RAG?
• Fine-tuning is suitable for continuous domain adaptation, enabling significant improvements in
model quality but often incurring higher costs. Conversely, RAG offers an alternative approach,
allowing the use of the same model as a reasoning engine over new data provided in a prompt.
This technique enables in-context learning without the need for expensive fine-tuning,
empowering businesses to use LLMs more efficiently.
• RAG allows businesses to achieve customized solutions while maintaining data relevance and
optimizing costs. By adopting RAG, companies can use the reasoning capabilities of LLMs, utilizing
their existing models to process and generate responses based on new data. RAG facilitates
periodic data updates without the need for fine-tuning, thereby streamlining the integration of
LLMs into businesses.
1. Provide supplemental data as a directive or a prompt to the LLM
2. Adds a fact-checking component to your existing models
3. Use up-to-date data without incurring the extra time and costs associated with fine-tuning
4. Ground answers in your business-specific data
25. Anatomy of a RAG app
App UX → Orchestrator → Retriever over Knowledge Base (query in, knowledge out) → Large Language Model (prompt + knowledge in, response out)
Build your own experience: UX, orchestration, calls to the retriever and LLM (e.g., copilots, in-app chat)
Extend other app experiences: plugins for retrieval, symbolic math, app integration, etc. (e.g., plugins for OpenAI ChatGPT)
26. RAG Components
Data sources (unstructured and structured, or cloud storage)
Indexing pipeline (a simplified sketch follows below)
– Data cleaning or OCR
– Data chunking
– Indexing
Embedding model
Vector database
Vector index
Vector retrieval query
Application AI orchestrator
LLM to generate the response
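The indexing steps above can be sketched as follows; the cleaning rule, chunk size, and stubbed embedding call are simplifying assumptions, not the implementation from the talk.

```python
# Illustrative indexing pipeline: clean -> chunk -> embed -> index.
import re

def clean(text: str) -> str:
    """Very light cleaning: collapse whitespace left over from OCR/extraction."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with a small overlap between chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(texts: list[str]) -> list[list[float]]:
    """Placeholder: a real pipeline calls an embedding model here
    (e.g., text-embedding-ada-002) and returns one vector per chunk."""
    return [[0.0] * 1536 for _ in texts]

document = clean(open("policy.txt", encoding="utf-8").read())  # hypothetical source file
chunks = chunk(document)
vectors = embed(chunks)

# The (chunk, vector) pairs would then be written to a vector database / index.
index = list(zip(chunks, vectors))
```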
27. Embedding Model
• “Embeddings are vectors or arrays of numbers
that represent the meaning and context of the
tokens processed by the model. They are used to
encode and decode input and output texts, and
can vary in size and dimension. Embeddings can
help the model understand the relationships
between tokens, and generate relevant and
coherent texts.”
• Examples of LLM embedding models:
• text-embedding-ada-002 by Azure OpenAI
• The Hugging Face MTEB leaderboard ranks embedding models:
https://huggingface.co/spaces/mteb/leaderboard
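For example, a single embedding call with the Azure OpenAI Python SDK might look roughly like this (the endpoint, key, API version, and deployment name are placeholders):

```python
from openai import AzureOpenAI  # assumes the openai v1 SDK

client = AzureOpenAI(azure_endpoint="https://<your-resource>.openai.azure.com",
                     api_key="<key>", api_version="<api-version>")

resp = client.embeddings.create(
    model="<text-embedding-ada-002-deployment>",  # your Azure deployment name
    input="Azure Cosmos DB supports vector search.",
)
vector = resp.data[0].embedding
print(len(vector))  # text-embedding-ada-002 returns 1536-dimensional vectors
```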
29. Vector databases or stores
Vector databases or stores are used in numerous domains and situations across analytical and generative AI, including natural language processing, video and image recognition, recommendation systems, search, etc. They are the most popular retrieval option for RAG. A vector database stores text embeddings (the outputs of embedding models) in a queryable database.
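Conceptually, a vector store answers the question "which stored embeddings are closest to the query embedding?". A tiny in-memory sketch using cosine similarity (toy 3-dimensional vectors for readability; real stores use approximate nearest-neighbor indexes over much higher dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these vectors came from an embedding model.
store = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.2]),
    "office address": np.array([0.0, 0.2, 0.9]),
}

query_vector = np.array([0.85, 0.2, 0.05])  # e.g., the embedding of "how do I get a refund?"
best = max(store, key=lambda key: cosine_similarity(store[key], query_vector))
print(best)  # -> "refund policy"
```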
30. Vector databases or stores
Popular vector databases / stores
There are several vector databases and stores available for AI-native embedding. Here
are some examples:
Pinecone: A cloud-native vector database that offers a seamless API and hassle-free
infrastructure. It is now available on Azure Marketplace.
Chroma DB: An open-source vector database tailored for AI-native embedding.
Faiss: An open-source library developed by Facebook AI Research for fast, dense vector
similarity search and grouping.
Azure Cosmos DB: A globally distributed, multi-model database service that supports
document, key-value, graph, and column-family data models. It also supports vector
database extensions. Use the native vector search feature in Azure Cosmos DB for
MongoDB vCore, which offers an efficient way to store, index, and search high-
dimensional vector data directly alongside other application data. Use the native vector
search feature in Azure Cosmos DB for PostgreSQL, which offers an efficient way to
store, index, and search high-dimensional vector data directly alongside other
application data.
31. Azure Cosmos DB vector store options
Azure Cosmos DB for MongoDB vCore: store your application data and vector embeddings together in a single MongoDB-compatible service featuring native support for vector search (see the sketch below).
Azure Cosmos DB for PostgreSQL: store your data and vectors together in a scalable PostgreSQL offering with native support for vector search.
Azure Cosmos DB for NoSQL with Azure AI Search: augment your Azure Cosmos DB data with the semantic and vector search capabilities of Azure AI Search.
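As an illustration of the MongoDB vCore option, creating a vector index and running a vector query with pymongo could look roughly like the sketch below; the connection string, database, collection, field names, and dimensions are assumptions layered on the documented cosmosSearch syntax.

```python
from pymongo import MongoClient

client = MongoClient("<your-cosmosdb-for-mongodb-vcore-connection-string>")
collection = client["ragdb"]["docs"]

# Create an IVF vector index over the "contentVector" field (1536 dims for ada-002).
client["ragdb"].command({
    "createIndexes": "docs",
    "indexes": [{
        "name": "vectorSearchIndex",
        "key": {"contentVector": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-ivf",
            "numLists": 100,
            "similarity": "COS",
            "dimensions": 1536,
        },
    }],
})

# Store a chunk of application data together with its embedding.
collection.insert_one({"text": "<chunk of your data>", "contentVector": [0.0] * 1536})

# Retrieve the k most similar documents to a query embedding.
query_vector = [0.0] * 1536  # in practice, the embedding of the user's question
results = collection.aggregate([{
    "$search": {
        "cosmosSearch": {"vector": query_vector, "path": "contentVector", "k": 3},
        "returnStoredSource": True,
    }
}])
for doc in results:
    print(doc["text"])
```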
33. Optional Slides
The slides that follow this slide are optional
Note: This slide is just for Informational Purposes
Editor's Notes
Overview of generative AI capabilities for inspiration.
This technique uses a large language model (LLM) to generate text based on information retrieved from external sources. The process involves the following steps:
Getting data: The data can be any text-based document or database that contains relevant facts or knowledge for the task.
Splitting it into small chunks: The data is divided into smaller pieces of text, such as sentences or paragraphs, that can be easily processed by the LLM.
Using a specific type of LLM embedding model: The embedding model is a component of the LLM that converts each text chunk into a numerical vector in a high-dimensional space. The vector represents the meaning and context of the text chunk.