Marco De Nittis
marco.denittis [a] gmail.com
LLM-ize your apps 101
whoami
• Independent Cloud Architect
• Trainer
• ❤ cloud, serverless, devops, AI, wasm
• Curious and tinkerer
objectives
• Explore an LLM app with no framework
• Bash scripts / curl (almost)
• Highlight fundamental behaviours and
building blocks
Old Manticore Inn
in the beginning…
Everyone uses ChatGPT!
We should do the same to improve our
business!
We could ask some professionals
Too expensive!
Alex the waiter could do it!
He has a PlayStation and plays Fortnite every day
in the beginning…
Ok, we need Python
I hate snakes, no pythons here
And also several cloud servers with
GPUs
Ok, use this, that’s the same
…then
first try
• Open AI API
• Fair price
• Rest HTTPS calls with auth token
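The whole first try fits in a single curl call. A minimal sketch (model name, prompt, and the guard around the API key are examples; requires an OPENAI_API_KEY):

```shell
# Build the request body by hand -- no framework, just strings.
PROMPT="What time does the restaurant open?"
PAYLOAD=$(cat <<EOF
{"model": "gpt-3.5-turbo",
 "messages": [{"role": "user", "content": "$PROMPT"}]}
EOF
)

# Fire the request only when an API key is available.
if [ -n "${OPENAI_API_KEY:-}" ]; then
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```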
do it!
first try
• Very easy to implement
• No contextual responses
second try - context
• Prompt engineering
• Adding context with background
information
• Mixing user text with “prepared
contextual text”
prompt engineering
• Several established techniques
• Role: “you are a helpful restaurant
assistant…”
• Shots: providing examples
• Ask for steps
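These techniques map directly onto the messages array: the role goes in a system message and the shots are pre-written user/assistant turns. A sketch (all prompt wording is invented):

```shell
# Role + few-shot: the "prepared contextual text" mixed with the user text.
SYSTEM="You are a helpful restaurant assistant. Answer briefly and politely."
PAYLOAD=$(cat <<EOF
{"model": "gpt-3.5-turbo",
 "messages": [
   {"role": "system", "content": "$SYSTEM"},
   {"role": "user", "content": "Do you serve vegan dishes?"},
   {"role": "assistant", "content": "Yes, we have a vegan section on the menu."},
   {"role": "user", "content": "Can I book a table for four tonight?"}
 ]}
EOF
)
```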
do it!
second try - context
• Tones and background are ok
• No integration with company
information
what is an LLM
• Probabilistic engine: most probable text
response from input (according to the
corpus)
• Black box with input and output pipe
• No dynamic/short term memory
• Context
context
• Max input + output size
• Measured in tokens
• Group of letters
• English: ~1.2 tokens/word
• Typical size 4k - 32k, and
beyond…
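The ~1.2 tokens/word figure gives a quick way to check whether a prompt fits the window. A back-of-the-envelope sketch (real counts come from the model's tokenizer):

```shell
# Rough token estimate: words * 1.2, in integer shell arithmetic.
estimate_tokens() {
  words=$(echo "$1" | wc -w)
  echo $(( words * 12 / 10 ))
}

estimate_tokens "The inn serves lunch and dinner every day"   # 8 words -> ~9 tokens
```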
memory with context
• Adding all the information in each
request
• How "the chat" ChatGPT works
• Limited to context size
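That pattern is easy to reproduce by hand: keep every turn in a variable and resend the whole thing each time. A sketch (JSON built with plain string concatenation):

```shell
# Naive memory: the full transcript travels with every request.
HISTORY=""
remember() {   # remember <role> <text>
  HISTORY="$HISTORY
{\"role\": \"$1\", \"content\": \"$2\"},"
}

remember user "What time do you open?"
remember assistant "We open at noon."
remember user "And on Sundays?"
# Each new call sends $HISTORY (plus the new question) as the messages
# array, so the "memory" grows until it hits the context size.
```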
RAG + external memory
• Semantic memory
• => retrieve the “most relevant” information
• Vector database + RAG:
• Retrieval Augmented Generation
• Include only the relevant pieces in the context
• Easy to edit the memory
fine tuning
• Enhancing the “cultural baggage” of
the LLM
• Retrain the model with custom
specific data
• Expensive (not so much)
• Difficult to “edit” the data in memory
embeddings
• Place a text in a multidimensional “space of concepts"
• Digest the information
• text => vector (of floats)
• High dimensional vector (300+)
• Relevancy => mathematical distance
• A model “calculates” the embeddings
• Vector database to store & search
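“Relevancy => mathematical distance” can be seen on toy vectors. A sketch of cosine similarity in awk (real embeddings have 300+ dimensions and come from an embeddings API, not from hand-typed numbers):

```shell
# Cosine similarity between two space-separated vectors of equal length.
cosine() {
  echo "$1 $2" | awk '{
    n = NF / 2
    for (i = 1; i <= n; i++) {
      dot += $i * $(i + n)
      a   += $i * $i
      b   += $(i + n) * $(i + n)
    }
    printf "%.2f\n", dot / (sqrt(a) * sqrt(b))
  }'
}

cosine "1 0 0" "1 0 0"   # identical vectors  -> 1.00
cosine "1 0 0" "0 1 0"   # unrelated vectors  -> 0.00
```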
vector database
• A database capable of handling vectors of floats
• Indexing
• Search
• Find distances (several algorithms)
• Ad hoc db: Qdrant, Pinecone, Chroma, Milvus, …
• Almost any common db: postgres, redis, sqlite, …
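With an ad hoc engine the search is again just an HTTPS call. A sketch against a local Qdrant (default port 6333; collection name and vector values are invented, request shape from Qdrant's REST search API):

```shell
# Nearest-neighbour search: send the query embedding, get the top 3 matches.
QUERY_VECTOR="[0.12, -0.07, 0.33]"   # would be the embedding of the user prompt
BODY="{\"vector\": $QUERY_VECTOR, \"limit\": 3}"

# Query only if a Qdrant instance is actually listening.
if curl -sf http://localhost:6333 >/dev/null 2>&1; then
  curl -s -X POST "http://localhost:6333/collections/restaurant_docs/points/search" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```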
phases
• Storing and indexing company info in a vector DB
• Embedding the contents
• Done once (una tantum)
• Embedding the user prompt
• Search for relevant info
• Insert the retrieved info into the prompt
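Wired together, the request-time phases are one small function. A sketch where the network steps are marked as comments and retrieval is faked with a constant:

```shell
# One RAG turn: embed -> search -> augment -> complete.
answer() {
  prompt="$1"
  # 1. embed the prompt      (POST to the embeddings endpoint)
  # 2. search the vector DB  (nearest chunks for that embedding)
  context="Opening hours: 12:00-23:00, closed on Mondays."   # pretend search result
  # 3. build the augmented prompt:
  printf 'Answer using only this context:\n%s\n\nQuestion: %s\n' "$context" "$prompt"
  # 4. send the augmented prompt to the chat completions endpoint
}

answer "When are you closed?"
```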
phases
do it!
improvements
• Chunking input for large documents
• Check for prompt injection
• Feedback (or self-feedback) on the
response
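Two of these improvements stay one-liner-simple in shell. A sketch of size-based chunking and a crude injection filter (the patterns are only examples, not a real defence; real chunking usually follows sentence or paragraph boundaries):

```shell
# Chunking: split a long document at ~120 characters, breaking on spaces.
chunk() {
  echo "$1" | fold -s -w 120
}

# Injection check: refuse prompts that try to override the instructions.
is_suspicious() {
  echo "$1" | grep -qiE 'ignore (all|previous|the) instructions|reveal.*system prompt'
}

DOC="The Old Manticore Inn sits at the edge of the village and has welcomed travelers for generations. The kitchen serves local dishes, and the cellar keeps a famous selection of ales."
chunk "$DOC"   # prints the document as <=120-char chunks
```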
improvements
• Integrate LLM framework
• Assistant-based approach
• An LLM coordinating the work of
other LLMs
• Needs a full-fledged framework
points of attention
• Control costs
• Input control
• Predictability
• Sustainability
thank you!
Marco De Nittis
marco.denittis [a] gmail.com
https://speakerscore.it/AIHEROES-101
https://github.com/mdnmdn/aiheroes-2023
