This document provides an agenda for a full-day bootcamp on large language models (LLMs) like GPT-3. The bootcamp will cover fundamentals of machine learning and neural networks, the transformer architecture, how LLMs work, and popular LLMs beyond ChatGPT. The agenda includes sessions on LLM strategy and theory, design patterns for LLMs, no-code/code stacks for LLMs, and building a custom chatbot with an LLM and your own data.
LLM Inside Bootcamp: Fundamentals
1. NoCode, Data & AI
LLM Inside Bootcamp
Fundamentals of LLM
What is a large language model, how is it trained, and how is it
different from traditional machine learning models?
Rahul Xavier Singh, Anant Corporation
Nocode Data & AI
2. To most, LLMs seem like
magic. In computing &
technology, LLMs show
great promise in bridging
the gap in human-computer
interaction.
4. NoCode, Data & AI
LLM Inside Bootcamp
with Cassandra
Full-day bootcamp to familiarize product managers, software
professionals, and data engineers with creating next-generation
experts, assistants, and platforms powered by Generative AI
with Large Language Models (LLMs, OpenAI, GPT)
kono.io/bootcamp
5. Agenda
● I: Strategy & Theory
● II: LLM Design Patterns
● III: NoCode/Code LLM Stacks
● IV: Build a Custom ChatBot
with an LLM and your Data
8. History of Large Language Models
1. Everything
before GPT-3
(2020) was trash.
2. ChatGPT made
GPT-3 popular.
3. Now everyone
wants in on the
party.
https://voicebot.ai/large-language-models-history-timeline/
Most of the hype and growth
around LLMs has happened
in the last six months
(November 2022 to now, May 2023).
9. Machine Learning in a Nutshell
https://www.avenga.com/magazine/machine-learning-programming/
1. In machine learning, the
computer trains on your
data, and gives you the
most likely answer. The
better the data, the
better the algorithm.
2. Neural networks process
input data through layers
to predict outcomes
based on patterns and
relationships learned
during training.
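To make "the computer trains on your data and gives you the most likely answer" concrete, here is a minimal sketch using scikit-learn (my addition, not from the deck; the toy exam dataset and feature choices are invented for illustration):

```python
# Minimal sketch of "machine learning in a nutshell" with scikit-learn.
# The toy dataset is invented for illustration.
from sklearn.linear_model import LogisticRegression

# Features: [hours_studied, hours_slept]; label: passed the exam (1) or not (0).
X = [[1, 4], [2, 8], [8, 7], [9, 6], [3, 5], [7, 8]]
y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X, y)                      # the computer "trains on your data"

print(model.predict([[6, 7]]))       # most likely answer for a new input
print(model.predict_proba([[6, 7]])) # and how confident it is
```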
10. What can Neural Networks do?
https://thedatascientist.com/wp-content/uploads/2018/03/Deep-Neural-Network-What-is-Deep-Learning-Edureka.png
1. Artificial neural networks (ANN)
can recognize patterns and
relationships in data.
2. They can classify and categorize
data accurately.
3. They can make predictions based
on input data.
4. Neural networks can be used for
image and speech recognition.
5. A deep neural network is an ANN
with many layers that can make
more complex predictions.
6. They can be trained to improve
their accuracy over time.
https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/
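A minimal sketch of what "layers" means in practice (my addition; the weights are random stand-ins rather than trained values):

```python
# A tiny two-layer neural network forward pass in NumPy.
# Weights are random stand-ins; a real network would learn them in training.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input vector with 4 features

W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # layer 1: 4 features -> 8 hidden units
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # layer 2: 8 hidden units -> 3 classes

h = np.maximum(0, W1 @ x + b1)                  # hidden layer with ReLU activation
logits = W2 @ h + b2                            # raw class scores
probs = np.exp(logits) / np.exp(logits).sum()   # softmax: scores -> class probabilities
print(probs)
```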
11. What is the big deal about Transformers?
1. Because ANNs are implemented in matrix math (and yes,
that's a nod to the Matrix of Leadership)…
2. Transformers improve natural language processing,
enabling better chatbots and language translation
tools.
3. Transformers are a neural network architecture that
outperforms previous models on various NLP tasks.
4. Attention mechanisms in Transformers better model
long-term dependencies in sequential data.
5. Transformers are a model architecture, not a hardware
accelerator; the speedups come from how well attention
parallelizes on GPUs/TPUs.
6. Transformers were introduced by Google researchers in
the 2017 paper "Attention Is All You Need" (Vaswani et
al.), not by Elon Musk.
The encoder-decoder structure of the Transformer
architecture
Taken from “Attention Is All You Need“
12. How LLMs Work &
What LLMs Do
● Transformers Decoder/Encoder
● What LLMs Do: Predict Words
● What LLMs Do: Narrow Possibilities
● What LLMs Do: Verse Jumping
● What LLMs Do: Document Construction
13. How does a Large Language Model Work?
1. The transformer architecture consists of two
components: the encoder and decoder.
2. The encoder processes the input sequence and
generates embeddings through self-attention
mechanisms.
3. The decoder takes the encoder's embeddings as
input and generates an output sequence, while
also using self-attention mechanisms to attend to
relevant parts of the input sequence.
4. Together, they enable the transformer to learn
complex patterns and relationships within
sequences, making it a powerful tool for natural
language processing and other sequence modeling
tasks.
The encoder-decoder structure of the Transformer
architecture
Taken from “Attention Is All You Need“
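A minimal sketch of the self-attention at the heart of this architecture (my addition; the sequence length, dimensions, and random values are stand-ins), implementing scaled dot-product attention from "Attention Is All You Need" in NumPy:

```python
# Scaled dot-product attention from "Attention Is All You Need", in NumPy.
# Shapes and random values are stand-ins for illustration.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16              # 5 tokens, 16-dimensional embeddings
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
print(attention(Q, K, V).shape)       # (5, 16): one new vector per token
```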
14. What LLMs Do: Predict Words
1. A language model uses deep learning
algorithms to learn patterns and
relationships in large sets of text data.
2. It is trained on a large corpus of text, such
as books, articles, and websites, to
recognize and understand the underlying
structure and meaning of language.
3. Once trained, the model can generate
new text based on the input it receives,
by predicting the most likely sequence of
words to follow.
4. The model uses a probabilistic approach
to generate text, allowing it to produce
diverse and creative responses to different
inputs.
5. LLMs have a wide range of applications,
including language translation, chatbots,
content creation, and more.
https://vectara.com/avoiding-hallucinations-in-llm-powered-applications/
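A minimal sketch of "predict the most likely sequence of words" (my addition; the tiny vocabulary and scores are invented): the model's raw scores become a probability distribution, and a temperature knob trades determinism for diversity.

```python
# Next-token prediction in miniature: turn raw scores (logits) into
# probabilities and sample. The vocabulary and logits are invented.
import numpy as np

vocab = ["cat", "dog", "sat", "ran", "mat"]
logits = np.array([2.0, 1.5, 0.3, 0.2, -1.0])   # model's scores for the next word

def sample_next(logits, temperature, rng):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()                  # softmax
    return vocab[rng.choice(len(vocab), p=probs)]

rng = np.random.default_rng(0)
print(sample_next(logits, 0.2, rng))   # low temperature: almost always "cat"
print(sample_next(logits, 1.5, rng))   # high temperature: more diverse picks
```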
15. What LLMs Do: Narrow Possibilities
1. An LLM is like a really
smart guesser that's
been trained on a lot
of text.
2. When you give it a
prompt, it starts
guessing what the
next word might be.
3. Instead of guessing
randomly, it predicts
the best possible
word.
4. As you add words to
your prompt, you are
narrowing down the
overall “document”
you get back.
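To illustrate "narrowing" (my addition; the mini-corpus is invented): each word you add to the prompt filters the space of documents that could plausibly follow.

```python
# "Narrowing possibilities" in miniature: each added prompt word filters
# the space of matching documents. The mini-corpus is invented.
docs = [
    "the cat sat on the mat",
    "the cat ran up the tree",
    "the dog sat on the porch",
    "the dog barked at the mailman",
]

for prompt in ["the", "the cat", "the cat sat"]:
    matches = [d for d in docs if d.startswith(prompt)]
    print(f"{prompt!r}: {len(matches)} possible documents -> {matches}")
```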
16. What LLMs Do: Verse Jumping
1. It's a simulator of the real world, but it isn't the real
world. Each prompt is a portal to a possible,
realistic universe.
2. It contains probabilities of words or tokens from the
tokenverse strung together which we can call a
“Document”
3. As you give it more words, the universe of possible
“Documents” reduces.
https://now.tufts.edu/2022/05/31/exploring-shape-our-universe-and-multiverse
17. What LLMs Do: Document Construction
1. Each model has a
"tokenverse" (vocabulary)
it picks words from;
GPT-4's tokenizer has roughly 100k tokens.
2. Document A &
Document B are
possible paths through
all of the tokens in the
tokenverse for a
particular model.
3. If you start with certain
words, a Prompt A',
the probability of
getting Document A
increases.
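A quick way to peek at a model's "tokenverse" (my addition, assuming the tiktoken package is installed): the cl100k_base encoding used by GPT-4 has a vocabulary of roughly 100k tokens.

```python
# Peeking at GPT-4's "tokenverse" with OpenAI's tiktoken package
# (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by GPT-4 / GPT-3.5-Turbo
print(enc.n_vocab)                           # ~100k tokens in the vocabulary

ids = enc.encode("LLMs pick a path through the tokenverse")
print(ids)                                   # token ids along that path
print([enc.decode([i]) for i in ids])        # the tokens themselves
```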
18. LLMs other than
ChatGPT/GPT
● Popular LLMs Available
● Popular Open Source LLMs Available
● Cloud Providers LLM Offerings
19. Popular Public LLMs Available Today
1. OpenAI: ChatGPT,
GPT3.5-Turbo,
Text-Davinci-003,
GPT4 (Waitlist)
2. Anthropic: Claude,
Claude-Instant
3. Cohere: baseline models,
allows custom training
https://vectara.com/top-large-language-models-llms-gpt-4-llama-gato-bloom-and-when-to-choose-one-over-the-other/
If you are starting out, just use GPT-3.5 Turbo.
It's easy to get access to, and there are lots of
code examples on GitHub.
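For example, a minimal GPT-3.5 Turbo call (my addition), using the openai Python package as it worked at the time of this deck (the v0.27-era API; the key shown is a placeholder for your own):

```python
# Minimal GPT-3.5 Turbo call with the openai package (v0.27-era API,
# current when this deck was written). Requires your own API key.
import openai

openai.api_key = "sk-..."  # placeholder: load your real key, e.g. from an env var

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain transformers in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])
```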
20. Leaked @Google: “We Have No Moat…”
“We Have No Moat, And
Neither Does OpenAI"
https://lmsys.org/blog/2023-03-30-vicuna/
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
● Meta LLaMa open sourced
● GPT answers used to train
● LoRA: low-rank adaptation (sketched below)
● Retraining models is hard
● Small models iterate faster
● Data quality scales better
● Battling open source means failure
● Companies need users / researchers
● Individuals can use different licenses
● Be your own customer
● Let open source do the work
● OpenAI no different than Google
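The LoRA bullet is worth unpacking with a sketch (my addition; all dimensions and values are stand-ins): instead of retraining a huge weight matrix W, you learn a small low-rank update B·A and add it on top.

```python
# LoRA (low-rank adaptation) in miniature, in NumPy. Instead of updating
# the big frozen weight matrix W, train only the small factors A and B.
# All dimensions and values are stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, r = 1024, 8                        # model dim vs. tiny adapter rank

W = rng.normal(size=(d, d))           # pretrained weights: frozen
A = rng.normal(size=(r, d)) * 0.01    # trainable, r*d params
B = np.zeros((d, r))                  # trainable, d*r params (zero init: no change at start)

def lora_forward(x, alpha=16):
    return W @ x + (alpha / r) * (B @ (A @ x))   # base output + low-rank correction

print(f"full matrix params: {d*d:,}; LoRA params: {2*d*r:,}")  # ~1M vs ~16k
```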
21. Example Open LLM: Stanford Alpaca
https://crfm.stanford.edu/2023/03/13/alpaca.html
https://lmsys.org/blog/2023-03-30-vicuna/
22. Popular Open LLMs Available Today
Leaderboard
1. Vicuna-13b
2. Koala-13b
3. Oasst-pythia-12b (OpenAssistant)
Others to Look into
4. StableLM
5. Dolly
6. ChatGLM
https://chat.lmsys.org/
If you don't want to send your data to a public
LLM, you can host your own open model, or use
Azure OpenAI or Amazon Bedrock.
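Hosting your own open model can be as simple as this sketch (my addition, assuming the transformers and torch packages are installed and you have enough RAM/GPU memory; StableLM is one example from the list above):

```python
# Running an open model locally with Hugging Face transformers
# (assumes `pip install transformers torch`; the model downloads several GB).
from transformers import pipeline

generator = pipeline("text-generation",
                     model="stabilityai/stablelm-tuned-alpha-3b")
out = generator("What is a large language model?", max_new_tokens=64)
print(out[0]["generated_text"])
```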
23. Cost of Fine Tuning: Alpaca/Vicuna
https://lmsys.org/blog/2023-03-30-vicuna/
24. Public Cloud Offerings of LLM
1. Azure OpenAI
2. Amazon Bedrock
3. NVidia NeMo
4. Google Vertex (batteries
not included)
https://venturebeat.com/ai/amazon-launches-bedrock-for-generative-ai-escalating-ai-cloud-wars/
Azure OpenAI is the most mature, and probably the
best. Amazon's Bedrock offers managed hosting of
Claude, StableLM, etc. Google's offering requires
extra setup work before it is usable.
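As a sketch of how the Azure OpenAI route differs from the public API (my addition; the resource name, deployment name, and key are placeholders, using the v0.27-era openai package):

```python
# Calling Azure OpenAI with the v0.27-era openai package. Replace the
# resource name, deployment name, and key with your own.
import openai

openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-AZURE-KEY"

response = openai.ChatCompletion.create(
    engine="YOUR-GPT35-DEPLOYMENT",   # Azure uses deployment names, not model names
    messages=[{"role": "user", "content": "Hello from Azure OpenAI"}],
)
print(response["choices"][0]["message"]["content"])
```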
25. Key Takeaways: History & Foundations of LLMs
Neural Networks: 1940s/50s
TensorFlow: 2015, PyTorch: 2016
Transformers/Attention: 2017
TPUs, GPT: 2018, GPT-2: 2019
GPT-3: 2020, GPT-3.5: 2022
Everything else: 2023 Q1/Q2
- People have been hacking away at
ML/AI since the 1940s. Until GPUs, TPUs,
and cloud infrastructure, very few
companies could do "Deep Learning".
- Deep Learning enabled great work in
vision, speech, and the beginnings of
generative AI. It wasn't until the
Transformers paper that things took off.
- LLMs are good at predicting the "next
word" or token from a tokenverse given
an input.
- The quality / characteristics of the
prompt given narrow down a
Document from a multiverse of
documents.
26.
Thank you and Dream Big.
Hire us
- Design Workshops
- Innovation Sprints
- Service Catalog
Anant.us
- Read our Playbook
- Join our Mailing List
- Read up on Data Platforms
- Watch our Videos
- Download Examples