2. AI vs ML vs DL vs GenAI
Artificial Intelligence
Machine Learning
Neural Networks
Deep Learning
Generative AI
Large Language Models
3. Artificial Intelligence (AI)
• AI is a discipline, a branch of computer science, that deals with the creation and development of machines that think and act like humans.
• AI-powered technologies have been around for a while; everyday examples include Siri, Alexa, and the customer-service chatbots that pop up on websites.
4. Machine Learning (ML)
• Machine Learning is a subfield of AI: a program or system trains a model from input data, and the trained model can then make useful predictions from new, never-before-seen data.
• ML gives the computer the ability to learn without being explicitly programmed. In traditional programming, developers write explicit instructions for a computer to execute; in ML, algorithms learn patterns and relationships from data to make predictions or decisions.
• Supervised, unsupervised, and reinforcement learning are the most common ML paradigms.
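The contrast between the two approaches can be sketched in a few lines of Python. The Celsius-to-Fahrenheit task and the least-squares fit below are illustrative choices, not part of any particular ML library:

```python
# Traditional programming vs. machine learning, on a toy task:
# converting Celsius to Fahrenheit.

# Traditional programming: the developer writes the rule explicitly.
def c_to_f_explicit(c):
    return c * 9 / 5 + 32

# Machine learning: the rule (slope and intercept) is *learned* from
# example input/output pairs, here via a closed-form least-squares fit.
def fit_linear(data):
    n = len(data)
    mx = sum(x for x, y in data) / n
    my = sum(y for x, y in data) / n
    w = sum((x - mx) * (y - my) for x, y in data) / \
        sum((x - mx) ** 2 for x, y in data)
    b = my - w * mx
    return w, b

examples = [(0, 32), (10, 50), (20, 68), (30, 86), (40, 104)]
w, b = fit_linear(examples)
# The learned model generalizes to a never-before-seen input:
print(round(w * 100 + b))  # 212, matching the explicit rule
```

The explicit function and the learned one end up computing the same mapping; the difference is where the rule comes from, the programmer or the data.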
5. Neural Networks
• Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning.
• Their name and structure are inspired by the human brain, mimicking the way biological neurons signal to one another.
• NNs consist of interconnected artificial neurons organized in layers: an input layer, one or more hidden layers, and an output layer. NNs are at the heart of deep learning algorithms.
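The layered structure can be shown in a minimal sketch. The weights below are hand-picked rather than learned, chosen so the network computes XOR, a function a single neuron cannot represent, which is exactly what a hidden layer buys us:

```python
# A minimal feed-forward network: 2 inputs, one hidden layer of 2
# neurons, 1 output neuron. Each neuron is a weighted sum plus bias,
# passed through an activation function.

def step(z):               # a simple threshold activation
    return 1 if z > 0 else 0

def forward(x1, x2):
    # hidden layer
    h1 = step(1 * x1 + 1 * x2 - 0.5)   # fires if at least one input is 1
    h2 = step(1 * x1 + 1 * x2 - 1.5)   # fires only if both inputs are 1
    # output layer
    return step(1 * h1 - 2 * h2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", forward(a, b))   # 0, 1, 1, 0: XOR
```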
6. Deep Learning (DL)
• Deep Learning is a subset of NN; the word "deep" refers to the depth of layers in a neural network.
• A neural network with more than three layers, which includes the input and output layers, can be considered a deep learning algorithm.
• With their many hidden layers, DL models are well suited to tackling complex real-world problems. Everyday examples of technologies using NNs and DL are:
• Image recognition
• Object detection in smartphone cameras, such as facial recognition
• Autofocus
• Online language translation services like Google Translate
7. Generative AI
• Generative AI is a subset of Deep Learning that focuses on creating models capable of generating new content resembling existing data.
• These models aim to generate content that is indistinguishable from what might be created by humans.
• Generative Adversarial Networks (GANs) are popular examples of generative AI models; they use deep neural networks to generate realistic content such as images, text, or even music.
8. Large Language Models (LLMs)
• An LLM is a form of generative AI that focuses on generating human-like text based on patterns learned from vast amounts of textual data during training.
• A Large Language Model can be considered a specific type of machine learning model specialized in natural language processing.
• ChatGPT is possibly the most famous example of a technology using LLMs right now.
10. What are Transformers?
• A type of generative AI model for understanding and generating text, images, and various other types of data.
• Transformers analyse chunks of data called "tokens" and learn to predict the next token in a sequence from the tokens that came before it.
• The output of the model, such as the prediction of a word in a sentence, is influenced by the previous words it has generated.
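A full attention mechanism is beyond a slide, but the next-token objective itself can be sketched with the simplest possible language model, a bigram counter. The toy corpus is made up; real transformers condition on long contexts rather than a single previous token:

```python
# Count which token follows which in a corpus, then predict the next
# token as the most frequently observed successor.
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    # pick the most frequently observed successor of `token`
    followers = counts[token]
    return max(followers, key=followers.get) if followers else None

print(predict_next("the"))  # "cat" (observed twice, vs "mat" once)
```

Transformers replace the frequency table with a learned neural network, but the training signal is the same: given the tokens so far, predict the next one.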
12. 2017: Google Revolutionized Text Generation
• Google introduced Transformers ("Attention Is All You Need", Vaswani et al., 2017), which became the state-of-the-art approach to most Natural Language Processing problems.
• OpenAI's Generative Pre-trained Transformers (DALL·E, 2021; ChatGPT, 2022) are, as the name suggests, built on Transformers.
15. Techniques for Tailoring LLMs to Specific Problems
• Reinforcement Learning from Human Feedback (RLHF)
• Fine-tuning
• Prompt Engineering
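Of the three, prompt engineering is the only one that requires no training at all: the task is specified entirely in the prompt. A common pattern is a few-shot template, sketched below; the reviews and labels are made up for illustration:

```python
# Build a few-shot prompt: a handful of worked examples followed by the
# new query, leaving the model to fill in the final label.

def build_prompt(examples, query):
    shots = "\n".join(f"Review: {text}\nSentiment: {label}"
                      for text, label in examples)
    return f"{shots}\nReview: {query}\nSentiment:"

few_shot = [
    ("Great battery life, love it.", "positive"),
    ("Broke after two days.", "negative"),
]
prompt = build_prompt(few_shot, "Works exactly as advertised.")
print(prompt)  # this string would be sent to an LLM API as-is
```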
17. Then we need a model
Commercial APIs
• Google, OpenAI, Microsoft, ...
• Privacy concerns
• No specific hardware requirements
• Prompt engineering
Train a model from scratch
• Requires huge data and computing resources
Foundation models
• Open-source models
• Fine-tuned
• May require specific hardware/infrastructure
18. Fine-tuning
1. Retraining a pre-trained model on a specific task or dataset to adapt it for a particular application
2. Training the model on a dataset that is relevant to the task
3. Training the LLM on a smaller, more specific set of information
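Fine-tuning in miniature: start from a model "pretrained" on broad data, then continue training at a low learning rate, for a few steps, on a small task-specific dataset. Real fine-tuning does this to an LLM's weights; the one-parameter model below only keeps the idea visible:

```python
# Gradient descent on mean squared error for the model y = w * x.
def train(w, data, lr, steps):
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# "Pretraining": broad data where roughly y = 2x.
w = train(0.0, [(1, 2), (2, 4), (3, 6)], lr=0.05, steps=200)

# "Fine-tuning": a small task-specific dataset where roughly y = 2.5x.
# Few steps, small learning rate: adapt the pretrained weight rather
# than starting over from zero.
w_ft = train(w, [(2, 5), (4, 10)], lr=0.01, steps=100)
print(round(w, 2), round(w_ft, 2))  # 2.0 2.5
```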
19. RLHF
LLMs are trained on web data full of irrelevant (unhelpful) matter or, worse, abundant false (dishonest) and/or harmful information, e.g.,
• Potentially dangerous false medical advice.
• Valid techniques for illegal activities (hacking, deceiving, building weapons, ...).
HHH (Helpful, Honest & Harmless) alignment (Askell et al., 2021): ensuring that the model's behavior and outputs are consistent with human values, intentions, and ethical standards.
Reinforcement Learning from Human Feedback, or RLHF (Casper et al., 2023):
• "is a technique for training AI systems to align with human goals."
• "[It] has emerged as the central method used to finetune state-of-the-art [LLMs]."
• It relies on human judgment and consensus.
Sources:
o Casper et al., 2023, Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arxiv.org/abs/2307.15217
o Ziegler et al., 2019, Fine-Tuning Language Models from Human Preferences. arxiv.org/abs/1909.08593
o Askell et al., 2021, A General Language Assistant as a Laboratory for Alignment. arxiv.org/abs/2112.00861
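The first ingredient of RLHF is a reward model trained on human preference pairs ("response A was judged better than response B"). A minimal sketch, assuming each response is already reduced to a single feature score; the pairwise logistic loss below is the standard Bradley-Terry-style objective, and the data is made up:

```python
# Train a tiny reward model reward(x) = w * x on pairwise preferences.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# (feature of preferred response, feature of rejected response):
# toy data where human-preferred answers happen to score higher.
preferences = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.4)]

w = 0.0
for _ in range(2000):
    for x_good, x_bad in preferences:
        # probability the model assigns to the human's choice
        p = sigmoid(w * (x_good - x_bad))
        # gradient ascent on the log-likelihood of that choice
        w += 0.1 * (1 - p) * (x_good - x_bad)

# the trained reward model now scores preferred responses higher
print(w > 0)  # True
```

In full RLHF, this reward model then supplies the training signal for a reinforcement-learning step that updates the LLM itself.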
26. Backbone: Diffusion Models
• Diffusion models are a form of generative model built to create new data that resembles the data they were trained on.
• They have a variety of uses, such as data generation for domains where real data is limited (e.g., medical imaging).
• Diffusion models consist of a forward and a backward process.
• The forward process progressively destroys data, traditionally images, until it is pure noise. The backward process, consisting of a U-Net, then aims to recover the original data from the noise.
27. Eventually, the trained model is supplied pure noise and only the backward process is run to synthesize new data like that in the training dataset.
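The forward process can be sketched on a 1-D "image". Each step mixes in a little Gaussian noise (the variance-preserving update used by standard diffusion models), until the signal is indistinguishable from pure noise; the backward process, a trained U-Net in practice, learns to undo these steps one at a time:

```python
# Forward diffusion on a toy 1-D signal.
import math
import random

random.seed(0)
x = [math.sin(i / 3) for i in range(20)]   # the clean "data"

beta = 0.15                                # noise mixed in per step
for t in range(50):
    x = [math.sqrt(1 - beta) * xi + math.sqrt(beta) * random.gauss(0, 1)
         for xi in x]

# After 50 steps only a fraction (1 - beta)^(50/2) ~ 2% of the original
# signal remains: x is essentially a sample of standard normal noise.
print(x[:3])
```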
28. Recent Developments
• By introducing a latent phase (https://arxiv.org/pdf/2112.10752.pdf) into which images are autoencoded, the forward/backward process occurs in the latent space, allowing for faster sampling overall.
• Basically, adding a latent phase means that the original images are compressed, or encoded, into a smaller (latent) dimension using a neural network, and the diffusion model is then only responsible for learning from and generating these latents.
• Once generated, these latents are passed through a decoder, which can fill in details at a higher resolution.
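The speedup is easy to quantify. Stable Diffusion's autoencoder maps a 512x512 RGB image to a 64x64 latent with 4 channels, so the diffusion model handles far fewer values at every denoising step:

```python
# Values per denoising step, pixel space vs. latent space.
pixel_space = 512 * 512 * 3     # 512x512 RGB image
latent_space = 64 * 64 * 4      # 64x64 latent with 4 channels
print(pixel_space // latent_space)  # 48: ~48x fewer values to denoise
```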
30. How is Stable Diffusion different?
• Let's start by better understanding the components of the model.
1. CLIP (Contrastive Language-Image Pretraining) Text Encoder
• One of the main differences between Stable Diffusion and traditional diffusion models is that it accepts a text prompt.
• CLIP was trained to place related images and text into a similar latent space.
• If CLIP is given an image of a dog, it should be able to correctly output the text string "photo of a dog", because the model has learned to put the image and text encodings close to each other in latent space.
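"Close in latent space" has a concrete meaning: CLIP scores an image/text pair by the cosine similarity of their embeddings. The three-dimensional vectors below are made up for illustration; real CLIP embeddings have hundreds of dimensions:

```python
# Cosine similarity between embedding vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

dog_image  = [0.9, 0.1, 0.2]   # hypothetical embedding of a dog photo
dog_text   = [0.8, 0.2, 0.1]   # hypothetical embedding of "photo of a dog"
plane_text = [0.1, 0.9, 0.3]   # hypothetical embedding of "photo of a plane"

# The matching caption scores higher than the unrelated one.
print(cosine(dog_image, dog_text) > cosine(dog_image, plane_text))  # True
```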
32. 2. Variational AutoEncoder (VAE)
• A VAE is a neural network that facilitates the conversion of images to and from latent space.
• The Encoder acts like a compressor, squishing the input image into a lower-dimensional latent representation.
• Once the forward/reverse diffusion process finishes and the diffusion model has output a reconstruction of the original latent, this output latent is passed through the Decoder to create an image with the same resolution as the input images.
33. 3. Diffusion Model
• What is specific to Stable Diffusion is that the backward process uses the text embedding as well as random noise to generate the desired image.
35. To put it all together, we can follow this general procedure to build our own Stable Diffusion pipeline to generate images from text:
1. Encode our text prompt using the CLIP model.
2. Generate some random noise in the latent dimension.
3. Load in a pretrained U-Net model, and perform the reverse process for a fixed number of timesteps, using the random noise and encoded text prompt as input.
4. The output of this step is the latent representation of our generated image.
5. Load in a pretrained VAE, and perform the decoding process on the output latent from the previous step to obtain the final output image, in full resolution.
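The steps above can be sketched as a skeleton. The functions below are toy stand-ins: a real pipeline loads pretrained CLIP, U-Net, and VAE weights, while these stubs only mirror the data flow:

```python
# Stable Diffusion pipeline skeleton with toy components.
import random

random.seed(0)
LATENT_DIM, STEPS = 16, 10

def clip_encode(prompt):                 # stand-in for the CLIP text encoder
    rng = random.Random(sum(ord(c) for c in prompt))
    return [rng.random() for _ in range(LATENT_DIM)]

def denoise_step(latent, text_emb, t):   # stand-in for one U-Net step
    # a real U-Net predicts the noise to remove at timestep t; here we
    # simply nudge the latent toward the text embedding
    return [l + (e - l) / (STEPS - t) for l, e in zip(latent, text_emb)]

def vae_decode(latent):                  # stand-in for the VAE decoder
    return [round(v * 255) for v in latent]

prompt_emb = clip_encode("photo of a dog")                  # 1. encode text
latent = [random.gauss(0, 1) for _ in range(LATENT_DIM)]    # 2. random noise
for t in range(STEPS):                                      # 3. reverse process
    latent = denoise_step(latent, prompt_emb, t)            # 4. -> final latent
image = vae_decode(latent)                                  # 5. decode latent
print(len(image))  # 16 toy "pixels"
```

Swapping each stub for its pretrained counterpart (and a proper noise scheduler) yields the actual text-to-image pipeline.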