The Backbone of Modern AI Models: The Architecture of Transformers

Welcome to
Generative AI
The Backbone of Modern AI Models
• The architecture of Transformers
• How they differ from traditional neural networks
• Their role in Foundation Models
• Applications in NLP (e.g., BERT, GPT)
• Why they are essential for Large Language Models (LLMs)

AI Milestones
1. Origins of AI
Alan Turing (1950): Posed the question of whether machines can think in
"Computing Machinery and Intelligence".
Turing Test: Evaluates a machine’s ability to exhibit
intelligent behavior.
2. Early AI Developments
John McCarthy (1956): Coined the term "Artificial
Intelligence" at Dartmouth Conference.
Frank Rosenblatt (1958): Developed the Mark 1
Perceptron, an early neural network model.
3. Rise of Neural Networks
Minsky & Papert (1969): Published Perceptrons, an influential
analysis of the capabilities and limits of perceptrons.
1980s: Neural networks gained popularity in AI applications.

AI Milestones
4. AI Milestones in Computing
• IBM Deep Blue (1997): Defeated world chess champion Garry Kasparov.
• IBM Watson (2011): Won Jeopardy! against human champions.
5. Advancements in Deep Learning
• Baidu Minwa (2015): Used CNNs to outperform humans in image
recognition.
• DeepMind AlphaGo (2016): Beat world Go champion Lee Sedol,
demonstrating AI's strategic potential.
6. Emergence of Generative AI
• OpenAI ChatGPT (2022): Released as a conversational assistant built on a
large language model (LLM).
• GPT-3.5 → GPT-4: Enhanced text generation, conversational AI, and
coding capabilities.
7. AI's Societal Impact
• AI is revolutionizing healthcare, automation, and communication.
• Debates on job displacement vs. innovation.

GEN AI
1. Artificial Intelligence (AI) - The Broadest Concept
Encompasses all smart machine capabilities.
Performs tasks requiring human-like intelligence (e.g., decision-making, problem-solving).
2. Machine Learning (ML) - A Subset of AI
Uses data & algorithms to enable machines to learn.
Improves with more data exposure.
Optimizes error reduction or prediction accuracy.
3. Deep Learning (DL) - A Subset of ML
Utilizes deep artificial neural networks.
Handles complex tasks like image recognition, NLP, object detection.
Requires large datasets and high computational power.
4. Russian Doll Analogy
AI (largest) → ML (middle) → DL (smallest, most specialized).
Each layer builds upon the previous one for more sophisticated learning.

1. Deep Learning (DL)
• Subset of Machine Learning (ML).
• Primarily uses deep artificial neural networks.
2. Meaning of "Deep"
• Refers to the number of hidden layers in a neural network.
• Shallow Network → One hidden layer.
• Deep Network → Multiple hidden layers.
3. Feature Hierarchy in DL
• Lower layers detect simple features (e.g., edges between neighboring pixels).
• Higher layers combine them into complex patterns (e.g., shapes, objects).
4. Computational Challenges & Breakthroughs
• Before 2010, DL was limited due to high computational costs.
Growth driven by:
• Labeled Data – Large datasets for training.
• Advanced Algorithms – Improved deep learning models.
• Faster Hardware – High-performance GPUs and CPUs.
5. Strategic Importance of DL
• Now a key component in AI-driven enterprises.
• Analyzes large datasets to find patterns & generate insights.

GEN AI
1. Definition of Generative AI
A subset of AI that generates new outputs (text, images, code, audio).
Differs from Traditional AI, which focuses on pattern recognition & prediction.
2. Phases of Generative AI
Training → Creating a foundation model.
Tuning → Adapting the model for a specific application.
Generation & Evaluation → Assessing output quality and refining the model.
3. Popular GenAI Tools
ChatGPT (Text Generation)
Gemini (Multimodal AI)
Midjourney (Image Generation)
IBM Watsonx.ai (Enterprise AI solutions)
4. Importance & Impact
For Businesses: Boosts growth, revenue, creativity, & efficiency.
For Developers: Enhances productivity & innovation in various fields.

1. The Evolution of Generative AI
Not a new concept → Decades of research in AI & ML.
Recent boom → Due to advancements in computing power, data availability, and deep learning
techniques.
2. Foundations of Generative AI
Traditional AI → Focuses on rule-based decision-making & pattern recognition.
Neural Networks → Modeled after the human brain to process data in layers.
Deep Neural Networks (DNNs) → Multiple hidden layers enable learning of complex features.
3. Key Breakthroughs Enabling GenAI
Increase in computational power (GPUs, TPUs).
Availability of large-scale datasets for training.
Advancements in deep learning models (Transformers, GANs, VAEs).

Neural Networks
• Neural networks (or artificial neural networks) can learn complex patterns using layers of
neurons which mathematically transform the data.
• The layers between the input and output are referred to as "hidden layers".
• A neural network can learn relationships between the features that other algorithms cannot
easily discover.

Introduction to Neural Networks
• Modeled after the human brain.
• Composed of interconnected processing units called Neurons.
• Designed to learn patterns and relationships in data.
Structure of a Neural Network
Layers of a Neural Network:
• Input Layer → Takes in raw data.
• Hidden Layers → Processes & extracts features.
• Output Layer → Provides final prediction.
Connections between neurons allow learning.

How Neurons Work
• Neurons receive inputs and perform mathematical transformations.
• Each neuron decides what to pass to the next layer.
• The network improves accuracy by refining weights & biases.
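As a rough illustration of the points above, the sketch below shows one neuron computing a weighted sum plus a bias, passing it through an activation, and then refining its weights and bias with gradient descent. The sigmoid activation, squared-error loss, learning rate, and all names (`neuron`, `train_step`) are assumptions chosen for the example, not something the slides specify.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent update on the loss 0.5 * (out - target) ** 2."""
    out = neuron(inputs, weights, bias)
    # Chain rule: dLoss/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Illustrative data: one training example with two features (assumed values)
inputs, target = [1.0, 0.5], 1.0
weights, bias = [0.0, 0.0], 0.0

before = neuron(inputs, weights, bias)
for _ in range(100):
    weights, bias = train_step(inputs, target, weights, bias)
after = neuron(inputs, weights, bias)

print(f"prediction before training: {before:.3f}, after: {after:.3f}")
```

Each update nudges the weights and bias in the direction that reduces the error, so the prediction moves from its starting value toward the target.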
Deep Learning & Hidden Layers
• Deep Learning → Neural networks with multiple hidden layers.
• More layers = Higher ability to capture complex patterns.
• Enables tasks like image recognition, NLP, and AI-driven decision-making.

Artificial Intelligence
Computers that are able to perform tasks that normally require human intelligence. True AI or Pure AI refers to machines that
are equal in intelligence to humans, but AI that is being developed today is not purely autonomous but rather a tool used to
expand the capabilities of its users.
Machine Learning
A set of techniques that lets a program perform a task better as it gains more experience (or sees more examples). In other words,
Machine Learning refers to techniques that learn from historical data. One common example is the recommendation engine
found on e-commerce websites. Machine Learning can be broadly categorized into three types: Supervised Learning,
Unsupervised Learning and Reinforcement Learning.
Deep Learning
It is a subset of Machine Learning. Deep Learning involves training Artificial Neural Networks with several layers. It can
be used for both structured and unstructured data; however, it is most commonly used with unstructured data to implement
use cases such as Image Classification, Speech Recognition, etc.
Data Science
Combines aspects of Statistics, Computer Science, Applied Mathematics and Visualizations to turn data into insights and new
knowledge. It can be described as the process of obtaining, transforming and analyzing data to communicate certain insights
from data.

Multilayer Perceptron (MLP)
Introduction to MLP
• A type of feedforward neural network.
• Composed of multiple layers of perceptrons.
• Uses nonlinear activation functions.
• Effective for handling non-linearly separable data.
Applications of MLP
• Classification (e.g., image & text classification).
• Natural Language Processing (NLP).
• Function approximation in deep learning.
Why Use MLP?
• Universal Approximation Theorem → With enough hidden neurons, can
approximate any continuous function on a bounded input range.
• Highly flexible in architecture & learning capabilities.
• Works well for structured data, images, and NLP tasks.
MLP Architecture – Three Fundamental Layers
Input Layer:
Receives initial data.
Each neuron represents a feature of the input.
Number of neurons = dimensionality of input data.
Hidden Layers:
One or more layers between input & output.
Fully connected neurons process and learn data patterns.
Each neuron applies transformations before passing data
forward.
Output Layer:
Produces final predictions.
Number of neurons depends on the task:
Binary classification → 1 or 2 neurons.
Multi-class classification → Multiple neurons.
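To make the three layers concrete, here is a minimal sketch of an MLP forward pass on XOR, a classic problem that is not linearly separable. The weights are hand-picked for illustration rather than learned, and the ReLU activation and all names (`dense`, `mlp_xor`) are assumptions of this example.

```python
def relu(z):
    """Nonlinear activation: pass positive values, zero out the rest."""
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked (not learned) parameters: 2 inputs -> 2 hidden neurons -> 1 output
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def mlp_xor(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)       # hidden layer
    output = dense(hidden, OUTPUT_W, OUTPUT_B, identity)     # output layer
    return output[0]

xor_table = {(a, b): mlp_xor(a, b) for a in (0, 1) for b in (0, 1)}
for pair, value in xor_table.items():
    print(pair, "->", value)
```

No single linear neuron can produce this table; the hidden layer with its nonlinear activation is what makes it possible.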

Introduction to CNNs
Specialized neural networks for computer vision tasks.
Capture spatial relationships between pixels.
Consider neighboring pixels instead of treating each pixel separately.
Why Spatial Relationships Matter?
Pixels next to each other form meaningful patterns.
Example:
“Circle of black surrounded by brown” → Likely an eye.
“20% black, 35% brown” → Could be anything (T-shirt, object, etc.).
Key Components of CNNs
Convolution Layers
Apply filters to detect features (edges, textures, objects).
Helps in recognizing spatial hierarchies in an image.
Pooling Layers
Reduce dimensionality while retaining important information.
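The two components above can be sketched in plain Python as follows. This is a simplified illustration that assumes a single-channel image, no padding, stride 1 for the convolution, and non-overlapping max pooling; the filter values and function names are hypothetical. As in most deep learning libraries, the "convolution" here does not flip the kernel.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Filter that responds where brightness increases from left to right
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)   # 4x4 map, strongest at the edge
pooled = max_pool2d(feature_map)          # 2x2 summary of the map

print(feature_map)
print(pooled)
```

The filter responds only where neighboring pixels differ, which is the spatial relationship a per-pixel model would miss; pooling then shrinks the map while keeping the strongest response.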

Recurrent Neural Networks
• An RNN is a powerful model from the deep learning family that has shown incredible
results in the last few years.
• Like feedforward and convolutional neural networks (CNNs), recurrent neural networks
utilize training data to learn. They are distinguished by their "memory" as they take
information from prior inputs to influence the current input and output.
• While traditional deep neural networks assume that inputs and outputs are independent of
each other, the output of a recurrent neural network depends on the prior elements within the
sequence.

RNN
Memory-Driven Neural Networks
• Utilize prior inputs to influence current predictions.
• Unlike traditional deep networks, outputs depend on sequence context.
Understanding RNNs with an Example
• Idiom: “Feeling under the weather” → Must follow a specific order.
• RNNs retain word positions to predict the next word correctly.
Key Characteristics
• Shared Parameters: Unlike feedforward networks, which use separate weights for each layer, RNNs reuse the same weights at every time step.
• Trained with Backpropagation & Gradient Descent to improve accuracy.
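A minimal sketch of the recurrence, assuming a single hidden unit with a tanh activation and fixed, hypothetical weights (`W_X`, `W_H`, `B`). It illustrates how the same parameters are reused at every time step and how the hidden state carries information from earlier inputs.

```python
import math

# Hypothetical fixed parameters, shared across all time steps
W_X = 0.5   # weight on the current input
W_H = 0.8   # weight on the previous hidden state (the "memory")
B = 0.0     # bias

def rnn_step(x, h_prev):
    """Combine the current input with the previous hidden state."""
    return math.tanh(W_X * x + W_H * h_prev + B)

def run_rnn(sequence):
    """Process a sequence one element at a time, carrying the state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h)   # the same weights are used at every step
        states.append(h)
    return states

forward = run_rnn([1.0, 0.0, -1.0])
backward = run_rnn([-1.0, 0.0, 1.0])

print("final state, original order:", forward[-1])
print("final state, reversed order:", backward[-1])
```

Both sequences contain the same values, yet the final hidden states differ because the order of the inputs changes what the network "remembers".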
Applications
• Speech Recognition, Language Modeling, Text Prediction, Time-Series Analysis.

Limitations of CNNs & RNNs
Convolutional Neural Networks (CNNs) - Challenges
• Require large labeled datasets for effective training.
• Computationally expensive and slow to train.
• Not ideal for sequential data like text or speech.
Recurrent Neural Networks (RNNs) - Challenges
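• Vanishing Gradient Problem: Gradients shrink toward zero as they are propagated back through many time steps, so weight updates become negligible and learning stalls.
• Exploding Gradient Problem: Gradients grow very large as they are propagated back, making weight updates unstable and potentially producing NaN values.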
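Both problems can be illustrated with a deliberately simplified model: treat backpropagation through time as multiplying the gradient by the same scalar factor at every step. Real RNNs multiply by matrices, so the function name and the factors 0.5 and 1.5 below are assumptions chosen only to show the trend.

```python
def gradient_through_time(recurrent_factor, steps):
    """Gradient magnitude after being scaled by the same factor at each time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= recurrent_factor
    return grad

# Factor below 1: the gradient shrinks exponentially with sequence length
vanishing = gradient_through_time(0.5, 50)
# Factor above 1: the gradient grows exponentially with sequence length
exploding = gradient_through_time(1.5, 50)

print(f"factor 0.5 over 50 steps: {vanishing:.3e}")
print(f"factor 1.5 over 50 steps: {exploding:.3e}")
```

After only 50 steps the first gradient is too small to change the weights meaningfully, while the second is large enough to destabilize training.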

Autoencoder
• The introduction of autoencoders, Variational Autoencoders (VAEs),
and Generative Adversarial Networks (GANs) kicked off the
modern era of generative AI. Until then, deep neural networks
were used primarily for classification tasks; with these
architectures, they began to be used for generating new
content as well.
Autoencoder
• An autoencoder is a type of neural network architecture designed
to efficiently compress (encode) input data down to its essential
features, then reconstruct (decode) the original input from this
compressed representation.
• Unsupervised ML models that learn to compress and reconstruct
data.
• Discover latent variables that capture the essential information of
input data.

Key Components:
• Encoder: Compresses input data into a latent space representation.
• Bottleneck (Code): The most compact form of data representation.
• Decoder: Reconstructs input data from latent representation.
• Loss Function: Measures reconstruction error to optimize learning.
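The four components can be sketched with a toy example. It assumes 2-D inputs that lie close to the line y = 2x, a one-number bottleneck, and hand-picked (not learned) linear weights; a real autoencoder would learn its encoder and decoder by minimizing the reconstruction loss. All function names are hypothetical.

```python
def encode(point):
    """Encoder: compress a 2-D point into a single latent value (the code)."""
    x, y = point
    return (x + 2.0 * y) / 5.0

def decode(code):
    """Decoder: rebuild a 2-D point from the latent value."""
    return (code, 2.0 * code)

def reconstruction_error(point):
    """Loss function: mean squared error between input and reconstruction."""
    reconstructed = decode(encode(point))
    return sum((a - b) ** 2 for a, b in zip(point, reconstructed)) / len(point)

typical = (3.0, 6.0)    # lies on the line y = 2x
unusual = (1.0, 0.0)    # does not fit the pattern

typical_error = reconstruction_error(typical)
unusual_error = reconstruction_error(unusual)

print("code for typical point:", encode(typical))
print("error for typical point:", typical_error)
print("error for unusual point:", unusual_error)
```

Points that follow the pattern pass through the bottleneck with little loss, while points that do not are reconstructed poorly, which is the idea behind using reconstruction error for anomaly detection.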
Hyperparameters & Design Choices:
• Code Size: Controls compression level.
• Depth (Layers): More layers increase complexity.
• Neurons per Layer: Varies based on input type & model design.
• Loss Function: Optimized based on task requirements.
Applications:
• Feature extraction (dimensionality reduction)
• Image denoising
• Anomaly detection
• Facial recognition
• Generative tasks (e.g., VAEs, AAEs for image generation)
