What is Deep Learning?

Neural Networks
A subset of machine learning using artificial neural networks with multiple layers ("deep" architectures).

Complex Patterns
Excels at modeling complex patterns in data through hierarchical representations.

Key Applications
• Image recognition
• Natural language processing
• Speech recognition
Training a Neuron

1. Initialize
Set random weights w and bias b.

2. Forward Pass
Compute the weighted sum and apply the activation function:
z = wᵀx + b, ŷ = σ(z)

3. Loss Calculation
Measure the error between the prediction and the actual value:
L = ½(y − ŷ)² (MSE)

4. Backward Pass
Compute the gradient and update the weights via backpropagation:
∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w
w ← w − η·∂L/∂w
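A minimal NumPy sketch of this loop for a single sigmoid neuron; the toy data and the learning rate η are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features (hypothetical values)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])  # AND-like target

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # 1. Initialize: random weights w
b = 0.0                  #    and bias b
eta = 0.5                # learning rate η (assumed)

for epoch in range(1000):
    z = X @ w + b            # 2. Forward pass: weighted sum
    y_hat = sigmoid(z)       #    and activation ŷ = σ(z)
    loss = 0.5 * np.mean((y - y_hat) ** 2)  # 3. MSE loss

    # 4. Backward pass, chain rule: ∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w
    dL_dyhat = y_hat - y             # derivative of ½(y − ŷ)²
    dyhat_dz = y_hat * (1 - y_hat)   # σ'(z)
    delta = dL_dyhat * dyhat_dz
    grad_w = X.T @ delta / len(X)
    grad_b = delta.mean()

    w -= eta * grad_w        # gradient-descent update
    b -= eta * grad_b

print(f"final loss: {loss:.4f}, predictions: {sigmoid(X @ w + b).round(2)}")
```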
Data Analysis Workflow

1. Data Collection
Gather raw data (images, text, sensor readings).

2. Preprocessing
Clean data, normalize features, and split into training (70%), validation (15%), and test (15%) sets (see the split sketch after this list).

3. Feature Engineering
Create relevant features (e.g., edge detection in images).

4. Model Selection
Choose an architecture (CNN for images, RNN for sequences).

5. Training
Fit the model to the training data.

6. Evaluation
Assess performance using metrics (accuracy, F1-score).

7. Deployment
Integrate the model into production.
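The 70/15/15 split from step 2 can be done with two chained calls to scikit-learn's train_test_split; a sketch with random stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset standing in for real features/labels (assumption)
X = np.random.rand(1000, 8)
y = np.random.randint(0, 2, size=1000)

# 70% train, then split the remaining 30% into 15% val / 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.70, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```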
Out-of-Sample Validation

Purpose: Evaluate model generalization to unseen data.

1. Holdout Validation
Split data into separate train and test sets.

2. K-Fold Cross-Validation
Divide data into k subsets; train on k−1, validate on the held-out fold (repeat k times).

3. Stratified Sampling
Preserve the class distribution in data splits (combined with k-fold in the sketch below).

Key Metric
Use test-set accuracy (not training accuracy) to detect overfitting.
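A sketch of stratified k-fold cross-validation with scikit-learn; the classifier and the imbalanced toy data are stand-ins:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Toy imbalanced dataset (assumption): 80% class 0, 20% class 1
X = np.random.rand(200, 5)
y = np.array([0] * 160 + [1] * 40)

# 5 folds; stratification preserves the 80/20 class ratio in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
print(scores.mean(), scores.std())
```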
Training Large Networks with Limited Data

Transfer Learning

Pre-trained Models
Use models trained on large datasets (ImageNet, Wikipedia).
Examples: ResNet, BERT, GPT.

Fine-tuning
Replace the final layer and retrain on your small dataset.

Freeze Layers
Keep early layers fixed (they capture generic features); a sketch combining both steps follows.
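A minimal fine-tuning sketch with PyTorch/torchvision (weights API as in torchvision ≥ 0.13; the 10-class head is a hypothetical choice):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every layer: early filters capture generic features
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for our task
# (10 classes is a hypothetical choice)
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new head's parameters will receive gradient updates
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```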
Data Augmentation

Images
Rotation, flipping, cropping, color jitter.

Text
Synonym replacement, back-translation.

Audio
Time-shifting, noise injection.
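The image augmentations above map directly onto torchvision.transforms; a sketch (the specific magnitudes are assumptions):

```python
from torchvision import transforms

# Each transform is applied randomly at load time, so every epoch
# sees a slightly different version of each training image
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),        # rotation
    transforms.RandomHorizontalFlip(p=0.5),       # flipping
    transforms.RandomResizedCrop(size=224),       # cropping
    transforms.ColorJitter(brightness=0.2,        # color jitter
                           contrast=0.2,
                           saturation=0.2),
    transforms.ToTensor(),
])
```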
Word Embeddings

Concept: Dense vector representations of words capturing semantic meaning.

Word2Vec
Predicts context words (skip-gram) or center words (CBOW).

GloVe
Combines global matrix factorization and local context.

FastText
Uses subword information (handles rare words).

Key Properties
• Similar words have similar vectors.
• Vector arithmetic captures semantic relationships:
king − man + woman ≈ queen
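The analogy can be checked with cosine similarity; a toy NumPy sketch with made-up 4-dimensional vectors (real embeddings from a trained Word2Vec or GloVe model have hundreds of dimensions):

```python
import numpy as np

# Hypothetical 4-d embeddings chosen so the analogy roughly holds
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "man":   np.array([0.5, 0.1, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9, 0.1]),
    "queen": np.array([0.9, 0.8, 0.9, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land closest to queen
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```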
Feed-Forward Neural Networks & Vectorization

Feed-Forward Neural Networks (FNN)

Structure
Input layer → Hidden layers (fully connected) → Output layer

Activation Functions
ReLU (hidden layers), softmax (output for classification).

Use Case
Tabular data, simple classification/regression tasks.

Limitation
Cannot handle sequential or spatial data efficiently. (A minimal sketch of the structure follows.)
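A PyTorch sketch of such a network for 3-class classification on tabular data; the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Input layer (8 features) → two fully connected hidden layers → output
model = nn.Sequential(
    nn.Linear(8, 32),   # input → hidden 1
    nn.ReLU(),          # ReLU in hidden layers
    nn.Linear(32, 16),  # hidden 1 → hidden 2
    nn.ReLU(),
    nn.Linear(16, 3),   # hidden 2 → 3 output logits
)

x = torch.randn(4, 8)                   # batch of 4 tabular rows (toy data)
probs = torch.softmax(model(x), dim=1)  # softmax output for classification
print(probs.shape)  # torch.Size([4, 3])
```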
Vectorization

Definition
Converting non-numeric data (text, images) into numerical vectors.

Examples
Text: bag-of-words, TF-IDF, word embeddings.
Images: pixel values, flattened arrays, CNN feature maps.

Purpose
Enable processing by neural networks.
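A bag-of-words sketch with scikit-learn's CountVectorizer; the two example sentences are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["deep learning learns representations",
        "machine learning learns patterns"]   # toy corpus

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)            # sparse document-term matrix

print(vectorizer.get_feature_names_out())     # the learned vocabulary
print(X.toarray())                            # each document is now a vector
```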
Convolutional Layers & Recurrent Networks

Convolutional Layers

Core Idea
Apply convolutional filters to input data to detect local patterns.

Key Operations
Convolution: slide filters over the input to produce feature maps.
Pooling: downsample feature maps (max/average pooling).

Advantages
Parameter sharing: filters are reused across spatial locations.
Translation invariance: detects features regardless of position.

Use Case
Image recognition, object detection (e.g., CNNs like ResNet).
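A sketch of convolution plus pooling in PyTorch; the channel counts and kernel sizes are illustrative:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)  # downsample feature maps by 2x

x = torch.randn(1, 3, 32, 32)       # one toy RGB image, 32x32
features = pool(conv(x))            # convolution, then max pooling

# The same 16 3x3 filters slide over every spatial location
# (parameter sharing), producing 16 feature maps
print(features.shape)  # torch.Size([1, 16, 16, 16])
```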
Recurrent Neural Networks (RNN)

Structure
Loops retain information from previous timesteps.

Problem
Vanishing/exploding gradients (fails to capture long-term dependencies).

Use Case
Sequential data (time series, text).
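The recurrence itself is one line; a NumPy sketch of the hidden-state update h_t = tanh(W_x·x_t + W_h·h_{t−1} + b), with toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5   # toy sizes (assumption)

W_x = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

xs = rng.normal(size=(seq_len, input_dim))  # toy input sequence
h = np.zeros(hidden_dim)                    # initial hidden state

for x_t in xs:
    # The loop: each step mixes the new input with the previous state
    h = np.tanh(W_x @ x_t + W_h @ h + b)

print(h.round(3))  # final hidden state summarizes the whole sequence
```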
LSTM and Transformers

Long Short-Term Memory (LSTM)

Enhanced RNN
Uses memory cells and gates to control information flow.

Gates
Forget gate: discard irrelevant information.
Input gate: store new information.
Output gate: expose the memory cell state.
(One cell step is sketched below.)

Advantage
Captures long-range dependencies in text or time series.
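A single LSTM cell step in NumPy showing all three gates; the weights are random stand-ins for trained parameters, and biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 4  # toy input and hidden size (assumption)

# One weight matrix per gate plus the candidate update;
# each sees the concatenated [x_t, h_prev]
W_f, W_i, W_o, W_c = (rng.normal(size=(d, 2 * d)) * 0.1 for _ in range(4))

x_t = rng.normal(size=d)     # current input
h_prev = np.zeros(d)         # previous hidden state
c_prev = np.zeros(d)         # previous memory cell
xh = np.concatenate([x_t, h_prev])

f = sigmoid(W_f @ xh)        # forget gate: what to discard from c_prev
i = sigmoid(W_i @ xh)        # input gate: what new info to store
o = sigmoid(W_o @ xh)        # output gate: what to expose
c_tilde = np.tanh(W_c @ xh)  # candidate memory content

c = f * c_prev + i * c_tilde # updated memory cell
h = o * np.tanh(c)           # new hidden state
print(h.round(3))
```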
Transformers and Self-Attention

Self-Attention
Computes weighted relationships between all words in a sequence:
Attention(Q, K, V) = softmax(QKᵀ/√dₖ)·V

Architecture
Encoder-decoder: stacked self-attention and feed-forward layers.
Positional encoding: injects sequence-order information.

Advantages
Parallel processing (faster than RNNs).
Handles long-range dependencies effectively.

Use Cases
NLP (BERT, GPT), vision (ViT).
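Scaled dot-product attention is a few lines of NumPy; the sequence length and dimension are toy choices:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8                      # 5 tokens, dimension 8 (toy)
Q = rng.normal(size=(seq_len, d_k))      # queries
K = rng.normal(size=(seq_len, d_k))      # keys
V = rng.normal(size=(seq_len, d_k))      # values

# softmax(QKᵀ/√dₖ)·V: every token attends to every other token
weights = softmax(Q @ K.T / np.sqrt(d_k))
output = weights @ V

print(weights.shape, output.shape)  # (5, 5) (5, 8)
```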
Adam Optimizer

Algorithm

Adaptive Moment Estimation
Combines the best properties of the Momentum and RMSProp optimizers.

Mechanism

Update Rule
mₜ = β₁·mₜ₋₁ + (1 − β₁)·∇θ
vₜ = β₂·vₜ₋₁ + (1 − β₂)·(∇θ)²
θₜ = θₜ₋₁ − η·mₜ/(√vₜ + ε)
(In practice mₜ and vₜ are bias-corrected before the update, as in the sketch below.)

Key Components
Momentum: exponential moving average of gradients.
RMSProp: adaptive learning rates.

Advantages

Performance
Fast convergence; works well with sparse gradients.

Practical Benefits
Requires little tuning; default parameters work well for most problems.
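A NumPy sketch of the Adam update on a toy problem, including the bias correction; β₁ = 0.9, β₂ = 0.999, ε = 1e-8 are the usual defaults, and the objective is an assumption:

```python
import numpy as np

# Minimize f(θ) = θ² with Adam; the gradient is 2θ (toy problem)
grad = lambda theta: 2 * theta

theta = np.array(5.0)
m = v = 0.0
beta1, beta2, eta, eps = 0.9, 0.999, 0.1, 1e-8

for t in range(1, 201):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g        # momentum: EMA of gradients
    v = beta2 * v + (1 - beta2) * g ** 2   # RMSProp: EMA of squared grads
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # ≈ 0, the minimum
```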
Auto-Encoders & GANs

Auto-Encoders

Structure
Encoder (compresses input to a latent space) + decoder (reconstructs the input).

Types
Denoising auto-encoder: reconstructs the input from corrupted data.
Variational auto-encoder (VAE): generates new data by sampling from the latent space.

Use Cases
Dimensionality reduction, anomaly detection, generative modeling.
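A minimal PyTorch auto-encoder sketch; the 784 → 32 bottleneck is a hypothetical choice (e.g., for flattened 28×28 images):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(            # compress 784-d input to 32-d latent
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),
)
decoder = nn.Sequential(            # reconstruct the input from the latent
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)

x = torch.rand(16, 784)             # toy batch of flattened images
z = encoder(x)                      # latent codes
x_hat = decoder(z)                  # reconstructions

loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error
print(z.shape, loss.item())
```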
Generative Adversarial Networks (GANs)

Architecture
Generator: creates fake data to fool the discriminator.
Discriminator: classifies data as real or fake.

Training
Minimax game where both networks improve adversarially:
min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]

Use Cases
Image generation (e.g., StyleGAN), data augmentation.
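One adversarial training step in PyTorch, with a toy 2-d "real" distribution standing in for images; all network sizes and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # generator
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1),
                  nn.Sigmoid())                                    # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(64, 2) + 3.0          # toy "real" distribution
z = torch.randn(64, 8)                   # noise input for the generator

# Discriminator step: push D(real) → 1 and D(fake) → 0
fake = G(z).detach()                     # detach: don't update G here
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: fool D, i.e. push D(G(z)) → 1
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()

print(f"D loss {loss_d.item():.3f}, G loss {loss_g.item():.3f}")
```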
Key Takeaways

1. Hierarchical Learning
Deep learning leverages layered neural networks to learn hierarchical representations from data.

2. Key Innovations
CNNs (spatial data), RNNs/LSTMs (sequences), Transformers (attention), and GANs (generation) have revolutionized AI.

3. Problem-Solving Techniques
Transfer learning, data augmentation, and automatic differentiation address challenges like limited data and optimization.

4. Mastery Requirements
Success requires understanding math, programming, and iterative experimentation.