SlideShare a Scribd company logo
1 of 57
Download to read offline
• This session is recorded.
• Mute your mic.
• You can leave messages in the chat panel to ask questions.
• When I check your progress, please click if you are done.
• We will start at 6:35.
National University of Singapore – CS5242 1
CS5242 Neural Networks and Deep Learning
Lecture 01: Introduction and Linear Regression
Wei WANG
TA: FU Di, LIANG Yuxuan, FuYujian
cs5242@comp.nus.edu.sg
National University of Singapore – CS5242 2
change log:
V1: slide 41, the superscript should be n
Agenda
• Introduction of this module
• Logistics
• History
• Linear regression
• Univariate
• Multivariate
National University of Singapore – CS5242 3
Instructor
• WANG Wei
• COM2-04-09
• wangwei@comp.nus.edu.sg
• Research
• Machine learning system optimization
• Multi-modal data analysis/applications
National University of Singapore – CS5242 4
Teaching Assistants
National University of Singapore – CS5242 5
e0427783@u.nus.edu e0409760@u.nus.edu e0427770@u.nus.edu
LIANG Yuxuan FU Di FU Yujian
Quiz Assignment Final project
LumiNUS Poll
National University of Singapore – CS5242 6
Your background?
Pre-requisite
• Machine Learning (CS3244)
• Or https://www.coursera.org/learn/machine-learning
• Linear Algebra (MA1101R)
• Or https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
• Calculus (MA1521)
• Probability (ST2334 )
• Brief summary
Course
• Python (ONLY, version 3.x)
• Numpy
• Keras, TensorFlow, PyTorch, MxNet, Caffe, SINGA
• Jupyter notebook or Google colab
Coding
National University of Singapore – CS5242 7
Grading policy
•Assignment 30 points
•Two assignments
•15% off per day late (17:01 is the start of one day); 0 score if you submit it 7 days after the deadline
•Online quiz 30 points
•3 quizzes in total
•You need a camera (from your laptop, mobile phone or external camera)
•Project 40 points
•Kaggle competition
•Group size <=4
Weightage:
•Every assignment is an individual task
•The project is a group-based task
•Avoid academic offence (cheating, plagiarism including copying code from the internet, e.g., github)
Collaboration
National University of Singapore – CS5242 8
Contacts
• LumiNUS forum
• For all issues
• Email: cs5242@comp.nus.edu.sg
• For private/personal issues
• Consultation (make appointments on LumiNUS)
• Wei WANG, wangwei@comp.nus.edu.sg
• Wed, 11:00-12:00
National University of Singapore – CS5242 9
GPU machines
• SoC GPU machines
• https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/hardware
• Google Cloud Platform
• Some free credit for new register
• GPU is expensive
• Amazon EC2 (g2.8xlarge)
• Some free credit for students
• GPU is expensive
• NSCC (National Super-Computing Center)
• Free for NUS students
• The libraries are sometimes outdated
• Jobs will be submitted into a queue for executing
• Important notes if you use cloud platforms:
• STOP/TERMINATE the instance immediately after your program terminates
• Check the usage status frequently to avoid high outstanding bills
• Amazon/Google may charge you for additional storage volume
National University of Singapore – CS5242 10
Syllabus & Schedule
• Learn various neural network models
• From shallow to deep neural networks
• Including the model structure, training algorithms, tricks and applications
• Covering the principles, coding practices and a bit theory
• Check LumiNUS for the latest plan
National University of Singapore – CS5242 11
Intended learning outcomes (my expectation)
Explain the
principles of the
operations of
different layers and
training algorithms
1
Compare different
neural network
architectures
2
Implement popular
neural networks
3
Solve problems
using neural
networks and deep
learning techniques
4
National University of Singapore – CS5242 12
Your expectation?
National University of Singapore – CS5242 13
I want to get an easy boost to my
GPA.
National University of Singapore – CS5242 14
This course may not be easy
I want to earn the big bucks as a
data scientist / ML engineer
at Google / Amazon / Facebook.
National University of Singapore – CS5242 15
This course provides the basics; but is not enough.
check out:
CS5260 – NN & Deep Learning II
CS4243 – Computer vision & Pattern Recognition
CS4248 – Natural Language Processing
Slide credit: Angela Yao
National University of Singapore – CS5242 16
Slide credit: Angela Yao
My PhD advisor is making me take
this course so that I can use deep
learning in our research.
National University of Singapore – CS5242 17
Neural Networks and Deep Learning:
•Andrew Ng
•https://www.coursera.org/learn/neural-networks-deep-learning
CSC 321: Intro to Neural Networks and Machine Learning
•Roger Grosse
•http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/
Neural Networks for Machine Learning
•Geoffrey Hinton
•https://www.coursera.org/learn/neural-networks
CS231n: Convolutional Neural Networks for Visual Recognition
• Fei-Fei Li, Justin Johnson, Serena Yeung
• http://cs231n.stanford.edu/
CS224d: Deep Learning for Natural Language Processing
• Richard Socher
• http://cs224d.stanford.edu/
Slide credit: Angela Yao
AI is cool and I want to build the next Skynet.
National University of Singapore – CS5242 18
Slide credit: Angela Yao
Terminology Overload: AI vs. ML vs. DL
National University of Singapore – CS5242 19
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
AI
• 1950’s
• “Human intelligence exhibited by machines”
• Expert systems; Rules
• Machine learning algorithms
• Narrow AI: image recognition, machine translation
National University of Singapore – CS5242 20
https://www.youtube.com/watch?v=nASDYRkbQIY
Machine Learning
• 1980’s
• “An approach to achieve AI through systems that can learn from
experience to find patterns in that data”
• Can we code a program with rules (like bubble sorting) to do
• Image recognition
• Speech recognition
National University of Singapore – CS5242 21
This Photo by Unknown Author is licensed
under CC BY-NC-ND
Program Plane
Neural networks and Deep Learning
National University of Singapore – CS5242 22
• 1950’s
• A class of machine learning algorithm that use a cascade of multiple
layers of nonlinear processing units for feature extraction and
transformation---Wikipedia
x y
f1() f2() f3()
Image source: http://pediaa.com/difference-between-nerve-and-neuron/
Neural nets/perceptrons are loosely inspired by biology. But they certainly
are not a model of how the brain works, or even how neurons work.
Slide credit: Angela Yao
Neural networks and Deep Learning
• Feature learning (transformation)
23
National University of Singapore – CS5242
Neural networks and Deep Learning
• Feature learning (transformation)
Data
Performance
24
Deep learning is not a killer for everything
National University of Singapore – CS5242
Neural networks (NN)
• Linear, polynomial, logistic, multinomial regression
• Perceptron and multi-layer perceptron (MLP)
• Convolutional neural network (CNN)
• Recurrent neural networks (RNN)
• Generative adversarial networks (GAN)
• (Restricted) Boltzmann machine
• Deep belief network
• Spike neural network
• Radial basis function neural network
• Hopfield networks
National University of Singapore – CS5242 25
Source from [3]
History [2]
First perceptron,
Frank Rosenblatt
1958
Backpropagation,
Paul Werbos
1974
Neocogitron
(inspired CNN),
Kunihiko
Fukushima
1980
Recurrent Neural
Network (RNN),
Michael I. Jordan
1986
LeNet, Yann
LeCun
1990
Hochreiter &
Schmidhube,
LSTM
1998
Deep Belief
Networks,
Geoffrey Hinton
2006
AlexNet, Alex
Krizhevsky &
Geoffrey Hinton
2012
GoogleNet &
VGG
2014
Generative
adversarial
network, Ian
Goodfellow
2014
ResNet, Kaimin
He
2015
AlphaGo,
DeepMind
2015
National University of Singapore – CS5242 26
Refer to http://people.idsia.ch/~juergen/who-invented-backpropagation.html for more papers
https://www.youtube.com/watch?v=yaL5ZMvRRqE
Applications
National University of Singapore – CS5242 27
• Image classification
• Object detection, demo
• Scene text recognition
• Neural style transferring
• Image generation
Computer Vision
• Question answering
• Machine translation
Natural language processing
• Speech recognition, e.g. Amazon Echo
Speech
Univariate Linear Regression
28
National University of Singapore – CS5242
Weight vs Height
Problem definition:
• Predict the weight given the height of a person.
Data
• Input: Height (inches)
• Output: Weight (pounds)
• Split the data into
• 80% for training
• 20% for evaluation
29
Data source: https://www.kaggle.com/mustafaali96/weight-height
National University of Singapore – CS5242
Modeling
• Notation
• A person is called an example/instance
• Height denoted as 𝑥𝑥 ∈ 𝑅𝑅: input feature
• Weight denoted as 𝑦𝑦 ∈ 𝑅𝑅: the target or ground truth
• Map from input to output by linear regression
• �
𝑦𝑦 = 𝑥𝑥𝑥𝑥 + 𝑏𝑏, 𝑤𝑤 ∈ 𝑅𝑅, 𝑏𝑏 ∈ 𝑅𝑅
• 𝑤𝑤 is the slope and 𝑏𝑏 is the intercept
• �
𝑦𝑦 is called the prediction
30
National University of Singapore – CS5242
Training
• Learn 𝑤𝑤 and 𝑏𝑏 to fit the training data 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 = (𝑥𝑥(𝑖𝑖), 𝑦𝑦(𝑖𝑖)) , 𝑖𝑖 = 1 … 𝑚𝑚
• That fit the data well
• How to measure the quality of w and b?
• Loss function
• The smaller the loss value, the better the prediction (closer to the target)
• 𝐿𝐿 𝑥𝑥, 𝑦𝑦|𝑤𝑤, 𝑏𝑏 = |�
𝑦𝑦 − 𝑦𝑦|
• 𝐿𝐿 𝑥𝑥, 𝑦𝑦|𝑤𝑤, 𝑏𝑏 =
1
2
||�
𝑦𝑦 − 𝑦𝑦||2
=
1
2
(�
𝑦𝑦 − 𝑦𝑦)2
• Easier for optimization/training because it is differentiable
• The coefficient
1
2
is to make the gradient simple
• Define the training objective
• 𝐽𝐽(𝑤𝑤, 𝑏𝑏) =
1
𝑚𝑚
∑𝑖𝑖=1
𝑚𝑚
𝐿𝐿 𝑥𝑥(𝑖𝑖)
, 𝑦𝑦(𝑖𝑖)
𝑤𝑤, 𝑏𝑏
31
National University of Singapore – CS5242
Tune the model parameters over the training instances to minimize the
average loss, i.e., the training objective
Optimization/training
32
National University of Singapore – CS5242
min
𝑤𝑤,𝑏𝑏
𝐽𝐽 𝑤𝑤, 𝑏𝑏
→ min
𝑤𝑤,𝑏𝑏
1
𝑚𝑚
∑𝑖𝑖=1
𝑚𝑚
𝐿𝐿 𝑥𝑥 𝑖𝑖 , 𝑦𝑦 𝑖𝑖 𝑤𝑤, 𝑏𝑏
→ min
𝑤𝑤,𝑏𝑏
1
2𝑚𝑚
∑𝑖𝑖=1
𝑚𝑚
𝑤𝑤𝑥𝑥 𝑖𝑖 + 𝑏𝑏 − 𝑦𝑦 𝑖𝑖 2
1. Fix b and learn w
min
𝑤𝑤
1
2𝑚𝑚
�
𝑖𝑖=1
𝑚𝑚
𝑥𝑥 𝑖𝑖 2
𝑤𝑤2 +
1
𝑚𝑚
�
𝑖𝑖=1
𝑚𝑚
𝑥𝑥 𝑖𝑖 (𝑏𝑏 − 𝑦𝑦 𝑖𝑖 ) 𝑤𝑤 +
1
2𝑚𝑚
�
𝑖𝑖=1
𝑚𝑚
𝑏𝑏 − 𝑦𝑦 𝑖𝑖 2
→ min
𝑤𝑤
𝑐𝑐1𝑤𝑤2 + 𝑐𝑐2𝑤𝑤 + 𝑐𝑐3
2. Fix w and learn b
Optimization/training
33
National University of Singapore – CS5242
Gradient for univariate simple functions
Gradient: recall high school calculus and how to take derivatives
•
𝜕𝜕𝜕𝜕 𝑥𝑥
𝜕𝜕𝜕𝜕
= lim
ℎ→0
𝑓𝑓 𝑥𝑥+ℎ −𝑓𝑓 𝑥𝑥
ℎ
• 𝑓𝑓 𝑥𝑥 = 3,
𝜕𝜕𝜕𝜕 𝑥𝑥
𝜕𝜕𝜕𝜕
= 0
• 𝑓𝑓 𝑥𝑥 = 3𝑥𝑥,
𝜕𝜕𝜕𝜕 𝑥𝑥
𝜕𝜕𝜕𝜕
= 3
• 𝑓𝑓 𝑥𝑥 = 𝑥𝑥2
,
𝜕𝜕𝜕𝜕 𝑥𝑥
𝜕𝜕𝜕𝜕
= 2𝑥𝑥
34
National University of Singapore – CS5242
Gradient for univariate composite functions
• 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑥𝑥 + ℎ 𝑥𝑥
• 𝑓𝑓 𝑥𝑥 = 3𝑥𝑥 + 𝑥𝑥2
• 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑥𝑥 ℎ 𝑥𝑥
• 𝑓𝑓 𝑥𝑥 = 𝑥𝑥𝑥𝑥2
• 𝑓𝑓 𝑥𝑥 =
𝑔𝑔 𝑥𝑥
ℎ(𝑥𝑥)
• 𝑓𝑓 𝑥𝑥 =
𝑒𝑒𝑥𝑥
𝑥𝑥
• 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑢𝑢 , 𝑢𝑢 = ℎ(𝑥𝑥)
• 𝑓𝑓 𝑥𝑥 = ln(𝑥𝑥 + 𝑥𝑥2
)
• 𝑓𝑓′ 𝑥𝑥 = 𝑔𝑔′ 𝑥𝑥 + ℎ′ 𝑥𝑥
• 𝑓𝑓′ 𝑥𝑥 = 𝑔𝑔′
(𝑥𝑥)ℎ 𝑥𝑥 + 𝑔𝑔 𝑥𝑥 ℎ′
𝑥𝑥
• 𝑓𝑓𝑓 𝑥𝑥 =
𝑔𝑔′ 𝑥𝑥 ℎ 𝑥𝑥 −𝑔𝑔 𝑥𝑥 ℎ′
(𝑥𝑥)
ℎ 𝑥𝑥 2
• 𝑓𝑓′
𝑥𝑥 = 𝑔𝑔′
𝑢𝑢 ℎ′(𝑥𝑥)
35
We use 𝑓𝑓𝑓(𝑥𝑥) and
𝜕𝜕𝜕𝜕 𝑥𝑥
𝜕𝜕𝜕𝜕
interchangeably for the gradient of
f(x) with respect to x, where x and
f(x) are scalars
National University of Singapore – CS5242
HW1: solve for the derivatives /
gradients of these functions.
Training by gradient descent
36
J
w
w0 w2 w1
Initialize w as w0
Compute
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤𝑤 , negative;
Move w from w0 to the right by
w1 = w0 − α
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤𝑤
Compute
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤1, positive;
Move w from w1 to the left by
w2 = w1 − α
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤1
Compute
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤2, negative
Move w from w2 to the right by
w3 = w2 − α
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
|𝑤𝑤=𝑤𝑤2
Find the w for which we have the lowest* J.
α is the learning rate, which controls
the moving step length. It’s
important for convergence. If it is
large, w oscillates around the optimal
position. If it is small, it takes many
iterations to reach the optimum.
For example:
𝑐𝑐1 = 1, 𝑐𝑐2 = −2, 𝑐𝑐3 = 1
National University of Singapore – CS5242
𝐽𝐽 = 𝑐𝑐1𝑤𝑤2
+ 𝑐𝑐2𝑤𝑤 + 𝑐𝑐3
Training by gradient descent
• Gradient descent algorithm for optimization
• Set w = 0.1 or a random number
• Repeat
• For each data sample, compute �
𝑦𝑦 = 𝑥𝑥𝑥𝑥 + 𝑏𝑏
• Compute the average loss, ∑<𝑥𝑥,𝑦𝑦>∈𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡
𝐿𝐿 𝑥𝑥, 𝑦𝑦 𝑤𝑤, 𝑏𝑏 /|𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡|
• Compute
𝜕𝜕𝐽𝐽
𝜕𝜕𝑤𝑤
• Update w = w − α
𝜕𝜕𝐽𝐽
𝜕𝜕𝑤𝑤
37
National University of Singapore – CS5242
Update w, b repeatedly…
What if there are multiple parameters, e.g., multiple
variables?
Machine learning pipeline
Prepare
data
Extract
feature
Create
model
Find loss
function
Conduct
training
Evaluate Deploy
38
National University of Singapore – CS5242
Multivariate Linear Regression
National University of Singapore – CS5242 39
Multivariate linear regression
• Consider the problem of house price prediction
• Each instance in the training dataset
• Denote the features using a column vector
• 𝒙𝒙 ∈ 𝑅𝑅𝑛𝑛×1: 𝑥𝑥i is the i-th feature
• size, floor, location, age, lease, etc.
• Target 𝑦𝑦 ∈ 𝑅𝑅: price
40
𝒙𝒙 =
𝑥𝑥1
𝑥𝑥2
…
𝑥𝑥𝑛𝑛
National University of Singapore – CS5242
• Map from input to output
�
𝑦𝑦 = 𝒘𝒘𝑻𝑻𝒙𝒙 + 𝑏𝑏, 𝒘𝒘 ∈ 𝑅𝑅𝑛𝑛×1, 𝑏𝑏 ∈ 𝑅𝑅 , 𝑤𝑤𝑖𝑖 is the i-th element of 𝒘𝒘 (importance of i-th feature)
= ∑𝑖𝑖=1
𝑛𝑛
𝑤𝑤𝑖𝑖𝑥𝑥𝑖𝑖 + 𝑏𝑏
= 𝑤𝑤1, 𝑤𝑤2, … , 𝑤𝑤𝑛𝑛
𝑥𝑥1
𝑥𝑥2
…
𝑥𝑥𝑛𝑛
+ 𝑏𝑏
= 𝑤𝑤1, 𝑤𝑤2, … , 𝑤𝑤𝑛𝑛, 𝑏𝑏
𝑥𝑥1
𝑥𝑥2
…
𝑥𝑥𝑛𝑛
1
 �
𝑦𝑦 = �
𝒘𝒘𝑇𝑇�
𝒙𝒙
Model
41
National University of Singapore – CS5242
For the rest of this module, we use 𝒘𝒘 and 𝒙𝒙 for �
𝒘𝒘 and �
𝒙𝒙
respectively unless there is a special definition of 𝒘𝒘 and 𝒙𝒙 .
The summation is over n elements, where n is the number of features per instance.
Optimization
• Compute the gradient of the loss with respect to (w.r.t) w
• Consider a single instance
42
𝐽𝐽 𝒘𝒘 = 𝐿𝐿 𝒙𝒙, 𝑦𝑦 𝒘𝒘 =
1
2
�
𝑦𝑦 − 𝑦𝑦
2
=
1
2
𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦
2
𝜕𝜕𝐽𝐽 𝒘𝒘
𝜕𝜕𝒘𝒘
=?
National University of Singapore – CS5242
Gradient of vector and matrix
• Vectors
• By default, is a column vector
• Denoted as 𝒙𝒙, 𝒚𝒚, 𝒛𝒛
43
(denominator layout)
National University of Singapore – CS5242
Denominator layout: the result shape is (𝑛𝑛, 𝑚𝑚)
Gradient of vector and matrix
44
• Matrix
• Denoted as X, Y, Z
(denominator layout)
𝜕𝜕𝒀𝒀
𝜕𝜕𝜕𝜕
=
𝜕𝜕𝑦𝑦11
𝜕𝜕𝜕𝜕
𝜕𝜕𝑦𝑦21
𝜕𝜕𝜕𝜕
𝜕𝜕𝑦𝑦12
𝜕𝜕𝜕𝜕
𝜕𝜕𝑦𝑦22
𝜕𝜕𝜕𝜕
⋯
𝜕𝜕𝑦𝑦𝑚𝑚1
𝜕𝜕𝜕𝜕
𝜕𝜕𝑦𝑦𝑚𝑚𝑚
𝜕𝜕𝜕𝜕
⋮ ⋱ ⋮
𝜕𝜕𝑦𝑦1𝑛𝑛
𝜕𝜕𝜕𝜕
𝜕𝜕𝑦𝑦2𝑛𝑛
𝜕𝜕𝜕𝜕
⋯
𝜕𝜕𝑦𝑦𝑚𝑚𝑚𝑚
𝜕𝜕𝜕𝜕
National University of Singapore – CS5242
Gradient of matrix with respect to (w.r.t) matrix,
matrix w.r.t vector, vector w.r.t matrix?
Too complex. Not commonly used.
Gradient table
Shape of
𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕
Scalar y Vector y (m ,1) Matrix Y (m, n)
Scalar x ? ? ?
Vector x (n ,1) (n, 1) ?
Matrix X (p, q) ?
45
(denominator layout)
National University of Singapore – CS5242
HW2: fill in the table with the matrix
dimensions of the resulting gradients.
Gradient table: vector by vector
𝒚𝒚 = 𝒂𝒂 𝒙𝒙 𝑨𝑨𝑨𝑨 𝒙𝒙𝑻𝑻
𝑨𝑨
𝝏𝝏𝝏𝝏
𝝏𝝏𝝏𝝏
=
? ? 𝑨𝑨𝑇𝑇 ?
46
𝒚𝒚 = 𝑎𝑎𝒖𝒖
𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝑣𝑣𝒖𝒖
𝑣𝑣 = 𝑣𝑣(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝒗𝒗 + 𝒖𝒖
𝒗𝒗 = 𝒗𝒗(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝑨𝑨𝑨𝑨
𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝑔𝑔(𝒖𝒖)
𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝝏𝝏𝝏𝝏
𝝏𝝏𝝏𝝏
=
?
𝑣𝑣
𝝏𝝏𝝏𝝏
𝝏𝝏𝝏𝝏
+
𝝏𝝏𝑣𝑣
𝝏𝝏𝝏𝝏
𝒖𝒖𝑻𝑻 ? ? ?
(denominator layout)
National University of Singapore – CS5242
𝑦𝑦 = 𝑎𝑎 𝒖𝒖𝑻𝑻
𝒗𝒗
𝒗𝒗 = 𝒗𝒗(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙)
𝑔𝑔 𝑢𝑢
𝑢𝑢 = 𝑢𝑢(𝒙𝒙)
𝒙𝒙𝑻𝑻
𝐴𝐴𝒙𝒙
𝝏𝝏𝝏𝝏
𝝏𝝏𝝏𝝏
=
? ? ? ?
HW3: fill in the gradient table with the resulting gradients
Optimization
• Compute the gradient of the loss w.r.t w
• Consider a single instance
47
𝐽𝐽 𝒘𝒘 = 𝐿𝐿 𝒙𝒙, 𝑦𝑦 𝒘𝒘 =
1
2
�
𝑦𝑦 − 𝑦𝑦
2
=
1
2
𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦
2
𝑧𝑧 = 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦
𝐽𝐽(𝒘𝒘) =
1
2
𝑧𝑧2
𝜕𝜕𝜕𝜕 𝒘𝒘
𝜕𝜕𝒘𝒘
=
𝜕𝜕𝜕𝜕
𝜕𝜕𝑧𝑧
𝜕𝜕𝑧𝑧
𝜕𝜕𝒘𝒘
= 𝑧𝑧𝒙𝒙 = 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦 𝒙𝒙
𝜕𝜕𝐽𝐽 𝒘𝒘
𝜕𝜕𝒘𝒘
=?
National University of Singapore – CS5242
Shape Check!
Vectorization
• Consider multiple examples {(𝒙𝒙 𝑖𝑖
, 𝑦𝑦(𝑖𝑖)
)}, i=1, 2, …, m
• Naïve approach of computing the gradient
• for each instance (𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖))
• Accumulate the loss 𝐿𝐿(𝒙𝒙 𝑖𝑖
, 𝑦𝑦(𝑖𝑖)
) into 𝐽𝐽
• Average J over m
• for each instance (𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖))
• Accumulate the gradient of
𝜕𝜕𝐿𝐿 𝒙𝒙 𝑖𝑖 ,𝑦𝑦 𝑖𝑖
𝜕𝜕𝒘𝒘
• Not as fast as matrix operations
48
Image source: https://www.ubergizmo.com/what-is/gpu-graphics-processing-unit/
National University of Singapore – CS5242
Vectorization
• Consider multiple examples {(𝒙𝒙 𝑖𝑖
, 𝑦𝑦(𝑖𝑖)
)}, i=1, 2, …, m
• Put each instance (the feature vector) into one row of a matrix
• For fast memory access as Numpy stores data in row-major format
• Put each target into one element of a column vector
49
National University of Singapore – CS5242
�
𝒚𝒚 = 𝑋𝑋𝒘𝒘
𝑋𝑋 =
𝒙𝒙 𝟏𝟏 𝑻𝑻
𝒙𝒙 𝟐𝟐 𝑻𝑻
…
𝒙𝒙 𝒎𝒎 𝑻𝑻
𝒚𝒚 =
𝑦𝑦(1)
𝑦𝑦(2)
…
𝑦𝑦(𝑚𝑚)
𝐽𝐽 𝒘𝒘 =?
Vectorization
• Consider multiple examples {(𝒙𝒙 𝑖𝑖
, 𝑦𝑦(𝑖𝑖)
)}, i=1, 2, …, m
• Put each instance (the feature vector) into one row of a matrix
• For fast memory access as Numpy stores data in row-major format
• Put each target into one element of a column vector
50
National University of Singapore – CS5242
�
𝒚𝒚 = 𝑋𝑋𝒘𝒘
𝑋𝑋 =
𝒙𝒙 𝟏𝟏 𝑻𝑻
𝒙𝒙 𝟐𝟐 𝑻𝑻
…
𝒙𝒙 𝒎𝒎 𝑻𝑻
𝒚𝒚 =
𝑦𝑦(1)
𝑦𝑦(2)
…
𝑦𝑦(𝑚𝑚)
𝐽𝐽 𝒘𝒘 =
1
2𝑚𝑚
�
𝒚𝒚 − 𝑦𝑦
2
=
1
2𝑚𝑚
�
𝒚𝒚 − 𝒚𝒚 T
(�
𝒚𝒚 − 𝒚𝒚)
𝜕𝜕𝐽𝐽 𝒘𝒘
𝝏𝝏𝝏𝝏
=?
𝒖𝒖 = �
𝒚𝒚 − 𝒚𝒚
Vectorization
• Consider multiple examples {(𝒙𝒙 𝑖𝑖
, 𝑦𝑦(𝑖𝑖)
)}, i=1, 2, …, m
• Put each instance (the feature vector) into one row of a matrix
• For fast memory access as numpy stores data in row-major format
• Put each target into one element of a column vector
51
�
𝒚𝒚 = 𝑋𝑋𝒘𝒘
𝑋𝑋 =
𝒙𝒙 𝟏𝟏 𝑻𝑻
𝒙𝒙 𝟐𝟐 𝑻𝑻
…
𝒙𝒙 𝒎𝒎 𝑻𝑻
𝒚𝒚 =
𝑦𝑦(1)
𝑦𝑦(2)
…
𝑦𝑦(𝑚𝑚)
𝐽𝐽 𝒘𝒘 =
1
2𝑚𝑚
�
𝒚𝒚 − 𝑦𝑦
2
=
1
2𝑚𝑚
�
𝒚𝒚 − 𝒚𝒚 T
(�
𝒚𝒚 − 𝒚𝒚)
𝜕𝜕𝐽𝐽 𝒘𝒘
𝝏𝝏𝝏𝝏
=
1
2𝑚𝑚
𝜕𝜕𝒖𝒖
𝝏𝝏𝒘𝒘
𝒖𝒖 +
𝜕𝜕𝒖𝒖
𝝏𝝏𝝏𝝏
𝒖𝒖 =
1
𝑚𝑚
𝑿𝑿𝑻𝑻
𝒖𝒖 =
1
𝑚𝑚
𝑿𝑿𝑻𝑻
(�
𝒚𝒚 − 𝒚𝒚)
𝒖𝒖 = �
𝒚𝒚 − 𝒚𝒚
National University of Singapore – CS5242
Shape Check!
Training by gradient descent
• Gradient descent algorithm for optimization
• initialize 𝒘𝒘 randomly
• Repeat
• Compute the objective 𝐽𝐽(𝒘𝒘) over all training instances
• Compute
𝜕𝜕𝐽𝐽
𝜕𝜕𝒘𝒘
• Update 𝐰𝐰 = 𝐰𝐰 − α
𝜕𝜕𝐽𝐽
𝜕𝜕𝒘𝒘
52
National University of Singapore – CS5242
Notation summary
• Scalar 𝑥𝑥
• Vector 𝒙𝒙 ∈ 𝑅𝑅𝑛𝑛×1
• i-th element of a vector, 𝑥𝑥𝑖𝑖
• i-th example 𝒙𝒙(𝑖𝑖)
• Matrix 𝑿𝑿
• i-th row and j-th column 𝑋𝑋𝑖𝑖𝑗𝑗
• If we choose denominator layout for
𝝏𝝏𝒚𝒚
𝝏𝝏𝝏𝝏
we should lay out the
gradient
𝝏𝝏𝑦𝑦
𝝏𝝏𝝏𝝏
as a column vector, and
𝝏𝝏𝒚𝒚
𝝏𝝏𝑥𝑥
as a row vector.
53
National University of Singapore – CS5242
Summary
• AI vs. ML vs. Deep Learning
• Terminologies & Notations
• Feature, label, loss
• Univariate & multivariate linear regression
• Solving parameters via gradient descent
• Taking gradients of vectors & matrices
National University of Singapore – CS5242 54
References
• [1] Goodfellow Ian, Bengio Yoshua, Courville Aaron. Deep learning. MIT
Press. http://www.deeplearningbook.org
• [2] Haohan Wang, Bhiksha Raj. On the Origin of Deep Learning. 2017
https://arxiv.org/abs/1702.07800
• [3] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. “Convolutional deep belief
networks for scalable unsupervised learning of hierarchical
representations.” In ICML 2009
• http://neuralnetworksanddeeplearning.com/chap1.html
• https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide-
for-linear-ridge-and-lasso-regression/
• https://medium.com/meta-design-ideas/math-stats-and-nlp-for-machine-
learning-as-fast-as-possible-915ef47ced5f
National University of Singapore – CS5242 55
Homework
Theory
1. Solve for the gradients of the functions. (slide 34)
2. Solve for the dimensionality of the gradient vectors (slide 44)
3. Fill out the gradient tables (slide 45).
Practical
1. https://tinyurl.com/y26qj2ts
National University of Singapore – CS5242 56
Next Lecture
(Shallow) Neural Networks
National University of Singapore – CS5242 57

More Related Content

Similar to lecture01v1.pdf

Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Niketan Pansare
 
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)andrei.arion
 
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Daniel Davis
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)FEG
 
Model Technology Enhanced Classrooms
Model Technology Enhanced ClassroomsModel Technology Enhanced Classrooms
Model Technology Enhanced ClassroomsBCcampus
 
Introduction To Neural Network
Introduction To Neural NetworkIntroduction To Neural Network
Introduction To Neural NetworkBangalore
 
Texas Rangers to the rescue: turning your VLE into an exam centre
Texas Rangers to the rescue: turning your VLE into an exam centreTexas Rangers to the rescue: turning your VLE into an exam centre
Texas Rangers to the rescue: turning your VLE into an exam centreBlackboardEMEA
 
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical dataPaul Skeie
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSungchul Kim
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsAtsunori Kanemura
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
66e2b8f0-325d-4394-9b45-0a598362a7da.ppt
66e2b8f0-325d-4394-9b45-0a598362a7da.ppt66e2b8f0-325d-4394-9b45-0a598362a7da.ppt
66e2b8f0-325d-4394-9b45-0a598362a7da.pptnebulaa2
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Seonho Park
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Towards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesTowards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesManas Tungare
 
Neo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with GraphsNeo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with GraphsNeo4j
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSteffen Staab
 

Similar to lecture01v1.pdf (20)

Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school Notes from 2016 bay area deep learning school
Notes from 2016 bay area deep learning school
 
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
Tsunami alerting with Cassandra (From 0 to Cassandra on AWS in 30 days)
 
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
 
Image Classification (20230411)
Image Classification (20230411)Image Classification (20230411)
Image Classification (20230411)
 
Hyun joong
Hyun joongHyun joong
Hyun joong
 
Datalake project
Datalake projectDatalake project
Datalake project
 
Model Technology Enhanced Classrooms
Model Technology Enhanced ClassroomsModel Technology Enhanced Classrooms
Model Technology Enhanced Classrooms
 
Introduction To Neural Network
Introduction To Neural NetworkIntroduction To Neural Network
Introduction To Neural Network
 
Texas Rangers to the rescue: turning your VLE into an exam centre
Texas Rangers to the rescue: turning your VLE into an exam centreTexas Rangers to the rescue: turning your VLE into an exam centre
Texas Rangers to the rescue: turning your VLE into an exam centre
 
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical data
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the Eyes
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and Basics
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Presentation MaSE 18-102012
Presentation MaSE 18-102012Presentation MaSE 18-102012
Presentation MaSE 18-102012
 
66e2b8f0-325d-4394-9b45-0a598362a7da.ppt
66e2b8f0-325d-4394-9b45-0a598362a7da.ppt66e2b8f0-325d-4394-9b45-0a598362a7da.ppt
66e2b8f0-325d-4394-9b45-0a598362a7da.ppt
 
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Towards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science CoursesTowards a Syllabus Repository for Computer Science Courses
Towards a Syllabus Repository for Computer Science Courses
 
Neo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with GraphsNeo4j GraphTalk Basel - Building intelligent Software with Graphs
Neo4j GraphTalk Basel - Building intelligent Software with Graphs
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine Learning
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 

lecture01v1.pdf

  • 1. • This session is recorded. • Mute your mic. • You can leave messages in the chat panel to ask questions. • When I check your progress, please click if you are done. • We will start at 6:35. National University of Singapore – CS5242 1
  • 2. CS5242 Neural Networks and Deep Learning Lecture 01: Introduction and Linear Regression Wei WANG TA: FU Di, LIANG Yuxuan, FuYujian cs5242@comp.nus.edu.sg National University of Singapore – CS5242 2 change log: V1: slide 41, the superscript should be n
  • 3. Agenda • Introduction of this module • Logistics • History • Linear regression • Univariate • Multivariate National University of Singapore – CS5242 3
  • 4. Instructor • WANG Wei • COM2-04-09 • wangwei@comp.nus.edu.sg • Research • Machine learning system optimization • Multi-modal data analysis/applications National University of Singapore – CS5242 4
  • 5. Teaching Assistants National University of Singapore – CS5242 5 e0427783@u.nus.edu e0409760@u.nus.edu e0427770@u.nus.edu LIANG Yuxuan FU Di FU Yujian Quiz Assignment Final project
  • 6. LumiNUS Poll National University of Singapore – CS5242 6 Your background?
  • 7. Pre-requisite • Machine Learning (CS3244) • Or https://www.coursera.org/learn/machine-learning • Linear Algebra (MA1101R) • Or https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/ • Calculus (MA1521) • Probability (ST2334 ) • Brief summary Course • Python (ONLY, version 3.x) • Numpy • Keras, TensorFlow, PyTorch, MxNet, Caffe, SINGA • Jupyter notebook or Google colab Coding National University of Singapore – CS5242 7
  • 8. Grading policy •Assignment 30 points •Two assignments •15% off per day late (17:01 is the start of one day); 0 score if you submit it 7 days after the deadline •Online quiz 30 points •3 quizzes in total •You need a camera (from your laptop, mobile phone or external camera) •Project 40 points •Kaggle competition •Group size <=4 Weightage: •Every assignment is an individual task •The project is a group-based task •Avoid academic offence (cheating, plagiarism including copying code from the internet, e.g., github) Collaboration National University of Singapore – CS5242 8
  • 9. Contacts • LumiNUS forum • For all issues • Email: cs5242@comp.nus.edu.sg • For private/personal issues • Consultation (make appointments on LumiNUS) • Wei WANG, wangwei@comp.nus.edu.sg • Wed, 11:00-12:00 National University of Singapore – CS5242 9
  • 10. GPU machines • SoC GPU machines • https://dochub.comp.nus.edu.sg/cf/guides/compute-cluster/hardware • Google Cloud Platform • Some free credit for new register • GPU is expensive • Amazon EC2 (g2.8xlarge) • Some free credit for students • GPU is expensive • NSCC (National Super-Computing Center) • Free for NUS students • The libraries are sometimes outdated • Jobs will be submitted into a queue for executing • Important notes if you use cloud platforms: • STOP/TERMINATE the instance immediately after your program terminates • Check the usage status frequently to avoid high outstanding bills • Amazon/Google may charge you for additional storage volume National University of Singapore – CS5242 10
  • 11. Syllabus & Schedule • Learn various neural network models • From shallow to deep neural networks • Including the model structure, training algorithms, tricks and applications • Covering the principles, coding practices and a bit theory • Check LumiNUS for the latest plan National University of Singapore – CS5242 11
  • 12. Intended learning outcomes (my expectation) Explain the principles of the operations of different layers and training algorithms 1 Compare different neural network architectures 2 Implement popular neural networks 3 Solve problems using neural networks and deep learning techniques 4 National University of Singapore – CS5242 12
  • 13. Your expectation? National University of Singapore – CS5242 13
  • 14. I want to get an easy boost to my GPA. National University of Singapore – CS5242 14 This course may not be easy
  • 15. I want to earn the big bucks as a data scientist / ML engineer at Google / Amazon / Facebook. National University of Singapore – CS5242 15 This course provides the basics; but is not enough. check out: CS5260 – NN & Deep Learning II CS4243 – Computer vision & Pattern Recognition CS4248 – Natural Language Processing Slide credit: Angela Yao
  • 16. National University of Singapore – CS5242 16 Slide credit: Angela Yao
  • 17. My PhD advisor is making me take this course so that I can use deep learning in our research. National University of Singapore – CS5242 17 Neural Networks and Deep Learning: •Andrew Ng •https://www.coursera.org/learn/neural-networks-deep-learning CSC 321: Intro to Neural Networks and Machine Learning •Roger Grosse •http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/ Neural Networks for Machine Learning •Geoffrey Hinton •https://www.coursera.org/learn/neural-networks CS231n: Convolutional Neural Networks for Visual Recognition • Fei-Fei Li, Justin Johnson, Serena Yeung • http://cs231n.stanford.edu/ CS224d: Deep Learning for Natural Language Processing • Richard Socher • http://cs224d.stanford.edu/ Slide credit: Angela Yao
  • 18. AI is cool and I want to build the next Skynet. National University of Singapore – CS5242 18 Slide credit: Angela Yao
  • 19. Terminology Overload: AI vs. ML vs. DL National University of Singapore – CS5242 19 https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
  • 20. AI • 1950’s • “Human intelligence exhibited by machines” • Expert systems; Rules • Machine learning algorithms • Narrow AI: image recognition, machine translation National University of Singapore – CS5242 20 https://www.youtube.com/watch?v=nASDYRkbQIY
  • 21. Machine Learning • 1980’s • “An approach to achieve AI through systems that can learn from experience to find patterns in that data” • Can we code a program with rules (like bubble sorting) to do • Image recognition • Speech recognition National University of Singapore – CS5242 21 This Photo by Unknown Author is licensed under CC BY-NC-ND Program Plane
  • 22. Neural networks and Deep Learning National University of Singapore – CS5242 22 • 1950’s • A class of machine learning algorithm that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation---Wikipedia x y f1() f2() f3() Image source: http://pediaa.com/difference-between-nerve-and-neuron/ Neural nets/perceptrons are loosely inspired by biology. But they certainly are not a model of how the brain works, or even how neurons work. Slide credit: Angela Yao
  • 23. Neural networks and Deep Learning • Feature learning (transformation) 23 National University of Singapore – CS5242
  • 24. Neural networks and Deep Learning • Feature learning (transformation) Data Performance 24 Deep learning is not a killer for everything National University of Singapore – CS5242
  • 25. Neural networks (NN) • Linear, polynomial, logistic, multinomial regression • Perceptron and multi-layer perceptron (MLP) • Convolutional neural network (CNN) • Recurrent neural networks (RNN) • Generative adversarial networks (GAN) • (Restricted) Boltzmann machine • Deep belief network • Spike neural network • Radial basis function neural network • Hopfield networks National University of Singapore – CS5242 25 Source from [3]
  • 26. History [2] First perceptron, Frank Rosenblatt 1958 Backpropagation, Paul Werbos 1974 Neocogitron (inspired CNN), Kunihiko Fukushima 1980 Recurrent Neural Network (RNN), Michael I. Jordan 1986 LeNet, Yann LeCun 1990 Hochreiter & Schmidhube, LSTM 1998 Deep Belief Networks, Geoffrey Hinton 2006 AlexNet, Alex Krizhevsky & Geoffrey Hinton 2012 GoogleNet & VGG 2014 Generative adversarial network, Ian Goodfellow 2014 ResNet, Kaimin He 2015 AlphaGo, DeepMind 2015 National University of Singapore – CS5242 26 Refer to http://people.idsia.ch/~juergen/who-invented-backpropagation.html for more papers https://www.youtube.com/watch?v=yaL5ZMvRRqE
  • 27. Applications National University of Singapore – CS5242 27 • Image classification • Object detection, demo • Scene text recognition • Neural style transferring • Image generation Computer Vision • Question answering • Machine translation Natural language processing • Speech recognition, e.g. Amazon Echo Speech
  • 28. Univariate Linear Regression 28 National University of Singapore – CS5242
  • 29. Weight vs Height Problem definition: • Predict the weight given the height of a person. Data • Input: Height (inches) • Output: Weight (pounds) • Split the data into • 80% for training • 20% for evaluation 29 Data source: https://www.kaggle.com/mustafaali96/weight-height National University of Singapore – CS5242
  • 30. Modeling • Notation • A person is called an example/instance • Height denoted as 𝑥𝑥 ∈ 𝑅𝑅: input feature • Weight denoted as 𝑦𝑦 ∈ 𝑅𝑅: the target or ground truth • Map from input to output by linear regression • � 𝑦𝑦 = 𝑥𝑥𝑥𝑥 + 𝑏𝑏, 𝑤𝑤 ∈ 𝑅𝑅, 𝑏𝑏 ∈ 𝑅𝑅 • 𝑤𝑤 is the slope and 𝑏𝑏 is the intercept • � 𝑦𝑦 is called the prediction 30 National University of Singapore – CS5242
  • 31. Training • Learn 𝑤𝑤 and 𝑏𝑏 to fit the training data 𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 = (𝑥𝑥(𝑖𝑖), 𝑦𝑦(𝑖𝑖)) , 𝑖𝑖 = 1 … 𝑚𝑚 • That fit the data well • How to measure the quality of w and b? • Loss function • The smaller the loss value, the better the prediction (closer to the target) • 𝐿𝐿 𝑥𝑥, 𝑦𝑦|𝑤𝑤, 𝑏𝑏 = |� 𝑦𝑦 − 𝑦𝑦| • 𝐿𝐿 𝑥𝑥, 𝑦𝑦|𝑤𝑤, 𝑏𝑏 = 1 2 ||� 𝑦𝑦 − 𝑦𝑦||2 = 1 2 (� 𝑦𝑦 − 𝑦𝑦)2 • Easier for optimization/training because it is differentiable • The coefficient 1 2 is to make the gradient simple • Define the training objective • 𝐽𝐽(𝑤𝑤, 𝑏𝑏) = 1 𝑚𝑚 ∑𝑖𝑖=1 𝑚𝑚 𝐿𝐿 𝑥𝑥(𝑖𝑖) , 𝑦𝑦(𝑖𝑖) 𝑤𝑤, 𝑏𝑏 31 National University of Singapore – CS5242
  • 32. Tune the model parameters over the training instances to minimize the average loss, i.e., the training objective Optimization/training 32 National University of Singapore – CS5242 min 𝑤𝑤,𝑏𝑏 𝐽𝐽 𝑤𝑤, 𝑏𝑏 → min 𝑤𝑤,𝑏𝑏 1 𝑚𝑚 ∑𝑖𝑖=1 𝑚𝑚 𝐿𝐿 𝑥𝑥 𝑖𝑖 , 𝑦𝑦 𝑖𝑖 𝑤𝑤, 𝑏𝑏 → min 𝑤𝑤,𝑏𝑏 1 2𝑚𝑚 ∑𝑖𝑖=1 𝑚𝑚 𝑤𝑤𝑥𝑥 𝑖𝑖 + 𝑏𝑏 − 𝑦𝑦 𝑖𝑖 2
  • 33. 1. Fix b and learn w min 𝑤𝑤 1 2𝑚𝑚 � 𝑖𝑖=1 𝑚𝑚 𝑥𝑥 𝑖𝑖 2 𝑤𝑤2 + 1 𝑚𝑚 � 𝑖𝑖=1 𝑚𝑚 𝑥𝑥 𝑖𝑖 (𝑏𝑏 − 𝑦𝑦 𝑖𝑖 ) 𝑤𝑤 + 1 2𝑚𝑚 � 𝑖𝑖=1 𝑚𝑚 𝑏𝑏 − 𝑦𝑦 𝑖𝑖 2 → min 𝑤𝑤 𝑐𝑐1𝑤𝑤2 + 𝑐𝑐2𝑤𝑤 + 𝑐𝑐3 2. Fix w and learn b Optimization/training 33 National University of Singapore – CS5242
  • 34. Gradient for univariate simple functions Gradient: recall high school calculus and how to take derivatives • 𝜕𝜕𝜕𝜕 𝑥𝑥 𝜕𝜕𝜕𝜕 = lim ℎ→0 𝑓𝑓 𝑥𝑥+ℎ −𝑓𝑓 𝑥𝑥 ℎ • 𝑓𝑓 𝑥𝑥 = 3, 𝜕𝜕𝜕𝜕 𝑥𝑥 𝜕𝜕𝜕𝜕 = 0 • 𝑓𝑓 𝑥𝑥 = 3𝑥𝑥, 𝜕𝜕𝜕𝜕 𝑥𝑥 𝜕𝜕𝜕𝜕 = 3 • 𝑓𝑓 𝑥𝑥 = 𝑥𝑥2 , 𝜕𝜕𝜕𝜕 𝑥𝑥 𝜕𝜕𝜕𝜕 = 2𝑥𝑥 34 National University of Singapore – CS5242
  • 35. Gradient for univariate composite functions • 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑥𝑥 + ℎ 𝑥𝑥 • 𝑓𝑓 𝑥𝑥 = 3𝑥𝑥 + 𝑥𝑥2 • 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑥𝑥 ℎ 𝑥𝑥 • 𝑓𝑓 𝑥𝑥 = 𝑥𝑥𝑥𝑥2 • 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑥𝑥 ℎ(𝑥𝑥) • 𝑓𝑓 𝑥𝑥 = 𝑒𝑒𝑥𝑥 𝑥𝑥 • 𝑓𝑓 𝑥𝑥 = 𝑔𝑔 𝑢𝑢 , 𝑢𝑢 = ℎ(𝑥𝑥) • 𝑓𝑓 𝑥𝑥 = ln(𝑥𝑥 + 𝑥𝑥2 ) • 𝑓𝑓′ 𝑥𝑥 = 𝑔𝑔′ 𝑥𝑥 + ℎ′ 𝑥𝑥 • 𝑓𝑓′ 𝑥𝑥 = 𝑔𝑔′ (𝑥𝑥)ℎ 𝑥𝑥 + 𝑔𝑔 𝑥𝑥 ℎ′ 𝑥𝑥 • 𝑓𝑓𝑓 𝑥𝑥 = 𝑔𝑔′ 𝑥𝑥 ℎ 𝑥𝑥 −𝑔𝑔 𝑥𝑥 ℎ′ (𝑥𝑥) ℎ 𝑥𝑥 2 • 𝑓𝑓′ 𝑥𝑥 = 𝑔𝑔′ 𝑢𝑢 ℎ′(𝑥𝑥) 35 We use 𝑓𝑓𝑓(𝑥𝑥) and 𝜕𝜕𝜕𝜕 𝑥𝑥 𝜕𝜕𝜕𝜕 interchangeably for the gradient of f(x) with respect to x, where x and f(x) are scalars National University of Singapore – CS5242 HW1: solve for the derivatives / gradients of these functions.
  • 36. Training by gradient descent 36 J w w0 w2 w1 Initialize w as w0 Compute 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤𝑤 , negative; Move w from w0 to the right by w1 = w0 − α 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤𝑤 Compute 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤1, positive; Move w from w1 to the left by w2 = w1 − α 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤1 Compute 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤2, negative Move w from w2 to the right by w3 = w2 − α 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 |𝑤𝑤=𝑤𝑤2 Find the w for which we have the lowest* J. α is the learning rate, which controls the moving step length. It’s important for convergence. If it is large, w oscillates around the optimal position. If it is small, it takes many iterations to reach the optimum. For example: 𝑐𝑐1 = 1, 𝑐𝑐2 = −2, 𝑐𝑐3 = 1 National University of Singapore – CS5242 𝐽𝐽 = 𝑐𝑐1𝑤𝑤2 + 𝑐𝑐2𝑤𝑤 + 𝑐𝑐3
  • 37. Training by gradient descent • Gradient descent algorithm for optimization • Set w = 0.1 or a random number • Repeat • For each data sample, compute � 𝑦𝑦 = 𝑥𝑥𝑥𝑥 + 𝑏𝑏 • Compute the average loss, ∑<𝑥𝑥,𝑦𝑦>∈𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 𝐿𝐿 𝑥𝑥, 𝑦𝑦 𝑤𝑤, 𝑏𝑏 /|𝑆𝑆𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡| • Compute 𝜕𝜕𝐽𝐽 𝜕𝜕𝑤𝑤 • Update w = w − α 𝜕𝜕𝐽𝐽 𝜕𝜕𝑤𝑤 37 National University of Singapore – CS5242 Update w, b repeatedly… What if there are multiple parameters, e.g., multiple variables?
  • 38. Machine learning pipeline Prepare data Extract feature Create model Find loss function Conduct training Evaluate Deploy 38 National University of Singapore – CS5242
  • 39. Multivariate Linear Regression National University of Singapore – CS5242 39
  • 40. Multivariate linear regression • Consider the problem of house price prediction • Each instance in the training dataset • Denote the features using a column vector • 𝒙𝒙 ∈ 𝑅𝑅𝑛𝑛×1: 𝑥𝑥i is the i-th feature • size, floor, location, age, lease, etc. • Target 𝑦𝑦 ∈ 𝑅𝑅: price 40 𝒙𝒙 = 𝑥𝑥1 𝑥𝑥2 … 𝑥𝑥𝑛𝑛 National University of Singapore – CS5242
  • 41. • Map from input to output � 𝑦𝑦 = 𝒘𝒘𝑻𝑻𝒙𝒙 + 𝑏𝑏, 𝒘𝒘 ∈ 𝑅𝑅𝑛𝑛×1, 𝑏𝑏 ∈ 𝑅𝑅 , 𝑤𝑤𝑖𝑖 is the i-th element of 𝒘𝒘 (importance of i-th feature) = ∑𝑖𝑖=1 𝑛𝑛 𝑤𝑤𝑖𝑖𝑥𝑥𝑖𝑖 + 𝑏𝑏 = 𝑤𝑤1, 𝑤𝑤2, … , 𝑤𝑤𝑛𝑛 𝑥𝑥1 𝑥𝑥2 … 𝑥𝑥𝑛𝑛 + 𝑏𝑏 = 𝑤𝑤1, 𝑤𝑤2, … , 𝑤𝑤𝑛𝑛, 𝑏𝑏 𝑥𝑥1 𝑥𝑥2 … 𝑥𝑥𝑛𝑛 1  � 𝑦𝑦 = � 𝒘𝒘𝑇𝑇� 𝒙𝒙 Model 41 National University of Singapore – CS5242 For the rest of this module, we use 𝒘𝒘 and 𝒙𝒙 for � 𝒘𝒘 and � 𝒙𝒙 respectively unless there is a special definition of 𝒘𝒘 and 𝒙𝒙 . The summation is over n elements, where n is the number of features per instance.
  • 42. Optimization • Compute the gradient of the loss with respect to (w.r.t) w • Consider a single instance 42 𝐽𝐽 𝒘𝒘 = 𝐿𝐿 𝒙𝒙, 𝑦𝑦 𝒘𝒘 = 1 2 � 𝑦𝑦 − 𝑦𝑦 2 = 1 2 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦 2 𝜕𝜕𝐽𝐽 𝒘𝒘 𝜕𝜕𝒘𝒘 =? National University of Singapore – CS5242
  • 43. Gradient of vector and matrix • Vectors • By default, is a column vector • Denoted as 𝒙𝒙, 𝒚𝒚, 𝒛𝒛 43 (denominator layout) National University of Singapore – CS5242 Denominator layout: the result shape is (𝑛𝑛, 𝑚𝑚)
  • 44. Gradient of vector and matrix 44 • Matrix • Denoted as X, Y, Z (denominator layout) 𝜕𝜕𝒀𝒀 𝜕𝜕𝜕𝜕 = 𝜕𝜕𝑦𝑦11 𝜕𝜕𝜕𝜕 𝜕𝜕𝑦𝑦21 𝜕𝜕𝜕𝜕 𝜕𝜕𝑦𝑦12 𝜕𝜕𝜕𝜕 𝜕𝜕𝑦𝑦22 𝜕𝜕𝜕𝜕 ⋯ 𝜕𝜕𝑦𝑦𝑚𝑚1 𝜕𝜕𝜕𝜕 𝜕𝜕𝑦𝑦𝑚𝑚𝑚 𝜕𝜕𝜕𝜕 ⋮ ⋱ ⋮ 𝜕𝜕𝑦𝑦1𝑛𝑛 𝜕𝜕𝜕𝜕 𝜕𝜕𝑦𝑦2𝑛𝑛 𝜕𝜕𝜕𝜕 ⋯ 𝜕𝜕𝑦𝑦𝑚𝑚𝑚𝑚 𝜕𝜕𝜕𝜕 National University of Singapore – CS5242 Gradient of matrix with respect to (w.r.t) matrix, matrix w.r.t vector, vector w.r.t matrix? Too complex. Not commonly used.
  • 45. Gradient table Shape of 𝜕𝜕𝜕𝜕 𝜕𝜕𝜕𝜕 Scalar y Vector y (m ,1) Matrix Y (m, n) Scalar x ? ? ? Vector x (n ,1) (n, 1) ? Matrix X (p, q) ? 45 (denominator layout) National University of Singapore – CS5242 HW2: fill in the table with the matrix dimensions of the resulting gradients.
  • 46. Gradient table: vector by vector 𝒚𝒚 = 𝒂𝒂 𝒙𝒙 𝑨𝑨𝑨𝑨 𝒙𝒙𝑻𝑻 𝑨𝑨 𝝏𝝏𝝏𝝏 𝝏𝝏𝝏𝝏 = ? ? 𝑨𝑨𝑇𝑇 ? 46 𝒚𝒚 = 𝑎𝑎𝒖𝒖 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝑣𝑣𝒖𝒖 𝑣𝑣 = 𝑣𝑣(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝒗𝒗 + 𝒖𝒖 𝒗𝒗 = 𝒗𝒗(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝑨𝑨𝑨𝑨 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝑔𝑔(𝒖𝒖) 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝝏𝝏𝝏𝝏 𝝏𝝏𝝏𝝏 = ? 𝑣𝑣 𝝏𝝏𝝏𝝏 𝝏𝝏𝝏𝝏 + 𝝏𝝏𝑣𝑣 𝝏𝝏𝝏𝝏 𝒖𝒖𝑻𝑻 ? ? ? (denominator layout) National University of Singapore – CS5242 𝑦𝑦 = 𝑎𝑎 𝒖𝒖𝑻𝑻 𝒗𝒗 𝒗𝒗 = 𝒗𝒗(𝒙𝒙), 𝒖𝒖 = 𝒖𝒖(𝒙𝒙) 𝑔𝑔 𝑢𝑢 𝑢𝑢 = 𝑢𝑢(𝒙𝒙) 𝒙𝒙𝑻𝑻 𝐴𝐴𝒙𝒙 𝝏𝝏𝝏𝝏 𝝏𝝏𝝏𝝏 = ? ? ? ? HW3: fill in the gradient table with the resulting gradients
  • 47. Optimization • Compute the gradient of the loss w.r.t w • Consider a single instance 47 𝐽𝐽 𝒘𝒘 = 𝐿𝐿 𝒙𝒙, 𝑦𝑦 𝒘𝒘 = 1 2 � 𝑦𝑦 − 𝑦𝑦 2 = 1 2 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦 2 𝑧𝑧 = 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦 𝐽𝐽(𝒘𝒘) = 1 2 𝑧𝑧2 𝜕𝜕𝜕𝜕 𝒘𝒘 𝜕𝜕𝒘𝒘 = 𝜕𝜕𝜕𝜕 𝜕𝜕𝑧𝑧 𝜕𝜕𝑧𝑧 𝜕𝜕𝒘𝒘 = 𝑧𝑧𝒙𝒙 = 𝒘𝒘𝑻𝑻𝒙𝒙 − 𝑦𝑦 𝒙𝒙 𝜕𝜕𝐽𝐽 𝒘𝒘 𝜕𝜕𝒘𝒘 =? National University of Singapore – CS5242 Shape Check!
  • 48. Vectorization • Consider multiple examples {(𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖) )}, i=1, 2, …, m • Naïve approach of computing the gradient • for each instance (𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖)) • Accumulate the loss 𝐿𝐿(𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖) ) into 𝐽𝐽 • Average J over m • for each instance (𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖)) • Accumulate the gradient of 𝜕𝜕𝐿𝐿 𝒙𝒙 𝑖𝑖 ,𝑦𝑦 𝑖𝑖 𝜕𝜕𝒘𝒘 • Not as fast as matrix operations 48 Image source: https://www.ubergizmo.com/what-is/gpu-graphics-processing-unit/ National University of Singapore – CS5242
  • 49. Vectorization • Consider multiple examples {(𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖) )}, i=1, 2, …, m • Put each instance (the feature vector) into one row of a matrix • For fast memory access as Numpy stores data in row-major format • Put each target into one element of a column vector 49 National University of Singapore – CS5242 � 𝒚𝒚 = 𝑋𝑋𝒘𝒘 𝑋𝑋 = 𝒙𝒙 𝟏𝟏 𝑻𝑻 𝒙𝒙 𝟐𝟐 𝑻𝑻 … 𝒙𝒙 𝒎𝒎 𝑻𝑻 𝒚𝒚 = 𝑦𝑦(1) 𝑦𝑦(2) … 𝑦𝑦(𝑚𝑚) 𝐽𝐽 𝒘𝒘 =?
  • 50. Vectorization • Consider multiple examples {(𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖) )}, i=1, 2, …, m • Put each instance (the feature vector) into one row of a matrix • For fast memory access as Numpy stores data in row-major format • Put each target into one element of a column vector 50 National University of Singapore – CS5242 � 𝒚𝒚 = 𝑋𝑋𝒘𝒘 𝑋𝑋 = 𝒙𝒙 𝟏𝟏 𝑻𝑻 𝒙𝒙 𝟐𝟐 𝑻𝑻 … 𝒙𝒙 𝒎𝒎 𝑻𝑻 𝒚𝒚 = 𝑦𝑦(1) 𝑦𝑦(2) … 𝑦𝑦(𝑚𝑚) 𝐽𝐽 𝒘𝒘 = 1 2𝑚𝑚 � 𝒚𝒚 − 𝑦𝑦 2 = 1 2𝑚𝑚 � 𝒚𝒚 − 𝒚𝒚 T (� 𝒚𝒚 − 𝒚𝒚) 𝜕𝜕𝐽𝐽 𝒘𝒘 𝝏𝝏𝝏𝝏 =? 𝒖𝒖 = � 𝒚𝒚 − 𝒚𝒚
  • 51. Vectorization • Consider multiple examples {(𝒙𝒙 𝑖𝑖 , 𝑦𝑦(𝑖𝑖) )}, i=1, 2, …, m • Put each instance (the feature vector) into one row of a matrix • For fast memory access as numpy stores data in row-major format • Put each target into one element of a column vector 51 � 𝒚𝒚 = 𝑋𝑋𝒘𝒘 𝑋𝑋 = 𝒙𝒙 𝟏𝟏 𝑻𝑻 𝒙𝒙 𝟐𝟐 𝑻𝑻 … 𝒙𝒙 𝒎𝒎 𝑻𝑻 𝒚𝒚 = 𝑦𝑦(1) 𝑦𝑦(2) … 𝑦𝑦(𝑚𝑚) 𝐽𝐽 𝒘𝒘 = 1 2𝑚𝑚 � 𝒚𝒚 − 𝑦𝑦 2 = 1 2𝑚𝑚 � 𝒚𝒚 − 𝒚𝒚 T (� 𝒚𝒚 − 𝒚𝒚) 𝜕𝜕𝐽𝐽 𝒘𝒘 𝝏𝝏𝝏𝝏 = 1 2𝑚𝑚 𝜕𝜕𝒖𝒖 𝝏𝝏𝒘𝒘 𝒖𝒖 + 𝜕𝜕𝒖𝒖 𝝏𝝏𝝏𝝏 𝒖𝒖 = 1 𝑚𝑚 𝑿𝑿𝑻𝑻 𝒖𝒖 = 1 𝑚𝑚 𝑿𝑿𝑻𝑻 (� 𝒚𝒚 − 𝒚𝒚) 𝒖𝒖 = � 𝒚𝒚 − 𝒚𝒚 National University of Singapore – CS5242 Shape Check!
  • 52. Training by gradient descent • Gradient descent algorithm for optimization • initialize 𝒘𝒘 randomly • Repeat • Compute the objective 𝐽𝐽(𝒘𝒘) over all training instances • Compute 𝜕𝜕𝐽𝐽 𝜕𝜕𝒘𝒘 • Update 𝐰𝐰 = 𝐰𝐰 − α 𝜕𝜕𝐽𝐽 𝜕𝜕𝒘𝒘 52 National University of Singapore – CS5242
  • 53. Notation summary • Scalar 𝑥𝑥 • Vector 𝒙𝒙 ∈ 𝑅𝑅𝑛𝑛×1 • i-th element of a vector, 𝑥𝑥𝑖𝑖 • i-th example 𝒙𝒙(𝑖𝑖) • Matrix 𝑿𝑿 • i-th row and j-th column 𝑋𝑋𝑖𝑖𝑗𝑗 • If we choose denominator layout for 𝝏𝝏𝒚𝒚 𝝏𝝏𝝏𝝏 we should lay out the gradient 𝝏𝝏𝑦𝑦 𝝏𝝏𝝏𝝏 as a column vector, and 𝝏𝝏𝒚𝒚 𝝏𝝏𝑥𝑥 as a row vector. 53 National University of Singapore – CS5242
  • 54. Summary • AI vs. ML vs. Deep Learning • Terminologies & Notations • Feature, label, loss • Univariate & multivariate linear regression • Solving parameters via gradient descent • Taking gradients of vectors & matrices National University of Singapore – CS5242 54
  • 55. References • [1] Goodfellow Ian, Bengio Yoshua, Courville Aaron. Deep learning. MIT Press. http://www.deeplearningbook.org • [2] Haohan Wang, Bhiksha Raj. On the Origin of Deep Learning. 2017 https://arxiv.org/abs/1702.07800 • [3] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations.” In ICML 2009 • http://neuralnetworksanddeeplearning.com/chap1.html • https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide- for-linear-ridge-and-lasso-regression/ • https://medium.com/meta-design-ideas/math-stats-and-nlp-for-machine- learning-as-fast-as-possible-915ef47ced5f National University of Singapore – CS5242 55
  • 56. Homework Theory 1. Solve for the gradients of the functions. (slide 34) 2. Solve for the dimensionality of the gradient vectors (slide 44) 3. Fill out the gradient tables (slide 45). Practical 1. https://tinyurl.com/y26qj2ts National University of Singapore – CS5242 56
  • 57. Next Lecture (Shallow) Neural Networks National University of Singapore – CS5242 57