Machine Learning Basics
Lecture 1: Linear Regression
Princeton University COS 495
Instructor: Yingyu Liang
Machine learning basics
What is machine learning?
• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
------- Machine Learning, Tom Mitchell, 1997
Example 1: image classification
Task: determine if the image is indoor or outdoor
Performance measure: probability of misclassification
Example 1: image classification
Experience/Data: images with labels (e.g., indoor / outdoor)
Example 1: image classification
• A few terms
• Training data: the images given for learning
• Test data: the images to be classified
• Binary classification: classify into two classes
Example 1: image classification (multi-class)
ImageNet figure borrowed from vision.stanford.edu
Example 2: clustering images
Task: partition the images into 2 groups
Performance: similarities within groups
Data: a set of images
Example 2: clustering images
• A few terms
• Unlabeled data vs labeled data
• Supervised learning vs unsupervised learning
Math formulation
Extract features: color histogram (Red, Green, Blue channels)
Feature vector: 𝑥𝑖
Label: 𝑦𝑖 (indoor = 0)
Math formulation
Extract features: color histogram (Red, Green, Blue channels)
Feature vector: 𝑥𝑗
Label: 𝑦𝑗 (outdoor = 1)
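To make the feature-extraction step concrete, here is a minimal Python/NumPy sketch (not part of the original slides; the function name, bin count, and normalization are illustrative choices) that turns an RGB image into a color-histogram feature vector:

```python
import numpy as np

def color_histogram_features(image, bins=8):
    # image: H x W x 3 array of RGB values in [0, 255]
    # Concatenate one histogram per color channel (Red, Green, Blue).
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 255))
        feats.append(hist)
    x = np.concatenate(feats).astype(float)
    return x / x.sum()  # normalize so the feature does not depend on image size

# Hypothetical example: a random array stands in for a real photo.
image = np.random.randint(0, 256, size=(64, 64, 3))
x_i = color_histogram_features(image)  # feature vector x_i
y_i = 0                                # label y_i: indoor = 0
```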
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛}
• Find 𝑦 = 𝑓(𝑥) using training data
• s.t. 𝑓 correct on test data
What kind of functions?
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛}
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. 𝑓 correct on test data
Hypothesis class
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛}
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. 𝑓 correct on test data
Connection between training data and test data?
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. 𝑓 correct on test data i.i.d. from distribution 𝐷
They have the same distribution
i.i.d.: independent and identically distributed
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. 𝑓 correct on test data i.i.d. from distribution 𝐷
What kind of performance measure?
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. the expected loss is small
𝐿(𝑓) = 𝔼(𝑥,𝑦)~𝐷[𝑙(𝑓, 𝑥, 𝑦)] (various loss functions)
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. the expected loss is small
𝐿(𝑓) = 𝔼(𝑥,𝑦)~𝐷[𝑙(𝑓, 𝑥, 𝑦)]
• Examples of loss functions:
• 0-1 loss: 𝑙(𝑓, 𝑥, 𝑦) = 𝕀[𝑓(𝑥) ≠ 𝑦] and 𝐿(𝑓) = Pr[𝑓(𝑥) ≠ 𝑦]
• 𝑙2 loss: 𝑙(𝑓, 𝑥, 𝑦) = [𝑓(𝑥) − 𝑦]² and 𝐿(𝑓) = 𝔼[𝑓(𝑥) − 𝑦]²
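As a small illustration (not from the slides) of the two example losses, the sketch below evaluates them per example and averages over a sample, which estimates 𝐿(𝑓):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    # l(f, x, y) = I[f(x) != y]
    return (y_pred != y_true).astype(float)

def l2_loss(y_pred, y_true):
    # l(f, x, y) = (f(x) - y)^2
    return (y_pred - y_true) ** 2

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])              # outputs of some classifier f
print(zero_one_loss(y_pred, y_true).mean())  # estimates Pr[f(x) != y] -> 0.25
print(l2_loss(y_pred, y_true).mean())        # estimates E[(f(x) - y)^2] -> 0.25
```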
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using training data
• s.t. the expected loss is small
𝐿(𝑓) = 𝔼(𝑥,𝑦)~𝐷[𝑙(𝑓, 𝑥, 𝑦)] (how to use?)
Math formulation
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 that minimizes the empirical loss
$\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} l(f, x_i, y_i)$
• s.t. the expected loss is small
𝐿(𝑓) = 𝔼(𝑥,𝑦)~𝐷[𝑙(𝑓, 𝑥, 𝑦)]
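A toy sketch of this recipe (not from the slides; the data and the grid of candidate slopes are made up): pick, from a small hypothesis class 𝓗, the function with the lowest empirical loss on the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.1 * rng.normal(size=50)   # synthetic training data, true slope 2

# Hypothesis class H = {f_w(x) = w * x} for a grid of candidate slopes w
candidate_w = np.linspace(-5, 5, 101)

def empirical_loss(w):
    # (1/n) * sum_i (f_w(x_i) - y_i)^2, the empirical l2 loss
    return np.mean((w * x - y) ** 2)

best_w = min(candidate_w, key=empirical_loss)
print(best_w)  # close to 2
```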
Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class 𝓗 and loss function 𝑙
• Optimization: minimize the empirical loss
Wait…
• Why handcraft the feature vectors 𝑥, 𝑦?
• Can use prior knowledge to design suitable features
• Can the computer learn the features from the raw images?
• Learn features directly from the raw images: Representation Learning
• Deep Learning ⊆ Representation Learning ⊆ Machine Learning ⊆ Artificial Intelligence
Wait…
• Does MachineLearning-1-2-3 include all approaches?
• It includes many, but not all
• Our current focus will be MachineLearning-1-2-3
Example: Stock Market Prediction
Figure: stock prices over 2013–2016 for three companies (Orange, MacroHard, Ackermann). Disclaimer: synthetic data / in another parallel universe.
Sliding window over time: serves as input 𝑥; non-i.i.d.
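A minimal sketch (not from the slides) of the sliding-window construction: each input 𝑥 holds the previous few prices and the target 𝑦 is the next price. The window length and the synthetic series are illustrative.

```python
import numpy as np

def sliding_window(prices, window=5):
    X, y = [], []
    for t in range(window, len(prices)):
        X.append(prices[t - window:t])  # input x: the previous `window` prices
        y.append(prices[t])             # target y: the next price
    return np.array(X), np.array(y)

prices = np.cumsum(np.random.default_rng(0).normal(size=200))  # synthetic price series
X, y = sliding_window(prices, window=5)
print(X.shape, y.shape)  # (195, 5) (195,)
```

Note that consecutive windows overlap, so these (𝑥, 𝑦) pairs are not i.i.d.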
Linear regression
Real data: Prostate Cancer, by Stamey et al. (1989)
Figure borrowed from The Elements of Statistical Learning
𝑦: prostate specific antigen
(𝑥1, … , 𝑥8): clinical measures
Linear regression
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑓𝑤(𝑥) = 𝑤𝑇𝑥 (the hypothesis class 𝓗) that minimizes
$\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$
(𝑙2 loss; also called mean square error)
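For concreteness, a small sketch (not from the slides) that evaluates this objective, the mean square error of a linear predictor, on a tiny hand-made data set:

```python
import numpy as np

def mse(w, X, y):
    # (1/n) * sum_i (w^T x_i - y_i)^2, with the x_i as the rows of X
    return np.mean((X @ w - y) ** 2)

X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])
y = np.array([5.0, 2.0, 5.0])
print(mse(np.array([1.0, 2.0]), X, y))  # 0.0: w = (1, 2) fits these points exactly
```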
Linear regression: optimization
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑓𝑤(𝑥) = 𝑤𝑇𝑥 that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$
• Let 𝑋 be the matrix whose 𝑖-th row is 𝑥𝑖ᵀ, and 𝑦 be the vector (𝑦1, … , 𝑦𝑛)ᵀ. Then
$\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2 = \frac{1}{n} \| Xw - y \|_2^2$
Linear regression: optimization
• Set the gradient to 0 to get the minimizer:
$\nabla_w \hat{L}(f_w) = \nabla_w \frac{1}{n} \| Xw - y \|_2^2 = 0$
$\nabla_w [(Xw - y)^T (Xw - y)] = 0$
$\nabla_w [w^T X^T X w - 2 w^T X^T y + y^T y] = 0$
$2 X^T X w - 2 X^T y = 0$
$w = (X^T X)^{-1} X^T y$
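A quick numerical check of this closed form (a sketch, not from the slides; the synthetic data are made up). Solving the linear system 𝑋𝑇𝑋𝑤 = 𝑋𝑇𝑦 is preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)   # nearly linear targets

w_hat = np.linalg.solve(X.T @ X, X.T @ y)    # solve X^T X w = X^T y
print(w_hat)                                 # close to w_true
```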
Linear regression: optimization
• Algebraic view of the minimizer
• If 𝑋 is invertible, just solve 𝑋𝑤 = 𝑦 and get 𝑤 = 𝑋−1𝑦
• But typically 𝑋 is a tall matrix (more rows than columns), so 𝑋𝑤 = 𝑦 has no exact solution; multiply both sides by 𝑋𝑇 to get 𝑋𝑇𝑋𝑤 = 𝑋𝑇𝑦
• Normal equation: 𝑤 = (𝑋𝑇𝑋)−1𝑋𝑇𝑦
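When 𝑋 is tall, a least squares routine such as NumPy's lstsq computes the same minimizer without forming 𝑋𝑇𝑋 at all; a small sketch (synthetic data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))                 # tall matrix: 50 rows, 2 columns
y = X @ np.array([3.0, -1.0]) + 0.05 * rng.normal(size=50)

w_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_normal_eq, w_lstsq))     # True: both give the least squares solution
```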
Linear regression with bias
• Given training data {(𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛} i.i.d. from distribution 𝐷
• Find 𝑓𝑤,𝑏(𝑥) = 𝑤𝑇𝑥 + 𝑏 to minimize the loss (𝑏 is the bias term)
• Reduce to the case without bias:
• Let 𝑤′ = [𝑤; 𝑏] and 𝑥′ = [𝑥; 1]
• Then 𝑓𝑤,𝑏(𝑥) = 𝑤𝑇𝑥 + 𝑏 = (𝑤′)𝑇𝑥′
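A short sketch (not from the slides) of this reduction: append a constant 1 to every feature vector, solve as before, and read off 𝑤 and 𝑏 from 𝑤′.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.5, -0.5]) + 4.0 + 0.05 * rng.normal(size=80)  # true bias 4.0

X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # x' = [x; 1]
w_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)  # w' = [w; b]
w, b = w_aug[:-1], w_aug[-1]
print(w, b)  # close to [1.5, -0.5] and 4.0
```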