Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Linear Regression
Outline
Christof Monz
Data Mining - Week 1: Linear Regression
Plotting real-valued predictions
Linear regression
Error function
Linear Regression
Predicts real values (as opposed to discrete classes)
A simple machine learning prediction task
Assumes a linear relationship between the input data and the target values
Scatter Plots
[Figure: scatter plot of y against x]
Linear Regression
Find the line that approximates the data as closely as possible:
$\hat{y} = a + b \cdot x$
where $b$ is the slope and $a$ is the y-intercept
$a$ and $b$ should be chosen such that they minimize the difference between the predicted values and the values in the training data
Error Functions
There are a number of ways to define an error function:
Sum of absolute errors $= \sum_{i \in D} |y_i - (a + b x_i)|$
Sum of squared errors $= \sum_{i \in D} (y_i - (a + b x_i))^2$
where $D$ is the training data and $y_i$ is the true value of instance $i$
Squared error is the most commonly used
Task: Find the parameters $a$ and $b$ that minimize the squared error over the training data
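A minimal sketch of the two error functions above, written as plain Python over lists of $x_i$ and $y_i$. The function names and the four-point toy data set are hypothetical, chosen only for illustration. With the same residuals, the squared error punishes the single large residual of 2 more heavily than the absolute error does.

```python
def sum_absolute_errors(xs, ys, a, b):
    """Sum over the training data of |y_i - (a + b*x_i)|."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))


def sum_squared_errors(xs, ys, a, b):
    """Sum over the training data of (y_i - (a + b*x_i))^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the slides); residuals for a=1, b=2 are 0, -1, 2, 0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 4.0, 9.0, 9.0]
print(sum_absolute_errors(xs, ys, a=1.0, b=2.0))  # 3.0
print(sum_squared_errors(xs, ys, a=1.0, b=2.0))   # 5.0
```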
Error Functions
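Normalized error functions:
Mean squared error $= \frac{1}{|D|} \sum_{i \in D} (y_i - (a + b x_i))^2$
Relative squared error $= \frac{\sum_{i \in D} (y_i - (a + b x_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}$
where $\bar{y} = \frac{1}{|D|} \sum_{i \in D} y_i$
Root relative squared error $= \sqrt{\frac{\sum_{i \in D} (y_i - (a + b x_i))^2}{\sum_{i \in D} (y_i - \bar{y})^2}}$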
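The normalized error functions can be sketched as follows, reusing the hypothetical toy data from before. The denominator of the relative squared error is the squared error of the trivial predictor that always outputs $\bar{y}$, so a value below 1 means the line beats that baseline. The helper names are my own; the relative measures are undefined when all $y_i$ are equal.

```python
import math


def _sse(xs, ys, a, b):
    """Sum of squared errors of the line a + b*x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def mean_squared_error(xs, ys, a, b):
    """SSE divided by the number of training instances |D|."""
    return _sse(xs, ys, a, b) / len(xs)


def relative_squared_error(xs, ys, a, b):
    """SSE divided by the squared error of always predicting the mean of ys.

    Raises ZeroDivisionError if all ys are equal (the baseline error is 0).
    """
    y_mean = sum(ys) / len(ys)
    baseline = sum((y - y_mean) ** 2 for y in ys)
    return _sse(xs, ys, a, b) / baseline


def root_relative_squared_error(xs, ys, a, b):
    """Square root of the relative squared error."""
    return math.sqrt(relative_squared_error(xs, ys, a, b))


# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 4.0, 9.0, 9.0]
print(mean_squared_error(xs, ys, 1.0, 2.0))  # 1.25
print(relative_squared_error(xs, ys, 1.0, 2.0))
print(root_relative_squared_error(xs, ys, 1.0, 2.0))
```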
Minimizing Error Functions
There are roughly two ways:
• Try different parameter instantiations and see which ones lead to the lowest error (search)
• Solve mathematically (closed form)
Most parameter estimation problems in machine learning have no closed-form solution and can only be solved by searching
For linear regression, we can solve the problem mathematically
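To make the "search" option concrete, here is a minimal sketch that tries every pair $(a, b)$ from a fixed grid of candidates and keeps the pair with the lowest squared error. Grid search is only one simple search strategy, assumed here for illustration; the slides do not prescribe one. The grid and the toy data (which lie exactly on $y = 1 + 2x$) are hypothetical.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def grid_search(xs, ys, candidates_a, candidates_b):
    """Return the (a, b) pair from the candidate grids with the lowest SSE."""
    best_err, best_a, best_b = None, None, None
    for a in candidates_a:
        for b in candidates_b:
            err = sse(xs, ys, a, b)
            if best_err is None or err < best_err:
                best_err, best_a, best_b = err, a, b
    return best_a, best_b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
grid = [i * 0.5 for i in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
a, b = grid_search(xs, ys, grid, grid)
print(a, b)  # 1.0 2.0
```

The search only finds the best parameters that happen to lie on the grid; with real, noisy data the true minimum usually falls between grid points, which is one reason the closed-form solution is preferable when it exists.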
Minimizing SSE
$SSE = \sum_{i \in D} (y_i - (a + b x_i))^2$
Take the partial derivatives with respect to $a$ and $b$
Set both partial derivatives equal to zero and solve the resulting equations for $a$ and $b$
The resulting values for $a$ and $b$ minimize the sum of squared errors and can be used to predict unseen data instances
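Carrying out these steps gives the formulas used on the next slide. The sketch below writes $\bar{x} = \frac{1}{|D|} \sum_{i \in D} x_i$ by analogy with $\bar{y}$; the denominator of $b$ is positive as long as not all $x_i$ are equal.

```latex
\begin{aligned}
\frac{\partial\, SSE}{\partial a} &= -2 \sum_{i \in D} \bigl(y_i - a - b x_i\bigr) = 0
  &&\Rightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial\, SSE}{\partial b} &= -2 \sum_{i \in D} x_i \bigl(y_i - a - b x_i\bigr) = 0
  &&\Rightarrow\; \sum_{i \in D} x_i y_i = a \sum_{i \in D} x_i + b \sum_{i \in D} x_i^2 \\
|D| \sum_{i \in D} x_i y_i &= \Bigl(\sum_{i \in D} y_i - b \sum_{i \in D} x_i\Bigr) \sum_{i \in D} x_i + b\,|D| \sum_{i \in D} x_i^2
  &&\text{(substitute $a$, multiply by $|D|$)} \\
b &= \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}{|D| \sum_{i \in D} x_i^2 - \bigl(\sum_{i \in D} x_i\bigr)^2}
\end{aligned}
```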
Applying Linear Regression
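For a given training set, we first compute $b$:
$b = \frac{|D| \sum_{i \in D} x_i y_i - \sum_{i \in D} x_i \sum_{i \in D} y_i}{|D| \sum_{i \in D} x_i^2 - \left(\sum_{i \in D} x_i\right)^2}$
and then $a$, using the value computed for $b$:
$a = \bar{y} - b \bar{x}$
where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$ in the training set
For any new instance $x$ (i.e., an instance that was not in the training set), the predicted value is $\hat{y} = a + b x$
Can be extended to functions of multiple input variables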
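The two formulas translate directly into code. This is a minimal sketch in plain Python; `fit_linear_regression` and `predict` are hypothetical names, and the toy data lie exactly on $y = 1 + 2x$ so the recovered parameters are easy to check. It raises an error when all $x_i$ are identical, since the denominator of $b$ is then zero.

```python
def fit_linear_regression(xs, ys):
    """Closed-form least-squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom  # slope first
    a = sum_y / n - b * sum_x / n             # then a = mean(y) - b * mean(x)
    return a, b


def predict(a, b, x):
    """Prediction for a new instance x."""
    return a + b * x


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_linear_regression(xs, ys)
print(a, b)                  # 1.0 2.0
print(predict(a, b, 10.0))   # 21.0
```

Unlike the grid search, this needs only one pass over the data to accumulate four sums, and it returns the exact minimizer of the SSE.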
Linear Regression
Used to predict real-number values, given numerical input variables
Parameters can be estimated analytically (i.e., in closed form), which won't be the case for most of the parameter estimation algorithms we'll see later on
Can be extended to non-linear functions, e.g., log-linear regression
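The slides do not define log-linear regression; one common reading, assumed here, is the model $\ln y = a + b x$, i.e. $y = e^{a + b x}$. Under that assumption the same closed-form fit applies unchanged after taking the logarithm of every target value. The function names and the synthetic data are hypothetical.

```python
import math


def fit_linear_regression(xs, ys):
    """Closed-form least-squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def fit_log_linear(xs, ys):
    """Fit ln(y) = a + b*x by ordinary linear regression on the log-targets.

    Requires all y > 0, since the logarithm is undefined otherwise.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fit requires strictly positive targets")
    return fit_linear_regression(xs, [math.log(y) for y in ys])


def predict_log_linear(a, b, x):
    """Map the linear prediction back to the original scale: y = exp(a + b*x)."""
    return math.exp(a + b * x)


# Synthetic data with exponential growth: y = exp(0.5 + 0.3x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]
a, b = fit_log_linear(xs, ys)
print(round(a, 6), round(b, 6))  # 0.5 0.3
```

Note that this minimizes the squared error on the log scale, not on the original scale of $y$, so it is a convenient transformation rather than a least-squares fit of the exponential curve itself.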
Correlation
So far we have used linear regression to predict target values (prediction)
Linear regression can also be used to determine how closely two variables are correlated (description)
The smaller the relative error of the fitted line, the stronger the (linear) correlation between the variables
Correlation does indicate that there is some (possibly interesting) relation between the variables, but not necessarily a causal one
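The link between error and correlation can be made precise, although the slides do not state it: for a least-squares line with an intercept, the relative squared error equals $1 - r^2$, where $r$ is Pearson's correlation coefficient. The sketch below checks this numerically on hypothetical toy data; the helper names are my own.

```python
import math


def fit_linear_regression(xs, ys):
    """Closed-form least-squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def relative_squared_error(xs, ys, a, b):
    """SSE of the line divided by the squared error of predicting mean(y)."""
    y_mean = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)
    return sse / sst


def pearson_r(xs, ys):
    """Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 4.0, 9.0, 9.0]
a, b = fit_linear_regression(xs, ys)
r = pearson_r(xs, ys)
rse = relative_squared_error(xs, ys, a, b)
print(f"r = {r:.3f}, RSE = {rse:.3f}, 1 - r^2 = {1 - r * r:.3f}")
# r = 0.927, RSE = 0.140, 1 - r^2 = 0.140
```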
Recap
Linear regression
Error rates
Analytical parameter estimation
