Machine Learning & Linear Regression
Faculty of Computing and Information Technology
CPIS-703: Intelligent Information Systems and Decision Support
Department of Information Science
Machine Learning
- Grew out of work in AI
- New capability for computers
Examples:
- Database mining
Large datasets from growth of automation/web.
E.g., web click data, medical records, biology, engineering
- Applications we can’t program by hand.
E.g., autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), computer vision.
- Self-customizing programs
E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI).
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
- Classifying emails as spam or not spam.
- Watching you label emails as spam or not spam.
- The number (or fraction) of emails correctly classified as spam/not spam.
- None of the above: this is not a machine learning problem.
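One of the options above, the fraction of emails classified correctly, is a natural performance measure P for this setting. A minimal Python sketch of that measure follows; the function name `fraction_correct` and the toy label lists are illustrative assumptions, not part of the slides.

```python
def fraction_correct(predicted, actual):
    """Performance measure P: fraction of emails whose predicted label matches the user's label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled email")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical labels: 1 = spam, 0 = not spam.
user_labels = [1, 0, 0, 1, 0]      # what the user marked
filter_output = [1, 0, 1, 1, 0]    # what the filter predicted
print(fraction_correct(filter_output, user_labels))  # 4 of 5 labels match
```

If the filter learns, this number should rise as the program sees more of the user's labels.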
Machine learning algorithms:
- Supervised learning
- Unsupervised learning
Others: Reinforcement learning, recommender systems.
Also talk about: Practical advice for applying learning algorithms.
Machine learning Review
Supervised Learning
“right answers” given
[Figure: Housing price prediction. Price ($) in 1000’s (0 to 400) versus Size in feet² (0 to 2500)]
Regression: Predict continuous valued output (price)
Cancer (malignant, benign)
Classification: Discrete valued output (0 or 1)
[Figure: Malignant? (1 = Y, 0 = N) versus Tumor Size]
[Figure: Tumor Size versus Age]
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
…
You’re running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
- Treat both as classification problems.
- Treat problem 1 as a classification problem, problem 2 as a regression problem.
- Treat problem 1 as a regression problem, problem 2 as a classification problem.
- Treat both as regression problems.
Unsupervised Learning
[Figure: Supervised Learning versus Unsupervised Learning, each plotted on axes x1 and x2]
[Figure: Genes versus Individuals. Source: Daphne Koller]
- Organize computing clusters
- Social network analysis
- Astronomical data analysis (Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison))
- Market segmentation
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
- Given a database of customer data, automatically discover market segments and group customers into different market segments.
- Given email labeled as spam/not spam, learn a spam filter.
- Given a set of news articles found on the web, group them into sets of articles about the same story.
- Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
Supervised learning
• Notation
– Features x
– Targets y
– Predictions ŷ
– Parameters θ
• Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction
• Training data (examples): features are the input to the learner; target values provide the feedback
• Score performance (“cost function”)
• Learning algorithm: change θ to improve performance
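The feedback loop described here can be sketched in a few lines of Python. The one-parameter predictor ŷ = θ·x, the toy data, and the naive "try a step and keep it if the score improves" update are all illustrative assumptions chosen for brevity; the slides do not prescribe a particular learning algorithm at this point.

```python
def predict(theta, x):
    """The "learner": a procedure that uses the parameter theta to output a prediction."""
    return theta * x


def cost(theta, features, targets):
    """Score performance: mean squared difference between predictions and targets."""
    errors = [y - predict(theta, x) for x, y in zip(features, targets)]
    return sum(e * e for e in errors) / len(errors)


def learn(features, targets, theta=0.0, step=0.1, iterations=200):
    """Learning algorithm: change theta whenever the change improves the score."""
    for _ in range(iterations):
        current = cost(theta, features, targets)
        for candidate in (theta + step, theta - step):
            if cost(candidate, features, targets) < current:
                theta = candidate
                break
        else:
            step /= 2  # neither direction helped: try a smaller change next time
    return theta


# Hypothetical training data generated by y = 3x.
features = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 6.0, 9.0, 12.0]
theta = learn(features, targets)
print(theta, cost(theta, features, targets))
```

The three functions correspond to the three boxes in the diagram: the predictor, the cost function, and the learning algorithm that adjusts θ.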
Linear regression
• Define form of function f(x) explicitly
• Find a good f(x) within that family
[Figure: Target y versus Feature x]
“Predictor”:
Evaluate line: $r = \theta_0 + \theta_1 x$
return r
More dimensions?
[Figure: two panels plotting the target y against two features x1 and x2]
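Notation
Define “feature” x0 = 1 (constant)
Then $\hat{y}(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta \, x^T$, with $\theta = [\theta_0, \dots, \theta_n]$ and $x = [x_0, x_1, \dots, x_n]$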
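With the constant feature x0 = 1 in place, the predictor is just a dot product between the parameter vector and the feature vector. A minimal Python sketch, where the function name and the parameter values are assumptions made for illustration:

```python
def predict(theta, x):
    """Linear predictor: prepend the constant feature x0 = 1, then take the dot product with theta."""
    features = [1.0] + list(x)
    if len(features) != len(theta):
        raise ValueError("theta needs one entry per feature, including x0")
    return sum(t * f for t, f in zip(theta, features))


# Hypothetical parameters [theta0, theta1, theta2] and a point with two features.
theta = [20.0, 0.5, -0.25]
print(predict(theta, [10.0, 8.0]))  # 20 + 0.5*10 - 0.25*8
```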
Measuring error
[Figure: an Observation, the Prediction for it, and the Error or “residual” between them]
Mean squared error
• How can we quantify the error?
$J(\theta) = \frac{1}{m} \sum_{j=1}^{m} \left( y^{(j)} - \hat{y}(x^{(j)}) \right)^2$
• Could choose something else, of course…
– Computationally convenient (more later)
– Measures the variance of the residuals
– Corresponds to likelihood under Gaussian model of “noise”
MSE cost function
• Rewrite using matrix form
$J(\theta) = \frac{1}{m} (y^T - \theta X^T)(y^T - \theta X^T)^T$
(Matlab) >> e = y' - th*X'; J = e*e'/m;
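For readers without Matlab, the same computation can be sketched in plain Python. The layout mirrors the Matlab line above: each row of `X` is one example including the constant feature, and `theta` is a flat list. The data values are made up for illustration.

```python
def mse(theta, X, y):
    """Mean squared error J(theta) = (1/m) * sum of squared residuals.

    X is a list of m rows, each row holding the features of one example
    (including the constant x0 = 1); y holds the m targets.
    """
    m = len(y)
    residuals = [
        target - sum(t * f for t, f in zip(theta, row))  # e = y - theta . x
        for row, target in zip(X, y)
    ]
    return sum(e * e for e in residuals) / m             # J = e * e' / m


# Hypothetical data: three examples with features [x0, x1].
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 6.0]
print(mse([1.0, 2.0], X, y))  # residuals are 0, 0, 1
```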
Visualizing the cost function
[Figure: the cost J(θ) plotted as a function of θ1]
Finding good parameters
• Want to find parameters which minimize our error…
• Think of a cost “surface”: the error J(θ) obtained for each choice of θ…
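One way to picture the cost surface is to evaluate the cost on a grid of parameter values and look for the lowest point. The Python sketch below assumes a one-parameter model ŷ = θ1·x and made-up data, both chosen only to keep the example small.

```python
def cost(theta1, xs, ys):
    """Mean squared error of the line y = theta1 * x (no intercept, for simplicity)."""
    return sum((y - theta1 * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 1.5 * x.
xs = [1.0, 2.0, 3.0]
ys = [1.5, 3.0, 4.5]

# Evaluate J on a grid of theta1 values from -1.0 to 3.0 in steps of 0.25.
grid = [-1.0 + 0.25 * k for k in range(17)]
surface = [(theta1, cost(theta1, xs, ys)) for theta1 in grid]

best_theta1, best_cost = min(surface, key=lambda pair: pair[1])
print(best_theta1, best_cost)
```

A grid search like this only works for one or two parameters; it is shown here to make the shape of J(θ) tangible, not as a practical fitting method.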
Linear regression: direct minimization
MSE Minimum
• Consider a simple problem
– One feature, two data points
– Two unknowns: θ0, θ1
– Two equations:
$y^{(1)} = \theta_0 + \theta_1 x^{(1)}$
$y^{(2)} = \theta_0 + \theta_1 x^{(2)}$
• Can solve this system directly:
$y^T = \theta X^T \;\Rightarrow\; \theta = y^T (X^T)^{-1}$
• However, most of the time, m > n
– There may be no linear function that hits all the data exactly
– Instead, solve directly for minimum of MSE function
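For the two-point case, the system can be solved by hand with elimination and substitution. A small Python sketch, with a hypothetical function name and made-up points:

```python
def line_through(p1, p2):
    """Solve the two equations y = theta0 + theta1 * x for the two unknowns."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    theta1 = (y2 - y1) / (x2 - x1)   # subtract the equations to eliminate theta0
    theta0 = y1 - theta1 * x1        # substitute back into the first equation
    return theta0, theta1


print(line_through((1.0, 3.0), (3.0, 7.0)))  # hypothetical data points
```

With more than two points this exact solution generally no longer exists, which is why the slides move on to minimizing the squared error instead.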
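SSE (Sum of squared errors) Minimum
• Setting the gradient of the SSE to zero: $(y^T - \theta X^T) X = 0$
• Reordering, we have
$\theta X^T X = y^T X \;\Rightarrow\; \theta = y^T X (X^T X)^{-1}$
• $X (X^T X)^{-1}$ is called the “pseudo-inverse” of $X^T$
• If $X^T$ is square and its columns are linearly independent, this is the inverse $(X^T)^{-1}$
• If m > n: overdetermined; gives minimum MSE fit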
Matlab SSE
• This is easy to solve in Matlab…
% y = [y1 ; … ; ym]                      (m-by-1 column of targets)
% X = [x1_0 … x1_n ; x2_0 … x2_n ; …]   (one row per data point)
% Solution 1: "manual"
th = y' * X * inv(X' * X);
% Solution 2: "mrdivide" (matrix right divide)
th = y' / X';    % th*X' = y'  =>  th = y'/X'
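The "manual" solution can also be written in plain Python without a linear-algebra library. The sketch below forms the normal equations (X'X)θ' = X'y, which is the transposed form of the Matlab expression, and solves them by Gaussian elimination. The helper names and the data are assumptions for illustration; a numerical library would normally be used in practice.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: features are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - known) / M[i][i]
    return z


def fit_least_squares(X, y):
    """Return theta minimizing the sum of squared errors, via (X'X) theta' = X'y."""
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)


# Hypothetical overdetermined problem: four points, two parameters.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # each row is [x0, x1]
y = [1.0, 3.0, 5.0, 7.0]                              # lies exactly on y = 1 + 2x
print(fit_least_squares(X, y))
```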
Effects of MSE choice
• Sensitivity to outliers
– 16² cost for this one datum
– Heavy penalty for large errors
[Figure: plot illustrating the effect of an outlier on the MSE fit]
L1 error (minimum absolute error)
[Figure: fitted lines compared. Legend: L2, original data; L1, original data; L1, outlier data]
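The contrast between squared and absolute error is easiest to see on the simplest possible model, a constant prediction. It is a standard result that the mean minimizes the sum of squared errors and the median minimizes the sum of absolute errors. The Python sketch below uses made-up numbers (not the data in the figure) to show how a single outlier moves each one.

```python
from statistics import mean, median

original = [2.0, 3.0, 4.0, 5.0, 6.0]
with_outlier = [2.0, 3.0, 4.0, 5.0, 22.0]   # last value shifted by 16

# Best constant under squared error (L2) is the mean;
# best constant under absolute error (L1) is the median.
print("L2:", mean(original), "->", mean(with_outlier))
print("L1:", median(original), "->", median(with_outlier))
```

The L2 answer is pulled toward the outlier while the L1 answer stays put, which is the same behavior the legend above describes for fitted lines.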
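Cost functions for regression
• (MSE) $J(\theta) = \frac{1}{m} \sum_j \left( y^{(j)} - \hat{y}(x^{(j)}) \right)^2$
• (MAE) $J(\theta) = \frac{1}{m} \sum_j \left| y^{(j)} - \hat{y}(x^{(j)}) \right|$
• (???) Something else entirely…
“Arbitrary” functions can’t be solved in closed form…
- use gradient descent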
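Gradient descent replaces the closed-form solve with repeated small steps against the gradient of the cost. The Python sketch below applies it to the MSE of a single-feature line, only because that case is easy to check against the known answer; the learning rate, iteration count, and data are assumptions picked so that this toy problem converges.

```python
def gradient_descent(xs, ys, learning_rate=0.1, iterations=2000):
    """Minimize the MSE of y = theta0 + theta1 * x by repeated gradient steps."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        residuals = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1/m) * sum(residual^2).
        grad0 = -2.0 / m * sum(residuals)
        grad1 = -2.0 / m * sum(r * x for r, x in zip(residuals, xs))
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(xs, ys))  # approaches (1, 2)
```

For a different cost function only the two gradient lines change; the loop itself stays the same.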
Linear regression: nonlinear features
Nonlinear functions
• What if our hypotheses are not lines?
– Ex: higher-order polynomials
[Figure: two panels showing fits to the same data. Panel titles: Order 1 polynomial; Order 3 polynomial]
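Nonlinear functions
• Single feature x, predict target y, for example with a cubic:
$\hat{y}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
• Sometimes useful to think of “feature transform”
Add features: $\Phi(x) = [1, x, x^2, x^3]$
Linear regression in new features: $\hat{y}(x) = \theta \, \Phi(x)^T$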
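The feature transform idea can be sketched in Python as a function that expands one input into a list of powers, followed by the usual linear predictor. The function names and the parameter values below are illustrative assumptions.

```python
def polynomial_features(x, order):
    """Feature transform: map a scalar x to [1, x, x^2, ..., x^order]."""
    return [x ** k for k in range(order + 1)]


def predict(theta, x):
    """Prediction is an ordinary linear model applied to the transformed features."""
    phi = polynomial_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, phi))


print(polynomial_features(2.0, 3))           # [1.0, 2.0, 4.0, 8.0]
print(predict([1.0, 0.0, -1.0, 0.5], 2.0))   # 1 + 0 - 4 + 4 (hypothetical theta)
```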
Higher-order polynomials
• Fit in the same way
• More “features”
[Figure: three panels showing fits to the same data. Panel titles: Order 1 polynomial; Order 2 polynomial; Order 3 polynomial]
Features
• In general, can use any features we think are useful
• Other information about the problem
– Sq. footage, location, age, …
• Polynomial functions
– Features [1, x, x², x³, …]
• Other functions
– 1/x, sqrt(x), x1 * x2, …
• “Linear regression” = linear in the parameters
– Features we can make as complex as we want!
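"Linear in the parameters" can be checked directly: however nonlinear the features are in the inputs, scaling every parameter scales the prediction by the same factor. A Python sketch using the example functions listed above, with made-up parameter values and a hypothetical `features` helper:

```python
import math


def features(x1, x2):
    """Hypothetical hand-picked features: constant, 1/x1, sqrt(x2), and the product x1*x2."""
    return [1.0, 1.0 / x1, math.sqrt(x2), x1 * x2]


def predict(theta, x1, x2):
    """Nonlinear in the inputs, but still a weighted sum of the parameters."""
    return sum(t * f for t, f in zip(theta, features(x1, x2)))


theta = [1.0, 2.0, 3.0, 0.5]
doubled = [2.0 * t for t in theta]

# Doubling every parameter doubles the prediction: the model is linear in theta.
print(predict(theta, 4.0, 9.0), predict(doubled, 4.0, 9.0))
```

Because of this, the same least-squares machinery applies no matter which features are chosen.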
Higher-order polynomials
• Are more features better?
• “Nested” hypotheses
– 2nd order more general than 1st,
– 3rd order more general than 2nd, …
• Fits the observed data better
Overfitting and complexity
• More complex models will always fit the training data at least as well, and usually better
• But they may “overfit” the training data, learning complex relationships that are not really present
[Figure: two panels plotting Y against X. Panel titles: Complex model; Simple model]
Test data
• After training the model
• Go out and get more data from the world
– New observations (x,y)
• How well does our model perform?
[Figure: fitted model with two sets of points. Legend: Training data; New, “test” data]
Training versus test error
• Plot MSE as a function of model complexity
– Polynomial order
• Decreases
– More complex function fits training data better
• What about new data?
• 0th to 1st order
– Error decreases
– Underfitting
• Higher order
– Error increases
– Overfitting
[Figure: mean squared error versus polynomial order. Legend: Training data; New, “test” data]
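The experiment behind this plot can be sketched in plain Python: fit polynomials of increasing order to a training set, then score each fit on both the training set and a separate test set. All data values below are made up for illustration, and the small normal-equation solver is a stand-in for a proper numerical library. Training error cannot rise as the order grows because the hypotheses are nested; what the test error does depends on the data.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit via the normal equations."""
    rows = [[x ** k for k in range(order + 1)] for x in xs]
    n = order + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(XtX, Xty)


def mse(theta, xs, ys):
    """Mean squared error of the polynomial with coefficients theta."""
    preds = [sum(t * x ** k for k, t in enumerate(theta)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


# Hypothetical data: roughly linear with hand-picked "noise"; not the slide's data.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_y = [1.1, 1.8, 3.2, 3.9, 5.3, 5.8, 7.1]
test_x = [0.25, 0.75, 1.25, 1.75, 2.25, 2.75]
test_y = [1.6, 2.4, 3.6, 4.4, 5.6, 6.4]

train_errors, test_errors = [], []
for order in range(4):
    theta = fit_polynomial(train_x, train_y, order)
    train_errors.append(mse(theta, train_x, train_y))
    test_errors.append(mse(theta, test_x, test_y))
    print(order, round(train_errors[-1], 4), round(test_errors[-1], 4))
```

Plotting the two error lists against the order gives the training and test curves described above.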

06-01 Machine Learning and Linear Regression.pptx

  • 1. Machine Learning & Linear Regression Faculty of Computing and Information Technology CPIS-703: Intelligent Information Systems and Decision Support Department of Information Science
  • 2. Machine Learning - Grew out of work in AI - New capability for computers Examples: - Database mining Large datasets from growth of automation/web. E.g., Web click data, medical records, biology, engineering - Applications can’t program by hand. E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
  • 3. Machine Learning - Grew out of work in AI - New capability for computers Examples: - Database mining Large datasets from growth of automation/web. E.g., Web click data, medical records, biology, engineering - Applications can’t program by hand. E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision. - Self-customizing programs E.g., Amazon, Netflix product recommendations - Understanding human learning (brain, real AI).
  • 4. • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed. • Tom Mitchell (1998) Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Machine Learning definition
  • 5. Classifying emails as spam or not spam. Watching you label emails as spam or not spam. The number (or fraction) of emails correctly classified as spam/not spam. None of the above—this is not a machine learning problem. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
  • 6. Machine learning algorithms: - Supervised learning - Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 17. 0 100 200 300 400 0 500 1000 1500 2000 2500 Housing price prediction Price ($) in 1000’s Size in feet2 Regression: Predict continuous valued output (price) Supervised Learning “right answers” given
  • 18. Cancer (malignant, benign) Classification Discrete valued output (0 or 1) Malignant? 1(Y) 0(N) Tumor Size Tumor Size
  • 19. Tumor Size Age - Clump Thickness - Uniformity of Cell Size - Uniformity of Cell Shape …
  • 20. Treat both as classification problems. Treat problem 1 as a classification problem, problem 2 as a regression problem. Treat problem 1 as a regression problem, problem 2 as a classification problem. Treat both as regression problems. You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months. Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems?
  • 24.
  • 25.
  • 28. Organize computing clusters Social network analysis Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison) Astronomical data analysis Market segmentation
  • 29. Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.) Given a database of customer data, automatically discover market segments and group customers into different market segments. Given email labeled as spam/not spam, learn a spam filter. Given a set of news articles found on the web, group them into set of articles about the same story. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
  • 30. Supervised learning • Notation – Features x – Targets y – Predictions ŷ – Parameters q Program (“Learner”) Characterized by some “parameters” θ Procedure (using θ) that outputs a prediction Training data (examples) Features Learning algorithm Change θ Improve performance Feedback / Target values Score performance (“cost function”)
  • 31. Linear regression • Define form of function f(x) explicitly • Find a good f(x) within that family 0 10 20 0 20 40 Target y Feature x “Predictor”: Evaluate line: return r
  • 33. Notation Define “feature” x0 = 1 (constant) Then
  • 34. Measuring error 0 20 0 Error or “residual” Prediction Observation
  • 35. Mean squared error • How can we quantify the error? • Could choose something else, of course… – Computationally convenient (more later) – Measures the variance of the residuals – Corresponds to likelihood under Gaussian model of “noise”
  • 36. MSE cost function • Rewrite using matrix form (Matlab) >> e = y’ – th*X’; J = e*e’/m;
  • 37. Visualizing the cost function (Figure: two plots; the recoverable axis labels are θ1 and J(θ).)
  • 38. Supervised learning • Notation – Features x – Targets y – Predictions ŷ – Parameters θ • Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction • Training data (examples): features, plus feedback / target values • Score performance (“cost function”) • Learning algorithm: change θ to improve performance
  • 39. Finding good parameters • Want to find parameters which minimize our error… • Think of a cost “surface”: error residual for that θ…
  • 40. Linear regression: direct minimization
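  • 41. MSE Minimum • Consider a simple problem – One feature, two data points – Two unknowns: $\theta_0, \theta_1$ – Two equations: $y^{(1)} = \theta_0 + \theta_1 x^{(1)}$ and $y^{(2)} = \theta_0 + \theta_1 x^{(2)}$ • Can solve this system directly: $y^T = \theta X^T \Rightarrow \theta = y^T (X^T)^{-1}$ • However, most of the time, m > n – There may be no linear function that hits all the data exactly – Instead, solve directly for minimum of MSE function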
  • 42. SSE (Sum of squared errors) Minimum • Reordering, we have $\theta = y^T X (X^T X)^{-1}$ • $X (X^T X)^{-1}$ is called the “pseudo-inverse” of $X^T$ • If $X^T$ is square and its rows are linearly independent, this is the ordinary inverse $(X^T)^{-1}$ • If m > n: overdetermined; gives minimum MSE fit
  • 43. Matlab SSE • This is easy to solve in Matlab…
      % y = [y1 ; … ; ym]                      (m-by-1 column of targets)
      % X = [x1_0 … x1_n ; x2_0 … x2_n ; …]    (one row per data point)
      % Solution 1: "manual"
      th = y' * X * inv(X' * X);
      % Solution 2: "mrdivide"
      th = y' / X';    % th*X' = y' => th = y'/X' ("matrix-right-divide")
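As a rough companion to the Matlab one-liners, the closed-form solution can be written out by hand for the single-feature case, where $X^T X$ is only 2-by-2 and can be inverted explicitly. The function name `fit_line` and the data are assumptions for this sketch; for more features a linear-algebra library would be the usual choice.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = theta_0 + theta_1 * x via the normal equations.

    With rows [1, x], X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy],
    so the 2x2 system can be solved with the explicit inverse.
    """
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the fit is not unique")
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


# Made-up data lying exactly on y = 1 + 2x
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```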
  • 44. Effects of MSE choice • Sensitivity to outliers – $16^2$ cost for this one datum – Heavy penalty for large errors (Figure: fit on data containing an outlier.)
  • 45. L1 error (minimum absolute error) (Figure: fits labeled “L2, original data”, “L1, original data”, and “L1, outlier data”.)
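The difference in outlier sensitivity is easiest to see with the simplest possible model, a single constant prediction. This is a simplification of the line fits on the slides, used here only as an illustration: the constant that minimizes squared error is the mean, and a constant that minimizes absolute error is the median. The data values are made up.

```python
from statistics import mean, median

# Made-up targets; the second list replaces one value with an outlier
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]

# Best constant under squared error (L2) is the mean
l2_clean, l2_outlier = mean(clean), mean(with_outlier)

# Best constant under absolute error (L1) is the median
l1_clean, l1_outlier = median(clean), median(with_outlier)

print(l2_clean, l2_outlier)  # 3.0 12.0 : the L2 answer is dragged toward the outlier
print(l1_clean, l1_outlier)  # 3.0 3.0  : the L1 answer does not move
```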
  • 46. Cost functions for regression • (MSE) $\ell_2: (y - \hat{y})^2$ • (MAE) $\ell_1: |y - \hat{y}|$ • Something else entirely… (???) • “Arbitrary” functions can’t be solved in closed form… – use gradient descent
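Gradient descent is only named on this slide, so the following is a minimal sketch under stated assumptions rather than the course's implementation: batch gradient descent on the MSE cost with a fixed step size. The function name, step size, iteration count, and data are all illustrative choices.

```python
def gradient_descent_mse(X, y, step=0.1, iters=2000):
    """Minimize the MSE of a linear predictor by batch gradient descent.

    X: feature rows, each starting with the constant feature 1
    y: targets
    The gradient of J(theta) = (1/m) * sum(e_j^2) with e_j = y_j - theta . x_j
    is dJ/dtheta_i = -(2/m) * sum(e_j * x_ji).
    """
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        residuals = [
            yj - sum(t * f for t, f in zip(theta, row))
            for row, yj in zip(X, y)
        ]
        grad = [
            -2.0 / m * sum(e * row[i] for e, row in zip(residuals, X))
            for i in range(n)
        ]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


# Made-up data lying exactly on y = 1 + 2x
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent_mse(X, y)
print(theta)  # approaches [1.0, 2.0]
```

The step size matters: too large a value makes the iterates diverge, and the safe range depends on the scale of the features. For MSE the closed-form solution is usually preferable; the same loop applies to other differentiable costs once the gradient line is swapped out.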
  • 48. Nonlinear functions • What if our hypotheses are not lines? – Ex: higher-order polynomials (Figure: order 1 polynomial fit and order 3 polynomial fit.)
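  • 49. Nonlinear functions • Single feature x, predict target y: $\hat{y}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ • Sometimes useful to think of “feature transform” – Add features: $\Phi(x) = [1, x, x^2, x^3]$ – Linear regression in new features: $\hat{y}(x) = \theta \, \Phi(x)^T$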
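A feature transform of this kind takes only a few lines. The names `poly_features` and `predict_poly`, and the parameter values, are assumptions for this sketch.

```python
def poly_features(x, order):
    """Map a scalar x to the feature vector [1, x, x^2, ..., x^order]."""
    return [x ** k for k in range(order + 1)]


def predict_poly(theta, x):
    """Prediction that is linear in theta but polynomial in x."""
    features = poly_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, features))


print(poly_features(2.0, 3))                      # [1.0, 2.0, 4.0, 8.0]
print(predict_poly([1.0, 0.0, 0.5, 0.0], 2.0))    # 1 + 0.5 * 4 = 3.0
```

Because the prediction is still a dot product between parameters and features, the same least-squares machinery used for lines applies unchanged to the transformed features.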
  • 50. Higher-order polynomials • Fit in the same way • More “features” (Figure: order 1, order 2, and order 3 polynomial fits.)
  • 51. Features • In general, can use any features we think are useful • Other information about the problem – Sq. footage, location, age, … • Polynomial functions – Features $[1, x, x^2, x^3, \dots]$ • Other functions – $1/x$, $\sqrt{x}$, $x_1 \cdot x_2$, … • “Linear regression” = linear in the parameters – Features we can make as complex as we want!
  • 52. Higher-order polynomials • Are more features better? • “Nested” hypotheses – 2nd order more general than 1st, – 3rd order more general than 2nd, … • Fits the observed data better
  • 53. Overfitting and complexity • More complex models will always fit the training data at least as well • But they may “overfit” the training data, learning complex relationships that are not really present (Figure: Y versus X for a complex model and for a simple model.)
  • 54. Test data • After training the model • Go out and get more data from the world – New observations (x,y) • How well does our model perform? (Figure: training data alongside new, “test” data.)
  • 55. Training versus test error • Plot MSE as a function of model complexity – Polynomial order • Training data: error decreases – More complex function fits training data better • What about new data? • New, “test” data: – 0th to 1st order: error decreases (underfitting) – Higher order: error increases (overfitting) (Figure: mean squared error versus polynomial order, for training data and new, “test” data.)
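The experiment described on this slide can be sketched as follows. Everything here is an assumption made for illustration: the data are invented, the helper names are hypothetical, and the fit solves the normal equations with a small Gaussian-elimination routine, which is adequate only for low polynomial orders.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def fit_poly(xs, ys, order):
    """Least-squares polynomial fit via the normal equations (X^T X) theta = X^T y."""
    Phi = [[x ** k for k in range(order + 1)] for x in xs]
    n = order + 1
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(n)]
    return solve(A, b)


def poly_mse(theta, xs, ys):
    """Mean squared error of the polynomial with coefficients theta."""
    preds = [sum(t * x ** k for k, t in enumerate(theta)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Invented data: roughly linear training targets with small perturbations
x_train = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y_train = [0.1, 0.5, 0.7, 1.3, 1.5, 2.1]
x_test = [0.1, 0.3, 0.5, 0.7, 0.9]
y_test = [0.2, 0.6, 1.0, 1.4, 1.8]

train_err, test_err = [], []
for order in range(4):
    theta = fit_poly(x_train, y_train, order)
    train_err.append(poly_mse(theta, x_train, y_train))
    test_err.append(poly_mse(theta, x_test, y_test))
    print(order, train_err[-1], test_err[-1])
```

Because the polynomial families are nested, the training error cannot increase with order (up to rounding). The test error carries no such guarantee, which is the reason for evaluating on data the model has not seen.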