Linear
Regression
This course content is being actively
developed by Delta Analytics, a 501(c)3
Bay Area nonprofit that aims to
empower communities to leverage their
data for good.
Please reach out with any questions or
feedback to inquiry@deltanalytics.org.
Find out more about our mission here.
Delta Analytics builds technical capacity
around the world.
Module 3:
Linear Regression
Let’s do a quick
review of module 2!
How does a model learn from raw data?
All models have the following components:
Task: What is the problem we want our model to solve?
Performance Measure: Quantitative measure we use to evaluate the model’s performance.
Learning Methodology: ML algorithms can be supervised or unsupervised. This determines the learning methodology.
Source: Deep Learning Book - Chapter 5: Introduction to Machine Learning
Classification task:
Exercise 1

Current Annual Income ($) | Outstanding Debt ($) | Approval for credit card (Y) | Predicted approval for credit card (Y*)
12,000 | 200 | Yes | No
60,000 | 60,000 | No | No
11,000 | 0 | No | Yes
200,000 | 10,000 | Yes | Yes

What are the explanatory features? What is the outcome feature?
Let’s fill in the blanks for our credit approval example:
Task: ?
Performance Measure: ?
Learning Experience: ?
Let’s fill in the blanks for our credit approval example:
Task: Should this customer be approved for a credit card?
Performance Measure: Log Loss
Learning Experience: Supervised
Regression task:
Exercise 2

Time spent a week studying machine learning (X) | Accuracy of classification model built by student (Y) | Predicted accuracy of classification model (Y*)
10 | 30% | 90%
2 | 60% | 30%
12 | 26% | 95%
0 | 88% | 50%

What are the explanatory features? What is the outcome feature?
Let’s fill in the blanks for our studying example:
Task: ?
Performance Measure: ?
Learning Experience: ?
Let’s fill in the blanks for our studying example:
Task: How accurate is this student’s classification model?
Performance Measure: MSE
Learning Experience: Supervised
Module 3:
Linear Regression
Course overview:
✓ Module 1: Introduction to Machine Learning
✓ Module 2: Machine Learning Deep Dive
✓ Module 3: Linear Regression
❏ Module 4: Decision Trees
❏ Module 5: Ensemble Algorithms
❏ Module 6: Unsupervised Learning Algorithms
❏ Module 7: Natural Language Processing Part 1
❏ Module 8: Natural Language Processing Part 2
Now let’s turn to the data we will be using...
❏ Linear regression
❏ Relationship between two variables (x and y)
❏ Formalizing f(x)
❏ Correlation between two variables
❏ Assumptions
❏ Feature engineering and selection
❏ Learning process: Loss function and Mean Squared Error
❏ Univariate regression, Multivariate regression
❏ Measures of performance (R2, Adjusted R2, MSE)
❏ Overfitting, Underfitting
Module Checklist
What is linear regression?
Linear regression is a model that explains the relationship
between explanatory features and an outcome feature as a
line in two dimensional space.
Linear regression has been in use since
the 19th century & is one of the most
important machine learning tools
available to researchers.
Many other models are built upon the
logic of linear models. For example, the
most simple form of deep learning model,
with no hidden layers, is a linear model!
Why is linear regression important?
Source: History of Linear Regression - Introduction to Linear Regression
Modeling phase: Task, Learning Methodology, Performance
Linear regression: model cheat sheet

Pros:
● Very popular model with intuitive, easy to understand results.
● Natural extension of correlation analysis

Cons:
● Sensitive to outliers
● The world is not always linear; we often want to model more complex relationships
● Does not allow us to model interactions between explanatory features (we will be able to do this using a decision tree)

Assumptions*:
● Linear relationship between x and y
● Normal distribution of variables
● No multicollinearity (independent variables)
● Homoscedasticity
● Rule of thumb: at least 20 observations per independent variable in the analysis

* We will explain each of these assumptions in this module!
Today we are looking closer at each component of the framework we discussed in the last class.
Task: What is the problem we want our model to solve?
Defining f(x): What is f(x) for a linear model?
Feature engineering & selection: What is x? How do we decide what explanatory features to include in our model?
Is our f(x) correct for this problem?: What assumptions does a linear regression make about the data? Do we have to transform the data?
Learning methodology is how the linear regression model learns which line
best fits the raw data.
Learning Methodology: Linear models are supervised; how does that affect the learning process?
How does our ML model learn?: Overview of how the model teaches itself.
What is our loss function?: Every supervised model has a loss function it wants to minimize.
Optimization process: How does the model minimize the loss function?
Performance: Quantitative measure we use to evaluate the model’s performance.
Measures of performance: R2, Adjusted R2, MSE
Feature performance: Statistical significance, p-values
Ability to generalize to unseen data: Overfitting, underfitting, bias, variance
(There are linear models for both regression and classification problems.)
The task
Task: What is the problem we want our model to solve?
Defining f(x): What is f(x) for a linear model?
Feature engineering & selection: What is x? How do we decide what explanatory features to include in our model?
Is our f(x) correct for this problem?: What assumptions does a linear regression make about the data? Do we have to transform the data?
Task
Recall our discussion of two types of supervised tasks,
regression & classification. There are linear models for
both. Today we will only discuss regression.
Regression (continuous outcome variable): Ordinary Least Squares (OLS)
regression. OLS is a linear regression method based on minimizing the
sum of squared residuals.
Classification (categorical outcome variable): Logistic regression. We will
not cover this in the lecture, but we will provide resources for further study.
We have plotted the
total $ loaned by
KIVA in Kenya each
year. What is the
trend? How would
you summarize this
trend in one
sentence?
OLS
Regression
Task
An OLS regression is a trend line that predicts how much
Y will change for a given change in x. Let’s try to
break that down a little more by looking at an example.
A linear regression formalizes our perceived
trend as a relationship between x and Y. It
can be understood intuitively as a trend line.
Human Intuition: “Every year, the value
of loans in KE appears to be increasing.”
OLS Regression: Every additional year
corresponds to some fixed number of
additional dollars of Kiva loans in KE.
A linear model expresses the relationship between our
explanatory features and our outcome features as a
straight line. The output of f(x) for a linear model will
always be a line.
A linear model allows us
to predict beyond our
set of observations,
because for every point
on the x-axis we can find
the corresponding Y*.
For example, we can now predict
the $ amount of loans for an x
that is not in our scatterplot
(May 2012).
Defining f(x)
Is linear regression
the right model for
our data?
A big part of being a machine learning researcher involves choosing the right
model for the task.
Each model makes certain assumptions about the underlying data.
Let’s take a closer look at how this works for linear regression.
OLS
Regression
Task
Is our f(x)
correct for
this problem?
OLS
Regression
Task
Before we choose a linear model, we need
to make sure all the assumptions hold true
in our data.
Is our f(x)
correct for
this problem?
❏ Linear relationship between x and Y
❏ Normal distribution of variables
❏ No autocorrelation (Independent variables)
❏ Homoscedasticity
❏ No multicollinearity
❏ Rule of thumb: at least 20 observations per independent variable in the analysis
OLS Linear Regression Assumptions
OLS
Regression
Task
Is there a linear relationship
between x and Y?
Is our f(x)
correct for
this problem?
Source: Linear and non-linear relationships
Linear regression assumes that there is a linear
relationship between x and Y.
If this is not true, our trend line will do a poor
job of predicting Y.
Source: Statistics - Normal distribution
OLS
Regression
Task
Is our data normally distributed?
Is our f(x)
correct for
this problem?
Normal distribution of explanatory and
outcome features avoids distortion of
results due to outliers or skewed data.
OLS
Regression
Task
Multicollinearity occurs when
explanatory variables are highly
correlated. Do we have multicollinearity?
Is our f(x)
correct for
this problem?
Multicollinearity introduces redundancy to the model and reduces our certainty in the results.
We want no multicollinearity in our model.
Province
Country
Province and country are highly
correlated. We will want to
include only one of these
variables in our model.
Age
Loan
Amount
Age and loan amount appear to
have no multicollinearity. We can
include both in our model.
OLS
Regression
Task
Do we have homoscedasticity?
Is our f(x)
correct for
this problem?
Homoscedasticity means the error term, or
“noise,” has the same variance across all
values of the outcome variable. If
homoscedasticity does not hold, cases
with a greater error term will have an
outsized influence on the regression.
Data must be homoscedastic, meaning the
errors are evenly spread at all
outcome variable values.
Residual plots: Good | Not good | Not good
Source: Homoscedasticity
OLS
Regression
Task
Do we have autocorrelation?
Is our f(x)
correct for
this problem?
Autocorrelation is correlation between
values of a variable and a delayed copy of itself.
For example, a stock’s price today is
correlated with yesterday’s price.
Autocorrelation commonly occurs when
you work with time series.
OLS
Regression
Task
In the coding lab, we will go over
code that will help you determine
whether a linear model provides the
best f(x).
Is our f(x)
correct for
this problem?
✓ Linear relationship
between x and Y
✓ Normal distribution of
variables
✓ No autocorrelation
(Independent variables)
✓ Homoscedasticity
✓ Rule of thumb: at least 20
observations per
independent variable in
the analysis
OLS Assumptions
We have signed off on all of our
assumptions, which means we can
confidently choose a linear OLS
model for this task.
Let’s start building our model!
OLS
Regression
Task
Question! What happens if you don’t
have all the assumptions?
Is our f(x)
correct for
this problem?
✓ Linear relationship
between x and Y
✓ Normal distribution of
variables
✓ No autocorrelation
(Independent variables)
✓ Homoscedasticity
✓ Rule of thumb: at least 20
observations per
independent variable in
the analysis
OLS Assumptions
If these assumptions do not hold true, our trend line will
not be accurate. We have a few options:
1) Transform our data to fulfill the assumptions
2) Choose a different model to capture the
relationship between x and Y
Yes! Linear
regression is an
appropriate model
choice. Now what?
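As a preview of the coding lab, here is a minimal sketch (ours, with made-up data and column names) of two of these checks in Python: a correlation matrix for multicollinearity and the Durbin-Watson statistic for autocorrelation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Made-up stand-in data; in the lab we would use the Kiva loan data.
rng = np.random.RandomState(0)
df = pd.DataFrame({"age": rng.uniform(18, 70, 100),
                   "loan_amount": rng.uniform(100, 1000, 100)})
df["y"] = 2.0 * df["age"] + 0.01 * df["loan_amount"] + rng.normal(size=100)

# Multicollinearity: inspect pairwise correlations between explanatory features.
print(df[["age", "loan_amount"]].corr())

# Autocorrelation: Durbin-Watson on the residuals (values near 2 suggest none).
X = sm.add_constant(df[["age", "loan_amount"]])
residuals = sm.OLS(df["y"], X).fit().resid
print(durbin_watson(residuals))
```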
True Y
Y*=f(x)+e
OLS
Regression
Task
Defining f(x)
What is f(x) for a linear regression model?
Remember that all models involve
a function f(x) that maps an input x
to a predicted Y (Y*). The goal of
the function is to have a model
that predicts Y* as close to true Y
as possible.
f(x) for a linear model is:
Y*=a + bx + e
Just like in algebra!
What does Y*=a + bx + e actually mean?
Y*: The predicted Y, the output of our function
a: The y-intercept of the regression line
b: The slope or gradient of the regression line. This determines how steep the line is and its directionality
e: The irreducible error term, error our model cannot reduce
a and b are the parameters.
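To make the formula concrete, here is a minimal sketch (not from the deck) of f(x) as a Python function. The values of a and b are made up; e, the irreducible error, is not something the model can compute, so it does not appear in the prediction.

```python
# A minimal sketch of f(x) for a univariate linear model.
# The parameter values are made up for illustration; e (irreducible
# error) is not part of the prediction, so only a and b appear here.

def f(x, a=2.0, b=0.5):
    """Return the predicted Y* = a + b*x."""
    return a + b * x

print(f(10))  # Y* = 2.0 + 0.5 * 10 = 7.0
```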
How does
our OLS
model learn?
There are an infinite number of possible
lines in a two dimensional space. How
does our model choose the best one?
Learning
Methodology
Linear regression is a learning
algorithm. That is what makes it a
machine learning model.
It learns to find the best trend line
from an infinite number of
possibilities. How?
First, we must understand what
levers we control.
Y*=a + bx + e
Parameters: values that control the
behavior of the model and are learnt
through experience.
Hyperparameters: higher level settings
of a model that are fixed before
training begins; they are set beforehand
and not trained using data (no need to
think too much about this now).
In each model there are also some things
we cannot control, like e (irreducible
error).
How does
our OLS
model learn?
Parameters of a model are values that the
model itself can adjust. The parameters in
an OLS model are a (intercept) and b (slope).
Learning
Methodology
Our model can move a &
b but not e.
How does
our OLS
model learn?
a and b are the only two levers our
simple OLS model can change to get Y*
closer to Y.
Learning
Methodology
Y*=a + bx + e
How do I decide in
what direction to
change a and b?
Changing a shifts our line up or
down the y-axis; changing b changes
the steepness and direction of our line.
How does
our OLS
model learn?
We can change a and b to move our line
in space. Below is some intuition about
how changing a and b affects f(x).
Learning
Methodology
Y*=a + bx + e
a and b are our
two parameters.
Changing a shifts our line up or down
the y-axis. Negative a moves the
y-intercept below 0.
+b means an upwards sloping line;
-b means a downward sloping line;
b=0 means there is no relationship
between x and Y.
Source: Intro to Stat - Introduction to Linear Regression
How does
our OLS
model learn?
The directionality of b is very
important. It tells us if Y gets smaller
or larger when we increase x.
Learning
Methodology
Y*=a + bx + e
+ b is a positive slope - b is a negative slope
Source: Beginning Algebra - Coordinate System and graphing lines
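As a quick sanity check of this intuition, here is a toy sketch (ours, with made-up numbers) showing how changing a and b moves the predictions:

```python
import numpy as np

x = np.array([0, 1, 2, 3])

def predict(x, a, b):
    return a + b * x

print(predict(x, a=1, b=2))   # a baseline line: [1 3 5 7]
print(predict(x, a=3, b=2))   # increasing a shifts every prediction up by 2
print(predict(x, a=1, b=-2))  # a negative b flips the slope: [ 1 -1 -3 -5]
print(predict(x, a=1, b=0))   # b = 0: a flat line, no relationship with x
```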
Test your intuition.
Exercise 1
Learning
Methodology
Y*=a + bx + e
What happens if I
increase a by 2?
Exercise 1
Learning
Methodology
Y*=a + bx + e
What happens if I
increase a by 2?
Exercise 2
Learning
Methodology
Y*=a + bx + e
What happens if I
increase b by 3?
Exercise 2
Learning
Methodology
Y*=a + bx + e
What happens if I
increase b by 3?
What parameters give us
the best trend line?
What is our
loss function?
We make a decision about how to change
our parameters based upon our loss
function.
Learning
Methodology
Y*=a + bx + e
Let me try
different values
of a and b to
minimize the
total loss
function.
Recall: Our model starts with a random a and
b, and our job is to change a and b in a way that
moves Y* closer to the true Y.
You are in fact trying to reduce the distance
between Y* and true Y. We measure the
distance using mean squared error. This is our
loss function.
All supervised models have a loss function
(sometimes also known as the cost function)
they must optimize by changing the model
parameters.
What is our
loss function?
For every a and b we try, we measure
the mean squared error. Remember
MSE?
Learning
Methodology
This job isn’t done
until I reduce MSE.
Source: Intro to Stat -Introduction to Linear Regression
Mean squared error is a measure of
how close Y* is to Y.
There are four steps to MSE:
Y-Y* For every point in our dataset,
measure the difference between
true Y and predicted Y.
^2 Square each Y-Y* so every distance
is positive and errors with opposite
signs don’t cancel out when we sum.
Sum Sum across all observations so we
get the total error.
mean Divide the sum by the number of
observations we have.
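Putting the four steps together, a minimal Python sketch of MSE (with made-up numbers) looks like this:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE: the mean of squared differences between true Y and predicted Y*."""
    errors = y_true - y_pred      # step 1: Y - Y* for every point
    squared = errors ** 2         # step 2: square so signs don't cancel
    total = squared.sum()         # step 3: sum across all observations
    return total / len(y_true)    # step 4: divide by the number of observations

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 8.0])
print(mean_squared_error(y_true, y_pred))  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```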
What is our
loss function?
The green boxes in the chart below are
the squared errors for each data point.
We sum and then take the mean across all
data points to get the MSE.
Learning
Methodology
Source: Seeing Theory - Regression
Y-Y* For every point in our dataset,
measure the difference between
true Y and predicted Y.
^2 Square each Y-Y* so every distance
is positive and errors with opposite
signs don’t cancel out when we sum.
Sum Sum across all observations so we
get the total error.
mean Divide the sum by the number of
observations we have.
Optimization
Process
The process of changing a and b to
reduce MSE is called learning. It is what
makes OLS regression a machine learning
algorithm.
Learning
Methodology
Source: Seeing Theory - Regression
For every combination of a
and b we choose there is an
associated MSE. The learning
process involves updating a
and b in order to reach the
global minimum.
The learning process for OLS
is technically called learning
by gradient descent.
Optimization of
MSE:
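The deck leaves the mechanics to the coding lab; as a rough illustration, here is a toy sketch of learning a and b by gradient descent on MSE. The data, learning rate, and iteration count are all made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])   # roughly y = 1 + 2x plus noise

a, b = 0.0, 0.0                      # start from an arbitrary line
lr = 0.05                            # learning rate: the size of each step

for _ in range(2000):
    error = (a + b * x) - y          # Y* - Y for every point
    grad_a = 2 * error.mean()        # gradient of MSE with respect to a
    grad_b = 2 * (error * x).mean()  # gradient of MSE with respect to b
    a -= lr * grad_a                 # step a and b downhill
    b -= lr * grad_b                 # on the MSE surface

print(a, b)  # should land near the intercept and slope that minimize MSE
```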
Next, let’s look at model validation.
What is our
loss function?
Optimization
Process
How does
our OLS
model learn?
Learning
Methodology
Is our f(x)
correct for
this problem?
OLS
Regression
Task
Defining f(x)
Model Validation
Validation is a process of evaluating your model performance.
We have to evaluate a model on two critical aspects:
1. How close the model’s Y* is to true Y
2. How well the model performs in the real world (i.e., on unseen data).
Common validation metrics:
1. R2
2. Adjusted R2
3. MSE (a loss function and a measure of performance)
Thoroughly validating your model is crucial.
Let’s take a look at these metrics…
Measures of performance
Performance
For an OLS regression, we will introduce
three measures of model performance:
R2, Adjusted R2 and MSE.
Mean Squared Error: We have already introduced MSE. It serves as
both a loss function and a measure of performance.
R2: Explained Variation / Total Variation. Increases as the number
of x’s increases.
Adjusted R2: Explained Variation / Total Variation, adjusted for
the number of features present in the model.
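As a rough sketch (ours, not from the deck), R2 and Adjusted R2 in Python:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R2 = explained variation / total variation = 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()         # unexplained variation
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total variation
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R2 for the number of features p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 7.1, 8.9])
r2 = r_squared(y_true, y_pred)
print(r2, adjusted_r_squared(r2, n=4, p=1))
```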
How did I do?
R2 and Adjusted R2 answer the
question, how much of y’s variation can
be explained by our model?
This reminds us of MSE, but R2 metrics
are scaled to be between 0 and 1.
Note: Adjusted R2 is preferable to R2,
because R2 increases as you include
more features in your model. This may
artificially inflate R2.
Measures of
performance
Performance
True Y
Y*=f(x)+e
If Y* is explained variation, what’s left
between Y* and Y is unexplained variation
Source: This is a snippet of the output from Notebook 2.
In a regression output, you can find R-squared metrics here:
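Since the screenshot does not reproduce here, below is a minimal sketch that generates a similar regression output; the data are made up to stand in for the Kiva data used in Notebook 2:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

X = sm.add_constant(x)        # adds the intercept term a
results = sm.OLS(y, X).fit()
print(results.summary())      # R-squared and Adj. R-squared appear in the header
```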
R2 can tell us how much of y’s variation is explained by x, where correlation couldn’t!
R-squared reminds us of
correlation, but there is an
important difference.
Correlation measures the
association between x and y.
R-squared measures how much y
is explained by x.
Ability to
generalize to
unseen data
Performance
Now we know how well the model
performs on the data we have, how do
we predict how it will do in the real
world?
We need a way to quantify the way our model performs on unseen data. In an ideal
world, we would go out and find new data to test our model on.
However, this is often not realistic as we are constrained by time and resources.
Instead of doing this, we can instead split our data into two portions: training and
test data.
We will use a portion of our data to train our model, and the
rest of our data (which is “unseen” by the model, like real world
data) to test our model.
● Randomly split data into “training” and “test” sets
● Use regression results from “training” set to predict “test” set
● Compare Predicted Y to Actual Y
Source: Fortmann-Roe, Accurately Measuring Model Prediction Error,
http://scott.fortmann-roe.com/docs/MeasuringError.html
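A minimal sketch of this split in Python, using scikit-learn with made-up data (the proportions match the ~70/30 split described below):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.arange(100, dtype=float).reshape(-1, 1)         # one explanatory feature
y = 3.0 * X.ravel() + rng.normal(scale=5.0, size=100)  # made-up outcome

# Randomly split into ~70% training and ~30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # learn a and b on training data
y_pred = model.predict(X_test)                    # predict on the unseen test data
print(mean_squared_error(y_test, y_pred))         # compare Predicted Y to Actual Y
```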
Ability to
generalize
to unseen
data
Performance
We split our labelled data into training
and test. The test represents our
unseen future data.
Predicted Y - Actual Y
The test data is “unseen” in that the algorithm doesn’t use it to
train the model!
Ability to
generalize
to unseen
data
Performance
We split our labelled data into training
and test. The test represents our
unseen future data.
Predicted Y: from the model trained on ~70% of the data
Actual Y: from the held-out ~30% (test) of the data
It is important to clarify here that we are using train Y* - train Y to train the model,
which is different from the test Y* - test Y we use to evaluate how well the model can
generalize to unseen data.
Ability to
generalize
to unseen
data
Performance
Don’t confuse our use of loss functions
with our use of test data!
Training data yields our model (the red line); we then apply the model to test data.
We use train Y*- train Y in our
loss functions that train the
model.
We use test Y* - test Y to see how well
our model can be generalized to unseen
data, or data that wasn’t used to train
our model.
Evaluating whether or not a model can be generalized to unseen data is very
important - in fact, being able to predict unseen data is often the whole point of
creating a model in the first place.
If we do not evaluate how well a model can generalize to outside data, there is a
danger that we are creating a model that is too specific to our dataset - that is, a
model that is GREAT at predicting this particular dataset, but is USELESS at
predicting the real world.
What does this look like?
Ability to
generalize
to unseen
data
Performance
Splitting train and test data is
extremely important!
Here, we are using wealth to predict happiness. The dotted lines are the models generated by ML algorithms.
● On the left, the model is not useful because it is too general and does not capture the relationship accurately.
● On the right, the model is not useful because it is not general enough and captures every single idiosyncrasy in the dataset. This means the model is too specific to this dataset, and cannot be generalized to different datasets.
We want to have a model that is just right - it is accurate enough within the dataset, and is general enough to be applied outside of the dataset!
Ability to
generalize
to unseen
data
Performance
Splitting train and test data is
extremely important!
“underfitting” | “just right” | “overfitting”
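To see this numerically, here is a toy sketch (ours, with made-up data) that fits polynomials of increasing degree: the underfit line has high error everywhere, while the overfit curve looks great on the training data and terrible on the test data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_train, y_train = X[::2], y[::2]    # half the points for training
X_test, y_test = X[1::2], y[1::2]    # half held out as "unseen" data

for degree in (1, 3, 15):            # underfit, roughly right, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))
```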
Bias is how accurate the model is at
predicting the dataset.
Variance is how sensitive the model
is to small fluctuations in the
training set.
Ideally, we would have both low bias
and low variance.
Source: Fortmann-Roe, Accurately Measuring Model Prediction Error.
http://scott.fortmann-roe.com/docs/MeasuringError.html
Ability to
generalize
to unseen
data
Performance
This concept of a model being “just
right” is also called the bias-variance
trade-off.
“underfitting” | “just right” | “overfitting”
terrible!
Don’t worry about the specifics of the bias-variance
trade-off for now - we will return to this concept
regularly throughout the course and in-depth in the next
module.
Let’s turn to another aspect of performance that is very
important: feature performance.
Performance
The performance of model components is
just as important as the performance
of the model itself.
Feature
Importance
Performance
Understanding feature importance allows us to:
1) Quantify which feature is driving explanatory power in the model
2) Compare one feature to another
3) Guide final feature selection
Evaluating the performance of features becomes important when we start
using more sophisticated linear models where f(x) includes more than one
explanatory variable. Let’s introduce that concept and then return here.
OLS Task
Univariate vs. Multivariate as f(x)
Deciding whether a univariate or multivariate regression is the best
model for your problem is part of the task.
Univariate (one explanatory variable): e.g. trying to predict malaria
using only patient’s temperature.
Multivariate (multiple explanatory variables): e.g. trying to predict
malaria using patient’s temperature, travel history, whether or not
they have chills, nausea, headache.
Source: James, Witten, Hastie and Tibshirani. Introduction to Statistical Learning with Applications in R.
Income = a + b1(Seniority) + b2(Years of Education) + e
This is an example of linear
regression with 2 explanatory
variables in 3 dimensions. Extend
this to n variables in n+1
dimensions.
OLS Task Defining f(x)
Many of the relationships we try and
model are more complicated than a
univariate regression. Instead, we use a
multivariate regression.
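As a minimal sketch (ours, with made-up numbers standing in for the Income example), a multivariate OLS fit in Python:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.RandomState(0)
df = pd.DataFrame({"seniority": rng.uniform(0, 20, 50),
                   "years_education": rng.uniform(8, 20, 50)})
df["income"] = (20 + 1.5 * df["seniority"] + 2.0 * df["years_education"]
                + rng.normal(scale=5.0, size=50))

X = sm.add_constant(df[["seniority", "years_education"]])
results = sm.OLS(df["income"], X).fit()
print(results.params)   # a, b1 (seniority), b2 (years of education)
```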
Regressions can also be non-linear!
Source: [link]
Miles walked per day = a + b1(Age)² + b2(Age) + e
In this example, we see that mobility
increases with age, then decreases
after some point. This is an example
of a non-linear regression, which is
best explained by a quadratic equation.
OLS Task Defining f(x)
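A toy sketch (ours, with made-up mobility numbers) of fitting this quadratic shape with numpy:

```python
import numpy as np

# Made-up data: miles walked rises with age, then falls.
age = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
miles = np.array([2.0, 3.5, 4.5, 5.0, 4.8, 4.0, 2.8, 1.5])

# Fit miles = b1*(age)^2 + b2*(age) + a; deg=2 gives the quadratic shape.
b1, b2, a = np.polyfit(age, miles, deg=2)
print(b1, b2, a)  # b1 comes out negative: an inverted-U relationship
```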
When we model using a multivariate
regression, feature selection becomes
an important step. What explanatory
variables should we include?
OLS Task
Feature
engineering
& selection
Feature selection is often the difference between a project that fails and succeeds. Some common ways of doing feature selection:
● Qualitative Research: literature review of past work done
● Exploratory Analysis: sanity checks on data; human intuition of what would influence outcome
● Using Other Models: quantify feature importance (we’ll see this later with decision trees)
● Analyzing Linear Regression Output: look at each feature’s coefficient and p-value
Feature
Importance
Performance
How do we assess feature importance in
a linear regression?
Each feature has a:
1. Coefficient
2. P-value
The output of a linear regression is a model:
Y* = intercept + coef*feature
The size of the coefficient is the amount of influence that the feature has on
y. Whether the coefficient is negative or positive gives the direction of the
relationship that the feature has with y.
Feature
Importance
Each feature has a:
1. Coefficient
2. P-value
A huge coefficient is great, but how much confidence we have
in that coefficient is dependent on that coefficient’s p-value.
In technical terms, the p-value is the probability of getting results
as extreme as the ones observed, if the coefficient were actually
zero. It answers the question, “Could I have gotten my result
by random chance?”
A small p-value (<= 0.05, or 5%) says that the result is
probably not random chance - great news for our model!
Performance
How do we assess feature importance in
a linear regression?
Expressed as a %
Heads or tails?
It’s random!
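A minimal sketch (ours, with made-up data) of reading each feature’s coefficient and p-value off a fitted statsmodels regression:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.3 * x + rng.normal(size=100)

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.params)    # the size and sign of each coefficient
print(results.pvalues)   # the p-value attached to each coefficient
```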
Feature
Importance
Performance
How would you assess this feature using
p-value and size of the coefficient?
Feature
Importance
Performance
How would you assess this feature using
p-value and size of the coefficient?
A person in the clothing sector will, on
average, get a higher loan amount, but
only by very little. I’m reasonably
confident in this conclusion.
Extrapolation is the act of
inferring unknown values
based on known data.
Even validated algorithms are
subject to irresponsible
extrapolation!
Source: https://xkcd.com/605/
Feature
Importance
Performance
A few last thoughts...
Linear regression has almost
countless potential applications,
provided we interpret carefully.
“All models are wrong, some are useful.”
- George E.P. Box,
British statistician
✓ Linear regression
✓ Relationship between two variables (x and y)
✓ Formalizing f(x)
✓ Correlation between two variables
✓ Assumptions
✓ Feature engineering and selection
✓ Univariate regression, Multivariate regression
✓ Measures of performance (R2, Adjusted R2, MSE)
✓ Overfitting, Underfitting
✓ Learning process: Loss function and Mean Squared Error
Module Checklist
Advanced resources
● Textbooks
○ An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and
Tibshirani): Chapters 2.1, 3, 4, 6
○ The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie,
Tibshirani, Friedman): Chapters 3, 4
● Online resources
○ Analytics Vidhya’s guide to understanding regression:
https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/
○ Brown University’s introduction to probability and statistics,
http://students.brown.edu/seeing-theory/
● If you are interested in more sophisticated regression models, search:
○ Logistic regression, Polynomial regression, Interactions
● If you are interested in additional ways to solve multicollinearity, search:
○ Eigenvectors, Principal Components Analysis
Want to take this further? Here are
some resources we recommend:
You are on fire! Go straight to the
next module here.
Need to slow down and digest? Take a
minute to write us an email about
what you thought about the course. All
feedback small or large welcome!
Email: sara@deltanalytics.org
Congrats! You finished
module 3!
Find out more about
Delta’s machine
learning for good
mission here.

More Related Content

What's hot

Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selectionDavis David
 
Final thesis presentation
Final thesis presentationFinal thesis presentation
Final thesis presentationPawan Singh
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with PythonDavis David
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisVishwas N
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsankit_ppt
 
Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learningSara Hooker
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsUmasree Raghunath
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning pyingkodi maran
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 

What's hot (20)

Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
 
Final thesis presentation
Final thesis presentationFinal thesis presentation
Final thesis presentation
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Random forest
Random forestRandom forest
Random forest
 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metrics
 
Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learning
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Data preprocessing in Machine learning
Data preprocessing in Machine learning Data preprocessing in Machine learning
Data preprocessing in Machine learning
 
Data science
Data scienceData science
Data science
 
EDA
EDAEDA
EDA
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Statistics for data science
Statistics for data science Statistics for data science
Statistics for data science
 

Similar to Module 3: Linear Regression

HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with Regoodwintx
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data scienceBrad Klingenberg
 
Interpretability in ML & Sparse Linear Regression
Interpretability in ML & Sparse Linear RegressionInterpretability in ML & Sparse Linear Regression
Interpretability in ML & Sparse Linear RegressionUnchitta Kan
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to RAnshik Bansal
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperJames by CrowdProcess
 
MachineLlearning introduction
MachineLlearning introductionMachineLlearning introduction
MachineLlearning introductionThe IOT Academy
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesVimal Gupta
 
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Francesca Lazzeri, PhD
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionGirish Gore
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfDr. Radhey Shyam
 

Similar to Module 3: Linear Regression (20)

HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
Qt unit i
Qt unit   iQt unit   i
Qt unit i
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data science
 
Linear_Regression
Linear_RegressionLinear_Regression
Linear_Regression
 
Interpretability in ML & Sparse Linear Regression
Interpretability in ML & Sparse Linear RegressionInterpretability in ML & Sparse Linear Regression
Interpretability in ML & Sparse Linear Regression
 
An introduction to R
An introduction to RAn introduction to R
An introduction to R
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
MachineLlearning introduction
MachineLlearning introductionMachineLlearning introduction
MachineLlearning introduction
 
Introduction to ml
Introduction to mlIntroduction to ml
Introduction to ml
 
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
Regresión
RegresiónRegresión
Regresión
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regression
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Bivariate Regression
Bivariate RegressionBivariate Regression
Bivariate Regression
 
Econometrics
EconometricsEconometrics
Econometrics
 

More from Sara Hooker

Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2Sara Hooker
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1Sara Hooker
 
Module 7: Unsupervised Learning
Module 7:  Unsupervised LearningModule 7:  Unsupervised Learning
Module 7: Unsupervised LearningSara Hooker
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble AlgorithmsSara Hooker
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision TreesSara Hooker
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationSara Hooker
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratorySara Hooker
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparationSara Hooker
 
Storytelling with Data (Global Engagement Summit at Northwestern University 2...
Storytelling with Data (Global Engagement Summit at Northwestern University 2...Storytelling with Data (Global Engagement Summit at Northwestern University 2...
Storytelling with Data (Global Engagement Summit at Northwestern University 2...Sara Hooker
 
Delta Analytics Open Data Science Conference Presentation 2016
Delta Analytics Open Data Science Conference Presentation 2016Delta Analytics Open Data Science Conference Presentation 2016
Delta Analytics Open Data Science Conference Presentation 2016Sara Hooker
 

More from Sara Hooker (10)

Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Module 7: Unsupervised Learning
Module 7:  Unsupervised LearningModule 7:  Unsupervised Learning
Module 7: Unsupervised Learning
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble Algorithms
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision Trees
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and Evaluation
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparation
 
Storytelling with Data (Global Engagement Summit at Northwestern University 2...
Storytelling with Data (Global Engagement Summit at Northwestern University 2...Storytelling with Data (Global Engagement Summit at Northwestern University 2...
Storytelling with Data (Global Engagement Summit at Northwestern University 2...
 
Delta Analytics Open Data Science Conference Presentation 2016
Delta Analytics Open Data Science Conference Presentation 2016Delta Analytics Open Data Science Conference Presentation 2016
Delta Analytics Open Data Science Conference Presentation 2016
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Module 3: Linear Regression

  • 2. This course content is being actively developed by Delta Analytics, a 501(c)3 Bay Area nonprofit that aims to empower communities to leverage their data for good. Please reach out with any questions or feedback to inquiry@deltanalytics.org. Find out more about our mission here. Delta Analytics builds technical capacity around the world.
  • 4. Let’s do a quick review of module 2!
  • 5. How does a model learn from raw data? Task What is the problem we want our model to solve? Performance Measure Quantitative measure we use to evaluate the model’s performance. Learning Methodology ML algorithms can be supervised or unsupervised. This determines the learning methodology. All models have the following components: Source: Deep Learning Book - Chapter 5: Introduction to Machine Learning
  • 6. Classification task: No No Yes Yes Exercise 1 Predicted Approval for credit card Y Y* 12,000 60,000 11,000 200,000 Current Annual Income ($) Approval for credit card Yes No No Yes Outstanding Debt 200 60,000 0 10,000 What are the explanatory features? What is the outcome feature?
  • 7. Task Performance Measure Learning Experience Let’s fill in the blanks for our credit approval example:
  • 8. Task Should this customer be approved for a credit card? Performance Measure Log Loss Learning Experience Supervised Let’s fill in the blanks for our credit approval example:
  • 9. Regression task: 90% 30% 95% 50% Exercise 2 X Y Y* 10 2 12 0 Time spent A week studying machine learning Accuracy of classification model built by student 30% 60% 26% 88% Predicted Accuracy of classification model What are the explanatory features? What is the outcome feature?
  • 10. Task Performance Measure Learning Experience Let’s fill in the blanks for our studying example:
  • 11. Task How accurate is this student’s classification model? Performance Measure MSE Learning Experience Supervised Let’s fill in the blanks for our studying example:
  • 13. Course overview: Now let’s turn to the data we will be using... ✓ Module 1: Introduction to Machine Learning ✓ Module 2: Machine Learning Deep Dive ✓ Module 3: Linear Regression ❏ Module 4: Decision Trees ❏ Module 5: Ensemble Algorithms ❏ Module 6: Unsupervised Learning Algorithms ❏ Module 7: Natural Language Processing Part 1 ❏ Module 8: Natural Language Processing Part 2
  • 14. ❏ Linear regression ❏ Relationship between two variables (x and y) ❏ Formalizing f(x) ❏ Correlation between two variables ❏ Assumptions ❏ Feature engineering and selection ❏ Learning process: Loss function and Mean Squared Error ❏ Univariate regression, Multivariate regression ❏ Measures of performance (R2, Adjusted R2, MSE) ❏ Overfitting, Underfitting Module Checklist
  • 15. What is linear regression? Linear regression is a model that explains the relationship between explanatory features and an outcome feature as a line in two dimensional space.
  • 16. Linear regression has been in use since the 19th century & is one of the most important machine learning tools available to researchers. Many other models are built upon the logic of linear models. For example, the most simple form of deep learning model, with no hidden layers, is a linear model! Why is linear regression important? Source: History of Linear Regression - Introduction to Linear Regression Modeling Phase Performance Learning Methodology Task
  • 17. Linear regression: model cheat sheet ● Linear relationship between x and y ● Normal distribution of variables ● No multicollinearity (Independent variables) ● Homoscedasticity ● Rule of thumb: at least 20 observations per independent variable in the analysis ● Sensitive to outliers ● The world is not always linear; we often want to model more complex relationships ● Does not allow us to model interactions between explanatory features (we will be able to do this using a decision tree) ● Very popular model with intuitive, easy to understand results. ● Natural extension of correlation analysis Pros Cons Assumptions* * We will explain each of these assumptions in this module!
  • 18. Task What is the problem we want our model to solve? Defining f(x) What is f(x) for a linear model? Feature engineering & selection What is x? How do we decide what explanatory features to include in our model? Today we are looking closer at each component of the framework we discussed in the last class What assumptions does a linear regression make about the data? Do we have to transform the data? Is our f(x) correct for this problem?
  • 19. Learning Methodology Linear models are supervised, how does that affect the learning processing? What is our loss function? Every supervised model has a loss function it wants to minimize. Optimization process How does the model minimize the loss function. Learning methodology is how the linear regression model learns which line best fits the raw data. How does our ML model learn? Overview of how the model teaches itself.
  • 20. Performance Quantitative measure we use to evaluate the model’s performance. Measures of performance R2, Adjusted R2, MSE Feature Performance Statistical significance, p values There are linear models for both regression and classification problems. Overfitting, underfitting, bias, variance Ability to generalize to unseen data
  • 22. Task What is the problem we want our model to solve? Defining f(x) What is f(x) for a linear model? Feature engineering & selection What is x? How do we decide what explanatory features to include in our model? What assumptions does a linear regression make about the data? Do we have to transform the data? Is our f(x) correct for this problem?
  • 23. Regression Classification Continuous variable Categorical variable Recall our discussion of the two types of supervised tasks: regression and classification. There are linear models for both; today we will only discuss regression. Task Ordinary Least Squares (OLS) regression Logistic regression We will not cover logistic regression in this lecture, but we will provide resources for further study. OLS is a linear regression method based on minimizing the sum of squared residuals.
  • 24. We have plotted the total $ loaned by KIVA in Kenya each year. What is the trend? How would you summarize this trend in one sentence? OLS Regression Task An OLS regression is a trend line that predicts how much Y will change for a given change in x. Let's break that down a little more by looking at an example.
  • 25. “Every year, the value of loans in KE appears to be increasing” A linear regression formalizes our perceived trend as a relationship between x and Y. It can be understood intuitively as a trend line: every additional year corresponds to a fixed number of additional dollars of Kiva loans in KE. Human Intuition OLS Regression
  • 26. A linear model allows us to predict beyond our set of observations, because for every point on the x axis we can find the corresponding Y*. A linear model expresses the relationship between our explanatory features and our outcome feature as a straight line. The output of f(x) for a linear model will always be a line. x Y* For example, we can now say, for an x not in our scatterplot (May 2012), what we predict the $ amount of loans to be. Defining f(x)
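To make this concrete, here is a minimal sketch in Python of fitting a trend line and then querying it at an x we never observed, assuming scikit-learn is available. The yearly loan totals below are made-up illustrative numbers, not the real Kiva data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical yearly Kiva loan totals in Kenya; these numbers are
# illustrative, not the real data used in the coding labs.
years = np.array([[2007], [2008], [2009], [2010], [2011]])
loans = np.array([120_000, 250_000, 410_000, 530_000, 690_000])

model = LinearRegression()
model.fit(years, loans)  # learns a (intercept_) and b (coef_) from the data

print(model.intercept_, model.coef_[0])      # a and b of the fitted line
print(model.predict(np.array([[2012.4]])))   # Y* for an unseen x (~May 2012)
```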
  • 27. Is linear regression the right model for our data?
  • 28. A big part of being a machine learning researcher involves choosing the right model for the task. Each model makes certain assumptions about the underlying data. Let's take a closer look at how this works for linear regression. OLS Regression Task Is our f(x) correct for this problem?
  • 29. OLS Regression Task Before we choose a linear model, we need to make sure all the assumptions hold true in our data. Is our f(x) correct for this problem? ❏ Linear relationship between x and Y ❏ Normal distribution of variables ❏ No autocorrelation (Independent variables) ❏ Homoscedasticity ❏ No multicollinearity ❏ Rule of thumb: at least 20 observations per independent variable in the analysis OLS Linear Regression Assumptions
  • 30. OLS Regression Task Is there a linear relationship between x and Y? Is our f(x) correct for this problem? Source: Linear and non-linear relationships Linear regression assumes that there is a linear relationship between x and Y. If this is not true, our trend line will do a poor job of predicting Y.
  • 31. Source: Statistics - Normal distribution OLS Regression Task Is our data normally distributed? Is our f(x) correct for this problem? Normal distribution of explanatory and outcome features avoids distortion of results due to outliers or skewed data.
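One way to check this in practice is a minimal sketch like the following, assuming pandas and SciPy are available; the column name and values are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical loan amounts; the column name and values are illustrative.
df = pd.DataFrame({"loan_amount": [200, 350, 300, 450, 500, 420, 380, 310]})

# Shapiro-Wilk tests the null hypothesis that the data is normally distributed;
# a small p-value (< 0.05) suggests the data is not normal.
stat, p = stats.shapiro(df["loan_amount"])
print(f"W={stat:.3f}, p={p:.3f}")

# Visual check: does the histogram look roughly bell-shaped?
df["loan_amount"].hist(bins=10)
```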
  • 32. OLS Regression Task Multicollinearity occurs when explanatory variables are highly correlated. Do we have multicollinearity? Is our f(x) correct for this problem? Multicollinearity introduces redundancy to the model and reduces our certainty in the results. We want no multicollinearity in our model. Province Country Province and country are highly correlated. We will want to include only one of these variables in our model. Age Loan Amount Age and loan amount appear to have no multicollinearity. We can include both in our model.
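A common diagnostic is the variance inflation factor (VIF). Here is a hedged sketch using statsmodels, with hypothetical feature names and values:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical explanatory features; the values are illustrative.
X = pd.DataFrame({
    "age":         [23, 35, 41, 29, 52, 47],
    "loan_amount": [200, 350, 500, 275, 600, 550],
})
X = add_constant(X)  # include an intercept column before computing VIFs

# A common rule of thumb: a VIF above ~5-10 signals problematic multicollinearity.
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
```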
  • 33. OLS Regression Task Do we have homoscedasticity? Is our f(x) correct for this problem? The error term or “noise” has the same variance across all values of the outcome variable. If homoscedasticity does not hold, cases with a greater error term will have an outsized influence on the regression. Data must be homoscedastic, meaning the error variance is evenly distributed across all outcome variable values. Good Not good Not good Source: Homoscedasticity
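A quick visual check is a residuals-versus-fitted plot. A minimal sketch, assuming simulated data and statsmodels/matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

# Hypothetical x and y; the data is simulated for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 + 3 * x + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Homoscedastic data shows an even band of residuals around zero,
# with no funnel or fan shape as the fitted values grow.
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```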
  • 34. OLS Regression Task Do we have autocorrelation? Is our f(x) correct for this problem? Autocorrelation is correlation between values of a variable and a delayed copy of itself. For example, a stock's price today is correlated with yesterday's price. Autocorrelation commonly occurs when you work with time series data.
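One standard check for autocorrelation in regression residuals is the Durbin-Watson statistic. A minimal sketch with simulated data, assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical time series: a trend plus noise; the values are illustrative.
rng = np.random.default_rng(1)
t = np.arange(100.0)
y = 0.5 * t + rng.normal(0, 5, 100)

fit = sm.OLS(y, sm.add_constant(t)).fit()

# Durbin-Watson is ~2 when residuals are uncorrelated; values near 0 or 4
# indicate positive or negative autocorrelation, respectively.
print(durbin_watson(fit.resid))
```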
  • 35. OLS Regression Task In the coding lab, we will go over code that will help you determine whether a linear model provides the best f(x). Is our f(x) correct for this problem? ✓ Linear relationship between x and Y ✓ Normal distribution of variables ✓ No autocorrelation (Independent variables) ✓ Homoscedasticity ✓ No multicollinearity ✓ Rule of thumb: at least 20 observations per independent variable in the analysis OLS Assumptions We have signed off on all of our assumptions, which means we can confidently choose a linear OLS model for this task. Let's start building our model!
  • 36. OLS Regression Task Question! What happens if you don't have all the assumptions? Is our f(x) correct for this problem? ✓ Linear relationship between x and Y ✓ Normal distribution of variables ✓ No autocorrelation (Independent variables) ✓ Homoscedasticity ✓ No multicollinearity ✓ Rule of thumb: at least 20 observations per independent variable in the analysis OLS Assumptions If these assumptions do not hold true, our trend line will not be accurate. We have a few options: 1) Transform our data to fulfill the assumptions 2) Choose a different model to capture the relationship between x and Y
  • 37. Yes! Linear regression is an appropriate model choice. Now what?
  • 38. True Y Y*=f(x)+e OLS Regression Task Defining f(x) Remember that all models involve a function f(x) that maps an input x to a predicted Y (Y*). The goal is a model that predicts Y* as close to the true Y as possible. f(x) for a linear model is: Y* = a + bx + e What is f(x) for a linear regression model? Just like in algebra!
  • 39. OLS Regression Task Defining f(x) Y* = a + bx + e What does Y* = a + bx + e actually mean? a: the y-intercept of the regression line. b: the slope or gradient of the regression line; this determines how steep the line is and its direction. e: the irreducible error term, error our model cannot reduce. Y*: the predicted Y output of our function. a and b are the parameters.
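For intuition, a quick worked example with made-up numbers: if a = 2 and b = 3, then f(x) gives Y* = 2 + 3x, so an input of x = 4 yields the prediction Y* = 2 + 3(4) = 14 (plus whatever irreducible error e we cannot predict).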
  • 40. How does our OLS model learn? There are an infinite number of possible lines in a two dimensional space. How does our model choose the best one? Learning Methodology Y X Linear regression is a learning algorithm. That is what makes it a machine learning model. It learns to find the best trend line from an infinite number of possibilities. How? First, we must understand what levers we control.
  • 41. Values that control the behavior of the model and are learnt through experience. In each model there are some things we cannot control, like e (irreducible error). A model may also have hyperparameters which are set beforehand and not trained using data (no need to think too much about this now). Y*=a + bx + e Parameters Hyperparameters Higher level settings of a model that are fixed before training begins. How does our OLS model learn? Parameters of a model are values that we control. The parameters in an OLS Model are a (intercept) and b (slope) Learning Methodology Our model can move a & b but not e.
  • 42. How does our OLS model learn? a and b are the only two levers our simple OLS model can change to get Y* closer to Y. Learning Methodology Y X Y*=a + bx + e How do I decide in what direction to change a and b? Changing a shifts our line up or down the y axis (+/-); changing b changes the steepness and direction of our line.
  • 43. How does our OLS model learn? We can change a and b to move our line in space. Below is some intuition about how changing a and b affects f(x). Learning Methodology Y*=a + bx + e a and b are our two parameters. Changing a shifts our line up or down: a negative a moves the y-intercept below 0. A positive b means an upward-sloping line; a negative b means a downward-sloping line; b = 0 means there is no relationship between x and Y. Source: Intro to Stat - Introduction to Linear Regression
  • 44. How does our OLS model learn? The directionality of b is very important. It tells us if Y gets smaller or larger when we increase x. Learning Methodology Y*=a + bx + e + b is a positive slope - b is a negative slope Source: Beginning Algebra - Coordinate System and graphing lines
  • 46. Exercise 1 Learning Methodology Y X Y*=a + bx + e What happens if I increase a by 2?
  • 47. Exercise 1 Learning Methodology Y X Y*=a + bx + e What happens if I increase a by 2? Answer: the whole line shifts up by 2; the y-intercept increases by 2 while the slope stays the same.
  • 48. Exercise 2 Learning Methodology Y X Y*=a + bx + e What happens if I increase b by 3?
  • 49. Exercise 2 Learning Methodology Y X Y*=a + bx + e What happens if I increase b by 3? Answer: the line becomes steeper; every one-unit increase in x now adds 3 more to Y*, while the y-intercept stays the same.
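You can verify both answers visually with a short sketch; the base parameters a = 1 and b = 2 are arbitrary illustrative choices:

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative base parameters: a = 1, b = 2.
x = np.linspace(0, 5, 50)
a, b = 1, 2

plt.plot(x, a + b * x, label="Y* = 1 + 2x (base line)")
plt.plot(x, (a + 2) + b * x, label="a + 2: line shifts up by 2")
plt.plot(x, a + (b + 3) * x, label="b + 3: line gets steeper")
plt.legend()
plt.xlabel("x")
plt.ylabel("Y*")
plt.show()
```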
  • 50. What parameters give us the best trend line?
  • 51. What is our loss function? We make a decision about how to change our parameters based upon our loss function. Learning Methodology Y*=a + bx + e Let me try different values of a and b to minimize the total loss function. Recall: Our model starts with a random a and b, and our job is to change a and b in a way that moves Y* closer to the true Y. You are in fact trying to reduce the distance between Y* and true Y. We measure the distance using mean squared error. This is our loss function. All supervised models have a loss function (sometimes also known as the cost function) they must optimize by changing the model parameters.
  • 52. What is our loss function? For every a and b we try, we measure the mean squared error. Remember MSE? Learning Methodology This job isn't done until I reduce MSE. Source: Intro to Stat - Introduction to Linear Regression Mean squared error is a measure of how close Y* is to Y. There are four steps to MSE: Y-Y* For every point in our dataset, measure the difference between true Y and predicted Y. ^2 Square each Y-Y* so all distances are positive and errors in opposite directions don't cancel out when we sum. Sum Sum across all observations so we get the total error. Mean Divide the sum by the number of observations we have.
  • 53. What is our loss function? The green boxes in the chart below are the squared errors for each data point; we sum them and then take the mean across all data points to get the MSE. Learning Methodology Source: Seeing Theory - Regression Y-Y* For every point in our dataset, measure the difference between true Y and predicted Y. ^2 Square each Y-Y* so all distances are positive and errors in opposite directions don't cancel out when we sum. Sum Sum across all observations so we get the total error. Mean Divide the sum by the number of observations we have.
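The four steps translate directly into code. A minimal sketch with made-up true and predicted values:

```python
import numpy as np

# Illustrative true and predicted values.
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

errors = y_true - y_pred    # step 1: Y - Y* for every point
squared = errors ** 2       # step 2: square so errors don't cancel
total = squared.sum()       # step 3: sum across all observations
mse = total / len(y_true)   # step 4: divide by the number of observations
print(mse)                  # same as np.mean((y_true - y_pred) ** 2)
```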
  • 54. Optimization Process The process of changing a and b to reduce MSE is called learning. It is what makes OLS regression a machine learning algorithm. Learning Methodology Source: Seeing Theory - Regression For every combination of a and b we choose there is an associated MSE. The learning process involves updating a and b in order to reach the global minimum. One common way to carry out this optimization is gradient descent (OLS also has an exact closed-form solution, but gradient descent generalizes to many other models). Optimization of MSE:
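Here is a minimal gradient descent sketch for a univariate OLS model, using simulated data with a known true line; the learning rate and step count are illustrative choices, not tuned values:

```python
import numpy as np

# Illustrative data drawn from a known line (a = 4, b = 2.5) plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 4 + 2.5 * x + rng.normal(0, 1, 200)

a, b = 0.0, 0.0  # start from arbitrary parameter values
lr = 0.01        # learning rate: how big a step we take downhill

for _ in range(2000):
    error = (a + b * x) - y          # Y* - Y for every observation
    grad_a = 2 * error.mean()        # gradient of MSE with respect to a
    grad_b = 2 * (error * x).mean()  # gradient of MSE with respect to b
    a -= lr * grad_a                 # step a and b in the direction
    b -= lr * grad_b                 # that reduces MSE

print(a, b)  # should end up close to the true values 4 and 2.5
```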
  • 55. Next, let’s look at model validation. What is our loss function? Optimization Process How does our OLS model learn? Learning Methodology Is our f(x) correct for this problem? OLS Regression Task Defining f(x)
  • 57. Validation is the process of evaluating your model's performance. We have to evaluate a model on two critical aspects: 1. How close the model's Y* is to the true Y 2. How well the model performs in the real world (i.e., on unseen data). Common validation metrics: 1. R2 2. Adjusted R2 3. MSE (both a loss function and a measure of performance) Thoroughly validating your model is crucial. Let's take a look at these metrics…
  • 58. Mean Squared Error We have already introduced MSE. It serves as both a loss function and a measure of performance. R2 Explained Variation / Total Variation. Increases as the number of x's increases. Adjusted R2 Explained Variation / Total Variation, adjusted for the number of features present in the model. Measures of performance Performance For an OLS regression, we will introduce three measures of model performance: R2, Adjusted R2 and MSE. How did I do?
  • 59. R2 and Adjusted R2 answer the question: how much of y's variation can be explained by our model? This reminds us of MSE, but the R2 metrics are scaled to be between 0 and 1. Note: Adjusted R2 is preferable to R2, because R2 increases as you include more features in your model, which may artificially inflate it. Measures of performance Performance True Y Y*=f(x)+e If Y* is explained variation, what's left between Y* and Y is unexplained variation.
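Both metrics are easy to compute by hand. A minimal sketch with made-up values (n and p below are the number of observations and explanatory features):

```python
import numpy as np

# Illustrative true and predicted values from some fitted model.
y = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.8, 5.4, 7.1, 9.5])

ss_res = ((y - y_pred) ** 2).sum()    # unexplained variation
ss_tot = ((y - y.mean()) ** 2).sum()  # total variation
r2 = 1 - ss_res / ss_tot
print(r2)

# Adjusted R2 penalizes extra features: n observations, p explanatory features.
n, p = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(adj_r2)
```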
  • 60. In a regression output, you can find the R-squared metrics here: (this slide shows a snippet of the output from Notebook 2).
  • 61. R2 measures explained variation, where correlation measures association. R-squared reminds us of correlation, but there is an important difference: correlation measures the strength of the association between x and y, while R-squared measures how much of y's variation is explained by our model. (Neither, on its own, establishes causation.)
  • 62. Ability to generalize to unseen data Performance Now that we know how well the model performs on the data we have, how do we predict how it will do in the real world? We need a way to quantify how our model performs on unseen data. In an ideal world, we would go out and find new data to test our model on. However, this is often not realistic, as we are constrained by time and resources. Instead, we can split our data into two portions: training and test data. We will use a portion of our data to train our model, and the rest of our data (which is “unseen” by the model, like real-world data) to test our model.
  • 63. ● Randomly split data into “training” and “test” sets ● Use regression results from the “training” set to predict the “test” set ● Compare predicted Y to actual Y Source: Fortmann-Roe, Accurately Measuring Model Prediction Error, http://scott.fortmann-roe.com/docs/MeasuringError.html Ability to generalize to unseen data Performance We split our labelled data into training and test sets. The test set represents our unseen future data. Predicted Y - Actual Y
  • 64. The test data is “unseen” in that the algorithm doesn't use it to train the model! Ability to generalize to unseen data Performance We split our labelled data into training and test sets; the test set represents our unseen future data. We typically train on ~70% of the data and compare predicted Y to actual Y on the remaining ~30%.
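A minimal train/test split sketch, assuming scikit-learn; the data is simulated and the 70/30 split ratio follows the slide above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative simulated data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (200, 1))
y = 4 + 2.5 * X[:, 0] + rng.normal(0, 1, 200)

# Hold out 30% of the data as "unseen" test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)  # trained on training data only
print(mean_squared_error(y_test, model.predict(X_test)))  # error on unseen data
```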
  • 65. It is important to clarify here that we use train Y* - train Y to train the model, which is different from the test Y* - test Y we use to evaluate how well the model generalizes to unseen data. Ability to generalize to unseen data Performance Don't confuse our use of loss functions with our use of test data! Training data yields our model (the red line); we then apply the model to the test data. We use train Y* - train Y in the loss function that trains the model. We use test Y* - test Y to see how well our model generalizes to unseen data, i.e. data that wasn't used to train our model.
  • 66. Evaluating whether or not a model can generalize to unseen data is very important; in fact, predicting unseen data is often the whole point of creating a model in the first place. If we do not evaluate how well a model generalizes to outside data, there is a danger that we create a model that is too specific to our dataset: a model that is GREAT at predicting this particular dataset, but USELESS at predicting the real world. What does this look like? Ability to generalize to unseen data Performance Splitting train and test data is extremely important!
  • 67. Here, we are using wealth to predict happiness. The dotted lines are the models generated by ML algorithms. ● On the left, the model is not useful because it is too general and does not capture the relationship accurately (“underfitting”). ● On the right, the model is not useful because it is not general enough and captures every single idiosyncrasy in the dataset (“overfitting”). This means the model is too specific to this dataset and cannot be generalized to different datasets. We want a model that is “just right”: accurate enough within the dataset, and general enough to be applied outside of it! Ability to generalize to unseen data Performance Splitting train and test data is extremely important!
  • 68. Bias is how far off the model's predictions are from the dataset, on average. Variance is how sensitive the model is to small fluctuations in the training set. Ideally, we would have both low bias and low variance. Source: Fortmann-Roe, Accurately Measuring Model Prediction Error. http://scott.fortmann-roe.com/docs/MeasuringError.html Ability to generalize to unseen data Performance This concept of a model being “just right” is also called the bias-variance trade-off. “underfitting” “just right” “overfitting” (high bias and high variance together: terrible!)
  • 69. Don’t worry about the specifics of the bias-variance trade-off for now - we will return to this concept regularly throughout the course and in-depth in the next module. Let’s turn to another aspect of performance that is very important: feature performance. Performance
  • 70. The performance of model components is just as important as the performance of the model itself Feature Importance Performance Understanding feature importance allows us to: 1) Quantify which feature is driving explanatory power in the model 2) Compare one feature to another 3) Guide final feature selection Evaluating the performance of features becomes important when we start using more sophisticated linear models where f(x) includes more than one explanatory variable. Let’s introduce that concept and then return here.
  • 71. Univariate Multivariate One explanatory variable Multiple explanatory variables. Deciding whether a univariate or multivariate regression is the best model for your problem is part of the task. OLS Task E.g. Trying to predict malaria using patient’s temperature, travel history, whether or not they have chills, nausea, headache. E.g. Trying to predict malaria using only patient’s temperature Univariate vs. Multivariate as f(x)
  • 72. Source: James, Witten, Hastie and Tibshirani. Introduction to Statistical Learning with Applications in R. Income = a + b1 (Seniority) + b2 (Years of Education) + e This is an example of linear regression with 2 explanatory variables in 3 dimensions. Extend this to n variables in n+1 dimensions. OLS Task Defining f(x) Many of the relationships we try and model are more complicated than a univariate regression. Instead, we use a multivariate regression.
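A minimal multivariate sketch, assuming statsmodels; the data loosely mirrors the Income example above, with made-up values:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data loosely mirroring the Income example; values are made up.
df = pd.DataFrame({
    "seniority": [2, 5, 10, 3, 8, 15, 7, 12],
    "education": [12, 16, 16, 14, 18, 20, 16, 18],
    "income":    [30, 55, 70, 38, 72, 95, 60, 85],  # in $1000s
})

X = sm.add_constant(df[["seniority", "education"]])
fit = sm.OLS(df["income"], X).fit()

# One coefficient (b1, b2, ...) per explanatory feature, plus the intercept a.
print(fit.params)
```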
  • 73. Regressions can also be non-linear! Source: [link] Miles walked per day = a + b1 (Age)2 + b2 (Age) + e In this example, we see that mobility increases with age, then decreases after some point. This is an example of a non-linear regression, which is best explained by a quadratic equation. OLS Task Defining f(x) (Chart: # miles walked per day vs. age)
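One way to fit this kind of curve is to add Age² as an extra feature; a minimal sketch with hypothetical inverted-U data, assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical ages and miles walked per day, shaped like an inverted U.
age = np.array([5, 10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
miles = np.array([1.0, 2.0, 3.5, 4.0, 3.8, 3.2, 2.5, 1.5, 0.8])

# The model stays linear in its parameters; we simply add Age^2 as a feature.
X = sm.add_constant(np.column_stack([age ** 2, age]))
fit = sm.OLS(miles, X).fit()
print(fit.params)  # a (intercept), b1 for Age^2 (expect negative), b2 for Age
```

This is why polynomial regression can still be fit with OLS: the equation is quadratic in Age but linear in the parameters a, b1, and b2.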
  • 74. When we model using a multivariate regression, feature selection becomes an important step. What explanatory variables should we include? OLS Task Feature engineering & selection Feature selection is often the difference between a project that fails and one that succeeds. Some common ways of doing feature selection: Qualitative research: literature review of past work done. Exploratory analysis: sanity checks on data; human intuition of what would influence the outcome. Using other models: quantify feature importance (we'll see this later with decision trees). Analyzing linear regression output: look at each feature's coefficient and p-value.
  • 75. Feature Importance Performance How do we assess feature importance in a linear regression? Each feature has a coefficient and a p-value.
  • 76. Each feature has a: 1. Coefficient 2. P-value The output of a linear regression is a model: Y* = intercept + coef*feature The size of the coefficient is the amount of influence that the feature has on y. Whether the coefficient is positive or negative gives the direction of the relationship between the feature and y. Feature Importance Performance How do we assess feature importance in a linear regression?
  • 77. Feature Importance Each feature has a: 1. Coefficient 2. P-value A huge coefficient is great, but how much confidence we have in that coefficient depends on its p-value. In technical terms, the p-value is the probability of getting results as extreme as the ones observed if the coefficient were actually zero. It answers the question, “Could I have gotten my result by random chance?” A small p-value (<= 0.05, or 5%) says that the result is probably not random chance: great news for our model! Performance How do we assess feature importance in a linear regression? Expressed as a %. (Think of a coin flip: heads or tails is random!)
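In practice you read both off a regression summary. A minimal sketch with simulated data, assuming statsmodels; the feature names echo the loan example on the next slide but the values are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical features predicting loan amount; all values are simulated.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(20, 60, 100),
    "clothing_sector": rng.integers(0, 2, 100),  # 1 if borrower works in clothing
})
df["loan_amount"] = (
    100 + 2 * df["age"] + 15 * df["clothing_sector"] + rng.normal(0, 30, 100)
)

X = sm.add_constant(df[["age", "clothing_sector"]])
fit = sm.OLS(df["loan_amount"], X).fit()

# The summary lists each feature's coefficient ("coef") and p-value ("P>|t|").
print(fit.summary())
```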
  • 78. Feature Importance Performance How would you assess this feature using p-value and size of the coefficient?
  • 79. Feature Importance Performance How would you assess this feature using p-value and size of the coefficient? A person in the clothing sector will, on average, get a higher loan amount, but only by very little. I’m reasonably confident in this conclusion.
  • 80. Extrapolation is the act of inferring unknown values based on known data. Even validated algorithms are subject to irresponsible extrapolation! Source: https://xkcd.com/605/ Feature Importance Performance A few last thoughts... Linear regression has almost countless potential applications, provided we interpret carefully.
  • 81. “All models are wrong, some are useful.” - George E.P. Box, British statistician
  • 82. ✓ Linear regression ✓ Relationship between two variables (x and y) ✓ Formalizing f(x) ✓ Correlation between two variables ✓ Assumptions ✓ Feature engineering and selection ✓ Univariate regression, Multivariate regression ✓ Measures of performance (R2, Adjusted R2, MSE) ✓ Overfitting, Underfitting ✓ Learning process: Loss function and Mean Squared Error Module Checklist
  • 84. ● Textbooks ○ An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and Tibshirani): Chapters 2.1, 3, 4, 6 ○ The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani, Friedman): Chapters 3, 4 ● Online resources ○ Analytics Vidhya’s guide to understanding regression: https://www.analyticsvidhya.com/blog/2015/08/comprehensive-guide-regression/ ○ Brown University’s introduction to probability and statistics, http://students.brown.edu/seeing-theory/ ● If you are interested in more sophisticated regression models, search: ○ Logistic regression, Polynomial regression, Interactions ● If you are interested in additional ways to solve multicollinearity, search: ○ Eigenvectors, Principal Components Analysis Want to take this further? Here are some resources we recommend:
  • 85. You are on fire! Go straight to the next module here. Need to slow down and digest? Take a minute to write us an email about what you thought about the course. All feedback small or large welcome! Email: sara@deltanalytics.org
  • 86. Congrats! You finished module 3! Find out more about Delta’s machine learning for good mission here.