Presentation on
Statistical learning
Niladree Chowdhury
Outline
• Concept of statistical learning
• Statistical learning & machine learning
• Modelling
• Supervised vs Unsupervised learning
• Specific real-life examples
Big Visual Data
Concept of Statistical Learning
• Statistical learning refers to a set of tools for modelling and understanding complex datasets.
• It blends with parallel developments in computer science, in particular machine learning.
• Both approaches are data-driven. However, statistical learning is typically applied to smaller datasets with few attributes, whereas machine learning can learn from billions of observations and attributes.
• Statistical learning is mostly about inference: its ideas are built on samples, populations, and hypotheses. Machine learning emphasizes prediction, through supervised and unsupervised learning.
Statistical Learning
• Suppose we observe $Y_i$ and $X_i = (X_{i1}, \ldots, X_{ip})$ for $i = 1, \ldots, n$.
• We believe that there is a relationship between Y and at least one of the X's.
• We can model the relationship as
  $$Y_i = f(X_i) + \varepsilon_i,$$
  where f is an unknown function and ε is a random error. Here f represents the systematic information that X provides about Y.
• Statistical learning refers to a set of approaches for estimating f.
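The model $Y_i = f(X_i) + \varepsilon_i$ can be made concrete with simulated data. The sketch below assumes a linear f with two predictors and hypothetical coefficients (`TRUE_BETA`), generates $(X_i, Y_i)$ pairs from the model, and then estimates f by ordinary least squares with NumPy. Least squares is only one of many approaches for estimating f, chosen here for simplicity.

```python
import numpy as np

# Hypothetical "true" f, chosen only for illustration:
# f(X) = 1.0 + 2.0 * X_1 - 0.5 * X_2  (p = 2 predictors).
TRUE_BETA = np.array([1.0, 2.0, -0.5])


def simulate(n, sigma, rng):
    """Draw n observations (X_i, Y_i) from Y_i = f(X_i) + eps_i."""
    X = rng.normal(size=(n, 2))            # predictors X_i = (X_i1, X_i2)
    eps = rng.normal(scale=sigma, size=n)  # random error eps_i
    y = TRUE_BETA[0] + X @ TRUE_BETA[1:] + eps
    return X, y


def fit_linear(X, y):
    """Estimate f by least squares, assuming f is linear in X."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat


rng = np.random.default_rng(0)
X, y = simulate(n=500, sigma=1.0, rng=rng)
beta_hat = fit_linear(X, y)
print("true coefficients:     ", TRUE_BETA)
print("estimated coefficients:", np.round(beta_hat, 2))
```

With 500 observations the estimated coefficients should land close to `TRUE_BETA`; the remaining gap comes from the random error ε.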
Reasons for estimating f
There are two main reasons for estimating f:
• Prediction: If we can produce a good estimate for f (and the variance of ε is not too large), we can make accurate predictions for the response, Y, based on a new value of X.
• Inference: We may also be interested in the type of relationship between Y and the X's.
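The prediction goal, and the caveat about the variance of ε, can be checked numerically. In this sketch the true f is assumed to be sin(x) and ε is normal with a hypothetical standard deviation of 0.5; f is estimated from training data with a cubic polynomial (`np.polyfit`), which is just one convenient choice, and then used to predict Y at new X values. Because ε here has mean zero and is independent of X, even the true f has an average squared prediction error of about Var(ε) = 0.25, so no estimate of f can do better than that on average.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5  # hypothetical standard deviation of the error term


def f(x):
    # Assumed "true" f for the simulation; in practice f is unknown.
    return np.sin(x)


# Training data: both X and Y are observed.
x_train = rng.uniform(-2, 2, size=200)
y_train = f(x_train) + rng.normal(scale=sigma, size=200)

# Estimate f with a cubic polynomial fitted by least squares.
f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=3))

# New X values: predict Y with f_hat and compare against fresh responses.
x_new = rng.uniform(-2, 2, size=10_000)
y_new = f(x_new) + rng.normal(scale=sigma, size=10_000)

mse_fhat = np.mean((y_new - f_hat(x_new)) ** 2)
mse_true_f = np.mean((y_new - f(x_new)) ** 2)  # close to sigma**2

print(f"test MSE using f_hat:  {mse_fhat:.3f}")
print(f"test MSE using true f: {mse_true_f:.3f} (Var(eps) = {sigma**2:.3f})")
```

Raising `sigma` raises both errors together: a larger error variance limits prediction accuracy no matter how well f is estimated.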
Supervised vs. Unsupervised Learning
Statistical learning problems can be divided into two types.
Supervised Learning:
• In supervised learning, both the predictors, Xi, and the response, Yi, are observed.
• This is the situation you deal with in linear regression classes.
Unsupervised Learning:
• In this situation only the Xi's are observed.
• We need to use the Xi's to guess what Y would have been and build a model from there.
• A common example is market segmentation, where we try to divide potential customers into groups based on their characteristics.
• A common approach is clustering.
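Clustering can be illustrated with k-means, one common clustering method (the slide does not name a specific algorithm). The sketch below implements Lloyd's algorithm in NumPy and applies it to hypothetical customer features (age and annual spend) generated for illustration; no response Y is used anywhere. In practice features are usually standardized first, since k-means relies on Euclidean distance and the spend column would otherwise dominate.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members; keep the old
        # centroid if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so labels always match the returned centroids.
    return assign(X, centroids), centroids


# Hypothetical customer features: columns are (age, annual spend).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[25, 200], scale=[3, 30], size=(50, 2))
group_b = rng.normal(loc=[55, 800], scale=[3, 30], size=(50, 2))
customers = np.vstack([group_a, group_b])

labels, centroids = kmeans(customers, k=2)
print(np.round(centroids, 1))
```

Each resulting label defines a customer segment; the number of clusters `k` is a modelling choice, not something the data supplies.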
What’s Next?
Real-life examples
• Income vs. years of education
• Wage Data
• Boston Data
The Boston Housing Price dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University, and is freely available for download from the UCI Machine Learning Repository. It consists of 506 observations of 14 attributes. The median house value in $1000s, denoted MEDV, is the outcome (dependent) variable in our model.
[Figure: scatterplots of medv against lstat, rm, and ptratio]
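With MEDV as the outcome, the simplest models use a single predictor, as in the plots pairing medv with lstat, rm, and ptratio. The sketch below reads those columns from a CSV copy of the data and fits a straight line for each predictor by least squares, using only the Python standard library. The file name `boston.csv` and the lowercase column names (taken from the plot axis labels) are assumptions about how a local copy is stored; the script only reports when no such file is present.

```python
import csv
import statistics
from pathlib import Path


def load_columns(fileobj, names):
    """Read the named numeric columns from CSV lines with a header row."""
    cols = {name: [] for name in names}
    for row in csv.DictReader(fileobj):
        for name in names:
            cols[name].append(float(row[name]))
    return cols


def simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    path = Path("boston.csv")  # hypothetical local copy of the dataset
    if path.exists():
        with path.open(newline="") as fh:
            data = load_columns(fh, ["lstat", "rm", "ptratio", "medv"])
        for predictor in ["lstat", "rm", "ptratio"]:
            b0, b1 = simple_ols(data[predictor], data["medv"])
            print(f"medv ~ {predictor}: intercept={b0:.2f}, slope={b1:.2f}")
    else:
        print("boston.csv not found; no model fitted")
```

Fitting one predictor at a time mirrors the three scatterplots; a multiple regression on all 13 attributes would be the natural next step.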
Thank you
