2. Course Objectives:
The learning objectives of this course are:
1. Familiarity with a set of well-known supervised, unsupervised, and semi-supervised learning algorithms.
2. The ability to implement some basic machine learning algorithms.
3. An understanding of how machine learning algorithms are evaluated.
3. Course Outcomes: On completion of this course, students can:
1. Explain the fundamental concepts of a machine learning system.
2. Demonstrate various regression techniques.
3. Analyze ensemble learning methods.
4. Illustrate clustering techniques and dimensionality reduction models in machine learning.
5. Discuss neural network models and fundamental concepts of deep learning.
4. Text (T) / Reference (R) Books:
T1 Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Aurélien Géron, 2nd Edition, O'Reilly Publications, 2019.
T2 Data Science and Machine Learning: Mathematical and Statistical Methods, Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, 25 November 2020.
R1 Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz, Shai Ben-David, Cambridge University Press.
R2 Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
W1 https://www.tutorialspoint.com/what-is-machine-learning
W2 https://www.analyticsvidhya.com/machine-learning/
W3 https://www.youtube.com/watch?v=eq7KF7JTinU
5. What is Machine Learning?
“Machine learning … gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel)
6. What is Machine Learning?
• Tom Mitchell: algorithms that improve their performance 𝑃 at task 𝑇 with experience 𝐸
• A well-defined machine learning task is given by (𝑃, 𝑇, 𝐸)
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell)
7. Example: Game Playing
• Tom Mitchell: algorithms that improve their performance 𝑃 at task 𝑇 with experience 𝐸
• 𝑇 = playing checkers
• 𝑃 = win rate against opponents
• 𝐸 = playing games against itself
11. Example: Prediction
• Tom Mitchell: algorithms that improve their performance 𝑃 at some task 𝑇 with experience 𝐸
• 𝑇 = predict Arctic sea ice extent
• 𝑃 = prediction error (e.g., absolute difference)
• 𝐸 = historical data
12. Machine Learning for Prediction
Data 𝑍 → machine learning algorithm → model 𝑓; a new input is then fed to 𝑓 to produce the predicted output.
13. Example: Prediction
[Figure: NSIDC Index of Arctic Sea Ice in September — Arctic sea ice extent (millions of sq km) vs. year, 1975–2025. Photo by NASA Goddard.]
Image: https://www.flickr.com/photos/gsfc/5937599688/
Data from https://nsidc.org/arcticseaicenews/sea-ice-tools/
14. Types of Learning
• Supervised learning
• Input: examples of inputs and outputs
• Output: a model that predicts the unknown output given a new input
• Unsupervised learning
• Input: examples of some data (no “outputs”)
• Output: a representation of structure in the data
• Reinforcement learning
• Input: a sequence of interactions with an environment
• Output: a policy that performs a desired task
15. Supervised Learning
1. Predicting the cost of a house from the size of the house.
2. Two main flavors: regression and classification.
3. Dropping a ball from a height and predicting the distance travelled.
16. Supervised Learning
• Given (𝑥1, 𝑦1), …, (𝑥𝑛, 𝑦𝑛), learn a function that predicts 𝑦 given 𝑥
• Regression: labels 𝑦 are real-valued
[Figure: NSIDC Index of Arctic Sea Ice in September — extent (millions of sq km) vs. year, 1975–2025. Photo by NASA Goddard.]
Image: https://www.flickr.com/photos/gsfc/5937599688/
Data from https://nsidc.org/arcticseaicenews/sea-ice-tools/
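The trend-fitting idea in this example can be sketched in a few lines; the series below is a synthetic stand-in for the NSIDC data (a linear decline plus noise, all values made up), not the real measurements:

```python
import numpy as np

# Hypothetical synthetic stand-in for the September sea-ice series:
# a downward linear trend plus noise (NOT the real NSIDC measurements).
rng = np.random.default_rng(0)
years = np.arange(1979, 2024)
extent = 7.5 - 0.08 * (years - 1979) + rng.normal(0, 0.3, years.size)

# Fit a degree-1 polynomial (ordinary least squares) and extrapolate.
slope, intercept = np.polyfit(years, extent, deg=1)
pred_2025 = slope * 2025 + intercept
print(f"trend: {slope:.3f} million sq km per year, predicted 2025 extent: {pred_2025:.2f}")
```

Here 𝐸 is the historical series, 𝑇 is predicting the extent for a new year, and 𝑃 would be the prediction error on held-out years.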
17. Supervised Learning
• Given (𝑥1, 𝑦1), …, (𝑥𝑛, 𝑦𝑛), learn a function that predicts 𝑦 given 𝑥
• Classification: labels 𝑦 are categories
[Figure: ocular tumor example (malignant / benign) — tumor size on the horizontal axis, with a classifier 𝑓(𝑥) separating “predict benign” from “predict malignant”.]
18. Supervised Learning
• Given (𝑥1, 𝑦1), …, (𝑥𝑛, 𝑦𝑛), learn a function that predicts 𝑦 given 𝑥
• Inputs 𝑥 can be multi-dimensional (SVMs, for example, handle this directly)
[Figure: tumor size vs. patient age scatter plot.]
• Possible features: patient age, clump thickness, tumor color, cell type, …
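A minimal sketch of classification with multi-dimensional inputs, using a nearest-centroid rule on made-up tumor data (a much simpler stand-in for the SVMs mentioned above; the feature values and labels are invented for illustration):

```python
# Hypothetical toy data: (tumor size, patient age) -> label (0 = benign, 1 = malignant).
# A minimal nearest-centroid classifier; illustrative only, not a clinical model.
X = [(1.0, 30), (1.5, 45), (2.0, 38),      # benign examples
     (4.5, 60), (5.0, 52), (6.0, 70)]      # malignant examples
y = [0, 0, 0, 1, 1, 1]

def centroid(points):
    # Component-wise mean of a list of feature tuples.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

centroids = {label: centroid([x for x, lbl in zip(X, y) if lbl == label])
             for label in set(y)}

def predict(x):
    # Assign x to the class whose centroid is closest (squared Euclidean distance).
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))

print(predict((5.5, 65)))  # a large, older-patient tumor lands near the malignant centroid
```

Note that the age axis dominates the raw distance here; in practice, multi-dimensional features are scaled before training.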
23. Reinforcement Learning
• Learn how to perform a task from interactions with the environment
• Examples:
• Autonomous vehicles
• Playing chess (interact with the game)
• A robot grasping an object (interact with the object / real world)
• Optimizing inventory allocations (interact with the inventory system)
[Figure: agent–environment interaction loop.]
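As a minimal, hypothetical illustration of learning from interaction, here is an ε-greedy agent on a two-armed bandit; the reward probabilities and parameters below are made up, and this is only a sketch of the interaction loop, not a full RL algorithm:

```python
import random

random.seed(42)

# Hypothetical two-armed bandit environment: arm 1 pays off more often than arm 0.
true_means = [0.2, 0.8]          # Bernoulli reward probabilities (made up)

estimates = [0.0, 0.0]           # agent's running estimate of each arm's value
counts = [0, 0]
epsilon = 0.1                    # exploration rate

for _ in range(2000):
    # Interaction: explore a random arm with probability epsilon, else exploit.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max((0, 1), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    # Incremental mean update of the chosen arm's value estimate.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(max((0, 1), key=lambda a: estimates[a]))  # the arm the agent learned to prefer
```

The policy (which arm to pull) improves purely from interaction with the environment, with no labelled input/output pairs.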
25. Reinforcement Learning
1. Training a dog: it learns from reward and penalty signals (“good dog” / “bad dog”).
2. Learning an optimal way of:
a. operating earth movers / balancing two sticks in hand
b. helicopter flying (rewarded if it flies well)
https://www.youtube.com/watch?v=LfmAG4dk-rU
https://www.youtube.com/watch?v=tF4DML7FIWk
26. Applications of Machine Learning
1. Virtual assistants (Cortana, Alexa)
2. Traffic prediction
3. Judiciary
4. Medicine
5. Nearly every area
29. Main Challenges of Machine Learning
1. Insufficient quantity of training data: millions of examples may be required.
2. Nonrepresentative training data: a model trained on localized data produces localized (biased) output.
3. Poor-quality data: outliers and missing values.
4. Irrelevant features: garbage in, garbage out (even if our model is “AWESOME”, feeding it garbage data yields garbage).
5. Overfitting the training data: low bias, high variance (roughly, bias shows up as training-set error; variance shows up as the gap between training and test error).
6. Underfitting the training data: high bias, low variance.
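Overfitting (point 5) is easy to reproduce: training error keeps falling as the model gets more complex, while test error rises. The data, noise level, and degrees below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a quadratic signal plus noise (made up for illustration).
x_train = rng.uniform(-2, 2, 15)
y_train = 1.0 - x_train**2 + rng.normal(0, 0.3, x_train.size)
x_test = rng.uniform(-2, 2, 200)
y_test = 1.0 - x_test**2 + rng.normal(0, 0.3, x_test.size)

def errors(deg):
    # Fit a degree-`deg` polynomial on the training set, report train and test MSE.
    coeffs = np.polyfit(x_train, y_train, deg)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return tr, te

for deg in (1, 2, 12):
    tr, te = errors(deg)
    print(f"degree {deg:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Degree 1 underfits (high bias: both errors are high), degree 2 matches the signal, and degree 12 overfits (high variance: tiny training error, large test error).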
33. Statistical Learning
Structuring and visualizing data are important aspects of data science.
When the goal is to interpret the model and quantify the uncertainty in the data, this analysis is usually referred to as statistical learning.
There are two major goals for modeling data:
1) to accurately predict some future quantity of interest, given some observed data, and
2) to discover unusual or interesting patterns in the data.
34. Statistical Learning (cont.)
To achieve these goals, one must rely on knowledge from three important pillars of the mathematical sciences:
1. Function approximation: The most natural way to represent the relationship between variables is via a mathematical function or map. Data scientists therefore have to understand how best to approximate and represent functions using the least amount of computer processing and memory.
35. Statistical Learning (cont.)
2. Optimization: Given a class of mathematical models, we wish to find the best possible model in that class. This step usually requires knowledge of optimization algorithms and efficient computer coding or programming.
3. Probability and statistics: Knowledge of probability and statistics is needed to fit or train an algorithm and generate a model.
45. Predictions: In machine learning, predictions are made based on models or algorithms. These predictions are estimations or forecasts of certain outcomes, denoted ŷ (y-hat).
Response y: The actual or observed outcomes being predicted. In many cases this is the ground truth, the real data values corresponding to the situations for which predictions are made. This is denoted y.
46. Loss function: A loss function, also known as a cost function or objective function, is a mathematical function that quantifies the difference between the predicted values (ŷ) and the actual values (y). It measures how well or poorly the model is performing, and the goal is to minimize it. The loss function thus provides a quantitative measure of the model's performance.
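A sketch of what this looks like in code, with two common choices (squared loss for regression, zero-one loss for classification); the sample values are made up:

```python
# Two common loss functions in plain Python (notation: y = actual, yhat = predicted).
def squared_loss(y, yhat):
    return (y - yhat) ** 2

def zero_one_loss(y, yhat):
    # 0 if the prediction matches exactly, 1 otherwise (used for classification).
    return 0 if y == yhat else 1

# Averaging the loss over a dataset gives a single number quantifying performance.
ys    = [3.0, -0.5, 2.0, 7.0]   # actual values (made up)
yhats = [2.5,  0.0, 2.0, 8.0]   # model predictions (made up)
mse = sum(squared_loss(y, yh) for y, yh in zip(ys, yhats)) / len(ys)
print(mse)  # mean squared error: 0.375
```

Minimizing this average loss over the training data is exactly what "training" a model means.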
47. Mathematical function g: A model or algorithm represented by the mathematical function g(x). In the context of prediction, g(x) is used to estimate or predict the corresponding value y for a given input x.
Accurate predictions: The term "accurate predictions" implies that the mathematical function should provide outputs that closely match the true outcomes observed in the real world.
48. All possible pairs (x, y) in nature: Nature is diverse and complex, and the natural world presents a vast range of scenarios and data pairs. It is unrealistic to expect a single mathematical function to accurately predict the outcome for every conceivable combination of input and output values.
49. When dealing with complex real-world data, it is common to use a variety of models or a more sophisticated approach, such as machine learning, where the algorithm adapts and learns from the data to make predictions.
50. “Even with the same input x, the output y may be different.”
Example: weather forecasting from features such as location, temperature, humidity, wind speed, and atmospheric pressure.
Meteorologists acknowledge that, even with sophisticated models and accurate initial measurements, perfect predictability in weather forecasting is challenging due to the inherent complexity and variability of atmospheric processes.
51. Deterministic and Probabilistic
We adopt a probabilistic approach and assume that each pair (x, y) is the outcome of a random pair (X, Y) that has some joint probability density f(x, y). The relationship between the variables x and y is treated as the result of a random process (X, Y).
52. We then assess the predictive performance via the expected loss, usually called the risk for g:
ℓ(g) = 𝔼[Loss(Y, g(X))].
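Since the joint density is usually unknown, in practice the risk is approximated by averaging the loss over sampled pairs (Monte Carlo). A sketch under an assumed toy joint distribution; everything below (the distribution, the predictor g, the sample size) is made up for illustration:

```python
import random

random.seed(0)

def g(x):
    # Candidate predictor (happens to match the noiseless signal below).
    return 2.0 * x

def sample_pair():
    # Assumed toy joint distribution: X ~ N(0, 1), Y = 2X + N(0, 0.5) noise.
    x = random.gauss(0, 1)
    y = 2.0 * x + random.gauss(0, 0.5)
    return x, y

# Monte Carlo estimate of the risk ℓ(g) = E[Loss(Y, g(X))] with squared loss.
n = 100_000
risk = sum((y - g(x)) ** 2 for x, y in (sample_pair() for _ in range(n))) / n
print(risk)  # close to the irreducible noise variance, 0.25, for this g
```

With real data we cannot sample fresh pairs at will, so the same average over the observed examples (the empirical risk) stands in for ℓ(g).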
53. Our goal is thus to “learn” the unknown g∗ using the n examples in the training set T. Let us denote the learned function gT: the best approximation of g∗ that we can construct from T.
55. Tower Property
Consider random variables X, Y, and Z. The tower property states:
E[E[X | Y, Z]] = E[E[X | Y]] (both sides equal E[X]).
That is, the expectation of the conditional expectation of X given both Y and Z equals the expectation of the conditional expectation of X given only Y.
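The identity can be verified exactly on a small discrete example; the joint distribution below is arbitrary (made up), and exact rational arithmetic avoids any floating-point doubt:

```python
from itertools import product
from fractions import Fraction

# Hypothetical pmf over (X, Y, Z) in {0,1}^3: arbitrary weights normalized to sum to 1.
weights = [1, 2, 3, 1, 2, 1, 3, 3]
pmf = {xyz: Fraction(w, sum(weights))
       for xyz, w in zip(product([0, 1], repeat=3), weights)}

def E(h):
    # Expectation of h(x, y, z) under the joint pmf.
    return sum(p * h(x, y, z) for (x, y, z), p in pmf.items())

def cond_E_X(**fixed):
    # E[X | fixed], where fixed is e.g. y=0 or y=0, z=1.
    match = lambda y, z: all({"y": y, "z": z}[k] == v for k, v in fixed.items())
    num = sum(p * x for (x, y, z), p in pmf.items() if match(y, z))
    den = sum(p for (x, y, z), p in pmf.items() if match(y, z))
    return num / den

lhs = E(lambda x, y, z: cond_E_X(y=y, z=z))  # E[E[X | Y, Z]]
rhs = E(lambda x, y, z: cond_E_X(y=y))       # E[E[X | Y]]
ex  = E(lambda x, y, z: Fraction(x))         # E[X]
print(lhs, rhs, ex)  # all three coincide exactly
```

Both iterated expectations collapse to the plain expectation E[X], which is why the two sides of the identity agree.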
56. When the learner learns the relationship from labelled examples, this type of learning is called supervised learning.
In contrast, unsupervised learning makes no distinction between response and explanatory variables; the objective is simply to learn the structure of the unknown distribution of the data.
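A minimal sketch of unsupervised structure-finding: 1-D k-means clustering on made-up, unlabelled data, with no response variable anywhere:

```python
# Minimal 1-D k-means: unsupervised learning finds structure (clusters) in
# unlabelled data. The data values and initialization are made up for illustration.
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
centers = [data[0], data[-1]]          # deterministic initialization: the two extremes

for _ in range(10):
    # Assignment step: attach each point to its nearest center.
    clusters = [[], []]
    for p in data:
        clusters[min((0, 1), key=lambda k: abs(p - centers[k]))].append(p)
    # Update step: move each center to the mean of its assigned points.
    centers = [sum(c) / len(c) for c in clusters]

print(centers)  # converges to the two cluster means, [1.0, 11.0]
```

Nothing here plays the role of y; the algorithm only summarizes how the data are distributed.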
57-59. Debugging a learning algorithm
• You have built your awesome linear regression model predicting price
• It works perfectly on your test data
• Then it fails miserably on the new data you collected
• What to do now?
Source: Andrew Ng
60-61. Things You Can Try
Get more data
Try different features
Try tuning your hyperparameters
But which should I try first?
62-64. Diagnosing a Machine Learning System
Figure out what is wrong first.
Diagnosing your system takes time, but it can save you time as well.
Ultimate goal: low generalization error.
Source: reddit?
65. Problem: Failure to Generalize
The model does not generalize to unseen data: it fails to predict things that are not in the training sample. Pick a model that has lower generalization error.
68. Evaluate Your Hypothesis
[Figure: three fits of price ($) vs. size (ft): underfit, just right, and overfit.]
What if the feature dimension is too high?
Source: Andrew Ng
69-71. Model Selection
The model does not generalize to unseen data; it fails to predict things that are not in the training sample. Pick a model that has lower generalization error.
How do we evaluate generalization error?
Split your data into train, validation, and test sets, and use the test-set error as an estimator of the generalization error.
73. Model Selection
Training error, validation error, test error.
Procedure:
Step 1. Train on the training set.
Step 2. Evaluate the validation error.
Step 3. Pick the best model based on Step 2.
Step 4. Evaluate the test error.
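The four steps can be sketched end-to-end, here selecting a polynomial degree on synthetic data; the ground truth, noise level, and split sizes are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic data: quadratic ground truth plus noise (made up).
x = rng.uniform(-3, 3, 300)
y = 1.0 + 0.5 * x - 0.7 * x**2 + rng.normal(0, 0.5, x.size)

# Split into train / validation / test (60% / 20% / 20%).
idx = rng.permutation(x.size)
tr, va, te = idx[:180], idx[180:240], idx[240:]

def mse(deg, fit_idx, eval_idx):
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], deg)      # Step 1: train on the training set
    return np.mean((np.polyval(coeffs, x[eval_idx]) - y[eval_idx]) ** 2)

val_errors = {d: mse(d, tr, va) for d in range(1, 8)}     # Step 2: validation error per model
best = min(val_errors, key=val_errors.get)                # Step 3: pick the best model
test_error = mse(best, tr, te)                            # Step 4: estimate generalization error
print(best, test_error)
```

The test set is touched only once, after the model is chosen, so the reported test error is an unbiased estimate of generalization error.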
77. Linear Regression with Regularization
[Figure: three fits of price ($) vs. size (ft).]
Underfit: high bias; too simple; too much regularization.
Overfit: high variance; too complex; too little regularization.
Just right: in between.
Source: Andrew Ng
78-79. Bias / Variance Trade-off
[Figure: loss vs. degree of polynomial, showing training error and cross-validation error; high bias on the left (low degree), high variance on the right (high degree).]
Source: Andrew Ng
92. Learning Curve
Does adding more data help?
More data is likely to help when your model has high variance.
[Figure: two fits of price ($) vs. size (ft).]
93. Things You Can Try
Get more data: helps when you have high variance.
Try different features: adding features helps fix high bias; using a smaller set of features helps fix high variance.
Try tuning your hyperparameters: decrease regularization when bias is high; increase regularization when variance is high.
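The regularization dial in the last item can be sketched with closed-form ridge regression; the data, polynomial degree, and alpha values below are made up for illustration. Increasing alpha shrinks the weight vector, trading variance for bias:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 20 noisy samples, a degree-9 polynomial model (high variance risk).
x = rng.uniform(-1, 1, 20)
y = np.sin(2 * x) + rng.normal(0, 0.2, x.size)
X = np.vander(x, 10)  # degree-9 polynomial features

def ridge_fit(X, y, alpha):
    # Closed-form ridge regression: w = (X^T X + alpha * I)^(-1) X^T y.
    # Larger alpha -> more regularization -> smaller weights -> lower variance, higher bias.
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

for alpha in (0.0, 1e-2, 1e2):
    w = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:g}  ||w|| = {np.linalg.norm(w):.3f}")
```

In practice alpha itself is a hyperparameter, picked by the train/validation/test procedure from the model-selection slides.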