1. FUNDAMENTALS OF MACHINE
LEARNING (ML)
Alemu Kumilachew
Faculty of Computing, Bahir Dar University Institute
of Technology (BiT), Bahir Dar, Ethiopia
alemupilatose@gmail.com
2. OUTLINES OF THE COURSE
1 • Introduction to Machine Learning
2 • Concepts of Learning and its process
3 • Types of Learning and Machine learning methods
4 • Model Building
5 • Evaluation
6 • Applications & Current trends in machine learning
3. REFERENCES
TEXT BOOKS:
1) Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press,
Prentice Hall of India, 3rd Edition, 2014.
2) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations
of Machine Learning”, MIT Press, 2012.
3) Stephen Marsland, “Machine Learning: An Algorithmic Perspective”,
Second Edition, 2015.
4) Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
REFERENCE BOOKS:
1) Charu C. Aggarwal, “Data Classification: Algorithms and Applications”,
CRC Press, 2014.
2) Charu C. Aggarwal, “Data Clustering: Algorithms and
Applications”, CRC Press, 2014.
3) Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The
MIT Press, 2012.
4) Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts
and Techniques”, 3rd Edition, Morgan Kaufmann Publications, 2012.
5. OBJECTIVES OF LEARNING ML?
The main objectives of learning machine learning are to
know and understand:
What are the different types of machine learning?
What are the different algorithms available for developing
machine learning models?
What tools are available for developing these models?
What are the programming language choices?
What platforms support development and deployment of
Machine Learning applications?
What IDEs (Integrated Development Environments) are
available?
How to quickly upgrade your skills in this important area?
7. AI VS. MACHINE LEARNING – “LEARN”
Machine learning (ML) is a subset of artificial
intelligence (AI) that is all about getting an AI to
accomplish tasks without being given specific
instructions. In essence, it’s about teaching
machines how to learn!
8. AI VS. MACHINE LEARNING – “LEARNING”
AI is simulated human cognition, which is supposed to come about
via learning!
What is learning?
Were we born with PhD level intelligence?
Of course not!
At the beginning of our lives, we have little
understanding of the world around us, but over
time we grow to learn a lot. We use our senses to
take in data, and learn via a combination of interacting
with the world around us, being explicitly taught
certain things by others, finding patterns over time,
and, of course, lots of trial-and-error.
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
9. AI VS. MACHINE LEARNING – “LEARNING”
AI learns in a similar way. When it’s first created, an
AI knows nothing; ML gives AI the ability to learn
about its world.
AI is all about allowing a system to learn from
examples rather than instructions. ML is what
makes that possible.
10. AI VS. MACHINE LEARNING – “LEARNING”
AIs are taught, not explicitly programmed. In other words,
instead of spelling out specific rules to solve a problem, we
give them examples of what they will encounter in the real
world and let them find the patterns themselves. Letting
machines find patterns is preferable to spelling out the
instructions when the instructions are hard to state or unknown, or
when the data has many different variables, as in
treating cancer or predicting the stock market.
11. WHAT IS MACHINE LEARNING?
Arthur Samuel (at IBM) coined the term “Machine
Learning” in 1959.
He defined machine learning as:
“the field of study that gives computers the ability to
learn without being explicitly programmed.”
There is no universally accepted definition of ML;
different authors define the term differently.
12. DEFINITION OF ML
Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
ML creates a model defined up to some parameters, and learning is
the execution of a computer program to optimize the parameters of the
model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data, or both.
A model:
is a compressed version of a database;
extracts knowledge from it;
does not have perfect performance but is a useful approximation to the
data.
Machine learning (ML) is defined as a discipline of artificial intelligence
(AI) that provides machines the ability to automatically learn from data
and past experiences to identify patterns and make predictions with
minimal human intervention.
13. DEFINITION OF ML?
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
improve their performance P
at some task T
with experience E.
A well-defined learning task is given by <P,T,E>
A computer program which learns from experience is
called a machine learning program or simply a learning
program. Such a program is sometimes also referred to
as a learner.
14. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
1. “Big data” - models are based on huge amounts of data which is
being produced and stored continuously
science: genomics, astronomy, materials science, particle accelerators. . .
sensor networks: weather measurements, traffic. . .
people: social networks, blogs, mobile phones, purchases, bank
transactions. . . etc
2. Data is not random; it contains structure that can be used to
predict outcomes, or gain knowledge in some way.
Ex: patterns of Amazon purchases can be used to recommend items.
3. It is more difficult to design algorithms for such tasks
(compared to, say, sorting an array or calculating a payroll). Such
algorithms need data.
Ex: construct a spam filter, using a collection of email messages labelled as
spam/not spam.
4. Learning isn’t always useful:
There is no need to “learn” to calculate payroll
5. Data mining – extracting useful knowledge/insights from data
Ex: Data mining is designed to extract the rules via the application of
ML methods from large databases.
15. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
Example #1:
A classic example of a task that requires machine
learning: It is very hard to say what makes a 2.
16. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
Example #2: House price prediction
After plotting the known data points on an XY plot, we draw a best-fit line
to make predictions for any other house given its size. You feed
the known data to the machine and ask it to find the best-fit line. Once
the machine has found the best-fit line, you test its suitability by
feeding in a known house size, i.e. the Y-value in the above curve. The
machine then returns the estimated X-value, i.e. the expected price
of the house.
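As a minimal sketch of the best-fit idea, the code below fits a straight line by ordinary least squares and uses it to estimate a price. The data points, the units, and the function names are hypothetical, and the sketch simply treats house size as the input and price as the output, whichever axis each was drawn on.

```python
# Hypothetical (size, price) pairs, made up purely for illustration.
sizes = [50.0, 70.0, 90.0, 110.0, 130.0]      # e.g. square metres
prices = [150.0, 200.0, 250.0, 300.0, 350.0]  # e.g. thousands


def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(sizes, prices)


def predict_price(size):
    """Estimate the price of a house from its size using the fitted line."""
    return m * size + c


print(predict_price(100.0))  # 275.0
```

With real data the points would not lie exactly on a line, so the fitted line would only approximate the prices.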
17. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
Example #3:
Some more examples of tasks that are best solved by
using a learning algorithm
Recognizing patterns:
Facial identities or facial expressions
Handwritten or spoken words
Medical images
Generating patterns:
Generating images or motion sequences
Recognizing anomalies:
Unusual credit card transactions
Unusual patterns of sensor readings in a nuclear power plant
Prediction:
Future stock prices or currency exchange rates
20. WHAT IS MACHINE LEARNING?- TRADITIONAL
PROGRAMMING VS. MACHINE LEARNING
21. WHAT IS MACHINE LEARNING?- WHEN DO WE USE
MACHINE LEARNING?
ML is used when:
Human expertise does not exist (navigating on Mars)
Humans can’t explain their expertise (speech recognition)
Models must be customized (personalized medicine)
Solution needs to be adapted to particular cases (user biometrics)
Models are based on huge amounts of data (genomics)
Solution changes in time (routing on a computer network)
22. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Autonomous cars
23. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Autonomous car sensors
24. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Autonomous car technologies
25. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Deep learning emergence
26. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Deep Belief Net on Face Images
27. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Learning of Object Parts
28. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Training on Multiple Objects
29. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Automatic Speech recognition systems
30. STATE OF THE ART APPLICATIONS OF MACHINE
LEARNING
Speech technologies
31. HISTORY OF ML
1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
32. HISTORY OF ML…
1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
33. HISTORY OF ML…
2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics,
Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc
– ???
34. APPLICATION OF MACHINE LEARNING
The following is a list of some of the typical applications of machine learning.
1. In retail business, machine learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit
applications, fraud detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and
troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and
maximizing the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be
analyzed fast enough by computers. The World Wide Web is huge; it is constantly
growing and searching for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to
changes so that the system designer need not foresee and provide solutions for all
possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and
robotics.
9. Machine learning methods are applied in the design of computer-controlled
vehicles to steer correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for playing
games such as chess, backgammon and Go.
35. CHAPTER SUMMARY
Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
Learning means deriving general models from data of particular examples.
Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
Example in retail: Customer transactions to consumer
behavior:
People who bought “Da Vinci Code” also bought “The Five
People You Meet in Heaven” (www.amazon.com)
Machine Learning builds a model that is a good and
useful approximation to the data.
37. LEARNING
Definition
A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks T, as measured by
P, improves with experience E.
38. LEARNING …
Examples: defining a learning task
I. Handwriting recognition learning problem
T: Recognizing and classifying handwritten words within images
P: Percent of words correctly classified
E: A dataset of handwritten words with given classifications
II. A robot driving learning problem
T: Driving on highways using vision sensors
P: Average distance traveled before an error
E: A sequence of images and steering commands recorded
while observing a human driver
III. A chess learning problem
T: Playing chess
P: Percent of games won against opponents
E: Playing practice games against itself.
IV. Spam filtering
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
39. COMPONENTS OF LEARNING
Basic components of learning process
The learning process, whether by a human or a machine, can
be divided into four components, namely, data storage,
abstraction, generalization and evaluation.
Fig. Components of the learning process
40. COMPONENTS OF LEARNING PROCESS
Data storage (1)
Facilities for storing and retrieving huge amounts of data are
an important component of the learning process. Humans and
computers alike utilize data storage as a foundation for
advanced reasoning.
In a human being, the data is stored in the brain and data is retrieved
using electrochemical signals.
Computers use hard disk drives, flash memory, random access
memory and similar devices to store data and use cables and other
technology to retrieve data.
41. COMPONENTS OF LEARNING PROCESS …
Abstraction (2)
The second component of the learning process is known as
abstraction.
Abstraction is the process of extracting knowledge about stored
data. This involves creating general concepts about the data as a
whole. The creation of knowledge involves application of known
models and creation of new models.
The process of fitting a model to a dataset is known as training.
When the model has been trained, the data is transformed into an
abstract form that summarizes the original information.
42. COMPONENTS OF LEARNING PROCESS …
Generalization (3)
The third component of the learning process is known as
generalization.
The term generalization describes the process of turning the
knowledge about stored data into a form that can be utilized for
future action.
These actions are to be carried out on tasks that are similar, but
not identical, to those that have been seen before.
In generalization, the goal is to discover those properties of the
data that will be most relevant to future tasks.
43. COMPONENTS OF LEARNING PROCESS …
Evaluation (4)
Evaluation is the last component of the learning process. It is the
process of giving feedback to the user to measure the utility of the
learned knowledge.
This feedback is then utilized to effect improvements in the whole
learning process.
44. LEARNING MODELS
Machine learning is concerned with using the right features to
build the right models that achieve the right tasks.
For a given problem, the collection of all possible outcomes
represents the sample space or instance space.
Learning models can be divided into three
categories:
Using a logical expression (Logical models)
Using the geometry of the instance space (Geometric models)
Using probability to classify the instance space (Probabilistic
models)
Grouping and Grading (an orthogonal categorization to
geometric-probabilistic-logical-compositional)
45. LEARNING MODELS : LOGICAL MODELS
Logical models use a logical expression to divide the instance
space into segments and hence construct grouping models.
A logical expression is an expression that returns a Boolean
value, i.e., a True or False outcome.
Once the data is grouped using a logical expression, the data
is divided into homogeneous groupings for the problem we
are trying to solve.
For example, for a classification problem, all the instances in
the group belong to one class.
46. LEARNING MODELS : LOGICAL MODELS …
There are mainly two kinds of logical models: Tree models
and Rule models.
Rule models consist of a collection of implications or IF-THEN
rules.
For tree-based models, the ‘if-part’ defines a segment and the
‘then-part’ defines the behaviour of the model for this segment.
Rule models follow the same reasoning.
In logical models, such as decision trees, a logical expression is
used to partition the instance space. Two instances are similar
when they end up in the same logical segment.
47. LEARNING MODELS : LOGICAL MODELS …
Example:
“Enjoy Sport” is defined by a set of data from some example days. Each day is
described by six attributes. The task is to learn to predict the value of Enjoy Sport for an
arbitrary day based on the values of its attributes. The problem can be represented by a
series of hypotheses. Each hypothesis is described by a conjunction of constraints on the
attributes. The training data represents a set of positive and negative examples of the target
function. In this example, each hypothesis is a vector of six constraints, specifying the
values of the six attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training
phase involves learning the set of days (as a conjunction of attributes) for which Enjoy Sport =
yes.
Thus, the problem can be formulated as:
Given instances X which represent a set of all possible days, each described by the attributes:
o Sky – (values: Sunny, Cloudy, Rainy),
o AirTemp – (values: Warm, Cold),
o Humidity – (values: Normal, High),
o Wind – (values: Strong, Weak),
o Water – (values: Warm, Cold),
o Forecast – (values: Same, Change).
Q. Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
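One hedged way to make the hypothesis representation concrete is sketched below. A hypothesis is a tuple of six constraints, and "?" means any value is accepted (a notation assumed here, borrowed from Mitchell's textbook). The four training days are hypothetical stand-ins, and the generalization loop is the simple Find-S procedure, used as one possible learner rather than a method these slides prescribe.

```python
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

# Hypothetical training days: (attribute values, Enjoy Sport yes/no as a bool).
training_days = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]


def find_s(examples):
    """Return the most specific conjunction consistent with the positive examples."""
    hypothesis = None
    for day, enjoy in examples:
        if not enjoy:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(day)  # start from the first positive example
        else:
            # Keep a constraint only where the new positive example agrees.
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, day)]
    return tuple(hypothesis) if hypothesis is not None else None


def predicts_yes(hypothesis, day):
    """True when the day satisfies every constraint of the conjunction."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, day))


learned = find_s(training_days)
print(learned)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(predicts_yes(learned, ("Sunny", "Warm", "Normal", "Strong", "Cold", "Same")))  # True
print(predicts_yes(learned, ("Rainy", "Cold", "High", "Strong", "Warm", "Change")))  # False
```

The learned tuple reads as "Sky = Sunny AND AirTemp = Warm AND Wind = Strong", which is a function mapping any day to yes (1) or no (0).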
48. LEARNING MODELS : GEOMETRIC MODELS …
In Geometric models, features could be described as points in
two dimensions (x- and y-axis) or a three-dimensional space
(x, y, and z).
For example, temperature as a function of time can be modelled in
two axes.
In geometric models, there are two ways we could impose
similarity.
We could use geometric concepts like lines or planes to segment
(classify) the instance space. These are called Linear models.
Alternatively, we can use the geometric notion of distance to
represent similarity. In this case, if two points are close together,
they have similar values for features and thus can be classed as
similar. We call such models distance-based models.
49. LEARNING MODELS : GEOMETRIC MODELS
Linear models
Linear models are relatively simple. In this case, the function is
represented as a linear combination of its inputs.
In the simplest case where f(x) represents a straight line, we have
an equation of the form f (x) = mx + c where c represents the
intercept and m represents the slope.
Linear models are parametric, which means
that they have a fixed form with a small
number of numeric parameters that need to be
learned from data. For example, in f (x) = mx
+ c, m and c are the parameters that we are
trying to learn from the data. This technique is
different from tree or rule models, where the
structure of the model (e.g., which features to
use in the tree, and where) is not fixed in
advance.
50. LEARNING MODELS : GEOMETRIC MODELS
Distance-based models
As the name implies, distance-based models work on the concept of
distance. In the context of Machine learning, the concept of distance is
not based on merely the physical distance between two points.
The distance metrics commonly used are the Euclidean and Manhattan
distances.
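The two metrics named above can be sketched in a few lines. The function names and the sample points are illustrative assumptions only.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


p = (1.0, 2.0)
q = (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

The same two points are 5 units apart under one metric and 7 under the other, which is why the choice of metric can change which points count as neighbours.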
51. LEARNING MODELS : GEOMETRIC MODELS
Distance-based models
Distance is applied through the concept of neighbors and exemplars.
Neighbors are points in proximity with respect to the distance measure
expressed through exemplars.
Exemplars are either centroids that find a center of mass according to a chosen
distance metric or medoids that find the most centrally located data point.
The most commonly used centroid is the arithmetic mean, which
minimizes squared Euclidean distance to all other points.
Notes:
The centroid represents the geometric center of a plane figure, i.e., the arithmetic
mean position of all the points in the figure. This
definition extends to any object in n-dimensional space: its centroid is the mean
position of all the points.
Medoids are similar in concept to means or centroids. Medoids are most
commonly used on data when a mean or centroid cannot be defined. They are
used in contexts where the centroid is not representative of the dataset, such as
in image data.
Examples of distance-based models include the nearest-neighbour
models, which use the training data as exemplars – for example, in
classification. The K-means clustering algorithm also uses exemplars to
create clusters of similar data points.
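To illustrate the difference between the two kinds of exemplar, the sketch below computes a centroid (the coordinate-wise mean) and a medoid (the actual data point with the smallest total distance to all points) for a small, made-up point set. Euclidean distance is assumed, and all names are hypothetical.

```python
import math

# Hypothetical 2-D points; the last one is an outlier.
points = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (2.0, 2.0), (10.0, 10.0)]


def centroid(pts):
    """Arithmetic mean position; it need not coincide with any data point."""
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))


def medoid(pts):
    """The data point whose summed Euclidean distance to all points is smallest."""
    def total_distance(candidate):
        return sum(math.dist(candidate, other) for other in pts)
    return min(pts, key=total_distance)


print(centroid(points))  # roughly (3.6, 3.0), pulled toward the outlier
print(medoid(points))    # (2.0, 1.0), an actual member of the data set
```

Because the medoid must be one of the observed points, it is less affected by the outlier than the mean is.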
52. LEARNING MODELS : PROBABILISTIC MODELS
Probabilistic models use the idea of probability to classify new
entities.
Probabilistic models see features and target variables as random
variables. The process of modelling represents and manipulates the
level of uncertainty with respect to these variables.
There are two types of probabilistic models: Predictive and
Generative.
Predictive probability models use the idea of a conditional probability
distribution P (Y |X) from which Y can be predicted from X.
Generative models estimate the joint distribution P (Y, X). Once we know
the joint distribution for the generative models, we can derive any
conditional or marginal distribution involving the same variables. Thus,
the generative model is capable of creating new data points and their
labels, knowing the joint probability distribution. The joint distribution
looks for a relationship between two variables. Once this relationship is
inferred, it is possible to infer new data points.
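The relationship between the joint, marginal and conditional distributions can be sketched with a small table. The joint probabilities below are invented for illustration, and the function names are assumptions. The sketch derives P(X) and P(Y | X) from P(Y, X), then draws a new (Y, X) pair from the joint, which is the generative step described above.

```python
import random

# Hypothetical joint distribution P(Y, X): Y is an activity, X is the weather.
joint = {
    ("play", "sunny"): 0.40,
    ("stay", "sunny"): 0.10,
    ("play", "rainy"): 0.05,
    ("stay", "rainy"): 0.45,
}


def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint over all values of Y."""
    return sum(p for (_, xi), p in joint.items() if xi == x)


def conditional_y_given_x(joint, y, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)."""
    return joint[(y, x)] / marginal_x(joint, x)


def sample(joint, rng):
    """Draw one new (Y, X) pair according to the joint distribution."""
    outcomes = list(joint)
    weights = [joint[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


print(marginal_x(joint, "sunny"))                     # about 0.5
print(conditional_y_given_x(joint, "play", "sunny"))  # about 0.8
print(sample(joint, random.Random(0)))                # one generated (label, input) pair
```

A predictive model would store only the conditional P(Y | X), so it could classify but not generate new inputs.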
53. LEARNING MODELS : PROBABILISTIC MODELS
Naïve Bayes
Naïve Bayes is an example of a probabilistic classifier. It classifies
using the Bayes rule, defined as
P (Y |X) = P (X |Y) P (Y) / P (X)
The Naïve Bayes algorithm is based on the idea of Conditional
Probability. Conditional probability is based on finding the
probability that something will happen, given that something else
has already happened. The task of the algorithm then is to look at
the evidence and to determine the likelihood of a specific class
and assign a label accordingly to each entity.
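A minimal Naïve Bayes sketch follows, assuming a toy spam/ham task. The training messages are made up, the helper names are hypothetical, and add-one (Laplace) smoothing is an assumption introduced here so that unseen words do not zero out a class. Each class score is the prior P(Y) multiplied by the per-word likelihoods P(word | Y); dividing by the sum of the scores plays the role of the evidence P(X).

```python
from collections import Counter, defaultdict

# Hypothetical labelled messages, invented for illustration.
training = [
    ("win money now", "spam"),
    ("win a prize", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon", "ham"),
]


def train(examples):
    """Count how often each class and each (class, word) pair occurs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def posterior(text, class_counts, word_counts, vocabulary):
    """Return P(class | text) for every class, treating words as independent."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(Y)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Likelihood P(word | Y) with add-one smoothing (an assumption).
            score *= (word_counts[label][word] + 1) / (n_words + len(vocabulary))
        scores[label] = score
    evidence = sum(scores.values())  # normalizer, proportional to P(X)
    return {label: s / evidence for label, s in scores.items()}


model = train(training)
probabilities = posterior("win money", *model)
print(max(probabilities, key=probabilities.get))  # spam
```

The label assigned to a message is simply the class with the highest posterior probability.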
54. SUMMARY OF LEARNING MODELS
Logical models use a logical expression to partition the instance space.
Geometric models (such as distance-based models) use the idea of distance
(e.g., Euclidean distance) to classify entities.
Probabilistic models use the idea of probability to classify new entities.
Learning models
Geometric models: K-nearest neighbors, linear regression, support vector
machine, logistic regression, …
Probabilistic models: Naïve Bayes, Gaussian process regression, conditional
random field, …
Logical models: Decision tree, random forest, …
55. DESIGNING A LEARNING SYSTEM
For any learning system, we must know the three elements: T
(Task), P (Performance Measure), and E (Training Experience).
At a high level, the learning process looks as follows.
56. DESIGNING A LEARNING SYSTEM
The learning process starts with a task T, a performance measure P and
training experience E, and the objective is to find an unknown target
function.
The target function is the exact knowledge to be learned from the
training experience, and it is unknown.
For example, in the case of credit approval, the learning system will
have customer application records as experience, and the task would be
to classify whether a given customer application is eligible for a
loan.
So in this case, the training examples can be represented as
(x1, y1), (x2, y2), ..., (xn, yn), where x represents the customer application
details and y represents the status of credit approval.
With these details, what is the exact knowledge to be learned
from the training experience?
The target function to be learned in the credit approval learning
system is a mapping function f: X → y. This function represents the
exact knowledge defining the relationship between the input variable
X and the output variable y.
57. DESIGNING A LEARNING SYSTEM
Having looked at the learning process and the goal of learning, we can
now design a learning system that follows it. Doing so involves a few
design choices, which decide the following key components:
1. Choose the training experience
2. Choose exactly what is to be learned (the target function)
3. Choose how to represent the target function
4. Choose a learning algorithm to infer the target function from the
experience
5. The final design
58. DESIGNING A LEARNING SYSTEM
Example:
We will look at the checkers learning problem
and apply the above design choices.
For a checkers learning problem, the three elements will
be:
1. Task T: To play checkers
2. Performance measure P: Percent of games won in the
tournament
3. Training experience E: A set of games played against itself
60. SUPERVISED LEARNING: OVERVIEW
Labels are provided
SL is also called learning from exemplars.
Supervised learning is a type of machine learning that uses
labeled data to train machine learning models. In labeled
data, the output is already known. The model just needs to
map the inputs to the respective outputs.
A supervised machine learning algorithm analyzes the labeled
training data and builds a function/model, which can then be
used to map new examples (unseen instances) to their target
outputs (for example, class labels).
SL has this form:
Given (x1, y1), (x2, y2), ..., (xn, yn)
The algorithm learns a function f(x) to predict y given x.
61. SUPERVISED LEARNING: OVERVIEW
Example#1:
Suppose the data consists of the gender and age of
patients, and each patient is labeled as “healthy” or “sick”.
Q. What will be the role of the supervised machine learning
algorithm in the above example?
A. The purpose of a supervised machine learning
algorithm here is to learn from the above data and build a
function/model that identifies any new/unseen patient as
“sick” or “healthy” based on their age and gender.
62. SUPERVISED LEARNING: OVERVIEW
Example#2:
An example of supervised learning is to train a system that
identifies the image of an animal.
63. SUPERVISED LEARNING: WHY “SUPERVISED LEARNING”?
Supervised Learning methods need external supervision to
train machine learning models. They need guidance and
additional information to return the desired result.
It can be thought of as a teacher supervising the learning
process. We know the correct answers (that is, the correct
outputs); the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops
when the algorithm achieves an acceptable level of
performance.
64. SUPERVISED LEARNING: TYPES OF SL PROBLEMS
Classification and regression problems are the most
common types of supervised learning problems.
65. SUPERVISED LEARNING: CLASSIFICATION
Classification: the labels to be predicted are categorical.
Works by pattern recognition
Face recognition
Optical character recognition: different styles, slant. . .
Credit scoring: classify customers
into high- and low-risk, based
on their income and savings,
using data about past loans
(whether they were paid or not).
Model: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
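The rule above can be written directly as code. In this sketch the thresholds θ1 and θ2 are chosen by a simple search over values seen in a small, hypothetical set of past loans. The data, the function names and the search strategy are all illustrative assumptions, not the method of any particular credit-scoring system.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the rule: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical past loans: (income, savings, observed label).
past_loans = [
    (20, 5, "high-risk"),
    (25, 30, "high-risk"),
    (60, 4, "high-risk"),
    (55, 25, "low-risk"),
    (70, 40, "low-risk"),
    (90, 20, "low-risk"),
]


def accuracy(theta1, theta2, loans):
    """Fraction of past loans that the rule labels correctly."""
    correct = sum(credit_risk(i, s, theta1, theta2) == label for i, s, label in loans)
    return correct / len(loans)


def fit_thresholds(loans):
    """Try every observed income/savings value as a threshold; keep the best pair."""
    incomes = sorted({i for i, _, _ in loans})
    savings = sorted({s for _, s, _ in loans})
    candidates = [(t1, t2) for t1 in incomes for t2 in savings]
    return max(candidates, key=lambda t: accuracy(t[0], t[1], loans))


theta1, theta2 = fit_thresholds(past_loans)
print(theta1, theta2)                       # 25 4
print(credit_risk(65, 15, theta1, theta2))  # low-risk
```

Learning here means nothing more than choosing the two parameters of a fixed rule so that it agrees with the labelled examples.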
66. SUPERVISED LEARNING: CLASSIFICATION …
Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is categorical == classification
67. SUPERVISED LEARNING: REGRESSION …
Regression: the labels to be predicted are continuous
Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is real-valued == regression
68. SUPERVISED LEARNING: REGRESSION …
Example:
Predict the price of a car from its mileage.
69. SUPERVISED LEARNING: ALGORITHMS
A wide range of supervised learning algorithms are
available, each with its strengths and weaknesses. There is
no single learning algorithm that works best on all
supervised learning problems.
Some of the most widely used supervised learning
algorithms are:
Linear Regression
Logistic Regression
Support Vector Machine
K Nearest Neighbor
Decision Tree
Random Forest
Naive Bayes
70. SUPERVISED LEARNING: APPLICATIONS
Supervised learning algorithms are generally used
for solving classification and regression problems.
A few of the top supervised learning applications are
weather prediction, sales forecasting, and stock price
analysis.
71. UNSUPERVISED LEARNING
Unsupervised learning is a type of machine learning that uses
unlabeled data to train machines. It works by finding patterns
and trends in the data, without being told the output. So,
the model tries to group the data based on the features of the input
data.
In unsupervised learning algorithms, a classification or
categorization (labels/classes) is not included in the observations.
But instead the algorithm tries to identify similarities between the
inputs so that inputs that have something in common are
categorized together.
The training process used in “unsupervised learning” techniques
does not need any supervision to build models. They learn on
their own and predict the output.
72. UNSUPERVISED LEARNING …
Example #1
Here, we take unlabeled input data, which means it is not categorized
and the corresponding outputs are not given. This unlabeled input data is
fed to the machine learning model in order to train it. The model first interprets
the raw data to find hidden patterns, and then applies a
suitable algorithm such as k-means clustering or hierarchical clustering. The
algorithm divides the data objects into
groups according to the similarities and differences between the objects.
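As a rough sketch of how a clustering algorithm divides unlabeled objects into groups, the code below implements a bare-bones k-means loop. The points, the choice of initial centroids and the fixed iteration count are assumptions made for illustration. Practical implementations usually pick initial centroids more carefully and stop once the assignments no longer change.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled 2-D data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial = [points[0], points[3]]  # assumption: two points picked as starting centroids

centroids, clusters = kmeans(points, initial)
print(clusters[0])  # [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5)]
print(clusters[1])  # [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
```

No labels were supplied at any point; the two groups emerge purely from the distances between the objects.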
73. UNSUPERVISED LEARNING …
Example #2:
An example of an unsupervised learning
technique is one that uses images of vehicles to separate buses
from trucks. The model learns by identifying the parts of a vehicle,
such as the length and width of the vehicle, the front and rear end
covers, roof hoods, the types of wheels used, etc. Based on these
features, the model groups similar vehicles together, so that buses and
trucks end up in different groups.
74. UNSUPERVISED LEARNING …
Example#3:
Consider the following data regarding patients entering a
clinic. The data consists of the gender and age of the
patients.
Q. Based on this data, can we infer anything regarding the
patients entering the clinic?
76. UNSUPERVISED LEARNING: APPLICATIONS
Unsupervised learning is used for solving
clustering and association problems.
Learning associations:
Basket analysis: let p(Y |X) = “probability that a customer who buys
product X also buys product Y”, estimated from past purchases. If p(Y |X)
is large (say 0.7), associate “X → Y”. When someone buys X, recommend
them Y.
Clustering: group similar data points/instances.
Density estimation: where are data points likely to lie?
Dimensionality reduction: data lies in a low-dimensional manifold.
Feature selection: keep only useful features.
Outlier/novelty detection
Customer segmentation: based on customer behavior, likes,
dislikes, and interests, you can segment and cluster similar
customers into a group.
Image compression: Color quantization
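The basket-analysis estimate of p(Y |X) listed above can be sketched as a simple count over past transactions. The baskets, product names and function name below are hypothetical, and the 0.7 cut-off is the illustrative value used in the text.

```python
# Hypothetical past purchases; each basket is the set of products in one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def conditional_probability(transactions, x, y):
    """Estimate p(Y | X): among the baskets containing x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0  # x was never bought, so there is no evidence either way
    return sum(1 for t in with_x if y in t) / len(with_x)


p = conditional_probability(transactions, "bread", "butter")
print(p)  # 0.75
if p >= 0.7:
    print("associate bread -> butter")  # recommend butter to bread buyers
```

Note that the estimate is not symmetric: p(bread | butter) is computed over a different set of baskets and can differ from p(butter | bread).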
79. UNSUPERVISED LEARNING: ALGORITHMS
Selecting the right algorithm depends on the type of
problem you are trying to solve. Some of the common
unsupervised learning algorithms are:
K-Means Clustering
Hierarchical Clustering
DBSCAN
Principal Component Analysis (PCA)
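As an illustration of the first algorithm in the list, here is a minimal sketch of k-means clustering in plain Python. The data points, the fixed number of iterations and the random seed are assumptions chosen for the example; a real project would normally use a library implementation.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:                        # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centroids, clusters

# Hypothetical unlabeled 2-D data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are used anywhere: the grouping comes only from the distances between the points.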
80. SEMI-SUPERVISED LEARNING
Labels are provided for only some of the data points.
It is a branch of machine learning that combines a small
amount of labeled data with a large amount of unlabeled
data during training.
Semi-supervised learning falls between unsupervised
learning (with no labeled training data) and supervised
learning (with only labeled training data).
81. SEMI-SUPERVISED LEARNING: HOW SEMI-SUPERVISED LEARNING WORKS
Semi-supervised machine learning is a combination
of supervised and unsupervised learning. It uses a small amount of
labeled data and a large amount of unlabeled data, which provides
the benefits of both unsupervised and supervised learning while
avoiding the challenge of finding a large amount of labeled data.
That means you can train a model to label data without having to
use as much labeled training data.
Here’s how it works:
1. Train the model with the small amount of labeled training data, just as
you would in supervised learning, until it gives you good results.
2. Then use it on the unlabeled training dataset to predict the outputs.
These are pseudo labels, since they may not be entirely accurate.
3. Combine the labels from the labeled training data with the pseudo labels
created in the previous step.
4. Combine the data inputs in the labeled training data with the inputs in the
unlabeled data.
5. Then train the model on the combined set in the same way as you did with
the labeled set in the beginning, in order to decrease the error and improve
the model’s accuracy.
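The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the classifier is a simple nearest-centroid model chosen for brevity (the slides do not prescribe one), and the data, the labels "small" and "large", and the function names are hypothetical.

```python
def fit_centroids(samples, labels):
    """Train a nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label of the closest class centroid."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

# Hypothetical data: two labeled points and five unlabeled points
labeled_x = [(1.0, 1.0), (9.0, 9.0)]
labeled_y = ["small", "large"]
unlabeled_x = [(1.5, 2.0), (2.0, 1.0), (8.0, 9.5), (9.5, 8.0), (4.0, 4.5)]

model = fit_centroids(labeled_x, labeled_y)            # step 1: train on labeled data
pseudo_y = [predict(model, x) for x in unlabeled_x]    # step 2: predict pseudo labels
combined_y = labeled_y + pseudo_y                      # step 3: combine the labels
combined_x = labeled_x + unlabeled_x                   # step 4: combine the inputs
model = fit_centroids(combined_x, combined_y)          # step 5: retrain on everything

print(pseudo_y)
print(predict(model, (3.0, 3.0)))
```

Because pseudo labels can be wrong, practical systems often keep only the predictions the model is most confident about; that refinement is left out here.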
82. SEMI-SUPERVISED LEARNING: APPLICATIONS
Text document classifier: semi-supervised learning suits this type
of situation because labeling a large number of text documents by
hand is slow and expensive, while unlabeled text is plentiful.
Classification of content on the Internet: the Internet
is a vast trove of web pages, and it cannot be expected that
every page will be labeled with all the fields that you need.
However, over the years, some minority of web pages will have
been labeled along one dimension or another.
83. SEMI-SUPERVISED LEARNING: ASSUMPTIONS
Semi-supervised methods must make some assumption about
the data in order to justify using a small set of labeled data to
make conclusions about the unlabeled data points. These can
be grouped into three categories.
1. The first is the continuity assumption. This assumes that data
points that are “close” to each other are more likely to have a
common label.
2. The second is the cluster assumption. This assumes that the
data naturally forms discrete clusters, and that points in the same
cluster are more likely to share a label.
3. The third is the manifold assumption. This assumes that the
data roughly lies in a lower-dimensional space (or manifold) than
the input space. This scenario is relevant when an unobservable
or difficult-to-observe system with a small number of parameters
produces high-dimensional observable output.
84. REINFORCEMENT LEARNING
Reinforcement learning sits somewhere between supervised and unsupervised
learning. The algorithm is told when its answer is wrong, but not how to
correct it. It has to explore and try out different possibilities until it
works out how to get the answer right.
Reinforcement learning is sometimes called learning with a critic, because
the monitor scores the answer but does not suggest improvements.
There is no supervised output, only a (possibly delayed) reward.
Given a sequence of states and actions with rewards, the task is to output
a policy: a way of choosing actions that reaches a goal.
A policy is a mapping from states → actions that tells you what to do in a
given state.
Policies: what action should an agent take in a particular situation?
Utility estimation: how good is a state (used by the policy)?
85. REINFORCEMENT LEARNING: HOW IT WORKS
Reinforcement learning follows a trial-and-error method to get the
desired result. After accomplishing a task, the agent receives a
reward. An example is training a dog to catch a ball: if the
dog catches the ball, you give it a reward, such as a biscuit.
Reinforcement learning methods do not need labeled examples to
train models; the only feedback is the reward signal.
Reinforcement learning problems are reward-based. For every task
or for every step completed, the agent receives a reward. If the
task is not achieved correctly, a penalty is added.
86. REINFORCEMENT LEARNING: THE AGENT-ENVIRONMENT INTERFACE
Reinforcement learning trains a machine to take suitable actions that
maximize its reward in a particular situation. An agent acts in an
environment, and the environment responds with a new state and a reward.
The agent has a start state and an end state, but there may be different
paths for reaching the end state, as in a maze. In this learning technique,
there is no predefined target variable.
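The agent-environment loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one, so this choice is an assumption). The environment is a hypothetical five-cell corridor in which the agent starts in cell 0 and the end state is cell 4; the reward values and learning settings are also illustrative.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the end state (goal)
ACTIONS = (-1, +1)    # action index 0 = move left, index 1 = move right

def step(state, action_index):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01      # small penalty for every extra step
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # estimated value of each (state, action)
    for _episode in range(episodes):
        state = 0
        for _step in range(100):                # cap the episode length
            if rng.random() < epsilon:          # explore: try a random action
                action = rng.randrange(2)
            else:                               # exploit: take the best known action
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy: the preferred action index in each non-terminal state
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the rewards alone.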
87. REINFORCEMENT LEARNING: EXAMPLE
Example #1:
An example of reinforcement learning is training a machine to identify
the shape of an object, given a set of different objects. In the example shown,
the model tries to predict the shape of the object, which is a square in this
case. Feedback on whether each prediction was right or wrong serves as the
reward signal that the model learns from.
Example #2:
Consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did
that made it get the reward/punishment. We can use a similar method to train
computers to do many tasks, such as playing backgammon or chess.
88. REINFORCEMENT LEARNING: APPLICATIONS
Reinforcement learning algorithms are widely used in the
gaming industry, for example to build game-playing agents.
They are also used to train robots to perform human tasks.
Playing chess or a computer game
Credit assignment problem
Game playing
Robot in a maze
89. TYPES OF LEARNING: SUMMARY
Supervised (inductive) learning
– Given: training data + desired outputs (labels)
Unsupervised learning
– Given: training data (without desired outputs)
Semi-supervised learning
– Given: training data + a few desired outputs
Reinforcement learning
– Rewards from sequence of actions
91. MACHINE LEARNING MODELS
Machine learning models are computer programs that are
used to recognize patterns in data or make predictions.
Machine learning models are created from machine learning
algorithms, which are trained using either labeled,
unlabeled, or mixed data.
Different machine learning algorithms are selected as they
can be suited to different goals, such as classification,
regression, clustering, etc.
92. HOW TO BUILD A MACHINE LEARNING MODEL: COMMON STEPS
Machine learning models are created by training algorithms with
either labeled or unlabeled data, or a mix of both using different
machine learning methods.
Building a machine learning model commonly involves the
following 10 steps:
Step 1: Understand the business problem (and define success)
Step 2: Understand and identify data
Step 3: Collect data
Step 4: Prepare the data
Step 5: Choose a model
Step 6: Train the model
Step 7: Evaluate the model
Step 8: Tune the parameters
Step 9: Make predictions
Step 10: Deploy the machine learning model
93. STEP 1. UNDERSTAND THE BUSINESS PROBLEM (AND
DEFINE SUCCESS)
The first phase of any machine learning project is developing an understanding
of the business requirements. You need to know what problem you're trying to
solve before attempting to solve it.
To start, work with the owner of the project and make sure you understand the
project's objectives and requirements.
Key questions to answer include the following:
What's the business objective that requires a cognitive solution?
What parts of the solution are cognitive, and what aren't?
Have all the necessary technical, business and deployment issues been addressed?
What are the defined "success" criteria for the project?
How can the project be staged in iterative sprints?
Are there any special requirements for transparency, explainability or bias reduction?
What are the ethical considerations?
What are the acceptable parameters for accuracy, precision and confusion matrix values?
What are the expected inputs to the model and the expected outputs?
What are the characteristics of the problem being solved? Is this a classification,
regression or clustering problem?
What is the "heuristic" -- the quick-and-dirty approach to solving the problem that
doesn't require machine learning? How much better than the heuristic does the model
need to be?
How will the benefits of the model be measured?
94. STEP 2. UNDERSTAND AND IDENTIFY DATA
A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before to make
predictions and fulfill its purpose. Lack of data will prevent you from building the
model, and access to data isn't enough. Useful data needs to be clean and in a good
shape.
Identify your data needs and determine whether the data is in proper shape for the
machine learning project. The focus should be on data identification, initial collection,
requirements, quality identification, insights and potentially interesting aspects that are
worth further investigation.
Here are some key questions to consider:
Where are the sources of the data that's needed for training the model?
What quantity of data is needed for the machine learning project?
What is the current quantity and quality of training data?
How are the test set data and training set data being split?
For supervised learning tasks, is there a way to label that data?
Can pre-trained models be used?
Where is the operational and training data located?
Are there special needs for accessing real-time data on edge devices or in more difficult-to-
reach places?
Answering these important questions helps you get a handle on the quantity and quality
of data as well as understand the type of data that's needed to make the model work.
95. STEP 3: COLLECTING DATA
This step requires a reliable data source and quality data.
It is of the utmost importance to collect reliable data so that your
machine learning model can find the correct patterns. The quality of
the data that you feed to the machine will determine how accurate
your model is. If you have incorrect or outdated data, you will have
wrong outcomes or predictions which are not relevant.
Make sure you use data from a reliable source, as it will directly
affect the outcome of your model. Good data is relevant, contains
very few missing and repeated values, and has a good
representation of the various subcategories/classes present.
96. STEP 4: PREPARING THE DATA
After you have your data, you have to prepare it. You can do this
by:
Putting together all the data you have and randomizing its order. This helps
make sure that the data is evenly distributed and that the ordering does not
affect the learning process.
Cleaning the data: removing unwanted rows and columns and duplicate values,
handling missing values, converting data types, etc. You might even
have to restructure the dataset and change the rows and columns or their
indexes.
Visualizing the data to understand how it is structured and to understand the
relationships between the variables and classes present.
Splitting the cleaned data into two sets: a training set and a testing set.
The training set is the set your model learns from. The testing set is used to
check the accuracy of your model after training.
Data preparation and cleansing tasks can take a substantial amount
of time.
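The cleaning, randomizing and splitting activities above can be sketched in plain Python. The raw rows, the 80/20 split and the function name `train_test_split` are assumptions for this example; visualization is omitted.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then hold out a fraction of them as the testing set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)                 # randomize so ordering cannot affect learning
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical raw data: (feature, label) rows with one duplicate and one missing value
raw_rows = [(float(i), i % 2) for i in range(10)] + [(3.0, 1), (None, 0)]

# Cleaning: drop rows with missing values, then drop exact duplicates (order preserved)
clean_rows = list(dict.fromkeys(row for row in raw_rows if None not in row))

train_rows, test_rows = train_test_split(clean_rows)
print(len(clean_rows), len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, which helps when comparing models later.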
97. STEP 4: PREPARING THE DATA: SPECIFIC ACTIVITIES
Procedures during the data collection, preparation and cleansing process
include the following:
Collect data from the various sources.
Standardize formats across different data sources.
Replace incorrect data.
Enhance and augment data.
Add more dimensions with pre-calculated amounts and aggregate information
as needed.
Enhance data with third-party data.
"Multiply" image-based data sets if they aren't sufficient for training.
Remove extraneous information and deduplicate records.
Remove irrelevant data from training to improve results.
Reduce noise and remove ambiguity.
Consider anonymizing data.
Normalize or standardize data to get it into formatted ranges.
Sample data from large data sets.
Select features that identify the most important dimensions and, if necessary,
reduce dimensions using a variety of techniques.
Split data into training, test and validation sets.
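To make the "normalize or standardize" activity concrete, the sketch below shows two common rescaling choices: min-max normalization to the range [0, 1] and standardization to zero mean and unit standard deviation. The `ages` column and the function names are hypothetical, and the code assumes the values are not all identical (otherwise it would divide by zero).

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]      # hypothetical feature column
normalized = min_max_normalize(ages)
standardized = standardize(ages)
print(normalized)
print(standardized)
```

In practice the minimum, maximum, mean and standard deviation should be computed on the training set only and then reused for the test set.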
98. STEP 5: CHOOSING A MODEL
A machine learning model determines the output you get after
running a machine learning algorithm on the collected data.
It is important to choose a model which is relevant to the task at
hand.
Over the years, scientists and engineers developed various models
suited for different tasks like speech recognition, image
recognition, prediction, etc.
Apart from this, you also have to see if your model is suited for
numerical or categorical data and choose accordingly.
99. STEP 6: TRAINING THE MODEL
Training is the most important step in machine learning.
In training, you pass the prepared data to your machine learning
model to find patterns and make predictions. It results in the
model learning from the data so that it can accomplish the task set.
Over time, with training, the model gets better at predicting.
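A minimal sketch of what "getting better with training" can look like: a straight line y = w*x + b is fitted by gradient descent, and the error is recorded after every pass over the data. Gradient descent, the learning rate, the number of epochs and the toy data are all assumptions made for this illustration; the slides do not prescribe a training algorithm.

```python
def mean_squared_error(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []                                   # error after each epoch
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_error(w, b, xs, ys))
    return w, b, history

# Hypothetical prepared data generated from the rule y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b, history = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
print(history[0], history[-1])
```

The recorded error shrinks as training proceeds, and the learned values of w and b approach the rule that generated the data.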
100. STEP 7: EVALUATING THE MODEL
After training your model, you have to check how it is
performing. This is done by testing the performance of the
model on previously unseen data. The unseen data used is the
testing set that you split off from your data earlier.
If testing is done on the same data that was used for
training, you will not get an accurate measure, because the
model has already seen that data and finds the same patterns in
it as before. This gives a disproportionately high
accuracy.
When the model is evaluated on the testing data, you get a more
accurate measure of how it will perform and how fast it runs.
101. STEP 7: MODEL EVALUATION
During the model evaluation process, you should do the
following:
Evaluate the models using a validation data set.
Determine confusion matrix values for classification problems.
Identify methods for k-fold cross-validation if that approach is
used.
Further tune hyperparameters for optimal performance.
Compare the machine learning model to the baseline model or
heuristic.
102. STEP 8: PARAMETER TUNING
Once you have created and evaluated your model, see if its
accuracy can be improved in any way. This is done by tuning
the parameters present in your model.
Parameters here are the settings of the model that the programmer
generally chooses (often called hyperparameters), as opposed to
the values the model learns from the data during training.
At a particular value of these parameters, the accuracy will be
at its maximum. Parameter tuning refers to finding these values.
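One simple way to tune a parameter is to try several candidate values and keep the one with the best accuracy on held-out validation data. The sketch below tunes k, the number of neighbors in a k-nearest-neighbors classifier; the classifier, the candidate values and the tiny data sets (which include two deliberately noisy training labels) are assumptions made for this illustration.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    neighbors = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def accuracy(train, validation, k):
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)

# Hypothetical 1-D data as (feature, label); the labels at 3.0 and 8.0 are noisy
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
validation = [(2.9, 0), (3.2, 0), (7.8, 1), (8.3, 1), (1.5, 0), (6.5, 1)]

# Try each candidate value and keep the one with the best validation accuracy
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k = 1 the model copies the noisy labels; larger values of k smooth them out, which shows up as higher validation accuracy.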
103. STEP 9: MAKING PREDICTIONS
In the end, you can use your model on unseen data to
make predictions accurately.
104. STEP 10: DEPLOY THE MACHINE LEARNING MODEL
The last step in building a machine learning model is
the deployment of the model.
Machine learning models are generally developed and tested
in a local or offline environment using training and testing
datasets.
Deployment is when the model is moved into a live
environment, dealing with new and unseen data.
This is the point that the model starts to bring a return on
investment to the organization, as it is performing the task it
was trained to do with live data.
106. MODEL EVALUATION: OVERVIEW
Key question
Q. How well does the model perform on unseen data?
While training a model is a key step, how the model
generalizes on unseen data is an equally important aspect
that should be considered in every machine learning
pipeline.
We need to know whether it actually works and,
consequently, if we can trust its predictions.
107. MODEL EVALUATION : DEFINITION
Model evaluation aims to estimate the generalization
accuracy of a model on future (unseen/out-of-sample)
data.
The purpose of model evaluation is to help us to know
which algorithm best suits the given dataset for solving a
particular problem
To select the “Best Fit” algorithm
It evaluates the performance of different Machine Learning
models, based on the same input dataset.
108. MODEL EVALUATION TECHNIQUES
Two methods are commonly used to evaluate model
performance:
1. Holdout
2. Cross-validation
Both methods use a test set (i.e., data not seen by the model) to
evaluate model performance.
It is not recommended to evaluate a model on the data used to build
it. A model can simply memorize the whole training set, in which
case it predicts the correct label for any point in the training set
while performing poorly on new data. This is known as overfitting.
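The danger of evaluating on training data can be shown with a deliberately extreme model that only memorizes. The class `MemorizingModel`, the even/odd task and the data below are hypothetical and exist only to illustrate the point.

```python
class MemorizingModel:
    """Stores every training example and looks the answer up at prediction time."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        self.default = max(set(ys), key=ys.count)   # fall back to the most common label
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical task: the label is 1 when the number is even, otherwise 0
train_x, train_y = [1, 2, 3, 4, 5, 6, 7], [0, 1, 0, 1, 0, 1, 0]
test_x, test_y = [8, 10, 12, 9], [1, 1, 1, 0]

model = MemorizingModel().fit(train_x, train_y)
train_accuracy = accuracy(model, train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(train_accuracy, test_accuracy)
```

The model scores perfectly on the data it has seen but has learned nothing about even and odd numbers, which only the unseen test set reveals.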
109. MODEL EVALUATION TECHNIQUES: HOLDOUT METHOD
The holdout method evaluates model performance using
two sets of data: training and testing.
The training data is used to train the model.
The test data, which is held out from training, is used to
calculate the performance of the trained model.
This method checks how well a machine learning model,
whichever algorithm it was built with, performs on unseen
samples of data.
The approach is simple, flexible and fast.
E.g. an 80%/20% train-test data split.
110. CROSS-VALIDATION
k-fold cross-validation is the most common cross-validation technique,
and it works as follows:
The original dataset is partitioned into k equal-size subsamples, called folds.
k is a user-specified number, usually 5 or 10.
Training and testing are repeated k times, such that each time one of the k
folds is used as the test set/validation set and the other k-1 folds are put
together to form the training set.
The error estimate is averaged over all k trials to get the overall
effectiveness of our model.
Example:
When performing five-fold cross-validation, the data is first partitioned into 5
parts of (approximately) equal size. A sequence of models is trained. The first
model is trained using the first fold as the test set and the remaining folds
as the training set. This is repeated for each of the 5 splits of the data,
and the accuracy estimate is averaged over all 5 trials to get the overall
effectiveness of our model.
Cross-validation is usually the preferred method because it evaluates your
model on multiple train-test splits. This gives you
a better indication of how well your model will perform on unseen data.
Hold-out, on the other hand, depends on just one train-test split.
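The k-fold procedure can be sketched as follows. The helper names, the toy data and the deliberately trivial "model" (which always predicts the mean of the training targets) are assumptions for the example, and the data is assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k):
    """Partition the indices 0..n_samples-1 into k folds of (nearly) equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, test on the remaining fold, repeat k times, average."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k, scores

# Hypothetical baseline "model": always predict the mean of the training targets
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def score_mae(model, test_x, test_y):
    return sum(abs(y - model) for y in test_y) / len(test_y)

xs = list(range(10))
ys = [2.0 * x for x in xs]
mean_error, fold_errors = cross_validate(xs, ys, k=5, fit=fit_mean, score=score_mae)
print(mean_error, fold_errors)
```

Every sample is used for testing exactly once, which is why the averaged estimate is less dependent on one lucky or unlucky split than a single holdout.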
111. MODEL EVALUATION METRICS
Model evaluation metrics are required to quantify model
performance.
The choice of evaluation metrics depends on a given machine
learning task (such as classification, regression, ranking,
clustering, topic modeling, among others).
Not all tasks require all evaluation metrics.
Some metrics, such as precision and recall, are useful for multiple tasks.
The common evaluation metrics depend on the type of
machine learning model:
Classification model
Clustering model
Forecast model
Outlier model
112. MODEL EVALUATION: CLASSIFICATION METRICS
The different types of classification metrics are:
Classification Accuracy
Confusion Matrix
F-Measure
Logarithmic Loss
Area under Curve (AUC)
113. Classification Accuracy
Classification accuracy is what is usually meant by the term
accuracy. It is the ratio of the number of correct predictions to
the total number of predictions made by the model on the given data.
114. Confusion Matrix
It is an N x N matrix used for
evaluating the performance of a classification
model, where N is the number of classes that
are predicted.
It is computed on a test dataset in which the
true values are known.
The matrix shows the number of correct and
incorrect predictions made by a classifier and
is used to assess the correctness of the model.
It consists of values like True Positive, False
Positive, True Negative, and False Negative,
which help in measuring Accuracy, Precision,
Recall (Sensitivity) and Specificity, and, across
decision thresholds, the ROC curve and its AUC.
115. Confusion matrix:
There are 4 important terms in a confusion matrix:
True Positives (TP): the cases in which the prediction is TRUE and the actual output is also
TRUE.
True Negatives (TN): the cases in which the prediction is FALSE and the actual output is
also FALSE.
False Positives (FP): the cases in which the prediction is TRUE but the actual output is
FALSE.
False Negatives (FN): the cases in which the prediction is FALSE but the actual output is
TRUE.
These terms help to calculate accuracy, precision, recall and the F-measure.
Accuracy is the sum of the True Positive and True Negative counts divided by the total number
of samples: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. It tells us what fraction of
the predictions made by the model were correct.
Precision is the ratio of the number of True Positives to the total number of samples predicted
as positive by the classifier: $\text{Precision} = \frac{TP}{TP + FP}$. It tells us what fraction of
the predicted positives were correctly identified by the model.
Recall is the ratio of the number of True Positives to the sum of the True Positive and False
Negative samples in the data: $\text{Recall} = \frac{TP}{TP + FN}$.
F1 Score
It is also called the F-Measure. It summarizes the test performance of the developed model in a
single number by combining precision and recall, so that models can be compared without
weighing the two separately. The F1 score is the harmonic mean of recall and precision:
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
The higher the F1 score, the better the performance of the model.
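The following sketch computes the four confusion matrix counts and the metrics derived from them for a binary classifier. The label lists and function names are hypothetical, and the code assumes there is at least one predicted positive and one actual positive (otherwise precision or recall would divide by zero).

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical true labels and model predictions on a test set
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy, precision, recall, f1 = classification_metrics(tp, tn, fp, fn)
print(tp, tn, fp, fn)
print(accuracy, precision, recall, f1)
```

Here accuracy is higher than precision and recall because the negative class is larger, which is one reason to report more than accuracy alone.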
116. REGRESSION METRICS
Regression predicts a continuous outcome with the help of
independent variables that are correlated with it.
Regression metrics measure how far the predictions are from the actual values.
Comparing them on training and test data can indicate whether the model is
underfitted or overfitted.
They are:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error is the average of the absolute differences between the original
values and the predicted values. It gives us an idea of how far the predictions are
from the actual output. It does not tell us the direction of the error, that is,
whether the model is under-predicting or over-predicting. It is calculated as follows:
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is the original value,
$\hat{y}_i$ is the predicted value and $n$ is the number of samples.
The mean squared error is similar to the mean absolute error. It is computed by
taking the average of the squares of the differences between the original
and predicted values. Squaring makes large errors count much more than small
ones, so the MSE penalizes large errors heavily. It is computed as follows:
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The root mean squared error is the square root of the mean of the squared differences
between the predicted and actual values of the given data. It is one of the most popular
evaluation metrics used in regression problems. It is most appropriate when the
errors are unbiased and follow a normal distribution. It is
computed using the formula below:
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
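The three regression metrics can be computed directly from their definitions. The `actual` and `predicted` values below are hypothetical and chosen so the results are easy to check by hand.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average squared difference."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical original and predicted values; the errors are 1, 0, -2 and -1
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print(mae(actual, predicted))    # (1 + 0 + 2 + 1) / 4 = 1.0
print(mse(actual, predicted))    # (1 + 0 + 4 + 1) / 4 = 1.5
print(rmse(actual, predicted))   # square root of 1.5
```

The single error of size 2 contributes half of the MAE total but two thirds of the MSE total, which shows how squaring gives large errors more weight.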