Alemu Kumilachew
Faculty of Computing, Bahir Dar University Institute
of Technology (BiT), Bahir Dar, Ethiopia
1 • Introduction to Machine Learning
2 • Concepts of Learning and its process
3 • Types of Learning and Machine learning methods
4 • Model Building
5 • Evaluation
6 • Applications & Current trends in machine learning
1) Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press,
Prentice Hall of India, 3rd Edition, 2014.
2) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations
of Machine Learning”, MIT Press, 2012.
3) Stephen Marsland, “Machine Learning: An Algorithmic Perspective”,
Second Edition, 2015.
4) Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
1) Charu C. Aggarwal, “Data Classification: Algorithms and Applications”,
CRC Press, 2014.
2) Charu C. Aggarwal, “Data Clustering: Algorithms and
Applications”, CRC Press, 2014.
3) Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The
MIT Press, 2012.
4) Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts
and Techniques”, 3rd Edition, Morgan Kaufmann Publications, 2012.
 Assignment/Quiz/test …………..……5%
 Group Project ………………….……. 20%
 Mid Exam……………………………..25%
 Final Exam…………………………….50%
 The main objectives of studying machine learning are to
know and understand:
 What are the different types in machine learning?
 What are the different algorithms available for developing
machine learning models?
 What tools are available for developing these models?
 What are the programming language choices?
 What platforms support development and deployment of
Machine Learning applications?
 What IDEs (Integrated Development Environments) are
available?
 How to quickly upgrade your skills in this important area?
 Machine learning (ML) is a subset of artificial
intelligence (AI) that is all about getting an AI to
accomplish tasks without being given specific
instructions. In essence, it’s about teaching
machines how to learn!
 AI is simulated human cognition, which it is supposed to
achieve via learning!
 What is learning?
 Were we born with PhD level intelligence?
 Of course not!
At the beginning of our lives, we have little
understanding of the world around us, but over
time we grow to learn a lot. We use our senses to
take in data, and learn via a combination of interacting
with the world around us, being explicitly taught
certain things by others, finding patterns over time,
and, of course, lots of trial-and-error
 “Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
 AI learns in a similar way. When it’s first created, an
AI knows nothing; ML gives AI the ability to learn
about its world.
 AI is all about allowing a system to learn from
examples rather than instructions. ML is what
makes that possible.
 AIs are taught, not explicitly programmed. In other words,
instead of spelling out specific rules to solve a problem, we
give them examples of what they will encounter in the real
world and let them find the patterns themselves. Allowing
machines to find patterns is beneficial over spelling out the
instructions when the instructions are hard or unknown or
when the data has many different variables, for example
treating cancer, predicting the stock market.
 Arthur Samuel (at IBM) coined the term “Machine
Learning” in 1959.
 He defined machine learning as:
“the field of study that gives computers the ability to
learn without being explicitly programmed.”
 There is no universally accepted definition of ML.
 Different authors define the term differently.
 Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
 Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
 ML creates a model defined up to some parameters, and learning is
the execution of a computer program to optimize the parameters of the
model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data, or both.
 A model:
 is a compressed version of a database;
 extracts knowledge from it;
 does not have perfect performance but is a useful approximation to the
data.
 Machine learning (ML) is defined as a discipline of artificial intelligence
(AI) that provides machines the ability to automatically learn from data
and past experiences to identify patterns and make predictions with
minimal human intervention.
 Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
 improve their performance P
 at some task T
 with experience E.
A well-defined learning task is given by <P,T,E>
 A computer program which learns from experience is
called a machine learning program or simply a learning
program. Such a program is sometimes also referred to
as a learner
1. “Big data” - models are based on huge amounts of data which is
being produced and stored continuously
 science: genomics, astronomy, materials science, particle accelerators. . .
 sensor networks: weather measurements, traffic. . .
 people: social networks, blogs, mobile phones, purchases, bank
transactions. . . etc
2. Data is not random; it contains structure that can be used to
predict outcomes, or gain knowledge in some way.
 Ex: patterns of Amazon purchases can be used to recommend items.
3. It is more difficult to design algorithms for such tasks
(compared to, say, sorting an array or calculating a payroll). Such
algorithms need data.
 Ex: construct a spam filter, using a collection of email messages labelled as
spam/not spam.
4. Learning isn’t always useful:
 There is no need to “learn” to calculate payroll
5. Data mining – extracting useful knowledge/insights from data
 Ex: Data mining is designed to extract the rules via the application of
ML methods from large databases.
 Example #1:
 A classic example of a task that requires machine
learning: It is very hard to say what makes a 2
 Example #2: House price prediction
 After plotting the known data points on an XY plot (house size on the
X-axis, price on the Y-axis), we draw a best-fit line to make predictions
for any other house given its size. You feed the known data to the
machine and ask it to find the best-fit line. Once the best-fit line is
found by the machine, you test its suitability by feeding in a known
house size, i.e. an X-value. The machine will then return the estimated
Y-value, i.e. the expected price of the house.
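The best-fit line in Example #2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_line` and the five (size, price) pairs are made up for the example, and ordinary least squares is used as one common way of defining "best fit".

```python
def fit_line(sizes, prices):
    """Least-squares slope and intercept for price = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    # Covariance of size and price, and variance of size (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: house size (square metres) and price (thousands).
sizes = [50, 80, 100, 120, 150]
prices = [110, 170, 210, 250, 310]

slope, intercept = fit_line(sizes, prices)

# Predict the price of a house that was not in the training data.
predicted_price = slope * 90 + intercept
print(slope, intercept, predicted_price)
```

The known sizes and prices play the role of the "known data"; the predicted price for the 90 square metre house is the machine's estimate for an unseen input.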
 Example #3:
Some more examples of tasks that are best solved by
using a learning algorithm
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual credit card transactions
 Unusual patterns of sensor readings in a nuclear power plant
 Prediction:
 Future stock prices or currency exchange rates
 ML is used when:
 Human expertise does not exist (navigating on Mars)
 Humans can’t explain their expertise (speech recognition)
 Models must be customized (personalized medicine)
 Solution needs to be adapted to particular cases (user biometrics)
 Models are based on huge amounts of data (genomics)
 Solution changes in time (routing on a computer network)
 Autonomous cars
 Autonomous car sensors
 Autonomous car technologies
 Deep learning emergence
 Deep Belief Net on Face Images
 Learning of Object Parts
 Training on Multiple Objects
 Automatic Speech recognition systems
 Speech technologies
 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics,
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc
– ???
 The following is a list of some of the typical applications of machine learning.
1. In retail business, machine learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit
applications, fraud detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and
troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and
maximizing the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be
analyzed fast enough by computers. The World Wide Web is huge; it is constantly
growing and searching for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to
changes so that the system designer need not foresee and provide solutions for all
possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and
9. Machine learning methods are applied in the design of computer-controlled
vehicles to steer correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for playing
games such as chess, backgammon and Go
 Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
 Learning general models from data of particular examples
 Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
 Example in retail: from customer transactions to consumer
behaviour:
People who bought “Da Vinci Code” also bought “The Five
People You Meet in Heaven”.
 Machine Learning builds a model that is a good and
useful approximation to the data.
 Definition
 A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks T, as measured by
P, improves with experience E.
 Examples: defining a learning task
I. Handwriting recognition learning problem
 T: Recognizing and classifying handwritten words within images
 P: Percent of words correctly classified
 E: A dataset of handwritten words with given classifications
II. A robot driving learning problem
 T: Driving on highways using vision sensors
 P: Average distance traveled before an error
 E: A sequence of images and steering commands recorded
while observing a human driver
III. A chess learning problem
 T: Playing chess
 P: Percent of games won against opponents
 E: Playing practice games against itself.
IV. Spam filtering
 T: Categorize email messages as spam or legitimate.
 P: Percentage of email messages correctly classified.
 E: Database of emails, some with human-given labels
 Basic components of learning process
 The learning process, whether by a human or a machine, can
be divided into four components, namely, data storage,
abstraction, generalization and evaluation
Fig. Components of the learning process
 Data storage (1)
 Facilities for storing and retrieving huge amounts of data are
an important component of the learning process. Humans and
computers alike utilize data storage as a foundation for
advanced reasoning.
 In a human being, the data is stored in the brain and data is retrieved
using electrochemical signals.
 Computers use hard disk drives, flash memory, random access
memory and similar devices to store data and use cables and other
technology to retrieve data.
 Abstraction (2)
 The second component of the learning process is known as
abstraction.
 Abstraction is the process of extracting knowledge about stored
data. This involves creating general concepts about the data as a
whole. The creation of knowledge involves application of known
models and creation of new models.
 The process of fitting a model to a dataset is known as training.
When the model has been trained, the data is transformed into an
abstract form that summarizes the original information
 Generalization (3)
 The third component of the learning process is known as
generalization.
 The term generalization describes the process of turning the
knowledge about stored data into a form that can be utilized for
future action.
 These actions are to be carried out on tasks that are similar, but
not identical, to those that have been seen before.
 In generalization, the goal is to discover those properties of the
data that will be most relevant to future tasks
 Evaluation (4)
 Evaluation is the last component of the learning process. It is the
process of giving feedback to the user to measure the utility of the
learned knowledge.
 This feedback is then utilized to effect improvements in the whole
learning process
 Machine learning is concerned with using the right features to
build the right models that achieve the right tasks.
 For a given problem, the collection of all possible outcomes
represents the sample space or instance space.
 Learning models can be divided into three
categories:
 Using a logical expression (Logical models)
 Using the geometry of the instance space (Geometric models)
 Using probability to classify the instance space (Probabilistic
models)
 Grouping and grading (an orthogonal categorization to
geometric-probabilistic-logical-compositional)
 Logical models use a logical expression to divide the instance
space into segments and hence construct grouping models.
 A logical expression is an expression that returns a Boolean
value, i.e., a True or False outcome.
 Once the data is grouped using a logical expression, the data
is divided into homogeneous groupings for the problem we
are trying to solve.
 For example, for a classification problem, all the instances in
the group belong to one class.
 There are mainly two kinds of logical models: Tree models
and Rule models.
 Rule models consist of a collection of implications or IF-THEN
rules.
 For tree-based models, the ‘if-part’ defines a segment and the
‘then-part’ defines the behaviour of the model for this segment.
Rule models follow the same reasoning.
 In logical models, such as decision trees, a logical expression is
used to partition the instance space. Two instances are similar
when they end up in the same logical segment.
 Example:
 “Enjoy Sport” is defined by a set of data from some example days. Each day is
described by six attributes. The task is to learn to predict the value of Enjoy Sport for an
arbitrary day based on the values of its attributes. The problem can be represented by a
series of hypotheses. Each hypothesis is described by a conjunction of constraints on the
attributes. The training data represents a set of positive and negative examples of the target
function. In this example, each hypothesis is a vector of six constraints, specifying the
values of the six attributes: Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training
phase involves learning the set of days (as a conjunction of attributes) for which Enjoy Sport =
yes.
 Thus, the problem can be formulated as:
 Given instances X, which represent the set of all possible days, each described by the attributes:
   o Sky – (values: Sunny, Cloudy, Rainy),
   o AirTemp – (values: Warm, Cold),
   o Humidity – (values: Normal, High),
   o Wind – (values: Strong, Weak),
   o Water – (values: Warm, Cold),
   o Forecast – (values: Same, Change).
 Q. Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
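As a rough sketch of how such a hypothesis could be represented and learned, the Python below encodes a hypothesis as a tuple of six constraints, where "?" means "any value is acceptable". The four training days are hypothetical (the original table is not reproduced here), and the learning routine is Find-S, a simple algorithm chosen only for illustration; the slides do not say which algorithm should be used.

```python
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

def matches(hypothesis, day):
    """A day satisfies a hypothesis if every constraint is '?' or equals the day's value."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, day))

def find_s(examples):
    """Find-S: start from the first positive day and generalise only as far as needed."""
    hypothesis = None
    for day, enjoy in examples:
        if not enjoy:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(day)
        else:
            # Keep constraints that still agree; relax the others to '?'.
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, day)]
    return tuple(hypothesis) if hypothesis is not None else None

# Hypothetical training days: (attribute values, Enjoy Sport yes/no).
training_days = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]

learned = find_s(training_days)
print(dict(zip(ATTRIBUTES, learned)))
```

The learned tuple acts as the prediction function: `matches(learned, day)` returns True (Enjoy Sport = yes) or False (no) for an arbitrary day.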
 In Geometric models, features could be described as points in
two dimensions (x- and y-axis) or a three-dimensional space
(x, y, and z).
 for example, temperature as a function of time can be modelled in
two axes
 In geometric models, there are two ways we could impose
similarity:
 We could use geometric concepts like lines or planes to segment
(classify) the instance space. These are called Linear models.
 Alternatively, we can use the geometric notion of distance to
represent similarity. In this case, if two points are close together,
they have similar values for features and thus can be classed as
similar. We call such models as Distance-based models.
 Linear models
 Linear models are relatively simple. In this case, the function is
represented as a linear combination of its inputs.
 In the simplest case where f(x) represents a straight line, we have
an equation of the form f (x) = mx + c where c represents the
intercept and m represents the slope.
 Linear models are parametric, which means
that they have a fixed form with a small
number of numeric parameters that need to be
learned from data. For example, in f (x) = mx
+ c, m and c are the parameters that we are
trying to learn from the data. This technique is
different from tree or rule models, where the
structure of the model (e.g., which features to
use in the tree, and where) is not fixed in
advance.
 Distance-based models
 As the name implies, distance-based models work on the concept of
distance. In the context of Machine learning, the concept of distance is
not based on merely the physical distance between two points.
 The distance metrics commonly used are Euclidean & Manhattan
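The two metrics named above can be written down directly. The function names below are illustrative, and points are assumed to be equal-length tuples of numbers.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute differences per feature."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q))
```

The same pair of points can be "closer" or "farther" depending on the metric, which is why the choice of metric is part of the model.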
 Distance-based models
 Distance is applied through the concept of neighbors and exemplars.
 Neighbors are points in proximity with respect to the distance measure
expressed through exemplars.
 Exemplars are either centroids that find a center of mass according to a chosen
distance metric or medoids that find the most centrally located data point.
 The most commonly used centroid is the arithmetic mean, which
minimizes squared Euclidean distance to all other points.
 Notes:
 The centroid is the geometric center of a plane figure, i.e., the arithmetic
mean position of all the points in the figure. This
definition extends to any object in n-dimensional space: its centroid is the mean
position of all the points.
 Medoids are similar in concept to means or centroids. Medoids are most
commonly used on data when a mean or centroid cannot be defined. They are
used in contexts where the centroid is not representative of the dataset, such as
in image data.
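A small sketch of the difference between the two kinds of exemplar, assuming Euclidean distance and four made-up 2-D points. The centroid is a computed position that need not be a data point; the medoid is always one of the data points.

```python
import math

def centroid(points):
    """Arithmetic mean position of the points, one coordinate at a time."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def medoid(points):
    """The data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical data: three points close together and one far away.
points = [(1, 1), (2, 2), (3, 1), (10, 10)]

print(centroid(points))
print(medoid(points))
```

Here the distant point pulls the centroid away from the dense group, while the medoid stays on an actual observation inside that group.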
 Examples of distance-based models include the nearest-neighbour
models, which use the training data as exemplars – for example, in
classification. The K-means clustering algorithm also uses exemplars to
create clusters of similar data points.
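A minimal nearest-neighbour classifier, sketched under the assumption of Euclidean distance and majority voting among the k closest training points. The training set and labels "A"/"B" are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training exemplars."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled exemplars: (point, class label).
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
]

print(knn_predict(train, (2, 2)))
print(knn_predict(train, (7, 7)))
```

No model is fitted in advance: the training data itself serves as the set of exemplars, as described above.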
 Probabilistic models use the idea of probability to classify new
entities.
 Probabilistic models see features and target variables as random
variables. The process of modelling represents and manipulates the
level of uncertainty with respect to these variables.
 There are two types of probabilistic models: Predictive and
Generative.
 Predictive probability models use the idea of a conditional probability
distribution P (Y |X) from which Y can be predicted from X.
 Generative models estimate the joint distribution P (Y, X). Once we know
the joint distribution for the generative models, we can derive any
conditional or marginal distribution involving the same variables. Thus,
the generative model is capable of creating new data points and their
labels, knowing the joint probability distribution. The joint distribution
looks for a relationship between two variables. Once this relationship is
inferred, it is possible to infer new data points.
 Naïve Bayes
 Naïve Bayes is an example of a probabilistic classifier. It
applies Bayes’ rule, defined as
P(Y | X) = P(X | Y) P(Y) / P(X)
 The Naïve Bayes algorithm is based on the idea of Conditional
Probability. Conditional probability is based on finding the
probability that something will happen, given that something else
has already happened. The task of the algorithm then is to look at
the evidence and to determine the likelihood of a specific class
and assign a label accordingly to each entity.
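To make the idea of "looking at the evidence" concrete, here is a small worked calculation of a conditional probability with Bayes' rule for a two-class problem. The numbers (a 20% prior for spam and the two likelihoods of seeing a particular word) are invented for the illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(class | evidence) for a two-class problem, via Bayes' rule."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Assumed numbers: 20% of mail is spam; the word appears in 50% of spam
# messages and in 5% of legitimate messages.
p_spam_given_word = posterior(prior=0.2, likelihood_if_true=0.5, likelihood_if_false=0.05)

label = "spam" if p_spam_given_word > 0.5 else "legitimate"
print(p_spam_given_word, label)
```

Seeing the word raises the probability of spam from the prior of 0.2 to about 0.71, so the message is assigned the label "spam".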
 Logical models use a logical expression to partition the instance space.
 Geometric models (such as distance-based models) use the idea of distance
(e.g., Euclidean distance) to classify entities.
 Probabilistic models use the idea of probability to classify new entities.
Learning models
 Geometric models: K-nearest neighbors, linear regression, support vector machine, logistic regression, …
 Probabilistic models: Naïve Bayes, Gaussian process regression, conditional random field, …
 Logical models: Decision tree, random forest, …
 For any learning system, we must know the three elements: T
(Task), P (Performance Measure), and E (Training Experience).
 At a high level, the learning process works as follows.
 The learning process starts with task T, performance measure P and
training experience E, and the objective is to find an unknown target
function.
 The target function is the exact knowledge to be learned from the
training experience, and it is unknown.
 For example, in the case of credit approval, the learning system will
have customer application records as experience, and the task would be
to classify whether a given customer application is eligible for
credit.
 So in this case, the training examples can be represented as
(x1,y1), (x2,y2), ..., (xn,yn), where X represents customer application
details and y represents the status of credit approval.
 With these details, what is that exact knowledge to be learned
from the training experience?
 So the target function to be learned in the credit approval learning
system is a mapping function f:X →y. This function represents the
exact knowledge defining the relationship between input variable
X and output variable y.
 Just now we looked into the learning process and also understood the goal
of the learning. When we want to design a learning system that follows the
learning process, we need to consider a few design choices. The design
choices will be to decide the following key components
1. Choose the training experience
2. Choose exactly what is to be learned (the target function)
3. Choose how to represent the target function
4. Choose a learning algorithm to infer the target function from the
experience
5. The final design
 Example:
 We will look into the game - checkers learning problem
and apply the above design choices.
 For a checkers learning problem, the three elements will
be:
1. Task T: To play checkers
2. Performance measure P: Total percent of the game won in the
3. Training experience E: A set of games played against itself
 Labels are provided
 SL is also called learning from exemplars.
 Supervised learning is a type of machine learning that uses
labeled data to train machine learning models. In labeled
data, the output is already known. The model just needs to
map the inputs to the respective outputs.
 A supervised machine learning algorithm works by analyzing
the labeled training data and building a
function/model, which can be used for mapping new examples
(unseen instances) to their target outputs (class labels).
 SL has this form:
Given (x1, y1), (x2, y2), ..., (xn, yn)
The algorithm learns a function f(x) to predict y given x.
 Example#1:
 Suppose the data consisting of the gender and age of the
patients and each patient is labeled as “healthy” or “sick”.
 Q. What will be the role of the supervised machine learning
algorithm in the above example?
 A. The purpose of a supervised machine learning
algorithm here is to learn from the above data and build a
function/model that identifies any new/unseen patient as
“sick” or “healthy” based on their age and gender.
 Example#2:
 An example of supervised learning is to train a system that
identifies the image of an animal.
 Supervised Learning methods need external supervision to
train machine learning models. They need guidance and
additional information to return the desired result.
 It can be thought of as a teacher supervising the learning
process. We know the correct answers (that is, the correct
outputs), the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops
when the algorithm achieves an acceptable level of
performance.
 Classification and regression problems are the most
common types of supervised learning problems.
 Classification: the labels to be predicted are categorical:
 Works by pattern recognition
 Face recognition:
 Optical character recognition: different styles, slant. . .
 Credit scoring: classify customers
into high- and low-risk, based
on their income and savings,
using data about past loans
(whether they were paid or not).
Model: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
 Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is categorical == classification
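The credit-scoring rule above can be sketched as code. This is a toy illustration: the six past loans are invented, and the thresholds θ1 and θ2 are "learned" by the simplest possible search (trying every observed value and keeping the pair that classifies the past loans best), which is an assumption of this sketch rather than a method given in the slides.

```python
def classify(income, savings, theta1, theta2):
    """The rule from the slide: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

def accuracy(loans, theta1, theta2):
    """Fraction of past loans where 'low-risk' agrees with 'the loan was paid'."""
    correct = sum(
        (classify(income, savings, theta1, theta2) == "low-risk") == paid
        for income, savings, paid in loans
    )
    return correct / len(loans)

def learn_thresholds(loans):
    """Try every observed income/savings value as a threshold; keep the best pair."""
    candidates = [
        (t1, t2)
        for t1 in sorted({income for income, _, _ in loans})
        for t2 in sorted({savings for _, savings, _ in loans})
    ]
    return max(candidates, key=lambda pair: accuracy(loans, *pair))

# Hypothetical past loans: (income, savings, was the loan paid back?).
past_loans = [
    (20, 5, False), (30, 2, False), (55, 3, False),
    (45, 20, True), (60, 15, True), (80, 30, True),
]

theta1, theta2 = learn_thresholds(past_loans)
print(classify(70, 25, theta1, theta2))
```

The labels y here are categorical (low-risk or high-risk), which is what makes this a classification problem.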
 Regression: the labels to be predicted are continuous
Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is real-valued == regression
 Example:
 Credit scoring: classify customers into high- and low-risk, based on
their income and savings, using data about past loans (whether they
were paid or not).
 Predict the price of a car from its mileage.
 A wide range of supervised learning algorithms are
available, each with its strengths and weaknesses. There is
no single learning algorithm that works best on all
supervised learning problems
 Some of the most popularly used supervised learning
algorithms are:
 Linear Regression
 Logistic Regression
 Support Vector Machine
 K Nearest Neighbor
 Decision Tree
 Random Forest
 Naive Bayes
 Supervised learning algorithms are generally used
for solving classification and regression problems.
 A few of the top supervised learning applications are
weather prediction, sales forecasting, and stock price
prediction.
 Unsupervised learning is a type of machine learning that uses
unlabeled data to train machines. It works by finding patterns
and trends in the data to discover the output. So,
the model tries to label the data based on the features of the input
data.
 In unsupervised learning algorithms, a classification or
categorization (labels/classes) is not included in the observations.
But instead the algorithm tries to identify similarities between the
inputs so that inputs that have something in common are
categorized together.
 The training process used in “unsupervised learning” techniques
does not need any supervision to build models. They learn on
their own and predict the output.
 Example #1
 Here, we have taken unlabeled input data, which means it is not categorized
and the corresponding outputs are not given. This unlabeled input data is
fed to the machine learning model in order to train it. First, the model interprets
the raw data to find hidden patterns, and then a suitable algorithm such as
k-means clustering or hierarchical clustering is applied. The algorithm
divides the data objects into groups according to the similarities and
differences between the objects.
 Example #2:
 An example of an unsupervised learning technique is one that
uses images of vehicles to group them into buses and
trucks. The model learns by identifying features of a vehicle,
such as its length and width, the front and rear end
covers, roof hoods, the types of wheels used, etc. Based on these
features, the model groups each vehicle as a bus or a truck.
 Example#3:
 Consider the following data regarding patients entering a
clinic. The data consists of the gender and age of the
patients.
 Q. Based on this data, can we infer anything regarding the
patients entering the clinic?
 no labels provided, only input data.
 Unsupervised learning is used for solving
clustering and association problems.
 Learning associations:
 Basket analysis: let p(Y |X) = “probability that a customer who buys
product X also buys product Y ”, estimated from past purchases. If p(Y |X)
is large (say 0.7), associate “X → Y ”. When someone buys X, recommend
them Y .
 Clustering: group similar data points/instances.
 Density estimation: where are data points likely to lie?
 Dimensionality reduction: data lies in a low-dimensional manifold.
 Feature selection: keep only useful features.
 Outlier/novelty detection
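The basket-analysis idea listed above, estimating p(Y | X) from past purchases and recommending Y when it is large, can be sketched as follows. The five baskets and the helper names are hypothetical; the 0.7 threshold is the value used in the slide.

```python
def conditional_probability(transactions, x, y):
    """Estimate p(y | x): among baskets containing x, the fraction that also contain y."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        return 0.0
    return sum(y in basket for basket in with_x) / len(with_x)

def recommend(transactions, x, threshold=0.7):
    """Items y for which the estimated p(y | x) reaches the threshold."""
    items = set().union(*transactions) - {x}
    return sorted(
        y for y in items
        if conditional_probability(transactions, x, y) >= threshold
    )

# Hypothetical past purchases, one set of products per customer basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
]

print(conditional_probability(transactions, "bread", "butter"))
print(recommend(transactions, "bread"))
```

No labels are involved: the association "bread → butter" is discovered purely from co-occurrence in the input data.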
 Customer segmentation: based on customer behavior, likes,
dislikes, and interests, you can segment and cluster similar
customers into a group.
 Image compression: Color quantization
 Genomics application: group individuals by genetic
 Selecting the right algorithm depends on the type of
problem you are trying to solve. Some of the common
examples of unsupervised learning are:
 K Means Clustering
 Hierarchical Clustering
 Principal Component Analysis (PCA)
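As an illustration of the first algorithm in the list, here is a bare-bones K-means loop. It is a sketch under simplifying assumptions: Euclidean distance, a fixed number of iterations, hand-picked starting centroids, and six made-up points; practical implementations choose starting centroids more carefully and stop when assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

# Assumed starting centroids, picked by hand for this example.
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

The algorithm is never told which group a point belongs to; the grouping emerges from the distances alone.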
 labels provided for some points only.
 It is a branch of machine learning that combines a small
amount of labeled data with a large amount of unlabeled
data during training.
 Semi-supervised learning falls between unsupervised
learning (with no labeled training data) and supervised
learning (with only labeled training data).
 Semi-supervised machine learning is a combination
of supervised and unsupervised learning. It uses a small amount of
labeled data and a large amount of unlabeled data, which provides
the benefits of both unsupervised and supervised learning while
avoiding the challenges of finding a large amount of labeled data.
That means you can train a model to label data without having to
use as much labeled training data.
 Here’s how it works:
1. Train the model with the small amount of labeled training data just like
you would in supervised learning, until it gives you good results.
2. Then use it with the unlabeled training dataset to predict the outputs,
which are pseudo labels since they may not be quite accurate.
3. Link the labels from the labeled training data with the pseudo labels
created in the previous step.
4. Link the data inputs in the labeled training data with the inputs in the
unlabeled data.
5. Then, train the model the same way as you did with the labeled set in the
beginning in order to decrease the error and improve the model’s
accuracy.
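The five steps above can be sketched with a deliberately simple model. Everything here is an assumption made for illustration: the data are one-dimensional numbers, the classifier is a nearest-mean classifier (one mean per class), and only a single round of pseudo-labeling is shown.

```python
def train(labeled):
    """Nearest-mean classifier: store the mean input value of each class."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

# Hypothetical data: two labeled examples and six unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5]

# Step 1: train on the small labeled set.
model = train(labeled)

# Step 2: predict pseudo labels for the unlabeled inputs.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]

# Steps 3 and 4: combine the labeled pairs with the pseudo-labeled pairs.
combined = labeled + pseudo_labeled

# Step 5: retrain on the combined set.
model = train(combined)
print(model)
```

After retraining, the class means are estimated from eight points instead of two, which is the benefit the unlabeled data is meant to bring. If the pseudo labels are wrong, the retrained model inherits those errors, so the approach relies on the first model being reasonably good.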
 Text document classifier: this is the type of situation where
semi-supervised learning is ideal because it would be nearly
impossible to find a large amount of labeled text documents.
 The Classification of Content on the Internet: the internet
is a vast trove of web pages, and it cannot be expected that
every page will be labeled and have all the data for the field
that you desire. However, at the same time, it is true that over
the years, some minority of web pages will have been labeled
for one dimension or the other.
 Semi-supervised methods must make some assumption about
the data in order to justify using a small set of labeled data to
make conclusions about the unlabeled data points. These can
be grouped into three categories.
1. The first is the continuity assumption. This assumes that data
points that are “close” to each other are more likely to have a
common label.
2. The second is the cluster assumption. This assumes that the
data naturally forms discrete clusters, and that points in the same
cluster are more likely to share a label.
3. The third is the manifold assumption. This assumes that the
data roughly lies in a lower-dimensional space (or manifold) than
the input space. This scenario is relevant when an unobservable
or difficult-to-observe system with a small number of parameters
produces high-dimensional observable output.
 Reinforcement learning sits somewhere between supervised and unsupervised learning. The
algorithm gets told when the answer is wrong, but does not get told how to
correct it. It has to explore and try out different possibilities until it works
out how to get the answer right.
 Reinforcement learning is sometimes called learning with a critic because
of this monitor that scores the answer, but does not suggest improvements.
 No supervised output but delayed reward.
 Given a sequence of states and actions with rewards, find a sequence of
actions (policy) that reaches a goal (output a policy)
 Policy is a mapping from states → actions that tells you what to do in a
given state
 Policies: what actions should an agent take in a particular situation
 Utility estimation: how good is a state (used by policy)
 Reinforcement learning follows trial and error methods to get the
desired result. After accomplishing a task, the agent receives a
reward. An example could be to train a dog to catch the ball. If the
dog learns to catch a ball, you give it a reward, such as a biscuit.
 Reinforcement Learning methods do not need any external
supervision to train models.
 Reinforcement learning problems are reward-based. For every task
or for every step completed, there will be a reward received by the
agent. If the task is not achieved correctly, there will be some
penalty added.
 Reinforcement Learning trains a machine to take suitable actions and
maximize its rewards in a particular situation. It uses an agent and an
environment to produce actions and rewards. The agent has a start and an
end state. But, there might be different paths for reaching the end state,
like a maze. In this learning technique, there is no predefined target
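 As a rough illustration of an agent, an environment, actions and rewards, the sketch below uses tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one). The five-cell corridor, the reward of 1.0 at the goal, and the values of `ALPHA`, `GAMMA` and `EPSILON` are assumptions chosen for the example.

```python
import random

N_STATES = 5                      # corridor cells 0..4; start is 0, goal is 4
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def greedy(q, state, rng):
    """Pick the best-valued action, breaking ties at random."""
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Trial and error: explore sometimes, otherwise exploit what is known.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward + GAMMA * max(q[nxt].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# The learned policy: the best action in every non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The agent is never told that "right" is correct; it only receives the delayed reward at the goal, and the policy emerges from the value estimates.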
 Example #1:
 An example of reinforcement learning is to train a machine that can identify
the shape of an object, given a list of different objects. In the example shown,
the model tries to predict the shape of the object, which is a square in this
 Example #2:
 Consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did
that made it get the reward/punishment. We can use a similar method to train
computers to do many tasks, such as playing backgammon or chess.
 Reinforcement learning algorithms are widely used in the
gaming industry to build games. They are also used to train
robots to do human tasks.
 Playing chess or a computer game
 Credit assignment problem
 Game playing
 Robot in a maze
 Supervised (inductive) learning
– Given: training data + desired outputs (labels)
 Unsupervised learning
– Given: training data (without desired outputs)
 Semi-supervised learning
– Given: training data + a few desired outputs
 Reinforcement learning
– Rewards from sequence of actions
 Machine learning models are computer programs that are
used to recognize patterns in data or make predictions.
 Machine learning models are created from machine learning
algorithms, which are trained using either labeled,
unlabeled, or mixed data.
 Different machine learning algorithms are selected as they
can be suited to different goals, such as classification,
regression, clustering, etc.
 Machine learning models are created by training algorithms with
either labeled or unlabeled data, or a mix of both using different
machine learning methods.
 Building a machine learning model project commonly involves the
following 10 steps:
 Step 1: Understand the business problem (and define success)
 Step 2: Understand and identify data
 Step 3: Collecting Data
 Step 4: Preparing data
 Step 5: Choose a model
 Step 6: Training a model
 Step 7: Evaluating the Model
 Step 8: Parameter tuning
 Step 9: Making Predictions
 Step 10: Deploy the machine learning model
 The first phase of any machine learning project is developing an understanding
of the business requirements. You need to know what problem you're trying to
solve before attempting to solve it.
 To start, work with the owner of the project and make sure you understand the
project's objectives and requirements.
 Key questions to answer include the following:
 What's the business objective that requires a cognitive solution?
 What parts of the solution are cognitive, and what aren't?
 Have all the necessary technical, business and deployment issues been addressed?
 What are the defined "success" criteria for the project?
 How can the project be staged in iterative sprints?
 Are there any special requirements for transparency, explainability or bias reduction?
 What are the ethical considerations?
 What are the acceptable parameters for accuracy, precision and confusion matrix values?
 What are the expected inputs to the model and the expected outputs?
 What are the characteristics of the problem being solved? Is this a classification,
regression or clustering problem?
 What is the "heuristic" -- the quick-and-dirty approach to solving the problem that
doesn't require machine learning? How much better than the heuristic does the model
need to be?
 How will the benefits of the model be measured?
 A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before to make
predictions and fulfill its purpose. Lack of data will prevent you from building the
model, and access to data isn't enough. Useful data needs to be clean and in a good
 Identify your data needs and determine whether the data is in proper shape for the
machine learning project. The focus should be on data identification, initial collection,
requirements, quality identification, insights and potentially interesting aspects that are
worth further investigation.
 Here are some key questions to consider:
 Where are the sources of the data that's needed for training the model?
 What quantity of data is needed for the machine learning project?
 What is the current quantity and quality of training data?
 How are the test set data and training set data being split?
 For supervised learning tasks, is there a way to label that data?
 Can pre-trained models be used?
 Where is the operational and training data located?
 Are there special needs for accessing real-time data on edge devices or in more difficult-to-
reach places?
 Answering these important questions helps you get a handle on the quantity and quality
of data as well as understand the type of data that's needed to make the model work.
 This step requires a reliable data source and good-quality data.
 It is of the utmost importance to collect reliable data so that your
machine learning model can find the correct patterns. The quality of
the data that you feed to the machine will determine how accurate
your model is. If you have incorrect or outdated data, you will have
wrong outcomes or predictions which are not relevant.
 Make sure you use data from a reliable source, as it will directly
affect the outcome of your model. Good data is relevant, contains
very few missing and repeated values, and has a good
representation of the various subcategories/classes present.
 After you have your data, you have to prepare it. You can do this by:
 Putting together all the data you have and randomizing it. This helps
make sure that data is evenly distributed, and the ordering does not affect
the learning process.
 Cleaning the data to remove unwanted data, missing values, rows, and
columns, duplicate values, data type conversion, etc. You might even
have to restructure the dataset and change the rows and columns or index
of rows and columns.
 Visualize the data to understand how it is structured and understand the
relationship between various variables and classes present.
 Splitting the cleaned data into two sets - a training set and a testing set.
The training set is the set your model learns from. A testing set is used to
check the accuracy of your model after training.
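 The preparation steps above (randomize, clean, split) could look roughly like the sketch below. The records in `raw_rows`, the field names, and the helper `prepare` are hypothetical; real projects usually rely on a data library, which is not assumed here.

```python
import random

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"height_cm": 172.0, "label": "adult"},
    {"height_cm": 118.0, "label": "child"},
    {"height_cm": None,  "label": "adult"},   # missing value
    {"height_cm": 181.0, "label": "adult"},
    {"height_cm": 125.0, "label": "child"},
    {"height_cm": 172.0, "label": "adult"},   # duplicate of the first row
    {"height_cm": 165.0, "label": "adult"},
    {"height_cm": 110.0, "label": "child"},
    {"height_cm": 177.0, "label": "adult"},
    {"height_cm": 131.0, "label": "child"},
    {"height_cm": 169.0, "label": "adult"},
    {"height_cm": 122.0, "label": "child"},
]

def prepare(rows, test_fraction=0.2, seed=42):
    """Shuffle, drop incomplete and duplicate rows, then split into train/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # randomize so ordering has no effect
    cleaned, seen = [], set()
    for row in rows:
        if any(value is None for value in row.values()):
            continue                           # remove rows with missing values
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                           # remove duplicate rows
        seen.add(key)
        cleaned.append(row)
    n_test = round(len(cleaned) * test_fraction)
    return cleaned[n_test:], cleaned[:n_test]  # (training set, testing set)

train_rows, test_rows = prepare(raw_rows)
```

With the twelve example records, one missing row and one duplicate are dropped, leaving ten rows that are split 80/20.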
 Data preparation and cleansing tasks can take a substantial amount
of time
 Procedures during the data preparation, collection and cleansing process
include the following:
 Collect data from the various sources.
 Standardize formats across different data sources.
 Replace incorrect data.
 Enhance and augment data.
 Add more dimensions with pre-calculated amounts and aggregate information
as needed.
 Enhance data with third-party data.
 "Multiply" image-based data sets if they aren't sufficient for training.
 Remove extraneous information and deduplicate.
 Remove irrelevant data from training to improve results.
 Reduce noise and remove ambiguity.
 Consider anonymizing data.
 Normalize or standardize data to get it into formatted ranges.
 Sample data from large data sets.
 Select features that identify the most important dimensions and, if necessary,
reduce dimensions using a variety of techniques.
 Split data into training, test and validation sets.
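 To make "normalize or standardize data" from the list above concrete, here is a small sketch of two common rescaling choices: min-max scaling to the range [0, 1] and z-score standardization. The function names and the `ages` values are illustrative assumptions.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    # Assumes the values are not all identical (hi > lo).
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Assumes sigma > 0, i.e. the values are not all identical.
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]          # hypothetical feature column
ages_scaled = min_max_scale(ages)
ages_standardized = standardize(ages)
```

A common precaution is to compute the scaling statistics on the training set only and reuse them for the test set, so that no information from the test data leaks into training.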
 A machine learning model determines the output you get after
running a machine learning algorithm on the collected data.
 It is important to choose a model which is relevant to the task at hand.
 Over the years, scientists and engineers developed various models
suited for different tasks like speech recognition, image
recognition, prediction, etc.
 Apart from this, you also have to see if your model is suited for
numerical or categorical data and choose accordingly.
 Training is the most important step in machine learning.
 In training, you pass the prepared data to your machine learning
model to find patterns and make predictions. It results in the
model learning from the data so that it can accomplish the task set.
 Over time, with training, the model gets better at predicting.
 After training your model, you have to check to see how it’s
performing. This is done by testing the performance of the
model on previously unseen data. The unseen data used is the
testing set that you split your data into earlier.
 If testing was done on the same data which is used for
training, you will not get an accurate measure, as the model is
already used to the data, and finds the same patterns in it, as it
previously did. This will give you disproportionately high accuracy.
 When used on testing data, you get an accurate measure of
how your model will perform and its speed.
 During the model evaluation process, you should do the following:
 Evaluate the models using a validation data set.
 Determine confusion matrix values for classification problems.
 Identify methods for k-fold cross-validation if that approach is used.
 Further tune hyperparameters for optimal performance.
 Compare the machine learning model to the baseline model or heuristic.
 Once you have created and evaluated your model, see if its
accuracy can be improved in any way. This is done by tuning
the parameters present in your model.
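 Parameters here are the settings of the model that the programmer
generally decides before training (often called hyperparameters), as
opposed to the values the model learns from the data.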
 At a particular value of your parameter, the accuracy will be
the maximum. Parameter tuning refers to finding these values.
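 A minimal sketch of parameter tuning is a grid search: try each candidate value, measure accuracy on validation data, and keep the best. The one-feature threshold classifier, the `validation` pairs and the `candidates` list are assumptions invented for this illustration.

```python
# Hypothetical validation data: (feature value, true label).
validation = [
    (0.10, 0), (0.30, 0), (0.40, 0), (0.45, 0),
    (0.55, 1), (0.60, 1), (0.70, 0), (0.90, 1),
]

def accuracy(threshold, data):
    """Accuracy of the rule 'predict 1 when x >= threshold'."""
    correct = sum((x >= threshold) == bool(y) for x, y in data)
    return correct / len(data)

# Candidate values of the tunable setting.
candidates = [0.2, 0.35, 0.5, 0.65, 0.8]

# Grid search: score every candidate and keep the one with the highest accuracy.
scores = {t: accuracy(t, validation) for t in candidates}
best_threshold = max(scores, key=scores.get)
```

Tuning against a validation set rather than the test set keeps the final test measurement honest.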
 In the end, you can use your model on unseen data to
make predictions accurately.
 The last step in building a machine learning model is
the deployment of the model.
 Machine learning models are generally developed and tested
in a local or offline environment using training and testing
 Deployment is when the model is moved into a live
environment, dealing with new and unseen data.
 This is the point that the model starts to bring a return on
investment to the organization, as it is performing the task it
was trained to do with live data.
 Key questions
Q. How well does the model perform on unseen data?
 While training a model is a key step, how the model
generalizes on unseen data is an equally important aspect
that should be considered in every machine learning
 We need to know whether it actually works and,
consequently, if we can trust its predictions.
 Model evaluation aims to estimate the generalization
accuracy of a model on future (unseen/out-of-sample) data.
 The purpose of model evaluation is to help us to know
which algorithm best suits the given dataset for solving a
particular problem
 To select the “Best Fit” algorithm
 It evaluates the performance of different Machine Learning
models, based on the same input dataset.
 There are two methods that are used to evaluate a model
performance. They are
1. Holdout
2. Cross Validation
 Both methods use a test set (i.e., data not seen by the model) to
evaluate model performance.
 It’s not recommended to use the data we used to build the model
to evaluate it. A sufficiently flexible model can simply memorize the
whole training set and predict the correct label for every training
point, which says nothing about how it handles new data. Fitting the
training data this closely at the expense of generalization is known
as overfitting.
 The Holdout method evaluates model performance by splitting the
data into two parts: a training set and a test set.
 The training data is used to train the system
 The test data, which the model does not see during training, is used
to calculate the performance of the model
 This method is used to check how well machine learning
models developed with different algorithms
perform on unseen samples of data.
 The approach is simple, flexible and fast.
 E.g. 80/20% train-test data split
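 The sketch below shows why the held-out test set matters. It uses a deliberately naive, hypothetical `Memorizer` model that stores its training examples and falls back to the majority label for anything it has not seen. The data and the every-fifth-item 80/20 split are assumptions chosen so that the result is reproducible; a real split would normally be random.

```python
from collections import Counter

# Hypothetical dataset: inputs 0..19, label 1 when x >= 8.
data = [(x, int(x >= 8)) for x in range(20)]

# Deterministic 80/20 hold-out split: every fifth item goes to the test set.
test_set = [pair for i, pair in enumerate(data) if i % 5 == 4]
train_set = [pair for i, pair in enumerate(data) if i % 5 != 4]

class Memorizer:
    """Stores training pairs; predicts the majority label for unseen inputs."""

    def fit(self, pairs):
        self.table = dict(pairs)
        self.default = Counter(y for _, y in pairs).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, pairs):
    return sum(model.predict(x) == y for x, y in pairs) / len(pairs)

model = Memorizer().fit(train_set)
train_accuracy = accuracy(model, train_set)   # measured on data the model has seen
test_accuracy = accuracy(model, test_set)     # measured on held-out data
```

The memorizer is perfect on the training data but weaker on the held-out data, so only the test figure says anything about generalization.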
 k-fold cross-validation is the most common cross-validation technique
and it works as follows:
 The original dataset is partitioned into k equal size subsamples, called folds.
 The k is a user-specified number, usually with 5 or 10 as its preferred value.
 Training and evaluation are repeated k times, such that each time, one of the k subsets is used as
the test set/validation set and the other k-1 subsets are put together to form a
training set.
 The error estimate is averaged over all k trials to get the total effectiveness
of our model.
 Example:
 When performing five-fold cross-validation, the data is first partitioned into 5
parts of (approximately) equal size. A sequence of models is trained. The first
model is evaluated on the first fold and trained on the remaining folds.
This is repeated for each of these 5 splits of the data
and the estimation of accuracy is averaged over all 5 trials to get the total
effectiveness of our model.
 Cross-validation is usually the preferred method because it gives your
model the opportunity to train on multiple train-test splits. This gives you
a better indication of how well your model will perform on unseen data.
Hold-out, on the other hand, is dependent on just one train-test split.
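 The k-fold procedure can be sketched as follows. The helpers `k_fold_splits` and `cross_validate`, the toy data, and the "predict the training mean" model are assumptions for illustration; they are not taken from a particular library.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Folds differ in size by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the remaining fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k

# Toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(10))
ys = [2 * x for x in xs]
cv_error = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
```

The data is usually shuffled before it is cut into folds; that step is omitted here so the folds are easy to follow by hand.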
 Model evaluation metrics are required to quantify model performance.
 The choice of evaluation metrics depends on a given machine
learning task (such as classification, regression, ranking,
clustering, topic modeling, among others).
 All tasks may not require all evaluation metrics
 Some metrics, such as precision-recall, are useful for multiple tasks.
 The common types of evaluation metrics depend on the type of
machine learning task:
 Classification model
 Clustering model
 Forecast model
 Outlier model
 The different types of classification metrics are:
 Classification Accuracy
 Confusion Matrix
 F-Measure
 Logarithmic Loss
 Area under Curve (AUC)
 Classification Accuracy
 Classification accuracy is what is usually meant by the plain term
"accuracy". It is the ratio of correct predictions to the total number of
predictions made by the model on the given data.
 Confusion Matrix
 It is a NxN matrix structure used for
evaluating the performance of a classification
model, where N is the number of classes that
are predicted.
 It is operated on a test dataset in which the
true values are known.
 The matrix lets us know about the number of
incorrect and correct predictions made by a
classifier and is used to find correctness of the
 It consists of values like True Positive, False
Positive, True Negative, and False Negative,
which helps in measuring Accuracy, Precision,
Recall, Specificity, Sensitivity, and AUC curve.
 Confusion matrix:
 There are 4 important terms in confusion matrix:
 True Positives (TP): The cases in which our predictions are TRUE, and the actual output was also
TRUE.
 True Negatives (TN): The cases in which our predictions are FALSE, and the actual output was
also FALSE.
 False Positives (FP): The cases in which our predictions are TRUE, and the actual output was
FALSE.
 False Negatives (FN): The cases in which our predictions are FALSE, and the actual output was
TRUE.
 Helps to calculate accuracy, precision, recall and F-measure
 Accuracy is the sum of the True Positive and True Negative counts divided by the
total number of samples: (TP + TN) / (TP + TN + FP + FN). It tells us the fraction of predictions made
by the model that were correct.
 Precision is the ratio of the number of True Positives to the total number of samples
predicted positive by the classifier: TP / (TP + FP). It tells us what fraction of the predicted
positives are actually positive.
 Recall is the ratio of the number of True Positives to the sum of True Positive and False
Negative samples in the data: TP / (TP + FN). It tells us what fraction of the actual positives the
model identified.
 F1 Score
 It is also called the F-Measure. It summarizes the test accuracy of the developed model in a
single number by combining Precision and Recall, so the two do not have to be compared separately.
F1 Score is the harmonic mean of Recall and Precision: F1 = 2 × Precision × Recall / (Precision + Recall).
The higher the F1 Score, the better the performance of the model. Because the harmonic mean is low
whenever either Precision or Recall is low, F1 is often more informative than accuracy when the
classes are imbalanced.
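 The four confusion-matrix counts and the metrics derived from them can be computed as in the sketch below. The label lists `y_true` and `y_pred` are made-up examples, and the function names are illustrative rather than taken from a library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1          # predicted negative, actually negative
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 (0.0 where a ratio is undefined)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical true labels and model predictions for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

Here three of the four actual positives are found and one negative is wrongly flagged, so precision and recall are both 3/4 while accuracy is 8/10.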
 A regression (forecast) model predicts a continuous outcome with the help of
independent variables that are correlated with it.
 Its evaluation metrics measure how far the predictions are from the actual values.
Comparing them on training and test data helps reveal whether the model is underfitted or
overfitted.
 They are:-
 Mean Absolute Error (MAE)
 Mean Squared Error (MSE)
 Root Mean Squared Error (RMSE)
 Mean Absolute Error is the average of the absolute differences between the original values and
the predicted values. It gives us an idea of how far the predictions are from the
actual output. Because the differences are taken in absolute value, it does not show whether
the model is predicting too low or too high. With y_i the original value, \hat{y}_i the predicted
value and n the number of samples, it is calculated as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
 The mean squared error is similar to the mean absolute error. It is computed by
taking the average of the square of the difference between original
and predicted values. Because the differences are squared, large errors are penalized
much more heavily than small ones. It is computed as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
 The root mean squared error is the root of the mean of the square of difference
of the predicted and actual values of the given data, so it is expressed in the same units as
the target. It is one of the most popular evaluation metrics used in regression problems. It is
most appropriate when the errors are assumed to be unbiased and to follow a normal
distribution. It is computed using the formula below:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
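 The three regression metrics translate directly into code. The sketch below assumes two equal-length lists of numbers; the `actual` and `predicted` values are invented for the example.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

print("MAE :", mae(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

The single error of size 2 contributes 2 to the sum behind MAE but 4 to the sum behind MSE, which is how squaring gives large errors more weight.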

Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton

Recently uploaded (20)

CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A

ML All Chapter PDF.pdf

  • 1. FUNDAMENTALS OF MACHINE LEARNING (ML) Alemu Kumilachew, Faculty of Computing, Bahir Dar University Institute of Technology (BiT), Bahir Dar, Ethiopia
  • 2. OUTLINES OF THE COURSE 1 • Introduction to Machine Learning 2 • Concepts of Learning and its process 3 • Types of Learning and Machine learning methods 4 • Model Building 5 • Evaluation 6 • Applications & Current trends in machine learning
  • 3. REFERENCES  TEXT BOOKS: 1) Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press, Prentice Hall of India, 3rd Edition, 2014. 2) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations of Machine Learning”, MIT Press, 2012. 3) Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, Second Edition, 2015. 4) Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997. REFERENCE BOOKS: 1) Charu C. Aggarwal, “Data Classification: Algorithms and Applications”, CRC Press, 2014. 2) Charu C. Aggarwal, “Data Clustering: Algorithms and Applications”, CRC Press, 2014. 3) Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, 2012. 4) Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, 3rd edition, Morgan Kaufmann Publications, 2012.
  • 4. EVALUATION METHODS  Assignment/Quiz/Test: 5%  Group Project: 20%  Mid Exam: 25%  Final Exam: 50%
  • 5. OBJECTIVES OF LEARNING ML?  The main objectives of learning machine learning are to know and understand:  What are the different types of machine learning?  What are the different algorithms available for developing machine learning models?  What tools are available for developing these models?  What are the programming language choices?  What platforms support development and deployment of Machine Learning applications?  What IDEs (Integrated Development Environments) are available?  How to quickly upgrade your skills in this important area?
  • 7. AI VS. MACHINE LEARNING – “LEARN”  Machine learning (ML) is a subset of artificial intelligence (AI) that is all about getting an AI to accomplish tasks without being given specific instructions. In essence, it’s about teaching machines how to learn!
  • 8. AI VS. MACHINE LEARNING – “LEARNING”  AI is simulated human cognition, which is supposed to be achieved through learning.  What is learning?  Were we born with PhD-level intelligence?  Of course not! At the beginning of our lives, we have little understanding of the world around us, but over time we grow to learn a lot. We use our senses to take in data, and learn via a combination of interacting with the world around us, being explicitly taught certain things by others, finding patterns over time, and, of course, lots of trial and error.  “Learning is any process by which a system improves performance from experience.” - Herbert Simon
  • 9. AI VS. MACHINE LEARNING – “LEARNING”  AI learns in a similar way. When it’s first created, an AI knows nothing; ML gives AI the ability to learn about its world.  AI is all about allowing a system to learn from examples rather than instructions. ML is what makes that possible.
  • 10. AI VS. MACHINE LEARNING – “LEARNING”  AIs are taught, not explicitly programmed. In other words, instead of spelling out specific rules to solve a problem, we give them examples of what they will encounter in the real world and let them find the patterns themselves. Letting machines find patterns is preferable to spelling out the instructions when the instructions are hard to specify or unknown, or when the data has many different variables, for example when treating cancer or predicting the stock market.
  • 11. WHAT IS MACHINE LEARNING?  Arthur Samuel (at IBM) coined the term “Machine Learning” in 1959.  He defined machine learning as: “the field of study that gives computers the ability to learn without being explicitly programmed.”  There is no universally accepted definition of ML.  Different authors define the term differently.
  • 12. DEFINITION OF ML  Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.  Machine learning is programming computers to optimize a performance criterion using example data or past experience.  ML creates a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive, to make predictions in the future, or descriptive, to gain knowledge from data, or both.  A model:  is a compressed version of a database;  extracts knowledge from it;  does not have perfect performance but is a useful approximation to the data.  Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that gives machines the ability to automatically learn from data and past experiences to identify patterns and make predictions with minimal human intervention.
  • 13. DEFINITION OF ML?  Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that  improve their performance P  at some task T  with experience E. A well-defined learning task is given by <P,T,E>.  A computer program which learns from experience is called a machine learning program or simply a learning program. Such a program is sometimes also referred to as a learner.
  • 14. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”? 1. “Big data” - models are based on huge amounts of data which are being produced and stored continuously  science: genomics, astronomy, materials science, particle accelerators, etc.  sensor networks: weather measurements, traffic, etc.  people: social networks, blogs, mobile phones, purchases, bank transactions, etc. 2. Data is not random; it contains structure that can be used to predict outcomes, or gain knowledge in some way.  Ex: patterns of Amazon purchases can be used to recommend items. 3. It is more difficult to design algorithms for such tasks (compared to, say, sorting an array or calculating a payroll). Such algorithms need data.  Ex: construct a spam filter, using a collection of email messages labelled as spam/not spam. 4. Learning isn’t always useful:  There is no need to “learn” to calculate payroll. 5. Data mining – extracting useful knowledge/insights from data  Ex: Data mining is designed to extract rules from large databases via the application of ML methods.
  • 15. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?  Example #1:  A classic example of a task that requires machine learning: it is very hard to say what makes a handwritten digit a “2”.
  • 16. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?  Example #2: House price prediction  After plotting various data points on the XY plot, we draw a best-fit line to make predictions for any other house given its size. You feed the known data to the machine and ask it to find the best-fit line. Once the machine has found the best-fit line, you test its suitability by feeding in a known house size, i.e. the Y-value in the plot. The machine then returns the estimated X-value, i.e. the expected price of the house.
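The best-fit line described in this example can be sketched with ordinary least squares. This is only an illustration: the slide does not say which fitting method is used, and the function names `fit_line` and `predict` as well as the house sizes and prices are made-up assumptions.

```python
def fit_line(sizes, prices):
    """Least-squares fit of price = m * size + c (one common way to get a best-fit line)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    # Slope = covariance(size, price) / variance(size)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, size):
    """Estimated price for a house of the given size."""
    return m * size + c


# Hypothetical data: sizes in square metres, prices in thousands.
# The prices lie exactly on price = 3 * size to keep the example easy to check.
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]

m, c = fit_line(sizes, prices)
print(m, c, predict(m, c, 110))
```

With real, noisy data the points would not lie exactly on the line, and the fitted slope and intercept would only approximate the trend.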
  • 17. WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?  Example #3: Some more examples of tasks that are best solved by using a learning algorithm  Recognizing patterns:  Facial identities or facial expressions  Handwritten or spoken words  Medical images  Generating patterns:  Generating images or motion sequences  Recognizing anomalies:  Unusual credit card transactions  Unusual patterns of sensor readings in a nuclear power plant  Prediction:  Future stock prices or currency exchange rates
  • 21. WHAT IS MACHINE LEARNING?- WHEN DO WE USE MACHINE LEARNING?  ML is used when:  Human expertise does not exist (navigating on Mars)  Humans can’t explain their expertise (speech recognition)  Models must be customized (personalized medicine)  Solution needs to be adapted to particular cases (user biometrics)  Models are based on huge amounts of data (genomics)  Solution changes in time (routing on a computer network)
  • 24. STATE OF THE ART APPLICATIONS OF MACHINE LEARNING  Autonomous car technologies
  • 25. STATE OF THE ART APPLICATIONS OF MACHINE LEARNING  Deep learning emergence
  • 28. STATE OF THE ART APPLICATIONS OF MACHINE LEARNING  Training on Multiple Objects
  • 29. STATE OF THE ART APPLICATIONS OF MACHINE LEARNING  Automatic Speech recognition systems
  • 31. HISTORY OF ML  1950s: – Samuel’s checker player – Selfridge’s Pandemonium  1960s: – Neural networks: Perceptron – Pattern recognition – Learning in the limit theory – Minsky and Papert prove limitations of Perceptron  1970s: – Symbolic concept induction – Winston’s arch learner – Expert systems and the knowledge acquisition bottleneck – Quinlan’s ID3 – Michalski’s AQ and soybean diagnosis – Scientific discovery with BACON – Mathematical discovery with AM
  • 32. HISTORY OF ML…  1980s: – Advanced decision tree and rule learning – Explanation-based Learning (EBL) – Learning and planning and problem solving – Utility problem – Analogy – Cognitive architectures – Resurgence of neural networks (connectionism, backpropagation) – Valiant’s PAC Learning Theory – Focus on experimental methodology  1990s: – Data mining – Adaptive software agents and web applications – Text learning – Reinforcement learning (RL) – Inductive Logic Programming (ILP) – Ensembles: Bagging, Boosting, and Stacking – Bayes Net learning
  • 33. HISTORY OF ML…  2000s: – Support vector machines & kernel methods – Graphical models – Statistical relational learning – Transfer learning – Sequence labeling – Collective classification and structured outputs – Computer Systems Applications (Compilers, Debugging, Graphics, Security) – E-mail management – Personalized assistants that learn – Learning in robotics and vision  2010s: – Deep learning systems – Learning for big data – Bayesian methods – Multi-task & lifelong learning – Applications to vision, speech, social networks, learning to read, etc. – ???
  • 34. APPLICATION OF MACHINE LEARNING  The following is a list of some of the typical applications of machine learning. 1. In retail business, machine learning is used to study consumer behaviour. 2. In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market. 3. In manufacturing, learning models are used for optimization, control, and troubleshooting. 4. In medicine, learning programs are used for medical diagnosis. 5. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service. 6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually. 7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the system designer need not foresee and provide solutions for all possible situations. 8. It is used to find solutions to many problems in vision, speech recognition, and robotics. 9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly when driving on a variety of roads. 10. Machine learning methods have been used to develop programmes for playing games such as chess, backgammon and Go.
  • 35. CHAPTER SUMMARY  Learning can be viewed as using direct or indirect experience to approximate a chosen target function.  Learning means building general models from data consisting of particular examples.  Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.  Example in retail: from customer transactions to consumer behavior: people who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven”.  Machine Learning builds a model that is a good and useful approximation to the data.
  • 37. LEARNING  Definition  A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
  • 38. LEARNING …  Examples: defining a learning task I. Handwriting recognition learning problem  T: Recognizing and classifying handwritten words within images  P: Percent of words correctly classified  E: A dataset of handwritten words with given classifications II. A robot driving learning problem  T: Driving on highways using vision sensors  P: Average distance traveled before an error  E: A sequence of images and steering commands recorded while observing a human driver III. A chess learning problem  T: Playing chess  P: Percent of games won against opponents  E: Playing practice games against itself IV. Spam filtering  T: Categorizing email messages as spam or legitimate  P: Percentage of email messages correctly classified  E: Database of emails, some with human-given labels
  • 39. COMPONENTS OF LEARNING  Basic components of learning process  The learning process, whether by a human or a machine, can be divided into four components, namely, data storage, abstraction, generalization and evaluation. Fig. Components of the learning process
  • 40. COMPONENTS OF LEARNING PROCESS  Data storage (1)  Facilities for storing and retrieving huge amounts of data are an important component of the learning process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.  In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.  Computers use hard disk drives, flash memory, random access memory and similar devices to store data and use cables and other technology to retrieve data.
  • 41. COMPONENTS OF LEARNING PROCESS …  Abstraction (2)  The second component of the learning process is known as abstraction.  Abstraction is the process of extracting knowledge about stored data. This involves creating general concepts about the data as a whole. The creation of knowledge involves application of known models and creation of new models.  The process of fitting a model to a dataset is known as training. When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
  • 42. COMPONENTS OF LEARNING PROCESS …  Generalization (3)  The third component of the learning process is known as generalization.  The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action.  These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before.  In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
  • 43. COMPONENTS OF LEARNING PROCESS …  Evaluation (4)  Evaluation is the last component of the learning process. It is the process of giving feedback to the user to measure the utility of the learned knowledge.  This feedback is then utilized to effect improvements in the whole learning process.
  • 44. LEARNING MODELS  Machine learning is concerned with using the right features to build the right models that achieve the right tasks.  For a given problem, the collection of all possible outcomes represents the sample space or instance space.  Learning models can be divided into three categories:  Using a logical expression (Logical models)  Using the geometry of the instance space (Geometric models)  Using probability to classify the instance space (Probabilistic models)  Grouping and grading (an orthogonal categorization to geometric-probabilistic-logical-compositional)
  • 45. LEARNING MODELS : LOGICAL MODELS  Logical models use a logical expression to divide the instance space into segments and hence construct grouping models.  A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome.  Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.  For example, for a classification problem, all the instances in the group belong to one class.
  • 46. LEARNING MODELS : LOGICAL MODELS …  There are mainly two kinds of logical models: Tree models and Rule models.  Rule models consist of a collection of implications or IF-THEN rules.  For tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the behaviour of the model for this segment. Rule models follow the same reasoning.  In logical models, such as decision trees, a logical expression is used to partition the instance space. Two instances are similar when they end up in the same logical segment.
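A rule model of the kind described here can be sketched as an ordered list of IF-THEN rules. Everything in this sketch is a hypothetical illustration: the rules are written by hand rather than learned, and the attribute names (`contains_link`, `sender_unknown`, `num_exclamations`) are assumptions, not part of the slides.

```python
# Each rule is (condition, label): IF condition(instance) THEN predict label.
# The 'if-part' is a logical expression that returns True or False.
RULES = [
    (lambda e: e["contains_link"] and e["sender_unknown"], "spam"),
    (lambda e: e["num_exclamations"] > 5, "spam"),
]
DEFAULT_LABEL = "legitimate"  # used when no rule fires


def classify(email):
    """Return the label of the first rule whose condition holds."""
    for condition, label in RULES:
        if condition(email):
            return label
    return DEFAULT_LABEL


print(classify({"contains_link": True, "sender_unknown": True, "num_exclamations": 0}))
```

Two emails that trigger the same rule fall into the same logical segment, which is the sense in which a logical model treats them as similar.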
  • 47. LEARNING MODELS : LOGICAL MODELS …  Example:  “Enjoy Sport” is defined by a set of data from some example days. Each example is described by six attributes. The task is to learn to predict the value of Enjoy Sport for an arbitrary day based on the values of its attributes. The problem can be represented by a series of hypotheses. Each hypothesis is described by a conjunction of constraints on the attributes. The training data represents a set of positive and negative examples of the target function. In this example, each hypothesis is a vector of six constraints, specifying the values of the six attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training phase involves learning the set of days (as a conjunction of attributes) for which Enjoy Sport = yes.  Thus, the problem can be formulated as:  Given instances X which represent the set of all possible days, each described by the attributes:  o Sky – (values: Sunny, Cloudy, Rainy),  o AirTemp – (values: Warm, Cold),  o Humidity – (values: Normal, High),  o Wind – (values: Strong, Weak),  o Water – (values: Warm, Cold),  o Forecast – (values: Same, Change).  Q. Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
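One way to make the "vector of six constraints" concrete is sketched below. It uses the Find-S procedure from Mitchell's textbook as an assumed learning method (the slide does not name one), with "?" as a hypothetical marker meaning "any value is acceptable". The four training days are illustrative and not taken from the slides.

```python
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")


def matches(hypothesis, instance):
    """True if every constraint is '?' (any value) or equals the instance's value."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))


def find_s(examples):
    """Find-S: the most specific conjunction consistent with the positive examples."""
    hypothesis = None
    for instance, enjoy in examples:
        if not enjoy:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(instance)  # start from the first positive day
        else:
            # Keep a constraint only where it agrees with the new positive day
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, instance)]
    return tuple(hypothesis) if hypothesis is not None else None


# Illustrative training days: (attribute values, Enjoy Sport?)
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cold", "Change"), True),
]

h = find_s(examples)
print(dict(zip(ATTRIBUTES, h)))
print(matches(h, ("Sunny", "Warm", "Normal", "Strong", "Cold", "Change")))
```

The learned hypothesis predicts Enjoy Sport = yes for any day that satisfies all of its non-"?" constraints.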
  • 48. LEARNING MODELS : GEOMETRIC MODELS …  In geometric models, features can be described as points in two dimensions (x- and y-axis) or in a three-dimensional space (x, y, and z).  For example, temperature as a function of time can be modelled in two axes.  In geometric models, there are two ways we could impose similarity.  We could use geometric concepts like lines or planes to segment (classify) the instance space. These are called linear models.  Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for features and thus can be classed as similar. We call such models distance-based models.
  • 49. LEARNING MODELS : GEOMETRIC MODELS  Linear models  Linear models are relatively simple. In this case, the function is represented as a linear combination of its inputs.  In the simplest case where f(x) represents a straight line, we have an equation of the form f (x) = mx + c where c represents the intercept and m represents the slope.  Linear models are parametric, which means that they have a fixed form with a small number of numeric parameters that need to be learned from data. For example, in f (x) = mx + c, m and c are the parameters that we are trying to learn from the data. This technique is different from tree or rule models, where the structure of the model (e.g., which features to use in the tree, and where) is not fixed in advance.
  • 50. LEARNING MODELS : GEOMETRIC MODELS  Distance-based models  As the name implies, distance-based models work on the concept of distance. In the context of machine learning, the concept of distance is not based merely on the physical distance between two points.  The distance metrics commonly used are the Euclidean and Manhattan distances.
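The two metrics named above can be written in a few lines. This is a minimal sketch for points given as equal-length tuples of numbers; the function names are chosen here for illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

The same pair of points can be near under one metric and comparatively far under another, so the choice of metric affects which instances a distance-based model treats as similar.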
  • 51. LEARNING MODELS : GEOMETRIC MODELS  Distance-based models  Distance is applied through the concept of neighbors and exemplars.  Neighbors are points in proximity with respect to the distance measure expressed through exemplars.  Exemplars are either centroids that find a center of mass according to a chosen distance metric or medoids that find the most centrally located data point.  The most commonly used centroid is the arithmetic mean, which minimizes squared Euclidean distance to all other points.  Notes:  The centroid represents the geometric center of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.  Medoids are similar in concept to means or centroids. Medoids are most commonly used on data when a mean or centroid cannot be defined. They are used in contexts where the centroid is not representative of the dataset, such as in image data.  Examples of distance-based models include the nearest-neighbour models, which use the training data as exemplars – for example, in classification. The K-means clustering algorithm also uses exemplars to create clusters of similar data points.
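The difference between the two kinds of exemplar can be seen in a small sketch. It assumes Euclidean distance (via `math.dist`, available from Python 3.8) and uses made-up points; the function names are illustrative.

```python
import math


def centroid(points):
    """Arithmetic mean position; it need not coincide with any actual data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The data point with the smallest total distance to all the other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical 2-D data with one far-away point
points = [(1, 1), (2, 2), (3, 1), (10, 10)]

print(centroid(points))  # (4.0, 3.5), pulled towards the far-away point
print(medoid(points))    # (2, 2), an actual member of the dataset
```

The medoid is always one of the observed points, which is why it remains usable when an averaged point would not be meaningful.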
  • 52. LEARNING MODELS : PROBABILISTIC MODELS  Probabilistic models use the idea of probability to classify new entities.  Probabilistic models see features and target variables as random variables. The process of modelling represents and manipulates the level of uncertainty with respect to these variables.  There are two types of probabilistic models: Predictive and Generative.  Predictive probability models use the idea of a conditional probability distribution P (Y |X) from which Y can be predicted from X.  Generative models estimate the joint distribution P (Y, X). Once we know the joint distribution for the generative models, we can derive any conditional or marginal distribution involving the same variables. Thus, the generative model is capable of creating new data points and their labels, knowing the joint probability distribution. The joint distribution looks for a relationship between two variables. Once this relationship is inferred, it is possible to infer new data points.
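The step from a joint distribution to a conditional one can be illustrated with a small table. The joint probabilities below are invented for the example, and the function names are assumptions; the sketch only shows that P(Y | X) = P(Y, X) / P(X), where P(X) is obtained by summing the joint over Y.

```python
# Hypothetical joint distribution P(Y, X): Y is an activity, X is the weather.
joint = {
    ("play", "sunny"): 0.40,
    ("play", "rainy"): 0.10,
    ("stay", "sunny"): 0.10,
    ("stay", "rainy"): 0.40,
}


def marginal_x(joint, x):
    """P(X = x): sum the joint probability over all values of Y."""
    return sum(p for (_, xi), p in joint.items() if xi == x)


def conditional_y_given_x(joint, y, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)."""
    return joint.get((y, x), 0.0) / marginal_x(joint, x)


print(marginal_x(joint, "sunny"))
print(conditional_y_given_x(joint, "play", "sunny"))
```

A predictive model would store only the conditional table, whereas a generative model keeps the full joint table and can derive the conditional from it as shown.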
  • 53. LEARNING MODELS : PROBABILISTIC MODELS  Naïve Bayes  Naïve Bayes is an example of a probabilistic classifier. It classifies using the Bayes rule, defined as P (Y |X) = P (X |Y) P (Y) / P (X).  The Naïve Bayes algorithm is based on the idea of conditional probability. Conditional probability is the probability that something will happen, given that something else has already happened. The task of the algorithm then is to look at the evidence, determine the likelihood of each class, and assign a label accordingly to each entity.
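A very small Naïve Bayes classifier for categorical features might look like the sketch below. It is not taken from the slides: the add-one (Laplace) smoothing, the function names, and the tiny email dataset are all assumptions made for illustration. The "naïve" part is the assumption that features are independent given the class, so their conditional probabilities are simply multiplied.

```python
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (feature_tuple, label). Collect the counts needed later."""
    class_counts = Counter(label for _, label in examples)
    feature_counts = Counter()   # (position, value, label) -> count
    values = defaultdict(set)    # position -> set of values seen in training
    for features, label in examples:
        for i, v in enumerate(features):
            feature_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, features):
    """Return the posterior P(label | features) for every label."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(Y)
        for i, v in enumerate(features):
            # P(X_i = v | Y) with add-one smoothing (an assumption of this sketch)
            score *= (feature_counts[(i, v, label)] + 1) / (count + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())  # plays the role of P(X) in the Bayes rule
    return {label: s / norm for label, s in scores.items()}


# Hypothetical emails: (contains the word "offer", contains a link) -> label
examples = [
    (("yes", "yes"), "spam"),
    (("yes", "no"), "spam"),
    (("no", "yes"), "spam"),
    (("no", "no"), "ham"),
    (("no", "no"), "ham"),
    (("yes", "no"), "ham"),
]

model = train(examples)
posterior = predict(model, ("yes", "yes"))
print(max(posterior, key=posterior.get), posterior)
```

The label with the highest posterior is the one assigned to the new entity.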
  • 54. SUMMARY OF LEARNING MODELS  Logical models use a logical expression to partition the instance space.  Geometric models (such as distance-based models) use the idea of distance (e.g., Euclidean distance) to classify entities.  Probabilistic models use the idea of probability to classify new entities.  Learning models:  Geometric models: K-nearest neighbors, linear regression, support vector machine, logistic regression, …  Probabilistic models: Naïve Bayes, Gaussian process regression, conditional random field, …  Logical models: Decision tree, random forest, …
  • 55. DESIGNING A LEARNING SYSTEM  For any learning system, we must know the three elements: T (Task), P (Performance Measure), and E (Training Experience).  At a high level, the process of a learning system looks as shown below.
  • 56. DESIGNING A LEARNING SYSTEM  The learning process starts with task T, performance measure P and training experience E, and the objective is to find an unknown target function.  The target function is the exact knowledge to be learned from the training experience, and it is unknown.  For example, in the case of credit approval, the learning system will have customer application records as experience, and the task would be to classify whether a given customer application is eligible for a loan.  So in this case, the training examples can be represented as (x1, y1), (x2, y2), ..., (xn, yn), where x represents the customer application details and y represents the status of credit approval.  With these details, what is the exact knowledge to be learned from the training experience?  The target function to be learned in the credit approval learning system is a mapping function f: X → y. This function represents the exact knowledge defining the relationship between input variable X and output variable y.
  • 57. DESIGNING A LEARNING SYSTEM  We have just looked into the learning process and the goal of learning. When we want to design a learning system that follows the learning process, we need to consider a few design choices. The design choices are to decide the following key components: 1. Choose the training experience 2. Choose exactly what is to be learned (the target function) 3. Choose how to represent the target function 4. Choose a learning algorithm to infer the target function from the experience 5. The final design
  • 58. DESIGNING A LEARNING SYSTEM  Example:  We will look into the checkers learning problem and apply the above design choices.  For a checkers learning problem, the three elements will be: 1. Task T: To play checkers 2. Performance measure P: Percent of games won in the tournament 3. Training experience E: A set of games played against itself
  • 60. SUPERVISED LEARNING: OVERVIEW
    ▪ Labels are provided.
    ▪ SL is also called learning from exemplars.
    ▪ Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.
    ▪ A supervised machine learning algorithm analyzes the labeled training data and builds a function (model) that can be used to map new, unseen instances to their target outputs (for example, their class labels).
    ▪ SL has this form: given (x1, y1), (x2, y2), ..., (xn, yn), the algorithm learns a function f(x) to predict y given x.
  • 61. SUPERVISED LEARNING: OVERVIEW
    ▪ Example #1:
    ▪ Suppose the data consists of the gender and age of patients, and each patient is labeled as “healthy” or “sick”.
    ▪ Q. What is the role of the supervised machine learning algorithm in this example?
    ▪ A. The purpose of the supervised machine learning algorithm here is to learn from this data and build a function (model) that identifies any new, unseen patient as “sick” or “healthy” based on age and gender.
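
As a minimal sketch of Example #1, the code below uses a 1-nearest-neighbor rule: a new patient receives the label of the most similar labeled patient. The patient records, the 0/1 encoding of gender, and the choice of nearest neighbor as the learning method are illustrative assumptions, not part of the original slides.

```python
# Hypothetical labeled data: (age, gender) -> label. Gender is encoded as 0 or 1.
training_data = [
    ((25, 0), "healthy"),
    ((30, 1), "healthy"),
    ((62, 0), "sick"),
    ((70, 1), "sick"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(patient):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: squared_distance(example[0], patient),
    )
    return nearest_label


# Classify two new, unseen patients.
print(predict((28, 1)))
print(predict((66, 0)))
```

Because age spans a much larger range than the 0/1 gender code, age dominates the distance here; a real system would normally rescale the features first.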
  • 62. SUPERVISED LEARNING: OVERVIEW
    ▪ Example #2:
    ▪ An example of supervised learning is training a system to identify the animal shown in an image.
  • 63. SUPERVISED LEARNING: WHY “SUPERVISED LEARNING”?
    ▪ Supervised learning methods need external supervision to train machine learning models. They need guidance and additional information to return the desired result.
    ▪ It can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
  • 64. SUPERVISED LEARNING: TYPES OF SL PROBLEMS
    ▪ Classification and regression problems are the most common types of supervised learning problems.
  • 65. SUPERVISED LEARNING: CLASSIFICATION
    ▪ Classification: the labels to be predicted are categorical.
    ▪ Works by pattern recognition:
      ▪ Face recognition
      ▪ Optical character recognition: different styles, slant, ...
      ▪ Credit scoring: classify customers into high- and low-risk, based on their income and savings, using data about past loans (whether they were paid or not).
    ▪ Model: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
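
The credit-scoring rule on this slide can be written directly as a small function. This is only a sketch: the threshold values for θ1 and θ2 below are made-up placeholders, whereas in practice a learning algorithm would estimate them from data about past loans.

```python
def credit_risk(income, savings, theta1=30_000, theta2=5_000):
    """Apply the rule: low-risk only if BOTH thresholds are exceeded.

    theta1 and theta2 are hypothetical defaults chosen for illustration.
    """
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


print(credit_risk(50_000, 10_000))
print(credit_risk(50_000, 1_000))
```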
  • 66. SUPERVISED LEARNING: CLASSIFICATION …
    ▪ Given (x1, y1), (x2, y2), ..., (xn, yn), learn a function f(x) to predict y given x.
    ▪ When y is categorical, the problem is classification.
  • 67. SUPERVISED LEARNING: REGRESSION …
    ▪ Regression: the labels to be predicted are continuous.
    ▪ Given (x1, y1), (x2, y2), ..., (xn, yn), learn a function f(x) to predict y given x.
    ▪ When y is real-valued, the problem is regression.
  • 68. SUPERVISED LEARNING: REGRESSION …
    ▪ Example:
    ▪ Predict the price of a car from its mileage.
    ▪ For contrast, credit scoring (classifying customers into high- and low-risk, based on their income and savings, using data about past loans and whether they were paid) predicts a category, so it is a classification problem rather than a regression problem.
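
A minimal sketch of the car-price example, assuming a straight-line relationship and ordinary least squares as the fitting method (the slides do not prescribe one). The mileage and price figures are invented so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: mileage in thousands of km, price in thousands.
mileage = [10, 20, 30, 40]
price = [18.0, 16.0, 14.0, 12.0]

slope, intercept = fit_line(mileage, price)

# Predict the price of an unseen car with 25 (thousand km) on the clock.
predicted_price = slope * 25 + intercept
print(slope, intercept, predicted_price)
```

Here the output y is a real number rather than a category, which is what makes this a regression task.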
  • 69. SUPERVISED LEARNING: ALGORITHMS
    ▪ A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
    ▪ Some of the most popular supervised learning algorithms are:
      ▪ Linear Regression
      ▪ Logistic Regression
      ▪ Support Vector Machine
      ▪ K Nearest Neighbor
      ▪ Decision Tree
      ▪ Random Forest
      ▪ Naive Bayes
  • 70. SUPERVISED LEARNING: APPLICATIONS
    ▪ Supervised learning algorithms are generally used for solving classification and regression problems.
    ▪ A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
  • 71. UNSUPERVISED LEARNING
    ▪ Unsupervised learning is a type of machine learning that uses unlabeled data to train machines. It works by finding patterns and trends in the data, so the model tries to group or label the data based on the features of the input data.
    ▪ In unsupervised learning, a classification or categorization (labels/classes) is not included in the observations. Instead, the algorithm tries to identify similarities between the inputs so that inputs that have something in common are categorized together.
    ▪ The training process used in unsupervised learning techniques does not need any supervision to build models. They learn on their own and predict the output.
  • 72. UNSUPERVISED LEARNING …
    ▪ Example #1:
    ▪ Here, we take unlabeled input data, which means it is not categorized and the corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm such as k-means clustering or hierarchical clustering is applied. The algorithm divides the data objects into groups according to the similarities and differences between the objects.
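
To make the grouping step concrete, here is a small sketch of k-means clustering in plain Python. The sample points and the naive initialization (using the first k points as starting centroids) are assumptions made for illustration; library implementations use more careful initialization and stopping rules.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning and re-averaging."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(
                    sum(coord) / len(cluster) for coord in zip(*cluster)
                )
    return centroids, clusters


# Unlabeled, made-up 2-D data with two visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points.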
  • 73. UNSUPERVISED LEARNING …
    ▪ Example #2:
    ▪ Depicted below is an example of an unsupervised learning technique that uses images of vehicles to separate them into groups, such as buses and trucks. The model learns by identifying the parts of a vehicle, such as its length and width, the front and rear end covers, roof hoods, the types of wheels used, etc. Based on these features, the model places similar vehicles in the same group; no bus or truck labels are given during training.
  • 74. UNSUPERVISED LEARNING …
    ▪ Example #3:
    ▪ Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients.
    ▪ Q. Based on this data, can we infer anything regarding the patients entering the clinic?
  • 75. UNSUPERVISED LEARNING
    ▪ No labels are provided, only input data.
  • 76. UNSUPERVISED LEARNING: APPLICATIONS
    ▪ Unsupervised learning is used for solving clustering and association problems.
    ▪ Learning associations:
      ▪ Basket analysis: let p(Y | X) = “probability that a customer who buys product X also buys product Y”, estimated from past purchases. If p(Y | X) is large (say 0.7), associate “X → Y”. When someone buys X, recommend Y to them.
    ▪ Clustering: group similar data points/instances.
    ▪ Density estimation: where are data points likely to lie?
    ▪ Dimensionality reduction: data lies in a low-dimensional manifold.
    ▪ Feature selection: keep only useful features.
    ▪ Outlier/novelty detection
    ▪ Customer segmentation: based on customer behavior, likes, dislikes, and interests, you can segment and cluster similar customers into a group.
    ▪ Image compression: color quantization
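
The basket-analysis estimate of p(Y | X) is a simple ratio of counts. The sketch below assumes a tiny, invented list of past transactions and uses the slide's 0.7 value as the cutoff for forming an association.

```python
# Made-up purchase history: each set is one customer's basket.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def conditional_probability(y, x, baskets):
    """Estimate p(y | x) = (# baskets with x and y) / (# baskets with x)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)


p = conditional_probability("butter", "bread", transactions)
recommend = p >= 0.7  # associate "bread -> butter" if the estimate is large
print(p, recommend)
```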
  • 77. UNSUPERVISED LEARNING: APPLICATIONS
    ▪ Genomics application: group individuals by genetic similarity.
  • 79. UNSUPERVISED LEARNING: ALGORITHMS
    ▪ Selecting the right algorithm depends on the type of problem you are trying to solve. Some common unsupervised learning algorithms are:
      ▪ K Means Clustering
      ▪ Hierarchical Clustering
      ▪ DBSCAN
      ▪ Principal Component Analysis (PCA)
  • 80. SEMI-SUPERVISED LEARNING
    ▪ Labels are provided for some points only.
    ▪ It is a branch of machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training.
    ▪ Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data).
  • 81. SEMI-SUPERVISED LEARNING: HOW SEMI-SUPERVISED LEARNING WORKS
    ▪ Semi-supervised machine learning is a combination of supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both unsupervised and supervised learning while avoiding the challenge of finding a large amount of labeled data. That means you can train a model to label data without having to use as much labeled training data.
    ▪ Here’s how it works:
      1. Train the model with the small amount of labeled training data, just as you would in supervised learning, until it gives you good results.
      2. Use the model on the unlabeled training dataset to predict the outputs. These are pseudo labels, since they may not be quite accurate.
      3. Link the labels from the labeled training data with the pseudo labels created in the previous step.
      4. Link the data inputs in the labeled training data with the inputs in the unlabeled data.
      5. Train the model on the combined data the same way as you did with the labeled set in the beginning, in order to decrease the error and improve the model’s accuracy.
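
The five steps above can be sketched with a very simple model. The code assumes a one-dimensional feature, a nearest-centroid classifier (the mean feature value per label), and invented data; these are illustrative choices, not something specified by the slides.

```python
def train_centroids(examples):
    """Nearest-centroid model: the mean feature value for each label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# Hypothetical data: a few labeled points and more unlabeled ones.
labeled = [(1.0, "A"), (2.0, "A"), (9.0, "B"), (10.0, "B")]
unlabeled = [1.5, 3.0, 8.0, 9.5, 2.5]

# Step 1: train on the small labeled set.
model = train_centroids(labeled)

# Step 2: predict pseudo labels for the unlabeled inputs.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]

# Steps 3-4: combine labeled and pseudo-labeled inputs and labels.
combined = labeled + pseudo_labeled

# Step 5: retrain on the combined set.
model = train_centroids(combined)
print(pseudo_labeled, model)
```

Practical self-training usually keeps only the pseudo labels the model is confident about; that filtering is omitted here for brevity.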
  • 82. SEMI-SUPERVISED LEARNING: APPLICATIONS
    ▪ Text document classifier: this is the type of situation where semi-supervised learning is ideal, because it would be nearly impossible to find a large number of labeled text documents.
    ▪ Classification of content on the internet: the internet is a vast trove of web pages, and it cannot be expected that every page will be labeled and have all the data for the field that you need. However, over the years, some minority of web pages will have been labeled for one dimension or another.
  • 83. SEMI-SUPERVISED LEARNING: ASSUMPTIONS
    ▪ Semi-supervised methods must make some assumptions about the data in order to justify using a small set of labeled data to draw conclusions about the unlabeled data points. These can be grouped into three categories:
      1. The continuity assumption: data points that are “close” to each other are more likely to have a common label.
      2. The cluster assumption: the data naturally forms discrete clusters, and points in the same cluster are more likely to share a label.
      3. The manifold assumption: the data roughly lies in a lower-dimensional space (or manifold) than the input space. This scenario is relevant when an unobservable or difficult-to-observe system with a small number of parameters produces high-dimensional observable output.
  • 84. REINFORCEMENT LEARNING
    ▪ This is somewhere between supervised and unsupervised learning. The algorithm is told when the answer is wrong, but is not told how to correct it. It has to explore and try out different possibilities until it works out how to get the answer right.
    ▪ Reinforcement learning is sometimes called learning with a critic, because of this monitor that scores the answer but does not suggest improvements.
    ▪ No supervised output, but delayed reward.
    ▪ Given a sequence of states and actions with rewards, find a sequence of actions that reaches a goal (that is, output a policy).
    ▪ A policy is a mapping from states → actions that tells you what to do in a given state.
    ▪ Policies: what actions should an agent take in a particular situation?
    ▪ Utility estimation: how good is a state (used by the policy)?
  • 85. REINFORCEMENT LEARNING: HOW IT WORKS
    ▪ Reinforcement learning follows trial-and-error methods to get the desired result. After accomplishing a task, the agent receives a reward. An example is training a dog to catch a ball: if the dog learns to catch the ball, you give it a reward, such as a biscuit.
    ▪ Reinforcement learning methods do not need any external supervision to train models.
    ▪ Reinforcement learning problems are reward-based. For every task or every step completed, the agent receives a reward. If the task is not achieved correctly, a penalty is added.
  • 86. REINFORCEMENT LEARNING: THE AGENT-ENVIRONMENT INTERFACE
    ▪ Reinforcement learning trains a machine to take suitable actions and maximize its rewards in a particular situation. It uses an agent and an environment to produce actions and rewards. The agent has a start state and an end state, but there might be different paths for reaching the end state, as in a maze. In this learning technique, there is no predefined target variable.
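
The agent-environment loop can be illustrated with tabular Q-learning, one common reinforcement learning algorithm. The slides do not name a specific algorithm, so Q-learning, the five-state corridor, the reward of 1 at the goal, and the values of the learning rate, discount, and exploration rate are all assumptions made for this sketch.

```python
import random

random.seed(0)  # fixed seed so repeated runs behave the same way

N_STATES = 5          # corridor states 0..4; state 4 is the end (goal) state
ACTIONS = [-1, +1]    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

# Q-table: estimated long-term reward for taking action a in state s.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}


def step(state, action):
    """Environment: return (next_state, reward). Reward is 1 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(state):
    """Epsilon-greedy: mostly exploit, sometimes explore; ties broken randomly."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


for _ in range(500):  # episodes of trial and error
    state = 0         # start state
    while state != N_STATES - 1:
        action = choose_action(state)
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Move the estimate toward reward + discounted value of the next state.
        q[(state, action)] += ALPHA * (
            reward + GAMMA * best_next - q[(state, action)]
        )
        state = next_state

# The learned policy maps each non-terminal state to its best action.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it only receives a delayed reward at the goal and works backward from it.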
  • 87. REINFORCEMENT LEARNING: EXAMPLE
    ▪ Example #1:
    ▪ An example of reinforcement learning is training a machine to identify the shape of an object, given a list of different objects. In the example shown, the model tries to predict the shape of the object, which is a square in this case.
    ▪ Example #2:
    ▪ Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward or punish it if it does the right or wrong thing. It has to find out what it did that earned the reward or punishment. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess.
  • 88. REINFORCEMENT LEARNING: APPLICATIONS
    ▪ Reinforcement learning algorithms are widely used in the gaming industry to build games. They are also used to train robots to do human tasks.
    ▪ Playing chess or a computer game
    ▪ Credit assignment problem
    ▪ Game playing
    ▪ Robot in a maze
  • 89. REINFORCEMENT LEARNING: SUMMARY
    ▪ Supervised (inductive) learning. Given: training data + desired outputs (labels)
    ▪ Unsupervised learning. Given: training data (without desired outputs)
    ▪ Semi-supervised learning. Given: training data + a few desired outputs
    ▪ Reinforcement learning. Given: rewards from a sequence of actions
  • 91. MACHINE LEARNING MODELS
    ▪ Machine learning models are computer programs that are used to recognize patterns in data or make predictions.
    ▪ Machine learning models are created from machine learning algorithms, which are trained using labeled, unlabeled, or mixed data.
    ▪ Different machine learning algorithms are selected because they suit different goals, such as classification, regression, or clustering.
  • 92. HOW TO BUILD A MACHINE LEARNING MODEL: COMMON STEPS
    ▪ Machine learning models are created by training algorithms with labeled data, unlabeled data, or a mix of both, using different machine learning methods.
    ▪ Building a machine learning model commonly involves the following 10 steps:
      ▪ Step 1: Understand the business problem (and define success)
      ▪ Step 2: Understand and identify data
      ▪ Step 3: Collecting data
      ▪ Step 4: Preparing data
      ▪ Step 5: Choose a model
      ▪ Step 6: Training a model
      ▪ Step 7: Evaluating the model
      ▪ Step 8: Parameter tuning
      ▪ Step 9: Making predictions
      ▪ Step 10: Deploy the machine learning model
  • 93. STEP 1. UNDERSTAND THE BUSINESS PROBLEM (AND DEFINE SUCCESS)
    ▪ The first phase of any machine learning project is developing an understanding of the business requirements. You need to know what problem you're trying to solve before attempting to solve it.
    ▪ To start, work with the owner of the project and make sure you understand the project's objectives and requirements.
    ▪ Key questions to answer include the following:
      ▪ What's the business objective that requires a cognitive solution?
      ▪ What parts of the solution are cognitive, and what aren't?
      ▪ Have all the necessary technical, business and deployment issues been addressed?
      ▪ What are the defined "success" criteria for the project?
      ▪ How can the project be staged in iterative sprints?
      ▪ Are there any special requirements for transparency, explainability or bias reduction?
      ▪ What are the ethical considerations?
      ▪ What are the acceptable parameters for accuracy, precision and confusion matrix values?
      ▪ What are the expected inputs to the model and the expected outputs?
      ▪ What are the characteristics of the problem being solved? Is this a classification, regression or clustering problem?
      ▪ What is the "heuristic", that is, the quick-and-dirty approach to solving the problem that doesn't require machine learning? How much better than the heuristic does the model need to be?
      ▪ How will the benefits of the model be measured?
  • 94. STEP 2. UNDERSTAND AND IDENTIFY DATA
    ▪ A machine learning model is built by learning and generalizing from training data, then applying that acquired knowledge to new data it has never seen before to make predictions and fulfill its purpose. Lack of data will prevent you from building the model, and access to data alone isn't enough: useful data needs to be clean and in good shape.
    ▪ Identify your data needs and determine whether the data is in proper shape for the machine learning project. The focus should be on data identification, initial collection, requirements, quality identification, insights and potentially interesting aspects that are worth further investigation.
    ▪ Here are some key questions to consider:
      ▪ Where are the sources of the data that's needed for training the model?
      ▪ What quantity of data is needed for the machine learning project?
      ▪ What is the current quantity and quality of training data?
      ▪ How are the test set data and training set data being split?
      ▪ For supervised learning tasks, is there a way to label that data?
      ▪ Can pre-trained models be used?
      ▪ Where is the operational and training data located?
      ▪ Are there special needs for accessing real-time data on edge devices or in more difficult-to-reach places?
    ▪ Answering these important questions helps you get a handle on the quantity and quality of data as well as understand the type of data that's needed to make the model work.
  • 95. STEP 3: COLLECTING DATA
    ▪ This step requires a reliable data source and quality data.
    ▪ It is of the utmost importance to collect reliable data so that your machine learning model can find the correct patterns. The quality of the data that you feed to the machine determines how accurate your model is. If you have incorrect or outdated data, you will get wrong outcomes or predictions that are not relevant.
    ▪ Make sure you use data from a reliable source, as it will directly affect the outcome of your model. Good data is relevant, contains very few missing and repeated values, and has a good representation of the various subcategories/classes present.
  • 96. STEP 4: PREPARING THE DATA
    ▪ After you have your data, you have to prepare it. You can do this by:
      ▪ Putting together all the data you have and randomizing it. This helps make sure that the data is evenly distributed and that the ordering does not affect the learning process.
      ▪ Cleaning the data: removing unwanted rows and columns, handling missing values, removing duplicate values, converting data types, etc. You might even have to restructure the dataset and change the rows and columns or the index of rows and columns.
      ▪ Visualizing the data to understand how it is structured and to understand the relationships between the various variables and classes present.
      ▪ Splitting the cleaned data into two sets: a training set and a testing set. The training set is the set your model learns from. The testing set is used to check the accuracy of your model after training.
    ▪ Data preparation and cleansing tasks can take a substantial amount of time.
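
A minimal sketch of the cleaning, randomizing, and splitting steps using only the Python standard library. The rows, the field names `age` and `income`, the decision to drop (rather than fill in) rows with missing values, and the 80/20 ratio are assumptions for illustration.

```python
import random

# Made-up raw records with one missing value and one exact duplicate.
raw_rows = [
    {"age": 25, "income": 30000},
    {"age": 40, "income": None},     # missing value
    {"age": 25, "income": 30000},    # duplicate of the first row
    {"age": 31, "income": 42000},
    {"age": 58, "income": 61000},
    {"age": 47, "income": 52000},
    {"age": 36, "income": 39000},
]

# Cleaning: drop rows with missing values, then drop exact duplicates.
complete = [r for r in raw_rows if all(v is not None for v in r.values())]
seen, cleaned = set(), []
for row in complete:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        cleaned.append(row)

# Randomizing: shuffle so the original ordering cannot affect learning.
rng = random.Random(42)  # seeded generator keeps the example repeatable
rng.shuffle(cleaned)

# Splitting: 80% of the rows for training, the remaining 20% for testing.
split = int(0.8 * len(cleaned))
train_set, test_set = cleaned[:split], cleaned[split:]
print(len(cleaned), len(train_set), len(test_set))
```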
  • 97. STEP 4: PREPARING THE DATA: SPECIFIC ACTIVITIES
    ▪ Procedures during the data preparation, collection and cleansing process include the following:
      ▪ Collect data from the various sources.
      ▪ Standardize formats across different data sources.
      ▪ Replace incorrect data.
      ▪ Enhance and augment data.
      ▪ Add more dimensions with pre-calculated amounts and aggregate information as needed.
      ▪ Enhance data with third-party data.
      ▪ "Multiply" image-based data sets if they aren't sufficient for training.
      ▪ Remove extraneous information and deduplicate.
      ▪ Remove irrelevant data from training to improve results.
      ▪ Reduce noise and remove ambiguity.
      ▪ Consider anonymizing data.
      ▪ Normalize or standardize data to get it into formatted ranges.
      ▪ Sample data from large data sets.
      ▪ Select features that identify the most important dimensions and, if necessary, reduce dimensions using a variety of techniques.
      ▪ Split data into training, test and validation sets.
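
Two common ways to "normalize or standardize" a numeric column are min-max scaling and z-score standardization. The sketch below assumes these two interpretations and a column with at least two distinct values (otherwise the divisions would fail); the income figures are invented.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


incomes = [10, 20, 30]  # made-up values
scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
print(scaled, z_scores)
```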
  • 98. STEP 5: CHOOSING A MODEL
    ▪ A machine learning model determines the output you get after running a machine learning algorithm on the collected data.
    ▪ It is important to choose a model that is relevant to the task at hand.
    ▪ Over the years, scientists and engineers have developed various models suited to different tasks such as speech recognition, image recognition, prediction, etc.
    ▪ Apart from this, you also have to check whether your model is suited to numerical or categorical data and choose accordingly.
  • 99. STEP 6: TRAINING THE MODEL
    ▪ Training is the most important step in machine learning.
    ▪ In training, you pass the prepared data to your machine learning model to find patterns and make predictions. As a result, the model learns from the data so that it can accomplish the task set.
    ▪ Over time, with training, the model gets better at predicting.
  • 100. STEP 7: EVALUATING THE MODEL
    ▪ After training your model, you have to check how it is performing. This is done by testing the performance of the model on previously unseen data. The unseen data used is the testing set that you split off from your data earlier.
    ▪ If testing were done on the same data that was used for training, you would not get an accurate measure, because the model has already seen the data and finds the same patterns in it as it did before. This would give you disproportionately high accuracy.
    ▪ When the model is evaluated on testing data, you get an accurate measure of how it will perform, as well as of its speed.
  • 101. STEP 7: MODEL EVALUATION
    ▪ During the model evaluation process, you should do the following:
      ▪ Evaluate the models using a validation data set.
      ▪ Determine confusion matrix values for classification problems.
      ▪ Identify methods for k-fold cross-validation if that approach is used.
      ▪ Further tune hyperparameters for optimal performance.
      ▪ Compare the machine learning model to the baseline model or heuristic.
  • 102. STEP 8: PARAMETER TUNING
    ▪ Once you have created and evaluated your model, see if its accuracy can be improved in any way. This is done by tuning the parameters present in your model.
    ▪ The parameters tuned here are the variables that the programmer generally decides before training (often called hyperparameters), as opposed to the values the model learns from the data.
    ▪ At a particular value of each such parameter, the accuracy will be at its maximum. Parameter tuning refers to finding these values.
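
One simple way to find such values is a grid search: try each candidate value and keep the one with the best validation accuracy. In this sketch the tunable setting is a hypothetical decision threshold applied to a model's scores, and both the validation data and the candidate grid are invented.

```python
# Made-up validation data: (model score, true label).
validation = [(0.2, 0), (0.4, 0), (0.45, 0), (0.6, 1), (0.7, 1), (0.9, 1)]


def accuracy(threshold, data):
    """Fraction of examples classified correctly when predicting 1 if score >= threshold."""
    correct = sum(1 for score, label in data if int(score >= threshold) == label)
    return correct / len(data)


# Grid search: evaluate every candidate and keep the most accurate one.
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best_threshold = max(candidates, key=lambda t: accuracy(t, validation))
print(best_threshold, accuracy(best_threshold, validation))
```

The search uses validation data rather than the test set, so that the final test score still reflects data that played no part in any tuning decision.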
  • 103. STEP 9: MAKING PREDICTIONS
    ▪ In the end, you can use your model on unseen data to make accurate predictions.
  • 104. STEP 10: DEPLOY THE MACHINE LEARNING MODEL
    ▪ The last step in building a machine learning model is the deployment of the model.
    ▪ Machine learning models are generally developed and tested in a local or offline environment using training and testing datasets.
    ▪ Deployment is when the model is moved into a live environment, dealing with new and unseen data.
    ▪ This is the point at which the model starts to bring a return on investment to the organization, as it is performing the task it was trained to do with live data.
  • 106. MODEL EVALUATION: OVERVIEW
    ▪ Key question: how well does the model perform on unseen data?
    ▪ While training a model is a key step, how the model generalizes to unseen data is an equally important aspect that should be considered in every machine learning pipeline.
    ▪ We need to know whether the model actually works and, consequently, whether we can trust its predictions.
  • 107. MODEL EVALUATION: DEFINITION
    ▪ Model evaluation aims to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data.
    ▪ The purpose of model evaluation is to help us know which algorithm best suits the given dataset for solving a particular problem, that is, to select the “best fit” algorithm.
    ▪ It evaluates the performance of different machine learning models based on the same input dataset.
  • 108. MODEL EVALUATION TECHNIQUES
    ▪ There are two methods used to evaluate model performance:
      1. Holdout
      2. Cross-validation
    ▪ Both methods use a test set (i.e., data not seen by the model) to evaluate model performance.
    ▪ It is not recommended to evaluate the model on the data we used to build it. A model can simply memorize the whole training set and then predict the correct label for any point in the training set, which says nothing about how it handles new data. This is known as overfitting.
  • 109. MODEL EVALUATION TECHNIQUES: HOLDOUT METHOD
    ▪ The holdout method is used to evaluate model performance and uses two types of data: training and testing.
    ▪ The training data is used to train the system.
    ▪ The test data is used to calculate the performance of the model after it has been trained on the training data set.
    ▪ This method is used to check how well a machine learning model developed using different algorithms performs on unseen samples of data.
    ▪ The approach is simple, flexible and fast.
    ▪ E.g., an 80%/20% train-test data split
  • 110. CROSS-VALIDATION
    ▪ k-fold cross-validation is the most common cross-validation technique, and it works in the following way:
      ▪ The original dataset is partitioned into k equal-size subsamples, called folds.
      ▪ k is a user-specified number, usually with 5 or 10 as its preferred value.
      ▪ Training and testing are repeated k times, such that each time one of the k subsets is used as the test/validation set and the other k-1 subsets are put together to form the training set.
      ▪ The error estimate is averaged over all k trials to get the total effectiveness of our model.
    ▪ Example:
      ▪ When performing five-fold cross-validation, the data is first partitioned into 5 parts of (approximately) equal size. A sequence of models is then trained. The first model is evaluated using the first fold as the test set and trained on the remaining folds. This is repeated for each of the 5 splits of the data, and the accuracy estimate is averaged over all 5 trials to get the total effectiveness of our model.
    ▪ Cross-validation is usually the preferred method because it gives your model the opportunity to train on multiple train-test splits. This gives you a better indication of how well your model will perform on unseen data. Hold-out, on the other hand, depends on just one train-test split.
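
The fold bookkeeping described above can be sketched in a few lines. The function names `k_fold_indices` and `cross_validate` are hypothetical helpers written for this example, and `evaluate` stands for any caller-supplied function that trains on the training indices and returns a score on the test indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, test) over all k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Five-fold split of 10 samples: each fold holds out 2 samples for testing.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled before it is split into folds, so that each fold is representative of the whole dataset.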
  • 111. MODEL EVALUATION METRICS
    ▪ Model evaluation metrics are required to quantify model performance.
    ▪ The choice of evaluation metrics depends on the given machine learning task (such as classification, regression, ranking, clustering, or topic modeling, among others).
    ▪ Not every task requires every evaluation metric.
    ▪ Some metrics, such as precision-recall, are useful for multiple tasks.
    ▪ The common types of evaluation metrics depend on the type of machine learning task:
      ▪ Classification model
      ▪ Clustering model
      ▪ Forecast model
      ▪ Outlier model
  • 112. MODEL EVALUATION: CLASSIFICATION METRICS
    ▪ The different types of classification metrics are:
      ▪ Classification Accuracy
      ▪ Confusion Matrix
      ▪ F-Measure
      ▪ Logarithmic Loss
      ▪ Area Under Curve (AUC)
  • 113. MODEL EVALUATION: CLASSIFICATION METRICS
    ▪ Classification Accuracy
    ▪ Classification accuracy is what is usually meant by the term “accuracy”. It is the ratio of the number of correct predictions to the total number of predictions made by the model on the given data.
  • 114. MODEL EVALUATION: CLASSIFICATION METRICS
    ▪ Confusion Matrix
    ▪ It is an N×N matrix used for evaluating the performance of a classification model, where N is the number of classes being predicted.
    ▪ It is computed on a test dataset in which the true values are known.
    ▪ The matrix shows the number of correct and incorrect predictions made by a classifier and is used to assess the correctness of the model.
    ▪ It consists of values such as True Positive, False Positive, True Negative, and False Negative, which help in measuring accuracy, precision, recall, specificity, sensitivity, and AUC.
  • 115. MODEL EVALUATION: CLASSIFICATION METRICS
    ▪ Confusion matrix:
    ▪ There are 4 important terms in a confusion matrix:
      ▪ True Positives (TP): the cases in which our prediction is TRUE and the actual output is also TRUE.
      ▪ True Negatives (TN): the cases in which our prediction is FALSE and the actual output is also FALSE.
      ▪ False Positives (FP): the cases in which our prediction is TRUE and the actual output is FALSE.
      ▪ False Negatives (FN): the cases in which our prediction is FALSE and the actual output is TRUE.
    ▪ These values help to calculate accuracy, precision, recall and the F-measure.
    ▪ Accuracy is the sum of the True Positive and True Negative counts divided by the total number of samples: Accuracy = (TP + TN) / (TP + TN + FP + FN). It tells us what fraction of all predictions made by the model were correct.
    ▪ Precision is the ratio of the number of True Positives to the total number of samples predicted as positive by the classifier: Precision = TP / (TP + FP). It tells us what fraction of the predicted positives were actually positive.
    ▪ Recall is the ratio of the number of True Positives to the sum of the True Positive and False Negative samples in the data: Recall = TP / (TP + FN). It tells us what fraction of the actual positives the model identified.
    ▪ F1 Score
    ▪ It is also called the F-measure. It summarizes precision and recall in a single number, so model performance can be compared without inspecting the two values separately. The F1 score is the harmonic mean of recall and precision: F1 = 2 × Precision × Recall / (Precision + Recall). The higher the F1 score, the better the performance of the model. Because the harmonic mean is low whenever either precision or recall is low, the F1 score is a more robust summary than accuracy when the classes are imbalanced.
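
These definitions translate directly into code. The sketch below assumes a binary problem with labels 1 (positive) and 0 (negative) and uses invented label lists; returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Made-up true labels and predictions for 10 test samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```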
  • 116. REGRESSION METRICS
    ▪ Regression models predict a continuous outcome with the help of independent variables that are correlated with it; regression metrics measure how far those predictions are from the actual values.
    ▪ Comparing these metrics on training data and on test data helps indicate whether the model is underfitted or overfitted.
    ▪ They are:
      ▪ Mean Absolute Error (MAE)
      ▪ Mean Squared Error (MSE)
      ▪ Root Mean Squared Error (RMSE)
    ▪ Mean Absolute Error is the average of the absolute differences between the original values and the predicted values. It gives us an idea of how far the predictions are from the actual output, but it does not show the direction of the error, that is, whether the model is under-predicting or over-predicting. It is calculated as follows: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
    ▪ The Mean Squared Error is similar to the Mean Absolute Error. It is computed by taking the average of the squares of the differences between the original and predicted values. Because the differences are squared, large errors are penalized much more heavily than small ones. It is computed as follows: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
    ▪ The Root Mean Squared Error is the square root of the mean of the squared differences between the predicted and actual values of the given data. It is one of the most popular evaluation metrics used in regression problems. It is based on the assumption that the errors are unbiased and follow a normal distribution. It is computed using the formula: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$.
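
The three regression metrics can be computed as follows. The actual and predicted values are invented for the example, and the function names are hypothetical helpers rather than part of any particular library.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; large errors weigh more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Made-up actual and predicted values.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(mae, mse, rmse)
```

The single error of size 2 contributes 4 to the squared sum but only 2 to the absolute sum, which is why the MSE exceeds the MAE on this data.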