Unit 1-ML (1) (1).pptx

INTRODUCTION TO
MACHINE
LEARNING
By
Archana M
Department of computer science

Introduction
• Machine Learning (ML) is one of the most
successful applications of Artificial Intelligence. It
lets systems learn automatically from data
without being explicitly programmed.
• It has gained considerable prominence lately because
it can be applied across many industries
to tackle complex problems quickly and effectively.

• From Digital assistants that play your music to the
products being recommended based on prior search,
Machine Learning has taken over many aspects of
our life.
• It is a skill in high demand, as companies require
software that can learn from data and provide accurate
results. The core objective is to learn functions that
map inputs to outputs with as little error as possible.

WHAT IS MACHINE
LEARNING?
• Machine Learning is a branch of Artificial
Intelligence (AI) that improves the quality of
applications by using previously collected data.
It enables systems to learn from data without new
code having to be written for every new, similar
task.
• The aim is for the process to be automated rather than
continuously modified by hand. Through experience and
past data, the program improves by itself.

WHY MACHINE LEARNING?
• The domain of Machine Learning is a continuously
evolving field with high demand. Without human
intervention, it delivers real-time results using the
already existing and processed data.
• It generally helps analyze and assess large amounts of
data with ease by developing data-driven models. As
of today, Machine Learning has become a fast and
efficient way for firms to build models and strategize
plans.

ADVANTAGES OF MACHINE
LEARNING
• Highly automated (little or no human intervention)
• Analyses large amounts of data
• More efficient than traditional data analysis methods
• Identifies trends and patterns with ease
• Reliable and efficient
• Requires less manual work
• Handles a variety of data
• Accommodates most forms of applications

COMMONLY USED ALGORITHMS IN MACHINE LEARNING
There are many different models in Machine Learning. Here are the most
commonly used algorithms today:
• Gradient Boosting algorithms
• Dimensionality Reduction algorithms
• Random Forest
• K-Means
• KNN
• Naive Bayes
• SVM
• Decision Tree
• Logistic Regression
• Linear Regression

• Machine Learning (ML) is steadily having
a huge impact on data-driven business decisions
across the globe. It has also given organizations
the right information to make more informed,
data-driven choices more quickly than conventional
methodologies allow.
• Yet there are many issues in Machine Learning that
cannot be overlooked in spite of its high
productivity.

Machine Learning : A General
Perspective
• The goal in machine learning is to recognize
patterns in a dataset in a general way. After you
recognize the patterns, you can use this information
to model the data, to interpret the data, or to predict
the outcome for new data which hasn’t been seen
before.

• Machine learning is a subfield of artificial intelligence and
machine learning algorithms are used in other related
fields like natural language processing and computer
vision.
• In general, there are three types of learning and these are
• supervised learning,
• unsupervised learning, and
• reinforcement learning.
Their names convey the main idea behind each of them.

Supervised learning
In supervised learning, your system learns under the
supervision of known outputs, so supervised algorithms are
preferred if your dataset contains output information.
Here is an example.
• Let’s assume you run a medical statistics company and
you have a dataset which contains patients’ features like
blood pressure, blood sugar level, heart rate per
minute, etc.
• You also have information about whether each patient
has experienced heart disease or not.

• By training a machine learning algorithm, your
system can find a pattern between the features and the
probability of experiencing heart disease. Your
algorithm can then predict whether a new patient is
at risk of heart disease, so the doctor can take
precautions and save a person’s life.

A decision tree from one such project. The x’s are features (medical
tests in this case), and the 0/1 values in the boxes represent the absence or
presence of heart disease. As you can see, the algorithm produced an interpretable tree.

Unsupervised Learning
• Unsupervised algorithms are used if your data doesn’t contain
outputs and you would like to discover the clusters in the
dataset.
• A good example of unsupervised learning is handwritten
digit recognition.
• In this application you know that there should be 10
clusters {0,1,2,3,4,5,6,7,8,9}, but the problem with
handwritten digits is that there are countless ways to write
a digit by hand, and everyone writes digits differently.

• How does a computer understand what is written by hand?
• Here, you can use an unsupervised algorithm like K-means
or the EM algorithm.
• With these algorithms, you start with random initial
cluster means, and iteratively these mean points
converge toward the real cluster means. After you complete the
training, if you visualize the means of the clusters you can see
that they really look like digits. You then label these clusters
with the corresponding digits, and when the computer encounters a
new handwritten digit, the algorithm labels it with the cluster mean
which is closest to it.
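
The K-means procedure described above can be sketched in plain Python. This is a minimal illustration rather than the method used for the digit example: the two-dimensional points, the function names (k_means, nearest_index) and the optional initial_means argument are assumptions introduced here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, means):
    """Index of the cluster mean closest to the point."""
    return min(range(len(means)), key=lambda i: squared_distance(point, means[i]))


def k_means(points, k, initial_means=None, iterations=20, seed=0):
    """Return k cluster means found by iterative reassignment."""
    if initial_means is None:
        # Start from k randomly chosen data points.
        initial_means = random.Random(seed).sample(points, k)
    means = list(initial_means)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, means)].append(p)
        # Update step: move each mean to the average of its cluster.
        new_means = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_means.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_means.append(means[i])  # keep the mean of an empty cluster
        if new_means == means:
            break  # converged
        means = new_means
    return means


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
means = k_means(points, k=2)

# A new point is labelled with the index of the closest cluster mean.
label = nearest_index((7.5, 8.1), means)
```

Because the starting means are random, different seeds can lead to different final clusters, which is why the sketch also accepts explicit starting means.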

Reinforcement learning
Let’s assume you want to create an intelligent agent
which plays chess.
In chess, you can’t evaluate moves one at a time in isolation.
Your agent should consider a series of moves and then
decide to take the action which would maximize the utility.
Therefore your agent should play repeatedly against
itself and learn the best action to take. We call this type of
learning reinforcement learning, and it is commonly used in
games.

• Poor Quality of Data
• Underfitting of Training Data
• Overfitting of Training Data
• Machine Learning is a Complex Process
• Lack of Training Data
• Slow Implementation
• Imperfections in the Algorithm When Data Grows

1. Poor Quality of Data
• Data plays a significant role in the machine learning
process. One of the significant issues that machine
learning professionals face is the absence of good quality
data. Unclean and noisy data can make the whole process
extremely exhausting. We don’t want our algorithm to
make inaccurate or faulty predictions.
• Hence the quality of data is essential to enhance the
output. Therefore, we need to ensure that the process of
data preprocessing which includes removing outliers,
filtering missing values, and removing unwanted features,
is done with the utmost level of perfection.
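
The preprocessing steps named above (filtering missing values, removing outliers, removing unwanted features) could look like the following sketch. The patient records, the column names and the 1.5 x IQR outlier rule are assumptions made for illustration; the slides do not prescribe a particular rule.

```python
from statistics import quantiles


def drop_missing(rows):
    """Keep only rows in which no value is missing (None)."""
    return [row for row in rows if all(value is not None for value in row.values())]


def drop_outliers(rows, column, k=1.5):
    """Drop rows whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([row[column] for row in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if low <= row[column] <= high]


def drop_features(rows, unwanted):
    """Remove columns that should not be used for learning."""
    return [{key: value for key, value in row.items() if key not in unwanted}
            for row in rows]


# Hypothetical patient records.
patients = [
    {"patient_id": 1, "blood_pressure": 120, "heart_rate": 70},
    {"patient_id": 2, "blood_pressure": 130, "heart_rate": 72},
    {"patient_id": 3, "blood_pressure": None, "heart_rate": 69},   # missing value
    {"patient_id": 4, "blood_pressure": 125, "heart_rate": 68},
    {"patient_id": 5, "blood_pressure": 118, "heart_rate": 75},
    {"patient_id": 6, "blood_pressure": 122, "heart_rate": 71},
    {"patient_id": 7, "blood_pressure": 128, "heart_rate": 300},   # implausible value
]

clean = drop_missing(patients)
clean = drop_outliers(clean, "heart_rate")
clean = drop_features(clean, {"patient_id"})
```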

2. Underfitting of Training Data
Underfitting occurs when the model is unable to establish an accurate
relationship between input and output variables. It is like
trying to fit into undersized jeans. It signifies that the model is too simple
to capture the relationship in the data. To overcome this issue:
• Increase the training time of the model
• Enhance the complexity of the model
• Add more features to the data
• Reduce regularization

3. Overfitting of Training Data
• Overfitting occurs when a machine learning model
fits its training data too closely, learning noise and
bias along with the real pattern, which negatively
affects its performance on new data. It is like trying
to fit into oversized jeans. Unfortunately, this is one of the
significant issues faced by machine learning
professionals. It typically arises when the algorithm is
trained with noisy or biased data, which affects
its overall performance.

• Let’s understand this with the help of an example.
Consider a model trained to differentiate
between a cat, a rabbit, a dog, and a tiger. The
training data contains 1000 cats, 1000 dogs, 1000
tigers, and 4000 rabbits. There is then a considerable
probability that it will identify a cat as a rabbit. In
this example, we had a vast amount of data, but it
was biased toward rabbits; hence the prediction was
negatively affected.
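
The effect of fitting noise can be seen in a small sketch. It is not the cat and rabbit example above: it uses a hypothetical one-dimensional dataset in which one training label is deliberately wrong, and compares a model that memorizes the training set (nearest neighbour) with a simple threshold rule. All names and numbers are assumptions chosen for illustration.

```python
# (x, label) pairs; the true rule is "label is 1 when x >= 5".
# The pair (2, 1) is a deliberately noisy (wrong) label.
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.8, 0), (2.2, 0), (3.5, 0), (5.5, 1), (6.5, 1), (8.5, 1)]


def memorizer(x):
    """Predict the label of the closest training point (memorizes noise too)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def threshold_rule(x):
    """A simpler model that ignores the single noisy point."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


print("memorizer train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold train/test:", accuracy(threshold_rule, train), accuracy(threshold_rule, test))
```

The memorizing model is perfect on the training data but worse on the test data, while the simpler rule makes one training error and generalizes better.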

4. Machine Learning is a
Complex Process
• The machine learning industry is young and
continuously changing. Rapid trial-and-error experiments are
being carried out. The process is still evolving, and hence
there is a high chance of error, which makes learning
complex.
• It includes analyzing the data, removing data bias, training
models, applying complex mathematical calculations, and a
lot more. Hence it is a really complicated process, which is
another big challenge for machine learning professionals.
5. Lack of Training Data
• The most important task in the
machine learning process is to train the model on data so
that it produces accurate output. Too little training data
will produce inaccurate or overly biased predictions.
• Let us understand this with the help of an example.

• Think of training a machine learning algorithm as similar to teaching a
child. One day you decide to explain to a child how to
distinguish between an apple and a watermelon. You take
an apple and a watermelon and show him the difference
between the two based on their color, shape, and taste.
• In this way, he will soon become very good at differentiating
between the two.
• A machine learning algorithm, on the other hand, needs a lot
of data to distinguish between them. For complex problems, it may even
require millions of examples to be trained. Therefore we need to
ensure that machine learning algorithms are trained with
sufficient amounts of data.

6. Slow Implementation
• This is one of the common issues faced by machine
learning professionals. The machine learning models
are highly efficient in providing accurate results, but
it takes a tremendous amount of time.
• Slow programs, data overload, and excessive
requirements usually take a lot of time to provide
accurate results. Further, it requires constant
monitoring and maintenance to deliver the best
output.

7. Imperfections in the Algorithm
When Data Grows
• The model may become useless in the future as data
grows. The best model of the present may become
inaccurate in the future and require further
adjustment or retraining. So you need regular monitoring and
maintenance to keep the algorithm working. This is
one of the most exhausting issues faced by machine
learning professionals.

Conclusion:
• Machine learning is one of the most rapidly growing
technologies, used in medical diagnosis, speech recognition,
robotic training, product recommendations, video
surveillance, and the list goes on.

Design a Learning System in
Machine Learning
According to Arthur Samuel, machine
learning enables a machine to automatically learn
from data, improve its performance from experience,
and predict things without being explicitly programmed.

• In simple words, when we feed training data to a
machine learning algorithm, the algorithm
produces a mathematical model, and with the help of
this model the machine makes
predictions and takes decisions without being
explicitly programmed.
• Also, the more training data the machine
works with, the more experience it gains and the
more accurate its results become.

Example :
• In a driverless car, training data is fed to the
algorithm covering how to drive on highways and on busy
and narrow streets, with factors like speed limits,
parking, stopping at signals, etc.
• A logical and mathematical model is then
created on the basis of that data, and the car
operates according to this model. Also, the more
data is fed in, the better the output that is
produced.

• According to Tom Mitchell, “A computer program is
said to learn from experience E with respect
to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P,
improves with experience E.”

Example: In Spam E-Mail detection,
• Task, T: To classify mails into Spam or Not Spam.
• Performance measure, P: Total percent of mails
being correctly classified as being “Spam” or “Not
Spam”.
• Experience, E: Set of mails labelled “Spam” or “Not Spam”.

Steps for Designing Learning System
are:

Step 1) Choosing the Training
Experience:
• The first and most important task is to choose the
training data or training experience which will be fed
to the machine learning algorithm. It is important
to note that the data or experience we feed to the
algorithm has a significant impact on the
success or failure of the model. So the training data or
experience should be chosen wisely.

Below are the attributes which
impact the success or failure of
the model
• The first attribute is whether the training experience provides direct
or indirect feedback regarding the choices made. For example,
while playing chess, the training experience gives
feedback such as: if this move had been chosen instead of that
one, the chances of success would have increased.

• The second important attribute is the degree to which the
learner controls the sequence of training
examples. For example, when training data is first fed to
the machine, its accuracy is very low,
but as it gains experience by playing again and
again against itself or an opponent, the algorithm
gets feedback and controls the chess game
accordingly.

• The third important attribute is how well the training experience
represents the distribution of examples over which performance
will be measured. For example, a machine learning
algorithm gains experience by going through a
number of different cases and examples.
The more examples it passes through, and the more closely
they resemble the cases it will later be tested on, the
more its performance improves.

Step 2- Choosing target function:
• The next important step is choosing the target
function. This means that, according to the knowledge fed to
the algorithm, the learning system chooses a
NextMove function that describes which
legal move should be taken.

• For example: while playing chess against an
opponent, after the opponent moves, the machine
learning algorithm decides which of the
possible legal moves to take in order to succeed.
Step 3- Choosing Representation
for Target function:
• Once the algorithm knows all the
possible legal moves, the next step is to choose the
optimized move using some representation, i.e.
linear equations, a hierarchical graph representation,
tabular form, etc. The NextMove function then picks,
out of these moves, the target move that
provides the highest success rate.

• For example: if the machine has 4 possible moves
while playing chess, it will choose the
optimized move that is most likely to lead to success.

Step 4- Choosing Function
Approximation Algorithm:
• An optimized move cannot be chosen from the
training data alone. The training data has to go through
a set of examples; from these examples the
learner approximates which steps should be
chosen, and the machine then receives feedback
on them.

• For example: when training data for playing chess
is fed to the algorithm, it is not known in advance whether the
algorithm will fail or succeed. From each
failure or success it estimates which step should be
chosen for the next move and what its success
rate is.
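
The slides do not name a specific approximation algorithm. One common textbook choice for a linear representation of the target function is the least mean squares (LMS) weight update, sketched below under that assumption. The two board features, the training value of 100 and the learning rate are hypothetical.

```python
def evaluate(weights, features):
    """Linear evaluation of a board: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, target, learning_rate=0.01):
    """Move each weight a small step in the direction that reduces the error."""
    error = target - evaluate(weights, features)
    updated = [weights[0] + learning_rate * error]  # bias term (its feature is 1)
    updated += [w + learning_rate * error * x for w, x in zip(weights[1:], features)]
    return updated


# Hypothetical board described by two numeric features, with training value 100.
features = [3, 1]
target = 100.0
weights = [0.0, 0.0, 0.0]

error_before = abs(target - evaluate(weights, features))
for _ in range(50):
    weights = lms_update(weights, features, target)
error_after = abs(target - evaluate(weights, features))

print(error_before, error_after)
```

Each update uses the feedback (the gap between the training value and the current estimate) to adjust the weights, so the estimate for this board moves closer to its training value.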

Step 5- Final Design:
• The final design is created at last, after the system has gone
through a number of examples, failures and successes,
correct and incorrect decisions, and what the
next step should be. Example: Deep Blue, IBM's
chess-playing computer, won a match
against the world chess champion Garry Kasparov in 1997,
becoming the first computer to beat a
reigning world champion in a match.

Concept of Hypothesis
• The hypothesis is a common term in Machine Learning and data science
projects. As we know, machine learning is one of the most powerful
technologies across the world, which helps us to predict results based on
past experiences.
• Moreover, data scientists and ML professionals conduct experiments that
aim to solve a problem. These ML professionals and data scientists make
an initial assumption for the solution of the problem.
• This assumption in Machine learning is known as Hypothesis.
• In Machine Learning, at various times, Hypothesis and Model are used
interchangeably. However, a Hypothesis is an assumption made by
scientists, whereas a model is a mathematical representation that is used to
test the hypothesis

What is Hypothesis?
• A hypothesis is defined as a supposition or proposed
explanation based on limited evidence or
assumptions. It is a guess based on some known facts that
has not yet been proven. A good hypothesis is testable: it
can be shown to be either true or false.
• Example: Let's understand the hypothesis with a common
example. Some scientists claim that ultraviolet (UV) light can
damage the eyes, so it may also cause blindness.
• In this example, a scientist only claims that UV rays are harmful
to the eyes, and we assume they may cause blindness. However,
this may or may not be the case. Hence, assumptions of this
type are called hypotheses.

Two important types of hypotheses
are as follows:
• Null Hypothesis: A null hypothesis is a type of
statistical hypothesis which states that no
statistically significant effect exists in the given set of
observations. It is also known as a conjecture and is
used in quantitative analysis to test theories about
markets, investment, and finance to decide whether
an idea is true or false.

Alternative Hypothesis:
• An alternative hypothesis is a direct contradiction of
the null hypothesis, which means if one of the two
hypotheses is true, then the other must be false. In
other words, an alternative hypothesis is a type of
statistical hypothesis which states that some
significant effect exists in the given set of
observations.

Hypothesis in Machine Learning
(ML)
• The hypothesis is one of the commonly used
concepts of statistics in Machine Learning. It is
specifically used in Supervised Machine learning,
where an ML model learns a function that best maps
the input to corresponding outputs with the help of
an available dataset.

The following figure shows the common method to
find out the possible hypothesis from the Hypothesis
space:

• There are some common methods given to find out
the possible hypothesis from the Hypothesis space,
where hypothesis space is represented by uppercase-
h (H) and hypothesis by lowercase-h (h). These are
defined as follows:

Hypothesis space (H):
• Hypothesis space is defined as the set of all
possible legal hypotheses; hence it is also known
as a hypothesis set. It is used by supervised
machine learning algorithms to determine the best
possible hypothesis to describe the target function, i.e.
the one that best maps inputs to outputs.
• It is often constrained by choice of the framing of
the problem, the choice of model, and the choice of
model configuration

Hypothesis (h):
• It is defined as the approximate function that best describes the
target in supervised machine learning algorithms. It is
primarily based on data as well as bias and
restrictions applied to data.
• Hence hypothesis (h) can be concluded as a single
hypothesis that maps input to proper output and can
be evaluated as well as used to make predictions.

• The hypothesis (h) can be formulated in machine learning
as follows:
y = mx + b
Where,
• y: range (the predicted output)
• m: slope of the line, i.e. the change in y
divided by the change in x
• x: domain (the input)
• b: intercept (constant)
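
To make the line formula concrete, the sketch below treats each (slope, intercept) pair as one hypothesis and picks the pair with the smallest squared error on some data. The data points and the small grid of candidate slopes and intercepts are assumptions chosen for illustration; a real learner would search a much larger space.

```python
def make_hypothesis(m, b):
    """Return the hypothesis h(x) = m*x + b for a given slope and intercept."""
    def h(x):
        return m * x + b
    return h


def squared_error(h, data):
    """Sum of squared differences between predictions and observed outputs."""
    return sum((h(x) - y) ** 2 for x, y in data)


# Hypothetical observations that follow y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]

# A tiny hypothesis space: every combination of these slopes and intercepts.
hypothesis_space = [(m, b) for m in (0, 1, 2, 3) for b in (0, 1, 2)]

# The chosen hypothesis is the member of the space with the lowest error.
best_m, best_b = min(
    hypothesis_space,
    key=lambda params: squared_error(make_hypothesis(*params), data),
)
best_h = make_hypothesis(best_m, best_b)
print(best_m, best_b, best_h(4))
```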

Example: Let's understand the hypothesis (h) and hypothesis
space (H) with a two-dimensional coordinate plane showing the
distribution of data as follows:

Now, assume we have some test data by which
ML algorithms predict the outputs for input as
follows:

We can divide this coordinate plane in such a way that it
helps us predict the output or result, as follows:

Based on the given test data, the output result
will be as follows:

However, based on the data, algorithm, and constraints,
this coordinate plane can also be divided in the
following ways:

With the above example, we can conclude that:
• Hypothesis space (H) is the set of all legal
ways to divide the coordinate plane so
that inputs are mapped to proper outputs.
• Further, each individual way is called a
hypothesis (h). Hence, the hypothesis and hypothesis
space would be like this:

Version Spaces
• A version space is a hierarchical representation of
knowledge that enables you to keep track of all the
useful information supplied by a sequence of
learning examples without remembering any of the
examples.
• The version space method is a concept learning
process accomplished by managing multiple
models within a version space.

• A hypothesis “h” is consistent with a set of
training examples D of target concept c if and
only if h(x) = c(x) for each training example in
D.
• The version space VS with respect to
hypothesis space H and training examples D
is the subset of hypotheses from H consistent
with all training examples in D.
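
These two definitions can be checked by brute force on a small problem. The sketch below uses the Size/Colour/Shape table that appears later in these slides and assumes a hypothesis is a tuple of attribute values in which "?" matches any value; the function names are introduced here for illustration.

```python
from itertools import product

# Possible values of each attribute: Size, Colour, Shape.
ATTRIBUTE_VALUES = [("Big", "Small"), ("Red", "Blue"), ("Circle", "Triangle")]

# Training examples D as (instance, c(x)) pairs.
EXAMPLES = [
    (("Big", "Red", "Circle"), False),
    (("Small", "Red", "Triangle"), False),
    (("Small", "Red", "Circle"), True),
    (("Big", "Blue", "Circle"), False),
    (("Small", "Blue", "Circle"), True),
]


def h_predicts(h, x):
    """h(x): True when every attribute of h is '?' or equals the value in x."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def is_consistent(h, examples):
    """h is consistent with D when h(x) = c(x) for every example."""
    return all(h_predicts(h, x) == label for x, label in examples)


def version_space(examples):
    """Subset of the hypothesis space H that is consistent with the examples."""
    hypothesis_space = product(*[values + ("?",) for values in ATTRIBUTE_VALUES])
    return [h for h in hypothesis_space if is_consistent(h, examples)]


print(version_space(EXAMPLES[:3]))  # after the first three examples
print(version_space(EXAMPLES))      # after all five examples
```

Enumerating every hypothesis is only feasible for tiny spaces like this one, which is why boundary-set methods are used in practice.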

Version Space Characteristics
• A version space represents all the alternative
plausible descriptions of a heuristic.
• A plausible description is one that is applicable to
all known positive examples and no known
negative example.

A version space description consists of two
complementary trees:
1.One that contains nodes connected to
overly general models, and
2.One that contains nodes connected to
overly specific models.

Diagrammatical Guidelines
• There is a generalization tree and
a specialization tree.
• Each node is connected to a model.
• Nodes in the generalization tree are connected to a
model that matches everything in its subtree.
• Nodes in the specialization tree are connected to a
model that matches only one thing in its subtree.

Links between nodes and their models denote
• generalization relations in a generalization tree, and
• specialization relations in a specialization tree.

Diagram of a Version Space
the specialization tree is colored red, and the generalization tree
is colored green.

Generalization and Specialization
Leads to Version Space
Convergence
• The key idea in version space learning is that
specialization of the general models and
generalization of the specific models may
ultimately lead to just one correct model that
matches all observed positive examples and does
not match any negative examples.

Version Space Method Learning
Algorithm: Candidate-
Elimination
• The Candidate Elimination Algorithm computes the
version space containing all hypotheses from H that are
consistent with an observed sequence of training
examples.
• It begins by initializing the version space to the set of all
hypotheses in H, that is, by initializing the G boundary
set to contain the most general hypotheses in H
• G0 ← {<?,?,?,?,?,?,?>}
• And initializing the S boundary set to contain the most
specific hypothesis.
• S0 ← {<0,0,0,0,0,0,0>}

• These two boundary sets delimit the entire hypothesis space
because every other hypothesis in H is both more general than
S0 and more specific than G0.
• As each training example is considered, the S and G boundary sets
are generalized and specialized, respectively to eliminate from the
version space any hypothesis found inconsistent with the new
training example.
• After all the examples have been processed, the computed version
space contains all the hypotheses consistent with these examples
and only such hypotheses.

The Candidate Elimination Algorithm goes as follows:
1. Initialize G to the set of maximally general hypotheses in H.
2. Initialize S to the set of maximally specific hypotheses in H.
3. For each training example d:
   a. If d is a positive example:
      1. Remove from G any hypothesis that does not include d.
      2. For each hypothesis s in S that does not include d, remove s from S.
      3. Add to S all minimal generalizations h of s such that h includes d and
         some member of G is more general than h.
      4. Remove from S any hypothesis that is more general than another
         hypothesis in S.
   b. If d is a negative example:
      1. Remove from S any hypothesis that includes d.
      2. For each hypothesis g in G that includes d, remove g from G.
      3. Add to G all minimal specializations h of g such that h does not
         include d and some member of S is more specific than h.
      4. Remove from G any hypothesis that is less general than another
         hypothesis in G.
4. If G or S ever becomes empty, the data is not consistent (with H).
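
A compact sketch of Candidate Elimination for conjunctive hypotheses is given below. It assumes hypotheses are tuples in which "?" matches any value and "0" matches nothing, and it is run on the Size/Colour/Shape table that follows in these slides. The helper names are introduced here for illustration, and noisy data is not handled.

```python
def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))


def min_generalization(s, x):
    """Smallest generalization of s that covers the positive instance x."""
    return tuple(xv if sv == "0" else (sv if sv == xv else "?")
                 for sv, xv in zip(s, x))


def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    result = []
    for i, gv in enumerate(g):
        if gv == "?":
            for value in domains[i]:
                if value != x[i]:
                    result.append(g[:i] + (value,) + g[i + 1:])
    return result


def candidate_elimination(examples, domains):
    S = {("0",) * len(domains)}   # most specific boundary
    G = {("?",) * len(domains)}   # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            S = {s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                candidates = min_specializations(g, x, domains) if covers(g, x) else [g]
                for h in candidates:
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            G = {g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)}
    return S, G


domains = [("Big", "Small"), ("Red", "Blue"), ("Circle", "Triangle")]
examples = [
    (("Big", "Red", "Circle"), False),
    (("Small", "Red", "Triangle"), False),
    (("Small", "Red", "Circle"), True),
    (("Big", "Blue", "Circle"), False),
    (("Small", "Blue", "Circle"), True),
]
S, G = candidate_elimination(examples, domains)
print("S =", S)
print("G =", G)
```

On this table the two boundary sets meet at a single hypothesis (Small, ?, Circle), so the version space has converged.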

Advantages of the version space method:
• Can describe all the possible hypotheses in the language
consistent with the data.
• Fast (close to linear).
Disadvantages of the version space method:
• Inconsistent data (noise) may cause the target concept to
be pruned.
• Learning disjunctive concepts is challenging.

Example 2
Size    Colour   Shape      Class/label
Big     Red      Circle     No
Small   Red      Triangle   No
Small   Red      Circle     Yes
Big     Blue     Circle     No
Small   Blue     Circle     Yes
Find-S Algorithm
Find maximally specific
hypothesis
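
A minimal sketch of Find-S, applied to the Size/Colour/Shape table above, is shown below. It assumes conjunctive hypotheses in which "?" means any value is accepted; the function name find_s is introduced here for illustration.

```python
def find_s(examples):
    """Return the maximally specific hypothesis covering all positive examples."""
    hypothesis = None  # None stands for the most specific hypothesis (covers nothing)
    for instance, label in examples:
        if label != "Yes":
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # The first positive example is adopted as-is.
            hypothesis = list(instance)
        else:
            # Keep attributes that agree, generalize the others to "?".
            hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, instance)]
    return tuple(hypothesis) if hypothesis is not None else None


# Rows of the table: (Size, Colour, Shape), Class/label.
examples = [
    (("Big", "Red", "Circle"), "No"),
    (("Small", "Red", "Triangle"), "No"),
    (("Small", "Red", "Circle"), "Yes"),
    (("Big", "Blue", "Circle"), "No"),
    (("Small", "Blue", "Circle"), "Yes"),
]

print(find_s(examples))  # ('Small', '?', 'Circle')
```

The two positive rows differ only in colour, so colour is generalized to "?" while size and shape stay fixed.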

Performance Metrics
Evaluating the performance of a Machine learning model
is one of the important steps while building an effective ML
model. To evaluate the performance or quality of the
model, different metrics are used, and these metrics
are known as performance metrics or evaluation
metrics.
These performance metrics help us understand how well
our model has performed for the given data. In this way,
we can improve the model's performance by tuning the
hyper-parameters. Each ML model aims to generalize well
on unseen/new data, and performance metrics help
determine how well the model generalizes on the new
dataset.

• In machine learning, supervised tasks are broadly
divided into classification and regression.
Not all metrics can be used for all types of
problems; hence, it is important to know and
understand which metrics should be used.
• Different evaluation metrics are used for
regression and for classification tasks. In this
topic, we will discuss metrics used for
classification and regression tasks.

Performance Metrics for
Classification
In a classification problem, the category
or classes of data is identified based on training
data. The model learns from the given dataset
and then classifies the new data into classes or
groups based on the training. It predicts class
labels as the output, such as Yes or No, 0 or 1,
Spam or Not Spam, etc. To evaluate the
performance of a classification model, different
metrics are used, and some of them are as
follows:

• Accuracy
• Confusion Matrix
• Precision
• Recall
• F-Score
• AUC(Area Under the Curve)-ROC

I. Accuracy
The accuracy metric is one of the simplest
classification metrics to implement, and it can be
determined as the ratio of the number of correct predictions
to the total number of predictions.
It can be formulated as:
Accuracy = Number of correct predictions / Total number of predictions

II. Confusion Matrix
• A confusion matrix is a tabular representation of
prediction outcomes of any binary classifier, which is
used to describe the performance of the classification
model on a set of test data when true values are known.
• The confusion matrix is simple to implement, but the
terminologies used in this matrix might be confusing for
beginners.
• A typical confusion matrix for a binary classifier looks
like the below image(However, it can be extended to
use for classifiers with more than two classes).

We can determine the following
from the above matrix:
• In the matrix, columns are for the predicted values, and
rows specify the actual values. Here both actual and
predicted values take two possible classes, Yes or No. So, if
we are predicting the presence of a disease in a patient,
Yes in the prediction column means the patient is predicted to have the
disease, and No means the patient is predicted not to have the
disease.
• In this example, the total number of predictions is 165,
out of which the model predicted Yes 110 times and
No 55 times.
• However, in reality, there are 60 cases in which patients don't
have the disease and 105 cases in which patients
have the disease.

In general, the table is divided
into four terminologies, which
are as follows:
1. True Positive (TP): the prediction
is positive, and it is positive in reality as well.
2. True Negative (TN): the prediction
is negative, and it is negative in reality as well.
3. False Positive (FP): the prediction
is positive, but it is negative in
reality.
4. False Negative (FN): the prediction
is negative, but it is positive in reality.
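
The four terms can be counted directly from lists of actual and predicted labels. The sketch below is illustrative only: the eight labels are hypothetical and the function name confusion_counts is introduced here.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1   # predicted positive, actually positive
        elif p != positive and a != positive:
            counts["TN"] += 1   # predicted negative, actually negative
        elif p == positive:
            counts["FP"] += 1   # predicted positive, actually negative
        else:
            counts["FN"] += 1   # predicted negative, actually positive
    return counts


# Hypothetical labels for eight patients.
actual    = ["Yes", "Yes", "No", "No",  "Yes", "No", "Yes", "No"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No", "Yes", "No"]

counts = confusion_counts(actual, predicted)
print(counts)
```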

III. Precision
The precision metric is used to overcome
the limitation of accuracy. Precision
determines the proportion of positive predictions
that were actually correct. It can be calculated as
the ratio of True Positives (positive predictions that are actually
positive) to the total positive predictions (True
Positives and False Positives):
Precision = TP / (TP + FP)

IV. Recall or Sensitivity
Recall is similar to the precision metric; however, it
calculates the proportion of actual positives that were
identified correctly. It can be calculated as the ratio of True Positives
(positive predictions that are actually positive) to the total number of
actual positives, whether correctly predicted as positive or incorrectly
predicted as negative (True Positives and False Negatives).
The formula for calculating Recall is given below:
Recall = TP / (TP + FN)

• Specificity
Specificity, in contrast to recall, is the proportion of actual
negatives that our ML model correctly identifies as negative. We can easily calculate it from the
confusion matrix with the help of the following formula:
Specificity = TN / (TN + FP)

V. F-Scores
• The F-score or F1 Score is a metric to evaluate a
binary classification model on the basis of
predictions that are made for the positive class. It
is calculated with the help of Precision and Recall.
It is a single score that represents both
Precision and Recall. The F1 Score is
calculated as the harmonic mean of
Precision and Recall, assigning equal weight to
each of them.
The formula for calculating the F1 score is given
below:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
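
The classification metrics above can all be computed from the four confusion-matrix counts. The sketch below is a minimal illustration. The cell values (TP = 100, TN = 50, FP = 10, FN = 5) are an assumption chosen to be consistent with the totals quoted earlier (165 predictions, 110 predicted Yes, 105 actual Yes), since the slides do not list the individual cells.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct / all predictions
    precision = tp / (tp + fp)                   # correct among predicted positives
    recall = tp / (tp + fn)                      # found among actual positives
    specificity = tn / (tn + fp)                 # found among actual negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Assumed cell values, consistent with the totals in the example.
metrics = classification_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy and precision are both about 0.909, recall about 0.952, specificity about 0.833 and F1 about 0.930.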

VI. AUC-ROC
Sometimes we need to visualize the
performance of the classification model on
charts; then, we can use the AUC-ROC curve. It
is one of the popular and important metrics for
evaluating the performance of the classification
model.

• Firstly, let's understand the ROC (Receiver
Operating Characteristic) curve. The ROC curve
is a graph that shows the
performance of a classification model at
different threshold levels. The curve is
plotted between two parameters, which are:
• True Positive Rate
• False Positive Rate

TPR or True Positive Rate is a synonym for
Recall, hence can be calculated as:
TPR = TP / (TP + FN)
FPR or False Positive Rate can be
calculated as:
FPR = FP / (FP + TN)

To compute the points on a ROC curve,
we could evaluate a classification model (for example, logistic regression)
many times with different classification
thresholds, but this would be inefficient.
Instead, a single aggregate measure is used, which
is known as AUC.

AUC: Area Under the ROC
curve
• AUC stands for Area Under the ROC
curve. As its name suggests, AUC measures
the two-dimensional area under the entire
ROC curve, as shown in the image below:

• AUC calculates the performance across all the
thresholds and provides an aggregate
measure. The value of AUC ranges from 0 to
1. It means a model with 100% wrong
prediction will have an AUC of 0.0, whereas
models with 100% correct predictions will have
an AUC of 1.0.
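
The ROC points and the area under them can be computed with a short sketch. It assumes labels are 1 (positive) or 0 (negative), that the model outputs a score per example, and that the area is approximated with the trapezoidal rule; the labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(points)
print(auc(points))
```

A model that ranks every positive above every negative reaches an AUC of 1.0, and one that ranks every negative above every positive reaches 0.0, matching the range described above.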
