Machine learning algorithms can be broadly grouped into four main types based on how they learn from data:
Supervised Learning: Imagine a teacher showing you labeled examples (like classifying pictures of cats and dogs). Supervised learning algorithms learn from labeled data, where each data point has a corresponding answer or label. The algorithm analyzes the data and learns to map the inputs to the desired outputs. This is commonly used for tasks like spam filtering, image recognition, and weather prediction.
Unsupervised Learning: Unlike supervised learning, unsupervised learning deals with unlabeled data. It's like being given a pile of toys and asked to organize them however you see fit. The algorithm finds hidden patterns or structures within the data. This is useful for tasks like customer segmentation, anomaly detection, and recommendation systems.
Reinforcement Learning: This is inspired by how humans learn through trial and error. The algorithm interacts with its environment and receives rewards for good decisions and penalties for bad ones. Over time, it learns to take actions that maximize the rewards. This is used in applications like training self-driving cars and playing games like chess.
Semi-Supervised Learning: This combines aspects of supervised and unsupervised learning. It leverages a small amount of labeled data along with a larger amount of unlabeled data to improve the learning process. This is beneficial when labeled data is scarce or expensive to obtain.
2. Machine Learning
A machine is said to learn from past experience (the data fed in) with respect to some class of tasks if its performance on those tasks improves with that experience.
For example, suppose a machine has to predict whether a customer will buy a specific product this year. It does so by looking at past experience, i.e. the data on products the customer bought in previous years: if the customer has bought an antivirus every year, there is a high probability that the customer will buy one this year as well.
3. Types of ML
[Figure: types of ML, with panel labels "Task Driven", "Data Driven" and "Learn from Mistakes"]
Machine Learning systems can be classified according to the
amount and type of supervision they get during training.
4. Supervised Learning
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels. A supervised learning model is trained on a labelled dataset, that is, one that has both input and output parameters. In this type of learning, both the training and the validation datasets are labelled.
5.
Both figures, A and B, show labelled datasets:
Figure A: A shopping-store dataset used to predict whether a customer will purchase a particular product, based on his/her gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased, i.e. 0 or 1; 1 means yes, the customer will purchase, and 0 means the customer won't.
Figure B: A meteorological dataset used to predict wind speed from several parameters.
Input: Dew Point, Temperature,
Pressure, Relative Humidity,
Wind Direction
Output: Wind Speed
6. Training the system:
• While training the model, the data is usually split in an 80:20 ratio, i.e. 80% as training data and the rest as testing data.
• For the training data, we feed both the inputs and the outputs to the model. The model learns from the training data only.
• Different machine learning algorithms can be used to build the ML model.
• Learning here means that the model builds some logic of its own from the data.
• Once the model is trained, it is ready to be tested.
• At testing time, the inputs come from the remaining 20% of the data, which the model has never seen before. The model predicts a value for each input, the predictions are compared with the actual outputs, and the accuracy is calculated.
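The split-train-test procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production recipe: the toy dataset, the fixed shuffle seed, and the 1-nearest-neighbour stand-in model are all assumptions made for the example.

```python
import random

# Hypothetical toy data: (age, salary in thousands) -> purchased (1) or not (0).
data = [
    ((22, 25), 0), ((25, 32), 0), ((28, 40), 0), ((30, 38), 0), ((33, 45), 0),
    ((41, 80), 1), ((45, 95), 1), ((48, 88), 1), ((52, 110), 1), ((58, 120), 1),
]

def train_test_split(rows, train_ratio=0.8, seed=0):
    """Shuffle the rows, then split them into a training part and a testing part."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def predict(train_rows, features):
    """Stand-in model (1-nearest neighbour): copy the label of the closest training row."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train_rows, key=squared_distance)[1]

train, test = train_test_split(data)

# Testing: predict on rows the model has never seen, then compare with the actual labels.
correct = sum(predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Only the training rows are ever passed to the model; the test rows are used purely to measure accuracy.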
7. Types of Supervised Learning
Classification
It is a supervised learning task where the output has defined labels (discrete values).
For example, in Figure A above, the output Purchased has defined labels, i.e. 0 or 1; 1 means the customer will purchase and 0 means the customer won't purchase. The goal is to predict discrete values belonging to a particular class; the predictions are evaluated on the basis of accuracy.
• It can be either binary or multi-class classification. In binary classification, the model predicts one of two classes (0 or 1; yes or no), whereas in multi-class classification the model predicts one of more than two classes.
• Example: Gmail sorts mail into more than two classes, such as social, promotions, updates, and forums.
8. Regression
• Another typical task is to predict a
target numeric value, such as the price
of a car, given a set of features
(mileage, age, brand, etc.) called
predictors.
• It is a supervised learning task where the output is a continuous value.
• Example: in Figure B above, the output Wind Speed does not take discrete values but is continuous within a particular range.
• The goal is to predict a value as close to the actual output as the model can; evaluation is then done by calculating the error. The smaller the error, the better the regression model.
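As a rough sketch of regression and its error-based evaluation, the snippet below fits a straight line by ordinary least squares and reports the mean squared error. The car ages and prices are made-up numbers, and a single-feature linear model is only one of many possible choices.

```python
# Hypothetical toy data: car age in years -> price in thousands.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [18.2, 16.1, 13.9, 12.2, 9.8]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

def mean_squared_error(actual, predicted):
    """Average squared gap between actual and predicted values (smaller is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

slope, intercept = fit_line(ages, prices)
predicted = [slope * x + intercept for x in ages]
mse = mean_squared_error(prices, predicted)
print(f"price = {slope:.2f} * age + {intercept:.2f}, MSE = {mse:.4f}")
```

For brevity the error here is computed on the same rows used for fitting; in practice it would be measured on held-out test data.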
9.
Examples of Supervised Learning Algorithms:
k-Nearest Neighbors
Linear Regression
Logistic Regression
Decision Trees and Random Forests
Gaussian Naive Bayes
Support Vector Machines (SVMs)
10. Advantages:
• Supervised learning lets you collect data and produce outputs for new data based on previous experience.
• Helps to optimize performance criteria with the help
of experience.
• Supervised machine learning helps to solve various
types of real-world computation problems.
Disadvantages:
• Classifying big data can be challenging.
• Training a supervised model needs a lot of computation, so it can take a long time.
11. Unsupervised Learning
• It's a type of learning where we don't give the model a target during training, i.e. the training data has only input parameter values. The model has to find structure in the data by itself.
• The training data is unlabeled.
12. • The dataset in Figure A is mall data that contains information about the clients who subscribe to the mall's membership. Once subscribed, they are given a membership card, so the mall has complete information about each customer and his/her every purchase. Using this data and unsupervised learning techniques, the mall can easily group clients based on the parameters we feed in.
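Grouping customers like this can be sketched with a small k-means implementation. The customer values are invented, k = 2 is chosen by hand, and the centroids are naively initialised from the first k points; real implementations use smarter initialisation.

```python
# Hypothetical mall customers: (annual income in thousands, spending score 0-100).
customers = [(15, 80), (70, 20), (16, 85), (75, 15), (18, 78), (72, 25)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: put every point in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [nearest(p, centroids) for p in points], centroids

labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

No labels are supplied anywhere: the two groups emerge only from the distances between customers.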
13. • The training data we feed in is:
• Unstructured data: may contain noisy (meaningless) data, missing values, or unknown data.
• Unlabeled data: contains only values for the input parameters, with no target value (output). It is easier to collect than the labeled data needed for the supervised approach.
14. Types of Unsupervised Learning
Clustering: Broadly, this technique is applied to group data based on the different patterns the model finds.
For example, you can run a clustering algorithm to try to detect groups of similar
visitors. At no point do you tell the algorithm
which group a visitor belongs to: it finds
those connections without your help. For
example, it might notice that 40% of your
visitors are males who love comic books and
15. Association
This is a rule-based ML technique that finds useful relations between the parameters of a large dataset. For example, shopping stores use algorithms based on this technique to find the relationship between the sales of one product and the sales of others, based on customer behavior. Once trained well, such models can be used to increase sales by planning different offers.
• Some algorithms:
Clustering:
• K-Means Clustering
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization
Visualization and dimensionality reduction:
• Principal Component Analysis (PCA)
• Kernel PCA
Association rule learning:
• Apriori
• Eclat
16. • Clustering: an algorithm tries to detect groups of similar visitors.
• Visualization algorithms are also good examples of unsupervised learning algorithms.
• You feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.
• These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization), so you can understand how the data is organized and perhaps identify unsuspected patterns.
• Dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
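The mileage-and-age example can be sketched as a two-feature principal component analysis. The car data is hypothetical, and the closed-form angle used here only works for exactly two features; it stands in for a general PCA routine.

```python
import math
import statistics

# Hypothetical cars: (mileage in km, age in years).
cars = [(20000, 1), (45000, 3), (60000, 4), (90000, 6), (120000, 8)]

def standardize(values):
    """Rescale to zero mean and unit variance so both features are comparable."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]

mileage = standardize([m for m, _ in cars])
age = standardize([a for _, a in cars])

# Entries of the 2x2 covariance matrix of the standardized features.
n = len(cars)
var_m = sum(m * m for m in mileage) / n
var_a = sum(a * a for a in age) / n
cov = sum(m * a for m, a in zip(mileage, age)) / n

# Direction of largest variance (the first principal component), in closed form.
angle = 0.5 * math.atan2(2 * cov, var_m - var_a)

# Project each car onto that direction: one merged "wear and tear" feature.
wear = [m * math.cos(angle) + a * math.sin(angle) for m, a in zip(mileage, age)]
print([round(w, 2) for w in wear])
```

Two correlated columns go in and one column comes out, with cars that are both older and more driven receiving higher wear values.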
17.
Anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is trained with normal instances, and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly.
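One very simple way to illustrate this idea is a z-score check: learn the mean and spread of normal instances, then flag new values that fall far outside them. The transaction amounts and the threshold of 3 standard deviations are assumptions for the sketch; real fraud detectors use far richer features and models.

```python
import statistics

# Hypothetical "normal" card transaction amounts used for training.
normal_amounts = [12.5, 8.0, 15.2, 9.9, 11.3, 14.1, 10.7, 13.4]

# "Training" here just means summarising what normal looks like.
mean = statistics.mean(normal_amounts)
stdev = statistics.stdev(normal_amounts)

def is_anomaly(amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from the mean."""
    return abs(amount - mean) / stdev > threshold

print(is_anomaly(12.0))
print(is_anomaly(250.0))
```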
18. • Association Rule Learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to each other.
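The supermarket example can be made concrete with the two measures that association rule algorithms typically rely on, support and confidence. The sales log below is invented, and the snippet only scores a single hand-picked rule; it does not implement a full rule-mining algorithm such as Apriori.

```python
# Hypothetical sales log: each transaction is the set of items bought together.
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "soda"},
    {"barbecue sauce", "potato chips", "steak", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_from = {"barbecue sauce", "potato chips"}
rule_to = {"steak"}
print(f"support = {support(rule_from | rule_to):.2f}")
print(f"confidence = {confidence(rule_from, rule_to):.2f}")
```

In this toy log, three of the four baskets containing barbecue sauce and potato chips also contain steak, which is the kind of relation the slide describes.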
20. Semi-supervised Learning:
• Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data. This is called semi-supervised learning.
• We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques. This approach is mostly applicable to image datasets, where usually not all images are labeled.
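A rough sketch of the idea, using pseudo-labelling (self-training) rather than a clustering step: a model fitted on the few labeled points guesses labels for the unlabeled ones, and the model is then refitted on everything. The 2-D feature vectors, the labels, and the nearest-centroid classifier are all assumptions chosen to keep the example short.

```python
def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def fit(examples):
    """Supervised step: one centroid per label (a nearest-centroid classifier)."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=squared_distance)

# Hypothetical image features: a little labeled data, a lot of unlabeled data.
labeled = [((1.0, 1.0), "cat"), ((9.0, 9.0), "dog")]
unlabeled = [(1.5, 0.5), (2.0, 1.5), (0.5, 2.0), (8.0, 9.5), (9.5, 8.0), (8.5, 8.5)]

model = fit(labeled)                                          # 1. learn from the few labels
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]  # 2. guess the missing labels
model = fit(labeled + pseudo_labeled)                         # 3. retrain on all the data

print(predict(model, (3.0, 2.0)))
```

Pseudo-labels can be wrong, so practical systems usually keep only the guesses the model is confident about.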
21. Reinforcement Learning:
• The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
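The agent, reward, and policy vocabulary can be illustrated with tabular Q-learning on a tiny made-up environment: a five-cell corridor where the agent starts at cell 0 and is rewarded for reaching cell 4. The environment, the reward values, and the hyperparameters are all assumptions for this sketch, and Q-learning is just one of several reinforcement learning algorithms.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Environment: move, stay inside the corridor, return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # reward for reaching the goal
    return nxt, -0.1, False     # small penalty (negative reward) for every other move

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = train()
# The learned policy: the best action for each non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The policy is simply a lookup from situation (cell) to action, learned only from the rewards the agent received.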
23. Batch and Online Learning
Batch learning
• In batch learning, the system is
incapable of learning incrementally: it
must be trained using all the available
data. This will generally take a lot of
time and computing resources, so it is
typically done offline. First the system
is trained, and then it is launched into
production and runs without learning
anymore; it just applies what it has
learned. This is called offline
learning.
Online learning
• In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
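Incremental training can be sketched with a tiny linear model that takes one gradient step per mini-batch. The class name, the `partial_fit` method name, the learning rate, and the simulated data stream are all hypothetical choices for this example.

```python
class OnlineLinearModel:
    """Hypothetical one-feature linear model y = w * x + b, trained incrementally."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """Take one gradient-descent step on a mini-batch of (x, y) pairs."""
        errors = [(self.predict(x) - y, x) for x, y in batch]
        grad_w = sum(e * x for e, x in errors) / len(batch)
        grad_b = sum(e for e, _ in errors) / len(batch)
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

# Simulated data stream following y = 2x + 1, with x between -1 and 1.
points = [(i / 25 - 1, 2 * (i / 25 - 1) + 1) for i in range(50)]
stream = points * 100

model = OnlineLinearModel()
batch_size = 5
for start in range(0, len(stream), batch_size):
    # The model only ever sees one small mini-batch at a time.
    model.partial_fit(stream[start:start + batch_size])

print(f"w = {model.w:.3f}, b = {model.b:.3f}")
```

Because each update needs only the current mini-batch, the full dataset never has to be held in memory or reprocessed from scratch.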
24. Instance-Based Versus
Model-Based Learning
Instance-based learning
• If you were to create a spam filter that simply learns by heart, it would just flag all emails that are identical to emails that have already been flagged by users: not the worst solution, but certainly not the best.
• Instead of just flagging emails that are
identical to known spam emails, your spam
filter could be programmed to also flag emails
that are very similar to known spam emails.
This requires a measure of similarity between
two emails. A (very basic) similarity measure
between two emails could be to count the number
of words they have in common. The system would
flag an email as spam if it has many words in
common with a known spam email.
• This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases using a similarity measure.
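The word-counting similarity measure described above can be sketched directly. The stored spam messages and the threshold of three shared words are made up for illustration.

```python
# Hypothetical emails already flagged as spam by users (the memorised instances).
known_spam = [
    "win a free prize now",
    "cheap loans approved instantly",
]

def common_words(email_a, email_b):
    """Very basic similarity measure: the number of distinct words the two emails share."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

def is_spam(email, threshold=3):
    """Flag the email if it shares at least `threshold` words with any known spam email."""
    return any(common_words(email, spam) >= threshold for spam in known_spam)

print(is_spam("claim your free prize now"))
print(is_spam("meeting moved to friday"))
```

There is no training step beyond storing the examples; all the work happens at prediction time, when the new email is compared with each stored one.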
25. Model-based learning
• Another way to generalize from a set
of examples is to build a model of
these examples, then use that model to
make predictions. This is called
model-based learning.