Machine Learning: Unit 1
StopIteration
Outline
• Learning, Types of Learning
• Well defined learning problems, Designing a Learning System
• History of ML, Introduction of Machine Learning Approaches
• Artificial Neural Network, Clustering, Reinforcement Learning
• Decision Tree Learning, Bayesian networks
• Support Vector Machine, Genetic Algorithm
• Issues in Machine Learning
• Data Science Vs Machine Learning
Have you ever heard of these?
• Virtual Personal Assistants
• Smart Speakers: Amazon Echo and Google Home
• Mobile Apps: Ok Google
• Predictions while Commuting
• GPS navigation
• Video Surveillance
• Social Media Services
• People You May Know
• Face Recognition
• Similar Pins
• Email Spam and Malware Filtering
• Online Customer Support
• Search Engine Result Refining
• Product Recommendations
• Online Fraud Detection
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the
ability to learn without being explicitly programmed.
• Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being
explicitly programmed to do so. Machine learning algorithms use historical data as
input to predict new output values.
• Machine learning is an application of AI that enables systems to learn and improve
from experience without being explicitly programmed. Machine learning focuses
on developing computer programs that can access data and use it to learn for
themselves.
Well Posed Learning Problem
Machine learning Types
• Machine learning algorithms:
• Supervised learning
• Unsupervised learning
• Others: Reinforcement learning, recommender systems.
Supervised Learning
Supervised machine learning algorithms are designed to learn from labeled examples. The
name “supervised” learning originates from the idea that training this type of algorithm is
like having a teacher supervise the whole process.
Supervised Learning
When training a supervised learning algorithm, the training data will consist of inputs paired
with the correct outputs. During training, the algorithm will search for patterns in the data that
correlate with the desired outputs. After training, a supervised learning algorithm will take in
new unseen inputs and will determine which label the new inputs will be classified as based on
prior training data. The objective of a supervised learning model is to predict the correct label
for newly presented input data. At its most basic form, a supervised learning algorithm can be
written simply as:
Y=f(x)
Where Y is the predicted output that is determined by a mapping function, f, that assigns a
class to an input value x. The function used to connect input features to a predicted output is
learned by the model during training.
Supervised learning can be split into two subcategories:
• Regression
• Linear Regression
• Logistic Regression (despite its name, mainly used for classification)
• Polynomial Regression
• Decision Tree Regression
• Classification
• Linear Classifiers
• Support Vector Machines
• Decision Trees classification
• K-Nearest Neighbor
• Random Forest
Regression : example
Classification
A classification algorithm will be given data points with an assigned category. The job of a
classification algorithm is to then take an input value and assign it a class, or category, that
it fits into based on the training data provided.
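As a minimal sketch of this idea, the snippet below assumes a 1-nearest-neighbor rule (K-Nearest Neighbor with k = 1) and a small made-up dataset. The function name `predict_1nn`, the points, and the labels "A" and "B" are illustrative assumptions, not part of the slides.

```python
import math

def predict_1nn(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best_index]

# Hypothetical labeled training data: two categories, "A" and "B"
train_points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
train_labels = ["A", "A", "B", "B"]

# New, unseen inputs are assigned the category of their nearest neighbor
print(predict_1nn(train_points, train_labels, (1.2, 1.0)))
print(predict_1nn(train_points, train_labels, (8.5, 9.0)))
```

The training data plays the role of the "data points with an assigned category", and the prediction step assigns a class to a new input based on them.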
Unsupervised learning
Unsupervised learning occurs when an algorithm learns from plain examples without any
associated response, leaving the algorithm to determine the data patterns on its own.
Because no labels are present in the data set used to train the model, this is called unsupervised ML.
• Clustering
• SVD
• PCA
• HMM
• Neural Networks
• Fuzzy C-Means
Reinforcement learning
• Reinforcement learning (RL) is a feedback-based machine learning technique in which an agent learns to behave
in an environment by performing actions and observing their results. For each action, the agent gets
feedback. E.g., the AWS DeepRacer car.
• In RL, an agent interacts with an environment with the objective of maximizing its total reward.
• A reinforcement learning model learns from its experience and, over time, is able to identify
which actions lead to the best rewards.
• The main components of RL are:
  • Agent
  • Environment
  • State
  • Reward
  • Action
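The interaction of these five components can be sketched with a toy example. The code below assumes tabular Q-learning, one common RL algorithm that the slides do not name, on a made-up five-cell corridor where the agent is rewarded for reaching the rightmost cell. All names (`step`, `train`, `q_table`) and parameter values are hypothetical.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # action 0 = move left, action 1 = move right

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-term reward of an action
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))   # the agent explores at random
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback: move the estimate toward reward + discounted future value
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-goal states: pick the action with the higher estimate
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(greedy_policy)
```

Here the corridor is the environment, the cell index is the state, moving left or right is the action, and the value 1.0 at the goal is the reward the agent learns to maximize.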
Applications of RL
Video gameplay: Reinforcement learning has been used to teach bots to
play a number of video games.
Resource management: Given finite resources and a defined goal,
reinforcement learning can help enterprises plan out how to allocate
resources.
Applications of ML
Email-Spam Filtering
Traffic Prediction
Virtual Personal Assistant: Google Assistant, Alexa, Cortana, Siri
Social Media Personalization
Online Fraud Detection
Stock Market Prediction
Weather Prediction
Speech Recognition
Medical Diagnosis
Self-driving cars
Image Recognition
Issues in ML
Poor quality of data
• Unclean and noisy data. To overcome this issue:
  • Remove outliers
  • Filter missing values
  • Remove unwanted features
Underfitting of training data
Underfitting occurs when the model is unable to establish an accurate relationship between the input and output
variables. It is like trying to fit into undersized jeans. It signifies that the model is too simple to capture a
precise relationship. To overcome this issue:
  • Enhance the complexity of the model
  • Add more features to the data
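A small sketch of underfitting and of the "add more features" remedy, assuming NumPy and made-up data that follows a curved (quadratic) relationship. The helper name `fit_and_score` is hypothetical.

```python
import numpy as np

# Hypothetical data following a curved (quadratic) relationship
x = np.linspace(-3, 3, 21)
y = x ** 2

def fit_and_score(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coeffs, x)
    return float(np.mean((y - predictions) ** 2))

mse_line = fit_and_score(1)       # straight line: too simple, underfits
mse_quadratic = fit_and_score(2)  # adds an x**2 feature: matches the data

print(f"line MSE: {mse_line:.3f}, quadratic MSE: {mse_quadratic:.3f}")
```

The straight line leaves a large training error because it cannot bend; adding the squared feature gives the model enough complexity to follow the data.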
Issues in ML
ML is a complex process
It includes analyzing the data, removing data bias, training data, applying complex mathematical calculations, and a lot more.
Hence it is a really complicated process which is another big challenge for Machine learning professionals.
Lack of training data
The most important task in the machine learning
process is to train the model on data to achieve
accurate output. Too little training data will
produce inaccurate or overly biased predictions.
Issues in ML
Slow implementation
This is one of the common issues faced by machine learning professionals. Machine learning models
can provide accurate results, but getting there can take a tremendous amount of time. Slow programs,
data overload, and excessive requirements all add to the time needed to produce accurate results. Further,
models require constant monitoring and maintenance to deliver the best output.
Imperfections in the Algorithm When Data Grows
Suppose you have found quality data, trained a model on it well, and the predictions are precise and accurate. Yay,
you have learned how to create a machine learning algorithm! But wait, there is a twist: the model may
become useless in the future as data grows. The best model of the present may become inaccurate in the
future and require further adjustment. So you need regular monitoring and maintenance to keep
the algorithm working. This is one of the most exhausting issues faced by machine learning professionals.
Data Science vs. Machine Learning
Data Science:
1. A field of deep study of data that includes extracting useful insights from the data and processing that information using different tools, statistical models, and machine learning algorithms.
2. It is used for discovering insights from the data.
3. It is a broad term that includes various steps to create a model for a given problem and deploy the model.
4. A data scientist needs skills in big data tools like Hadoop, Hive, and Pig, statistics, programming in Python, R, or Scala, and data visualization.
5. It can work with raw, structured, and unstructured data.
6. Data scientists spend a lot of time handling the data, cleansing the data, and understanding its patterns.
Machine Learning:
1. Machine learning allows computers to learn from past experience on their own; it uses statistical methods to improve performance and predict the output without being explicitly programmed.
2. It is used for making predictions and classifying the result for new data points.
3. It is used in the data modeling step of the overall data science process.
4. A machine learning engineer needs skills such as computer science fundamentals, programming in Python or R, statistics and probability concepts, etc.
5. It mostly requires structured data to work on.
6. ML engineers spend a lot of time managing the complexities that occur during the implementation of algorithms and the mathematical concepts behind them.
DESIGNING A LEARNING SYSTEM
Choosing the Training Experience (dataset):
The first and most important task is to choose the training data, or training experience, that will be
fed to the machine learning algorithm. Three important attributes of the training experience are:
• Whether it provides direct or indirect feedback regarding the choices made
• The degree to which the learner controls the sequence of training examples
• How well it represents the distribution of examples over which performance is measured
Choosing target function:
The next important step is choosing the target function, that is, deciding what the algorithm should learn from
the knowledge fed to it. In a game-playing setting, the algorithm can learn a NextMove function that describes
which of the legal moves should be played. For example: while playing chess, after the opponent moves,
the machine learning algorithm determines the possible legal moves and decides which one to play in
order to succeed.
DESIGNING A LEARNING SYSTEM
Choosing Representation for Target Function:
Once the algorithm knows all the possible legal moves, the next step is to choose the
optimized move using some representation, e.g., linear equations, a hierarchical graph representation,
or a tabular form. Using this representation, the NextMove function selects, out of the candidate moves, the one
expected to give the highest success rate. For example: if the machine has 4 possible moves while playing chess,
it will choose the optimized move that is most likely to lead to success.
Choosing Function Approximation Algorithm:
An optimized move cannot be chosen from the training data alone. The algorithm has to work through a
set of examples; from these examples it approximates which steps should be chosen, and
it then receives feedback on the outcome. For example: when training data from chess games is
fed to the algorithm, its moves will sometimes fail and sometimes succeed, and from
each failure or success it estimates which step to choose for the next move and what its
success rate is.
DESIGNING A LEARNING SYSTEM
Final Design:
The final design is created at the end, after the system has gone through a number of examples, failures and successes,
correct and incorrect decisions, and choices of the next step. Example: Deep Blue, the IBM
chess computer, won a match against world chess champion Garry Kasparov in 1997,
becoming the first computer to defeat a reigning world champion in a match under tournament conditions.
History of ML
Introduction of Machine Learning Approaches
We can decide which machine learning approach or algorithm to select based on the problem
statement, its interaction with the environment, and the type of data and inputs
involved. We can categorize machine learning algorithms into two groups:
1) Learning algorithms
2) Similarity algorithms
The similarity algorithms are further used as learning models, depending on the type of problem
environment.
Machine Learning Algorithms
Similarity Algorithms
• Regression Algorithms
• Clustering
• Decision Tree Algorithms
• Artificial Neural Networks
• Support Vector Machine
• Reinforcement Learning
• Bayesian networks
• Support Vector Machine
• Genetic Algorithm
Artificial Neural Network
Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell, the
so-called McCulloch-Pitts (MCP) neuron, in 1943 (A Logical Calculus of the Ideas
Immanent in Nervous Activity, W. S. McCulloch and W. Pitts, Bulletin of Mathematical
Biophysics, 5(4): 115-133, 1943). Biological neurons are interconnected nerve cells in the
brain that are involved in the processing and transmitting of chemical and electrical signals.
Artificial Neural Network
McCulloch and Pitts described such a nerve cell as a simple logic gate with binary outputs;
multiple signals arrive at the dendrites, they are then integrated into the cell body, and, if the
accumulated signal exceeds a certain threshold, an output signal is generated that will be
passed on by the axon. Frank Rosenblatt published the first concept of the perceptron
learning rule based on the MCP neuron model (The Perceptron: A Perceiving and
Recognizing Automaton, F. Rosenblatt, Cornell Aeronautical Laboratory, 1957). With his
perceptron rule, Rosenblatt proposed an algorithm that would automatically learn the
optimal weight coefficients that would then be multiplied with the input features in order to
make the decision of whether a neuron fires (transmits a signal) or not. In the context of
supervised learning and classification, such an algorithm could then be used to predict
whether a new data point belongs to one class or the other.
The Formal Definition of An Artificial Neuron
More formally, we can put the idea behind artificial neurons into the context of a binary
classification task where we refer to our two classes as 1 (positive class) and –1 (negative
class) for simplicity. We can then define a decision function, $\phi(z)$, that takes a linear
combination of certain input values, $\mathbf{x}$, and a corresponding weight vector, $\mathbf{w}$, where $z$ is the
so-called net input, $z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m$.
The Formal Definition of An Artificial Neuron
• If the net input of a particular example, $\mathbf{x}^{(i)}$, is greater than or equal to a defined threshold, $\theta$, we
predict class 1, and class –1 otherwise. In the perceptron algorithm, the decision function,
$\phi(\cdot)$, is a variant of a unit step function:
$\phi(z) = 1$ if $z \ge \theta$, and $\phi(z) = -1$ otherwise
• For simplicity, we can bring the threshold, $\theta$, to the left side of the equation and define a
weight-zero as $w_0 = -\theta$ and $x_0 = 1$ so that we write $z$ in a more compact form:
$z = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \mathbf{w}^T \mathbf{x}$, with $\phi(z) = 1$ if $z \ge 0$, and $\phi(z) = -1$ otherwise
• In machine learning literature, the negative threshold, or weight, $w_0 = -\theta$, is usually called
the bias unit.
The following figure illustrates how the net input, $z = \mathbf{w}^T \mathbf{x}$, is squashed into a binary output
(–1 or 1) by the decision function of the perceptron (left subfigure) and how it can be used
to discriminate between two linearly separable classes (right subfigure).
The perceptron learning rule
The whole idea behind the MCP neuron and Rosenblatt's thresholded perceptron model is to
use a reductionist approach to mimic how a single neuron in the brain works: it either fires
or it doesn't. Thus, Rosenblatt's initial perceptron rule is fairly simple, and the perceptron
algorithm can be summarized by the following steps:
1. Initialize the weights to 0 or small random numbers.
2. For each training example, $\mathbf{x}^{(i)}$:
a. Compute the output value, $\hat{y}$.
b. Update the weights.
• Here, the output value is the class label predicted by the unit step function that we defined
earlier, and the simultaneous update of each weight, $w_j$, in the weight vector, $\mathbf{w}$, can be
more formally written as: $w_j := w_j + \Delta w_j$
• The update value for $w_j$ (or change in $w_j$), which we refer to as $\Delta w_j$, is calculated by the perceptron
learning rule as follows:
$\Delta w_j = \eta \left(y^{(i)} - \hat{y}^{(i)}\right) x_j^{(i)}$
• Where $\eta$ is the learning rate (typically a constant between 0.0 and 1.0), $y^{(i)}$ is the true class label of
the $i$th training example, and $\hat{y}^{(i)}$ is the predicted class label. It is important to note that all weights
in the weight vector are being updated simultaneously, which means that we don't recompute the
predicted label $\hat{y}^{(i)}$ before all of the weights are updated via the respective update values $\Delta w_j$.
Concretely, for a two-dimensional dataset, we would write the update as:
$\Delta w_0 = \eta \left(y^{(i)} - \hat{y}^{(i)}\right)$
$\Delta w_1 = \eta \left(y^{(i)} - \hat{y}^{(i)}\right) x_1^{(i)}$
$\Delta w_2 = \eta \left(y^{(i)} - \hat{y}^{(i)}\right) x_2^{(i)}$
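• Let's go through a simple thought experiment to illustrate how beautifully simple this
learning rule really is. In the two scenarios where the perceptron predicts the class label
correctly, the weights remain unchanged, since the update values are 0:
$y^{(i)} = -1,\ \hat{y}^{(i)} = -1:\quad \Delta w_j = \eta\,(-1 - (-1))\,x_j^{(i)} = 0$
$y^{(i)} = 1,\ \hat{y}^{(i)} = 1:\quad \Delta w_j = \eta\,(1 - 1)\,x_j^{(i)} = 0$
• However, in the case of a wrong prediction, the weights are being pushed toward the
direction of the positive or negative target class:
$y^{(i)} = 1,\ \hat{y}^{(i)} = -1:\quad \Delta w_j = \eta\,(1 - (-1))\,x_j^{(i)} = 2\eta\,x_j^{(i)}$
$y^{(i)} = -1,\ \hat{y}^{(i)} = 1:\quad \Delta w_j = \eta\,(-1 - 1)\,x_j^{(i)} = -2\eta\,x_j^{(i)}$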
• To get a better understanding of the multiplicative factor, $x_j^{(i)}$, let's go through another
simple example, where:
It is important to note that the convergence of the perceptron is only guaranteed if the two classes are
linearly separable and the learning rate is sufficiently small. If the two classes can't be separated by a
linear decision boundary, we can set a maximum number of passes over the training dataset (epochs)
and/or a threshold for the number of tolerated misclassifications; the perceptron would never stop
updating the weights otherwise.
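The steps of the perceptron learning rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the class name `Perceptron`, its attribute names, the learning rate, and the four-point dataset are all assumptions made for the example, and the bias unit is stored as `w_[0]`.

```python
import numpy as np

class Perceptron:
    """Rosenblatt-style perceptron for class labels 1 and -1."""

    def __init__(self, eta=0.1, n_epochs=10):
        self.eta = eta            # learning rate
        self.n_epochs = n_epochs  # maximum passes over the training data

    def fit(self, X, y):
        # Step 1: initialize the weights to 0; w_[0] is the bias unit w_0
        self.w_ = np.zeros(1 + X.shape[1])
        self.errors_ = []         # misclassifications per epoch
        for _ in range(self.n_epochs):
            errors = 0
            for xi, target in zip(X, y):
                # Step 2a: predict; step 2b: update all weights from that one prediction
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update          # x_0 = 1
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        # Unit step function applied to the net input z
        return np.where(self.net_input(X) >= 0.0, 1, -1)

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
model = Perceptron(eta=0.1, n_epochs=10).fit(X, y)
print(model.predict(X), model.errors_)
```

The `n_epochs` cap plays the role of the maximum number of passes mentioned above, so training stops even when the classes are not linearly separable.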
General concept of the perceptron
The three general layers of a neural network
The middle layers are considered hidden because, like human vision, they covertly
process objects between the input and output layers. When faced with four lines
connected in the shape of a square, our eyes instantly recognize those four lines as a
square. We don’t notice the mental processing that is involved to register the four
polylines (input) as a square (output).
Multilayer Perceptrons
Multilayer Perceptron: The multilayer perceptron (MLP), as with other ANN techniques,
is an algorithm for predicting a categorical (classification) or continuous (regression) target
variable. Multilayer perceptrons are powerful because they aggregate multiple models into a
unified prediction model, as demonstrated by the classification model.
Clustering
We used supervised learning techniques to build machine learning models, using data where
the answer was already known—the class labels were already available in our training data.
Now, we will switch gears and explore cluster analysis, a category of unsupervised learning
techniques that allows us to discover hidden structures in data where we do not know the
right answer upfront. The goal of clustering is to find a natural grouping in data so that
items in the same cluster are more similar to each other than to those from different clusters.
Grouping objects by similarity using k-means
• k-means is one of the most popular clustering algorithms and is widely used in academia as
well as in industry. Clustering (or cluster analysis) is a technique that allows us to find
groups of similar objects that are more related to each other than to objects in other
groups.
• Examples of business oriented applications of clustering include the grouping of
documents, music, and movies by different topics, or finding customers that share similar
interests based on common purchase behaviors as a basis for recommendation engines.
K-means clustering Algorithm
• The k-means algorithm is not only extremely easy to implement but also computationally very efficient
compared to other clustering algorithms, which might explain its popularity. The k-means algorithm
belongs to the category of prototype-based clustering. We will discuss two other categories of clustering,
hierarchical and density-based clustering.
• Prototype-based clustering means that each cluster is represented by a prototype, which is usually either
the centroid (average) of similar points with continuous features, or the medoid (the most representative
or the point that minimizes the distance to all other points that belong to a particular cluster) in the case
of categorical features. While k-means is very good at identifying clusters with a spherical shape, one of
the drawbacks of this clustering algorithm is that we have to specify the number of clusters, k, a priori.
An inappropriate choice for k can result in poor clustering performance. Later, we will discuss the elbow
method and silhouette plots, which are useful techniques to evaluate the quality of a clustering to help us
determine the optimal number of clusters, k.
K-means clustering Algorithm for k=3
If we were to set k to 4, an additional cluster would be derived from the dataset to produce four
clusters
How does k-means clustering separate the data
points?
• The first step is to examine the unclustered data and manually select a centroid for each
cluster. That centroid then forms the epicenter of an individual cluster.
• Centroids can be chosen at random, which means you can nominate any data point on the
scatterplot to act as a centroid. However, you can save time by selecting centroids
dispersed across the scatterplot and not directly adjacent to each other. In other words,
start by guessing where you think the centroids for each cluster might be located. The
remaining data points on the scatterplot are then assigned to the nearest centroid by
measuring the Euclidean distance.
Each data point can be assigned to only one cluster, and each cluster is discrete. This means
that there’s no overlap between clusters and no case of nesting a cluster inside another
cluster. Also, all data points, including anomalies, are assigned to a centroid irrespective of
how they impact the final shape of the cluster. However, due to the statistical force that pulls
all nearby data points to a central point, clusters will typically form an elliptical or spherical
shape.
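A minimal sketch of this procedure, assuming NumPy and a made-up set of six points. The slides describe choosing initial centroids and assigning points by Euclidean distance; the centroid-update and repeat steps below follow the standard k-means procedure, which the slides do not spell out. The function name `k_means` and its parameters are hypothetical.

```python
import numpy as np

def k_means(points, initial_centroids, max_iter=100):
    """Cluster `points` around k centroids using Euclidean distance."""
    centroids = np.array(initial_centroids, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:       # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                      # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two visibly separate groups
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]])
# "Manually" nominate two data points as the starting centroids (k = 2)
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Each point ends up with exactly one label, which reflects the statement that clusters are discrete and do not overlap.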
How does k-means clustering separate the data
points?
Decision Tree Learning
Decision tree classifiers are attractive models if we care about interpretability. As the name
"decision tree" suggests, we can think of this model as breaking down our data by making a
decision based on asking a series of questions. Let's consider the following example in
which we use a decision tree to decide upon an activity on a particular day:
Decision Tree Learning
Based on the features in our training dataset, the decision tree model learns a series of
questions to infer the class labels of the examples. Although the preceding figure illustrates
the concept of a decision tree based on categorical variables, the same concept applies if our
features are real numbers, like in the Iris dataset. For example, we could simply define a
cut-off value along the sepal width feature axis and ask a binary question: "Is the sepal
width ≥ 2.8 cm?" Using the decision algorithm, we start at the tree root and split the data on
the feature that results in the largest information gain (IG), which will be explained in more
detail in the following section. In an iterative process, we can then repeat this splitting
procedure at each child node until the leaves are pure. This means that the training examples
at each node all belong to the same class. In practice, this can result in a very deep tree with
many nodes, which can easily lead to overfitting. Thus, we typically want to prune the tree
by setting a limit for the maximal depth of the tree.
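To make the idea of information gain concrete, here is a small sketch that scores one candidate split. It assumes entropy as the impurity measure (one common choice; the slides do not say which measure is used) and a made-up set of four labeled measurements. The function names and the data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children

# Hypothetical (sepal width in cm, class label) pairs
samples = [(2.0, "a"), (2.5, "a"), (3.0, "b"), (3.4, "b")]
cutoff = 2.8
parent = [label for _, label in samples]
left = [label for width, label in samples if width >= cutoff]
right = [label for width, label in samples if width < cutoff]

print(information_gain(parent, left, right))
```

A split that leaves both children pure removes all of the parent's impurity, while a split that leaves the class mix unchanged has zero gain, so the tree prefers the former.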
Decision Tree
In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values
of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and
the tree itself to a disjunction of these conjunctions.
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Decision Tree
Decision trees classify instances by sorting them down the tree from the root to some leaf
node, which provides the classification of the instance. Each node in the tree specifies a
test of some attribute of the instance, and each branch descending from that node
corresponds to one of the possible values for this attribute. An instance is classified by
starting at the root node of the tree, testing the attribute specified by this node, then moving
down the tree branch corresponding to the value of the attribute in the given example. This
process is then repeated for the subtree rooted at the new node. The example decision tree classifies
Saturday mornings according to whether they are suitable for the planned activity.
e.g. (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
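The sorting of an instance down the tree can be sketched as nested attribute tests. The tree below is an assumption reconstructed from the disjunction of conjunctions shown earlier (Sunny and Normal humidity, Overcast, or Rain and Weak wind lead to a positive answer); the function name `classify` and the "Yes"/"No" leaf labels are illustrative.

```python
def classify(instance):
    """Walk the example tree from the root (Outlook) down to a leaf."""
    outlook = instance["Outlook"]
    if outlook == "Sunny":
        # Second test on this branch: Humidity
        return "Yes" if instance["Humidity"] == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        # Second test on this branch: Wind
        return "Yes" if instance["Wind"] == "Weak" else "No"
    raise ValueError(f"unknown Outlook value: {outlook!r}")

example = {"Outlook": "Sunny", "Temperature": "Hot",
           "Humidity": "High", "Wind": "Strong"}
print(classify(example))
```

Note that Temperature is never tested: an attribute only matters if it appears on the path the instance follows.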
APPROPRIATE PROBLEMS FOR DECISION
TREE LEARNING
• Instances are represented by attribute-value pairs: Instances are described by a fixed set of attributes
(e.g. Temperature) and their values (e.g., Hot). The easiest situation for decision tree learning is when
each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). However,
extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing
Temperature numerically).
• The target function has discrete output values: The decision tree assigns a Boolean classification
(e.g., yes or no) to each example. Decision tree methods easily extend to learning functions with more
than two possible output values. A more substantial extension allows learning target functions with real-
valued outputs, though the application of decision trees in this setting is less common.
• The training data may contain errors.
• The training data may contain missing attribute values.
APPROPRIATE PROBLEMS FOR
DECISION TREE LEARNING
Decision tree learning has therefore been applied to problems such as learning to classify
medical patients by their disease, equipment malfunctions by their cause, and loan
applicants by their likelihood of defaulting on payments. Such problems, in which the task
is to classify examples into one of a discrete set of possible categories, are often referred to
as classification problems.
What is Inductive Learning?
From the perspective of inductive learning, we are given input samples (x) and output samples (f(x)) and
the problem is to estimate the function (f). Specifically, the problem is to generalize from the samples so that
the learned mapping is useful for estimating the output for new samples in the future. In practice it is almost
always too hard to estimate the function exactly, so we look for very good approximations of it.
e.g.,
• Credit risk assessment.
• The x is the properties of the customer.
• The f(x) is credit approved or not.
• Disease diagnosis.
• The x are the properties of the patient.
• The f(x) is the disease they suffer from.
• Face recognition.
• The x are bitmaps of peoples faces.
• The f(x) is to assign a name to the face.
• Automatic steering.
• The x are bitmap images from a camera in front of the car.
• The f(x) is the degree the steering wheel should be turned.
When Should You Use Inductive Learning?
There are problems where inductive learning is not a good idea. It is important to know when to use
and when not to use supervised machine learning.
Four problems where inductive learning might be a good idea:
• Problems where there is no human expert. If people do not know the answer they
cannot write a program to solve it. These are areas of true discovery.
• Humans can perform the task but no one can describe how to do it. There are
problems where humans can do things that computers cannot do or cannot do well. Examples
include riding a bike or driving a car.
• Problems where the desired function changes frequently. Humans could describe it
and they could write a program to do it, but the problem changes too often. It is not cost
effective. Examples include the stock market.
• Problems where each user needs a custom function. It is not cost effective to write a
custom program for each user. Example is recommendations of movies or books on
Netflix or Amazon.
Two perspectives on inductive learning:
• Learning is the removal of uncertainty. Having data removes some uncertainty.
By selecting a class of hypotheses, we remove more uncertainty.
• Learning is guessing a good and small hypothesis class. It requires guessing: we don’t
know the solution, so we must use a trial-and-error process. If you knew the domain with
certainty, you wouldn’t need learning. But we are not guessing in the dark.
A Framework For Studying Inductive Learning
• Training example: a sample from x including its output from the target function
• Target function: the mapping function f from x to f(x)
• Hypothesis: approximation of f, a candidate function.
• Concept: A Boolean target function, positive examples and negative examples for the 1/0
class values.
• Classifier: Learning program outputs a classifier that can be used to classify.
• Learner: Process that creates the classifier.
• Hypothesis space: set of possible approximations of f that the algorithm can create.
• Version space: subset of the hypothesis space that is consistent with the observed data
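These terms can be illustrated with a deliberately tiny example. The sketch below assumes a hypothesis space of integer threshold rules and four made-up training examples; every name and value in it is hypothetical.

```python
# Hypothetical training examples: (x, label) for a Boolean (1/0) target concept
training_examples = [(2, 0), (3, 0), (5, 1), (7, 1)]

# Hypothesis space: threshold rules h_t(x) = 1 if x >= t else 0, for t = 0..10
hypothesis_space = list(range(11))

def hypothesis(threshold, x):
    """One candidate approximation of the target function f."""
    return 1 if x >= threshold else 0

def is_consistent(threshold, examples):
    """A hypothesis is consistent if it reproduces every observed label."""
    return all(hypothesis(threshold, x) == label for x, label in examples)

# Version space: the hypotheses that agree with all the training examples
version_space = [t for t in hypothesis_space if is_consistent(t, training_examples)]
print(version_space)
```

Every additional training example can only shrink the version space, which matches the view of learning as the removal of uncertainty.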
Regression
• Linear
• Logistic
Linear Regression
Regression models are used to predict target variables on a continuous scale, which makes
them attractive for addressing many questions in science.
They also have applications in industry, such as understanding relationships between
variables, evaluating trends, or making forecasts. One example is predicting the sales of a
company in future months.
Introducing linear regression
The goal of linear regression is to model the relationship between one or multiple features
and a continuous target variable.
In contrast to classification—a different subcategory of supervised learning—regression
analysis aims to predict outputs on a continuous scale rather than categorical class labels.
Simple linear regression
• The goal of simple (univariate) linear regression is to model the relationship between a
single feature (explanatory variable, x) and a continuous-valued target (response variable,
y). The equation of a linear model with one explanatory variable is defined as follows: $y = w_0 + w_1 x$
• Here w0 represents the y axis intercept and 𝑤1 is the weight coefficient of the explanatory
variable. Our goal is to learn the weights of the linear equation to describe the relationship
between the explanatory variable and the target variable, which can then be used to
predict the responses of new explanatory variables that were not part of the training
dataset.
Linear Regression
The values $w_0$ and $w_1$ must be chosen so that they minimize the error. If the sum of squared
errors is taken as the metric to evaluate the model, then the goal is to obtain the line that minimizes
this error. If we don’t square the errors, then positive and negative errors will cancel each
other out.
Intercept calculation: $w_0 = \bar{y} - w_1 \bar{x}$
Coefficient formula: $w_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
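As a worked sketch, the snippet below computes the intercept and coefficient with the standard closed-form least-squares solution for one explanatory variable. The function name `fit_simple_linear_regression` and the four data points are assumptions chosen so that the result is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w0, w1) minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w1 = s_xy / s_xx                 # slope (coefficient)
    w0 = y_mean - w1 * x_mean        # intercept
    return w0, w1

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1)
```

With these points the means are 2.5 and 6, the two sums are 10 and 5, so the slope is 2 and the intercept is 6 - 2 × 2.5 = 1.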
• Exploring ‘w1’
• If w1 > 0, then x (predictor) and y (target) have a positive relationship. That is, an increase
in x will increase y.
• If w1 < 0, then x (predictor) and y (target) have a negative relationship. That is, an increase
in x will decrease y.
Exploring w0
• If the data does not include x = 0, then the prediction at x = 0, which consists of w0 alone, is
meaningless. For example, suppose we have a dataset that relates height (x) and weight (y). Taking x = 0 (that
is, height as 0) leaves only the w0 term in the equation, which is completely meaningless because
in reality height and weight can never be zero. This results from applying the
model to values beyond its scope.
• If the data includes x = 0, then ‘w0’ is the expected mean value of y when
x = 0. But setting all the predictor variables to zero is often impossible.
• The w0 term guarantees that the residuals have mean zero. If there is no ‘w0’ term, then the
regression line is forced to pass through the origin, and both the regression coefficient and the
predictions will be biased.
[Figure: Housing Prices. Price (in 1000s of dollars) plotted against Size (feet²).]
Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
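Training Set (m = 47)
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (the parameters $\theta_0$ and $\theta_1$ play the same role as $w_0$ and $w_1$ above)
$\theta_i$'s: Parameters
How to choose the $\theta_i$'s?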
[Figure: three example plots of y against x, each with both axes running from 0 to 3.]
Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our
training examples $(x, y)$
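Linear regression with one variable
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: minimize $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$
Simplified: $h_\theta(x) = \theta_1 x$, with the single parameter $\theta_1$ and cost $J(\theta_1)$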
[Figure: left panel, $h_\theta(x)$ plotted against x (for fixed $\theta_1$, this is a function of x); right panel, $J(\theta_1)$ (a function of the parameter $\theta_1$).]
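A short sketch of how such a cost function can be evaluated for different parameter values. It assumes the common squared-error form $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_i (\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2$; the function name `cost` and the three data points are hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    """Half the mean squared error of h(x) = theta0 + theta1 * x."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)

# Hypothetical training set lying on the line y = x
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Simplified case: fix theta0 = 0 and vary only theta1
for theta1 in (0.0, 0.5, 1.0):
    print(theta1, cost(0.0, theta1, xs, ys))
```

For this data the cost falls as $\theta_1$ approaches 1 and reaches zero there, which is the minimum the learning algorithm is looking for.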
Classification
Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign ?
0: “Negative Class” (e.g., benign tumor)
1: “Positive Class” (e.g., malignant tumor)
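[Figure: Malignant? (1 = Yes, 0 = No) plotted against Tumor Size.]
Threshold classifier output $h_\theta(x)$ at 0.5:
If $h_\theta(x) \ge 0.5$, predict “y = 1”
If $h_\theta(x) < 0.5$, predict “y = 0”
Classification: y = 0 or 1
With linear regression, $h_\theta(x)$ can be > 1 or < 0
Logistic Regression: $0 \le h_\theta(x) \le 1$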
Logistic Regression
As demonstrated, linear regression is a useful technique to quantify relationships between
continuous variables. Predicting discrete variables also plays a major part in data analysis
and machine learning. For instance, is something “A” or “B?” Is it “positive” or “negative?”
Is this person a “new customer” or a “returning customer?” Unlike linear regression, the
dependent variable (y) is no longer a continuous variable (such as price) but rather a discrete
categorical variable. The independent variables used as input to predict the dependent
variable can be either categorical or continuous.
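Logistic Regression Model
Want $0 \leq h(x) \leq 1$
$h(x) = g(w^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$
$g$ is called the sigmoid function or logistic function
Figure : A sigmoid function used to classify data points (output runs from 0 to 1 and equals 0.5 at z = 0)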
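The logistic regression model and the 0.5 threshold can be sketched in a few lines. This assumes the standard logistic function g(z) = 1 / (1 + e^(−z)) applied to a weighted sum of the inputs. The weights are hypothetical values picked for illustration, not parameters learned from data, and the function names are placeholders.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, features):
    """h(x) = g(w^T x); features[0] is assumed to be the constant 1 for w0."""
    z = sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(weights, features, threshold=0.5):
    """Predict class 1 when h(x) >= threshold, otherwise class 0."""
    return 1 if predict_proba(weights, features) >= threshold else 0

# Hypothetical weights for a tumor-size classifier: h(x) = g(-6 + 2 * size).
w = [-6.0, 2.0]
for size in (1.0, 3.0, 5.0):
    x = [1.0, size]
    print(size, round(predict_proba(w, x), 3), predict(w, x))
```

With these assumed weights the decision boundary sits at size = 3, where w^T x = 0 and h(x) = 0.5. Unlike the output of linear regression, h(x) never leaves the interval from 0 to 1.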
Example: Linear regression (housing prices)
Overfitting: If we have too many features, the learned hypothesis
may fit the training set very well ($J(w) \approx 0$), but
fail to generalize to new examples (predict prices on new examples).
Figure: three plots of Price against Size
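Overfitting can be reproduced on a tiny example. The sketch below compares a straight-line fit with a cubic polynomial forced through every training point, which stands in for a model with too many features (x, x², x³). All data values are hypothetical and the function names are placeholders.

```python
# Hypothetical training data: roughly y = x with a little noise.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.5, 1.5, 3.0]
# Hypothetical unseen examples that follow the underlying trend y = x.
test_x = [0.5, 4.0]
test_y = [0.5, 4.0]

def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = my - w1 * mx
    return lambda x: w0 + w1 * x

def fit_interpolant(xs, ys):
    """Degree n-1 polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
cubic = fit_interpolant(train_x, train_y)
for name, model in (("line", line), ("cubic", cubic)):
    print(name, "train MSE:", round(mse(model, train_x, train_y), 4),
          "test MSE:", round(mse(model, test_x, test_y), 4))
```

The cubic reaches zero training error, yet at x = 4 it predicts 9 where the trend gives 4. The line has a small training error and stays close to the unseen points.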
Example: Logistic regression
($g$ = sigmoid function)
Figure: three plots of x2 against x1

Machine Learning Contents.pptx

  • 2. Outline • Learning, Types of Learning • Well defined learning problems, Designing a Learning System • History of ML, Introduction of Machine Learning Approaches • Artificial Neural Network, Clustering, Reinforcement Learning • Decision Tree Learning, Bayesian networks • Support Vector Machine, Genetic Algorithm • Issues in Machine Learning • Data Science Vs Machine Learning
  • 3. Have you ever heard of ! • Virtual Personal Assistants • Smart Speakers: Amazon Echo and Google Home • Mobile Apps: Ok Google • Predictions while Commuting • GPS navigation • Videos Surveillance • Social Media Services • People You May Know • Face Recognition • Similar Pins • Email Spam and Malware Filtering • Online Customer Support • Search Engine Result Refining • Product Recommendations • Online Fraud Detection
  • 4.
  • 5. Machine Learning definition • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed. • Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values. • Machine learning is an application of AI that enables systems to learn and improve from experience without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.
  • 7.
  • 8. Machine learning Types • Machine learning algorithms: • Supervised learning • Unsupervised learning • Others: Reinforcement learning, recommender systems.
  • 9. Supervised Learning Supervised machine learning algorithms are designed to learn a machine by labels. The name “supervised” learning originates from the idea that training this type of algorithm is like having a teacher supervise the whole process.
  • 10. Supervised Learning When training a supervised learning algorithm, the training data will consist of inputs paired with the correct outputs. During training, the algorithm will search for patterns in the data that correlate with the desired outputs. After training, a supervised learning algorithm will take in new unseen inputs and will determine which label the new inputs will be classified as based on prior training data. The objective of a supervised learning model is to predict the correct label for newly presented input data. At its most basic form, a supervised learning algorithm can be written simply as: Y=f(x) Where Y is the predicted output that is determined by a mapping function that assigns a class to an input value x. The function used to connect input features to a predicted output is
  • 11. Supervised learning can be split into two subcategories: • Regression • Linear Regression • Logistic Regression • Polynomial Regression • Decision Tree Regression • Classification • Linear Classifiers • Support Vector Machines • Decision Trees classification • K-Nearest Neighbor • Random Forest
  • 13. Classification A classification algorithm will be given data points with an assigned category. The job of a classification algorithm is to then take an input value and assign it a class, or category, that it fits into based on the training data provided.
  • 14.
  • 15. Unsupervised learning Unsupervised learning occurs when an algorithm learns from plain examples without any associated response, leaving to the algorithm to determine the data patterns on its own. When no labels are present in data set to train the model. This is called un supervised ML. • Clustering • SVD • PCA • HMM • Neural Networks • Fuzzy C-Means
  • 16. Reinforcement learning • Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each action, the agent gets feedback. E.g. AWSDeepRacer Car • IN RL, an agent interacts with an environment with an objective to maximize its total reward. • A reinforcement learning model will learn from its experience and over the time will be able to identify which actions lead to the best rewards. • The main component of RL are:  Agent  Environment  State  Reward  Action
  • 17. Applications of RL technique. Video gameplay: Reinforcement learning has been used to teach bots to play a number of video games. Resource management: Given finite resources and a defined goal, reinforcement learning can help enterprises plan out how to allocate resources.
  • 18. Applications of ML Email-Spam Filtering Traffic Prediction Virtual Personal Assistant: Google assistant, alexa, cortona, Siri Social Media Personalization Online Fraud Detection Stock Market Prediction Weather Prediction Speech Recognition Medical Diagnosis Self driving car Image Recognition
  • 19. Issues in ML Poor quality of data  Unclean and noisy data Remove outliers Filter missing values Remove unwanted features Underfitting of training data This process occurs when data is unable to establish an accurate relationship between input and output variables. It simply means trying to fit in undersized jeans. It signifies the data is too simple to establish a precise relationship. To overcome this issue: Enhance the complexity of the model Add more features to the data
  • 20. Issues in ML ML is a complex process It includes analyzing the data, removing data bias, training data, applying complex mathematical calculations, and a lot more. Hence it is a really complicated process which is another big challenge for Machine learning professionals. Lack of training data The most important task you need to do in the machine learning process is to train the data to achieve an accurate output. Less amount training data will produce inaccurate or too biased predictions.
  • 21. Issues in ML Slow implementation This is one of the common issues faced by machine learning professionals. The machine learning models are highly efficient in providing accurate results, but it takes a tremendous amount of time. Slow programs, data overload, and excessive requirements usually take a lot of time to provide accurate results. Further, it requires constant monitoring and maintenance to deliver the best output. Imperfections in the Algorithm When Data Grows you have found quality data, trained it amazingly, and the predictions are really concise and accurate. Yay, you have learned how to create a machine learning algorithm!! But wait, there is a twist; the model may become useless in the future as data grows. The best model of the present may become inaccurate in the coming Future and require further rearrangement. So you need regular monitoring and maintenance to keep the algorithm working. This is one of the most exhausting issues faced by machine learning professionals.
  • 22.
  • 23. Data Science vs. Machine Learning 1. A field of deep study of data that includes extracting useful insights from the data, and processing that information using different tools, statistical models, and Machine learning algorithms. 2. It is used for discovering insights from the data. 3. It is a broad term that includes various steps to create a model for a given problem and deploy the model. 4. A data scientist needs to have skills to use big data tools like Hadoop, Hive and Pig, statistics, programming in Python, R, or Scala, data visualization. 5. It can work with raw, structured, and unstructured data. 6. Data scientists spent lots of time in handling the data, cleansing the data, and understanding its patterns. 1. Machine Leaning allows the computers to learn from the past experiences by its own, it uses statistical methods to improve the performance and predict the output without being explicitly programmed. 2. It is used for making predictions and classifying the result for new data points. 3. It is used in the data modeling step of the data science as a complete process. 4. Machine Learning Engineer needs to have skills such as computer science fundamentals, programming skills in Python or R, statistics and probability concepts, etc. 5. It mostly requires structured data to work on. 6. ML engineers spend a lot of time for managing the complexities that occur during the implementation of algorithms and mathematical concepts behind that.
  • 25. DESIGNING A LEARNING SYSTEM Choosing the Training Experience (dataset): The very important and first task is to choose the training data or training experience which will be fed to the Machine Learning Algorithm. Three important parameters are: Feedback regarding choice Degree to control the sequence of training example Distribution of example for performance measure Choosing target function: The next important step is choosing the target function. It means according to the knowledge fed to the algorithm the machine learning will choose NextMove function which will describe what type of legal moves should be taken. For example : While playing chess with the opponent, when opponent will play then the machine learning algorithm will decide what be the number of possible legal moves taken in order to get success.
  • 26. DESIGNING A LEARNING SYSTEM Choosing Representation for Target function: When the machine algorithm will know all the possible legal moves the next step is to choose the optimized move using any representation i.e. using linear Equations, Hierarchical Graph Representation, Tabular form etc. The NextMove function will move the Target move like out of these move which will provide more success rate. For Example : while playing chess machine have 4 possible moves, so the machine will choose that optimized move which will provide success to it. Choosing Function Approximation Algorithm: An optimized move cannot be chosen just with the training data. The training data had to go through with set of example and through these examples the training data will approximates which steps are chosen and after that machine will provide feedback on it. For Example : When a training data of Playing chess is fed to algorithm so at that time it is not machine algorithm will fail or get success and again from that failure or success it will measure while next move what step should be chosen and what is its success rate.
  • 27. DESIGNING A LEARNING SYSTEM Final Design: The final design is created at last when system goes from number of examples , failures and success , correct and incorrect decision and what will be the next step etc. Example: DeepBlue is an intelligent computer which is ML-based won chess game against the chess expert Garry Kasparov, and it became the first computer which had beaten a human chess expert.
  • 28.
  • 30. Introduction of Machine Learning Approaches We can decide which machine learning approaches/algorithm to select based on the problem statement, its an interaction with the environment and what type of data and inputs are going to be. We can categorize the machine learning algorithms in two groups: 1) Learning algorithms 2) Similarity algorithms. The similarity algorithms further used as a learning model based on the types of problem environment.
  • 32.
  • 33. Similarity Algorithms • Regression Algorithms • Clustering • Decision Tree Algorithms • Artificial Neural Networks • Support Vector Machine • Reinforcement Learning • Bayesian networks • Support Vector Machine • Genetic Algorithm
  • 34. Artificial Neural Network Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell, the so-called McCulloch-Pitts (MCP) neuron, in 1943 (A Logical Calculus of the Ideas Immanent in nervous Activity, W. S. McCulloch and W. Pitts, Bulletin of Mathematical Biophysics, 5(4): 115-133, 1943). Biological neurons are interconnected nerve cells in the brain that are involved in the processing and transmitting of chemical and electrical signals.
  • 35. Artificial Neural Network McCulloch and Pitts described such a nerve cell as a simple logic gate with binary outputs; multiple signals arrive at the dendrites, they are then integrated into the cell body, and, if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon. Frank Rosenblatt published the first concept of the perceptron learning rule based on the MCP neuron model (The Perceptron: A Perceiving and Recognizing Automaton, F. Rosenblatt, Cornell Aeronautical Laboratory, 1957). With his perceptron rule, Rosenblatt proposed an algorithm that would automatically learn the optimal weight coefficients that would then be multiplied with the input features in order to make the decision of whether a neuron fires (transmits a signal) or not. In the context of supervised learning and classification, such an algorithm could then be used to predict whether a new data point belongs to one class or the other.
  • 36. The Formal Definition of An Artificial Neuron More formally, we can put the idea behind artificial neurons into the context of a binary classification task where we refer to our two classes as 1 (positive class) and –1 (negative class) for simplicity. We can then define a decision function, $\phi(z)$, that takes a linear combination of certain input values, $\boldsymbol{x}$, and a corresponding weight vector, $\boldsymbol{w}$, where $z$ is the so-called net input: $z = w_1x_1 + w_2x_2 + \dots + w_mx_m$.
  • 37. The Formal Definition of An Artificial Neuron • If the net input of a particular example, $\boldsymbol{x}^{(i)}$, is greater than or equal to a defined threshold, $\theta$, we predict class 1, and class –1 otherwise. In the perceptron algorithm, the decision function, $\phi(\cdot)$, is a variant of a unit step function: $\phi(z) = 1$ if $z \ge \theta$, and $\phi(z) = -1$ otherwise. • For simplicity, we can bring the threshold, $\theta$, to the left side of the equation and define a weight-zero as $w_0 = -\theta$ and $x_0 = 1$ so that we write $z$ in a more compact form: $z = w_0x_0 + w_1x_1 + \dots + w_mx_m = \boldsymbol{w}^T\boldsymbol{x}$, with $\phi(z) = 1$ if $z \ge 0$ and $\phi(z) = -1$ otherwise. • In machine learning literature, the negative threshold, or weight, $w_0 = -\theta$, is usually called the bias unit.
  • 38. The following figure illustrates how the net input, $z = \boldsymbol{w}^T\boldsymbol{x}$, is squashed into a binary output (–1 or 1) by the decision function of the perceptron (left subfigure) and how it can be used to discriminate between two linearly separable classes (right subfigure).
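The net input and the unit step decision function can be sketched in a few lines of plain Python. This is only an illustration: the function names and the example weights are hypothetical, and the convention that a net input of exactly 0 maps to class 1 is an assumption.

```python
def net_input(w, x):
    """Net input z = w^T x; w[0] is the bias unit and x[0] is fixed to 1."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def unit_step(z):
    """Decision function phi(z): class 1 if z >= 0, class -1 otherwise."""
    return 1 if z >= 0 else -1


def predict(w, features):
    """Prepend x0 = 1 to the features, then threshold the net input."""
    x = [1.0] + list(features)
    return unit_step(net_input(w, x))


# Hypothetical weights: a threshold theta = 1.0 folded in as w0 = -theta.
w = [-1.0, 0.5, 0.5]
label_a = predict(w, [2.0, 2.0])   # z = -1.0 + 1.0 + 1.0 = 1.0
label_b = predict(w, [0.5, 0.5])   # z = -1.0 + 0.25 + 0.25 = -0.5
```

Folding the threshold into the weight vector means the same comparison against zero works for every choice of threshold.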
  • 39. The perceptron learning rule The whole idea behind the MCP neuron and Rosenblatt's thresholded perceptron model is to use a reductionist approach to mimic how a single neuron in the brain works: it either fires or it doesn't. Thus, Rosenblatt's initial perceptron rule is fairly simple, and the perceptron algorithm can be summarized by the following steps: 1. Initialize the weights to 0 or small random numbers. 2. For each training example, $\boldsymbol{x}^{(i)}$: a. Compute the output value, $\hat{y}$. b. Update the weights. • Here, the output value is the class label predicted by the unit step function that we defined earlier, and the simultaneous update of each weight, $w_j$, in the weight vector, $\boldsymbol{w}$, can be more formally written as: $w_j := w_j + \Delta w_j$
  • 40. • The update value for $w_j$ (or change in $w_j$), which we refer to as $\Delta w_j$, is calculated by the perceptron learning rule as follows: $\Delta w_j = \eta\,(y^{(i)} - \hat{y}^{(i)})\,x_j^{(i)}$ • Where $\eta$ is the learning rate (typically a constant between 0.0 and 1.0), $y^{(i)}$ is the true class label of the ith training example, and $\hat{y}^{(i)}$ is the predicted class label. It is important to note that all weights in the weight vector are being updated simultaneously, which means that we don't recompute the predicted label $\hat{y}^{(i)}$ before all of the weights are updated via the respective update values $\Delta w_j$. Concretely, for a two-dimensional dataset, we would write the update as $\Delta w_0 = \eta\,(y^{(i)} - \hat{y}^{(i)})$, $\Delta w_1 = \eta\,(y^{(i)} - \hat{y}^{(i)})\,x_1^{(i)}$, $\Delta w_2 = \eta\,(y^{(i)} - \hat{y}^{(i)})\,x_2^{(i)}$.
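  • 41. • Let's go through a simple thought experiment to illustrate how beautifully simple this learning rule really is. In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged, since the update values are 0: $\Delta w_j = \eta\,(-1 - (-1))\,x_j^{(i)} = 0$ and $\Delta w_j = \eta\,(1 - 1)\,x_j^{(i)} = 0$. • However, in the case of a wrong prediction, the weights are being pushed toward the direction of the positive or negative target class: $\Delta w_j = \eta\,(1 - (-1))\,x_j^{(i)} = 2\eta\,x_j^{(i)}$ and $\Delta w_j = \eta\,(-1 - 1)\,x_j^{(i)} = -2\eta\,x_j^{(i)}$. • To get a better understanding of the multiplicative factor, $x_j^{(i)}$, note that the update is proportional to the feature value. For instance, with $y^{(i)} = 1$, $\hat{y}^{(i)} = -1$, $\eta = 1$ and $x_j^{(i)} = 0.5$, the update is $\Delta w_j = (1 - (-1)) \cdot 0.5 = 1$; a larger feature value would give a proportionally larger update.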
  • 42. It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. If the two classes can't be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications; otherwise the perceptron would never stop updating the weights.
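The perceptron learning rule and the epoch limit discussed above can be sketched as follows. This is a minimal plain-Python illustration, not a reference implementation: the function name, the default learning rate, the epoch cap, and the tiny dataset are all hypothetical, and weights are initialized to 0 (one of the two options mentioned earlier).

```python
def train_perceptron(X, y, eta=0.1, max_epochs=10):
    """Train a perceptron; returns (weights, misclassifications per epoch)."""
    w = [0.0] * (len(X[0]) + 1)          # w[0] is the bias unit
    errors_per_epoch = []
    for _ in range(max_epochs):          # cap on passes over the data
        errors = 0
        for features, target in zip(X, y):
            x = [1.0] + list(features)   # x0 = 1 pairs with the bias unit
            z = sum(w_j * x_j for w_j, x_j in zip(w, x))
            y_hat = 1 if z >= 0 else -1
            update = eta * (target - y_hat)
            # All weights are updated together from the same prediction.
            w = [w_j + update * x_j for w_j, x_j in zip(w, x)]
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
        if errors == 0:                  # nothing misclassified: converged
            break
    return w, errors_per_epoch


# Hypothetical, linearly separable toy data with labels 1 and -1.
X = [[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
weights, errors = train_perceptron(X, y)
predictions = [
    1 if weights[0] + sum(wj * xj for wj, xj in zip(weights[1:], row)) >= 0 else -1
    for row in X
]
```

If the classes were not linearly separable, the loop would simply stop after `max_epochs` passes with a non-zero error count in the last entry of `errors`.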
  • 43. General concept of the perceptron
  • 44. The three general layers of a neural network The middle layers are considered hidden because, like human vision, they covertly process objects between the input and output layers. When faced with four lines connected in the shape of a square, our eyes instantly recognize those four lines as a square. We don’t notice the mental processing that is involved to register the four polylines (input) as a square (output).
  • 45. Multilayer Perceptrons Multilayer Perceptron: The multilayer perceptron (MLP), as with other ANN techniques, is an algorithm for predicting a categorical (classification) or continuous (regression) target variable. Multilayer perceptrons are powerful because they aggregate multiple models into a unified prediction model, as demonstrated by the classification model.
  • 46. Clustering We used supervised learning techniques to build machine learning models, using data where the answer was already known—the class labels were already available in our training data. Now, we will switch gears and explore cluster analysis, a category of unsupervised learning techniques that allows us to discover hidden structures in data where we do not know the right answer upfront. The goal of clustering is to find a natural grouping in data so that items in the same cluster are more similar to each other than to those from different clusters.
  • 47. Grouping objects by similarity using k-means • k-means is one of the most popular clustering algorithms and is widely used in academia as well as in industry. Clustering (or cluster analysis) is a technique that allows us to find groups of similar objects that are more related to each other than to objects in other groups. • Examples of business-oriented applications of clustering include the grouping of documents, music, and movies by different topics, or finding customers that share similar interests based on common purchase behaviors as a basis for recommendation engines.
  • 48. K-means clustering Algorithm • The k-means algorithm is extremely easy to implement and is also computationally very efficient compared to other clustering algorithms, which might explain its popularity. The k-means algorithm belongs to the category of prototype-based clustering. We will discuss two other categories of clustering, hierarchical and density-based clustering. • Prototype-based clustering means that each cluster is represented by a prototype, which is usually either the centroid (average) of similar points with continuous features, or the medoid (the most representative point, that is, the point that minimizes the distance to all other points that belong to a particular cluster) in the case of categorical features. While k-means is very good at identifying clusters with a spherical shape, one of the drawbacks of this clustering algorithm is that we have to specify the number of clusters, k, a priori. An inappropriate choice for k can result in poor clustering performance. Later, we will discuss the elbow method and silhouette plots, which are useful techniques to evaluate the quality of a clustering to help us determine the optimal number of clusters, k.
  • 49. K-means clustering Algorithm for k=3 If we were to set k to 4, an additional cluster would be derived from the dataset to produce four clusters
  • 50. How does k-means clustering separate the data points? • The first step is to examine the un-clustered data and manually select a centroid for each cluster. That centroid then forms the epicenter of an individual cluster. • Centroids can be chosen at random, which means you can nominate any data point on the scatterplot to act as a centroid. However, you can save time by selecting centroids dispersed across the scatterplot and not directly adjacent to each other. In other words, start by guessing where you think the centroids for each cluster might be located. The remaining data points on the scatterplot are then assigned to the nearest centroid by measuring the Euclidean distance.
  • 51. How does k-means clustering separate the data points? Each data point can be assigned to only one cluster, and each cluster is discrete. This means that there’s no overlap between clusters and no case of nesting a cluster inside another cluster. Also, all data points, including anomalies, are assigned to a centroid irrespective of how they impact the final shape of the cluster. However, due to the statistical force that pulls all nearby data points to a central point, clusters will typically form an elliptical or spherical shape.
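The assignment step described above, followed by the standard k-means step of moving each centroid to the mean of its assigned points, can be sketched in plain Python. The function name, the iteration cap, the starting centroids, and the sample points are hypothetical; starting centroids are passed in explicitly to mirror the "guess where the centroids might be" advice.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means on tuples of floats; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins exactly one cluster, the nearest.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[k]          # keep an empty cluster's centroid
            for k, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:            # assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visually separate groups, k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (9.0, 8.5)])
```

Because every point is assigned to some centroid, an outlier added to `points` would still land in one of the two clusters and pull that centroid toward it.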
  • 52. Decision Tree Learning Decision tree classifiers are attractive models if we care about interpretability. As the name "decision tree" suggests, we can think of this model as breaking down our data by making a decision based on asking a series of questions. Let's consider the following example in which we use a decision tree to decide upon an activity on a particular day:
  • 53. Decision Tree Learning Based on the features in our training dataset, the decision tree model learns a series of questions to infer the class labels of the examples. Although the preceding figure illustrates the concept of a decision tree based on categorical variables, the same concept applies if our features are real numbers, like in the Iris dataset. For example, we could simply define a cut-off value along the sepal width feature axis and ask a binary question: "Is the sepal width ≥ 2.8 cm?". Using the decision algorithm, we start at the tree root and split the data on the feature that results in the largest information gain (IG), which will be explained in more detail in the following section. In an iterative process, we can then repeat this splitting procedure at each child node until the leaves are pure. This means that the training examples at each node all belong to the same class. In practice, this can result in a very deep tree with many nodes, which can easily lead to overfitting. Thus, we typically want to prune the tree by setting a limit for the maximal depth of the tree.
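To make the idea of choosing a split by information gain concrete, here is a small sketch that scores a cut-off on a single real-valued feature. It assumes entropy as the impurity measure (the slides do not say which measure is used), and the function names and the sepal-width values are hypothetical, not taken from the Iris dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Gain from the binary question 'is the feature value >= threshold?'."""
    yes = [lab for v, lab in zip(values, labels) if v >= threshold]
    no = [lab for v, lab in zip(values, labels) if v < threshold]
    n = len(labels)
    # Weighted impurity of the two child nodes (empty children contribute 0).
    children = sum(len(part) / n * entropy(part) for part in (yes, no) if part)
    return entropy(labels) - children


# Hypothetical sepal widths (cm) and class labels.
widths = [3.5, 3.0, 3.2, 2.3, 2.5, 2.7]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]
gain_at_2_8 = information_gain(widths, species, 2.8)
gain_at_3_1 = information_gain(widths, species, 3.1)
```

On this toy data the 2.8 cm cut-off produces two pure children, so it receives the higher gain and would be the split chosen at the root.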
  • 54. Decision Tree In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions. (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
  • 55. Decision Tree Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute. An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example. This process is then repeated for the subtree rooted at the new node. The decision tree classifies Saturday mornings according to whether they are suitable for the work to be done, e.g., the instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong).
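Sorting an instance down the tree can be sketched with a nested structure. The tree below is an assumption reconstructed from the disjunction of conjunctions given earlier (Sunny and Normal humidity, Overcast, or Rain and Weak wind are positive); the negative leaves for High humidity and Strong wind, the tuple/dict encoding, and the function name are hypothetical.

```python
# Each internal node is (attribute, {value: subtree}); each leaf is a bool.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": False, "Normal": True}),
    "Overcast": True,
    "Rain": ("Wind", {"Strong": False, "Weak": True}),
})


def classify(node, instance):
    """Follow the branch matching the instance's value until a leaf is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node


saturday = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
overcast_day = {"Outlook": "Overcast", "Temperature": "Mild",
                "Humidity": "High", "Wind": "Strong"}
result_saturday = classify(TREE, saturday)
result_overcast = classify(TREE, overcast_day)
```

Note that Temperature never appears in the tree, so it has no influence on the result; only the attributes tested along the chosen path matter.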
  • 56. APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING • Instances are represented by attribute-value pairs: Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically). • The target function has discrete output values: The decision tree assigns a Boolean classification (e.g., yes or no) to each example. Decision tree methods easily extend to learning functions with more than two possible output values. A more substantial extension allows learning target functions with real-valued outputs, though the application of decision trees in this setting is less common. • The training data may contain errors. • The training data may contain missing attribute values.
  • 57. APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING Decision tree learning has therefore been applied to problems such as learning to classify medical patients by their disease, equipment malfunctions by their cause, and loan applicants by their likelihood of defaulting on payments. Such problems, in which the task is to classify examples into one of a discrete set of possible categories, are often referred to as classification problems.
  • 58. What is Inductive Learning? From the perspective of inductive learning, we are given input samples (x) and output samples (f(x)) and the problem is to estimate the function (f). Specifically, the problem is to generalize from the samples so that the learned mapping is useful for estimating the output for new samples in the future. In practice it is almost always too hard to estimate the function exactly, so we look for very good approximations of the function, e.g., • Credit risk assessment. • The x are the properties of the customer. • The f(x) is credit approved or not. • Disease diagnosis. • The x are the properties of the patient. • The f(x) is the disease they suffer from. • Face recognition. • The x are bitmaps of people's faces. • The f(x) is to assign a name to the face. • Automatic steering. • The x are bitmap images from a camera in front of the car. • The f(x) is the degree the steering wheel should be turned.
  • 59. When Should You Use Inductive Learning? There are problems where inductive learning is not a good idea. It is important to know when to use and when not to use supervised machine learning. Four kinds of problems where inductive learning might be a good idea: • Problems where there is no human expert. If people do not know the answer, they cannot write a program to solve it. These are areas of true discovery. • Humans can perform the task but no one can describe how to do it. There are problems where humans can do things that computers cannot do, or cannot do well. Examples include riding a bike or driving a car. • Problems where the desired function changes frequently. Humans could describe it and they could write a program to do it, but the problem changes too often, so it is not cost effective. Examples include the stock market. • Problems where each user needs a custom function. It is not cost effective to write a custom program for each user. An example is recommendations of movies or books on Netflix or Amazon.
  • 60. Two perspectives on inductive learning: • Learning is the removal of uncertainty. Having data removes some uncertainty. By selecting a class of hypotheses, we remove more uncertainty. • Learning is guessing a good and small hypothesis class. It requires guessing. We don’t know the solution, so we must use a trial and error process. If you knew the domain with certainty, you would not need learning. But we are not guessing in the dark.
  • 61. A Framework For Studying Inductive Learning • Training example: a sample from x including its output from the target function • Target function: the mapping function f from x to f(x) • Hypothesis: approximation of f, a candidate function. • Concept: A Boolean target function, positive examples and negative examples for the 1/0 class values. • Classifier: Learning program outputs a classifier that can be used to classify. • Learner: Process that creates the classifier. • Hypothesis space: set of possible approximations of f that the algorithm can create. • Version space: subset of the hypothesis space that is consistent with the observed data
  • 63. Linear Regression Regression models are used to predict target variables on a continuous scale, which makes them attractive for addressing many questions in science. They also have applications in industry, such as understanding relationships between variables, evaluating trends, or making forecasts. One example is predicting the sales of a company in future months.
  • 64. Introducing linear regression The goal of linear regression is to model the relationship between one or multiple features and a continuous target variable. In contrast to classification (a different subcategory of supervised learning), regression analysis aims to predict outputs on a continuous scale rather than categorical class labels. Simple linear regression • The goal of simple (univariate) linear regression is to model the relationship between a single feature (explanatory variable, $x$) and a continuous-valued target (response variable, $y$). The equation of a linear model with one explanatory variable is defined as follows: $y = w_0 + w_1x$ • Here $w_0$ represents the y axis intercept and $w_1$ is the weight coefficient of the explanatory variable. Our goal is to learn the weights of the linear equation to describe the relationship between the explanatory variable and the target variable, which can then be used to predict the responses of new explanatory variables that were not part of the training dataset.
  • 66. Linear Regression The values $w_0$ and $w_1$ must be chosen so that they minimize the error. If the sum of squared errors is taken as the metric to evaluate the model, then the goal is to obtain the line that minimizes it. If we did not square the errors, positive and negative errors would cancel each other out. Intercept calculation: $w_0 = \bar{y} - w_1\bar{x}$. Coefficient formula: $w_1 = \dfrac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}$.
  • 67. • Exploring ‘w1’ • If w1 > 0, then x (predictor) and y (target) have a positive relationship. That is, an increase in x will increase y. • If w1 < 0, then x (predictor) and y (target) have a negative relationship. That is, an increase in x will decrease y.
  • 68. Exploring w0 • If the data do not include x = 0, then the prediction at x = 0, which consists of w0 alone, is meaningless. For example, suppose we have a dataset that relates height (x) and weight (y). Taking x = 0 (that is, a height of 0) leaves only the w0 term in the equation, which is meaningless because in reality height and weight can never be zero. This results from applying the model to values beyond its scope. • If the data do include x = 0, then w0 is the average predicted value when x = 0. However, setting all the predictor variables to zero is often impossible. • The w0 term guarantees that the residuals have mean zero. Without a w0 term, the regression line is forced to pass through the origin, and both the regression coefficient and the predictions will be biased.
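The intercept and coefficient can be computed directly with the usual least-squares closed form for one explanatory variable, w1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and w0 = ȳ − w1·x̄. The sketch below is illustrative: the function name and the four data points are hypothetical. It also checks the claim that including w0 makes the residuals average to zero.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w1 = s_xy / s_xx                 # slope: sign gives direction of the relationship
    w0 = y_mean - w1 * x_mean        # intercept: line passes through (x_mean, y_mean)
    return w0, w1


# Hypothetical data with a positive relationship between x and y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 8.0, 9.0]
w0, w1 = fit_simple_linear_regression(xs, ys)
residuals = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
```

Here `w1` comes out positive, matching the upward trend in the data, and `mean_residual` is zero up to floating-point rounding.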
  • 69. Housing Prices [Figure: scatter plot of price (in 1000s of dollars) against size (feet²)] Notation: m = number of training examples; x’s = “input” variable / features; y’s = “output” variable / “target” variable
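  • 71. Training Set (m = 47), given as Size in feet² (x) / Price ($) in 1000's (y): 2104 / 460; 1416 / 232; 1534 / 315; 852 / 178; … Hypothesis: $h_\theta(x) = \theta_0 + \theta_1x$. The $\theta_i$'s are the parameters (they play the role of $w_0$ and $w_1$ above). How to choose the $\theta_i$'s?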
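  • 73. Idea: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$. [Figure: training examples plotted with x on the horizontal axis and y on the vertical axis]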
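  • 74. Linear regression with one variable Hypothesis: $h_\theta(x) = \theta_0 + \theta_1x$. Parameters: $\theta_0, \theta_1$. Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$. Simplified: $h_\theta(x) = \theta_1x$ with the single parameter $\theta_1$, cost $J(\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, and goal $\min_{\theta_1} J(\theta_1)$.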
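A short sketch of the cost function may help here. It assumes the squared-error cost J(θ0, θ1) = (1/2m) Σ (hθ(x) − y)² with hypothesis hθ(x) = θ0 + θ1·x; the function name and the three training points are hypothetical, chosen so the costs are easy to verify by hand.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Simplified setting: fix theta0 = 0 and look at J as a function of theta1 only.
j_perfect = cost(0.0, 1.0, xs, ys)   # the line passes through every point
j_half = cost(0.0, 0.5, xs, ys)      # (0.25 + 1.0 + 2.25) / 6
j_flat = cost(0.0, 0.0, xs, ys)      # (1 + 4 + 9) / 6
```

Evaluating `cost` over a range of `theta1` values traces out a bowl-shaped curve whose minimum, for this data, sits at `theta1 = 1`.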
  • 75. [Figure: left panel, $h_\theta(x)$ plotted against x (for fixed $\theta_1$, this is a function of x); right panel, $J(\theta_1)$ plotted against $\theta_1$ (function of the parameter $\theta_1$)]
  • 76. Classification Email: Spam / Not Spam? Online Transactions: Fraudulent (Yes / No)? Tumor: Malignant / Benign ? 0: “Negative Class” (e.g., benign tumor) 1: “Positive Class” (e.g., malignant tumor)
  • 77. Threshold classifier output $h_\theta(x)$ at 0.5: If $h_\theta(x) \ge 0.5$, predict “y = 1”. If $h_\theta(x) < 0.5$, predict “y = 0”. [Figure: Malignant? (1 = Yes, 0 = No) plotted against tumor size]
  • 78. Classification: y = 0 or 1. With linear regression, $h_\theta(x)$ can be > 1 or < 0. Logistic Regression: $0 \le h_\theta(x) \le 1$
  • 79. Logistic Regression As demonstrated, linear regression is a useful technique to quantify relationships between continuous variables. Predicting discrete variables also plays a major part in data analysis and machine learning. For instance, is something “A” or “B”? Is it “positive” or “negative”? Is this person a “new customer” or a “returning customer”? Unlike in linear regression, the dependent variable (y) is no longer a continuous variable (such as price) but rather a discrete categorical variable. The independent variables used as input to predict the dependent variable can be either categorical or continuous.
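  • 80. Logistic Regression Model Want $0 \le h_\theta(x) \le 1$. $h_\theta(x) = g(\theta^Tx)$, where $g(z) = \dfrac{1}{1 + e^{-z}}$ is called the sigmoid function or logistic function. [Figure: sigmoid curve rising from 0 to 1 and passing through 0.5]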
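As a rough illustration of the logistic regression model, the sketch below assumes the standard sigmoid g(z) = 1 / (1 + e^(−z)) applied to a linear combination of the inputs, with a 0.5 threshold for the class decision. The function names and the parameter values for a one-feature tumor-size example are hypothetical, not fitted to any data.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(theta, features):
    """h(x) = g(theta0 + theta1 * x1 + ...), read as the estimated P(y = 1)."""
    z = theta[0] + sum(t * x for t, x in zip(theta[1:], features))
    return sigmoid(z)


def predict_class(theta, features, threshold=0.5):
    """Predict y = 1 when h(x) >= threshold, otherwise y = 0."""
    return 1 if predict_probability(theta, features) >= threshold else 0


# Hypothetical parameters: decision boundary at tumor size 4.0.
theta = [-4.0, 1.0]
p_large = predict_probability(theta, [6.0])   # z = 2.0
p_small = predict_probability(theta, [2.0])   # z = -2.0
```

Unlike a straight line, the output stays strictly between 0 and 1 for any tumor size in a practical range, so it can be thresholded at 0.5 without the problems noted for linear regression. For very large negative `z`, `math.exp(-z)` can overflow, so a production version would need a numerically safer formulation.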
  • 81. Figure : A sigmoid function used to classify data points
  • 82. Example: Linear regression (housing prices) Overfitting: If we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$), but fail to generalize to new examples (predict prices on new examples). [Figure: three plots of price against size]
  • 83. Example: Logistic regression ($g$ = sigmoid function) [Figure: three plots with axes x1 and x2]

Editor's Notes

  1. Reinforcement learning Reinforcement learning occurs when you present the algorithm with examples that lack labels, as in unsupervised learning. However, you can accompany an example with positive or negative feedback according to the solution the algorithm proposes. Reinforcement learning is connected to applications for which the algorithm must make decisions (so the product is prescriptive, not just descriptive, as in unsupervised learning), and the decisions bear consequences. In the human world, it is just like learning by trial and error. Errors help you learn because they have a penalty added (cost, loss of time, regret, pain, and so on), teaching you that a certain course of action is less likely to succeed than others. An interesting example of reinforcement learning occurs when computers learn to play video games by themselves. In this case, an application presents the algorithm with examples of specific situations, such as having the gamer stuck in a maze while avoiding an enemy. The application lets the algorithm know the outcome of actions it takes, and learning occurs while trying to avoid what it discovers to be dangerous and to pursue survival. You can have a look at how the company Google DeepMind has created a reinforcement learning program that plays old Atari's videogames at https://www.youtube.com/watch?v=V1eYniJ0Rnk. When watching the video, notice how the program is initially clumsy and unskilled but steadily improves with training until it becomes a champion.
  2. Code Available: https://github.com/SSaishruthi/Linear_Regression_Detailed_Implementation