2. Definition
• The art of making machines intelligent without explicit programming
Machine Learning is a field consisting of learning algorithms or techniques that
• Execute some task T (e.g. regression or classification)
• Improve their performance P (model performance)
• With experience E (data)
These are algorithms that learn from observational data and make predictions based on it.
3. Three Stages of the ML Process
• Representation (Selection of an algorithm/parameters)
• Evaluation (Objective function)
• Optimization (finding the optimal parameters)
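As a rough illustration of the three stages, the sketch below uses a hypothetical one-parameter model y = m * x (representation), mean squared error (evaluation), and a brute-force search over candidate values of m (optimization). The data and all names are made up for illustration; real projects usually use richer models and smarter optimizers.

```python
# Toy data, made up for illustration: y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


# Representation: a one-parameter model, y = m * x.
def predict(m, x):
    return m * x


# Evaluation: mean squared error as the objective function.
def mse(m):
    return sum((predict(m, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: try candidate slopes 0.0, 0.1, ..., 4.0 and keep the best one.
candidates = [i / 10 for i in range(41)]
best_m = min(candidates, key=mse)
print(best_m, mse(best_m))
```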
4. Types of Machine Learning
Supervised Machine Learning
• Training is done using labelled data
• The algorithm learns the mapping function from the input to the output: Y = f(X)
• Examples:
• Regression – used to predict continuous values
• Classification – used to predict categorical values
• In supervised learning, the algorithm learns from “correct” answers.
• The model is created and then used to predict answers for future data.
• Example: Train a model to predict car prices from car attributes using historical sales data. The model can then predict a suitable price for a new car that has not been sold before.
5. Types of Machine Learning
Unsupervised Machine Learning
• Training is done using unlabeled data
• Algorithms are left to their own devices to discover and present the
interesting structure in the data.
• Clustering – used to discover the inherent groupings in the data
• Association – used to discover rules that describe large portions of the
data
• Example: group (cluster) objects into two sets based on the characteristics of those objects.
6. Evaluating Supervised Learning
• If you have a set of training data that includes the value you are trying to predict, you don’t have to guess whether the resulting model is good or bad.
• If you have enough training data, you can split it into two parts: a training set and a test set.
• Train the model using the training set, then measure its accuracy by asking it to predict values for the test set and comparing those predictions to the known, true values.
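The split-and-measure procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (train_test_split, accuracy), the 25% test fraction, the toy data, and the threshold "model" are all hypothetical stand-ins, not part of the original slides.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_rows):
    """Fraction of test rows where the prediction equals the known label."""
    correct = sum(1 for x, label in test_rows if model(x) == label)
    return correct / len(test_rows)


# Made-up labelled data: the label is 1 when the input is at least 10.
data = [(x, int(x >= 10)) for x in range(20)]
train, test = train_test_split(data)

# Stand-in "training": put a threshold halfway between the two class means,
# using only the training set.
zeros = [x for x, label in train if label == 0]
ones = [x for x, label in train if label == 1]
threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def model(x):
    return int(x >= threshold)


# Evaluate only on rows the model has not seen during training.
print(accuracy(model, test))
```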
7. Supervised ML - Regression
Regression is a technique that determines the relationship between one or more independent variables and a dependent variable.
Other names for these variables (dependent : independent)
• Target : Input
• Criterion : Predictor
8. Linear Regression
• Simple Linear Regression : Only one independent variable
• Multiple Linear Regression : Two or more independent variables
Non-Linear Regression
• The dependent variable is modeled as a non-linear function of the independent variables
9. Simple Linear Regression
In Simple Linear Regression, we fit the best line between the dependent variable y and the independent variable x, given as y = mx + c
m = Coefficient (slope) of x, i.e. change in y divided by change in x
c = Intercept (the value of y when x = 0)
• Fit a line to a dataset of observations
• Use this line to predict unobserved values
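A minimal sketch of fitting y = mx + c, assuming "best line" means the ordinary least-squares line (the slides do not say which criterion is used). The function name fit_line and the toy observations are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Toy observations that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)

# Use the fitted line to predict an unobserved value.
prediction = m * 4.0 + c
print(m, c, prediction)
```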
10. Multiple Linear Regression
Real-life scenarios usually involve more than one independent variable. Multiple linear regression models the relationship between the dependent variable and multiple independent variables.
The representation equation can be extended to
Y = m0 + m1x1 + m2x2 + m3x3 + … + mnxn
where
m0 is the intercept
m1 is the coefficient of variable x1
m2 is the coefficient of variable x2
and so on.
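One possible way to estimate m0 … mn is to solve the least-squares normal equations. The sketch below does this with a small hand-written Gaussian elimination so that it needs no external libraries. The function names (solve, fit_multiple) and the data are hypothetical; in practice a numerical library would normally do this step.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_multiple(rows, ys):
    """Least-squares fit; returns [m0, m1, ..., mn] with m0 the intercept."""
    design = [[1.0] + list(row) for row in rows]  # the leading 1 multiplies m0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data generated from Y = 1 + 2*x1 + 3*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
coefficients = fit_multiple(rows, ys)
print(coefficients)
```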
11. Supervised ML - Classification
Logistic Regression is a technique that determines the relationship between a dependent variable and one or more independent variables, where the dependent variable is a dichotomous (two-class) categorical variable.
To maximize or minimize a function, we differentiate it and find the point where the gradient is zero.
Since this function is non-linear and the zero-gradient point cannot be solved for directly, we use gradient descent: calculate the gradient of the function at each step and update the parameter values accordingly.
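A minimal sketch of gradient descent for logistic regression with a single input variable. It assumes the function being minimized is the average log loss (the slides do not name the function); the learning rate, step count, names, and data are illustrative choices only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Update the parameters by stepping against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up two-class data; the classes overlap slightly around x = 2 to 3.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, labels)


def predict_proba(x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)


print(predict_proba(0.5), predict_proba(4.5))
```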
12. Unsupervised ML - Clustering
The key objective in clustering is to identify distinct groups (clusters) based on similarities within a given dataset.
• Agglomerative (Hierarchical)
• Partitional (K-Means)
K-Means Clustering
• Start by randomly initializing the required number of centers. (The “K” in K-means stands for the number of clusters.)
• Assign each data point to the center closest to it. (Distance metric = ordinary Euclidean distance)
• Recalculate the centers by averaging the dimensions of the points belonging to each cluster.
• Repeat with the new centers until the assignments become stable.
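The K-Means steps above can be sketched as follows. This is a minimal illustration, not a production implementation: the function names, the fixed seed, the choice to pick initial centers from the data points, and the toy points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    # Squared Euclidean distance; it has the same nearest center as the plain distance.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data points
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to its closest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignments == assignments:  # assignments are stable: stop
            break
        assignments = new_assignments
        # Recalculate each center as the per-dimension mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments


# Two made-up, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, assignments = k_means(points, k=2)
print(sorted(centers), assignments)
```

With clusters this well separated the result does not depend on the seed; on real data, different initializations can give different clusterings.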
Hierarchical Clustering
• Start with “n” clusters (n = number of data points)
• Combine the two closest clusters
• Repeat until only one cluster exists.
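The hierarchical steps above can be sketched as follows. The slides do not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the closest pair of points). The function names and the target_clusters parameter, which lets the merging stop early, are hypothetical additions for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_distance(a, b):
    """Single linkage (an assumption): distance between the closest pair of points."""
    return min(squared_distance(p, q) for p in a for q in b)


def agglomerate(points, target_clusters=1):
    clusters = [[p] for p in points]  # start with n singleton clusters
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest distance between them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Two made-up, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
two_groups = agglomerate(points, target_clusters=2)
one_group = agglomerate(points)
print(two_groups)
```

Stopping at a chosen number of clusters, rather than merging all the way down to one, is a common way to read a grouping out of the merge sequence.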