Machine Learning Basics
Suresh Arora
suresh.arora@opendatalabs.in
Course contents
 Incomplete History of Machine Learning
 What is Machine Learning
 When do we need Machine Learning
 Machine Learning and AI
 Machine Learning and Statistics
 Machine Learning and Data Mining
 Types of Learning
 Supervised Learning – Classification
 Supervised Learning - Regression
 Unsupervised Learning – Clustering
 Unsupervised Learning – Dimensionality Reduction
 Common Machine Learning Algorithms
Course contents … cont
 Programming Languages for Machine Learning
 Python vs R for Machine Learning
 Installing Python packages
 Simple Linear Regression
 Logistic Regression
 K-means Clustering
Incomplete History of Machine Learning
Arthur Lee Samuel coined the term “Machine Learning” in 1959. While at IBM,
he developed a program that learned to play checkers better than he could.
The Samuel Checkers-playing Program was among the world's first
successful self-learning programs.
Tom Mitchell, another well regarded machine learning researcher, proposed a
precise definition of Machine Learning in 1998:
“A computer program is said to learn from experience E with respect to some
task T and some performance measure P, if its performance on T, as
measured by P, improves with experience E.”
What is Machine Learning
 Learning is the process of converting experience into expertise
or knowledge
 Input to a learning algorithm is training data, representing
experience
 Output is some expertise, which usually takes the form of a computer
program that can perform some task
 Machine Learning is a way for computers to learn things without
being specifically programmed
 Machine Learning shares common threads with Statistics, Game
theory, Information theory, optimization etc.
 A key feature of machine learning, which distinguishes it from
other algorithmic tasks, is that the goal here is generalization: to use
one set of data in order to perform well on new data we have not
seen yet
When do we need Machine Learning
 Two aspects of a problem call for the use of ML
 Problem complexity
 Need for adaptivity
 Problem complexity : Tasks where our knowledge is too incomplete to
write a well-defined program, e.g., facial recognition, speech
recognition
 Adaptivity : Sometimes we need programs whose behavior
adapts to their input data e.g., decoding handwritten text
 Tasks with very big datasets often use machine learning e.g.,
Recommendation Systems, Information retrieval (Find images
with similar content)
Machine Learning & AI
Source : https://www.kdnuggets.com/2017/07/rapidminer-ai-machine-
learning-deep-learning.html
Machine Learning & AI
 AI - Use of computers to mimic the cognitive functions of
humans. When machines carry out tasks based on algorithms in
an “intelligent” manner.
 AI also includes Natural language understanding, language
synthesis, computer vision, robotics, sensor analysis,
optimization & simulation, and more
 ML – Subset of AI and focuses on the ability of machines to
receive a set of data and learn for themselves, changing
algorithms as they learn more about the information they are
processing
 ML also includes Deep Learning, support vector machines,
decision trees, Bayes learning, k-means clustering, association
rule learning, regression, and many more
Machine Learning & Statistics
 They’re related, sure. But their parents are different
 A lot of ML is rediscovery of things statisticians already knew, but the
emphasis is very different:
 Statistics is often interested in asymptotic behavior (like the
convergence of sample-based statistical estimates as the sample
sizes grow to infinity)
 ML focuses on finite sample bounds. Namely, given the size of
available samples, ML theory aims to figure out the degree of
accuracy that a learner can expect on the basis of such samples.
 The goal in ML is for an algorithm to produce accurate
results on a specific task
 ML uses statistical theory to build models – core task is inference from
a sample
 Machine learning is about the execution of learning by computers;
hence algorithmic issues are pivotal. Algorithms are developed to
perform the learning tasks and ML is concerned with their
computational efficiency
Machine Learning & Data Mining
 Machine learning and data mining use the same key algorithms to
discover patterns in the data. However, their processes, and
consequently their utility, differ
 Unlike data mining, in machine learning, the machine must
automatically learn the parameters of models from the data
 Data mining typically uses batched information to reveal a new insight
at a particular point in time rather than an on-going basis
 ML can be used to continuously monitor the performance of equipment
and events and automatically determine what the norm is and when
failures are likely to occur
Machine Learning & Data Mining
Source : https://guavus.com/artificial-intelligence-vs-machine-learning-vs-
data-mining-101-whats-big-difference/
Relationship between ML and other fields
Source: https://blogs.sas.com/content/subconsciousmusings/2014/08/22/looking-
backwards-looking-forwards-sas-data-mining-and-machine-learning/
Types of Learning - Supervised
 Supervised Learning - Given examples of inputs and corresponding
desired outputs, predict outputs on future inputs, e.g., classification,
regression, time series prediction. The main goal in supervised
learning is to learn a model from labeled training data that allows us to
make predictions about unseen or future data
Source : Python Machine Learning by Sebastian Raschka
Types of Learning - Unsupervised
 Unsupervised Learning - Create a new representation of the input,
e.g., form clusters; extract features; compression; detect outliers
 In unsupervised learning, we deal with unlabeled data or data of
unknown structure. Here the goal is to explore the structure of the data
to extract meaningful information without the guidance of a known
outcome variable. Clustering a data set into subsets of similar objects
is a typical example of such a task
 In unsupervised learning, however, there is no distinction between
training and test data.
Source : Understanding Machine Learning From Theory to Algorithms by
Shai Shalev-Shwartz & Shai Ben-David
Types of Learning - Reinforcement
 Reinforcement Learning - is learning from rewards, by trial and error,
during normal interaction with the world
 Goal is to develop a system (agent) that improves its performance
based on interactions with the environment
Supervised Learning - Classification
 Classification
 subcategory of supervised learning where the goal is to predict the
categorical class labels (discrete, unordered values) of new
instances based on past observations
 Outputs are categorical (1-of-N)
 Inputs are anything
 Goal: select correct class for new inputs
 Ex: speech, character recognition, object recognition, medical
diagnosis
Supervised Learning - Regression
 Regression
 subcategory of supervised learning where the goal is to predict the
value of one or more continuous target variables t given the value
of a D-dimensional vector x of input variables
 Outputs are continuous
 Inputs are anything (typically continuous)
 Goal: predict outputs accurately for new inputs
 Examples: predicting market prices, customer rating of movie
Unsupervised Learning - Clustering
 Clustering
 is an exploratory data analysis technique that allows us to organize
a pile of information into meaningful subgroups (clusters) without
having any prior knowledge of their group memberships
 also sometimes called "unsupervised classification"
 Clustering is often used in marketing in order to group users
according to multiple characteristics/features, such as location,
purchasing behavior, age, and gender
 Most important part of formulating the clustering problem is
selecting the variables/features on which the clustering is based
Unsupervised Learning – Dimensionality
Reduction
 Dimensionality Reduction for Data Compression
 is a commonly used approach in feature preprocessing to remove
noise from data (which can degrade the predictive performance of
certain algorithms) and to compress the data onto a smaller-
dimensional subspace while retaining most of the relevant
information
Common Machine Learning Algorithms
 Top ML Algorithms
 Linear Regression
 Logistic Regression
 Linear Discriminant Analysis
 Decision Trees
 Naive Bayes
 K-Nearest Neighbors
 Support Vector Machines
 Random Forest
 Gradient Boosting algorithms
Programming Languages for ML
 Top 5 Languages which are used for ML tasks
 Python
 R
 C/C++
 Java
 JavaScript
 Other languages used in machine learning include Julia, Scala,
Ruby, Octave, MATLAB, and SAS
 There is no such thing as a ‘best language for machine learning’ and it
all depends on what you want to build, where you’re coming from and
why you got involved in machine learning
Source : https://towardsdatascience.com/what-is-the-best-programming-
language-for-machine-learning-a745c156d6b7
Python vs R for Machine Learning
Installing Python Packages
After installing Python, we can execute pip from the command
line terminal to install additional Python packages.
> pip install numpy
> pip install scipy
> pip install pandas
> pip install scikit-learn
> pip install matplotlib
Simple Linear Regression
The goal of simple (univariate) linear regression is to model the relationship
between a single feature (explanatory variable x) and a continuous-valued
response (target variable y)
y = w₀ + w₁x
Linear regression can be understood as finding the best-fitting straight
line through the sample points
This best-fitting line is also called the
regression line, and the vertical lines
from the regression line to the sample
points are the so-called offsets or
residuals—the errors of our prediction
Linear regression via scikit-learn
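A minimal sketch of fitting y = w₀ + w₁x with scikit-learn (the toy data here is invented purely for illustration):

```python
# Fit a simple linear regression with scikit-learn.
# Assumes numpy and scikit-learn are installed (see the pip slide).
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2 + 3x, with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 + 3.0 * X.ravel()

model = LinearRegression()
model.fit(X, y)  # learns w0 (intercept_) and w1 (coef_)

print(model.intercept_)           # ~2.0  (w0)
print(model.coef_[0])             # ~3.0  (w1)
print(model.predict([[5.0]])[0])  # ~17.0 (prediction for a new input)
```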
Ordinary Least Squares method
Let the curve y = a + bx + cx² + dx³ + … + kxᵐ be fitted to the set of
data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
We have to determine the constants a, b, c, …, k such that the curve fits
best. When n > m, we apply the principle of least squares to solve the n
equations formed by substituting the values of (xᵢ, yᵢ) into the equation
of the curve.
At x = xᵢ, the observed value is yᵢ and the expected (calculated) value is
Φᵢ = a + bxᵢ + cxᵢ² + … + kxᵢᵐ
The error (or residual) at x = xᵢ is given by eᵢ = yᵢ − Φᵢ
Since some of the error terms will be positive and others negative, we square
each term to give equal weight to each error
The total error is given by E = e₁² + e₂² + e₃² + … + eₙ²
The curve of best fit is the one for which the e's are as small as possible,
i.e., the sum of the squares of the errors is a minimum.
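The least-squares fit above can be sketched in a few lines of NumPy; the data is a noise-free quadratic, invented for illustration:

```python
# Ordinary least squares polynomial fit with NumPy.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2   # exact quadratic: a=1, b=2, c=0.5

# np.polyfit minimises the sum of squared residuals E = sum(e_i^2);
# it returns coefficients highest power first: [c, b, a]
coeffs = np.polyfit(x, y, deg=2)

residuals = y - np.polyval(coeffs, x)   # e_i = y_i - phi_i
total_error = np.sum(residuals**2)      # E

print(coeffs)       # ~[0.5, 2.0, 1.0]
print(total_error)  # ~0 for noise-free data
```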
Logistic Regression
 Logistic Regression is a method for Binary classification problems
 One of the most widely used algorithms for classification in industry
 Technique named after function used at the core of the method, the
logistic (or Sigmoid) function
 Logistic regression uses an equation as its representation, very much
like linear regression, but the key difference is that the output value
being modeled is binary (0 or 1) rather than numeric
Key concepts for Logistic Regression
Odds ratio : the odds in favor of a particular event (the event that we
want to predict). It is given by
p / (1 − p)
where p stands for the probability of the positive event, for example, the
probability that a patient has a certain disease; we can think of the
positive event as class label y = 1
Logit function : the logarithm of the odds ratio
logit(p) = log { p / (1 − p) }
The logit function takes input values in the range 0 to 1 and transforms them
to values over the entire real number range, which we can use to
express a linear relationship between feature values and the log-odds
logit{ p(y=1|x) } = w₀x₀ + w₁x₁ + w₂x₂ + … + wₘxₘ = ∑ᵢ₌₀ᵐ wᵢxᵢ = wᵀx
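A quick numeric check of the logit using Python's math module (the probabilities here are illustrative):

```python
# The logit (log-odds) function: maps (0, 1) to the whole real line.
import math

def logit(p):
    """Logarithm of the odds ratio p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

print(logit(0.5))  # 0.0 -- even odds
print(logit(0.9))  # log(9) ~ 2.197 -- odds of 9:1 in favor
```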
Key concepts for Logistic Regression
Logistic or Sigmoid function : an S-shaped curve that can take any real-
valued number and map it into a value between 0 and 1, but never
exactly at those limits
ф(z) = 1 / (1 + e⁻ᶻ)
Here, z is the net input, given by z = wᵀx = w₀ + w₁x₁ + … + wₘxₘ
Approach for Logistic Regression
The output of the sigmoid function is then interpreted as the probability of
a particular sample belonging to class 1, ф(z) = P(y = 1 | x; w), given
its features x parameterized by the weights w
The predicted probability can then simply be converted into a binary
outcome via a quantizer (unit step function):
y = 1 if ф(z) ≥ 0.5
  = 0 otherwise
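The sigmoid and the unit-step quantizer above can be sketched as follows (a minimal illustration, not a trained classifier):

```python
# Sigmoid activation plus unit-step quantizer for binary prediction.
import math

def sigmoid(z):
    """Map any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z):
    """Class 1 if phi(z) >= 0.5, which is equivalent to z >= 0."""
    return 1 if sigmoid(z) >= 0.5 else 0

print(sigmoid(0.0))   # 0.5 -- the decision boundary
print(predict(2.0))   # 1
print(predict(-1.0))  # 0
```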
K-means Clustering
Procedure for K-means Clustering:
1. Randomly pick k centroids from the sample points as initial cluster
centers
2. Assign each sample to the nearest centroid μ⁽ʲ⁾, j ∈ {1, …, k}
3. Move each centroid to the center of the samples that were assigned to
it
4. Repeat steps 2 and 3 until the cluster assignments do not change, or a
user-defined tolerance or a maximum number of iterations is
reached
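The four steps above can be sketched in plain NumPy (a toy implementation for illustration, not the scikit-learn version; the data and seed are invented):

```python
# Toy k-means implementing the four steps above.
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k sample points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each sample to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned samples
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 4: stop once the centroids stop moving (within tolerance)
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centroids, labels = kmeans(X, k=2)
print(labels)  # first three points share one label, last three the other
```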