Machine Learning Basics
Suresh Arora
suresh.arora@opendatalabs.in
Course contents
 Incomplete History of Machine Learning
 What is Machine Learning
 When do we need Machine Learning
 Machine Learning and AI
 Machine Learning and Statistics
 Machine Learning and Data Mining
 Types of Learning
 Supervised Learning – Classification
 Supervised Learning - Regression
 Unsupervised Learning – Clustering
 Unsupervised Learning – Dimensionality Reduction
 Common Machine Learning Algorithms
Course contents … cont
 Programming Languages for Machine Learning
 Python vs R for Machine Learning
 Installing Python packages
 Simple Linear Regression
 Logistic Regression
 K-means Clustering
Incomplete History of Machine Learning
Arthur Lee Samuel coined the term “Machine Learning” in 1959. While at IBM,
he developed a program that learned to play checkers better than he could.
The Samuel Checkers-playing Program was among the world's first
successful self-learning programs.
Tom Mitchell, another well regarded machine learning researcher, proposed a
precise definition of Machine Learning in 1998:
“A computer program is said to learn from experience E with respect to some
task T and some performance measure P, if its performance on T, as
measured by P, improves with experience E.”
What is Machine Learning
 Learning is the process of converting experience into expertise
or knowledge
 Input to a learning algorithm is training data, representing
experience
 Output is some expertise, which usually takes the form of a computer
program that can perform some task
 Machine Learning is a way for computers to learn things without
being specifically programmed
 Machine Learning shares common threads with Statistics, Game
theory, Information theory, optimization etc.
 A key feature of machine learning, which distinguishes it from
other algorithmic tasks, is that the goal here is generalization: to use
one set of data in order to perform well on new data we have not
seen yet
When do we need Machine Learning
 Two aspects of a problem call for the use of ML
 Problem complexity
 Need for adaptivity
 Problem complexity : Tasks where our knowledge is too incomplete to
write a well-defined program, e.g., facial recognition, speech
recognition
 Adaptivity : Sometimes we need programs whose behavior
adapts to their input data e.g., decoding handwritten text
 Tasks with very big datasets often use machine learning e.g.,
Recommendation Systems, Information retrieval (Find images
with similar content)
Machine Learning & AI
Source : https://www.kdnuggets.com/2017/07/rapidminer-ai-machine-
learning-deep-learning.html
Machine Learning & AI
 AI - Use of computers to mimic the cognitive functions of
humans. When machines carry out tasks based on algorithms in
an “intelligent” manner.
 AI also includes Natural language understanding, language
synthesis, computer vision, robotics, sensor analysis,
optimization & simulation, and more
 ML – Subset of AI and focuses on the ability of machines to
receive a set of data and learn for themselves, changing
algorithms as they learn more about the information they are
processing
 ML also includes Deep Learning, support vector machines,
decision trees, Bayes learning, k-means clustering, association
rule learning, regression, and many more
Machine Learning & Statistics
 They’re related, sure. But their parents are different
 A lot of ML is rediscovery of things statisticians already knew, but the
emphasis is very different:
 Statistics is often interested in asymptotic behavior (like the
convergence of sample-based statistical estimates as the sample
sizes grow to infinity)
 ML focuses on finite sample bounds. Namely, given the size of
available samples, ML theory aims to figure out the degree of
accuracy that a learner can expect on the basis of such samples.
 The goal in ML is for an algorithm to produce accurate
results on a specific task
 ML uses statistical theory to build models – core task is inference from
a sample
 Machine learning is about the execution of learning by computers;
hence algorithmic issues are pivotal. Algorithms are developed to
perform the learning tasks and ML is concerned with their
computational efficiency
Machine Learning & Data Mining
 Machine learning and data mining use the same key algorithms to
discover patterns in the data. However, their processes, and
consequently their utility, differ
 Unlike data mining, in machine learning, the machine must
automatically learn the parameters of models from the data
 Data mining typically uses batched information to reveal a new insight
at a particular point in time rather than an on-going basis
 ML can be used to continuously monitor the performance of equipment
and events and automatically determine what the norm is and when
failures are likely to occur
Machine Learning & Data Mining
Source : https://guavus.com/artificial-intelligence-vs-machine-learning-vs-
data-mining-101-whats-big-difference/
Relationship between ML and other fields
Source: https://blogs.sas.com/content/subconsciousmusings/2014/08/22/looking-
backwards-looking-forwards-sas-data-mining-and-machine-learning/
Types of Learning - Supervised
 Supervised Learning - Given examples of inputs and corresponding
desired outputs, predict outputs on future inputs, e.g., classification,
regression, time series prediction. The main goal in supervised
learning is to learn a model from labeled training data that allows us to
make predictions about unseen or future data
Source : Python Machine Learning by Sebastian Raschka
Types of Learning - Unsupervised
 Unsupervised Learning - Create a new representation of the input,
e.g., form clusters; extract features; compression; detect outliers
 In unsupervised learning, we deal with unlabeled data or data of
unknown structure. Here the goal is to explore the structure of the data
to extract meaningful information without the guidance of a known
outcome variable. Clustering a data set into subsets of similar objects
is a typical example of such a task
 In unsupervised learning, however, there is no distinction between
training and test data.
Source : Understanding Machine Learning From Theory to Algorithms by
Shai Shalev-Shwartz & Shai Ben-David
Types of Learning - Reinforcement
 Reinforcement Learning - is learning from rewards, by trial and error,
during normal interaction with the world
 Goal is to develop a system (agent) that improves its performance
based on interactions with the environment
Supervised Learning - Classification
 Classification
 subcategory of supervised learning where the goal is to predict the
categorical class labels (discrete, unordered values) of new
instances based on past observations
 Outputs are categorical (1-of-N)
 Inputs are anything
 Goal: select correct class for new inputs
 Ex: speech, character recognition, object recognition, medical
diagnosis
Supervised Learning - Regression
 Regression
 subcategory of supervised learning where the goal is to predict the
value of one or more continuous target variables t given the value
of a D-dimensional vector x of input variables
 Outputs are continuous
 Inputs are anything (typically continuous)
 Goal: predict outputs accurately for new inputs
 Examples: predicting market prices, customer rating of movie
Unsupervised Learning - Clustering
 Clustering
 is an exploratory data analysis technique that allows us to organize
a pile of information into meaningful subgroups (clusters) without
having any prior knowledge of their group memberships
 also sometimes called "unsupervised classification"
 Clustering is often used in marketing in order to group users
according to multiple characteristics/features, such as location,
purchasing behavior, age, and gender
 Most important part of formulating the clustering problem is
selecting the variables/features on which the clustering is based
Unsupervised Learning – Dimensionality
Reduction
 Dimensionality Reduction for Data Compression
 is a commonly used approach in feature preprocessing to remove
noise from data (which can degrade the predictive performance of
certain algorithms) and to compress the data onto a smaller-
dimensional subspace while retaining most of the relevant
information
Common Machine Learning Algorithms
 Top ML Algorithms
 Linear Regression
 Logistic Regression
 Linear Discriminant Analysis
 Decision Trees
 Naive Bayes
 K-Nearest Neighbors
 Support Vector Machines
 Random Forest
 Gradient Boosting algorithms
Programming Languages for ML
 Top 5 Languages which are used for ML tasks
 Python
 R
 C/C++
 Java
 JavaScript
 Other languages used in machine learning include Julia, Scala,
Ruby, Octave, MATLAB, and SAS
 There is no such thing as a ‘best language for machine learning’ and it
all depends on what you want to build, where you’re coming from and
why you got involved in machine learning
Source : https://towardsdatascience.com/what-is-the-best-programming-
language-for-machine-learning-a745c156d6b7
Python vs R for Machine Learning
Installing Python Packages
After installing Python, we can execute pip from the command
line terminal to install additional Python packages.
> pip install numpy
> pip install scipy
> pip install pandas
> pip install scikit-learn
> pip install matplotlib
Simple Linear Regression
The goal of simple (univariate) linear regression is to model the relationship
between a single feature (explanatory variable x) and a continuous-valued
response (target variable y)
y = w₀ + w₁x
Linear regression can be understood as finding the best-fitting straight
line through the sample points
This best-fitting line is also called the
regression line, and the vertical lines
from the regression line to the sample
points are the so-called offsets or
residuals—the errors of our prediction
Linear regression via scikit-learn
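A minimal sketch of fitting y = w₀ + w₁x with scikit-learn (the toy data here is invented purely for illustration):

```python
# Fit a simple linear regression with scikit-learn.
# Assumes numpy and scikit-learn are installed (see the pip slide).
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2 + 3x, with no noise
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 + 3.0 * X.ravel()

model = LinearRegression()
model.fit(X, y)  # learns w0 (intercept_) and w1 (coef_)

print(model.intercept_)           # ~2.0  (w0)
print(model.coef_[0])             # ~3.0  (w1)
print(model.predict([[5.0]])[0])  # ~17.0 (prediction for a new input)
```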
Ordinary Least Squares method
Let the curve y = a + bx + cx² + dx³ + … + kxᵐ be fitted to the set of
data points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
We have to determine the constants a, b, c, …, k such that the curve fits
best. When n > m, we apply the principle of least squares to solve the n
equations formed by substituting the values of (xᵢ, yᵢ) into the equation
of the curve.
At x = xᵢ, the observed value is yᵢ and the expected (calculated) value is
Φᵢ = a + bxᵢ + cxᵢ² + … + kxᵢᵐ
The error (or residual) at x = xᵢ is given by eᵢ = yᵢ − Φᵢ
Since some of the error terms will be positive and others negative, we square
each term to give equal weight to each error
The total error is given by E = e₁² + e₂² + e₃² + … + eₙ²
The curve of best fit is the one for which the e's are as small as possible,
i.e., the sum of the squares of the errors is a minimum.
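The least-squares fit above can be sketched in a few lines of NumPy; the data is a noise-free quadratic, invented for illustration:

```python
# Ordinary least squares polynomial fit with NumPy.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2   # exact quadratic: a=1, b=2, c=0.5

# np.polyfit minimises the sum of squared residuals E = sum(e_i^2);
# it returns coefficients highest power first: [c, b, a]
coeffs = np.polyfit(x, y, deg=2)

residuals = y - np.polyval(coeffs, x)   # e_i = y_i - phi_i
total_error = np.sum(residuals**2)      # E

print(coeffs)       # ~[0.5, 2.0, 1.0]
print(total_error)  # ~0 for noise-free data
```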
Logistic Regression
 Logistic Regression is a method for Binary classification problems
 One of the most widely used algorithms for classification in industry
 Technique named after function used at the core of the method, the
logistic (or Sigmoid) function
 Logistic regression uses an equation as its representation, very much
like linear regression, but the key difference is that the output value
being modeled is binary (0 or 1) rather than numeric
Key concepts for Logistic Regression
Odds ratio : the odds in favor of a particular event (the event that we
want to predict). It is given by
p / (1 − p)
where p stands for the probability of the positive event, for example, the
probability that a patient has a certain disease; we can think of the
positive event as class label y = 1
Logit function : the logarithm of the odds ratio
logit(p) = log { p / (1 − p) }
The logit function takes input values in the range 0 to 1 and transforms them
to values over the entire real number range, which we can use to
express a linear relationship between feature values and the log-odds
logit{ p(y=1|x) } = w₀x₀ + w₁x₁ + w₂x₂ + … + wₘxₘ = ∑ᵢ₌₀ᵐ wᵢxᵢ = wᵀx
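A quick numeric check of the logit using Python's math module (the probabilities here are illustrative):

```python
# The logit (log-odds) function: maps (0, 1) to the whole real line.
import math

def logit(p):
    """Logarithm of the odds ratio p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

print(logit(0.5))  # 0.0 -- even odds
print(logit(0.9))  # log(9) ~ 2.197 -- odds of 9:1 in favor
```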
Key concepts for Logistic Regression
Logistic or Sigmoid function : an S-shaped curve that can take any real-
valued number and map it into a value between 0 and 1, but never
exactly at those limits
ф(z) = 1 / (1 + e⁻ᶻ)
Here, z is the net input, given by z = wᵀx = w₀ + w₁x₁ + … + wₘxₘ
Approach for Logistic Regression
The output of the sigmoid function is then interpreted as the probability of
a particular sample belonging to class 1, ф(z) = P(y = 1 | x; w), given
its features x parameterized by the weights w
The predicted probability can then simply be converted into a binary
outcome via a quantizer (unit step function):
y = 1 if ф(z) ≥ 0.5
  = 0 otherwise
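The sigmoid and the unit-step quantizer above can be sketched as follows (a minimal illustration, not a trained classifier):

```python
# Sigmoid activation plus unit-step quantizer for binary prediction.
import math

def sigmoid(z):
    """Map any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z):
    """Class 1 if phi(z) >= 0.5, which is equivalent to z >= 0."""
    return 1 if sigmoid(z) >= 0.5 else 0

print(sigmoid(0.0))   # 0.5 -- the decision boundary
print(predict(2.0))   # 1
print(predict(-1.0))  # 0
```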
K-means Clustering
Procedure for K-means Clustering:
1. Randomly pick k centroids from the sample points as initial cluster
centers
2. Assign each sample to the nearest centroid μ⁽ʲ⁾, j ∈ {1, …, k}
3. Move each centroid to the center of the samples that were assigned to
it
4. Repeat steps 2 and 3 until the cluster assignments do not change, or a
user-defined tolerance or a maximum number of iterations is
reached
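The four steps above can be sketched in plain NumPy (a toy implementation for illustration, not the scikit-learn version; the data and seed are invented):

```python
# Toy k-means implementing the four steps above.
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k sample points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each sample to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned samples
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 4: stop once the centroids stop moving (within tolerance)
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centroids, labels = kmeans(X, k=2)
print(labels)  # first three points share one label, last three the other
```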