SACON
SACON Pune 2018
India | Pune | May 18 – 19 | Hotel Hyatt Pune
Learning Machine Learning
Subrat Panda
Capillary Technologies
Principal Architect, AI and Data Sciences
SACON
LEARNING MACHINE LEARNING
Subrat Panda
Principal Architect, AI and Data Sciences,
Capillary Technologies (www.capillarytech.com)
Co-Founder : IDLI (Indian Deep Learning Initiative)
https://www.facebook.com/groups/idliai/
BTech(2002), PhD(2009) IIT KGP.
https://www.linkedin.com/in/subratpanda/
Email : subratpanda@gmail.com
Acknowledgements:
Biswa Gourav Singh
Co-Founder : IDLI (Indian Deep Learning Initiative)
https://www.linkedin.com/in/biswagsingh/
Email: biswagourav.singh@gmail.com
AI Community Across the Globe
SACON
Preface
• Artificial intelligence is already part of our everyday lives.
SACON 2018 - Pune
SACON
Application of AI, Machine Learning and Deep Learning
SACON
Gartner Says By 2020, Artificial Intelligence Will Create More Jobs Than It Eliminates
SACON
What this talk can motivate people to do
 STUDENTS:
 Motivates them to participate in data science competitions
 Further learning, and adding the expertise to the resume
 Final year and fun projects
 PROFESSIONALS:
 Find interesting data in your current project and apply machine learning
 Motivates further learning and a career change; data scientists/machine learning engineers are highly paid professionals
 TEACHERS:
 Motivates teachers to spread knowledge in their university
 Conduct hackathons
SACON 2018 - Pune
SACON
Machine Learning Classical Definition
 Arthur Samuel (1959): "computer's ability to learn without being explicitly programmed."
 Tom M. Mitchell (1998): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
 Optimize a performance criterion using example data or past
experience.
SACON
Types of Machine Learning Algorithms
 Supervised Learning: Input data with labeled responses
 Regression: Given a picture of a person, we have to predict their age on the basis of the given picture
 Classification: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
[Figures: Iris dataset species classification, text classification, image classification; linear vs. non-linear regression]
SACON
Types of Machine Learning Algorithms
 Unsupervised Learning: Input data without labeled responses.
 Clustering: Take a collection of 1,000,000 different genes, and find a way to
automatically group these genes into groups that are somehow similar or
related by different variables, such as lifespan, location, roles, and so on.
 Non Clustering: Exploratory data analysis (PCA, Auto-encoders)
[Figures: customer segmentation; MNIST digit segmentation]
SACON
Data Modeling
SACON
Pop Quiz
 Predicting housing prices based on input parameters like house
size, number of rooms, location of house etc. falls under which
category of machine learning problem:
 A) Regression
 B) Classification
 C) Clustering
 D) None
 Automatically segmenting your customers according to the customer information falls under which category of machine learning?
 A) Regression
 B) Classification
 C) Clustering
 D) None
SACON
SACON Pune 2018
India | Pune | May 18 – 19 | Hotel Hyatt Pune
Supervised Learning
SACON 2018 - Pune
SACON
Linear Regression
• Linear regression is the simplest form of supervised learning.
• In a regression problem the target variable is continuous.
Living Area (Sq. feet) Year Built Price (1000$s)
2104 2012 400
1600 2013 300
2400 2014 369
1416 2013 232
3000 2015 540
. . .
. . .
. . .
Predict Housing Price from Historical data
SACON
Linear Regression
• The goal is to learn a function which assumes linear relationship
between target variable Y with input variable X
SACON
Linear Regression
• In supervised learning, our goal is, given a training set, to learn a function h : X
→ Y so that h(x) is a “good” predictor for the corresponding value of Y.
Living Area (Sq. feet) Year Built Price (1000$s)
2104 2012 400
1600 2013 300
2400 2014 369
1416 2013 232
3000 2015 540
. . .
• Let's consider the housing data above. Each x represents a two-dimensional vector and y represents the price of the house.
SACON
Learning the curve
SACON
Cost Function I
• Let's approximate Y as a linear function of X. The hypothesis function is then given by: hθ(x) = θ0 + θ1x1 + θ2x2
• The θ's are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y.
• How do we pick, or learn, the parameters θ? One reasonable method seems to be to make h(x) close to y, at least for the training examples we have. The cost function is given by: J(θ) = ½ Σi (hθ(x(i)) − y(i))²
• This is the least-squares cost function that gives rise to the ordinary least squares regression model
SACON
Cost Function II
 We want to choose θ so as to minimize J(θ).
 Plotting the cost for different values of θ, we can see the graph has a bowl shape.
 The goal is to "roll down the hill", and find the θ corresponding to the bottom of the bowl.
SACON
Gradient Descent
 We should use a search algorithm that starts with some "initial guess" for θ, and that repeatedly changes θ to make J(θ) smaller, until we converge to a value of θ that minimizes J(θ).
 The algorithm we choose is the Gradient Descent Algorithm, which starts with some initial θ and repeatedly performs the update: θj := θj − α ∂J(θ)/∂θj
 If we calculate the partial derivative, we get the following update for each training example: θj := θj + α (y(i) − hθ(x(i))) xj(i)
α = Learning Rate
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
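To make the update rule concrete, here is a minimal numpy sketch of batch gradient descent on the housing numbers above (feature scaling is added so a single learning rate works; the exact values are illustrative):

import numpy as np

# Design matrix with a leading column of ones for the intercept.
X = np.array([[1, 2104], [1, 1600], [1, 2400], [1, 1416], [1, 3000]], dtype=float)
y = np.array([400, 300, 369, 232, 540], dtype=float)

# Scale the feature so one learning rate suits both parameters.
X[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()

theta = np.zeros(2)   # initial guess for the parameters
alpha = 0.1           # learning rate
m = len(y)

for _ in range(1000):
    predictions = X @ theta                   # h_theta(x) for every example
    gradient = (X.T @ (predictions - y)) / m  # partial derivatives of J(theta)
    theta -= alpha * gradient                 # simultaneous update of all theta_j

print(theta)  # intercept and slope after convergence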
SACON
How the Algorithm Works
SACON 2018 - Pune
[Plots: successive regression fits as (θ0, θ1) moves from (−0.12, 820) to (0.0, 420) to (0.14, 220)]
SACON
Other Optimization Methods:
 There is an alternative to batch gradient descent that also works very well. Consider the following algorithm: loop over the training set, and for each example i update θj := θj + α (y(i) − hθ(x(i))) xj(i)
 Each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only. This algorithm is called Stochastic Gradient Descent (SGD).
 Other examples of optimization algorithms: BFGS, L-BFGS
 Mini-batch gradient descent: performs an update for every mini-batch.
SACON
Normal Equation
 The Normal Equation is a method to solve for θ analytically.
 Our cost function looks like: J(θ) = ½ (Xθ − y)ᵀ(Xθ − y)
 To minimize a quadratic function, the partial derivative of the function should be equated to zero.
SACON
Normal Equation
 Given a training set with m examples and n features, define the design matrix X to be the m-by-n matrix whose rows are the training inputs, and let y be the m-dimensional vector containing all the target values from the training set.
 Thus, the value of θ that minimizes J(θ) is given in closed form by the equation: θ = (XᵀX)⁻¹ Xᵀ y
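A minimal numpy sketch of the closed-form solution on the same housing table (np.linalg.lstsq stands in for the explicit inverse, which is the numerically safer route):

import numpy as np

X = np.array([[1, 2104, 2012],
              [1, 1600, 2013],
              [1, 2400, 2014],
              [1, 1416, 2013],
              [1, 3000, 2015]], dtype=float)  # leading 1s for the intercept
y = np.array([400, 300, 369, 232, 540], dtype=float)

# Solves the least-squares problem, i.e. theta = (X^T X)^{-1} X^T y.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)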
SACON
Pop Quiz
• What is the effect of a high learning rate on the cost function?
SACON
Logistic Regression
SACON
Introduction
 It is an approach to the classification problem.
 The output y is either 1 or 0 instead of a continuous range of values: y ∈ {0,1}
 Binary classification problem (two values)
 Linear regression won't work for the classification problem
[Figure: image classification example]
SACON
Logistic Regression: Hypothesis
 The hypothesis should satisfy 0 ≤ hθ(x) ≤ 1
 The "Sigmoid Function," also called the "Logistic Function": g(z) = 1 / (1 + e^(−z))
 We want to restrict the range to 0 and 1. This is accomplished by plugging θᵀx into the Logistic Function: hθ(x) = g(θᵀx)
SACON
Decision Boundary
In order to get our discrete 0 or 1 classification, we can translate the output of the
hypothesis function as follows:
hθ(x) ≥ 0.5 → y = 1
hθ(x) < 0.5 → y = 0
SACON
Cost Function
 We cannot use the squared-error cost function: with the logistic (sigmoid) hypothesis it becomes non-convex ("wavy"), with many local optima.
SACON
Cost Function
 Logistic regression cost function: Cost(hθ(x), y) = −y log(hθ(x)) − (1 − y) log(1 − hθ(x)), averaged over the training set: J(θ) = (1/m) Σi Cost(hθ(x(i)), y(i))
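A minimal numpy sketch of this cost (the toy feature values are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Cross-entropy cost: -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Toy example: one feature plus an intercept column.
X = np.array([[1, 0.5], [1, 2.3], [1, -1.2], [1, 3.1]])
y = np.array([0, 1, 0, 1])
print(logistic_cost(np.zeros(2), X, y))  # cost at theta = 0 is log(2) ~ 0.693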
SACON
Advanced Optimization
 Gradient Descent
 Conjugate Gradient
 BFGS
 L-BFGS
SACON
SVM: Support Vector Machine
SACON
Overview
 Intro. to Support Vector Machines (SVM)
 Properties of SVM
 Applications
 Discussion
SACON
 A Support Vector Machine (SVM) is a supervised machine
learning algorithm that can be employed for both classification
and regression purposes.
 SVMs are more commonly used in classification problems
Introduction
Plot shows size and weight of several
people, and there is also a way to
distinguish between men and women.
SACON
 We can see that it is possible to separate the data into classes.
 We could trace a line and then all the data points representing men will be
above the line, and all the data points representing women will be below the
line.
Separating Hyperplane
SACON
 Many separating hyperplanes are possible. Which one is best?
What is the Optimal Separating Hyperplane?
SACON
• We will try to select a hyperplane as far as possible from the data points of each category (the best hyperplane)
• Because it correctly classifies the training data
• And because it is the one which will generalize better with unseen data
What is the Optimal Separating Hyperplane?
SACON
• Given a particular hyperplane, we can compute the distance between the hyperplane and the closest data points (the support vectors).
• Basically, the margin is a no man's land: there will never be any data point inside the margin.
Large Margin Classifier
The optimal hyperplane will be the one with the biggest margin. Margin A is better than Margin B
SACON
How do we calculate this Margin?
SACON
How do we maximize this Margin?
SACON
How do we maximize this Margin?
SACON
Non-linear SVMs
 Datasets that are linearly separable with some
noise work out great:
 But what are we going to do if the dataset is just
too hard?
 How about… mapping data to a higher-
dimensional space:
[Plots: one-dimensional data on the x-axis; a separable case, a noisy case, and a hard case that becomes separable after mapping each point x to (x, x²)]
SACON
Non-linear SVMs: Feature spaces
 General idea: the original input space can always
be mapped to some higher-dimensional feature
space where the training set is separable:
Φ: x → φ(x)
SACON
The "Kernel Trick"
 The linear classifier relies on the dot product between vectors: K(xi, xj) = xiᵀxj
 If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)ᵀφ(xj)
 A kernel function is some function that corresponds to an inner product in some expanded feature space.
 Example: 2-dimensional vectors x = [x1, x2]; let K(xi, xj) = (1 + xiᵀxj)².
Need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
K(xi, xj) = (1 + xiᵀxj)²
= 1 + xi1²xj1² + 2 xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1, xi1², √2 xi1xi2, xi2², √2 xi1, √2 xi2]ᵀ [1, xj1², √2 xj1xj2, xj2², √2 xj1, √2 xj2]
= φ(xi)ᵀφ(xj), where φ(x) = [1, x1², √2 x1x2, x2², √2 x1, √2 x2]
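A quick numerical check of this identity in numpy (arbitrary example vectors):

import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel above."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

k_implicit = (1 + xi @ xj) ** 2  # kernel evaluated in the input space
k_explicit = phi(xi) @ phi(xj)   # dot product in the expanded space
print(k_implicit, k_explicit)    # both are 4.0: the same value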
SACON
What Functions are Kernels?
 For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)ᵀφ(xj) can be cumbersome.
 Mercer's theorem: Every positive semi-definite symmetric function is a kernel
 Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:
K = | K(x1,x1) K(x1,x2) K(x1,x3) … K(x1,xN) |
    | K(x2,x1) K(x2,x2) K(x2,x3) … K(x2,xN) |
    | …        …        …        …          |
    | K(xN,x1) K(xN,x2) K(xN,x3) … K(xN,xN) |
SACON
Examples of Kernel Functions
 Linear: K(xi, xj) = xiᵀxj
 Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)ᵖ
 Gaussian (radial-basis function network): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
 Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)
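A short scikit-learn sketch comparing these kernels on a toy non-linear dataset (make_moons is just a convenient stand-in):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    print(kernel, clf.score(X, y))  # the RBF kernel typically fits this data best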
SACON
Non-linear SVMs Mathematically
 Dual problem formulation:
Find α1…αN such that Q(α) = Σ αi − ½ ΣΣ αiαjyiyj K(xi, xj) is maximized and
(1) Σ αiyi = 0
(2) αi ≥ 0 for all αi
 The solution is: f(x) = Σ αiyi K(xi, x) + b
 Optimization techniques for finding the αi's remain the same!
SACON
 SVM locates a separating hyperplane in the feature space and classifies points in that space
 It does not need to represent the space explicitly, simply by defining a kernel
function
 The kernel function plays the role of the dot product in the feature space.
Nonlinear SVM - Overview
SACON
Properties of SVM
 Flexibility in choosing a similarity function
 Sparseness of solution when dealing with large data sets
- only support vectors are used to specify the separating hyperplane
 Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the feature space
 Overfitting can be controlled by soft margin approach
 Nice math property: a simple convex optimization problem which is
guaranteed to converge to a single global solution
 Feature Selection
SACON
SVM Applications
 SVM has been used successfully in many real-world problems
 Text (and hypertext) categorization
 Image classification
 Bioinformatics (Protein classification, Cancer classification)
 Hand-written character recognition
SACON
Application 1: Cancer Classification
High Dimensional
 - p > 1000; n < 100
Imbalanced
 - fewer positive samples
Many irrelevant features
Noisy
[Table: gene-expression data with patients p-1 … p-n as rows and genes g-1 … g-p as columns]
FEATURE SELECTION
In the linear case, wi² gives the ranking of dimension i.
SVM is sensitive to noisy (mis-labeled) data.
SACON
Weakness of SVM
 It is sensitive to noise
- A relatively small number of mislabeled examples can dramatically decrease
the performance
 It only considers two classes
- how to do multi-class classification with SVM?
- Answer:
1) with output arity m, learn m SVM’s
 SVM 1 learns “Output==1” vs “Output != 1”
 SVM 2 learns “Output==2” vs “Output != 2”
 :
 SVM m learns “Output==m” vs “Output != m”
 2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
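A sketch of strategy 1) using scikit-learn's OneVsRestClassifier, which trains one SVM per class and predicts with the one whose decision value lies furthest into the positive region:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One binary SVM per class: "Output==k" vs "Output != k".
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)
print(ovr.predict(X[:5]), y[:5])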
SACON
Application 2: Text Categorization
 Task: The classification of natural text (or hypertext) documents into a
fixed number of predefined categories based on their content.
- email filtering, web searching, sorting documents by topic, etc.
 A document can be assigned to more than one category, so this can
be viewed as a series of binary classification problems, one for each
category
SACON
Representation of Text
IR’s vector space model (aka bag-of-words
representation)
 A doc is represented by a vector indexed by a pre-
fixed set or dictionary of terms
 Values of an entry can be binary or weights
 Normalization, stop words, word stems
 Doc x => φ(x)
SACON
Text Categorization using SVM
 The distance between two documents is φ(x)·φ(z)
 K(x, z) = φ(x)·φ(z) is a valid kernel, so an SVM can be used with K(x, z) for discrimination.
 Why SVM?
 High dimensional input space
 Few irrelevant features (dense concept)
 Sparse document vectors (sparse instances)
 Text categorization problems are linearly separable
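A minimal scikit-learn sketch of this recipe; 20 Newsgroups stands in for a generic text corpus (it downloads on first use):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

# Bag-of-words with TF-IDF weights and stop-word removal, then a linear SVM.
clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
clf.fit(train.data, train.target)

pred = clf.predict(["the rocket launch was delayed"])
print(train.target_names[pred[0]])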
SACON
Some Issues
 Choice of kernel
 Gaussian or polynomial kernel is default
 If ineffective, more elaborate kernels are needed
 Domain experts can give assistance in formulating appropriate
similarity measures
 Choice of kernel parameters
 e.g. σ in Gaussian kernel
 σ is the distance between closest points with different classifications
 In the absence of reliable criteria, applications rely on the use of a
validation set or cross-validation to set such parameters.
 Optimization criterion – hard margin vs. soft margin
 a lengthy series of experiments in which various parameters are tested
SACON
kNN: K nearest Neighbor
SACON
k-Nearest Neighbor Classification (kNN)
 Unlike all the previous learning methods, kNN does not build a model from the training data.
 To classify a test instance d, define k-neighborhood P as k nearest
neighbors of d
 Count number n of training instances in P that belong to class cj
 Estimate Pr(cj|d) as n/k
 No training is needed. Classification time is linear in training set size
for each test case.
SACON
kNN Algorithm
 k is usually chosen empirically via a validation set
or cross-validation by trying a range of k values.
 Distance function is crucial, but depends on
applications.
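A short scikit-learn sketch: fit() merely stores the training data, and k is picked empirically by cross-validation, as suggested above:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a small range of k values and keep the best cross-validated score.
for k in (1, 3, 5, 7, 9):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(score, 3))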
SACON
Example: k = 6 (6NN)
[Figure: documents from three classes (Government, Science, Arts); for a new point, estimate Pr(science | d) from its 6 nearest neighbors]
SACON
Discussions
 kNN can deal with complex and arbitrary decision boundaries.
 Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite strong, in many cases as accurate as more elaborate methods.
 kNN is slow at classification time
 kNN does not produce an understandable model
SACON
CLUSTERING
SACON 2018 - Pune
SACON
INTRODUCTION: What is clustering?
 Clustering is the classification of objects into different groups, or more
precisely, the partitioning of a data set into subsets (clusters), so that the data
in each subset (ideally) share some common trait - often according to some
defined distance measure.
SACON
TYPES OF CLUSTERING
 Hierarchical algorithms: these find successive clusters using previously
established clusters.
 Agglomerative ("bottom-up"): Agglomerative algorithms begin with each element as
a separate cluster and merge them into successively larger clusters.
 Divisive ("top-down"): Divisive algorithms begin with the whole set and proceed to
divide it into successively smaller clusters.
SACON 2018 - Pune
[Figure: cluster dendrogram]
SACON
TYPES OF CLUSTERING
 Partitional clustering: Partitional algorithms determine all clusters at
once. They include:
 K-means and derivatives
 Fuzzy c-means clustering
 QT clustering algorithm
SACON
TYPES OF CLUSTERING
Distance measure will determine how the similarity of two
elements is calculated and it will influence the shape of the
clusters.
 They include:
The Euclidean distance (also called 2-norm distance): d(x, y) = √(Σi (xi − yi)²)
The Manhattan distance (also called taxicab norm or 1-norm): d(x, y) = Σi |xi − yi|
SACON
 The maximum norm is given by: d(x, y) = maxi |xi − yi|
 The Mahalanobis distance corrects data for different scales and
correlations in the variables.
 Inner product space: The angle between two vectors can be used as a
distance measure when clustering high dimensional data
 Hamming distance (sometimes edit distance) measures the minimum
number of substitutions required to change one member into another.
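A small numpy sketch of these measures (Mahalanobis is omitted since it needs a covariance estimate from data):

import numpy as np

x = np.array([1.0, 4.0, 2.0])
y = np.array([3.0, 1.0, 2.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))  # 2-norm
manhattan = np.sum(np.abs(x - y))          # 1-norm / taxicab
maximum   = np.max(np.abs(x - y))          # max norm
cosine    = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))  # angle-based similarity
hamming   = np.sum(np.array(list("karolin")) != np.array(list("kathrin")))  # 3 substitutions
print(euclidean, manhattan, maximum, cosine, hamming)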
SACON
K-MEANS CLUSTERING
The k-means algorithm is an algorithm to cluster n
objects based on attributes into k partitions, where k
< n.
It is similar to the expectation-maximization algorithm
for mixtures of Gaussians in that they both attempt to
find the centers of natural clusters in the data.
It assumes that the object attributes form a vector
space.
SACON
 An algorithm for partitioning (or clustering) N data points into K disjoint subsets Sj so as to minimize the sum-of-squares criterion
J = Σj Σn∈Sj ‖xn − μj‖²
where xn is a vector representing the nth data point and μj is the geometric centroid of the data points in Sj.
 Simply speaking, k-means clustering is an algorithm to categorize or group the objects based on attributes/features into K groups.
 K is a positive integer.
 The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid.
SACON
HOW K-MEANS CLUSTERING WORKS?
 Step 1: Begin with a decision on the value of k = Number of
clusters
 Step 2: Put any initial partition that classifies the data into k
clusters. You may assign the training samples randomly, or
systematically as the following:
 Take the first k training samples as single-element clusters
 Assign each of the remaining (N − k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
 Step 3: Take each sample in sequence and compute its distance
from the centroid of each of the clusters. If a sample is not
currently in the cluster with the closest centroid, switch this
sample to that cluster and update the centroid of the cluster
gaining the new sample and the cluster losing the sample.
 Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
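A minimal scikit-learn sketch of the procedure above on three synthetic blobs:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # geometric centroids of the K groups
print(km.inertia_)          # the sum-of-squares criterion being minimized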
SACON
SACON Pune 2018
India | Pune | May 18 – 19 | Hotel Hyatt Pune
Bias-Variance in Machine
Learning
SACON
 Bias is the algorithm's tendency to
consistently learn the wrong thing by
not taking into account all the
information in the data
 Variance is the algorithm's tendency to
learn random things irrespective of the
real signal by fitting highly flexible
models that follow the error/noise in
the data too closely
Bias/Variance
SACON
• Generalization ability is an algorithm's ability to give accurate predictions on new, previously unseen data
• Models that are too complex for the amount of training data
available are said to overfit and are not likely to generalize well to
new examples
• High variance can cause an algorithm to model the random noise in
the training data, rather than the intended outputs (overfitting).
• Models that are too simple, that do not even do well on training data,
are said to underfit and also not likely to generalize well.
• High bias can cause an algorithm to miss the relevant relations
between features and target outputs (underfitting).
Problem of high Bias/Variance
SACON
Bias-Variance: An Example
SACON
Bias/Variance is a Way to Understand
Overfitting and Underfitting
[Figure: error/loss on the training set Dtrain and on an unseen test set Dtest plotted against classifier complexity; a "too simple" classifier has high error on both, a "too complex" one has low training error but high test error]
SACON
Definitions
• Overfitting: too much reliance on the training data
• Underfitting: a failure to learn the relationships in the training data
• High Variance: model changes significantly based on training data
• High Bias: assumptions about model lead to ignoring training data
• Overfitting and underfitting cause poor generalization on the test set
• A validation set for model tuning can prevent under and overfitting
SACON 2018 - Pune
SACON
Ways to Deal with
Overfitting and Underfitting
 Underfitting:
 Easier to resolve
 Try different machine learning models
 Try stronger models with higher capacity (hyperparameter
tuning)
 Try more features
 Overfitting
 Use a resampling technique like K-fold cross validation
 Improve the feature quality or remove some features
 Training with more data
 Early stopping
 Regularization
 Ensembling
[Figure: early stopping]
SACON
Regularization
• Regularization penalizes the coefficients. In machine learning, it
actually penalizes the weight matrices of the nodes.
• L1 and L2 are the most common types of regularization.
• These update the general cost function by adding another term
known as the regularization term.
Cost function = Loss (say, binary cross entropy) +
Regularization term
SACON
L1 and L2 Regularization
 In L2, we have: Cost function = Loss + (λ/2m) Σ ‖w‖²
 Here, lambda is the regularization parameter: the hyperparameter whose value is optimized for better results. L2 regularization is also known as weight decay, as it forces the weights to decay towards zero (but not exactly zero).
 In L1, we have: Cost function = Loss + (λ/2m) Σ |w|
 Here we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to exactly zero.
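A short scikit-learn sketch of both penalties (scikit-learn calls the lambda above alpha; Ridge is L2, Lasso is L1):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.randn(100) * 0.1  # only 2 informative features

print(Ridge(alpha=1.0).fit(X, y).coef_.round(2))  # L2: small but non-zero weights
print(Lasso(alpha=0.1).fit(X, y).coef_.round(2))  # L1: irrelevant weights exactly 0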
SACON
Neural Networks in Machine
Learning
SACON
Artificial Neural Networks
 A Single Neuron: The basic unit of computation in a neural network is
the neuron, often called a node or unit.
 The function f is non-linear and is called the Activation Function.
 The idea of ANNs is based on the belief that the working of the human brain can be imitated, by making the right connections, using silicon and wires in place of living neurons and dendrites.
SACON
Activation Function
 Sigmoid: takes a real-valued input and squashes it to range between 0 and 1.
σ(x) = 1 / (1 + exp(−x))
 tanh: takes a real-valued input and squashes it to the range [-1, 1]
tanh(x) = 2σ(2x) − 1
 ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and
thresholds it at zero (replaces negative values with zero)
f(x) = max(0, x)
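The three functions in a few lines of numpy:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1)

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0  # squashes to (-1, 1), per the identity above

def relu(x):
    return np.maximum(0.0, x)            # thresholds negative values at zero

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))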
SACON
Neural Network Intuition (single layer)
SACON 2018 - Pune
SACON
Neural Network Intuition (Multiple Layers)
 A multi-layer neural network is capable of learning complex functions.
 Let's consider the XNOR operation.
• CASE 1: X1 XNOR X2 = (X1'.X2') + (X1.X2) [NN representation shown]
• CASE 2: X1 XNOR X2 = NOT [ (X1+X2).(X1'+X2') ] NN representation = ?
SACON
Back-Propagation
 The back-propagation (BP) algorithm works by determining the loss (or error) at the output and then propagating it back into the network.
 The weights are updated to minimize the error
resulting from each neuron.
SACON
Regularization: Dropout
 At every iteration, it randomly selects some nodes
and removes them along with all of their incoming
and outgoing connections
 We need to choose the dropout parameter such
that we get the appropriate fitting
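A minimal numpy sketch of (inverted) dropout, the common variant that rescales the surviving activations at training time:

import numpy as np

def dropout(activations, p_drop, rng):
    """Zero a random subset of nodes and rescale the rest so the
    expected activation is unchanged (applied at training time only)."""
    mask = rng.random_sample(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.RandomState(0)
h = np.ones((2, 8))                      # activations of a hidden layer
print(dropout(h, p_drop=0.5, rng=rng))   # roughly half the units zeroed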
SACON
Deep Learning
• Deep neural networks have recently been very successful in the fields of computer vision, natural language processing, speech recognition and many more.
• Some of the important/successful networks are
• Convolutional Neural Network: Has been very successful in computer vision
• Recurrent neural network: Has been successful in Natural Language
Processing and speech recognition as well.
SACON
Tree based modeling
SACON
Decision Tree
 A Decision Tree is a supervised learning algorithm.
 We split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant differentiator among the input variables.
1.Root Node: It represents entire
population or sample and this further
gets divided into two or more
homogeneous sets.
2.Splitting: It is a process of dividing
a node into two or more sub-nodes.
3.Decision Node: When a sub-node
splits into further sub-nodes, then it is
called decision node.
4. Leaf/Terminal Node: Nodes that do not split are called Leaf or Terminal nodes.
SACON
Another Example
SACON
Methods of splitting: Information Gain
Which node can be described easily?
 Information theory defines a measure of this degree of disorganization in a system, known as entropy:
Entropy = −p log2 p − q log2 q
Here p and q are the probabilities of success and failure respectively in that node.
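A small numpy sketch of this entropy measure for a binary node:

import numpy as np

def entropy(p):
    """Entropy of a binary split: -p*log2(p) - q*log2(q), with q = 1 - p."""
    if p in (0.0, 1.0):
        return 0.0                  # a pure node has zero disorder
    q = 1.0 - p
    return -p * np.log2(p) - q * np.log2(q)

print(entropy(0.5))  # 1.0   -> maximally mixed, hardest node to describe
print(entropy(0.9))  # ~0.47 -> mostly one class, easier to describe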
SACON
Other Tree based methods
 Trade-off management of bias-variance errors.
 Bagging is a simple ensembling technique in which we
build many independent predictors/models/learners and
combine them using some model averaging techniques.
 Ensemble methods involve group of predictive models to
achieve a better accuracy and model stability.
 Random Forest: multiple trees instead of a single tree; it is a bagging method.
 To classify a new object based on attributes, each tree gives a classification and we say the tree "votes" for that class.
SACON
Other Tree based methods
 Gradient Boosting is a tree ensemble technique that creates a strong classifier
from a number of weak classifiers.
 It works by combining weak learners in an additive model.
 Boosting is an ensemble technique in which the predictors are not made
independently, but sequentially.
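A short scikit-learn sketch contrasting the two ensembles on synthetic data (bagging grows independent trees that vote; boosting grows trees sequentially on the previous ones' errors):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean().round(3))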
SACON
Iris Dataset
 Three species of Iris (Iris setosa, Iris virginica and Iris versicolor).
 Four features were measured from each sample: the length and the width of
the sepals and petals, in centimeters.
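A minimal scikit-learn sketch fitting an entropy-based decision tree to these four Iris features:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)
print(tree.score(X_te, y_te))  # typically 0.9+ on the held-out split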
SACON
References
• Andrew Ng’s Coursera Course
• Scikit Learn Training example on Google
• Nvidia
• Sebastian Ruder’s blog
• HBR
• MIT Tech Review
• Lots of Others
• AI community in general
• IDLI Community

More Related Content

What's hot

InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksZak Jost
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Java căn bản - Chapter3
Java căn bản - Chapter3Java căn bản - Chapter3
Java căn bản - Chapter3Vince Vo
 
Adversarial Reinforced Learning for Unsupervised Domain Adaptation
Adversarial Reinforced Learning for Unsupervised Domain AdaptationAdversarial Reinforced Learning for Unsupervised Domain Adaptation
Adversarial Reinforced Learning for Unsupervised Domain Adaptationtaeseon ryu
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and ClusteringUsha Vijay
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Omkar Rane
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic netVivian S. Zhang
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance charlesmartin14
 
K-Means Clustering Simply
K-Means Clustering SimplyK-Means Clustering Simply
K-Means Clustering SimplyEmad Nabil
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines SimplyEmad Nabil
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Greedy method1
Greedy method1Greedy method1
Greedy method1Rajendran
 
Skiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingSkiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingzukun
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr taeseon ryu
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar AhmedZaffar Ahmed Shaikh
 

What's hot (20)

InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial Networks
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Java căn bản - Chapter3
Java căn bản - Chapter3Java căn bản - Chapter3
Java căn bản - Chapter3
 
Adversarial Reinforced Learning for Unsupervised Domain Adaptation
Adversarial Reinforced Learning for Unsupervised Domain AdaptationAdversarial Reinforced Learning for Unsupervised Domain Adaptation
Adversarial Reinforced Learning for Unsupervised Domain Adaptation
 
Lda
LdaLda
Lda
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic net
 
Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance Applied Machine Learning For Search Engine Relevance
Applied Machine Learning For Search Engine Relevance
 
K-Means Clustering Simply
K-Means Clustering SimplyK-Means Clustering Simply
K-Means Clustering Simply
 
Support Vector Machines Simply
Support Vector Machines SimplySupport Vector Machines Simply
Support Vector Machines Simply
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
 
Greedy method1
Greedy method1Greedy method1
Greedy method1
 
Machine learning
Machine learningMachine learning
Machine learning
 
Skiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programmingSkiena algorithm 2007 lecture16 introduction to dynamic programming
Skiena algorithm 2007 lecture16 introduction to dynamic programming
 
Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr Deep learning paper review ppt sourece -Direct clr
Deep learning paper review ppt sourece -Direct clr
 
Nearest Neighbor Algorithm Zaffar Ahmed
Nearest Neighbor Algorithm  Zaffar AhmedNearest Neighbor Algorithm  Zaffar Ahmed
Nearest Neighbor Algorithm Zaffar Ahmed
 

Similar to ML Workshop at SACON 2018

Learning Machine Learning (SACON May 2018)
Learning Machine Learning (SACON May 2018)Learning Machine Learning (SACON May 2018)
Learning Machine Learning (SACON May 2018)Priyanka Aash
 
Machine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepMachine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepSanjanaSaxena17
 
Machine Learning for Modern Developers
Machine Learning for Modern DevelopersMachine Learning for Modern Developers
Machine Learning for Modern Developerscacois
 
Neural networks with python
Neural networks with pythonNeural networks with python
Neural networks with pythonSimone Piunno
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
AML_030607.ppt
AML_030607.pptAML_030607.ppt
AML_030607.pptbutest
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learningAkhilesh Joshi
 
Machine learning introduction lecture notes
Machine learning introduction lecture notesMachine learning introduction lecture notes
Machine learning introduction lecture notesUmeshJagga1
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기Sungmin Kim
 

Similar to ML Workshop at SACON 2018 (20)

Learning Machine Learning (SACON May 2018)
Learning Machine Learning (SACON May 2018)Learning Machine Learning (SACON May 2018)
Learning Machine Learning (SACON May 2018)
 
Machine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepMachine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by step
 
Machine Learning for Modern Developers
Machine Learning for Modern DevelopersMachine Learning for Modern Developers
Machine Learning for Modern Developers
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
Neural networks with python
Neural networks with pythonNeural networks with python
Neural networks with python
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
 
Optimization
OptimizationOptimization
Optimization
 
3ml.pdf
3ml.pdf3ml.pdf
3ml.pdf
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
AML_030607.ppt
AML_030607.pptAML_030607.ppt
AML_030607.ppt
 
Session 4 .pdf
Session 4 .pdfSession 4 .pdf
Session 4 .pdf
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learning
 
Machine learning introduction lecture notes
Machine learning introduction lecture notesMachine learning introduction lecture notes
Machine learning introduction lecture notes
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기
 

More from Subrat Panda, PhD

More from Subrat Panda, PhD (9)

Role of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hinduRole of technology in agriculture courses by srmist &amp; the hindu
Role of technology in agriculture courses by srmist &amp; the hindu
 
Journey so far
Journey so farJourney so far
Journey so far
 
AI in security
AI in securityAI in security
AI in security
 
AI in Retail
AI in RetailAI in Retail
AI in Retail
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
AI and The future of work
AI and The future of work AI and The future of work
AI and The future of work
 
AI in retail
AI in retailAI in retail
AI in retail
 

Recently uploaded

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Recently uploaded (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

ML Workshop at SACON 2018

  • 1. SACON SACON Pune 2018 India | Pune | May 18 – 19 | Hotel Hyatt Pune Learning Machine Learning Subrat Panda Capillary Technologies Principal Architect, AI and Data Sciences
  • 2. SACON LEARNING MACHINE LEARNING Subrat Panda Principal Architect, AI and Data Sciences, Capillary Technologies (www.capillarytech.com) Co-Founder : IDLI (Indian Deep Learning Initiative) https://www.facebook.com/groups/idliai/ BTech(2002), PhD(2009) IIT KGP. https://www.linkedin.com/in/subratpanda/ Email : subratpanda@gmail.com Acknowledgements: Biswa Gourav Singh Co-Founder : IDLI (Indian Deep Learning Initiative) https://www.linkedin.com/in/biswagsingh/ Email: biswagourav.singh@gmail.com AI Community Across the Globe
  • 4. SACON Preface • Artificial intelligence is already part of our everyday lives. SACON 2018 - Pune
  • 5. SACON Application of AI, Machine Learning and Deep Learning
  • 6. SACON Gartner Says By 2020, Artificial Intelligence Will Create More Jobs Than It Eliminates
  • 7. SACON What this talk can motivate people to do  STUDENTS:  Motivates to participate in data science competitions  Further learning and add the expertise to the resume  Final year and fun projects.  PROFESSIONALS:  Find interesting data in your current project and apply machine learning  Motivates further learning and profession change. Data scientists/Machine learning engineers are highly paid professionals   TEACHERS:  Motivates teachers to spread knowledge in the their university  Conduct hackathons SACON 2018 - Pune
  • 9. SACON Machine Learning Classical Definition  Arthur Samuel (1959): "computer’s ability to learn without being explicitly programmed.“  Tom M Mitchel (1998): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.“  Optimize a performance criterion using example data or past experience.
  • 10. SACON Types of Machine Learning Algorithms  Supervised Learning: Input data with labeled responses  Regression : Given a picture of a person, we have to predict their age on the basis of the given picture  Classification : Given a patient with a tumor, we have to predict whether the tumor is malignant or benign. IRIS DATASET SPECIES CLASSIFICATION TEXT CLASSIFICATION IMAGE CLASSIFICATION Linear Regression Non-Linear Regression
  • 11. SACON Types of Machine Learning Algorithms  Unsupervised Learning: Input data without labeled responses.  Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.  Non Clustering: Exploratory data analysis (PCA, Auto-encoders) Customer Segmentation MNIST Digit Segmentation
  • 13. SACON
  • 14. SACON Pop Quiz  Predicting housing prices based on input parameters like house size, number of rooms, location of house etc. falls under which category of machine learning problem:  A) Regression  B) Classification  C) Clustering  D) None  Automatically segmenting your customers according to the customer information falls under which category of machine learning.  A) Regression  B) Classification  C) Clustering  D) None
  • 15. SACON SACON Pune 2018 India | Pune | May 18 – 19 | Hotel Hyatt Pune Supervised Learning SACON 2018 - Pune
  • 16. SACON Linear Regression • Linear regression is the simple form of Supervised learning. • In a regression problem the target variable is continuous. Living Area (Sq. feet) Year Built Price (1000$s) 2104 2012 400 1600 2013 300 2400 2014 369 1416 2013 232 3000 2015 540 . . . . . . . . . Predict Housing Price from Historical data
  • 17. SACON Linear Regression • The goal is to learn a function which assumes linear relationship between target variable Y with input variable X
  • 18. SACON Linear Regression • In supervised learning, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of Y. Living Area (Sq. feet) Year Built Price (1000$s) 2104 2012 400 1600 2013 300 2400 2014 369 1416 2013 232 3000 2015 540 . . . . . . . . . • Lets consider the housing data above. X’s represents a two dimensional vector ad Y represents the price of the house.
  • 20. SACON Cost Function I • Lets approximate the Y as a linear function of X. Hence the hypothesis function will be given by. • θ’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y. • How do we pick, or learn, the parameters θ? One reasonable method seems to be to make h(x) close to y, at least for the training examples we have. The cost function is given by: (Considering θ1 • This is the least-squares cost function that gives rise to the ordinary least squares regression model
  • 21. SACON Cost Function II  We want to choose θ so as to minimize J(θ).  We can see the cost associated with different values of θ and we can see the graph has a slight bowl to its shape.  The goal is to “roll down the hill”, and find θ corresponding to the bottom of the bowl.
  • 22. SACON Gradient Descent  We should use a search algorithm that starts with some “initial guess” for θ, and that repeatedly changes θ to make J(θ) smaller, until we converge to a value of θ that minimizes J(θ).  The algorithm we choose is Gradient Descent Algorithm, which starts with some initial θ and repeatedly perform the following update:  If we calculate the partial derivate , we get the following output: α = Learning Rate If α is too small: slow convergence. If α is too large: may not decrease on every iteration and thus may not converge.
  • 23. SACON How the algorithm Works: SACON 2018 - Pune (θ0,θ1) = (-0.12, 820)
  • 24. SACON (θ0,θ1) = (0.0, 420) (θ0,θ1) = (0.14, 220) SACON 2018 - Pune
  • 25. SACON Other Optimization Methods:  There is an alternative to batch gradient descent that also works very well. Consider the following algorithm:  Each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only. This algorithm is called Stochastic Gradient Descent(SGD).  Other examples of Optimization algorithms: BFGS, L-BFGS  Mini batch gradient descent: performs an update for every batch.
  • 26. SACON Normal Equation  Normal Equation is a method to solve for θ analytically.  Our cost function looks like:  To minimize a Quadratic function, the partial derivative of the function should be equated to zero.
  • 27. SACON Normal Equation  Given a training set with m examples and n features, define the design matrix X to be the m-by-n matrix give like below:  Thus, the value of θ that minimizes J(θ) is given in closed form by the equation  let y be the m-dimensional vector containing all the target values from the training set:
  • 28. SACON Pop Quiz • What is the effect of high learning rate on cost function :
  • 30. SACON Introduction  It is an approach to the classification problem.  The output vector is either 1 or 0 instead of a continuous range of values  y ∈ {0,1}  Binary classification problem (two values)  Linear regression wont work in the classification problem IMAGE CLASSIFICATION
  • 31. SACON Logistic Regression: Hypothesis  The hypothesis should satisfy  0 ≤ h(x) ≤ 1  the "Sigmoid Function," also called the "Logistic Function":  We want to restrict the range to 0 and 1. This is accomplished by plugging θTx into the Logistic Function
• 32. SACON Decision Boundary In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows: hθ(x) ≥ 0.5 → y = 1; hθ(x) < 0.5 → y = 0.
• 33. SACON Cost Function  We cannot use the squared cost function here: with the logistic hypothesis, J(θ) becomes wavy (non-convex), causing many local optima.
• 34. SACON Cost Function  The logistic regression cost function is J(θ) = −(1/m) Σᵢ [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ].
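A small sketch of the hypothesis and this cost in NumPy (the clipping of h is our numerical-stability choice, not part of the slides):

# Sketch: logistic hypothesis and cross-entropy cost.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    h = sigmoid(X @ theta)                   # h_theta(x) in (0, 1)
    h = np.clip(h, 1e-12, 1 - 1e-12)         # avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))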
  • 35. SACON Advanced Optimization  Gradient Descent  Conjugate Gradient  BFGS  L-BFGS
  • 37. SACON Overview  Intro. to Support Vector Machines (SVM)  Properties of SVM  Applications  Discussion
• 38. SACON Introduction  A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes.  SVMs are more commonly used in classification problems. (figure: plot of the size and weight of several people, with a way to distinguish between men and women)
• 39. SACON Separating Hyperplane  We can see that it is possible to separate the data into classes.  We could draw a line such that all the data points representing men are above the line and all the data points representing women are below it.
• 40. SACON What is the Optimal Separating Hyperplane?  Many separating hyperplanes are possible. Which one is best?
• 41. SACON What is the Optimal Separating Hyperplane? • We select the hyperplane as far as possible from the data points of each category (the best hyperplane), • because it correctly classifies the training data, • and because it is the one that will generalize better to unseen data.
• 42. SACON Large Margin Classifier • Given a particular hyperplane, we can compute the distance between the hyperplane and the closest data points (the support vectors). • Basically the margin is a no man's land: there will never be any data point inside the margin. • The optimal hyperplane is the one with the biggest margin. (figure: Margin A is better than Margin B)
• 43. SACON How do we calculate this Margin? For a hyperplane wᵀx + b = 0, the distance of a point x from it is |wᵀx + b| / ‖w‖; rescaling w and b so that the closest points (the support vectors) satisfy |wᵀx + b| = 1 makes the margin 2 / ‖w‖.
• 44. SACON How do we maximize this Margin? Maximizing the margin 2 / ‖w‖ is equivalent to minimizing ½‖w‖².
• 45. SACON How do we maximize this Margin? The resulting primal problem: minimize ½‖w‖² subject to yᵢ(wᵀxᵢ + b) ≥ 1 for every training example (xᵢ, yᵢ); this is the constrained problem whose dual formulation appears below.
• 46. SACON Non-linear SVMs  Datasets that are linearly separable with some noise work out great.  But what are we going to do if the dataset is just too hard?  How about mapping the data to a higher-dimensional space? (figure: 1-D data on the x axis becomes separable after the mapping x → (x, x²))
  • 47. SACON Non-linear SVMs: Feature spaces  General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)
• 48. SACON The “Kernel Trick”  The linear classifier relies on the dot product between vectors: K(xi, xj) = xiᵀxj.  If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes K(xi, xj) = φ(xi)ᵀφ(xj).  A kernel function is some function that corresponds to an inner product in some expanded feature space.  Example: 2-dimensional vectors x = [x1, x2]; let K(xi, xj) = (1 + xiᵀxj)². We need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
K(xi, xj) = (1 + xiᵀxj)²
= 1 + xi1²xj1² + 2 xi1xj1xi2xj2 + xi2²xj2² + 2 xi1xj1 + 2 xi2xj2
= [1, xi1², √2 xi1xi2, xi2², √2 xi1, √2 xi2]ᵀ [1, xj1², √2 xj1xj2, xj2², √2 xj1, √2 xj2]
= φ(xi)ᵀφ(xj), where φ(x) = [1, x1², √2 x1x2, x2², √2 x1, √2 x2]
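The algebra above can be checked numerically. A small sketch (the test vectors are arbitrary):

# Sketch: verify (1 + xi.xj)^2 == phi(xi).phi(xj) for 2-D vectors.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 0.5])
lhs = (1 + xi @ xj) ** 2       # kernel computed in the input space
rhs = phi(xi) @ phi(xj)        # dot product in the 6-D feature space
print(np.isclose(lhs, rhs))    # True -- same value, without building phi(x) in general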
• 49. SACON What Functions are Kernels?  For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)ᵀφ(xj) can be cumbersome.  Mercer's theorem: every positive semi-definite symmetric function is a kernel.  Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:
K = | K(x1,x1) K(x1,x2) K(x1,x3) … K(x1,xN) |
    | K(x2,x1) K(x2,x2) K(x2,x3) … K(x2,xN) |
    | …        …        …        …  …       |
    | K(xN,x1) K(xN,x2) K(xN,x3) … K(xN,xN) |
• 50. SACON Examples of Kernel Functions  Linear: K(xi, xj) = xiᵀxj  Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p  Gaussian (radial-basis function network): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))  Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)
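All four kernels are one-liners in NumPy. A sketch with our own default parameter values:

# Sketch: the kernels above as plain functions of two vectors.
import numpy as np

def linear_kernel(xi, xj):                  return xi @ xj
def polynomial_kernel(xi, xj, p=2):         return (1 + xi @ xj) ** p
def gaussian_kernel(xi, xj, sigma=1.0):     return np.exp(-np.sum((xi - xj)**2) / (2 * sigma**2))
def sigmoid_kernel(xi, xj, b0=1.0, b1=0.0): return np.tanh(b0 * (xi @ xj) + b1)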
• 51. SACON Non-linear SVMs Mathematically  Dual problem formulation: find α1…αN such that Q(α) = Σαi − ½ ΣΣ αiαj yiyj K(xi, xj) is maximized subject to (1) Σαiyi = 0 and (2) αi ≥ 0 for all αi.  The solution is f(x) = Σ αiyi K(xi, x) + b.  Optimization techniques for finding the αi's remain the same!
• 52. SACON Nonlinear SVM - Overview  SVM locates a separating hyperplane in the feature space and classifies points in that space.  It does not need to represent the space explicitly; it simply defines a kernel function.  The kernel function plays the role of the dot product in the feature space.
  • 53. SACON Properties of SVM  Flexibility in choosing a similarity function  Sparseness of solution when dealing with large data sets - only support vectors are used to specify the separating hyperplane  Ability to handle large feature spaces - complexity does not depend on the dimensionality of the feature space  Overfitting can be controlled by soft margin approach  Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution  Feature Selection
  • 54. SACON SVM Applications  SVM has been used successfully in many real-world problems  Text (and hypertext) categorization  Image classification  Bioinformatics (Protein classification, Cancer classification)  Hand-written character recognition
• 55. SACON Application 1: Cancer Classification  High dimensional: p > 1000 genes, n < 100 patients.  Imbalanced: fewer positive samples.  Many irrelevant features.  Noisy. (figure: gene-expression matrix, genes g-1 … g-p by patients p-1 … p-n)  FEATURE SELECTION: in the linear case, wi² gives the ranking of dimension i.  SVM is sensitive to noisy (mis-labeled) data.
• 56. SACON Weakness of SVM  It is sensitive to noise - a relatively small number of mislabeled examples can dramatically decrease performance.  It only considers two classes - how do we do multi-class classification with SVM? Answer: 1) with output arity m, learn m SVMs:  SVM 1 learns “Output == 1” vs “Output != 1”  SVM 2 learns “Output == 2” vs “Output != 2”  …  SVM m learns “Output == m” vs “Output != m”  2) To predict the output for a new input, predict with each SVM and find the one that puts the prediction the furthest into the positive region (see the sketch below).
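The one-vs-rest recipe can be written out by hand with scikit-learn's LinearSVC (a sketch; in practice scikit-learn's SVC and LinearSVC already do this internally, so the explicit version is purely illustrative):

# Sketch: one-vs-rest multi-class SVM, spelled out explicitly.
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_rest(X, y, classes):
    # one binary SVM per class: "Output == k" vs "Output != k"
    return {k: LinearSVC().fit(X, (y == k).astype(int)) for k in classes}

def predict_one_vs_rest(models, X):
    # pick the class whose SVM puts the point furthest into the positive region
    keys = list(models)
    scores = np.column_stack([models[k].decision_function(X) for k in keys])
    return np.array(keys)[np.argmax(scores, axis=1)]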
• 57. SACON Application 2: Text Categorization  Task: the classification of natural text (or hypertext) documents into a fixed number of predefined categories based on their content - email filtering, web searching, sorting documents by topic, etc.  A document can be assigned to more than one category, so this can be viewed as a series of binary classification problems, one for each category.
• 58. SACON Representation of Text  IR's vector space model (aka bag-of-words representation):  a doc is represented by a vector indexed by a fixed set or dictionary of terms;  the value of an entry can be binary or a weight;  normalization, stop words, word stems;  Doc x => φ(x)
• 59. SACON Text Categorization using SVM  The distance between two documents is φ(x)·φ(z).  K(x,z) = φ(x)·φ(z) is a valid kernel, so SVM can be used with K(x,z) for discrimination.  Why SVM?  High dimensional input space.  Few irrelevant features (dense concept).  Sparse document vectors (sparse instances).  Text categorization problems are linearly separable.
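A minimal end-to-end sketch with scikit-learn (the toy documents and labels below are invented for illustration):

# Sketch: bag-of-words vectors + linear SVM for text categorization.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs   = ["cheap pills buy now", "meeting agenda attached", "win money now"]
labels = [1, 0, 1]                                   # 1 = spam, 0 = not spam

model = make_pipeline(TfidfVectorizer(stop_words="english"),  # doc -> phi(x)
                      LinearSVC())
model.fit(docs, labels)
print(model.predict(["buy cheap pills"]))            # likely [1]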
• 60. SACON Some Issues  Choice of kernel:  a Gaussian or polynomial kernel is the default;  if ineffective, more elaborate kernels are needed;  domain experts can give assistance in formulating appropriate similarity measures.  Choice of kernel parameters, e.g. σ in the Gaussian kernel:  σ is roughly the distance between the closest points with different classifications;  in the absence of reliable criteria, applications rely on a validation set or cross-validation to set such parameters.  Optimization criterion - hard margin vs. soft margin:  often a lengthy series of experiments in which various parameters are tested.
• 62. SACON k-Nearest Neighbor Classification (kNN)  Unlike all the previous learning methods, kNN does not build a model from the training data.  To classify a test instance d, define the k-neighborhood P as the k nearest neighbors of d.  Count the number n of training instances in P that belong to class cj.  Estimate Pr(cj|d) as n/k.  No training is needed; classification time is linear in the training set size for each test case.
• 63. SACON kNN Algorithm  k is usually chosen empirically via a validation set or cross-validation, by trying a range of k values.  The distance function is crucial, but depends on the application.
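Since kNN has no training step, the whole classifier is a few lines. A sketch with Euclidean distance (one choice of distance function among many, per the slide above):

# Sketch: kNN classification by majority vote among the k nearest neighbors.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=5):
    distances = np.linalg.norm(X_train - x_test, axis=1)   # distance to every training point
    neighbors = y_train[np.argsort(distances)[:k]]         # labels of the k-neighborhood P
    return Counter(neighbors).most_common(1)[0][0]         # estimate of argmax Pr(c_j|d)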
• 65. SACON Discussions  kNN can deal with complex and arbitrary decision boundaries.  Despite its simplicity, researchers have shown that the classification accuracy of kNN can be quite strong, and in many cases as accurate as more elaborate methods.  kNN is slow at classification time.  kNN does not produce an understandable model.
  • 67. SACON INTRODUCTION- What is clustering?  Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often according to some defined distance measure.
• 68. SACON TYPES OF CLUSTERING  Hierarchical algorithms: these find successive clusters using previously established clusters.  Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.  Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters. SACON 2018 - Pune (figure: CLUSTER DENDROGRAM)
  • 69. SACON TYPES OF CLUSTERING  Partitional clustering: Partitional algorithms determine all clusters at once. They include:  K-means and derivatives  Fuzzy c-means clustering  QT clustering algorithm
• 70. SACON TYPES OF CLUSTERING The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:  The Euclidean distance (also called 2-norm distance): d(x, y) = √(Σi (xi − yi)²)  The Manhattan distance (also called taxicab norm or 1-norm): d(x, y) = Σi |xi − yi|
• 71. SACON  The maximum norm: d(x, y) = maxi |xi − yi|  The Mahalanobis distance corrects data for different scales and correlations in the variables.  Inner product space: the angle between two vectors can be used as a distance measure when clustering high-dimensional data.  Hamming distance (sometimes edit distance) measures the minimum number of substitutions required to change one member into another.
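As a sketch, most of these measures for two numeric vectors (Hamming shown for equal-length sequences; Mahalanobis omitted since it needs a covariance estimate):

# Sketch: common clustering distance measures.
import numpy as np

def euclidean(x, y): return np.sqrt(np.sum((x - y) ** 2))   # 2-norm
def manhattan(x, y): return np.sum(np.abs(x - y))           # 1-norm / taxicab
def max_norm(x, y):  return np.max(np.abs(x - y))           # infinity norm
def cosine_dist(x, y):                                      # angle-based, for high dimensions
    return 1 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
def hamming(x, y):   return int(np.sum(np.asarray(x) != np.asarray(y)))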
  • 72. SACON K-MEANS CLUSTERING The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space.
• 73. SACON  An algorithm for partitioning (or clustering) N data points into K disjoint subsets Sj so as to minimize the sum-of-squares criterion J = Σj Σ(n ∈ Sj) ‖xn − μj‖², where xn is a vector representing the nth data point and μj is the geometric centroid of the data points in Sj.  Simply speaking, k-means clustering is an algorithm to categorize or group objects based on attributes/features into K groups.  K is a positive integer.  The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid.
• 74. SACON HOW K-MEANS CLUSTERING WORKS  Step 1: Begin with a decision on the value of k = number of clusters.  Step 2: Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows: take the first k training samples as single-element clusters; assign each of the remaining (N − k) training samples to the cluster with the nearest centroid, recomputing the centroid of the gaining cluster after each assignment.  Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of the cluster gaining the sample and the cluster losing it.  Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
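The four steps map directly onto a short NumPy loop. A minimal sketch (random initialization instead of step 2's systematic option, and no handling of the empty-cluster edge case):

# Sketch: k-means following the steps above.
import numpy as np

def kmeans(X, k, iterations=100):
    centroids = X[np.random.choice(len(X), k, replace=False)]       # steps 1-2: initial partition
    for _ in range(iterations):
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = np.argmin(dists, axis=1)                           # step 3: nearest centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):                   # step 4: convergence
            break
        centroids = new_centroids
    return labels, centroids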
  • 76. SACON SACON Pune 2018 India | Pune | May 18 – 19 | Hotel Hyatt Pune Bias-Variance in Machine Learning
  • 77. SACON  Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data  Variance is the algorithm's tendency to learn random things irrespective of the real signal by fitting highly flexible models that follow the error/noise in the data too closely Bias/Variance
• 78. SACON Problem of High Bias/Variance • Generalization ability is an algorithm's ability to give accurate predictions on new, previously unseen data. • Models that are too complex for the amount of training data available are said to overfit and are not likely to generalize well to new examples. • High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting). • Models that are too simple, that do not do well even on the training data, are said to underfit and are also not likely to generalize well. • High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
• 80. SACON Bias/Variance is a Way to Understand Overfitting and Underfitting (figure: error/loss on the training set Dtrain falls as the classifier gets more complex, while error on an unseen test set Dtest is high at both extremes - "too simple" underfits, "too complex" overfits)
  • 81. SACON Definitions • Overfitting: too much reliance on the training data • Underfitting: a failure to learn the relationships in the training data • High Variance: model changes significantly based on training data • High Bias: assumptions about model lead to ignoring training data • Overfitting and underfitting cause poor generalization on the test set • A validation set for model tuning can prevent under and overfitting SACON 2018 - Pune
  • 82. SACON Ways to Deal with Overfitting and Underfitting  Underfitting:  Easier to resolve  Try different machine learning models  Try stronger models with higher capacity (hyperparameter tuning)  Try more features  Overfitting  Use a resampling technique like K-fold cross validation  Improve the feature quality or remove some features  Training with more data  Early stopping  Regularization  Ensembling Early Stopping
  • 83. SACON Regularization • Regularization penalizes the coefficients. In machine learning, it actually penalizes the weight matrices of the nodes. • L1 and L2 are the most common types of regularization. • These update the general cost function by adding another term known as the regularization term. Cost function = Loss (say, binary cross entropy) + Regularization term
• 84. SACON L1 and L2 Regularization  In L2, we have: Cost function = Loss + (λ/2m) Σ ‖w‖²  Here, lambda is the regularization parameter, the hyperparameter whose value is optimized for better results. L2 regularization is also known as weight decay, as it forces the weights to decay towards zero (but not exactly zero).  In L1, we have: Cost function = Loss + (λ/m) Σ ‖w‖  In this, we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to exactly zero here.
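A sketch of both penalties added to an arbitrary base loss (the 1/m scaling is one common convention, matching the formulas above):

# Sketch: L2 (weight decay) and L1 regularized cost.
import numpy as np

def l2_cost(loss, w, lam, m):
    return loss + (lam / (2 * m)) * np.sum(w ** 2)   # shrinks weights toward zero

def l1_cost(loss, w, lam, m):
    return loss + (lam / m) * np.sum(np.abs(w))      # can zero some weights out entirely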
  • 85. SACON Neural Networks in Machine Learning
• 86. SACON Artificial Neural Networks  A Single Neuron: the basic unit of computation in a neural network is the neuron, often called a node or unit.  The function f is non-linear and is called the Activation Function.  The idea of ANNs is based on the belief that the working of the human brain, which makes the right connections, can be imitated using silicon and wires in place of living neurons and dendrites.
  • 87. SACON Activation Function  Sigmoid: takes a real-valued input and squashes it to range between 0 and 1. σ(x) = 1 / (1 + exp(−x))  tanh: takes a real-valued input and squashes it to the range [-1, 1] tanh(x) = 2σ(2x) − 1  ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds it at zero (replaces negative values with zero) f(x) = max(0, x)
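All three functions are one-liners in NumPy:

# Sketch: the activation functions above.
import numpy as np

def sigmoid(x): return 1 / (1 + np.exp(-x))   # squashes to (0, 1)
def tanh(x):    return np.tanh(x)             # squashes to (-1, 1); equals 2*sigmoid(2x) - 1
def relu(x):    return np.maximum(0, x)       # replaces negative values with zero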
  • 88. SACON Neural Network Intuition (single layer) SACON 2018 - Pune
• 89. SACON Neural Network Intuition (Multiple Layers)  A multi-layer neural network is capable of learning complex functions.  Let's consider the XNOR operation. • CASE 1: X1 XNOR X2 = (X1'.X2') + (X1.X2) NN representation • CASE 2: X1 XNOR X2 = NOT [ (X1 + X2).(X1' + X2') ] NN representation = ?
• 90. SACON Back-Propagation  The back-propagation (BP) algorithm works by determining the loss (or error) at the output and then propagating it back into the network.  The weights are updated to minimize the error resulting from each neuron.
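A sketch of one BP step for a one-hidden-layer network with sigmoid units and squared-error loss (the loss is our choice for illustration; the slides do not fix one):

# Sketch: forward pass, backward pass, weight update.
import numpy as np

def sigmoid(z): return 1 / (1 + np.exp(-z))

def backprop_step(X, y, W1, W2, alpha=0.1):
    a1 = sigmoid(X @ W1)                     # forward: hidden activations
    a2 = sigmoid(a1 @ W2)                    # forward: network output
    d2 = (a2 - y) * a2 * (1 - a2)            # error at the output layer
    d1 = (d2 @ W2.T) * a1 * (1 - a1)         # error propagated back to the hidden layer
    W2 -= alpha * a1.T @ d2                  # update weights to reduce the error
    W1 -= alpha * X.T @ d1
    return W1, W2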
• 91. SACON Regularization: Dropout  At every iteration, dropout randomly selects some nodes and removes them, along with all of their incoming and outgoing connections.  We need to choose the dropout probability so that the model neither underfits nor overfits.
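A sketch of "inverted" dropout on one layer's activations; the rescaling by keep_prob is the standard trick so that no change is needed at test time:

# Sketch: dropout mask applied during training.
import numpy as np

def dropout(a, keep_prob=0.8):
    mask = np.random.rand(*a.shape) < keep_prob   # randomly remove nodes
    return a * mask / keep_prob                   # keep the expected activation unchanged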
• 92. SACON Deep Learning • Deep neural networks have recently been very successful in computer vision, natural language processing, speech recognition and many more fields. • Some of the important/successful networks are: • Convolutional Neural Network: has been very successful in computer vision. • Recurrent Neural Network: has been successful in natural language processing and speech recognition.
• 94. SACON Decision Tree  A Decision Tree is a supervised learning algorithm.  We split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant differentiator among the input variables. 1. Root Node: represents the entire population or sample; it further gets divided into two or more homogeneous sets. 2. Splitting: the process of dividing a node into two or more sub-nodes. 3. Decision Node: a sub-node that splits into further sub-nodes. 4. Leaf/Terminal Node: a node that does not split.
• 96. SACON Methods of Splitting: Information Gain  Which node can be described easily? Information theory defines a measure of this degree of disorganization in a system, known as Entropy: Entropy = −p log2 p − q log2 q, where p and q are the probabilities of success and failure respectively in that node.
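A sketch of the formula, checking the two extreme nodes:

# Sketch: entropy of a node with success probability p.
import numpy as np

def entropy(p):
    q = 1 - p
    return sum(-x * np.log2(x) for x in (p, q) if x > 0)

print(entropy(0.5))   # 1.0 -> maximally disorganized node
print(entropy(1.0))   # 0.0 -> pure node, easiest to describe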
• 97. SACON Other Tree-Based Methods  Trade-off management of bias-variance errors.  Ensemble methods combine a group of predictive models to achieve better accuracy and model stability.  Bagging is a simple ensembling technique in which we build many independent predictors/models/learners and combine them using model-averaging techniques.  Random Forest: multiple trees instead of a single tree; it is a bagging method.  To classify a new object based on its attributes, each tree gives a classification and we say the tree "votes" for that class.
• 98. SACON Other Tree-Based Methods  Boosting is an ensemble technique in which the predictors are not made independently, but sequentially.  Gradient Boosting is a tree ensemble technique that creates a strong classifier from a number of weak classifiers.  It combines weak learners in an additive model.
  • 99. SACON Iris Dataset  Three species of Iris (Iris setosa, Iris virginica and Iris versicolor).  Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
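The dataset ships with scikit-learn, so fitting a decision tree to it takes only a few lines (a sketch; the accuracy in the comment is typical, not guaranteed):

# Sketch: decision tree on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 species
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_tr, y_tr)
print(clf.score(X_te, y_te))               # typically around 0.9-1.0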
  • 100. SACON References • Andrew Ng’s Coursera Course • Scikit Learn Training example on Google • Nvidia • Sebastian Ruder’s blog • HBR • MIT Tech Review • Lots of Others • AI community in general • IDLI Community