MACHINE LEARNING
MODULE 3
NON-LINEAR LEARNING
THIS PRESENTATION IS ABOUT
 Introduction of Non-Linear Model
 Stochastic Vs Batch Gradient Descent
 Neural Network
 Model Representations
 Different Activation Functions
 Perceptron
 Multi Layer Perceptron
 Back Propagation
 Regularization : Variance Vs Bias
 Support Vector Machine (SVM)
 K-Nearest Neighbors (KNN)
INTRODUCTION TO NON-LINEAR MODELS
LINEAR MODEL
As long as the model is a linear combination of its weights, we call it a linear model in machine learning.
Linear Model Examples
LINEAR MODEL EXAMPLE
Linear Model Example:
NON-LINEAR MODEL
Non Linear Model Examples
…NON LINEAR EXAMPLE
NON-LINEAR EXAMPLE
NON-LINEAR MODEL
Can a linear model have a curved shape?
Yes.
"Linear" refers to a linear combination of the weights, NOT the shape of the function.
NON-LINEAR MODEL
LINEAR AND NON-LINEAR
STOCHASTIC GRADIENT DESCENT
• In this method one training sample (example) is passed through the neural network at a time, and the parameters (weights) of each layer are updated with the computed gradient.
• So, at a time a single training sample is passed through the network and its corresponding loss is computed. The parameters of all the layers of the network are updated after every training sample.
• For example, if the training set contains 100 samples then the parameters are updated 100 times, that is, once after every individual example is passed through the network.
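A minimal sketch of this per-sample update, assuming a one-weight linear model with squared-error loss standing in for the network (the data, learning rate and epoch count are illustrative assumptions, not taken from the slides):

```python
# Stochastic gradient descent: one parameter update per training sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 100)                    # 100 training samples, as in the slide's example
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 100)    # targets from a known line plus noise

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(20):
    for x_i, y_i in zip(X, y):                 # one sample at a time...
        error = (w * x_i + b) - y_i
        w -= lr * 2 * error * x_i              # ...parameters updated after every sample:
        b -= lr * 2 * error                    # 100 samples -> 100 updates per epoch
print(w, b)                                    # should end up close to 3.0 and 0.5
```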
ADVANTAGES OF STOCHASTIC GRADIENT DESCENT
1. It is easier to fit into memory, since only a single training sample is processed by the network at a time.
2. It is computationally fast, as only one sample is processed at a time.
3. For larger datasets it can converge faster, because it updates the parameters more frequently.
4. Due to the frequent updates, the steps taken towards the minimum of the loss function oscillate, which can help the search escape local minima of the loss function (in case the current position turns out to be a local minimum).
DISADVANTAGES OF STOCHASTIC GRADIENT DESCENT
1. Due to the frequent updates, the steps taken towards the minimum are very noisy. This can often lead the gradient descent in other directions.
2. Also, due to the noisy steps, it may take longer to achieve convergence to the minimum of the loss function.
3. Frequent updates are computationally expensive, since all resources are used to process one training sample at a time.
4. It loses the advantage of vectorized operations, as it deals with only a single example at a time.
BATCH GRADIENT DESCENT
• The concept of carrying out gradient descent is the same as for stochastic gradient descent. The difference is that instead of updating the parameters of the network after computing the loss of every training sample, the parameters are updated only once, after all the training examples have been passed through the network.
• For example, if the training dataset contains 100 training examples, then the parameters of the neural network are updated only once per pass over the data.
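The same toy setup with batch updates, again as an illustrative sketch rather than code from the slides; note there is exactly one parameter update per pass over all 100 samples:

```python
# Batch gradient descent: average the gradient over the whole training set,
# then update the parameters once.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 100)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 100)

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(200):
    errors = (w * X + b) - y                   # forward pass over ALL samples (vectorized)
    w -= lr * np.mean(2 * errors * X)          # single update from the average gradient
    b -= lr * np.mean(2 * errors)
print(w, b)
```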
ADVANTAGES OF BATCH GRADIENT DESCENT
1. Fewer oscillations and less noisy steps towards the global minimum of the loss function, because the parameters are updated using the average over all the training samples rather than the value of a single sample.
2. It can benefit from vectorization, which increases the speed of processing all training samples together.
3. It produces a more stable convergence and a more stable error gradient than stochastic gradient descent.
4. It is computationally efficient, since computing resources are used to process all training samples together rather than a single sample at a time.
DISADVANTAGES OF BATCH GRADIENT DESCENT
1. Sometimes a stable error gradient can lead to a local minimum, and unlike stochastic gradient descent there are no noisy steps to help get out of it.
2. The entire training set can be too large to process in the memory due to
which additional memory might be needed.
3. Depending on computer resources it can take too long for processing all
the training samples as a batch.
MINI BATCH GRADIENT DESCENT: A COMPROMISE
• This is a mixture of both stochastic and batch gradient descent. The training
set is divided into multiple groups called batches. Each batch has a number
of training samples in it.
• At a time a single batch is passed through the network which computes the
loss of every sample in the batch and uses their average to update the
parameters of the neural network.
• For example, say the training set has 100 training examples which is divided into 5 batches, with each batch containing 20 training examples. The update step is then performed 5 times per pass over the data (once per batch).
This combines the following advantages of both stochastic and batch gradient descent, which is why mini-batch gradient descent is the variant most commonly used in practice (a short code sketch follows the list below).
1. Easily fits in the memory
2. It is computationally efficient
3. Benefit from vectorization
4. If stuck in a local minimum, some noisy steps can lead the way out of it
5. Average of the training samples produces stable error gradients and
convergence.
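A sketch of the mini-batch variant on the same assumed toy model, using 5 batches of 20 samples to mirror the slide's example:

```python
# Mini-batch gradient descent: one update per batch, 5 updates per pass over the data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 100)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 100)

w, b, lr, batch_size = 0.0, 0.0, 0.2, 20
for epoch in range(50):
    order = rng.permutation(len(X))                     # shuffle once per epoch
    for start in range(0, len(X), batch_size):          # 5 batches of 20 samples
        idx = order[start:start + batch_size]
        errors = (w * X[idx] + b) - y[idx]
        w -= lr * np.mean(2 * errors * X[idx])          # update from the batch average
        b -= lr * np.mean(2 * errors)
print(w, b)
```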
NOISE IN GRADIENT DESCENT
NEURAL NETWORK
WHAT ARE NEURAL NETWORKS?
 Neural Networks are networks of neurons, for example, as
found in real (i.e. biological) brains
 Artificial neurons are crude approximations of the neurons
found in real brains. They may be physical devices, or purely
mathematical constructs.
 Artificial Neural Networks (ANNs) are networks of Artificial
Neurons and hence constitute crude approximations to parts
of real brains. They may be physical devices, or simulated on
conventional computers.
 From a practical point of view, an ANN is just a parallel
computational system consisting of many simple processing
elements connected together in a specific way in order to
perform a particular task
ADVANTAGES
 They are extremely powerful computational devices.
 Massive parallelism makes them very efficient.
 They can learn and generalize from training data – so
there is no need for enormous feats of programming.
 They are particularly fault tolerant.
 They are very noise tolerant – so they can cope with
situations where normal symbolic systems would have
difficulty
 In principle, they can do anything a symbolic/logic
system can do, and more.
BOOLEAN FUNCTIONS AND PERCEPTRON
TYPES OF NEURAL NETWORKS
THE NERVOUS SYSTEM
THE NEURON
• Dendrites receive the signals for the neuron.
• The axon transmits the signal from the neuron.
• Dendrites are connected to the axons of other neurons. Signals from one neuron pass down to the next neuron via the axon.
THE NEURON
The input layer shows all the independent variables for one observation.
THE NEURON
Output can be continuous, binary or categorical.
THE NEURON
If the output is categorical then we might get multiple outputs in the form of dummy variables.
E.g.: x1 = age, x2 = salary, ..., xm = name; Y = yes/no
Will the customer purchase a car?
THE NEURON
• Weights are adjusted by the process of learning.
• A weight decides the importance/strength of each signal.
• Training a neural network is based on adjusting the weights.
STEP 1: COMPUTATION OF WEIGHTED SUM OF
INPUT VALUES
Weighted sum of all Input Values
STEP 2: COMPUTATION OF ACTIVATION
FUNCTION
Activation function:
It decides whether or not a signal is passed on towards the output layer.
STEP 3: SIGNAL PASSED TO THE OUTPUT
The neuron passes that signal down to the next neuron in the line.
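A minimal sketch of steps 1–3 for a single artificial neuron; the input values, weights and the choice of a sigmoid activation are assumptions made for illustration:

```python
# One neuron: weighted sum -> activation -> signal passed on.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, activation):
    weighted_sum = np.dot(w, x)          # Step 1: weighted sum of the input values
    signal = activation(weighted_sum)    # Step 2: apply the activation function
    return signal                        # Step 3: signal passed down the line

x = np.array([0.8, 0.3, 0.5])            # independent variables for one observation
w = np.array([0.4, -0.2, 0.1])           # weights adjusted during training
print(neuron_forward(x, w, sigmoid))
```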
THE ACTIVATION FUNCTION
The threshold function outputs 1 if the weighted sum is >= 0, else 0.
THE ACTIVATION FUNCTION
• The sigmoid function is smooth compared to the threshold function.
• It can be useful when predicting probabilities.
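For comparison, the common activation functions written as plain functions (a sketch; the threshold and sigmoid functions come from the slides, while ReLU and tanh are added as widely used alternatives):

```python
import numpy as np

def threshold(z):                 # passes 1 if z >= 0, else 0
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):                   # smooth, outputs in (0, 1) - useful for probabilities
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                      # rectifier: max(0, z)
    return np.maximum(0.0, z)

def tanh(z):                      # smooth, outputs in (-1, 1)
    return np.tanh(z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (threshold, sigmoid, relu, tanh):
    print(f.__name__, f(z))
```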
THE ACTIVATION FUNCTION
BRAIN TEASER
Assuming that your dependent variable is binary, what activation function
will you use?
BRAIN TEASER : ANSWER
PRACTICAL APPLICATION
A rectifier (ReLU) function may be applied in the hidden layer, with a sigmoid function on the output layer.
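A sketch of that practical setup, a tiny two-layer network with a rectifier in the hidden layer and a sigmoid on the output layer (the layer sizes and random weights are placeholders, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 4 units -> 1 output

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)             # rectifier (ReLU) in the hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid on the output layer

x = np.array([0.8, 0.3, 0.5])                    # one observation
print(forward(x))                                # probability-like output in (0, 1)
```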
HOW DOES A NEURAL NETWORK WORK?
HOW NEURAL NETWORKS LEARN
 The goal is to create a network that learns on its own.
 How can you distinguish a cat from a dog?
 The network can learn this on its own.
HOW NEURAL NETWORKS LEARN
The cost function C measures the error of the network's output.
HOW NEURAL NETWORKS LEARN
We update the weights in order to reduce the cost function.
PERCEPTRON
…PERCEPTRON
PERCEPTRON MODEL REPRESENTATION
DIFFERENT ACTIVATION FUNCTIONS
MODEL REPRESENTATION:
EXAMPLE
DIFFERENT ACTIVATION FUNCTIONS
PERCEPTRON NODE – THRESHOLD LOGIC UNIT
Inputs $x_1, \dots, x_n$ arrive with weights $w_1, \dots, w_n$, and the node fires against a threshold $\theta$:
$$z = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i w_i \ge \theta \\ 0 & \text{if } \sum_{i=1}^{n} x_i w_i < \theta \end{cases}$$
• Learn weights such that an
objective function is maximized.
• What objective function should we
use?
• What learning algorithm should we
use?
PERCEPTRON LEARNING ALGORITHM
Example network: two inputs $x_1, x_2$ with initial weights $w_1 = 0.4$, $w_2 = -0.2$ and threshold $\theta = 0.1$, using the same rule
$$z = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i w_i \ge \theta \\ 0 & \text{if } \sum_{i=1}^{n} x_i w_i < \theta \end{cases}$$
Training set:
x1 | x2 | t
.8 | .3 | 1
.4 | .1 | 0
FIRST TRAINING INSTANCE
Input $x_1 = 0.8$, $x_2 = 0.3$, target $t = 1$, with weights $w_1 = 0.4$, $w_2 = -0.2$ and $\theta = 0.1$:
net = .8*.4 + .3*(-0.2) = 0.26
Since 0.26 ≥ θ, the output is z = 1, which matches the target, so the weights are not changed.
SECOND TRAINING INSTANCE
Input $x_1 = 0.4$, $x_2 = 0.1$, target $t = 0$, with the weights still $w_1 = 0.4$, $w_2 = -0.2$ and $\theta = 0.1$:
net = .4*.4 + .1*(-.2) = 0.14
Since 0.14 ≥ θ, the output is z = 1, but the target is t = 0, so the weights are updated with the perceptron rule
$$\Delta w_i = c\,(t - z)\,x_i$$
PERCEPTRON RULE LEARNING
$$\Delta w_i = c\,(t - z)\,x_i$$
where $w_i$ is the weight from input $i$ to the perceptron node,
$c$ is the learning rate,
$t$ is the target for the current instance,
$z$ is the current output,
and $x_i$ is the $i$-th input.
• Least perturbation principle
• Only change the weights if there is an error
• Use a small learning rate c rather than changing the weights enough to make the current pattern correct immediately
• Scale the change by xi
• Create a perceptron node with n inputs
• Iteratively apply a pattern from the training set and apply the
perceptron rule
• Each iteration through the training set is an epoch
• Continue training until total training set error ceases to improve
• Perceptron Convergence Theorem: Guaranteed to find a solution
in finite time if a solution exists
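A sketch of this loop applied to the two-pattern example above; the learning rate c = 0.1 is an assumption, since the slides do not state its value:

```python
# Perceptron rule training on the example patterns (initial weights 0.4 and -0.2,
# threshold 0.1, as in the slides).
def perceptron_output(x, w, theta):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

patterns = [([0.8, 0.3], 1),                     # (inputs x1, x2), target t
            ([0.4, 0.1], 0)]
w, theta, c = [0.4, -0.2], 0.1, 0.1

for epoch in range(10):                          # each pass through the set is an epoch
    errors = 0
    for x, t in patterns:
        z = perceptron_output(x, w, theta)
        if z != t:                               # only change weights on an error
            errors += 1
            for i in range(len(w)):
                w[i] += c * (t - z) * x[i]       # perceptron rule: dw_i = c (t - z) x_i
    if errors == 0:                              # stop once the training set error stops improving
        break
print(w)                                         # weights that classify both patterns correctly
```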
THE EXCLUSIVE OR PROBLEM
THE XOR PROBLEM
MULTI LAYER PERCEPTRON
BACKPROPAGATION
BACKPROPAGATION: BASIC STEPS
STEP 1: FORWARD PASS
FORWARD PASS
STEP 2: BACKWARD PASS
BACKWARD PASS
STEP 3: UPDATING THE WEIGHTS
…UPDATING THE WEIGHTS
SOLVED BACKPROPAGATION
Rough Work
VARIANCE VS BIAS
If there is no bias weight, the hyperplane must pass through the origin. (This "bias" is the neuron's bias weight, distinct from the statistical bias discussed below.)
VARIANCE VS BIAS
In the training phase
BIAS AND VARIANCE
What is Bias?
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model. It always leads to high error on both training and test data.
What is Variance?
Variance is the variability of the model's prediction for a given data point, which tells us the spread of our predictions. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
WHY IS THERE A BIAS-VARIANCE TRADEOFF?
If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is going to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data.
Total Error = Bias² + Variance + Irreducible Error
An optimal balance of bias and variance would never overfit or underfit the model.
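A small simulation sketch of the tradeoff, estimating bias² and variance of polynomial models of increasing complexity on a synthetic task (the task and the use of polynomial fits are assumptions made purely for illustration):

```python
# Fit models of increasing complexity to many noisy datasets drawn from the same
# true function, then measure bias^2 and variance of the predictions at one point.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)      # the function we are trying to learn
x_query, noise_sd = 0.3, 0.2
n_datasets, n_points = 500, 30

for degree in (1, 3, 6):                      # simple -> flexible polynomial models
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_sd, n_points)
        coeffs = np.polyfit(x, y, degree)     # fit a polynomial of this degree
        preds.append(np.polyval(coeffs, x_query))
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_f(x_query)) ** 2
    variance = preds.var()
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Typically bias² shrinks and variance grows as the degree increases, which is the tradeoff summarized by the total-error formula above.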
SUPPORT VECTOR MACHINE
Reference Links:
https://www.youtube.com/watch?v=efR1C6CvhmE&t=255s
https://www.youtube.com/watch?v=Toet3EiSFcM&t=7s
https://www.youtube.com/watch?v=Qc5IyLW_hns&t=52s
INTRODUCTION
SUPPORT VECTOR MACHINES
HYPERPLANE AS DECISION SURFACE
SUPPORT VECTORS
MAXIMIZING THE MARGIN
SUPPORT VECTOR MACHINE (SVM)
MAXIMUM MARGIN : FORMALIZATION
GEOMETRIC MARGIN
LINEAR SUPPORT VECTOR MACHINE
LINEAR SVM: THE LINEARLY SEPARABLE CASE
NON-LINEAR SVM
NON-LINEAR SVM: FEATURE SPACE
KERNELS
K-NEAREST NEIGHBOR
Simple Analogy..
• Tell me about your friends (who your neighbors are) and I will tell you who you are.
Instance-based Learning
It's very similar to a Desktop!!
KNN – DIFFERENT NAMES
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
WHAT IS KNN?
• A powerful classification algorithm used in pattern
recognition.
• K nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
• One of the top data mining algorithms used today.
• A non-parametric lazy learning algorithm (An
Instance-based Learning method).
KNN: CLASSIFICATION APPROACH
• An object (a new instance) is classified by a majority vote of its neighbors' classes.
• The object is assigned to the most common class among its K nearest neighbors (measured by a distance function).
DISTANCE MEASURE
Diagram: compute the distance from the test record to all training records, then choose the k "nearest" records.
DISTANCE FUNCTIONS FOR CONTINUOUS VARIABLES
DISTANCE BETWEEN NEIGHBORS
• Calculate the distance between new example
(E) and all examples in the training set.
• Euclidean distance between two examples.
– X = [x1,x2,x3,..,xn]
– Y = [y1,y2,y3,...,yn]
– The Euclidean distance between X and Y is defined as
$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
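The same formula as a tiny helper function (a plain-Python sketch):

```python
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([1, 2, 3], [4, 6, 3]))   # -> 5.0
```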
K-NEAREST NEIGHBOR ALGORITHM
• Each instance is represented with a set of numerical
attributes.
• Each training example consists of a feature vector and an associated class label.
• Classification is done by comparing the feature vectors of the K nearest points.
• Select the K-nearest examples to E in the training set.
• Assign E to the most common class among its K-nearest
neighbors.
All the instances correspond to points in an n-
dimensional feature space.
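Putting these steps together, a minimal k-NN classifier sketch using a majority vote over the K nearest examples (the toy points and labels are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(query, training_data, k=3):
    """training_data is a list of (feature_vector, class_label) pairs."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbors = sorted(training_data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)      # majority vote among the k nearest
    return votes.most_common(1)[0][0]

data = [([1.0, 1.0], 'A'), ([1.2, 0.8], 'A'), ([3.0, 3.2], 'B'), ([3.1, 2.9], 'B')]
print(knn_predict([1.1, 0.9], data, k=3))                  # -> 'A'
```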
3-KNN: EXAMPLE(1)
HOW TO SELECT K?
• If K is too small it is sensitive to noise points.
• Larger K works well. But too large K may include
majority points from other classes.
• Rule of thumb is K < sqrt(n), n is number of examples.
Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor.
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
KNN FEATURE WEIGHTING
• Scale each feature by its importance for
classification.
• Can use our prior knowledge about which features
are more important
• Can learn the weights wk using cross‐validation
FEATURE NORMALIZATION
• Distance between neighbors could be dominated
by some attributes with relatively large
numbers.
e.g., income of customers in our previous example.
• This arises when two features are on different scales.
• It is important to normalize those features.
– Map values to the range 0–1.
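A sketch of this min-max normalization, applied to the Age values from the loan example later in this section:

```python
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]          # map every value into [0, 1]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
print(min_max_normalize(ages))                             # 20 -> 0.0, 60 -> 1.0, 25 -> 0.125, ...
```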
NOMINAL/CATEGORICAL DATA
• Distance works naturally with numerical attributes.
• Binary categorical attributes can be encoded as 1 or 0.
KNN CLASSIFICATION
Figure: scatter plot of Age (x-axis, 0–70) against Loan amount (y-axis, $0–$250,000), with each point labeled Default or Non-Default.
KNN CLASSIFICATION – DISTANCE
Age | Loan | Default | Distance
25 | $40,000 | N | 102000
35 | $60,000 | N | 82000
45 | $80,000 | N | 62000
20 | $20,000 | N | 122000
35 | $120,000 | N | 22000
52 | $18,000 | N | 124000
23 | $95,000 | Y | 47000
40 | $62,000 | Y | 80000
60 | $100,000 | Y | 42000
48 | $220,000 | Y | 78000
33 | $150,000 | Y | 8000
48 | $142,000 | ? |
$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
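A sketch reproducing the distance column above for the query (Age = 48, Loan = $142,000); with k = 1 the nearest neighbor is the (33, $150,000) case, so the query is classified as Default = Y:

```python
import math

training = [  # (age, loan, default) rows from the table above
    (25, 40000, 'N'), (35, 60000, 'N'), (45, 80000, 'N'), (20, 20000, 'N'),
    (35, 120000, 'N'), (52, 18000, 'N'), (23, 95000, 'Y'), (40, 62000, 'Y'),
    (60, 100000, 'Y'), (48, 220000, 'Y'), (33, 150000, 'Y'),
]
query = (48, 142000)

def dist(row):
    return math.sqrt((row[0] - query[0]) ** 2 + (row[1] - query[1]) ** 2)

nearest = min(training, key=dist)
print(nearest, round(dist(nearest)))     # -> (33, 150000, 'Y') at distance ~8000,
                                         #    dominated by the un-normalized Loan attribute
```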
KNN CLASSIFICATION – STANDARDIZED DISTANCE
Features are first standardized with min-max scaling:
$$X_s = \frac{X - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$
Age | Loan | Default | Distance
0.125 | 0.11 | N | 0.7652
0.375 | 0.21 | N | 0.5200
0.625 | 0.31 | N | 0.3160
0 | 0.01 | N | 0.9245
0.375 | 0.50 | N | 0.3428
0.8 | 0.00 | N | 0.6220
0.075 | 0.38 | Y | 0.6669
0.5 | 0.22 | Y | 0.4437
1 | 0.41 | Y | 0.3650
0.7 | 1.00 | Y | 0.3861
0.325 | 0.65 | Y | 0.3771
0.7 | 0.61 | ? |
STRENGTHS OF KNN
• Very simple and intuitive.
• Can be applied to the data from any distribution.
• Good classification if the number of samples is large enough.
Weaknesses of KNN
• Takes more time to classify a new example.
 need to calculate and compare distance from new
example to all other examples.
• Choosing k may be tricky.
• Need large number of samples for accuracy.
SOLVED K-NEAREST NEIGHBOR