Dynamically updated parallel k-NN
search algorithm using MPI
Keywords:
 MPI (Message Passing Interface)
 Machine Learning
 Classification
 k-NN
 Clustering
 K-means
# MPI (Message Passing Interface)
 MPI is an industry standard that specifies the library routines needed to write
message-passing programs.
 MPI uses a library approach to support parallel programming (a minimal sketch
follows this list).
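The slides do not name an implementation language, so the sketches added to these notes use Python with the mpi4py bindings and NumPy; this is an assumption for illustration only, not the authors' implementation. A minimal point-to-point message-passing example:

```python
# Minimal mpi4py sketch: two processes exchange a message.
# Run with: mpirun -n 2 python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD      # communicator holding all started processes
rank = comm.Get_rank()     # id of this process within the communicator
size = comm.Get_size()     # total number of processes

if rank == 0:
    # Rank 0 sends a small Python object to rank 1.
    comm.send({"msg": "hello from rank 0"}, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print(f"rank 1 of {size} received: {data}")
```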
# Machine Learning
 The ability of a system to learn without being explicitly programmed.
 The system becomes more proficient at a task as it gains experience over time.
# Classification
 A supervised machine learning approach.
 Supervised in the sense that the class of each training instance is known.
 The goal is to predict the class of a test instance on the basis of some similarity measure.
k-NN Classifier
 A well-known classification approach.
 Based on the assumption that an instance shares the class of the instances closest to it
in the feature space.
 To classify a test instance, find its k nearest neighbors according to some similarity
measure and assign the class held by the majority of them (see the sketch after this list).
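A minimal serial k-NN classifier in NumPy, to make the majority-vote rule concrete; the tiny data set is hypothetical:

```python
import numpy as np

def knn_classify(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training instances."""
    # Euclidean distance from the test instance to every training instance.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training instances.
    nearest = np.argsort(dists)[:k]
    # Majority class among those neighbors.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny illustrative data (hypothetical).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([4.8, 5.0]), k=3))  # -> 1
```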
# Clustering
 An unsupervised machine learning approach.
 Unsupervised because the class of the training instances is not known.
 Partition the data so that instances within a group are similar to each other and
dissimilar to instances in other groups.
K-means Clustering
 A clustering approach that groups the training instances into k clusters, with the value
of k provided by the user.
 Clustering proceeds by minimizing an SSE (sum of squared errors) objective (see the
sketch after this list).
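A short sketch of Lloyd's k-means with the SSE objective, assuming Euclidean distance (the slides do not fix the similarity measure):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's k-means: alternate assignment and centroid update, reporting SSE."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    for _ in range(n_iters):
        # Assign each instance to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned instances
        # (keep the old centroid if a cluster becomes empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    sse = ((X - centroids[labels]) ** 2).sum()  # sum of squared errors objective
    return labels, centroids, sse
```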
Problems with k-NN
 High time complexity: a naive search compares every test instance against the entire
training set.
 Sensitive to the local structure of the data.
 Curse of dimensionality.
# Our Proposed Approach: parallel k-NN using MPI
 Pre-processing step
 Perform a clustering process on the training set to divide it into p mutually
exclusive partitions {P1, P2, …, Pp}, where p is the number of processes.
 Create a Representative Instance (R.I.) for each partition (a sketch follows this list).
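A hedged sketch of the pre-processing step: rank 0 clusters the training set into p partitions and scatters one partition to each MPI process, with each centroid acting as that partition's Representative Instance. The helper kmeans() is the sketch above; X_train and y_train are hypothetical arrays, and the per-partition max radius is kept for the update step described later.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, p = comm.Get_rank(), comm.Get_size()

if rank == 0:
    labels, centroids, _ = kmeans(X_train, k=p)
    # One (instances, labels) pair per MPI process.
    parts = [(X_train[labels == j], y_train[labels == j]) for j in range(p)]
else:
    parts, centroids = None, None

my_X, my_y = comm.scatter(parts, root=0)       # local partition P_rank
rep_instances = comm.bcast(centroids, root=0)  # all R.I.s known to every process

# Max radius of the local partition, stored for the later "update training set" check.
my_radius = np.linalg.norm(my_X - rep_instances[rank], axis=1).max()
```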
# Step II
For i = 1 to p
 Apply the k-means approach.
 Evaluate the nearest-neighbor similarity of the training instances to the representative
instance (centroid) of each partition.
 Perform
 Competence Enhancement – Repeated Wilson Editing rule (noise removal)
 Competence Preservation (removal of superfluous instances)
 Store the outliers of each cluster separately.
 Update the centroid of the cluster.
 Repeat Steps I and II until the number of instances in the selected partition is >= k
(a sketch of the editing step follows this list).
# Step III
 Take a test instance.
 Select the partition whose R.I. is closest to the test instance.
 Repeat until the last sub-partition is reached.
 Apply the majority rule.
 Assign the test instance the class label held by the majority of its neighbors
(a sketch follows this list).
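A hedged sketch of the query step: the test instance is broadcast, each process checks whether its Representative Instance is the closest one, the owning process runs a local k-NN vote, and the prediction is shared with all ranks. It reuses rep_instances, my_X, my_y from the pre-processing sketch and knn_classify() from the earlier k-NN sketch; the recursion into sub-partitions is omitted.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 0 holds the (hypothetical) query instance; every rank receives it.
if rank == 0:
    x_test = np.array([4.8, 5.0])
else:
    x_test = None
x_test = comm.bcast(x_test, root=0)

# All ranks know every R.I., so each can decide which partition owns the query.
owner = int(np.argmin(np.linalg.norm(rep_instances - x_test, axis=1)))

# Only the owning rank searches its local partition; the label is then broadcast.
label = knn_classify(my_X, my_y, x_test, k=3) if rank == owner else None
label = comm.bcast(label, root=owner)
```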
# Updating the training set
 Triggered when the distance between a new test instance and the R.I. of a partition
exceeds the maximum radius value stored for that partition during the pre-processing
step.
 Update only the R.I. of the partition that is closest to the new test instance
(a sketch follows this list).
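One possible reading of the update rule, sketched as a hedged example: if a new instance lies outside the stored maximum radius of its closest partition, only that partition's Representative Instance is adjusted (here with an incremental mean) and the stored radius is enlarged. The function name, the incremental-mean rule, and the counts array are assumptions, not the slides' exact procedure.

```python
import numpy as np

def maybe_update_representative(x_new, rep_instances, max_radius, counts):
    """Hedged sketch of the dynamic update: fold an out-of-radius instance into
    the closest partition's R.I. (incremental mean) and enlarge its radius."""
    dists = np.linalg.norm(rep_instances - x_new, axis=1)
    j = int(np.argmin(dists))                  # closest partition
    if dists[j] > max_radius[j]:               # instance lies outside stored radius
        counts[j] += 1
        # Incremental mean: shift only this partition's R.I. toward the new instance.
        rep_instances[j] += (x_new - rep_instances[j]) / counts[j]
        max_radius[j] = max(max_radius[j], np.linalg.norm(rep_instances[j] - x_new))
    return rep_instances, max_radius, counts
```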
# Research papers considered in designing the
dynamically updated parallel k-NN using MPI
# For pre-processing (clustering process):
 Efficient and Fast Initialization Algorithm for K-means Clustering
By Mohammed El Agha and Wesam M. Ashour,
Islamic University of Gaza, Gaza, Palestine
 A new algorithm for initial cluster centers in k-means algorithm
By Murat Erisoglu, Nazif Calis, Sadullah Sakallioglu
Department of Statistics, Faculty of Science and Letters,
Cukurova University, 01300 Adana, Turkey
 An empirical comparison of four initialization methods for the K-Means algorithm
By J.M. Peña, J.A. Lozano, P. Larrañaga,
Department of Computer Science and Artificial Intelligence,
Intelligent Systems Group, University of the Basque Country,
P.O. Box 649, E-20080 San Sebastian, Spain
# For finding k-NN and removing noise and superfluous instances:
 Fast Condensed Nearest Neighbor Rule
By Fabrizio Angiulli
ICAR-CNR, Via Pietro Bucci 41C, 87036 Rende (CS), Italy
 Advances in Instance Selection for Instance-Based Learning Algorithms
By Henry Brighton, Language Evolution and Computation Research Unit,
Department of Theoretical and Applied Linguistics,
The University of Edinburgh, Edinburgh EH8 9LL, UK,
and Chris Mellish, Department of Artificial Intelligence,
The University of Edinburgh, Edinburgh EH1 1HN, UK
 Superlinear Parallelization of k-Nearest Neighbor Retrieval
By Antal van den Bosch and Ko van der Sloot, ILK Research Group,
Dept. of Communication and Information Sciences,
Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg,
The Netherlands
 Parallel Algorithms on Nearest Neighbor Search
By Berkay Aydin, Georgia State University
 K-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local
Information into Global Optimization
By Chris Ding and Xiaofeng He
 Instance-based classifiers applied to medical databases: Diagnosis and knowledge
extraction
By Francesco Gagliardi, Department of Philosophy,
University of Rome
Thank You!...