3. # MPI (Message Passing Interface)
MPI is an industry standard that specifies the library routines needed for writing
message-passing programs.
MPI uses a library approach to support parallel programming.
4. # Machine Learning
The ability of a system to learn without being explicitly programmed.
To become more proficient at a task through experience over time.
5. # Classification
A supervised machine learning approach.
Supervised in the sense that we know the class to which each training
instance belongs.
The goal is to predict the class of a test instance on the basis of some similarity measure.
6. # k-NN Classifier
A well-known classification approach.
Based on the assumption that an instance belongs to the same class as the
instances closest to it in the feature space.
To classify a test instance, find its k nearest neighbors according to some
similarity measure and classify by majority vote.
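The rule above can be sketched in a few lines of Python (a minimal single-machine illustration with toy data, not the parallel implementation proposed later; the function name `knn_classify` is ours):

```python
from collections import Counter
import math

def knn_classify(train, labels, x, k=3):
    """Classify x by majority vote among its k nearest training instances.
    train: list of feature vectors; labels: parallel list of class labels."""
    # Sort all training instances by Euclidean distance to x.
    dists = sorted((math.dist(p, x), y) for p, y in zip(train, labels))
    # Majority vote over the k closest labels.
    top = [y for _, y in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# Toy 2-D data: class 'a' near the origin, class 'b' near (5, 5).
train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ['a', 'a', 'a', 'b', 'b', 'b']
print(knn_classify(train, labels, (0.5, 0.5), k=3))  # 'a'
```

Note that every query scans the whole training set; this linear scan is exactly the time-complexity problem the proposed partitioned approach targets.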
7. # Clustering
An unsupervised machine learning approach.
Unsupervised in the sense that we do not know the classes to which the
training instances belong.
Partition the data so that instances within one group are more similar to
each other than to instances in other groups.
8. # K-means Clustering
A clustering approach that partitions the training instances into k clusters,
with the value of k provided by the user.
Clustering is driven by minimizing an SSE (sum of squared errors) objective.
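A minimal k-means sketch with the SSE objective made explicit (toy data and the `kmeans`/`sse` names are illustrative assumptions, not the initialization methods cited later):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster; repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive random initialization
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: centroid = mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, assign

def sse(points, centroids, assign):
    """Sum of squared errors: the objective k-means minimizes."""
    return sum(math.dist(p, centroids[a]) ** 2
               for p, a in zip(points, assign))

pts = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
cents, assign = kmeans(pts, k=2)
print(sse(pts, cents, assign))
```

The naive random initialization here is exactly what the initialization papers listed at the end try to improve on.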
9. # Problems with k-NN
High time complexity (every query scans the whole training set).
Sensitivity to the local structure of the data.
The curse of dimensionality.
10. # Our Proposed Approach: solving k-NN in parallel using MPI
Pre-processing step:
Perform clustering on the training set to divide it into p mutually
exclusive partitions {P1, P2, …, Pp}, where p is the number of processes.
Create a representative instance (R.I.) to represent each partition.
11. # Step II
For i = 1 to p:
Apply the k-means approach.
Evaluate the nearest-neighbor similarity of the training instances with the
representative instance (centroid) of each partition.
Perform:
Competence enhancement – the repeated Wilson editing rule (noise removal).
Competence preservation (removal of superfluous instances).
Store the outliers of each cluster separately.
Update the centroid of the cluster.
Repeat Steps I and II until the number of instances in the selected partition is >= k.
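The competence-enhancement step above can be sketched as a single-process repeated Wilson editing pass (the `wilson_edit` name and toy data are illustrative; the actual rule and its guarantees are in the Brighton & Mellish paper cited later):

```python
from collections import Counter
import math

def wilson_edit(train, labels, k=3):
    """Repeated Wilson editing: repeatedly drop every instance whose class
    disagrees with the majority vote of its k nearest neighbors among the
    remaining instances; stop when a full pass removes nothing."""
    data = list(zip(train, labels))
    changed = True
    while changed and len(data) > k:
        keep = []
        for i, (p, y) in enumerate(data):
            others = [d for j, d in enumerate(data) if j != i]
            neighbors = sorted(others, key=lambda d: math.dist(d[0], p))[:k]
            vote = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
            if vote == y:  # keep only instances their neighbors agree with
                keep.append((p, y))
        changed = len(keep) != len(data)
        data = keep
    return [p for p, _ in data], [y for _, y in data]

# A 'b'-labelled point sitting inside the 'a' cluster is treated as noise.
train = [(0, 0), (1, 0), (0, 1), (0.5, 0.5), (9, 9), (9, 8), (8, 9)]
labels = ['a', 'a', 'a', 'b', 'b', 'b', 'b']
clean_pts, clean_lbls = wilson_edit(train, labels, k=3)
print(len(clean_pts))  # 6: the noisy instance is removed
```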
12. # Step III
Take a test instance.
Select the partition whose R.I. is closest to the test instance.
Repeat until the last sub-partition is reached.
Apply the majority rule: assign the test instance the class label that holds
the majority among its k nearest neighbors.
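Step III can be sketched as a descent through nested partitions, assuming a hypothetical tree structure in which each node stores its R.I., its sub-partitions, and (at the leaves) its training instances (the `ri`/`children`/`instances` keys are our assumptions, not the paper's data layout):

```python
from collections import Counter
import math

def classify(node, x, k=3):
    """Descend the partition tree: at each level pick the sub-partition whose
    representative instance (R.I.) is closest to x; at a leaf partition,
    take the majority class among the k nearest training instances."""
    while node.get('children'):  # internal node: follow the closest R.I.
        node = min(node['children'], key=lambda c: math.dist(c['ri'], x))
    nearest = sorted(node['instances'],
                     key=lambda inst: math.dist(inst[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical two-partition tree; 'ri' is each partition's centroid.
tree = {'children': [
    {'ri': (0.3, 0.3), 'children': [],
     'instances': [((0, 0), 'a'), ((1, 0), 'a'), ((0, 1), 'a')]},
    {'ri': (8.7, 8.7), 'children': [],
     'instances': [((8, 8), 'b'), ((9, 8), 'b'), ((9, 9), 'b')]},
]}
print(classify(tree, (1, 1)))  # 'a'
```

Because only one branch is followed per level, the query touches a single partition's instances instead of the full training set.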
13. # Updating the Training Set
When the distance of a new test instance from the R.I.s of the partitions
exceeds the maximum radius value stored during the pre-processing step,
update only the R.I. of the partition closest to the new test instance.
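A minimal sketch of this update check, assuming each partition stores its R.I., member instances, and max radius, and assuming (our choice, not stated in the slides) that the R.I. is recomputed as the mean of the members:

```python
import math

def maybe_update_ri(partitions, x):
    """If the new instance x falls outside every partition's stored max
    radius, fold x into the closest partition and recompute that
    partition's R.I. as the mean of its members (an assumed update rule)."""
    inside = any(math.dist(p['ri'], x) <= p['radius'] for p in partitions)
    if inside:
        return False  # x is already covered; nothing to update
    closest = min(partitions, key=lambda p: math.dist(p['ri'], x))
    closest['members'].append(x)
    n = len(closest['members'])
    closest['ri'] = tuple(sum(c) / n for c in zip(*closest['members']))
    closest['radius'] = max(math.dist(closest['ri'], m)
                            for m in closest['members'])
    return True

parts = [{'ri': (0.0, 0.0), 'radius': 1.5,
          'members': [(0, 0), (1, 0), (0, 1)]}]
print(maybe_update_ri(parts, (3.0, 0.0)))  # True: outside radius, R.I. moved
```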
14. # Research papers considered in designing the
dynamically updated parallel k-NN using MPI
# For Preprocessing (Clustering process):
Efficient and Fast Initialization Algorithm for K-means Clustering
By Mohammed El Agha and Wesam M. Ashour,
Islamic University of Gaza, Gaza, Palestine
A new algorithm for initial cluster centers in k-means algorithm
By Murat Erisoglu, Nazif Calis, and Sadullah Sakallioglu,
Department of Statistics, Faculty of Science and Letters,
Cukurova University, 01300 Adana, Turkey
An empirical comparison of four initialization methods for the K-Means algorithm
By J.M. Peña, J.A. Lozano, and P. Larrañaga,
Department of Computer Science and Artificial Intelligence,
Intelligent Systems Group, University of the Basque Country,
P.O. Box 649, E-20080 San Sebastian, Spain
15. # For finding k-NN and removing noise and superfluous instances:
Fast Condensed Nearest Neighbor Rule
By Fabrizio Angiulli,
ICAR-CNR, Via Pietro Bucci 41C, 87036 Rende (CS), Italy
Advances in Instance Selection for Instance-Based Learning Algorithms
By Henry Brighton, Language Evolution and Computation Research Unit,
Department of Theoretical and Applied Linguistics,
The University of Edinburgh, Edinburgh EH8 9LL, UK
and Chris Mellish, Department of Artificial Intelligence,
The University of Edinburgh, Edinburgh EH1 1HN, UK
16. Superlinear Parallelization of k-Nearest Neighbor Retrieval
By Antal van den Bosch and Ko van der Sloot,
ILK Research Group, Dept. of Communication and Information Sciences,
Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg,
The Netherlands
Parallel Algorithms on Nearest Neighbor Search
By Berkay Aydin, Georgia State University
K-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local
Information into Global Optimization
By Chris Ding and Xiaofeng He
Instance-based classifiers applied to medical databases: Diagnosis and
knowledge extraction
By Francesco Gagliardi, Department of Philosophy, University of Rome