QRSVM (Fast and Communication-Efficient Algorithm for Distributed Support Vector Machine Training) Screenshots
Venkat Java Projects
Mobile:+91 9966499110
Visit:www.venkatjavaprojects.com Email:venkatjavaprojects@gmail.com
Fast and Communication-Efficient Algorithm for Distributed Support
Vector Machine Training
Support Vector Machine (SVM) is a machine learning algorithm used for classification, and it is widely used across many fields: in healthcare to classify whether a patient has heart disease or not, in image recognition to recognize human faces, in object detection from images, and so on. SVM is trained on past data (a dataset) to build a training model, and whenever the user supplies new data, SVM applies the trained model to that test data to predict or classify its class. However, this algorithm takes much longer to build the training model as the dataset grows, and to overcome this problem various parallel algorithms were introduced, but those algorithms do not address memory usage.
To overcome the above problem, in this paper the authors introduce a new distributed algorithm for SVM training that is scalable and communication-efficient. The algorithm uses a compact representation of the kernel matrix, based on the QR decomposition of low-rank approximations, to reduce both the computation and storage requirements of the training stage. This also brings a considerable reduction in the communication required by a distributed implementation of the algorithm. In short, this technique is called the QRSVM framework.
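The core idea, a compact kernel representation via the QR decomposition of a low-rank factor, can be illustrated with a minimal Java sketch. This is not the paper's actual implementation: the tiny dataset, the RBF width gamma, and the choice of keeping only the first k kernel columns as the low-rank factor are all illustrative assumptions.

```java
import java.util.Arrays;

public class LowRankQRSketch {
    // RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)
    static double rbf(double[] x, double[] z, double gamma) {
        double s = 0;
        for (int i = 0; i < x.length; i++) { double d = x[i] - z[i]; s += d * d; }
        return Math.exp(-gamma * s);
    }

    // Thin QR of an n x k matrix via modified Gram-Schmidt; returns Q and fills R.
    static double[][] thinQR(double[][] a, double[][] r) {
        int n = a.length, k = a[0].length;
        double[][] q = new double[n][k];
        for (int i = 0; i < n; i++) q[i] = Arrays.copyOf(a[i], k);
        for (int j = 0; j < k; j++) {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += q[i][j] * q[i][j];
            norm = Math.sqrt(norm);
            r[j][j] = norm;
            for (int i = 0; i < n; i++) q[i][j] /= norm;
            for (int l = j + 1; l < k; l++) {
                double dot = 0;
                for (int i = 0; i < n; i++) dot += q[i][j] * q[i][l];
                r[j][l] = dot;
                for (int i = 0; i < n; i++) q[i][l] -= dot * q[i][j];
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 2}};
        int n = data.length, k = 2;        // keep only k kernel columns (low-rank factor)
        double[][] c = new double[n][k];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < k; j++)
                c[i][j] = rbf(data[i], data[j], 0.5);
        double[][] r = new double[k][k];
        double[][] q = thinQR(c, r);
        // Q has orthonormal columns, so Q^T Q should be close to the identity
        double dot01 = 0;
        for (int i = 0; i < n; i++) dot01 += q[i][0] * q[i][1];
        System.out.println("Q^T Q off-diagonal ~ " + dot01);
    }
}
```

Storing the small Q and R factors instead of the full n-by-n kernel matrix is what cuts the memory and communication cost in this style of algorithm.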
This algorithm consists of two parts: Distributed QR Decomposition and Parallel Dual Ascent.
1) Distributed QR Decomposition: this module reads the large dataset, prepares a matrix, and decomposes that matrix into multiple partitions; each partition is then distributed to a different core (a computer that may be running at another location). Each core takes its partition data, applies distributed SVM training to it to generate a model, and sends the generated model back to the master core (the partition sender). The master core gathers the models from all cores. In the existing technique a single machine was responsible for processing the entire large dataset, but in the proposed technique the same large dataset is distributed to multiple cores to reduce execution time. In simple terms, the proposed approach shares the data-processing task among multiple systems.
2) Parallel Dual Ascent: using this module, all generated partitions are sent to different cores. This module is responsible for carrying out the data processing in parallel and sending the output to the master core for gathering.
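The scatter, process, and gather flow described in these two parts can be sketched in plain Java, with threads standing in for the remote cores. This is a hedged illustration: the per-partition work here is just a sum, where the real cores would run SVM training on their partition.

```java
import java.util.*;
import java.util.concurrent.*;

public class PartitionGatherSketch {
    // Splits the rows into `cores` partitions, processes each on its own thread,
    // and gathers the per-partition results on the "master" side.
    static double scatterGather(double[][] rows, int cores) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int per = rows.length / cores;
        List<Future<Double>> results = new ArrayList<>();
        for (int c = 0; c < cores; c++) {
            final int start = c * per;
            final int end = (c == cores - 1) ? rows.length : start + per;
            // Each "core" touches only its own partition; a real core would
            // train an SVM here and return the model instead of a sum.
            results.add(pool.submit(() -> {
                double partial = 0;
                for (int i = start; i < end; i++)
                    for (double v : rows[i]) partial += v;
                return partial;
            }));
        }
        double gathered = 0;
        for (Future<Double> f : results) gathered += f.get(); // master gathers outputs
        pool.shutdown();
        return gathered;
    }

    public static void main(String[] args) throws Exception {
        double[][] rows = new double[8][3];
        for (int i = 0; i < rows.length; i++) Arrays.fill(rows[i], i + 1.0);
        System.out.println("gathered = " + scatterGather(rows, 2));
    }
}
```

The key property is that no single worker ever holds the whole dataset, which mirrors the memory savings the distributed design is after.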
The two techniques above work using two algorithms: the first is responsible for decomposing the matrix into partitions, and the second for executing all partitions via parallel computation. The resulting output (the trained model) is sent back to the master core to be gathered.
To implement this project we use the Forest Covertype dataset, which consists of 20,000 records, each with 55 columns. This dataset describes forest information such as forest size, the soil type of the forest, and so on. We use this dataset for the decomposition and parallel computation.
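Reading one such comma-separated record into a numeric row might look like the following sketch. The column layout (features first, class label last) and the shortened sample line are assumptions for illustration; a real run would stream all 20,000 records, e.g. with a BufferedReader.

```java
public class CovtypeLoaderSketch {
    // Parses one comma-separated record: the leading values are features,
    // and (by assumption here) the last value is the class label.
    static double[] parseRecord(String line) {
        String[] parts = line.split(",");
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) row[i] = Double.parseDouble(parts[i].trim());
        return row;
    }

    public static void main(String[] args) {
        String sample = "2596,51,3,258,0,510,2"; // shortened record for illustration
        double[] row = parseRecord(sample);
        System.out.println("features=" + (row.length - 1) + " label=" + row[row.length - 1]);
    }
}
```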
To implement this application we design three modules:
1) Parallel Distributed Core1: this module takes data from ‘Parallel Dual Ascent’ and processes the matrix with parallel computation. It receives data in the form of partitions and creates a separate thread to process each partition in parallel.
2) Parallel Distributed Core2: this does the same job as Core1. To demonstrate distributed processing we need multiple systems, so we design two applications, each acting as one system. If you wish, you can run cores 1 and 2 on the same system or on different systems. Whenever the master core sends partitions, these two cores receive and process them.
3) Master Core: this application is responsible for reading the dataset, building a matrix, decomposing that matrix into partitions, and sending all partitions to cores 1 and 2 to generate the training model.
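Since the master core and the worker cores run as separate applications (each with its own ‘run.bat’) and may sit on different systems, they presumably exchange partitions over the network. The sketch below shows one such exchange over a local TCP socket, playing both roles in one process for illustration; the port number, the line-based protocol, and the reply format are all hypothetical, not the project's actual wire format.

```java
import java.io.*;
import java.net.*;

public class MasterCoreSketch {
    // Sends one partition label to a worker "core" over a local TCP socket
    // and returns the core's reply.
    static String roundTrip(String partition, int port) throws Exception {
        Thread core = new Thread(() -> {
            try (ServerSocket server = new ServerSocket(port);
                 Socket s = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String received = in.readLine();        // core receives its partition
                out.println("model-for-" + received);   // a real core would train and report accuracy
            } catch (IOException e) { throw new UncheckedIOException(e); }
        });
        core.start();
        Thread.sleep(200);                              // crude wait for the server to come up
        String reply;
        try (Socket s = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            out.println(partition);                     // master sends the partition
            reply = in.readLine();                      // master gathers the result
        }
        core.join();
        return reply;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("master gathered: " + roundTrip("partition-1", 6000));
    }
}
```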
SVM Algorithm Details
Machine learning involves predicting and classifying data, and to do so we employ various machine learning algorithms according to the dataset. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support
vector machine classification. As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data.
Intuitively, the further from the hyperplane our data points lie, the more
confident we are that they have been correctly classified. We therefore want
our data points to be as far away from the hyperplane as possible, while still
being on the correct side of it.
So when new testing data is added, whichever side of the hyperplane it lands on will decide the class that we assign to it.
How do we find the right hyperplane?
Or, in other words, how do we best segregate the two classes within the data?
The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
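The decision rule and margin described above can be made concrete with a small sketch: the sign of f(x) = w·x + b picks the class, and |f(x)| / ||w|| is a point's distance from the hyperplane. The hyperplane and test point below are illustrative, not learned from data.

```java
public class HyperplaneSketch {
    // Signed decision value f(x) = w.x + b; the sign gives the predicted class
    static double decision(double[] w, double b, double[] x) {
        double s = b;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    // Geometric distance of x from the hyperplane w.x + b = 0
    static double distance(double[] w, double b, double[] x) {
        double norm = 0;
        for (double v : w) norm += v * v;
        return Math.abs(decision(w, b, x)) / Math.sqrt(norm);
    }

    public static void main(String[] args) {
        double[] w = {1, 1};     // illustrative hyperplane: x1 + x2 - 3 = 0
        double b = -3;
        double[] p = {4, 4};     // test point on the positive side
        System.out.println("class=" + (decision(w, b, p) >= 0 ? "+1" : "-1")
                + " distance=" + distance(w, b, p));
    }
}
```

The further the decision value is from zero relative to ||w||, the deeper the point sits inside its class region, which is exactly the intuition about confidence stated above.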
Screenshots
First, double-click the ‘run.bat’ file in the ‘ParallelDistributedCore1’ folder to get the screen below, and let it run.
In the above screen we can see that the distributed Core1 system has started. Now double-click the ‘run.bat’ file in the ‘ParallelDistributedCore2’ folder to get the screen below, and let it run.
In the above screen we can see that the distributed Core2 system has also started. Now click the ‘run.bat’ file in the ‘MasterCore’ folder to get the screen below.
In the above screen, click the ‘Upload CovType Dataset’ button to upload the dataset.
In the above screen the dataset is uploading; after the upload completes we will get the screen below.
Now click the ‘ParallelQRDecompositionMatrixPartition’ button to read the dataset and then decompose it into partitions.
In the above screen we can see all the dataset and partition details. Now click the ‘ParallelDualAscentPartition Send To Core’ button to send the partitions to the running cores. After clicking this button we will get the generated training-model details for each partition on all cores.
In the above two screens, for Core1 and Core2, we can see which partitions each core received and processed, along with the accuracy of each partition’s training model. After all partitions are processed we will get the message below.
The master core also displays the processing details received from each core for each partition. Now click the ‘Distributed QR Decomposition Time Graph’ button to get the processing-time graph below.
In the above graph, the x-axis represents the technique name and the y-axis represents the time taken to complete that technique’s task. ‘Local QR Time’ is the time to decompose the matrix, ‘Master QR Time’ is the time to send and process the partitions, and ‘Gather Time’ is the master core’s total time to gather the partition outputs coming from the cores.