SlideShare a Scribd company logo
1 of 7
Download to read offline
Rupak Bhattacharyya et al. (Eds) : ACER 2013,
pp. 287–293, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3227
TOWARDS REDUCTION OF DATA FLOW
IN A DISTRIBUTED NETWORK USING
PRINCIPAL COMPONENT ANALYSIS
Devadatta Sinha1
and Anal Acharya2
1
Department of Computer Science and Engineering,
Calcutta University, Kolkata, India
devadatta.sinha@gmail.com
2
Computer Science Department, St. Xavier’s College, Kolkata, India.
anal.acharya@sxccal.edu
ABSTRACT
For performing distributed data mining two approaches are possible: First, data from several
sources are copied to a data warehouse and mining algorithms are applied in it. Secondly,
mining can performed at the local sites and the results can be aggregated. When the number of
features is high, a lot of bandwidth is consumed in transferring datasets to a centralized
location. For this dimensionality reduction can be done at the local sites. In dimensionality
reduction a certain encoding is applied on data so as to obtain its compressed form. The
reduced features thus obtained at the local sites are aggregated and data mining algorithms are
applied on them. There are several methods of performing dimensionality reduction. Two most
important ones are Discrete Wavelet Transforms (DWT) and Principal Component Analysis
(PCA). Here a detailed study is done on how PCA could be useful in reducing data flow across
a distributed network.
KEYWORDS
Distributed Data Mining (DDM), Principal Component Analysis (PCA), Eigen Vector,
Dimensionality Reduction.
1. INTRODUCTION
In the recent years data mining has gained a lot of importance. This is mainly due to what is
called as Knowledge Discovery from Data (KDD). This mainly deals with searching of
interesting patterns from large data sets. As stated in [2], data mining is an essential step in the
process of knowledge discovery. [5] classified three types of data mining algorithms: the first
type deals with the application of data mining algorithms on single valued vectors, the second
type deals with large data sets and the third type deals with distributed databases.
The focus of this paper is on distributed databases. This is because data today is maintained in a
distributed nature due to changing nature of business application. Also these applications are
scalable with the growth of business. Distributed Data Mining (DDM) can be carried out
following two models [1, 6]: Firstly, all data from each of the locations be transferred to a
centralized site and integrated to form a data warehouse. Then the required patterns can be
derived using a suitable data mining algorithm. In the second approach, a summary of the data at
each location is sent to the centralized locations [9]. The advantage of this approach is that it
involves much less data transfer. Basically what is done is that a certain mapping is applied on
288 Computer Science & Information Technology (CS & IT)
that data to reduce its dimension. A reverse mapping is then applied to get the original data. Thus
data flow is reduced across the distributed network.
To this end we assume a very simple framework that can perform distributed data mining. The
framework in brief consists of the following steps:
1. Data Preprocessing: This consists of data cleaning, dimension reduction etc.
2. Data Integration: Data from multiple sources are combined.
3. Data Mining: Various algorithms are applied on data to obtain interesting patterns.
4. Knowledge Discovery: Use the pattern thus obtained to perform analysis.
The paper concentrates on the first step i.e. data preprocessing. Piramuthu[9] concludes that 80
percent of the resources in a majority of data mining applications are spent on preprocessing of
data. This is because databases typically consist of data which are noisy, inconsistent and contain
missing values due to integration of data from various sources. One of the methods of data
preprocessing is dimension reduction. Two methods are studied here: Principal Component
Analysis (PCA) and Discrete Wavelet Transforms (DWT). In particular, we study how PCA can
be used to reduce data flow in a distributed system. We construct a distributed data mining
architecture along these lines. The PCA algorithm is next studied in details. This algorithm is then
applied to distributed systems. We next determine the computing complexity of the algorithm.
Finally, a simulation is done on sample data to illustrate how dimensionality reduction is done at
local sites.
2. LITERATURE SURVEY
The papers surveyed can be divided into two major categories: the models and the corresponding
mathematical tools. Kargupta el al[1] has done an exhaustive study on Distributed Data
Mining(DDM). In this he proposes a framework for DDM. Various data preprocessing techniques
for heterogeneous and homogeneous data sites are discussed in details. In [3] an agent based
DDM architecture is developed. The concepts of Parallel Data Mining agents are used. Qi et al[4]
in their paper proposes a novel method for performing distributed DDM. Faisal et al.[5] in their
work developed an algorithm called Distributed Fast Map for performing dimension reduction.
Mohanty et al[8] proposed a classification technique based on chi-square testing and then
selecting the best features. In [9] Piramuthu developed a novel method of feature selection using a
probabilistic distance based method. Association rules can also be used, particularly in text
mining[10]. The use of unsupervised learning has been used for online feature selection in [11].
Finally, an evaluation technique called stability index has been proposed in [12].
Li[6] in his work made a survey on the use of wavelets in data mining. It contains a detailed study
on the use of wavelets in Data Management, Preprocessing and various data mining algorithms.
The use of wavelets in clustering and classification is particularly interesting. Finally, Smith [7]
has given very lucid account of the mathematics involved in the working of PCA.
3. DIMENSION REDUCTION
While processing large databases the problem of high dimensionality of feature space is
encountered. For example a row in a particular table may contain several columns. In a
distributed database, it is virtually impossible to import data of such dimension to a centralized
location. Thus to improve computational efficiency, data is projected in a smaller space[6].
Smaller space also helps in visualizing and analyzing data. The goal here is to create a low
dimension space which yields maximum information. We propose the use two mathematical tools
Computer Science & Information Technology (CS & IT) 289
for this purpose. PCA is discussed in details here. We give a basic outline of our use of Wavelets
here. We leave the details for future work.
3.1 Wavelets
We suppose that we want to reduce the data to p columns where the original data set consists of n
columns, p<<n. We apply the simplest type of wavelet called Haar Wavelets on the feature
vector. This gives us a set of approximations and detailed coefficients. Their number decreases as
we increase the number of resolutions. Thus we can keep the first p coefficients and approximate
the others with zeros. A reverse transformation is applied on these to yield the original data.
3.2 Principal Component Analysis(PCA)
Now we propose the use of PCA algorithm for dimension reduction. It is to be noted that the
resulting vector set is orthogonal in nature. Thus we are able to reduce the vector set into much
smaller space. Here we enumerate the steps for performing PCA:
Step 1: For simplicity we assume a bi-variate data set {X,Y}.
Step 2: Compute the mean. We assume they are m1 and m2. We apply the transformation {X-
m1,Y-m2} on this data set. For simplicity we call them {P,Q}. This matrix is renamed M.
Step 3: We compute the covariance matrix,
Cov(P,P) Cov(P,Q)
Cov(Q,P) Cov(Q,Q)
We call this matrix C.
Step 4: We compute the Eigen values E and Eigen vectors of the covariance matrix.
E= (e1,e2). The Eigen vector is given as
V11 v12
V21 v22
We call this vector V. We note that this vector is a unit vector.
Step 5: We find the feature vector. This is the Eigen vector with the highest Eigen value. This
could be any one of [v11 v21]T
or [v12 v22]T
.
Step 6: Derive the transformed data matrix D:
D= VT
*MT
Step 7: Get back the original data using the transformation:
O= VT
*D
Step 8: Add the mean that was originally subtracted.
The above algorithm can be extended to any number of variables. The use of this algorithm in
distributed database is discussed in the next section.
290 Computer Science & Information Technology (CS & IT)
4. THE FRAMEWORK
The data mining architecture that we use is shown in the figure below:
Data Data Data
Source1 Source2 Sourcen
Fig 1: Our framework for DDM using PCA
We assume that there are n data sources represented by db1, db2,....... ,dbn. We assume that the
database is homogeneous in nature. Each of the rows has n vectors. We apply the PCA algorithm
stated earlier at each site and reduce it to k vectors, k<n. This gives us a uniform reduced set of
vectors at each local site. These vectors are then transferred to a centralized location to form
aggregate data set. Data mining tools can then be applied on this data.
5. COMPUTATIONAL COMPLEXITY
For simplicity, we perform this computation on a single site db1 and the rest follows from it. As
stated earlier, using PCA we reduce the dimension to p from N where p<<N. If the number of
rows in the table is m, without the use of PCA the amount of data to be transmitted is of the order
O(mN) whereas if we use PCA it reduces to O(mp). Clearly, considering that there are n data
sites, the complexity reduces to O(nmp) from O(nmN). Thus a large amount of reduction of
computational complexity is achieved by using PCA in a distributed network.
6. RESULTS
The above algorithm was implemented on a core i3 processor of 3.3 GHZ. For simplicity, we
perform PCA on the following bi-variate data set:
X 0.5 2.2 3.1 2.3 1.0 1.5 1.1
Y 0.7 2.9 3.0 2.7 1.1 1.6 0.9
Table 1: Original Data
This yields the following graph:
PCA Algorithm
Algorithm
PCA Algorithm
Reduced features
at local site
Reduced features
at local site
Reduced features
at local site
Reduced features
aggregation
Data Mining
Tools
PCA Algorithm
Algorithm
Computer Science & Information Technology (CS & IT) 291
Fig 2: Original data Plotted
We follow the steps enumerated earlier and obtain the covariance matrix:
0.6165 0.6154
0.6154 0.7165
Using a ‘C’ program we obtain the following Eigen values [0.0491 1.2840]
These yield the following set of Eigen Vectors
-0.7352 -0.6779
0.6778 -0.7352
Fig 3: Normalized data on application of the Eigen Vectors obtained from the covariance matrix.
The red line gives us the feature vector
-0.6779 -0.7353
-0.7352 0.6772
292 Computer Science & Information Technology (CS & IT)
The transformed data is obtained as
P 1.7775 -0.9922 -1.6758 -0.9129 1.1445 0.4380 1.2238
Q 0.1428 0.3843 -0.2094 0.1753 0.0464 0.0177 -0.1626
Table 2: Data obtained after applying PCA
Fig 4: Plot of the data obtained after applying PCA
Original data can be obtained by applying the reverse transformation as discussed in the earlier
section.
7. CONCLUSION AND FUTURE WORK
In this paper we made a study of a method used for dimension reduction in context of DDM.
There are other methods as well like applying Discrete Wavelets Transforms (DWT) on feature
vector to reduce dimensions. The architecture we have developed in rudimentary in nature. We
could expand this framework by adding agents to perform suitable tasks. Also some modelling
techniques could be used to model these. We leave these all for our future work.
REFERENCES
[1] Kargupta,Park, (2002), “Algorithms, Systems and Applications”, ABC Transactions on ECE, Vol.
10, No. 5, pp120-122.
[2] Haan and Kamber (2011) Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufman
Publishers.
[3] Kargupta (2003), “Scalable, Distributed data mining using agent based architecture”.
[4] Qi,Wang,Birdwell, (2001), “Global Principal Component Analysis for performing dimension
reduction in distributed data mining”.
[5] Faisal,Samatova,Ostrouchov, (2007), “Distributed Dimension Reduction Algorithms for Widely
Dispersed Data”.
[6] Li,Zhu,Oghihara, (2009), “A survey on the use of wavelets in data mining”, SIGKDD
Explorations, Volume 4, Issue 2, page 49-69
[7] Smith (2009),”A Tutorial on Principal Component Analysis”
[8] Bhuyan,Mohanty,Das (2012), “Privacy Preservation for feature selection in data mining using
Centralized network” IJCSI, Vol. 3, No. 2.
[9] Piramuthu, (2004), “Evaluating feature Selection methods for learning in data mining applications”,
European Journal on Operations Research”, Vol. 9.
Computer Science & Information Technology (CS & IT) 293
[10] Do,Hui,Fong, (2007), “Associative feature Selection methods for text mining”.
[11] Hoi,Wang,Zhao, (2005), “Online Feature Selection for Mining big data”, Bignine, China
[12] Kurcheva, (2012), “A stability index for feature selection” 25th
IASTAD International Conference on
Artificial Intelligence, Austria
[13] Kargupta, Huang, SivaKumar,Johnson, (2001), “Distributed clustering using Collective Principal
Component analysis”, Knowledge and information Systems , Springer-Verlag,, London.
AUTHORS
Prof Devadatta Sinha is a Professor in University of Calcutta. He has over 30 years of
teaching and research experience. His interests include Distributed Processing, Software
Engineering.
Prof Anal Acharya is currently Head of the Department of Computer Science, St Xavier’s
College, Kolkata. His research interests include Distributed Data Mining Algorithms.

More Related Content

What's hot

MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSijcses
 
Operating Task Redistribution in Hyperconverged Networks
Operating Task Redistribution in Hyperconverged Networks Operating Task Redistribution in Hyperconverged Networks
Operating Task Redistribution in Hyperconverged Networks IJECEIAES
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graphijdms
 
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...ijwmn
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentIJMER
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...IJECEIAES
 
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...graphhoc
 
Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS TECSI FEA USP
 
Decision tree clustering a columnstores tuple reconstruction
Decision tree clustering  a columnstores tuple reconstructionDecision tree clustering  a columnstores tuple reconstruction
Decision tree clustering a columnstores tuple reconstructioncsandit
 
A general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithmA general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithmTA Minh Thuy
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data StreamIRJET Journal
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionReal-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionPower System Operation
 

What's hot (19)

MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
 
Operating Task Redistribution in Hyperconverged Networks
Operating Task Redistribution in Hyperconverged Networks Operating Task Redistribution in Hyperconverged Networks
Operating Task Redistribution in Hyperconverged Networks
 
Shortest path estimation for graph
Shortest path estimation for graphShortest path estimation for graph
Shortest path estimation for graph
 
Big Data Clustering Model based on Fuzzy Gaussian
Big Data Clustering Model based on Fuzzy GaussianBig Data Clustering Model based on Fuzzy Gaussian
Big Data Clustering Model based on Fuzzy Gaussian
 
M tree
M treeM tree
M tree
 
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...
ENERGY-EFFICIENT DATA COLLECTION IN CLUSTERED WIRELESS SENSOR NETWORKS EMPLOY...
 
50120140505013
5012014050501350120140505013
50120140505013
 
Skyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed EnvironmentSkyline Query Processing using Filtering in Distributed Environment
Skyline Query Processing using Filtering in Distributed Environment
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
 
Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS
 
B0330811
B0330811B0330811
B0330811
 
Decision tree clustering a columnstores tuple reconstruction
Decision tree clustering  a columnstores tuple reconstructionDecision tree clustering  a columnstores tuple reconstruction
Decision tree clustering a columnstores tuple reconstruction
 
C0312023
C0312023C0312023
C0312023
 
A0360109
A0360109A0360109
A0360109
 
A general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithmA general weighted_fuzzy_clustering_algorithm
A general weighted_fuzzy_clustering_algorithm
 
Clustering Algorithms for Data Stream
Clustering Algorithms for Data StreamClustering Algorithms for Data Stream
Clustering Algorithms for Data Stream
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionReal-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
 

Similar to Reducing Data Flow in Distributed Networks Using PCA

Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval IJECEIAES
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...IJECEIAES
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
Comparative analysis of various data stream mining procedures and various dim...
Comparative analysis of various data stream mining procedures and various dim...Comparative analysis of various data stream mining procedures and various dim...
Comparative analysis of various data stream mining procedures and various dim...Alexander Decker
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangEugine Kang
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27IJARIIE JOURNAL
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsJigisha Aryya
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
 
An optimized cost-based data allocation model for heterogeneous distributed ...
An optimized cost-based data allocation model for  heterogeneous distributed ...An optimized cost-based data allocation model for  heterogeneous distributed ...
An optimized cost-based data allocation model for heterogeneous distributed ...IJECEIAES
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data StreamIRJET Journal
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...IRJET Journal
 
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...IJECEIAES
 
An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...eSAT Publishing House
 

Similar to Reducing Data Flow in Distributed Networks Using PCA (20)

Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce A Novel Approach for Clustering Big Data based on MapReduce
A Novel Approach for Clustering Big Data based on MapReduce
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...
 
IJET-V3I1P27
IJET-V3I1P27IJET-V3I1P27
IJET-V3I1P27
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
Comparative analysis of various data stream mining procedures and various dim...
Comparative analysis of various data stream mining procedures and various dim...Comparative analysis of various data stream mining procedures and various dim...
Comparative analysis of various data stream mining procedures and various dim...
 
Network Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine KangNetwork Flow Pattern Extraction by Clustering Eugine Kang
Network Flow Pattern Extraction by Clustering Eugine Kang
 
Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27Ijariie1117 volume 1-issue 1-page-25-27
Ijariie1117 volume 1-issue 1-page-25-27
 
Algorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systemsAlgorithm selection for sorting in embedded and mobile systems
Algorithm selection for sorting in embedded and mobile systems
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
 
An optimized cost-based data allocation model for heterogeneous distributed ...
An optimized cost-based data allocation model for  heterogeneous distributed ...An optimized cost-based data allocation model for  heterogeneous distributed ...
An optimized cost-based data allocation model for heterogeneous distributed ...
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data Stream
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
IRJET- Expert Independent Bayesian Data Fusion and Decision Making Model for ...
 
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
A Novel Technique to Enhance the Lifetime of Wireless Sensor Networks through...
 
An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...
 

More from cscpconf

ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
 
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
 
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
 
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIESPROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
 
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICA SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
 
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICTWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
 
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINDETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
 
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
 
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
 
AUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWAUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
 
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKCLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
 
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAPROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
 
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHCHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
 

More from cscpconf (20)

ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
 
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
 
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
 
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIESPROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
 
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICA SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
 
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICTWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
 
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINDETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
 
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
 
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
 
AUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWAUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEW
 
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKCLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
 
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAPROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
 
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHCHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
 

Recently uploaded

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 

Recently uploaded (20)

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 

Reducing Data Flow in Distributed Networks Using PCA

  • 1. Rupak Bhattacharyya et al. (Eds) : ACER 2013, pp. 287–293, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3227 TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPONENT ANALYSIS Devadatta Sinha1 and Anal Acharya2 1 Department of Computer Science and Engineering, Calcutta University, Kolkata, India devadatta.sinha@gmail.com 2 Computer Science Department, St. Xavier’s College, Kolkata, India. anal.acharya@sxccal.edu ABSTRACT For performing distributed data mining two approaches are possible: First, data from several sources are copied to a data warehouse and mining algorithms are applied in it. Secondly, mining can performed at the local sites and the results can be aggregated. When the number of features is high, a lot of bandwidth is consumed in transferring datasets to a centralized location. For this dimensionality reduction can be done at the local sites. In dimensionality reduction a certain encoding is applied on data so as to obtain its compressed form. The reduced features thus obtained at the local sites are aggregated and data mining algorithms are applied on them. There are several methods of performing dimensionality reduction. Two most important ones are Discrete Wavelet Transforms (DWT) and Principal Component Analysis (PCA). Here a detailed study is done on how PCA could be useful in reducing data flow across a distributed network. KEYWORDS Distributed Data Mining (DDM), Principal Component Analysis (PCA), Eigen Vector, Dimensionality Reduction. 1. INTRODUCTION In the recent years data mining has gained a lot of importance. This is mainly due to what is called as Knowledge Discovery from Data (KDD). This mainly deals with searching of interesting patterns from large data sets. As stated in [2], data mining is an essential step in the process of knowledge discovery. [5] classified three types of data mining algorithms: the first type deals with the application of data mining algorithms on single valued vectors, the second type deals with large data sets and the third type deals with distributed databases. The focus of this paper is on distributed databases. This is because data today is maintained in a distributed nature due to changing nature of business application. Also these applications are scalable with the growth of business. Distributed Data Mining (DDM) can be carried out following two models [1, 6]: Firstly, all data from each of the locations be transferred to a centralized site and integrated to form a data warehouse. Then the required patterns can be derived using a suitable data mining algorithm. In the second approach, a summary of the data at each location is sent to the centralized locations [9]. The advantage of this approach is that it involves much less data transfer. Basically what is done is that a certain mapping is applied on
  • 2. 288 Computer Science & Information Technology (CS & IT) that data to reduce its dimension. A reverse mapping is then applied to get the original data. Thus data flow is reduced across the distributed network. To this end we assume a very simple framework that can perform distributed data mining. The framework in brief consists of the following steps: 1. Data Preprocessing: This consists of data cleaning, dimension reduction etc. 2. Data Integration: Data from multiple sources are combined. 3. Data Mining: Various algorithms are applied on data to obtain interesting patterns. 4. Knowledge Discovery: Use the pattern thus obtained to perform analysis. The paper concentrates on the first step i.e. data preprocessing. Piramuthu[9] concludes that 80 percent of the resources in a majority of data mining applications are spent on preprocessing of data. This is because databases typically consist of data which are noisy, inconsistent and contain missing values due to integration of data from various sources. One of the methods of data preprocessing is dimension reduction. Two methods are studied here: Principal Component Analysis (PCA) and Discrete Wavelet Transforms (DWT). In particular, we study how PCA can be used to reduce data flow in a distributed system. We construct a distributed data mining architecture along these lines. The PCA algorithm is next studied in details. This algorithm is then applied to distributed systems. We next determine the computing complexity of the algorithm. Finally, a simulation is done on sample data to illustrate how dimensionality reduction is done at local sites. 2. LITERATURE SURVEY The papers surveyed can be divided into two major categories: the models and the corresponding mathematical tools. Kargupta el al[1] has done an exhaustive study on Distributed Data Mining(DDM). In this he proposes a framework for DDM. Various data preprocessing techniques for heterogeneous and homogeneous data sites are discussed in details. In [3] an agent based DDM architecture is developed. The concepts of Parallel Data Mining agents are used. Qi et al[4] in their paper proposes a novel method for performing distributed DDM. Faisal et al.[5] in their work developed an algorithm called Distributed Fast Map for performing dimension reduction. Mohanty et al[8] proposed a classification technique based on chi-square testing and then selecting the best features. In [9] Piramuthu developed a novel method of feature selection using a probabilistic distance based method. Association rules can also be used, particularly in text mining[10]. The use of unsupervised learning has been used for online feature selection in [11]. Finally, an evaluation technique called stability index has been proposed in [12]. Li[6] in his work made a survey on the use of wavelets in data mining. It contains a detailed study on the use of wavelets in Data Management, Preprocessing and various data mining algorithms. The use of wavelets in clustering and classification is particularly interesting. Finally, Smith [7] has given very lucid account of the mathematics involved in the working of PCA. 3. DIMENSION REDUCTION While processing large databases the problem of high dimensionality of feature space is encountered. For example a row in a particular table may contain several columns. In a distributed database, it is virtually impossible to import data of such dimension to a centralized location. Thus to improve computational efficiency, data is projected in a smaller space[6]. Smaller space also helps in visualizing and analyzing data. The goal here is to create a low dimension space which yields maximum information. We propose the use two mathematical tools
  • 3. Computer Science & Information Technology (CS & IT) 289 for this purpose. PCA is discussed in details here. We give a basic outline of our use of Wavelets here. We leave the details for future work. 3.1 Wavelets We suppose that we want to reduce the data to p columns where the original data set consists of n columns, p<<n. We apply the simplest type of wavelet called Haar Wavelets on the feature vector. This gives us a set of approximations and detailed coefficients. Their number decreases as we increase the number of resolutions. Thus we can keep the first p coefficients and approximate the others with zeros. A reverse transformation is applied on these to yield the original data. 3.2 Principal Component Analysis(PCA) Now we propose the use of PCA algorithm for dimension reduction. It is to be noted that the resulting vector set is orthogonal in nature. Thus we are able to reduce the vector set into much smaller space. Here we enumerate the steps for performing PCA: Step 1: For simplicity we assume a bi-variate data set {X,Y}. Step 2: Compute the mean. We assume they are m1 and m2. We apply the transformation {X- m1,Y-m2} on this data set. For simplicity we call them {P,Q}. This matrix is renamed M. Step 3: We compute the covariance matrix, Cov(P,P) Cov(P,Q) Cov(Q,P) Cov(Q,Q) We call this matrix C. Step 4: We compute the Eigen values E and Eigen vectors of the covariance matrix. E= (e1,e2). The Eigen vector is given as V11 v12 V21 v22 We call this vector V. We note that this vector is a unit vector. Step 5: We find the feature vector. This is the Eigen vector with the highest Eigen value. This could be any one of [v11 v21]T or [v12 v22]T . Step 6: Derive the transformed data matrix D: D= VT *MT Step 7: Get back the original data using the transformation: O= VT *D Step 8: Add the mean that was originally subtracted. The above algorithm can be extended to any number of variables. The use of this algorithm in distributed database is discussed in the next section.
  • 4. 290 Computer Science & Information Technology (CS & IT) 4. THE FRAMEWORK The data mining architecture that we use is shown in the figure below: Data Data Data Source1 Source2 Sourcen Fig 1: Our framework for DDM using PCA We assume that there are n data sources represented by db1, db2,....... ,dbn. We assume that the database is homogeneous in nature. Each of the rows has n vectors. We apply the PCA algorithm stated earlier at each site and reduce it to k vectors, k<n. This gives us a uniform reduced set of vectors at each local site. These vectors are then transferred to a centralized location to form aggregate data set. Data mining tools can then be applied on this data. 5. COMPUTATIONAL COMPLEXITY For simplicity, we perform this computation on a single site db1 and the rest follows from it. As stated earlier, using PCA we reduce the dimension to p from N where p<<N. If the number of rows in the table is m, without the use of PCA the amount of data to be transmitted is of the order O(mN) whereas if we use PCA it reduces to O(mp). Clearly, considering that there are n data sites, the complexity reduces to O(nmp) from O(nmN). Thus a large amount of reduction of computational complexity is achieved by using PCA in a distributed network. 6. RESULTS The above algorithm was implemented on a core i3 processor of 3.3 GHZ. For simplicity, we perform PCA on the following bi-variate data set: X 0.5 2.2 3.1 2.3 1.0 1.5 1.1 Y 0.7 2.9 3.0 2.7 1.1 1.6 0.9 Table 1: Original Data This yields the following graph: PCA Algorithm Algorithm PCA Algorithm Reduced features at local site Reduced features at local site Reduced features at local site Reduced features aggregation Data Mining Tools PCA Algorithm Algorithm
  • 5. Computer Science & Information Technology (CS & IT) 291 Fig 2: Original data Plotted We follow the steps enumerated earlier and obtain the covariance matrix: 0.6165 0.6154 0.6154 0.7165 Using a ‘C’ program we obtain the following Eigen values [0.0491 1.2840] These yield the following set of Eigen Vectors -0.7352 -0.6779 0.6778 -0.7352 Fig 3: Normalized data on application of the Eigen Vectors obtained from the covariance matrix. The red line gives us the feature vector -0.6779 -0.7353 -0.7352 0.6772
  • 6. 292 Computer Science & Information Technology (CS & IT) The transformed data is obtained as P 1.7775 -0.9922 -1.6758 -0.9129 1.1445 0.4380 1.2238 Q 0.1428 0.3843 -0.2094 0.1753 0.0464 0.0177 -0.1626 Table 2: Data obtained after applying PCA Fig 4: Plot of the data obtained after applying PCA Original data can be obtained by applying the reverse transformation as discussed in the earlier section. 7. CONCLUSION AND FUTURE WORK In this paper we made a study of a method used for dimension reduction in context of DDM. There are other methods as well like applying Discrete Wavelets Transforms (DWT) on feature vector to reduce dimensions. The architecture we have developed in rudimentary in nature. We could expand this framework by adding agents to perform suitable tasks. Also some modelling techniques could be used to model these. We leave these all for our future work. REFERENCES [1] Kargupta,Park, (2002), “Algorithms, Systems and Applications”, ABC Transactions on ECE, Vol. 10, No. 5, pp120-122. [2] Haan and Kamber (2011) Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufman Publishers. [3] Kargupta (2003), “Scalable, Distributed data mining using agent based architecture”. [4] Qi,Wang,Birdwell, (2001), “Global Principal Component Analysis for performing dimension reduction in distributed data mining”. [5] Faisal,Samatova,Ostrouchov, (2007), “Distributed Dimension Reduction Algorithms for Widely Dispersed Data”. [6] Li,Zhu,Oghihara, (2009), “A survey on the use of wavelets in data mining”, SIGKDD Explorations, Volume 4, Issue 2, page 49-69 [7] Smith (2009),”A Tutorial on Principal Component Analysis” [8] Bhuyan,Mohanty,Das (2012), “Privacy Preservation for feature selection in data mining using Centralized network” IJCSI, Vol. 3, No. 2. [9] Piramuthu, (2004), “Evaluating feature Selection methods for learning in data mining applications”, European Journal on Operations Research”, Vol. 9.
  • 7. Computer Science & Information Technology (CS & IT) 293 [10] Do,Hui,Fong, (2007), “Associative feature Selection methods for text mining”. [11] Hoi,Wang,Zhao, (2005), “Online Feature Selection for Mining big data”, Bignine, China [12] Kurcheva, (2012), “A stability index for feature selection” 25th IASTAD International Conference on Artificial Intelligence, Austria [13] Kargupta, Huang, SivaKumar,Johnson, (2001), “Distributed clustering using Collective Principal Component analysis”, Knowledge and information Systems , Springer-Verlag,, London. AUTHORS Prof Devadatta Sinha is a Professor in University of Calcutta. He has over 30 years of teaching and research experience. His interests include Distributed Processing, Software Engineering. Prof Anal Acharya is currently Head of the Department of Computer Science, St Xavier’s College, Kolkata. His research interests include Distributed Data Mining Algorithms.