SlideShare a Scribd company logo
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: , praveen@nexgenproject.com
Web: www.nexgenproject.com,
AN INFORMATION THEORY-BASED FEATURE
SELECTIONFRAMEWORK FOR BIG DATA UNDER APACHE SPARK
ABSTRACT
With the advent of extremely high dimensional datasets, dimensionality reduction techniques are
becoming mandatory. Of the many techniques available, feature selection (FS) is of growing
interest for its ability to identify both relevant features and frequently repeated instances in huge
datasets. We aim to demonstrate that standard FS methods can be parallelized in big data
platforms like Apache Spark so as to boost both performance and accuracy. We propose a
distributed implementation of a generic FS framework that includes broad group of well-known
information theory-based methods. Experimental results for a broad set of real-world datasets
show that our distributed framework is capable of rapidly dealing with ultrahigh-dimensional
datasets as well as those with a huge number of samples, outperforming the sequential version in
all the cases studied.
EXISTING SYSTEM:
Existing FS methods are not expected to scale well when dealing with big data due to the fact
that efficiency may significantly deteriorates or that the FS approach may even become
inapplicable. Scalable distributed programming protocols and frameworks have emerged in the
last decade to manage the problem of big data. The first programming model developed was
MapReduce along with its open-source implementation Apache Hadoop. More recently, Apache
Spark has been presented as a new distributed framework with a fast, general large-scale data
processing engine that is popular with ML researchers due to its suitability for iterative
procedures. Likewise, several libraries for approaching ML tasks in big data environments have
emerged in recent years. The first such library was Mahout , followed up by MLlib ,built on the
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: , praveen@nexgenproject.com
Web: www.nexgenproject.com,
Spark system. Thanks to Spark’s capacity for in-memory computation that speeds up iterative
processes, algorithms developed for this kind of platform havebecome pervasive in the industry.
Although several gold standard algorithms for ML tasks have been redesigned to incorporate a
distributed implementation for big data technologies, this is not yet the case for FS algorithms.
PROPOSED SYSTEM:
This paper aims to fill the existing gap by demonstrating that standard FS methods can be
designed for big data platforms that can be usefully applied to big datasets while boosting both
performance and accuracy. We propose a new distributed design for an FS generic framework
based on information theory that is implemented using the Apache Spark paradigm. To make this
adaptation feasible, a wide variety of techniques from the distributed environment have been
used, including information caching, data partitioning and replication of relevant variables. Note
that adapting this framework to Spark implies a major challenge as it requires deep restructuring
of classic algorithms. To test the effectiveness of our framework, we applied it toa complete set
of real-world datasets [up to O(10) features and instances]. The results point to competitive
performance (in terms of generalization and efficiency) when dealing with datasets that are huge
in terms of both number of features and instances. As an illustrative example, we were able to
select100 features in a dataset with 29 ×1067features and 19 ×10instances in under 60 min (using
a 432-core cluster).
CONCLUSION
In discussing the problem of processing big data, especially from the perspective of
dimensionality, we have highlighted the impact of correctly identifying relevant features in
datasets and the corresponding difficulties caused by the combinatorial effects of incoming data
growing in terms of both instances and features. Despite the growing interest in dimensionality
reduction for big data, few FS methods have been developedto date that can adequately deal with
high-dimensionality problems. Adopting an information theory approach, we adapted a generic
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: , praveen@nexgenproject.com
Web: www.nexgenproject.com,
FS framework for big data from a proposal byBrown et al.. The framework contains
implementations of many state-of-the-art FS algorithms, including mRMRand JMI. The
adaptation entailed a radical redesign ofBrown et al.’s framework so as to adapt it to a
distributed paradigm. This paper has also resulted in contributing an FSmodule to the emerging
Spark and MLlib platforms, which, to date, included no complex FS algorithm. Our experimental
results demonstrate the usefulness of ourFS solution applied to a broad set of large real-world
problems. Our solution performed well with 2-D of big data (samples andfeatures) and yielded
competitive performance results in dealingwith both ultrahigh-dimensionality datasets and
datasets with a huge number of samples. Our distributed approach consistently outperformed the
sequential version, enabling the resolution of problems that could not be usefully resolved using
the classical approach. Future research will focus on the following.1) Designing new information
theory-based approaches for high-speed data streams and also extending theseapproaches to the
handling of drifts in concepts.2) Analyzing the impact of approximative selection on high-
dimensional data via faster solutions that do not incur a high penalty on accuracy.3) Designing a
new fully automatic FS system that selects the most relevant subset of features from a full set of
features, thereby eliminating the need to define the number of features to select in each
execution.
REFERENCES
[1] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vec-tor machines,” ACM Trans.
Intell. Syst. Technol., vol. 2, no. 3,pp. 1–27, 2011. [Online]. Available:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
[2] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, FeatureExtraction: Foundations and
Applications (Studies in Fuzziness and SoftComputing). Secaucus, NJ, USA: Springer-Verlag,
2006.
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: , praveen@nexgenproject.com
Web: www.nexgenproject.com,
[3] Z. A. Zhao and H. Liu, Spectral Feature Selection for Data Mining.Boca Raton, FL, USA:
CRC Press, 2011.
[4] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos,Feature Selection for High-
Dimensional Data (Artificial Intelligence:Foundations, Theory, and Algorithms). New York,
NY, USA: Springer,2015.
[5] Q. Wu, Z. Wang, F. Deng, Z. Chi, and D. D. Feng, “Realistic humanaction recognition with
multimodal feature selection and fusion,” IEEETrans. Syst., Man, Cybern., Syst., vol. 43, no. 4,
pp. 875–885, Jul. 2013.
[6] V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos,J. M. Benítez, and F. Herrera,
“A review of microarray datasetsand applied feature selection methods,” Inf. Sci., vol. 282, pp.
111–135,Oct. 2014.
[7] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos,“Recent advances and
emerging challenges of feature selection in thecontext of big data,” Knowl. Based Syst., vol. 86,
pp. 33–45, Sep. 2015.
[8] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processingon large clusters,” in
Proc. OSDI, San Francisco, CA, USA, 2004,pp. 137–150.
[9] T. White, Hadoop, The Definitive Guide. Sebastopol, CA, USA: O’ReillyMedia, 2012.
[10] Apache Hadoop Project. (2017). Apache Hadoop. Accessed onFeb. 2017. [Online].
Available: http://hadoop.apache.org/
[11] H. Karau, A. Konwinski, P. Wendell, and M. Zaharia, Learning Spark:Lightening-Fast Big
Data Analytics. Beijing, China: O’Reilly Media,2015.

More Related Content

What's hot

Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
Ian Foster
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
nitesh saxena
 
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
paperpublications3
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark Applications
ATMOSPHERE .
 
Spatial approximate string search
Spatial approximate string searchSpatial approximate string search
Spatial approximate string search
JPINFOTECH JAYAPRAKASH
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation
1crore projects
 
Fuzzy folded bloom filter-as-a-service for big data storage in the cloud
Fuzzy folded bloom filter-as-a-service for big data storage in the cloudFuzzy folded bloom filter-as-a-service for big data storage in the cloud
Fuzzy folded bloom filter-as-a-service for big data storage in the cloud
nexgentechnology
 
Comparative analysis of energy efficient scheduling algorithms for big data a...
Comparative analysis of energy efficient scheduling algorithms for big data a...Comparative analysis of energy efficient scheduling algorithms for big data a...
Comparative analysis of energy efficient scheduling algorithms for big data a...
nexgentechnology
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
Eric Dagobert
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
The Statistical and Applied Mathematical Sciences Institute
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
balmanme
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balman
balmanme
 
Clustering large probabilistic graphs
Clustering large probabilistic graphsClustering large probabilistic graphs
Clustering large probabilistic graphs
ecway
 
Facade
FacadeFacade
Facade
Louis Zhang
 
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
graphhoc
 
Density Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataDensity Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic Data
Eirini Ntoutsi
 
Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
Marvin Bertin
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
Geoffrey Fox
 
Efficient Doubletree: An Algorithm for Large-Scale Topology Discovery
Efficient Doubletree: An Algorithm for Large-Scale Topology DiscoveryEfficient Doubletree: An Algorithm for Large-Scale Topology Discovery
Efficient Doubletree: An Algorithm for Large-Scale Topology Discovery
IOSR Journals
 
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVALEMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
I3E Technologies
 

What's hot (20)

Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
Implementing Map Reduce Based Edmonds-Karp Algorithm to Determine Maximum Flo...
 
Gray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark ApplicationsGray-Box Models for Performance Assessment of Spark Applications
Gray-Box Models for Performance Assessment of Spark Applications
 
Spatial approximate string search
Spatial approximate string searchSpatial approximate string search
Spatial approximate string search
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation
 
Fuzzy folded bloom filter-as-a-service for big data storage in the cloud
Fuzzy folded bloom filter-as-a-service for big data storage in the cloudFuzzy folded bloom filter-as-a-service for big data storage in the cloud
Fuzzy folded bloom filter-as-a-service for big data storage in the cloud
 
Comparative analysis of energy efficient scheduling algorithms for big data a...
Comparative analysis of energy efficient scheduling algorithms for big data a...Comparative analysis of energy efficient scheduling algorithms for big data a...
Comparative analysis of energy efficient scheduling algorithms for big data a...
 
rscript_paper-1
rscript_paper-1rscript_paper-1
rscript_paper-1
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balman
 
Clustering large probabilistic graphs
Clustering large probabilistic graphsClustering large probabilistic graphs
Clustering large probabilistic graphs
 
Facade
FacadeFacade
Facade
 
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
The Impact of Data Replication on Job Scheduling Performance in Hierarchical ...
 
Density Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic DataDensity Based Subspace Clustering Over Dynamic Data
Density Based Subspace Clustering Over Dynamic Data
 
Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
Efficient Doubletree: An Algorithm for Large-Scale Topology Discovery
Efficient Doubletree: An Algorithm for Large-Scale Topology DiscoveryEfficient Doubletree: An Algorithm for Large-Scale Topology Discovery
Efficient Doubletree: An Algorithm for Large-Scale Topology Discovery
 
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVALEMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL
 

Similar to AN INFORMATION THEORY-BASED FEATURE SELECTIONFRAMEWORK FOR BIG DATA UNDER APACHE SPARK

IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
tsysglobalsolutions
 
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
SBGC
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
tsysglobalsolutions
 
Parallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot netParallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot net
redpel dot com
 
Zhao huang deep sim deep learning code functional similarity
Zhao huang deep sim   deep learning code functional similarityZhao huang deep sim   deep learning code functional similarity
Zhao huang deep sim deep learning code functional similarity
itrejos
 
CLOUD BIOINFORMATICS Part1
 CLOUD BIOINFORMATICS Part1 CLOUD BIOINFORMATICS Part1
CLOUD BIOINFORMATICS Part1
ARPUTHA SELVARAJ A
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
IRJET Journal
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
M H
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
IJERA Editor
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving Environment
Sheila Sinclair
 
ADAPTER
ADAPTERADAPTER
Journals analysis ppt
Journals analysis pptJournals analysis ppt
Journals analysis ppt
Muhammad Heikal
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
Alexander Decker
 
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUDEPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
Nexgen Technology
 
a scalable two phase top down specialization approach for data anonymization ...
a scalable two phase top down specialization approach for data anonymization ...a scalable two phase top down specialization approach for data anonymization ...
a scalable two phase top down specialization approach for data anonymization ...
swathi78
 
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming DataIRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET Journal
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
csandit
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
cscpconf
 
benchmarks-sigmod09
benchmarks-sigmod09benchmarks-sigmod09
benchmarks-sigmod09
Hiroshi Ono
 

Similar to AN INFORMATION THEORY-BASED FEATURE SELECTIONFRAMEWORK FOR BIG DATA UNDER APACHE SPARK (20)

IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
 
Parallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot netParallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot net
 
Zhao huang deep sim deep learning code functional similarity
Zhao huang deep sim   deep learning code functional similarityZhao huang deep sim   deep learning code functional similarity
Zhao huang deep sim deep learning code functional similarity
 
CLOUD BIOINFORMATICS Part1
 CLOUD BIOINFORMATICS Part1 CLOUD BIOINFORMATICS Part1
CLOUD BIOINFORMATICS Part1
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving Environment
 
ADAPTER
ADAPTERADAPTER
ADAPTER
 
Journals analysis ppt
Journals analysis pptJournals analysis ppt
Journals analysis ppt
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUDEPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
EPAS: A SAMPLING BASED SIMILARITY IDENTIFICATION ALGORITHM FOR THE CLOUD
 
a scalable two phase top down specialization approach for data anonymization ...
a scalable two phase top down specialization approach for data anonymization ...a scalable two phase top down specialization approach for data anonymization ...
a scalable two phase top down specialization approach for data anonymization ...
 
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming DataIRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
IRJET- E-MORES: Efficient Multiple Output Regression for Streaming Data
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
 
benchmarks-sigmod09
benchmarks-sigmod09benchmarks-sigmod09
benchmarks-sigmod09
 

More from Nexgen Technology

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
Nexgen Technology
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Nexgen Technology
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Nexgen Technology
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Nexgen Technology
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Nexgen Technology
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Nexgen Technology
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Nexgen Technology
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Nexgen Technology
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Nexgen Technology
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Nexgen Technology
 

More from Nexgen Technology (20)

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
 

Recently uploaded

Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
Celine George
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
danielkiash986
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
nitinpv4ai
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Denish Jangid
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
JomonJoseph58
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
indexPub
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
zuzanka
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
Prof. Dr. K. Adisesha
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
khuleseema60
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
ImMuslim
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Mohammad Al-Dhahabi
 

Recently uploaded (20)

Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 

AN INFORMATION THEORY-BASED FEATURE SELECTIONFRAMEWORK FOR BIG DATA UNDER APACHE SPARK

  • 1. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: , praveen@nexgenproject.com Web: www.nexgenproject.com, AN INFORMATION THEORY-BASED FEATURE SELECTIONFRAMEWORK FOR BIG DATA UNDER APACHE SPARK ABSTRACT With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Of the many techniques available, feature selection (FS) is of growing interest for its ability to identify both relevant features and frequently repeated instances in huge datasets. We aim to demonstrate that standard FS methods can be parallelized in big data platforms like Apache Spark so as to boost both performance and accuracy. We propose a distributed implementation of a generic FS framework that includes broad group of well-known information theory-based methods. Experimental results for a broad set of real-world datasets show that our distributed framework is capable of rapidly dealing with ultrahigh-dimensional datasets as well as those with a huge number of samples, outperforming the sequential version in all the cases studied. EXISTING SYSTEM: Existing FS methods are not expected to scale well when dealing with big data due to the fact that efficiency may significantly deteriorates or that the FS approach may even become inapplicable. Scalable distributed programming protocols and frameworks have emerged in the last decade to manage the problem of big data. The first programming model developed was MapReduce along with its open-source implementation Apache Hadoop. More recently, Apache Spark has been presented as a new distributed framework with a fast, general large-scale data processing engine that is popular with ML researchers due to its suitability for iterative procedures. Likewise, several libraries for approaching ML tasks in big data environments have emerged in recent years. The first such library was Mahout , followed up by MLlib ,built on the
  • 2. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: , praveen@nexgenproject.com Web: www.nexgenproject.com, Spark system. Thanks to Spark’s capacity for in-memory computation that speeds up iterative processes, algorithms developed for this kind of platform havebecome pervasive in the industry. Although several gold standard algorithms for ML tasks have been redesigned to incorporate a distributed implementation for big data technologies, this is not yet the case for FS algorithms. PROPOSED SYSTEM: This paper aims to fill the existing gap by demonstrating that standard FS methods can be designed for big data platforms that can be usefully applied to big datasets while boosting both performance and accuracy. We propose a new distributed design for an FS generic framework based on information theory that is implemented using the Apache Spark paradigm. To make this adaptation feasible, a wide variety of techniques from the distributed environment have been used, including information caching, data partitioning and replication of relevant variables. Note that adapting this framework to Spark implies a major challenge as it requires deep restructuring of classic algorithms. To test the effectiveness of our framework, we applied it toa complete set of real-world datasets [up to O(10) features and instances]. The results point to competitive performance (in terms of generalization and efficiency) when dealing with datasets that are huge in terms of both number of features and instances. As an illustrative example, we were able to select100 features in a dataset with 29 ×1067features and 19 ×10instances in under 60 min (using a 432-core cluster). CONCLUSION In discussing the problem of processing big data, especially from the perspective of dimensionality, we have highlighted the impact of correctly identifying relevant features in datasets and the corresponding difficulties caused by the combinatorial effects of incoming data growing in terms of both instances and features. Despite the growing interest in dimensionality reduction for big data, few FS methods have been developedto date that can adequately deal with high-dimensionality problems. Adopting an information theory approach, we adapted a generic
  • 3. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: , praveen@nexgenproject.com Web: www.nexgenproject.com, FS framework for big data from a proposal byBrown et al.. The framework contains implementations of many state-of-the-art FS algorithms, including mRMRand JMI. The adaptation entailed a radical redesign ofBrown et al.’s framework so as to adapt it to a distributed paradigm. This paper has also resulted in contributing an FSmodule to the emerging Spark and MLlib platforms, which, to date, included no complex FS algorithm. Our experimental results demonstrate the usefulness of ourFS solution applied to a broad set of large real-world problems. Our solution performed well with 2-D of big data (samples andfeatures) and yielded competitive performance results in dealingwith both ultrahigh-dimensionality datasets and datasets with a huge number of samples. Our distributed approach consistently outperformed the sequential version, enabling the resolution of problems that could not be usefully resolved using the classical approach. Future research will focus on the following.1) Designing new information theory-based approaches for high-speed data streams and also extending theseapproaches to the handling of drifts in concepts.2) Analyzing the impact of approximative selection on high- dimensional data via faster solutions that do not incur a high penalty on accuracy.3) Designing a new fully automatic FS system that selects the most relevant subset of features from a full set of features, thereby eliminating the need to define the number of features to select in each execution. REFERENCES [1] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vec-tor machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3,pp. 1–27, 2011. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ [2] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, FeatureExtraction: Foundations and Applications (Studies in Fuzziness and SoftComputing). Secaucus, NJ, USA: Springer-Verlag, 2006.
  • 4. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: , praveen@nexgenproject.com Web: www.nexgenproject.com, [3] Z. A. Zhao and H. Liu, Spectral Feature Selection for Data Mining.Boca Raton, FL, USA: CRC Press, 2011. [4] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos,Feature Selection for High- Dimensional Data (Artificial Intelligence:Foundations, Theory, and Algorithms). New York, NY, USA: Springer,2015. [5] Q. Wu, Z. Wang, F. Deng, Z. Chi, and D. D. Feng, “Realistic humanaction recognition with multimodal feature selection and fusion,” IEEETrans. Syst., Man, Cybern., Syst., vol. 43, no. 4, pp. 875–885, Jul. 2013. [6] V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos,J. M. Benítez, and F. Herrera, “A review of microarray datasetsand applied feature selection methods,” Inf. Sci., vol. 282, pp. 111–135,Oct. 2014. [7] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos,“Recent advances and emerging challenges of feature selection in thecontext of big data,” Knowl. Based Syst., vol. 86, pp. 33–45, Sep. 2015. [8] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processingon large clusters,” in Proc. OSDI, San Francisco, CA, USA, 2004,pp. 137–150. [9] T. White, Hadoop, The Definitive Guide. Sebastopol, CA, USA: O’ReillyMedia, 2012. [10] Apache Hadoop Project. (2017). Apache Hadoop. Accessed onFeb. 2017. [Online]. Available: http://hadoop.apache.org/ [11] H. Karau, A. Konwinski, P. Wendell, and M. Zaharia, Learning Spark:Lightening-Fast Big Data Analytics. Beijing, China: O’Reilly Media,2015.