CONTACT: PRAVEEN KUMAR. L (, +91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
On Distributed Fuzzy Decision Trees for Big Data
Abstract:
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of
fuzzy classification. The approaches proposed so far to FDT learning, however, have
generally neglected time and space requirements. In this paper, we propose a distributed FDT
learning scheme shaped according to the MapReduce programming model for generating
both binary and multi-way FDTs from big data. The scheme relies on a novel distributed
fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on
fuzzy information entropy. The fuzzy partitions are then used as input to the FDT
learning algorithm, which employs fuzzy information gain to select the attributes at the
decision nodes. We have implemented the FDT learning scheme on the Apache Spark
framework. We have used ten real-world, publicly available big datasets to evaluate the
behavior of the scheme along three dimensions: i) performance in terms of classification
accuracy, model complexity and execution time, ii) scalability with respect to the number of
computing units, and iii) ability to efficiently accommodate an increasing dataset size. The
experiments show that the proposed scheme is suitable for managing big datasets even with
modest commodity hardware. Finally, we have used the distributed decision tree learning
algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a
MapReduce distributed fuzzy rule-based classification system, for comparative analysis.
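The core ingredients named in the abstract (a strong fuzzy partition, fuzzy entropy, and a MapReduce-style distribution of the work) can be illustrated with a small single-machine sketch. It assumes triangular fuzzy sets whose peaks (`cores`) are already known, and it only simulates the map and reduce phases in plain Python: `map_chunk` computes the fuzzy class cardinalities of one data chunk and `reduce_counts` merges partial results by summation. All function names are hypothetical, and this is not the authors' Spark implementation; in Spark, a similar pattern would typically be expressed with RDD operations such as `mapPartitions` followed by `reduceByKey`.

```python
from functools import reduce
from math import log2


def triangular_memberships(x, cores):
    """Membership degrees of x in a strong triangular fuzzy partition.

    cores: strictly increasing peaks of the fuzzy sets (an assumption of this
    sketch). Each triangle drops to 0 at the neighbouring peaks and the first and
    last sets are shoulders, so the degrees of any x sum to 1.
    """
    degrees = [0.0] * len(cores)
    if x <= cores[0]:
        degrees[0] = 1.0
    elif x >= cores[-1]:
        degrees[-1] = 1.0
    else:
        for j in range(len(cores) - 1):
            lo, hi = cores[j], cores[j + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                degrees[j], degrees[j + 1] = 1.0 - t, t
                break
    return degrees


def map_chunk(chunk, cores):
    """'Map' phase: fuzzy cardinality of each (fuzzy set, class) pair in one chunk."""
    counts = {}
    for x, label in chunk:
        for j, mu in enumerate(triangular_memberships(x, cores)):
            if mu > 0:
                counts[(j, label)] = counts.get((j, label), 0.0) + mu
    return counts


def reduce_counts(a, b):
    """'Reduce' phase: merge two partial results by summing cardinalities per key."""
    merged = dict(a)
    for key, value in b.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


def fuzzy_entropy(class_cardinalities):
    """Entropy computed on fuzzy (membership-weighted) class cardinalities."""
    total = sum(class_cardinalities)
    return -sum(c / total * log2(c / total) for c in class_cardinalities if c > 0)


# Toy data split into two "partitions", as the workers of a cluster would see it.
cores = [0.0, 2.0, 4.0]
chunks = [[(1.0, "a"), (2.5, "b")], [(4.0, "b"), (0.5, "a")]]
partials = [map_chunk(chunk, cores) for chunk in chunks]
totals = reduce(reduce_counts, partials)
# Fuzzy entropy of the middle fuzzy set (index 1) over classes "a" and "b".
middle_entropy = fuzzy_entropy([totals.get((1, "a"), 0.0), totals.get((1, "b"), 0.0)])
```

Because the partition is strong, every example contributes a total membership of exactly 1, so the fuzzy cardinalities over all fuzzy sets add up to the number of examples (4 in this toy case). Summation is also associative and commutative, which is what allows the partial counts to be merged in any order across workers.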
Existing System:
Several techniques have been proposed for generating decision trees efficiently from very large
datasets. They can be roughly grouped into two categories, characterized respectively by
pre-sorting the data or by adopting approximate representations of the data, such as samples
and/or histograms [46]. While pre-sorting techniques are more accurate, they cannot
accommodate very large datasets or streaming data [46]. One of the oldest approaches in the
first category is SLIQ, proposed in [47]. SLIQ reduces decision tree learning time without loss
of accuracy by exploiting a pre-sorting technique in the tree-growth phase. This technique is
integrated with a breadth-first tree-growing strategy to enable classification of disk-resident
datasets.
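To make the trade-off between the two categories concrete, the following sketch contrasts an exact split search on pre-sorted values with an approximate search over an equal-width histogram, for a single numeric attribute. It is a simplified in-memory illustration, not SLIQ itself (SLIQ uses the Gini index and disk-resident attribute and class lists); the weighted class entropy of the two children is used here as the split criterion, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Class entropy of a Counter of labels (zero counts are ignored)."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c > 0)


def best_split_presorted(values, labels):
    """Exact search: sort once, then test every boundary between distinct values.

    Returns (weighted child entropy, threshold); lower entropy is better.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # the one-off pre-sort
    left, right = Counter(), Counter(labels)
    best = (float("inf"), None)
    for pos in range(n - 1):
        i, nxt = order[pos], order[pos + 1]
        left[labels[i]] += 1          # move example i from the right side to the left
        right[labels[i]] -= 1
        if values[i] == values[nxt]:
            continue                  # no cut between equal values
        k = pos + 1
        w = k / n * entropy(left) + (n - k) / n * entropy(right)
        if w < best[0]:
            best = (w, (values[i] + values[nxt]) / 2)
    return best


def best_split_histogram(values, labels, n_bins):
    """Approximate search: only equal-width bin edges are candidate thresholds."""
    n = len(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    bins = [Counter() for _ in range(n_bins)]
    for v, y in zip(values, labels):
        bins[min(int((v - lo) / width), n_bins - 1)][y] += 1
    left, right = Counter(), Counter(labels)
    best = (float("inf"), None)
    for b in range(n_bins - 1):
        left.update(bins[b])          # add bin b's class counts to the left side
        right.subtract(bins[b])
        k = sum(left.values())
        if k in (0, n):
            continue                  # one side would be empty
        w = k / n * entropy(left) + (n - k) / n * entropy(right)
        if w < best[0]:
            best = (w, lo + (b + 1) * width)
    return best


values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
exact = best_split_presorted(values, labels)
approx = best_split_histogram(values, labels, n_bins=4)
```

On this toy attribute both searches separate the classes perfectly, but the histogram can only place the threshold on a bin edge (3.75 instead of 6.5). With fewer bins or overlapping classes, the approximation can also miss the best split entirely, which is the accuracy cost the text attributes to this category; in exchange, it needs only per-bin class counts rather than a full sort.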
Proposed System:
We propose a distributed fuzzy discretizer and a distributed FDT (DFDT) learning scheme
based on the MapReduce programming model for managing big data. To the best of our
knowledge, in the context of big data, no distributed discretizer for generating fuzzy
partitions and no DFDT have been proposed in the literature. Our novel discretizer generates
a strong fuzzy partition for each continuous attribute by using a distributed version of the
well-known method proposed by Fayyad and Irani in [44], specifically adapted for this purpose.
The fuzzy partitions computed by the discretizer are used as input to the DFDT learning
algorithm. We adopt and compare two versions of the learning algorithm, based on binary and
multi-way splits, respectively. Both versions employ the information gain, computed in terms
of fuzzy entropy, to select the attribute used at each decision node.
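The discretizer builds on the entropy-based method of Fayyad and Irani [44]. As a point of reference, the sketch below implements the original crisp, single-machine version: candidate cut points between distinct sorted values are scanned, the one minimizing the weighted class entropy is chosen, and recursion stops when the information gain fails the minimum description length (MDL) criterion. The fuzzy, distributed adaptation described above (fuzzy entropy, strong fuzzy partitions, MapReduce) is not reproduced here, and `mdlp_cut_points` is a hypothetical name.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Crisp Fayyad-Irani discretization: recursive entropy splits + MDL stop rule."""
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []

    def split(lo, hi):
        """Try to cut the sorted segment xs[lo:hi]; recurse on both halves if accepted."""
        seg = ys[lo:hi]
        n = hi - lo
        best = None
        for i in range(lo + 1, hi):
            if xs[i - 1] == xs[i]:
                continue  # boundaries only between distinct values
            left, right = ys[lo:i], ys[i:hi]
            w = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
            if best is None or w < best[0]:
                best = (w, i)
        if best is None:
            return
        w, i = best
        left, right = ys[lo:i], ys[i:hi]
        ent = class_entropy(seg)
        gain = ent - w
        # MDL criterion: accept the cut only if
        #   gain > (log2(n - 1) + delta) / n,
        #   delta = log2(3**k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)),
        # where k, k1, k2 count the classes present in S, S1 and S2.
        k, k1, k2 = len(set(seg)), len(set(left)), len(set(right))
        delta = log2(3 ** k - 2) - (k * ent - k1 * class_entropy(left) - k2 * class_entropy(right))
        if gain > (log2(n - 1) + delta) / n:
            cuts.append((xs[i - 1] + xs[i]) / 2)
            split(lo, i)
            split(i, hi)

    split(0, len(xs))
    return sorted(cuts)


separable = mdlp_cut_points([1, 2, 3, 4, 5, 6, 7, 8], ["a"] * 4 + ["b"] * 4)
mixed = mdlp_cut_points([1, 2, 3, 4], ["a", "b", "a", "b"])
```

The MDL rule is what keeps the number of intervals small without a user-supplied bin count: on the cleanly separable attribute it accepts a single cut at 4.5, while on the alternating labels the gain is too small to pay for the extra description length, so no cut is made.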
Conclusions:
We have proposed a distributed fuzzy decision tree (FDT) learning scheme shaped according
to the MapReduce programming model for generating both binary (FBDT) and multi-way
(FMDT) FDTs from big data. We first introduced a novel distributed fuzzy discretizer,
which generates strong fuzzy partitions for each continuous attribute based on fuzzy
information entropy. Then, we discussed a distributed implementation of an FDT
learning algorithm, which employs the fuzzy information gain to select the attributes
used in the decision nodes. We have implemented the FDT learning scheme on the Apache
Spark framework. We believe that the work presented in this paper is the first extensive study
on the application of FDTs to big data, considering both binary and multi-way splits. We
expect that the experimental results can serve as a baseline for future research in this field.
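The two tree variants differ in how a decision node uses the fuzzy sets of the selected attribute. The sketch below computes fuzzy information gain from a table of fuzzy class cardinalities (one row per fuzzy set, one column per class), both for a multi-way split with one child per fuzzy set and for a binary split. Two assumptions are made: the node's own membership degrees are already folded into the cardinalities, and the binary variant groups adjacent fuzzy sets into two children; the exact FBDT grouping strategy is not described in this summary. All function names are hypothetical.

```python
from math import log2


def fuzzy_entropy(cards):
    """Entropy of per-class fuzzy cardinalities (classes with zero mass ignored)."""
    total = sum(cards)
    return -sum(c / total * log2(c / total) for c in cards if c > 0)


def column_sums(rows):
    """Per-class totals over a group of fuzzy sets (rows of the table)."""
    return [sum(col) for col in zip(*rows)]


def weighted_child_entropy(children):
    """Average child entropy, weighted by each child's fuzzy cardinality."""
    sizes = [sum(child) for child in children]
    total = sum(sizes)
    return sum(s / total * fuzzy_entropy(ch) for s, ch in zip(sizes, children) if s > 0)


def multiway_gain(table):
    """Fuzzy information gain when every fuzzy set becomes its own child node."""
    return fuzzy_entropy(column_sums(table)) - weighted_child_entropy(table)


def best_binary_gain(table):
    """Hypothetical binary variant: try every split of the ordered fuzzy sets
    into two groups of adjacent sets; return (gain, number of sets on the left)."""
    parent_entropy = fuzzy_entropy(column_sums(table))
    best = None
    for cut in range(1, len(table)):
        children = [column_sums(table[:cut]), column_sums(table[cut:])]
        gain = parent_entropy - weighted_child_entropy(children)
        if best is None or gain > best[0]:
            best = (gain, cut)
    return best


# table[j][c]: fuzzy cardinality of class c (columns "a", "b") in fuzzy set j.
table = [[1.25, 0.0], [0.75, 0.75], [0.0, 1.25]]
multi = multiway_gain(table)
binary_gain, binary_cut = best_binary_gain(table)
```

For this toy table the multi-way split reaches a gain of 0.625, while the best binary split scores lower (about 0.42), because merging two fuzzy sets into one child mixes their class distributions. A binary node therefore tends to need further splits deeper in the tree, whereas a multi-way node produces more children at once.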
REFERENCES
[1] R. Diao, K. Sun, V. Vittal, R. J. O’Keefe, M. R. Richardson, N. Bhatt, D. Stradford, and S. K. Sarawgi, “Decision tree-based online voltage security assessment using PMU measurements,” IEEE Transactions on Power Systems, vol. 24, no. 2, pp. 832–839, 2009.
[2] T. Goetz, The Decision Tree: Taking Control of Your Health in the New Era of Personalized Medicine. Rodale Inc., 2010.
[3] Y. Zheng, L. Liu, L. Wang, and X. Xie, “Learning transportation mode from raw GPS data for geographic applications on the web,” in Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 247–256.
[4] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier, 2011.
[5] L. Rokach and O. Maimon, Data Mining with Decision Trees: Theory and Applications. World Scientific, 2014.
[6] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[7] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and Regression Trees. CRC Press, 1984.
[8] C. Z. Janikow, “Fuzzy decision trees: Issues and methods,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 1–14, 1998.
[9] Y.-l. Chen, T. Wang, B.-s. Wang, and Z.-j. Li, “A survey of fuzzy decision tree classifier,” Fuzzy Information and Engineering, vol. 1, no. 2, pp. 149–159, 2009.
[10] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[11] X. Liu, X. Feng, and W. Pedrycz, “Extraction of fuzzy rules from fuzzy decision trees: An axiomatic fuzzy sets (AFS) approach,” Data & Knowledge Engineering, vol. 84, pp. 1–25, 2013.
[12] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, “The elements of statistical learning: Data mining, inference and prediction,” The Mathematical Intelligencer, vol. 27, no. 2, pp. 83–85, 2005.
[13] H. Kim and W.-Y. Loh, “Classification trees with unbiased multiway splits,” Journal of the American Statistical Association, pp. 589–604, 2011.
[14] F. Berzal, J.-C. Cubero, N. Marín, and D. Sánchez, “Building multi-way decision trees with numerical attributes,” Information Sciences, vol. 165, no. 1, pp. 73–90, 2004.
[15] Y. Yuan and M. J. Shaw, “Induction of fuzzy decision trees,” Fuzzy Sets and Systems, vol. 69, no. 2, pp. 125–139, 1995.
[16] R. Weber, “Fuzzy-ID3: A class of methods for automatic knowledge acquisition,” in Int. Conf. on Fuzzy Logic & Neural Networks, 1992, pp. 265–268.
[17] M. Zeinalkhani and M. Eftekhari, “Fuzzy partitioning of continuous attributes through discretization methods to construct fuzzy decision tree classifiers,” Information Sciences, vol. 278, pp. 715–735, 2014.
[18] S. García, J. Luengo, J. A. Sáez, V. López, and F. Herrera, “A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 4, pp. 734–750, 2013.

More Related Content

What's hot

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
 
Evaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileEvaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileeSAT Publishing House
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesIRJET Journal
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsIJERA Editor
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex datacsandit
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATAcscpconf
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A ReviewIOSRjournaljce
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...iosrjce
 
DATA MINING.doc
DATA MINING.docDATA MINING.doc
DATA MINING.docbutest
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
 

What's hot (17)

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...
 
Evaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileEvaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for file
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
 
Novel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data StreamsNovel Ensemble Tree for Fast Prediction on Data Streams
Novel Ensemble Tree for Fast Prediction on Data Streams
 
130509
130509130509
130509
 
Combined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex dataCombined mining approach to generate patterns for complex data
Combined mining approach to generate patterns for complex data
 
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATACOMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
COMBINED MINING APPROACH TO GENERATE PATTERNS FOR COMPLEX DATA
 
31 34
31 3431 34
31 34
 
Classification Techniques: A Review
Classification Techniques: A ReviewClassification Techniques: A Review
Classification Techniques: A Review
 
Ir3116271633
Ir3116271633Ir3116271633
Ir3116271633
 
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...
 
DATA MINING.doc
DATA MINING.docDATA MINING.doc
DATA MINING.doc
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
 
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
 

Similar to On distributed fuzzy decision trees for big data

ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATANexgen Technology
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniquesijsrd.com
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 
A scenario based approach for dealing with
A scenario based approach for dealing withA scenario based approach for dealing with
A scenario based approach for dealing withijcsa
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
Data Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological DataData Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological DataIRJET Journal
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceIJCERT
 
Comprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction TechniquesComprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction Techniquesijsrd.com
 
Big data cloud-based recommendation system using NLP techniques with machine ...
Big data cloud-based recommendation system using NLP techniques with machine ...Big data cloud-based recommendation system using NLP techniques with machine ...
Big data cloud-based recommendation system using NLP techniques with machine ...TELKOMNIKA JOURNAL
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaIJERA Editor
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaIJERA Editor
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.docbutest
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...acijjournal
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsIRJET Journal
 
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGcscpconf
 
Issues, challenges, and solutions
Issues, challenges, and solutionsIssues, challenges, and solutions
Issues, challenges, and solutionscsandit
 

Similar to On distributed fuzzy decision trees for big data (20)

ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniques
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
A scenario based approach for dealing with
A scenario based approach for dealing withA scenario based approach for dealing with
A scenario based approach for dealing with
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Data Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological DataData Analysis and Prediction System for Meteorological Data
Data Analysis and Prediction System for Meteorological Data
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold Preference
 
Comprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction TechniquesComprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction Techniques
 
Big data cloud-based recommendation system using NLP techniques with machine ...
Big data cloud-based recommendation system using NLP techniques with machine ...Big data cloud-based recommendation system using NLP techniques with machine ...
Big data cloud-based recommendation system using NLP techniques with machine ...
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
H017625354
H017625354H017625354
H017625354
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.doc
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MININGISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING
 
Issues, challenges, and solutions
Issues, challenges, and solutionsIssues, challenges, and solutions
Issues, challenges, and solutions
 

Recently uploaded

Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 

Recently uploaded (20)

Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

On distributed fuzzy decision trees for big data

  • 1. CONTACT: PRAVEEN KUMAR. L (, +91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com On Distributed Fuzzy Decision Trees for Big Data Abstract: Fuzzy decision trees (FDTs) have shown to be an effective solution in the framework of fuzzy classification. The approaches proposed so far to FDT learning, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multi-way FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are therefore used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets for evaluating the behavior of the scheme along three dimensions: i) performance in terms of classification accuracy, model complexity and execution time, ii) scalability varying the number of computing units and iii) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme turns out to be suitable for managing big datasets even with modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLLib library and the Chi- FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis. Existing System: A decision tree can be generated efficiently from very large datasets. 
The various techniques proposed in the literature can be roughly grouped into two categories: those that pre-sort the data and those that adopt approximate representations of the data, such as samples and/or histograms [46]. While pre-sorting techniques are more accurate, they cannot accommodate very large datasets or streaming data [46]. One of the oldest approaches in the first category is SLIQ, proposed in [47]. SLIQ reduces decision tree learning time without loss of accuracy by exploiting a pre-sorting technique in the tree-growth phase. This technique is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident datasets.

Proposed System:
We propose a distributed fuzzy discretizer and a distributed FDT (DFDT) learning scheme built upon the MapReduce programming model for managing big data. To the best of our knowledge, no distributed discretizer for generating fuzzy partitions and no DFDT have been proposed in the literature in the context of big data. Our novel discretizer generates a strong fuzzy partition for each continuous attribute by using a purposely adapted distributed version of the well-known method proposed by Fayyad and Irani in [44]. The fuzzy partitions computed by the discretizer are used as input to the DFDT learning algorithm. We adopt and compare two different versions of the learning algorithm, based on binary and multi-way splits, respectively. Both versions employ the information gain computed in terms of fuzzy entropy for selecting the attribute to be adopted at each decision node.

Conclusions:
We have proposed a distributed fuzzy decision tree (FDT) learning scheme shaped according to the MapReduce programming model for generating both binary (FBDT) and multi-way (FMDT) FDTs from big data. We have first introduced a novel distributed fuzzy discretizer, which generates strong fuzzy partitions for each continuous attribute based on fuzzy information entropy. Then, we have discussed a distributed implementation of an FDT learning algorithm, which employs the fuzzy information gain for selecting the attributes to be used in the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We believe that the work presented in this paper is the first extensive study on the application of FDTs to big data, considering both binary and multi-way splits. We expect that the experimental results can be used as a baseline for future research in this field.

REFERENCES
[1] R. Diao, K. Sun, V. Vittal, R. J. O’Keefe, M. R. Richardson, N. Bhatt, D. Stradford, and S. K. Sarawgi, “Decision tree-based online voltage security assessment using PMU measurements,” IEEE Transactions on Power Systems, vol. 24, no. 2, pp. 832–839, 2009.
[2] T. Goetz, The decision tree: Taking control of your health in the new era of personalized medicine. Rodale Inc., 2010.
[3] Y. Zheng, L. Liu, L. Wang, and X. Xie, “Learning transportation mode from raw GPS data for geographic applications on the web,” in Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 247–256.
[4] J. Han, M. Kamber, and J. Pei, Data mining: Concepts and techniques. Elsevier, 2011.
[5] L. Rokach and O. Maimon, Data mining with decision trees: Theory and applications. World Scientific, 2014.
[6] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[7] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and regression trees. CRC Press, 1984.
[8] C. Z. Janikow, “Fuzzy decision trees: Issues and methods,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 1–14, 1998.
[9] Y.-l. Chen, T. Wang, B.-s. Wang, and Z.-j. Li, “A survey of fuzzy decision tree classifier,” Fuzzy Information and Engineering, vol. 1, no. 2, pp. 149–159, 2009.
[10] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.
[11] X. Liu, X. Feng, and W. Pedrycz, “Extraction of fuzzy rules from fuzzy decision trees: An axiomatic fuzzy sets (AFS) approach,” Data & Knowledge Engineering, vol. 84, pp. 1–25, 2013.
[12] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, “The elements of statistical learning: Data mining, inference and prediction,” The Mathematical Intelligencer, vol. 27, no. 2, pp. 83–85, 2005.
[13] H. Kim and W.-Y. Loh, “Classification trees with unbiased multiway splits,” Journal of the American Statistical Association, pp. 589–604, 2011.
[14] F. Berzal, J.-C. Cubero, N. Marín, and D. Sánchez, “Building multi-way decision trees with numerical attributes,” Information Sciences, vol. 165, no. 1, pp. 73–90, 2004.
[15] Y. Yuan and M. J. Shaw, “Induction of fuzzy decision trees,” Fuzzy Sets and Systems, vol. 69, no. 2, pp. 125–139, 1995.
[16] R. Weber, “Fuzzy-ID3: A class of methods for automatic knowledge acquisition,” in Int. Conf. on Fuzzy Logic & Neural Networks, 1992, pp. 265–268.
[17] M. Zeinalkhani and M. Eftekhari, “Fuzzy partitioning of continuous attributes through discretization methods to construct fuzzy decision tree classifiers,” Information Sciences, vol. 278, pp. 715–735, 2014.
[18] S. García, J. Luengo, J. A. Sáez, V. López, and F. Herrera, “A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning,” IEEE Trans. on Knowledge and Data Engineering, vol. 25, no. 4, pp. 734–750, 2013.
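The fuzzy-entropy computation summarized in the Proposed System section can be illustrated with a small sketch. It assumes a single continuous attribute whose strong fuzzy partition consists of triangular fuzzy sets with cores at given points (here the domain minimum, one cut point and the domain maximum, chosen by hand; the distributed Fayyad and Irani discretizer itself is not implemented). Each data partition computes fuzzy class counts in a map step, the counts are summed in a reduce step, and the fuzzy information gain is computed from the aggregated counts. The function names (`memberships`, `map_partition`, `reduce_counts`, `fuzzy_information_gain`) are hypothetical, and Python's `functools.reduce` only stands in for the reduce phase of a framework such as Apache Spark; this is not the authors' implementation.

```python
import math
from functools import reduce


def memberships(x, centers):
    """Membership degrees of x in a strong triangular fuzzy partition.

    `centers` are the sorted, distinct cores of the fuzzy sets (assumed here:
    domain minimum, cut points, domain maximum). The first and last sets are
    shoulders, so the degrees always sum to 1 (the strong-partition property).
    """
    mu = [0.0] * len(centers)
    if x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        for j in range(len(centers) - 1):
            lo, hi = centers[j], centers[j + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)  # relative position of x between two cores
                mu[j], mu[j + 1] = 1.0 - t, t
                break
    return mu


def map_partition(chunk, centers, n_classes):
    """Map step (hypothetical): fuzzy class counts for one data partition.

    `chunk` is a list of (value, class_index) pairs; the result is a matrix
    counts[j][c] = sum of memberships in fuzzy set j over instances of class c.
    """
    counts = [[0.0] * n_classes for _ in centers]
    for x, c in chunk:
        for j, m in enumerate(memberships(x, centers)):
            counts[j][c] += m
    return counts


def reduce_counts(a, b):
    """Reduce step (hypothetical): element-wise sum of two count matrices."""
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def entropy(class_counts):
    """Base-2 entropy of a vector of (possibly fuzzy) class cardinalities."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in class_counts if n > 0)


def fuzzy_information_gain(counts):
    """Class entropy of the node minus the cardinality-weighted fuzzy entropy
    of the fuzzy sets in the partition."""
    class_totals = [sum(col) for col in zip(*counts)]
    n = sum(class_totals)
    weighted = sum(sum(row) / n * entropy(row) for row in counts)
    return entropy(class_totals) - weighted


# Toy data split across two "workers": values 1-3 are class 0, 7-9 are class 1.
chunks = [[(1.0, 0), (2.0, 0), (3.0, 0)], [(7.0, 1), (8.0, 1), (9.0, 1)]]
centers = [1.0, 5.0, 9.0]  # hypothetical cores: min, one cut point, max
counts = reduce(reduce_counts, (map_partition(ch, centers, 2) for ch in chunks))
gain = fuzzy_information_gain(counts)
print(counts)  # [[2.25, 0.0], [0.75, 0.75], [0.0, 2.25]]
print(gain)    # 0.75
```

Because only the small per-fuzzy-set, per-class count matrix leaves each partition, the cost of the reduce step does not depend on how many instances each partition holds, which is what makes this kind of computation a natural fit for MapReduce.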