SlideShare a Scribd company logo
1 of 4
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
FiDoop-DP: Data Partitioning in Frequent Itemset Mining on
Hadoop Clusters
Abstract:
Traditional parallel algorithms for mining frequent itemsets aim to balance
load by equally partitioning data among a group of computing nodes. We start
this study by discovering a serious performanceproblem of the existing parallel
Frequent Itemset Mining algorithms. Given a large dataset, data partitioning
strategies in the existing solutions suffer high communication and mining
overhead induced by redundant transactions transmitted among computing
nodes. We address this problem by developing a data partitioning approach
called FiDoop-DP using the MapReduce programming model. The overarching
goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset
Mining on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram-
based data partitioning technique, which exploits correlations among
transactions. Incorporating the similarity metric and the Locality-Sensitive
Hashing technique, FiDoop-DP places highly similar transactions into a data
partition to improve locality without creating an excessive number of
redundant transactions. We implement FiDoop-DP on a 24-node Hadoop
cluster, driven by a wide range of datasets created by IBM Quest Market-
Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP is
conducive to reducing network and computing loads by the virtue of
eliminating redundant transactions on Hadoop nodes. FiDoop-DP significantly
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
improves the performance of the existing parallel frequent-pattern scheme by
up to31% with an average of 18%
CONCLUSIONS AND FUTURE WORK
To mitigate high communication and reduce computing cost in MapReduce-
based FIM algorithms, we developed FiDoop-DP, which exploits correlation
among transactions to partition a large dataset across data nodes in a
Hadoopcluster. FiDoop-DP is able to (1) partition transactions with high
similarity together and (2) group highly correlated frequent items into a list.
One of the salient features of FiDoop-DP lies in its capability of lowering
network traffic and computing load through reducing the number of
redundant transactions, which are transmitted among Hadoop nodes. FiDoop-
DP applies the Voronoidiagrambaseddata partitioning technique to accomplish
data partition,in which LSH is incorporated to offer an analysisof correlation
among transactions. At the heart of FiDoop-DP is the second MapReduce job,
which (1) partitions alarge database to form a complete dataset for item
groupsand (2) conducts FP-Growth processing in parallel on localpartitions to
generate all frequent patterns. Our experimentalresults reveal that FiDoop-DP
significantly improves theFIMperformance of the existing Pfp solution by up to
31%with an average of 18%.We introduced in this study a similarity metric to
facilitatedata-aware partitioning. As a future research direction,we will apply
this metric to investigate advanced loadbalancingstrategies on a
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
heterogeneous Hadoop cluster.In one of our earlier studies (see [32] for
details), we addressedthe data-placement issue in heterogeneous
Hadoopclusters, where data are placed across nodes in a way thateach node
has a balanced data processing load. Our dataplacement scheme [32] can
balance the amount of datastored in heterogeneous nodes to achieve
improved dataprocessingperformance. Such a scheme implemented at
thelevel of Hadoop distributed file system (HDFS) is unawareof correlations
among application data. To further improveload balancing mechanisms
implemented in HDFS, we planto integrate FiDoop-DP with a data-placement
mechanismin HDFS on heterogeneous clusters. In addition to
performanceissues, energy efficiency of parallel FIM systems willbe an
intriguing research direction.
REFERENCES
[1] M. J. Zaki, “Parallel and distributed association mining: A
survey,”Concurrency, IEEE, vol. 7, no. 4, pp. 14–25, 1999.
[2] I. Pramudiono and M. Kitsuregawa, “Fp-tax: Tree structure
basedgeneralized association rule mining,” in Proceedings of the 9th
ACMSIGMOD workshop on Research issues in data mining and
knowledgediscovery. ACM, 2004, pp. 60–63.
[3] J. Dean and S. Ghemawat, “Mapreduce: simplified data processingon large
clusters,” Communications of the ACM, vol. 51, no. 1, pp.107–113, 2008.
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
[4] S. Sakr, A. Liu, and A. G. Fayoumi, “The family of mapreduceand large-scale
data processing systems,” ACMComputing Surveys(CSUR), vol. 46, no. 1, p. 11,
2013.
[5] M.-Y. Lin, P.-Y.Lee, and S.-C. Hsueh, “Apriori-based frequent itemsetmining
algorithms on mapreduce,” in Proceedings of the 6th InternationalConference
on Ubiquitous Information Management and Communication,ser. ICUIMC ’12.
New York, NY, USA: ACM, 2012, pp.76:1–76:8.
[6] X. Lin, “Mr-apriori: Association rules algorithm based on mapreduce,”in
Software Engineering and Service Science (ICSESS), 2014 5thIEEE International
Conference on. IEEE, 2014, pp. 141–144.
[7] L. Zhou, Z. Zhong, J. Chang, J. Li, J. Huang, and S. Feng, “Balancedparallel fp-
growth with mapreduce,” in Information Computing andTelecommunications
(YC-ICT), 2010 IEEE Youth Conference on. IEEE,2010, pp. 243–246.
[8] S. Hong, Z. Huaxuan, C. Shiping, and H. Chunyan, “The study ofimproved fp-
growth algorithm in mapreduce,” in 1st InternationalWorkshop on Cloud
Computing and Information Security. Atlantis Press,2013.
[9] M. Riondato, J. A. DeBrabant, R. Fonseca, and E. Upfal, “Parma:a parallel
randomized algorithm for approximate association rulesmining in mapreduce,”
in Proceedings of the 21st ACM internationalconference on Information and
knowledge management. ACM, 2012, pp.85–94.
[10] C. Lam, Hadoop in action. Manning Publications Co., 2010.

More Related Content

More from Nexgen Technology

More from Nexgen Technology (20)

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
 

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Recently uploaded (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 

Fi doopdp data partitioning in frequent itemset

  • 1. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters Abstract: Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of computing nodes. We start this study by discovering a serious performanceproblem of the existing parallel Frequent Itemset Mining algorithms. Given a large dataset, data partitioning strategies in the existing solutions suffer high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram- based data partitioning technique, which exploits correlations among transactions. Incorporating the similarity metric and the Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into a data partition to improve locality without creating an excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by IBM Quest Market- Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP is conducive to reducing network and computing loads by the virtue of eliminating redundant transactions on Hadoop nodes. FiDoop-DP significantly
  • 2. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com improves the performance of the existing parallel frequent-pattern scheme by up to31% with an average of 18% CONCLUSIONS AND FUTURE WORK To mitigate high communication and reduce computing cost in MapReduce- based FIM algorithms, we developed FiDoop-DP, which exploits correlation among transactions to partition a large dataset across data nodes in a Hadoopcluster. FiDoop-DP is able to (1) partition transactions with high similarity together and (2) group highly correlated frequent items into a list. One of the salient features of FiDoop-DP lies in its capability of lowering network traffic and computing load through reducing the number of redundant transactions, which are transmitted among Hadoop nodes. FiDoop- DP applies the Voronoidiagrambaseddata partitioning technique to accomplish data partition,in which LSH is incorporated to offer an analysisof correlation among transactions. At the heart of FiDoop-DP is the second MapReduce job, which (1) partitions alarge database to form a complete dataset for item groupsand (2) conducts FP-Growth processing in parallel on localpartitions to generate all frequent patterns. Our experimentalresults reveal that FiDoop-DP significantly improves theFIMperformance of the existing Pfp solution by up to 31%with an average of 18%.We introduced in this study a similarity metric to facilitatedata-aware partitioning. As a future research direction,we will apply this metric to investigate advanced loadbalancingstrategies on a
  • 3. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com heterogeneous Hadoop cluster.In one of our earlier studies (see [32] for details), we addressedthe data-placement issue in heterogeneous Hadoopclusters, where data are placed across nodes in a way thateach node has a balanced data processing load. Our dataplacement scheme [32] can balance the amount of datastored in heterogeneous nodes to achieve improved dataprocessingperformance. Such a scheme implemented at thelevel of Hadoop distributed file system (HDFS) is unawareof correlations among application data. To further improveload balancing mechanisms implemented in HDFS, we planto integrate FiDoop-DP with a data-placement mechanismin HDFS on heterogeneous clusters. In addition to performanceissues, energy efficiency of parallel FIM systems willbe an intriguing research direction. REFERENCES [1] M. J. Zaki, “Parallel and distributed association mining: A survey,”Concurrency, IEEE, vol. 7, no. 4, pp. 14–25, 1999. [2] I. Pramudiono and M. Kitsuregawa, “Fp-tax: Tree structure basedgeneralized association rule mining,” in Proceedings of the 9th ACMSIGMOD workshop on Research issues in data mining and knowledgediscovery. ACM, 2004, pp. 60–63. [3] J. Dean and S. Ghemawat, “Mapreduce: simplified data processingon large clusters,” Communications of the ACM, vol. 51, no. 1, pp.107–113, 2008.
  • 4. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com [4] S. Sakr, A. Liu, and A. G. Fayoumi, “The family of mapreduceand large-scale data processing systems,” ACMComputing Surveys(CSUR), vol. 46, no. 1, p. 11, 2013. [5] M.-Y. Lin, P.-Y.Lee, and S.-C. Hsueh, “Apriori-based frequent itemsetmining algorithms on mapreduce,” in Proceedings of the 6th InternationalConference on Ubiquitous Information Management and Communication,ser. ICUIMC ’12. New York, NY, USA: ACM, 2012, pp.76:1–76:8. [6] X. Lin, “Mr-apriori: Association rules algorithm based on mapreduce,”in Software Engineering and Service Science (ICSESS), 2014 5thIEEE International Conference on. IEEE, 2014, pp. 141–144. [7] L. Zhou, Z. Zhong, J. Chang, J. Li, J. Huang, and S. Feng, “Balancedparallel fp- growth with mapreduce,” in Information Computing andTelecommunications (YC-ICT), 2010 IEEE Youth Conference on. IEEE,2010, pp. 243–246. [8] S. Hong, Z. Huaxuan, C. Shiping, and H. Chunyan, “The study ofimproved fp- growth algorithm in mapreduce,” in 1st InternationalWorkshop on Cloud Computing and Information Security. Atlantis Press,2013. [9] M. Riondato, J. A. DeBrabant, R. Fonseca, and E. Upfal, “Parma:a parallel randomized algorithm for approximate association rulesmining in mapreduce,” in Proceedings of the 21st ACM internationalconference on Information and knowledge management. ACM, 2012, pp.85–94. [10] C. Lam, Hadoop in action. Manning Publications Co., 2010.