SlideShare a Scribd company logo
1 of 20
A SCALABLE TWO-PHASE TOP-
DOWN SPECIALIZATION
APPROACH FOR DATA
ANONYMIZATION USING
MAPREDUCE ON CLOUD
PRESENTED BY
ABSTRACT
 In this paper, we propose a scalable two-phase
top-down specialization (TDS) approach to
anonymize large-scale data sets using the
MapReduce framework on cloud.
 In both phases of our approach, we
deliberately design a group of innovative
MapReduce jobs to concretely accomplish the
specialization computation in a highly scalable
way.
EXISTING SYSTEM
 We leverage MapReduce, a widely adopted
parallel data processing framework, to address
the scalability problem of the top-down
specialization (TDS) approach for large-scale
data anonymization.
 The TDS approach, offering a good tradeoff
between data utility and data consistency, is
widely applied for data anonymization.
EXISTING SYSTEM
 Most TDS algorithms are centralized, resulting
in their inadequacy in handling largescale data
sets.
 Although some distributed algorithms have
been proposed, they mainly focus on secure
anonymization of data sets from multiple
parties, rather than the scalability aspect.
DISADVANTAGES OF EXISTING
SYSTEM
 The MapReduce computation paradigm still a
challenge to design proper MapReduce jobs
for TDS.
PROPOSED SYSTEM
 In this paper, we propose a scalable two-phase
top-down specialization (TDS) approach to
anonymize large-scale data sets using the
MapReduce framework on cloud.
 In both phases of our approach, we deliberately
design a group of innovative MapReduce jobs to
concretely accomplish the specialization
computation in a highly scalable way.
ADVANTAGES OF PROPOSED
SYSTEM
 Accomplish the specializations in a highly
scalable fashion.
 Gain high scalability.
 Significantly improve the scalability and
efficiency of TDS for data anonymization over
existing approaches.
SYSTEM ARCHITECTURE
HARDWARE REQUIREMENTS
 System : Pentium IV
2.4 GHz.
 Hard Disk : 40 GB.
 Floppy Drive : 1.44 Mb.
 Monitor : 15 VGA
Colour.
 Mouse : Logitech.
 Ram : 512 Mb.
SOFTWARE REQUIREMENTS
 Operating system : Windows XP/7.
 Coding Language : JAVA/J2EE
 IDE : Netbeans 7.4
 Database : MYSQL
MODULES:
 DATA PARTITION
 ANONYMIZATION
 MERGING
 SPECIALIZATION
 OBS
DATA PARTITION:
 In this module the data partition is performed
on the cloud.
 Here we collect the large no of data sets.
 We are split the large into small data sets.
 Then we provides the random no for each data
sets.
ANONYMIZATION:
 After geting the individual data sets we apply
the anonymization.
 The anonymization means hide or remove the
sensitive field in data sets.
 Then we get the intermediate result for the
small data sets
MERGING:
 The intermediate result of the several small
data sets are merged here.
 The MRTDS driver is used to organizes the
small intermediate result
 For merging, the merged data sets are
collected on cloud.
 The merging result is again applied in
anonymization called specialization.
SPECIALIZATION:
 After geting the intermediate result those
results are merged into one.
 Then we again applies the anonymization on
the merged data it called specialization.
 Here we are using the two kinds of jobs such
as IGPL UPDATE AND IGPL INITIALIZATION.
OBS:
 The OBS called optimized balancing
scheduling.
 Here we focus on the two kinds of the
scheduling called time and size.
 Here data sets are split in to the specified size
and applied anonymization on specified time.
 The OBS approach is to provide the high
ability on handles the large data sets.
REFERENCES
 Xuyun Zhang, Laurence T. Yang,Chang Liu, and
Jinjun Chen,“A Scalable Two-Phase Top-
DownSpecialization Approach for Data
Anonymization Using MapReduce on Cloud”,
IEEE TRANSACTIONS ON PARALLEL AND
DISTRIBUTED SYSTEMS, VOL. 25, NO. 2,
FEBRUARY 2014

More Related Content

Similar to FINAL REVIEW of a scalable phase top down

Graph Based Workload Driven Partitioning System by Using MongoDB
Graph Based Workload Driven Partitioning System by Using MongoDBGraph Based Workload Driven Partitioning System by Using MongoDB
Graph Based Workload Driven Partitioning System by Using MongoDB
IJAAS Team
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private Cloud
IJERA Editor
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
Supriya .
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
richoe
 

Similar to FINAL REVIEW of a scalable phase top down (20)

Ieeepro techno solutions ieee dotnet project - generalized approach for data
Ieeepro techno solutions  ieee dotnet project - generalized approach for dataIeeepro techno solutions  ieee dotnet project - generalized approach for data
Ieeepro techno solutions ieee dotnet project - generalized approach for data
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 
Bigdata analytics K.kiruthika 2nd M.Sc.,computer science Bon secoures college...
Bigdata analytics K.kiruthika 2nd M.Sc.,computer science Bon secoures college...Bigdata analytics K.kiruthika 2nd M.Sc.,computer science Bon secoures college...
Bigdata analytics K.kiruthika 2nd M.Sc.,computer science Bon secoures college...
 
Geo distributed parallelization pacts in map reduce
Geo distributed parallelization pacts in map reduceGeo distributed parallelization pacts in map reduce
Geo distributed parallelization pacts in map reduce
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
Graph Based Workload Driven Partitioning System by Using MongoDB
Graph Based Workload Driven Partitioning System by Using MongoDBGraph Based Workload Driven Partitioning System by Using MongoDB
Graph Based Workload Driven Partitioning System by Using MongoDB
 
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
 
Survey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applicationsSurvey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applications
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
IRJET- Enhanced Density Based Method for Clustering Data Stream
IRJET-  	  Enhanced Density Based Method for Clustering Data StreamIRJET-  	  Enhanced Density Based Method for Clustering Data Stream
IRJET- Enhanced Density Based Method for Clustering Data Stream
 
MAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHAREMAP REDUCE SLIDESHARE
MAP REDUCE SLIDESHARE
 
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATIONMAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private Cloud
 
Handling Big Data
Handling Big DataHandling Big Data
Handling Big Data
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
 

Recently uploaded

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 

Recently uploaded (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdf
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

FINAL REVIEW of a scalable phase top down

  • 1. A SCALABLE TWO-PHASE TOP- DOWN SPECIALIZATION APPROACH FOR DATA ANONYMIZATION USING MAPREDUCE ON CLOUD PRESENTED BY
  • 2. ABSTRACT  In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud.  In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way.
  • 3. EXISTING SYSTEM  We leverage MapReduce, a widely adopted parallel data processing framework, to address the scalability problem of the top-down specialization (TDS) approach for large-scale data anonymization.  The TDS approach, offering a good tradeoff between data utility and data consistency, is widely applied for data anonymization.
  • 4. EXISTING SYSTEM  Most TDS algorithms are centralized, resulting in their inadequacy in handling largescale data sets.  Although some distributed algorithms have been proposed, they mainly focus on secure anonymization of data sets from multiple parties, rather than the scalability aspect.
  • 5. DISADVANTAGES OF EXISTING SYSTEM  The MapReduce computation paradigm still a challenge to design proper MapReduce jobs for TDS.
  • 6. PROPOSED SYSTEM  In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud.  In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way.
  • 7. ADVANTAGES OF PROPOSED SYSTEM  Accomplish the specializations in a highly scalable fashion.  Gain high scalability.  Significantly improve the scalability and efficiency of TDS for data anonymization over existing approaches.
  • 9. HARDWARE REQUIREMENTS  System : Pentium IV 2.4 GHz.  Hard Disk : 40 GB.  Floppy Drive : 1.44 Mb.  Monitor : 15 VGA Colour.  Mouse : Logitech.  Ram : 512 Mb.
  • 10. SOFTWARE REQUIREMENTS  Operating system : Windows XP/7.  Coding Language : JAVA/J2EE  IDE : Netbeans 7.4  Database : MYSQL
  • 11. MODULES:  DATA PARTITION  ANONYMIZATION  MERGING  SPECIALIZATION  OBS
  • 12. DATA PARTITION:  In this module the data partition is performed on the cloud.  Here we collect the large no of data sets.  We are split the large into small data sets.  Then we provides the random no for each data sets.
  • 13. ANONYMIZATION:  After geting the individual data sets we apply the anonymization.  The anonymization means hide or remove the sensitive field in data sets.  Then we get the intermediate result for the small data sets
  • 14. MERGING:  The intermediate result of the several small data sets are merged here.  The MRTDS driver is used to organizes the small intermediate result  For merging, the merged data sets are collected on cloud.  The merging result is again applied in anonymization called specialization.
  • 15. SPECIALIZATION:  After geting the intermediate result those results are merged into one.  Then we again applies the anonymization on the merged data it called specialization.  Here we are using the two kinds of jobs such as IGPL UPDATE AND IGPL INITIALIZATION.
  • 16. OBS:  The OBS called optimized balancing scheduling.  Here we focus on the two kinds of the scheduling called time and size.  Here data sets are split in to the specified size and applied anonymization on specified time.  The OBS approach is to provide the high ability on handles the large data sets.
  • 17.
  • 18.
  • 19.
  • 20. REFERENCES  Xuyun Zhang, Laurence T. Yang,Chang Liu, and Jinjun Chen,“A Scalable Two-Phase Top- DownSpecialization Approach for Data Anonymization Using MapReduce on Cloud”, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 2, FEBRUARY 2014