A Random Decision Tree Framework for Privacy-Preserving Data
Mining
Data mining discovers knowledge from existing or past data. With a
classification technique, a model trained on existing records can
predict the class of new data. Nowadays multiple parties often use the
same data to identify the class of their own records, and exposing all
of the data to every party puts privacy at risk.
For example, a bank, an insurance company and a credit card company
may use the same records for different purposes:
The bank uses them to look up past transactions.
The credit card company uses the attributes related to past payments.
The insurance company uses them to identify the correct policy for a
person.
All of these companies use the person's profile information, but each
needs different attributes. If all of the data is exposed to every
company, privacy is at risk.
To overcome this issue, the author introduces a data mining algorithm
called Random Decision Tree, which builds a tree from randomly
selected data and applies homomorphic encryption to protect users'
data. Each company sees only the class names, and the dataset is
partitioned according to what each company requires. The Random
Decision Tree is then built from the partitioned dataset.
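The text does not say which homomorphic scheme is used. As a minimal
sketch, assuming an additively homomorphic scheme such as Paillier, the
Python example below shows the property that makes encrypted data
useful: two values can be added while both stay encrypted. The function
names and the tiny primes are illustrative assumptions, not the
project's code, and the key size is far too small for real use. It
needs Python 3.8 or later for the modular inverse via pow.

```python
import math
import random


def keygen(p=293, q=433):
    """Return a toy Paillier key pair (public n, private (lam, mu)).

    The small primes are for illustration only.
    """
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut holds because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n, using g = n + 1."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, private_key, total))  # 42
```

In a multi-party setting, a property like this would let parties
combine encrypted per-class counts without revealing their individual
values.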
The dataset is given to the Random Decision Tree algorithm to build a
tree, which is also called the classification model.
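The tree-building step can be sketched as follows. This Python example
assumes the usual random decision tree construction, in which the split
attribute at each node is picked at random and the training records are
used only to fill in class counts at the leaves. The helper names, the
dictionary layout and the two toy records (taken from the NURSERY
values quoted in this document) are assumptions for illustration, and
the counts here are kept in plain form rather than encrypted.

```python
import random


def build_structure(attributes, values, depth, rng):
    """Grow a tree whose split attributes are chosen at random."""
    if depth == 0 or not attributes:
        return {"counts": {}}  # leaf: class counts are filled in later
    attr = rng.choice(attributes)
    remaining = [a for a in attributes if a != attr]
    children = {
        v: build_structure(remaining, values, depth - 1, rng)
        for v in values[attr]
    }
    return {"attr": attr, "children": children}


def update_counts(tree, record, label):
    """Route one training record to its leaf and count its class there."""
    node = tree
    while "attr" in node:
        node = node["children"][record[node["attr"]]]
    node["counts"][label] = node["counts"].get(label, 0) + 1


def count_records(node):
    """Total number of training records stored in the leaves."""
    if "attr" not in node:
        return sum(node["counts"].values())
    return sum(count_records(child) for child in node["children"].values())


values = {"housing": ["convenient"], "health": ["recommended", "priority"]}
training = [
    ({"housing": "convenient", "health": "recommended"}, "recommend"),
    ({"housing": "convenient", "health": "priority"}, "priority"),
]
tree = build_structure(list(values), values, depth=2, rng=random.Random(0))
for record, label in training:
    update_counts(tree, record, label)
print(count_records(tree))  # 2
```

Because the structure does not depend on the training data, several
such trees can be generated cheaply and combined at prediction time.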
To classify a new instance, a company supplies the values of all
attributes related to the data it provided. The application then
applies the new instance (record) to the decision tree model to
predict its class name.
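Classification can be sketched in the same spirit. The Python example
below assumes each tree is a nested dictionary with class counts at its
leaves, and that the predicted class is the one with the highest summed
class probability across the trees. The two hand-built trees and their
counts are made up for illustration.

```python
def leaf_counts(tree, record):
    """Follow the record's attribute values down to a leaf."""
    node = tree
    while "attr" in node:
        node = node["children"][record[node["attr"]]]
    return node["counts"]


def classify(trees, record):
    """Sum per-tree class probabilities and return the best class."""
    scores = {}
    for tree in trees:
        counts = leaf_counts(tree, record)
        total = sum(counts.values())
        if total == 0:
            continue  # an empty leaf carries no evidence
        for label, count in counts.items():
            scores[label] = scores.get(label, 0.0) + count / total
    return max(scores, key=scores.get) if scores else None


# Hypothetical trees with illustrative leaf counts.
tree_a = {
    "attr": "health",
    "children": {
        "recommended": {"counts": {"recommend": 3}},
        "priority": {"counts": {"priority": 4, "recommend": 1}},
    },
}
tree_b = {
    "attr": "parents",
    "children": {"usual": {"counts": {"recommend": 2, "priority": 2}}},
}

record = {"parents": "usual", "health": "recommended"}
print(classify([tree_a, tree_b], record))  # recommend
```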
The paper gives the following algorithms:
Horizontal Partition: partitions the dataset based on the number of
parties.
Encryption: encrypts the data using a homomorphic encryption
technique.
Buildtree: builds the Random Decision Tree.
Classify Instance: applies the decision tree model to a new record to
determine which class it belongs to.
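As a small illustration of the first algorithm, horizontal partitioning
gives each party a subset of the rows, with every row keeping all of
its columns. The text does not say how rows are assigned to parties, so
the round-robin assignment in this Python sketch is an assumption, and
the function name is hypothetical.

```python
def horizontal_partition(records, num_parties):
    """Split rows among parties; each row keeps all of its columns."""
    if num_parties < 1:
        raise ValueError("num_parties must be at least 1")
    parts = [[] for _ in range(num_parties)]
    for index, record in enumerate(records):
        parts[index % num_parties].append(record)
    return parts


rows = [
    "usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend",
    "usual,proper,complete,1,convenient,convenient,nonprob,priority,priority",
]
for party, part in enumerate(horizontal_partition(rows, 2)):
    print(party, len(part))
```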
The paper compares the accuracy of the Random Decision Tree with that
of the ID3 tree. The author implemented these algorithms with the WEKA
tool, and this project uses the Java API of the same tool.
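The comparison rests on classification accuracy, the fraction of
records whose predicted class matches the true class. A minimal Python
sketch of that measure follows; the label lists are made-up examples
and are not results from the paper or from this project.

```python
def accuracy(predicted, actual):
    """Fraction of records whose predicted class equals the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


actual = ["recommend", "priority", "priority", "recommend"]
predicted = ["recommend", "priority", "recommend", "recommend"]
print(accuracy(predicted, actual))  # 0.75
```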
The paper uses the MUSHROOM and NURSERY datasets, and this project
uses the same ones. They are available inside the ‘dataset’ folder,
and information about the dataset columns is inside the information
folder.
Some example records from the NURSERY dataset:
parents,has_nurs,form,children,housing,finance,social,health,class
usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend
usual,proper,complete,1,convenient,convenient,nonprob,priority,priority
The first line lists the column names and the two lines below it are
records from the dataset; the last column contains the class name. New
records uploaded from the test folder do not have a class name, and
the application classifies each one and assigns it a class name. See
the test values below.
2.203259994700768E307,1.8832849888521625E307,2.1663977115639986E306,1.0250756057356276E306,2.2434704351677847E307,2.2434704351677847E307,3.4845121783368866E306,1.347192047052717E307,?
2.203259994700768E307,1.8832849888521625E307,2.1663977115639986E306,1.0250756057356276E306,2.2434704351677847E307,2.2434704351677847E307,3.4845121783368866E306,2.0684747716745147E307,?
The test values above are in encrypted format. The last column
contains ? instead of a class name because the class is unknown; the
application will predict it.
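Reading such records can be sketched in a few lines of Python. The
example assumes comma-separated fields with the class name in the last
column and ? marking an unknown class, as described in this document;
the function name is hypothetical.

```python
def parse_record(line):
    """Split a CSV record into attribute values and an optional class."""
    fields = [field.strip() for field in line.strip().split(",")]
    *features, label = fields
    return features, (None if label == "?" else label)


train_line = "usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend"
test_line = (
    "2.203259994700768E307,1.8832849888521625E307,2.1663977115639986E306,"
    "1.0250756057356276E306,2.2434704351677847E307,2.2434704351677847E307,"
    "3.4845121783368866E306,1.347192047052717E307,?"
)

print(parse_record(train_line)[1])  # recommend
print(parse_record(test_line)[1])   # None
```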
Screen shots
Double-click the ‘run.bat’ file to open the application's main screen.
Click the ‘Upload Dataset’ button and upload any dataset.
In this walkthrough the nursery dataset is selected; click the ‘Open’
button to load it.
Now click ‘Run Data Partition & Privacy Encryption’ to partition and
encrypt the data.
The screen then shows all dataset records in plain format. To see the
homomorphically encrypted data, click ‘View Encrypted Data’.
All attribute values are now encrypted, and only the class names in
the last column are shown to the parties, so the encrypted values
cannot be read directly. To build a tree on this encrypted data, click
the ‘Run Random Decision Tree’ button.
The screen shows the tree generated by the Random Decision Tree
algorithm. All of its nodes contain encrypted data, and the last line
of the output reports its accuracy, 87% in this run. Now click the
‘Build ID3 Tree’ button to generate a tree with the ID3 technique.
The ID3 tree is displayed as well, with an accuracy of 71%. Now click
the ‘Classify Instance’ button to upload a test file and get the
classification result. If the decision tree was built with the NURSERY
dataset, upload only the nursery test dataset.
Here the nursery test dataset is uploaded, and the classification
result is displayed. Each record has ‘?’ in its last column, and on
the next line the application shows the class name it predicted. For
example, the first record is classified as ‘recommend’.
Now click the ‘Random Decision & ID3 Tree Accuracy Graph’ button to
display an accuracy graph for both algorithms. In the graph, the
x-axis represents the algorithm name and the y-axis represents the
accuracy of each algorithm.
You can upload and test the MUSHROOM dataset in the same way.
