A Random Decision Tree Framework for Privacy-Preserving Data Mining

Data mining discovers knowledge from existing or past data, and
classification techniques use that existing data to determine the class
of new data. Nowadays multiple parties use the same data to identify
the class of their own records, and if all of the data is exposed to all
parties then privacy is at risk.
For example, multiple parties such as a bank, an insurance company, or
a credit card company will use the same records but for different
purposes:
The bank will use them to find past transactions.
The credit card company will use the data attributes related to past payments.
The insurance company will use them to identify the correct policy for that person.
All of the above companies use the person's profile information, but
with different attributes. If all data is exposed to every company then
privacy is at risk.
To overcome this issue, the author has introduced a data mining
algorithm called Random Decision Tree, which builds a tree from
randomly selected data and applies homomorphic encryption to protect
users' data. All companies know only the class name, and the dataset
is partitioned based on what each company requires. The Random
Decision Tree is built from the partitioned dataset.
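The write-up does not say which homomorphic scheme the application uses. As a minimal sketch of the idea only, the block below uses a toy Paillier cryptosystem, a well-known additively homomorphic scheme swapped in here for illustration. The tiny primes and the names keygen, encrypt, decrypt, and add_encrypted are assumptions, not the project's code, and the key size is far too small to be secure.

```python
import math
import random


def keygen(p=61, q=53):
    """Toy Paillier keys from two small primes (illustrative only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut holds for g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (with g = n + 1)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible modulo n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
c = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
total = decrypt(n, priv, c)  # sum computed without decrypting either input
```

The point of the sketch is the last three lines: two values are combined while still encrypted, and only the holder of the private key sees the result.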
The dataset is given to the Random Decision Tree algorithm to build a
tree, which is also called the classification model.
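The tree-building step can be sketched as follows. This is a minimal sketch, assuming the commonly described form of a random decision tree in which split attributes are chosen at random without inspecting the data, and the training records are then used only to fill class counts at the leaves. The function names, the depth limit, and the attribute value lists are illustrative assumptions, not the project's WEKA-based code.

```python
import random


def build_structure(attributes, domains, depth, rng):
    """Choose split attributes at random, without looking at any data."""
    if depth == 0 or not attributes:
        return {"counts": {}}  # leaf: class counts are filled in later
    attr = rng.choice(attributes)
    rest = [a for a in attributes if a != attr]
    children = {
        value: build_structure(rest, domains, depth - 1, rng)
        for value in domains[attr]
    }
    return {"attr": attr, "children": children}


def add_record(node, record, label):
    """Route one training record to its leaf and increment that class count."""
    while "attr" in node:
        node = node["children"][record[node["attr"]]]
    node["counts"][label] = node["counts"].get(label, 0) + 1


def total_count(node):
    """Number of training records stored across all leaves."""
    if "attr" in node:
        return sum(total_count(child) for child in node["children"].values())
    return sum(node["counts"].values())


# Hypothetical value lists for two NURSERY attributes (assumed for the demo).
domains = {
    "finance": ["convenient", "inconv"],
    "health": ["recommended", "priority"],
}
training = [
    ({"finance": "convenient", "health": "recommended"}, "recommend"),
    ({"finance": "convenient", "health": "priority"}, "priority"),
]

tree = build_structure(list(domains), domains, depth=2, rng=random.Random(0))
for record, label in training:
    add_record(tree, record, label)
```

Because the structure does not depend on the data, it can be fixed before any party reveals a record; only the leaf counts carry information about the training set.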
To classify a new instance, the company supplies the values of all
attributes it holds for that instance. The application then applies the
new instance (record) to the decision tree model to predict its class
name.
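As a rough sketch of this classification step, the block below walks a record down a small hand-written tree and returns the class with the highest count at the leaf it reaches. The nested-dictionary tree layout, the classify function, and the counts are assumptions made for illustration only.

```python
def classify(node, record):
    """Follow the record's attribute values to a leaf and return its majority class."""
    while "attr" in node:
        node = node["children"][record[node["attr"]]]
    counts = node["counts"]
    if not counts:
        return None  # no training record reached this leaf
    return max(counts, key=counts.get)


# Hypothetical one-level tree; the counts are made up for the example.
tree = {
    "attr": "health",
    "children": {
        "recommended": {"counts": {"recommend": 3, "priority": 1}},
        "priority": {"counts": {"priority": 4}},
    },
}

predicted = classify(tree, {"health": "recommended"})
```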
In this paper the author has given the following algorithms:

Horizontal Partition: partitions the dataset based on the number of
parties.
Encryption: encrypts the data using the homomorphic encryption
technique.
Buildtree: builds the Random Decision Tree.
Classify Instance: classifies which class a new record belongs to by
applying the decision tree model.
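The Horizontal Partition step listed above can be sketched as follows. This is a minimal sketch, assuming whole records are dealt out to the parties in round-robin order; the write-up does not state how records are assigned, and the function name and sample records are hypothetical.

```python
def horizontal_partition(records, num_parties):
    """Split whole records (rows) across parties; every party keeps all columns."""
    if num_parties < 1:
        raise ValueError("num_parties must be at least 1")
    parts = [[] for _ in range(num_parties)]
    for index, record in enumerate(records):
        parts[index % num_parties].append(record)  # round-robin assignment
    return parts


records = [f"record{i}" for i in range(7)]  # placeholder rows
parts = horizontal_partition(records, 3)
```

In a horizontal partition each party holds complete rows for a subset of the individuals, in contrast to a vertical partition where each party would hold a subset of the columns.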
In this paper the author compares the accuracy of the Random
Decision Tree with that of the ID3 tree. To implement these algorithms
the author used the WEKA tool, and we use the same tool's Java API to
develop this project.
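The accuracy being compared is the percentage of records whose predicted class matches the known class. A minimal sketch of that calculation is shown below; the accuracy function and the label lists are made up for illustration and are not results from the project.

```python
def accuracy(predicted, actual):
    """Percentage of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Made-up labels, used only to exercise the function.
actual = ["recommend", "priority", "priority", "recommend"]
predicted = ["recommend", "priority", "recommend", "recommend"]
score = accuracy(predicted, actual)
```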
The author used the MUSHROOM and NURSERY datasets, and we use the
same datasets, which are available inside the ‘dataset’ folder. All
information about the dataset columns can be found inside the
information folder.
Some example records from the NURSERY dataset:
parents,has_nurs,form,children,housing,finance,social,health,class
usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend
usual,proper,complete,1,convenient,convenient,nonprob,priority,priority
The first line gives the column names and the two lines below it are
records from that dataset; the last column contains the class name.
New records uploaded from the test folder do not have a class name,
and the application will classify each one and assign its class name.
See the test values below.

2.203259994700768E307,1.8832849888521625E307,2.1663977115639986E306,1.0250756057356276E306,2.2434704351677847E307,2.2434704351677847E307,3.4845121783368866E306,1.347192047052717E307,?
2.203259994700768E307,1.8832849888521625E307,2.1663977115639986E306,1.0250756057356276E306,2.2434704351677847E307,2.2434704351677847E307,3.4845121783368866E306,2.0684747716745147E307,?
The test values above are in encrypted format, and the last column
holds ? instead of a class name because the class is not known; the
application will predict it.
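The record layout described above (comma-separated attribute values with the class in the last column, and ? when the class is unknown) can be read with a few lines of code. This is a sketch only: the load_records function is hypothetical, and the unlabelled row uses plain values rather than encrypted ones to keep the example short.

```python
import csv
import io

SAMPLE = """parents,has_nurs,form,children,housing,finance,social,health,class
usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend
usual,proper,complete,1,convenient,convenient,nonprob,priority,?
"""


def load_records(text):
    """Return (attribute names, [(attribute dict, label or None), ...])."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    attr_names = header[:-1]  # the last column is the class
    rows = []
    for line in reader:
        if not line:
            continue  # skip blank lines
        features = dict(zip(attr_names, line[:-1]))
        label = None if line[-1] == "?" else line[-1]  # "?" marks an unknown class
        rows.append((features, label))
    return attr_names, rows


attr_names, rows = load_records(SAMPLE)
```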
Screenshots
Double-click the ‘run.bat’ file to get the screen below.
In the above screen, click the ‘Upload Dataset’ button and upload any
dataset.

In the above screen I am uploading the nursery dataset. Now click the
‘Open’ button to get the screen below.
Now click ‘Run Data Partition & Privacy Encryption’ to partition
and encrypt the data.

In the above screen we can see all of the dataset records in plain
format. If you want to see the homomorphically encrypted data, click
‘View Encrypted Data’ to get the screen below.
In the above screen we can see that all records are encrypted and only
the class names, which are in the last column, are shown to the parties.
Nobody can understand anything from this encrypted data. Now, to
build a tree on this encrypted data, click the ‘Run Random Decision
Tree’ button.

In the above screen we can see the tree generated by the Random
Decision Tree algorithm. All nodes contain encrypted data, and the
accuracy of this tree, shown in the last line, is 87%. Now click the
‘Build ID3 Tree’ button to generate a tree with the ID3 technique.
In the above screen we can see the ID3 tree as well, but its accuracy is
71%. Now click the ‘Classify Instance’ button to upload a test file and
get the prediction or classification result. If you built the decision
tree with the NURSERY dataset, upload only the nursery test dataset.

In the above screen I am uploading the nursery test dataset, and below
is the classification result.
In the above screen each record contains ‘?’ in the last column, and on
the next line the application gives the predicted class name. For
example, the first record in the above screen is classified as ‘recommend’.

Now click the ‘Random Decision & ID3 Tree Accuracy Graph’ button
to get the accuracy graph of both algorithms below.
In the above graph the x-axis represents the algorithm name and the
y-axis represents the accuracy of each algorithm.
You can upload and test the MUSHROOM dataset in the same way.
