Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
• Association Rule Mining
• Privacy Preservation
• Horizontally Distributed Datasets
Before we start mining!
Data Mining
• Finding trends or patterns in large datasets
• Extracting useful information
• Gaining useful and unexpected insights
• Analyzing and predicting system behavior
• Scalability?
Related fields: Artificial Engineering, Machine Learning, Statistics, Database Systems
Association Rule Learning
By Rakesh Agrawal, IBM Almaden Research Center
What is an Association Rule?
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
• Antecedent: {Bread, Butter}
• Consequent: {Milk}
Definitions
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
Antecedent
• Prerequisites for the rule to be applied
Consequent
• The outcome
Support
• Percentage of transactions containing the itemset
Confidence
• Fraction of the transactions containing the antecedent that also satisfy the rule
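The two measures can be made concrete with a small sketch. The basket data and the helper names (count, support, confidence) are made up for illustration; the numbers are chosen so that the bread and butter rule comes out at 80%.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    ante = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Hypothetical baskets: 5 contain bread and butter, 4 of those also contain milk.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter", "milk"},
    {"bread", "milk"},
    {"eggs"},
]

rule_support = support(baskets, {"bread", "butter", "milk"})
rule_confidence = confidence(baskets, {"bread", "butter"}, {"milk"})
```

Here the rule {Bread, Butter} → {Milk} is supported by 4 of 10 baskets and holds in 4 of the 5 baskets that contain the antecedent.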
Constraints
• Two different forms of constraints are used to generate the required association rules
• Syntactic Constraints: Restrict the attributes that may be present in a rule.
• Support Constraints: The number of transactions, out of the full set of transactions, that must support a rule.
Association Rule Learning in Large Datasets
• To find association rules in large datasets
Generating Large Itemsets
• Finding the combinations of items whose support is above a minimum support threshold
Generating Association Rules
• Mining all rules that are satisfied within each large itemset
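The two steps can be sketched as a simple level-wise search in the spirit of Apriori. This is a minimal illustration, not an optimized miner; the function names, the thresholds and the baskets are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset at or above min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Count each candidate and keep the ones that clear the threshold.
        level = {}
        for candidate in current:
            hits = sum(1 for t in transactions if candidate <= t)
            if hits / n >= min_support:
                level[candidate] = hits / n
        frequent.update(level)
        # Join step: merge frequent k-itemsets that differ by exactly one item.
        current = list({a | b for a, b in combinations(list(level), 2)
                        if len(a | b) == len(a) + 1})
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                # Every subset of a frequent itemset is itself frequent, so the lookup is safe.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"eggs"},
]
large = frequent_itemsets(baskets, min_support=0.5)
found = association_rules(large, min_confidence=0.7)
```

With these thresholds the large itemsets are {bread}, {butter}, {milk}, {bread, butter} and {bread, milk}, and each of the two pairs yields two rules.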
Association Rule Learning in Distributed Datasets
And Privacy Preservation
Preview
• Most tools used for mining association rules assume that the data to be analyzed can be collected at one central site.
• But issues like privacy preservation restrict the collection of data.
• Alternative mining methods have to be devised for distributed datasets to make the mining process feasible while ensuring privacy.
Privacy Preservation
• Dataset: Combined data of Twitter and Facebook
• Rule: What percentage of people log in to a social networking site and post within the next 2 minutes?
Privacy Preservation
• Horizontally Partitioned (Example: Insurance Companies)
• Rule Being Mined: Does a procedure have an unusual rate of complications?
• Implications:
• A company may find that the procedure fails in a high number of its cases, and it may change its policies to help.
• At the same time, if this rule is exposed, it may be a huge problem for the company.
• The risks outweigh the gains.
Each company holds the same columns for its own patients:
Company A: Patient ID | Disease | Prescription | Effect
Company B: Patient ID | Disease | Prescription | Effect
Company C: Patient ID | Disease | Prescription | Effect
Privacy Preservation
• Vertically Partitioned

Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1

Common Property: Not one we can exploit.
Mining of Association Rules
In Horizontally Partitioned Databases
Fundamental Steps
What we want
• Computing association rules without revealing private information, and getting
• The global support
• The global confidence
What we have
• Only the following information is available
• Local Support
• Local Confidence
• Size of the DB
Even this information may not be shared freely between sites. But we’ll get to that.
Calculating Required Values
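One common way to combine the per-site values is sketched below. It assumes each site can turn its local support into a support count (local support × DB size); the global figures are then ratios of sums. The function names and the antecedent counts are hypothetical; the itemset counts and DB sizes are borrowed from the three-site example in Phase 2.

```python
def global_support(local_counts, db_sizes):
    """Global support = total supporting transactions / total transactions."""
    return sum(local_counts) / sum(db_sizes)


def global_confidence(rule_counts, antecedent_counts):
    """Global confidence = total count of the whole rule / total count of its antecedent."""
    return sum(rule_counts) / sum(antecedent_counts)


# Sites A, B and C: count of itemset ABC and size of each local database.
abc_counts = [5, 6, 20]
db_sizes = [100, 200, 300]

# Hypothetical local counts of the antecedent AB, for the rule AB -> C.
ab_counts = [10, 12, 40]

sup = global_support(abc_counts, db_sizes)        # 31 / 600
conf = global_confidence(abc_counts, ab_counts)   # 31 / 62
```

Note that the itemset clears a 5% global threshold (31 of 600) even though Site B alone is below 5% locally (6 of 200).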
Problems with the approach
• It protects individual privacy, but each site has to disclose information.
• It reveals the local support and confidence of a rule at each site.
• This information, if revealed, can be harmful to an organization.
Introducing the two Algorithms
• We will be exploring two algorithms that have been used.
• The first algorithm combines encryption with data distortion while data is shared between sites.
• The second algorithm uses a particular secure sum (CK Secure Sum) as its method of protection.
Algorithm Uno
Some people are honest
Two phased algorithm
• Phase 1: Uses encryption for mining the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of three or more parties)
Phase 1: Commutative Encryption
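The key property of commutative encryption is that the order in which the sites apply their keys does not matter, so the same itemset ends up as the same ciphertext whichever route it takes. The sketch below illustrates this with modular exponentiation (a Pohlig-Hellman style cipher). The choice of cipher, the toy prime and all function names are assumptions for illustration, not something stated on the slide, and the parameters are not meant to be secure.

```python
import hashlib
import math
import random

P = 2**127 - 1  # a known Mersenne prime; toy-sized, for illustration only


def make_key(rng):
    """Pick an exponent that is invertible modulo P - 1, so decryption exists."""
    while True:
        key = rng.randrange(3, P - 1)
        if math.gcd(key, P - 1) == 1:
            return key


def encode(itemset):
    """Map an itemset to a number in [2, P - 1] by hashing its sorted items."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def encrypt(value, key):
    return pow(value, key, P)


def decrypt(value, key):
    # Inverse exponent modulo P - 1 (three-argument pow with -1 needs Python 3.8+).
    return pow(value, pow(key, -1, P - 1), P)


rng = random.Random(0)
key_a, key_b = make_key(rng), make_key(rng)   # hypothetical keys of Site A and Site B
m = encode({"A", "B", "C"})

a_then_b = encrypt(encrypt(m, key_a), key_b)
b_then_a = encrypt(encrypt(m, key_b), key_a)
recovered = decrypt(decrypt(a_then_b, key_a), key_b)
```

Because a_then_b equals b_then_a, sites can compare and merge fully encrypted itemsets without learning which site contributed them, and the keys can be removed in any order.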
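Phase 2: Data Distortion
Minimum support: 5%. Random number: R = 17.
Site A: ABC = 5, Size = 100
Site B: ABC = 6, Size = 200
Site C: ABC = 20, Size = 300
Each site adds (count − 5% × Size) to the value it receives and passes the result on:
• Site A: R + count − 5% × Size = 17 + 5 − 5% × 100 = 17
• Site B: 17 + 6 − 5% × 200 = 13
• Site C: 13 + 20 − 5% × 300 = 18
• Final check: 18 >= R (R = 17), so ABC meets the global minimum support.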
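The pass around the ring in this example can be reproduced with a short sketch. It assumes the first site in the list picks R and performs the final check with a plain comparison; the function name is made up, and exact fractions are used so that the 5% arithmetic is not affected by floating-point rounding.

```python
from fractions import Fraction


def masked_ring_sum(sites, min_support, r):
    """Pass a running value around the ring, starting from the random number r.

    sites: list of (local_count, db_size) pairs in ring order.
    Returns the value each site passes on, and whether the itemset is globally supported.
    """
    running = r
    passed_on = []
    for local_count, db_size in sites:
        # Each site adds its excess (or shortfall) over the support threshold.
        running += local_count - min_support * db_size
        passed_on.append(running)
    # The mask r hides the individual excesses; only the final comparison matters.
    return passed_on, running >= r


sites = [(5, 100), (6, 200), (20, 300)]   # Sites A, B, C from the example
passed_on, supported = masked_ring_sum(sites, Fraction(5, 100), r=17)
```

The values passed on are 17, 13 and 18, matching the example, and the final check succeeds because the total count (31) is at least 5% of the total size (600).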
Problems with the Algorithm
• Doesn’t work for a 2-party system
• Assumes honest parties
• Assumes a Boolean (yes/no) answer for whether a rule is supported, rather than a subjective or weighted approach.
• As the number of candidate itemsets increases, the encryption overhead increases.
• The encryption overhead is also directly proportional to the number of sites or partitions.
Algorithm Dua
Don’t trust anyone
CK Secure Sum
• Primarily used to tackle semi-honest sites.
• The data of each site is broken down into segments.
• The two nodes on either side of a node may be able to work out the data of the node in between them.
• The neighbors are changed for each round. Hence, any such pair can obtain only one segment.
Changing Neighbors
Round 1: P1, P2, P3, P4
Round 2: P1, P2, P4, P3
Round 3: P1, P4, P2, P3
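The idea of segments plus changing neighbors can be simulated with a simplified sketch. It assumes each party splits its value into one random segment per round and that the ring order for each round is given up front; the party values, the function names and the absence of any extra masking are assumptions made for illustration, not details of the protocol as presented.

```python
import random


def split_into_segments(value, k, rng):
    """Split a non-negative integer into k random integer segments that add up to it."""
    parts = [rng.randint(-value - 1, value + 1) for _ in range(k - 1)]
    return parts + [value - sum(parts)]


def ck_secure_sum(values, ring_orders, rng):
    """Add up one segment per party per round, using a different ring order each round."""
    k = len(ring_orders)
    segments = {party: split_into_segments(v, k, rng) for party, v in values.items()}
    total = 0
    for round_no, order in enumerate(ring_orders):
        running = 0
        for party in order:
            # A party only ever sees the running value handed over by its current neighbor.
            running += segments[party][round_no]
        total += running
    return total


# Ring orders for the three rounds, as in the Changing Neighbors diagram.
ring_orders = [
    ["P1", "P2", "P3", "P4"],
    ["P1", "P2", "P4", "P3"],
    ["P1", "P4", "P2", "P3"],
]
values = {"P1": 5, "P2": 6, "P3": 20, "P4": 9}   # hypothetical private counts
total = ck_secure_sum(values, ring_orders, random.Random(1))
```

Each party has a different pair of neighbors in every round, so a colluding pair sitting around a party in one round could recover at most that round's segment, which on its own does not reveal the party's value.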
Conclusion
The moral of the story...
Before you leave
• Association rules play a vital role in data mining.
• Through careful analysis, things that appear to be unrelated can turn out to have a logical explanation.
• This aspect of data mining can be very useful in predicting patterns and foreseeing trends in consumer behavior, choices and preferences.
• Association rules are one of the more effective ways for a business to benefit from the harvest of data mining.
There are no dumb questions
(No questions please shhhh…)

Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Editor's Notes

  1. Replace arrows :P
  2. Support - It provides the idea of feasibility of a rule; sometimes applied to antecedent only
  3. Replace arrow