Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
Association
Rule Mining
Privacy
Preservation
Horizontally
Distributed
Datasets
Before we start mining!
Data Mining
• Finding trends or patterns in large datasets
• Extracting useful information
• Gaining useful and unexpected insights
• Analyzing and predicting system behavior
Scalability
?
Artificial
Engineering
Machine
Learning
Statistics
Database
Systems
Association Rule Learning
By Rakesh Agrawal, IBM Almaden Research Center
• 80% of people who buy bread and butter also buy milk
• {Bread, Butter} → {Milk}
What is an Association Rule?
Antecedent: {Bread, Butter}
Consequent: {Milk}
Definitions
• 80% of people who buy bread and butter also buy milk
• {Bread, Butter} → {Milk}
Antecedent
• The prerequisites for the rule to be applied
Consequent
• The outcome
Support
• Percentage of transactions containing the itemset
Confidence
• Fraction of the transactions containing the antecedent that also contain the consequent
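As a minimal sketch of the two measures defined above, the following Python computes support and confidence for {Bread, Butter} → {Milk}. The ten transactions are made-up toy data, and the function names `support` and `confidence` are illustrative choices rather than anything named in the slides.

```python
# Toy transactions (made-up data, chosen so the rule has 80% confidence).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
    {"butter", "milk"},
    {"jam"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, 4 of 10 transactions contain all three items and 5 of 10 contain bread and butter, so the rule has 40% support and 80% confidence.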
• Two different forms of constraints are used to generate the required association rules
• Syntactic Constraints: Restrict the attributes that may be present in a rule.
• Support Constraints: Set the number of transactions, out of the full set of transactions, that must support a rule.
Constraints
Association Rule Learning in Large Datasets
• To find association rules in large datasets
Generating Large Itemsets
• Finding the combinations of items whose support is above a minimum support threshold
Generating Association Rules
• Mining all rules that are satisfied within each large itemset
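The two steps above can be sketched as a small level-wise (Apriori-style) search. This is an illustrative simplification under stated assumptions: the function names, the toy transactions, and the thresholds are hypothetical, and the candidate generation skips the subset-pruning optimization that a production implementation would use.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets meeting the support threshold."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        keys = list(level)
        current = list(
            {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        )
    return frequent


def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out


# Toy data (made up for illustration).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
found = rules(frequent, min_confidence=0.6)
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

Separating the two steps mirrors the slide: the expensive part is counting itemsets against the data, while rule generation only reuses the supports already computed.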
Association Rule Learning in Distributed Datasets
And Privacy Preservation
• Most tools used for mining association rules assume that data to be analyzed can be
collected at one central site.
• But issues like Privacy Preservation restrict the collection of data.
• Alternative mining methods have to be devised for distributed datasets to make the mining
process feasible while ensuring privacy.
Preview
• Dataset
• Combined data of Twitter and Facebook
• Rule
• What percentage of people log in to a social networking
site and post within the next 2 minutes?
Privacy Preservation
• Horizontally Partitioned (Example: Insurance Companies)
• Rule Being Mined: Does a procedure have an unusual rate of
complication?
• Implications:
• A company may have a high number of cases of the procedure failing, and
it may change its policies to help.
• At the same time, if this rule is exposed, it may be a huge
problem for the company.
• The risks outweigh the gains.
Privacy Preservation
Company A: Patient ID | Disease | Prescription | Effect
Company B: Patient ID | Disease | Prescription | Effect
Company C: Patient ID | Disease | Prescription | Effect
• Vertically Partitioned
Privacy Preservation
Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1

Common property: not one we can exploit.
Mining of Association Rules
In Horizontally Partitioned Databases
What we want
• Compute association rules without revealing private information, obtaining
• The global support
• The global confidence
What we have
• Only the following information is available
• Local Support
• Local Confidence
• Size of the DB
Fundamental Steps
Even this information may not be shared freely between sites.
But we’ll get to that.
Calculating Required Values
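A minimal sketch of how the global values follow from the per-site values, assuming each site is willing to report raw counts (local support times database size). The function names are illustrative, the ABC counts and sizes are the three-site figures used in the deck's data distortion example, and the antecedent counts for AB are hypothetical numbers added only to make the confidence calculation concrete.

```python
def global_support(local_counts, db_sizes):
    """Global support = total supporting transactions / total transactions."""
    return sum(local_counts) / sum(db_sizes)


def global_confidence(counts_rule, counts_antecedent):
    """Global confidence of X -> Y = count(X and Y) / count(X), summed over sites."""
    return sum(counts_rule) / sum(counts_antecedent)


db_sizes = [100, 200, 300]       # Sites A, B, C
counts_abc = [5, 6, 20]          # local counts of itemset ABC
counts_ab = [10, 12, 40]         # hypothetical local counts of antecedent AB

gs = global_support(counts_abc, db_sizes)
gc = global_confidence(counts_abc, counts_ab)
print(gs, gc)
```

Note that the global support is a size-weighted combination, not a plain average of the local support percentages, which is why the database sizes are needed.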
• The approach protects individual privacy, but each site has to disclose information.
• It reveals each site's local support and confidence for a rule.
• This information, if revealed, can be harmful to an organization.
Problems with the approach
• We will explore two algorithms that have been used.
• The first algorithm combines encryption with data distortion
while sharing data between sites.
• The second algorithm uses a secure sum protocol, CK Secure Sum, to combine values without exposing them.
Introducing the two Algorithms
Algorithm Uno
Some people are honest
• Phase 1: Uses encryption for mining the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of 3 or more parties)
Two-phase algorithm
Phase 1: Commutative Encryption
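To illustrate what "commutative" means here, the sketch below uses modular exponentiation (a Pohlig-Hellman style cipher), where encrypting with key a and then key b gives the same result as b and then a. This is an assumption about the kind of cipher that could be used, not a statement of what the presenters implemented; the modulus, the hashing of itemsets, and all function names are hypothetical, and the parameters are toy-sized rather than secure.

```python
import hashlib
import secrets
from math import gcd

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus for illustration only


def to_group(itemset):
    """Map an itemset to an integer below P via a hash (illustrative encoding)."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % P


def make_key():
    """Pick an exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k


def encrypt(value, key):
    return pow(value, key, P)


def decrypt(value, key):
    # The inverse exponent undoes the encryption (requires Python 3.8+).
    return pow(value, pow(key, -1, P - 1), P)


key_a, key_b = make_key(), make_key()
x = to_group({"A", "B", "C"})

ab = encrypt(encrypt(x, key_a), key_b)
ba = encrypt(encrypt(x, key_b), key_a)
print(ab == ba)
```

Because the order of encryption does not matter, the same itemset encrypted by every site ends up as the same ciphertext regardless of which site contributed it, so duplicates can be merged without revealing their origin.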
Phase 2: Data Distortion
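Minimum support = 5%, random number R = 17
Site A: ABC count = 5, Size = 100
Site B: ABC count = 6, Size = 200
Site C: ABC count = 20, Size = 300
Each site adds (count - 5% * Size) to the value it receives and passes the result on:
Site A: R + count - 5% * Size = 17 + 5 - 5% * 100 = 17, sent to Site B
Site B: 17 + 6 - 5% * 200 = 13, sent to Site C
Site C: 13 + 20 - 5% * 300 = 18
Check: 18 >= R (R = 17), so ABC is globally supported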
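The arithmetic of this example can be reproduced with a short sketch. It only replays the numbers from the slide (counts 5, 6, 20; sizes 100, 200, 300; 5% minimum support; R = 17); the function name and the use of `Fraction` for exact arithmetic are illustrative choices, and the real protocol would also need to protect the final comparison, which is not modeled here.

```python
from fractions import Fraction


def masked_excess_sum(sites, min_support, r):
    """Pass a running value around the sites; each adds count - min_support * size.

    The starting value r hides the first site's contribution from the next site.
    """
    running = r
    trace = []
    for name, count, size in sites:
        running += count - min_support * size
        trace.append((name, running))
    return running, trace


sites = [("A", 5, 100), ("B", 6, 200), ("C", 20, 300)]
R = 17
total, trace = masked_excess_sum(sites, Fraction(5, 100), R)

# The itemset is globally supported if the summed excess is non-negative,
# i.e. the final value is at least the random starting value R.
supported = total >= R
print(trace, supported)
```

Each site only sees a running value offset by R, so it cannot tell how much the previous sites contributed, yet the final comparison against R still answers whether ABC meets the global threshold.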
• Doesn’t work for a 2-party system
• Assumes honest parties
• Assumes a Boolean (yes/no) answer for whether a rule is supported, rather than a
subjective or weighted approach.
• As the number of candidate itemsets increases, the encryption overhead
increases.
• The encryption overhead also grows in direct proportion to the number of
sites or partitions.
Problems with the Algorithm
Algorithm Dua
Don’t trust anyone
• Primarily used to tackle semi-honest sites.
• The data of each site is broken down into segments.
• The two neighbors on either side of a node could collude to learn the data of the node between them.
• The neighbors are changed for each round. Hence, they can obtain only one such segment.
CK Secure Sum
Changing Neighbors
Round 1: P1, P2, P3, P4
Round 2: P1, P2, P4, P3
Round 3: P1, P4, P2, P3
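The idea of segmenting each site's value and changing the ring order every round can be sketched as follows. This is a simplified illustration, not the published protocol: the party values are made up, the ring orders are read off the P1 to P4 labels on the slide (an assumption about the diagram), the function names are hypothetical, and no messaging or masking by the initiating party is modeled.

```python
import random


def split_into_segments(value, k, rng):
    """Split an integer into k random integer segments that add up to `value`."""
    parts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts


def segmented_secure_sum(values, rings, seed=0):
    """One round per ring order; each party adds one segment per round."""
    rng = random.Random(seed)
    k = len(rings)
    segments = {p: split_into_segments(v, k, rng) for p, v in values.items()}
    total = 0
    for round_no, ring in enumerate(rings):
        running = 0
        for party in ring:
            # Neighbors in this round see only this party's round_no-th segment.
            running += segments[party][round_no]
        total += running
    return total


values = {"P1": 5, "P2": 6, "P3": 20, "P4": 9}  # made-up local counts
rings = [
    ["P1", "P2", "P3", "P4"],  # Round 1
    ["P1", "P2", "P4", "P3"],  # Round 2
    ["P1", "P4", "P2", "P3"],  # Round 3
]
result = segmented_secure_sum(values, rings)
print(result)
```

With these ring orders, P1 sits between a different pair of neighbors in every round, so any one pair that colludes can isolate at most one of P1's segments, which on its own is a random-looking number.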
Conclusion
The moral of the story...
Before you leave
• It is interesting that association rules play a vital role in data mining.
• Through careful analysis, items that appear to be unrelated can turn out to have a
logical connection.
• This aspect of data mining can be very useful in predicting patterns and foreseeing
trends in consumer behavior, choices and preferences.
• Association rules are indeed one of the best ways to succeed in business and enjoy the
harvest from data mining.
There are no dumb questions
(No questions please shhhh…)

Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Editor's Notes

  1. Support: gives an idea of the feasibility of a rule; it is sometimes applied to the antecedent only.