Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Published in: Technology
  • Support: gives an idea of how feasible a rule is; sometimes applied to the antecedent only

    1. 1. Association Rule Mining with Privacy Preservation in Horizontally Distributed Databases. Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
    2. 2. Introduction: Look before you leap
    3. 3. The Flow: Association Rule Mining → Privacy Preservation → Horizontally Distributed Datasets
    4. 4. Before we start mining!
        • Data Mining: trends or patterns in large datasets; extracting useful information; useful and unexpected insights; analyzing and predicting system behavior
        • Scalability?
        • Artificial Engineering, Machine Learning, Statistics, Database Systems
    5. 5. Association Rule Learning, by Rakesh Agrawal, IBM Almaden Research Center
    6. 6. What is an Association Rule?
        • 80% of people who buy bread + butter, buy milk
        • {Bread, Butter} → {Milk}
        • Antecedent: {Bread, Butter}; Consequent: {Milk}
    7. 7. Definitions
        • 80% of people who buy bread + butter, buy milk
        • {Bread, Butter} → {Milk}
        • Antecedent: the prerequisites for the rule to be applied
        • Consequent: the outcome
        • Support: the percentage of transactions containing the itemset
        • Confidence: the fraction of transactions containing the antecedent that also satisfy the rule
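The definitions on slide 7 can be made concrete with a few lines of Python. This is a minimal sketch on a made-up set of ten shopping baskets (the baskets and the helper names `support` and `confidence` are illustrative assumptions, not taken from the slides), chosen so that the bread + butter → milk rule comes out at the 80% confidence quoted above.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical baskets: 5 contain bread + butter, and 4 of those also contain milk.
transactions = (
    [{"bread", "butter", "milk"}] * 4
    + [{"bread", "butter"}]
    + [{"milk"}] * 2
    + [{"bread"}] * 3
)

rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
print(rule_support, rule_confidence)  # 2/5 4/5
```

Exact fractions are used instead of floats so that threshold comparisons do not suffer from rounding.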
    8. 8. Constraints
        • Two different forms of constraints are used to generate the required association rules
        • Syntactic constraints: restrict the attributes that may be present in a rule.
        • Support constraints: concern the number of transactions, out of the full set, that support a rule.
    9. 9. Association Rule Learning in Large Datasets
        • Large datasets: to find association rules
        • Generating large itemsets: combinations of items whose support is above a minimum support threshold
        • Generating association rules: mining all rules that are satisfied in those itemsets
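The two steps on slide 9 (find the large itemsets, then derive rules from them) can be sketched as follows. This is a brute-force illustration that enumerates every candidate itemset, which is only workable on toy data; the slides do not say which mining algorithm the authors used, and the baskets, thresholds, and function names here are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: every itemset whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            sup = Fraction(sum(cset <= t for t in transactions), n)
            if sup >= min_support:
                frequent[cset] = sup
    return frequent

def association_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[a] is always present.
                conf = sup / frequent[a]
                if conf >= min_confidence:
                    found.append((a, itemset - a, conf))
    return found

# Hypothetical baskets, not from the slides.
transactions = (
    [{"bread", "butter", "milk"}] * 4
    + [{"bread", "butter"}]
    + [{"milk"}] * 2
    + [{"bread"}] * 3
)
frequent = frequent_itemsets(transactions, Fraction(3, 10))
found = association_rules(frequent, Fraction(4, 5))
```

With these baskets, `found` includes {bread, butter} → {milk} at confidence 4/5, while {milk} → {bread} (confidence 2/3) is filtered out.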
    10. 10. Association Rule Learning in Distributed Datasets and Privacy Preservation
    11. 11. Preview
        • Most tools used for mining association rules assume that the data to be analyzed can be collected at one central site.
        • But issues like privacy preservation restrict the collection of data.
        • Alternative mining methods have to be devised for distributed datasets to make the mining process feasible while ensuring privacy.
    12. 12. Privacy Preservation
        • Dataset: combined data of Twitter and Facebook
        • Rule: What percentage of people log in to a social networking site and post within the next 2 minutes?
    13. 13. Privacy Preservation
        • Horizontally Partitioned (Example: Insurance Companies)
        • Company A, Company B, and Company C each hold their own table with the same columns: Patient ID, Disease, Prescription, Effect
        • Rule being mined: Does a procedure have an unusual rate of complication?
        • Implications:
            • A company may have a high number of cases of the procedure failing, and it may change its policies to help.
            • At the same time, if this rule is exposed it may be a huge problem for the company.
            • The risks outweigh the gains.
    14. 14. Privacy Preservation
        • Vertically Partitioned
        • Table 1:
            Credit Card No.     Bought tablet
            2365987545623526    1
            3639871526589414    1
            4365845698742563    1
            5962845632561200    1
            6621563289657412    1
        • Table 2:
            Credit Card No.     Bought TCover
            2365987545623526    0
            7639871526589414    1
            4365845698742563    1
            9962845632561200    0
            6621563289657412    1
        • Common property: not one we can exploit.
    15. 15. Mining of Association Rules in Horizontally Partitioned Databases
    16. 16. Fundamental Steps
        • What we want: computing association rules without revealing private information, and getting
            • The global support
            • The global confidence
        • What we have: only the following information is available
            • Local support
            • Local confidence
            • Size of the DB
        • Even this information may not be shared freely between sites. But we’ll get to that.
    17. 17. Calculating Required Values
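The slide's own formulas did not survive in this transcript. As a hedged reconstruction from the quantities listed on slide 16 (local support and database size per site), the global support of an itemset can be taken as the size-weighted combination of the local supports, and the global confidence of a rule as the global support of the whole rule divided by the global support of its antecedent. The sketch below assumes that reading. The ABC counts and sizes are the ones shown on slide 23; the antecedent counts for AB are made up for illustration.

```python
from fractions import Fraction

def global_support(local_supports, db_sizes):
    """Size-weighted combination of per-site supports for one itemset."""
    total_count = sum(s * n for s, n in zip(local_supports, db_sizes))
    return total_count / sum(db_sizes)

def global_confidence(rule_supports, antecedent_supports, db_sizes):
    """Global support of the full rule over global support of its antecedent."""
    return (global_support(rule_supports, db_sizes)
            / global_support(antecedent_supports, db_sizes))

db_sizes = [100, 200, 300]                                            # sites A, B, C
abc_local = [Fraction(5, 100), Fraction(6, 200), Fraction(20, 300)]   # from slide 23
ab_local = [Fraction(10, 100), Fraction(12, 200), Fraction(40, 300)]  # hypothetical

abc_global = global_support(abc_local, db_sizes)
ab_to_c_confidence = global_confidence(abc_local, ab_local, db_sizes)
print(abc_global, ab_to_c_confidence)  # 31/600 1/2
```

Computed this way, every site has to hand over its local support and size in the clear.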
    18. 18. Problems with the approach
        • It protects individual privacy, but each site has to disclose information.
        • It reveals the local support and confidence in a rule at each site.
        • This information, if revealed, can be harmful to an organization.
    19. 19. Introducing the two Algorithms
        • We will be exploring two algorithms that have been used.
        • The first algorithm combines encryption with data distortion while sharing data between sites.
        • The second algorithm uses a particular secure sum, the CK Secure Sum, as the method of encryption.
    20. 20. Algorithm Uno: Some people are honest
    21. 21. Two-phased algorithm
        • Phase 1: Uses encryption for mining of the large itemsets
        • Phase 2: Uses a random number to preserve the privacy of each site (assuming a system of three or more parties)
    22. 22. Phase 1: Commutative Encryption
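Commutative encryption means that encrypting a value under two keys gives the same ciphertext regardless of the order in which the keys are applied, so sites can compare fully encrypted itemsets without seeing each other's plaintext. The slides do not name a cipher; the sketch below uses exponentiation modulo a prime purely as an illustration of the commutative property. The prime, the keys, and the helper names are assumptions, and the parameters are far too small for real use.

```python
import hashlib
from math import gcd

P = 2**61 - 1  # a Mersenne prime; toy-sized modulus for illustration only

def encode(itemset):
    """Map an itemset to a number in [1, P-1] via a hash of its sorted items."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(value, key):
    """Raise to the key modulo P; the key must be invertible modulo P-1."""
    if gcd(key, P - 1) != 1:
        raise ValueError("key must be coprime to P - 1")
    return pow(value, key, P)

def decrypt(value, key):
    """Undo encrypt() by raising to the inverse exponent (Python 3.8+)."""
    return pow(value, pow(key, -1, P - 1), P)

key_a, key_b = 65537, 257          # hypothetical secret keys of sites A and B
plain = encode({"A", "B", "C"})

a_then_b = encrypt(encrypt(plain, key_a), key_b)
b_then_a = encrypt(encrypt(plain, key_b), key_a)
recovered = decrypt(decrypt(a_then_b, key_a), key_b)
```

Here `a_then_b` equals `b_then_a`, and removing both layers in any order returns `plain`. Once every site has applied its key, equal itemsets collapse to equal ciphertexts, so duplicates can be merged before anything is decrypted.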
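    23. 23. Phase 2: Data Distortion
        • Site A: ABC: 5, Size = 100, R = 17. Passes on R + count - 5% * Size = 17 + 5 - 5% * 100 = 17
        • Site B: ABC: 6, Size = 200. Passes on 17 + 6 - 5% * 200 = 13
        • Site C: ABC: 20, Size = 300. Passes on 13 + 20 - 5% * 300 = 18
        • 18 >= R (R = 17), so ABC meets the 5% support threshold across all three sites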
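The arithmetic on slide 23 can be replayed in a few lines. This is a minimal sketch of the masked running sum only: each site adds its excess support (count minus threshold times size) to the value it receives, and only the masked total is ever passed on. The function name is an assumption, and the sketch leaves out anything a real deployment would add, such as modular arithmetic or a secure final comparison.

```python
from fractions import Fraction

def masked_excess_support(sites, min_support, mask):
    """Pass a running total around the sites, starting from the random mask.

    sites: list of (local_count, db_size) in visiting order.
    Returns the values passed on by each site and whether the itemset
    is globally frequent (final value >= mask).
    """
    running = Fraction(mask)
    passed = []
    for count, size in sites:
        running += count - min_support * size   # excess over the threshold
        passed.append(running)
    return passed, running >= mask

# Values from slide 23: sites A, B, C, threshold 5%, random number R = 17.
sites = [(5, 100), (6, 200), (20, 300)]
passed, is_frequent = masked_excess_support(sites, Fraction(5, 100), mask=17)
print([int(v) for v in passed], is_frequent)  # [17, 13, 18] True
```

Because B and C only ever see a total offset by R, neither learns the exact counts of the sites before it. The check at the end agrees with the unmasked numbers: 31 occurrences out of 600 transactions is just above the 30 needed for 5%.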
    24. 24. Problems with the Algorithm: I got……
        • Doesn’t work for a 2-party system
        • Assumes honest parties
        • Assumes Boolean responses to the variable for support of rules rather than a subjective or weighted approach.
        • As the number of candidate itemsets increases, the encryption overhead increases.
        • The encryption overhead is also directly proportional to the number of sites or partitions.
    25. 25. Algorithm Dua: Don’t trust anyone
    26. 26. CK Secure Sum
        • Primarily used to tackle semi-honest sites.
        • The data of each site is broken down into segments.
        • The two nodes on either side of a node have a chance of working out the data of the one in between them.
        • The neighbors are changed for each round. Hence, they can only obtain one such segment.
    27. 27. Changing Neighbors
        • Round 1: P1, P2, P3, P4
        • Round 2: P1, P2, P4, P3
        • Round 3: P1, P4, P2, P3
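The idea on slides 26 and 27 can be sketched as follows: each site splits its private value into as many random segments as there are rounds, contributes one segment per round, and the ring order changes between rounds so that no pair of neighbors brackets the same site twice. The slides do not give the protocol's details, so this is an illustration under assumptions: the private values, the segment-splitting scheme, and the function names are hypothetical, and the ring orders are the three arrangements listed on slide 27.

```python
import random

def split_into_segments(value, k, rng):
    """Split value into k integer segments that add up to value."""
    parts = [rng.randint(-50, 50) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts

def ck_secure_sum(values, ring_orders, rng):
    """One round per ring order; each site adds one segment per round."""
    k = len(ring_orders)
    segments = {site: split_into_segments(v, k, rng) for site, v in values.items()}
    total = 0
    for round_no, order in enumerate(ring_orders):
        running = 0
        for site in order:                  # running sum travels around the ring
            running += segments[site][round_no]
        total += running
    return total

values = {"P1": 12, "P2": 7, "P3": 30, "P4": 5}   # hypothetical private values
ring_orders = [
    ["P1", "P2", "P3", "P4"],   # Round 1
    ["P1", "P2", "P4", "P3"],   # Round 2
    ["P1", "P4", "P2", "P3"],   # Round 3
]
total = ck_secure_sum(values, ring_orders, random.Random(0))
print(total)  # 54
```

Two colluding neighbors can subtract what went into a site from what came out of it, but in any one round that reveals a single random segment rather than the site's value.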
    28. 28. Conclusion: The moral of the story...
    29. 29. Before you leave
        • It is interesting that association rules play a vital role in data mining.
        • Through this, what appears to be unrelated can have a logical explanation through careful analysis.
        • This aspect of data mining can be very useful in predicting patterns and foreseeing trends in consumer behavior, choices and preferences.
        • Association rules are indeed one of the best ways to succeed in business and enjoy the harvest from data mining.
    30. 30. There are no dumb questions (No questions please shhhh…)