Implementation of Bitmap based Incognito and Performance Evaluation

DASFAA 2007, Bangkok, Thailand

Hyunho Kang, Jaemyung Kim, Gapjoo Na, and Sangwon Lee
Sungkyunkwan University
Table of Contents
       Introduction
       Existing Solutions
          −   Binary Search
          −   Incognito
       Bitmap based Incognito
       Optimization Techniques
       Performance Evaluation
       Conclusion
Introduction
       Privacy Problem and Solution (Sweeney)
          −   Released microdata → join attack (re-identification)
          −   Solution: k-anonymization
       K-anonymization Algorithms
          −   Full-domain binary search
          −   Incognito: one of the most efficient algorithms (Kristen LeFevre)
       Problem of the Existing Incognito Algorithm
          −   Requires many repeated sorts over large volumes of data
          −   Solution: use a bitmap index structure, which completely eliminates the expensive sort
Joining Attack
       Example - Joining Attack
       Voter Registration List
          Name    DOB       Sex      Zipcode
          Andre   1/21/76   Male     53715
          Beth    1/10/81   Female   55410
          Carol   10/1/44   Female   90210
          Dan     2/21/84   Male     02174
          Ellen   4/19/72   Female   02237
       Hospital Patients
          DOB       Sex      Zipcode   Disease
          1/21/76   Male     53715     Flu
          1/21/76   Male     53703     Broken Arm
          2/28/76   Male     53703     Bronchitis
          4/13/86   Female   53715     Hepatitis
          4/13/86   Female   53706     Sprained Ankle
          2/28/86   Female   53706     Hang Nail
       Joining the two tables on (DOB, Sex, Zipcode) re-identifies Andre
          Name    DOB       Sex      Zipcode   Disease
          Andre   1/21/76   Male     53715     Flu
Joining Attack
       Voter Registration List: unchanged
       Hospital Patients with Zipcode generalized to 537**
          DOB       Sex      Zipcode   Disease
          1/21/76   Male     537**     Flu
          1/21/76   Male     537**     Broken Arm
          2/28/76   Male     537**     Bronchitis
          4/13/86   Female   537**     Hepatitis
          4/13/86   Female   537**     Sprained Ankle
          2/28/86   Female   537**     Hang Nail
       Andre (1/21/76, Male, 53715) now matches two tuples
          Name    DOB       Sex      Zipcode   Disease
          Andre   1/21/76   Male     537**     Flu OR Broken Arm
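The join on these two slides can be replayed in a few lines. This is a minimal Python sketch, not code from the talk: the helpers `zip_prefix` and `join_attack` are hypothetical names, and the rows are copied from the slides. Masking the last two zipcode digits turns Andre's single match into two candidate diagnoses.

```python
# Tables copied from the Joining Attack slides.
VOTERS = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]
PATIENTS = [
    ("1/21/76", "Male", "53715", "Flu"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("4/13/86", "Female", "53706", "Sprained Ankle"),
    ("2/28/86", "Female", "53706", "Hang Nail"),
]


def zip_prefix(zipcode, stars):
    """Hypothetical helper: replace the last `stars` digits with '*'."""
    return zipcode[: len(zipcode) - stars] + "*" * stars


def join_attack(voters, patients, stars=0):
    """Join the two tables on (DOB, Sex, Zipcode), with both zipcodes
    generalized by `stars` digits; map each matched name to its diseases."""
    matches = {}
    for name, dob, sex, zipcode in voters:
        key = (dob, sex, zip_prefix(zipcode, stars))
        for p_dob, p_sex, p_zip, disease in patients:
            if (p_dob, p_sex, zip_prefix(p_zip, stars)) == key:
                matches.setdefault(name, set()).add(disease)
    return matches


exact = join_attack(VOTERS, PATIENTS)           # Andre -> {Flu}
generalized = join_attack(VOTERS, PATIENTS, 2)  # Andre -> {Flu, Broken Arm}
print(exact, generalized)
```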
Basic Definitions (1/3)
       Quasi-Identifier Attribute Set (Q)
          −   Minimal set of attributes in table T that can be joined with external information to re-identify individual records
          −   e.g. {Birthdate, Sex, Zipcode}
       Frequency Set
          −   A mapping from each unique combination of values of Q in T to the total number of tuples in T with those values of Q (the counts)
Basic Definitions (2/3)
       K-anonymity (k-anonymous)
          −   Table T satisfies the k-anonymity property (is k-anonymous) with respect to attribute set Q if every count in the frequency set of T with respect to Q is greater than or equal to k
          −   In SQL, T is k-anonymous with respect to a (subset of the) quasi-identifier if SELECT MIN(COUNT(*)) FROM T GROUP BY <those attributes> returns a value ≥ k
          −   e.g. SELECT MIN(COUNT(*)) FROM "Hospital Patients" GROUP BY DOB, Sex, Zipcode
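As a sketch of these two definitions, assuming an in-memory list of tuples rather than the Oracle table the authors queried, `frequency_set` plays the role of `GROUP BY` and `is_k_anonymous` the role of `MIN(COUNT(*)) >= k`. Both names are hypothetical.

```python
from collections import Counter

# Hospital Patients as used in the Basic Incognito Example slides:
# (DOB, Sex, Zipcode, Disease).
HOSPITAL = [
    ("1/21/76", "Male", "53715", "Flu"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("4/13/86", "Female", "53706", "Sprained Ankle"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]
COLUMN = {"DOB": 0, "Sex": 1, "Zipcode": 2}


def frequency_set(rows, attrs):
    """Map each distinct value combination of `attrs` to its tuple count,
    like SELECT attrs, COUNT(*) ... GROUP BY attrs."""
    idx = [COLUMN[a] for a in attrs]
    return Counter(tuple(row[i] for i in idx) for row in rows)


def is_k_anonymous(rows, attrs, k):
    """True iff every count in the frequency set is >= k,
    like SELECT MIN(COUNT(*)) ... GROUP BY attrs returning >= k."""
    return min(frequency_set(rows, attrs).values()) >= k


print(frequency_set(HOSPITAL, ["Sex"]))
print(is_k_anonymous(HOSPITAL, ["DOB", "Sex", "Zipcode"], 2))  # False
```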
Basic Definitions (3/3)
       Generalization
          −   Is defined by a (user-defined) generalization function
          −   Notation: Di <D Dj means that domain Dj is a generalization of domain Di
Example of Generalization (1/3)
       Domain and Value Generalization
          −   Zipcode: Z0 = {53715, 53710, 53706, 53703} → Z1 = {5371*, 5370*} → Z2 = {537**}
                e.g. 5371* = f(53715), 537** = f(5371*)
          −   Birth: B0 = {1/21/76, 2/28/76, 4/13/86} → B1 = {*}
          −   Sex: S0 = {Male, Female} → S1 = {Person}
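One way to express the zipcode generalization function f from this slide, assuming it simply masks the rightmost remaining digit. `f_zip`, `generalize_zip`, `SEX_UP` and `birth_up` are illustrative names, and stopping at 537** (Z2) mirrors the hierarchy drawn here.

```python
def f_zip(value):
    """One generalization step for the Zipcode hierarchy on this slide:
    mask the rightmost remaining digit (53715 -> 5371* -> 537**).
    537** (Z2) is treated as the top of the hierarchy."""
    digits = value.rstrip("*")
    if len(digits) <= 3:
        raise ValueError(value + " is already at the top level Z2")
    return digits[:-1] + "*" * (len(value) - len(digits) + 1)


def generalize_zip(value, level):
    """Apply f_zip `level` times: level 0 is Z0, 1 is Z1, 2 is Z2."""
    for _ in range(level):
        value = f_zip(value)
    return value


# Sex and Birth each have one step to their top value, as on the slide.
SEX_UP = {"Male": "Person", "Female": "Person"}  # S0 -> S1


def birth_up(dob):
    """B0 -> B1: every date of birth generalizes to '*'."""
    return "*"


print(f_zip("53715"), f_zip(f_zip("53715")))  # 5371* 537**
```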
Example of Generalization (2/3)
       Generalization Lattice for Two Attributes (most general node at the top)
       <B, S> lattice
          <B1, S1>
          <B1, S0>  <B0, S1>
          <B0, S0>
       <S, Z> lattice
          <S1, Z2>
          <S1, Z1>  <S0, Z2>
          <S1, Z0>  <S0, Z1>
          <S0, Z0>
       <B, Z> lattice
          <B1, Z2>
          <B1, Z1>  <B0, Z2>
          <B1, Z0>  <B0, Z1>
          <B0, Z0>
       Example values
          <S0, Z2>: (Male, 537**), (Female, 537**)
          <S0, Z1>: (Male, 5370*), (Male, 5371*), (Female, 5370*), (Female, 5371*)
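The lattices above can be enumerated mechanically if a node is written as a tuple of per-attribute levels. This sketch (hypothetical helper names) groups nodes by height, the sum of the levels, and lists the direct generalizations of a node, i.e. the nodes one edge above it.

```python
from itertools import product


def lattice(max_levels):
    """Group every node of a full-domain generalization lattice by height.
    A node is a tuple of per-attribute levels; its height is their sum."""
    by_height = {}
    for node in product(*(range(m + 1) for m in max_levels)):
        by_height.setdefault(sum(node), []).append(node)
    return by_height


def direct_generalizations(node, max_levels):
    """Nodes one edge above `node`: raise exactly one attribute by one level."""
    return [node[:i] + (node[i] + 1,) + node[i + 1:]
            for i in range(len(node)) if node[i] < max_levels[i]]


# <S, Z>: Sex has levels S0..S1, Zipcode has levels Z0..Z2.
SZ = lattice((1, 2))
for height in sorted(SZ, reverse=True):
    print(height, ["<S%d, Z%d>" % n for n in SZ[height]])
```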
Full-Domain Generalization Algorithm
       Binary search of the lattice finds a solution of minimum height
          −   If no generalization of height h satisfies k-anonymity, then no generalization of height h' < h will satisfy k-anonymity
       Procedure, shown on the <S, Z> lattice (h: maximum height in the generalization lattice)
          1) Check the generalizations at height ⌊h/2⌋
          2) If some generalization at this height satisfies k-anonymity
             2-1) check the generalizations at height ⌊h/4⌋
          3) Else
             3-1) check the generalizations at height ⌊3h/4⌋
          4) And so on…
       This algorithm is proven to find a single minimal full-domain k-anonymization
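A minimal sketch of the binary search over heights, assuming the (Sex, Zipcode) columns of the Hospital Patients table and the hierarchies shown earlier (S1 = '*', Zn masks n digits). It returns one k-anonymous node at the lowest height; the helper names are hypothetical and the counting is done in memory rather than with SQL.

```python
from collections import Counter
from itertools import product

# (Sex, Zipcode) of the six Hospital Patients tuples.
ROWS = [("Male", "53715"), ("Male", "53703"), ("Male", "53703"),
        ("Female", "53715"), ("Female", "53706"), ("Female", "53706")]
MAX_LEVELS = (1, 2)  # S0..S1, Z0..Z2


def generalize(row, node):
    """Generalize one row to lattice node (s, z): S1 = '*', Zn masks n digits."""
    sex, zipcode = row
    s, z = node
    return ("*" if s else sex, zipcode[: 5 - z] + "*" * z)


def is_k_anonymous(node, k):
    counts = Counter(generalize(row, node) for row in ROWS)
    return min(counts.values()) >= k


def nodes_at(height):
    return [n for n in product(*(range(m + 1) for m in MAX_LEVELS))
            if sum(n) == height]


def min_height_search(k):
    """Binary search over lattice heights. Relies on the monotonicity stated
    above: if no node at height h is k-anonymous, none below h is either.
    Returns (height, node) for the lowest such height, or None."""
    lo, hi, best = 0, sum(MAX_LEVELS), None
    while lo <= hi:
        mid = (lo + hi) // 2
        hits = [n for n in nodes_at(mid) if is_k_anonymous(n, k)]
        if hits:
            best, hi = (mid, hits[0]), mid - 1  # found one: try lower
        else:
            lo = mid + 1                        # none here: go higher
    return best


print(min_height_search(2))  # (1, (1, 0)), i.e. <S1, Z0>
```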
Key Properties of Incognito
       Generalization Property: <Z0> → <Z1>
          −   If T is k-anonymous with respect to a node, it is also k-anonymous with respect to every generalization of that node
       Rollup Property
          −   The frequency set of a more general node can be computed from that of a less general node by summing counts, without rescanning T
       Subset Property: <S1, Z0, D1> → <S1, Z0>, <S1, D1>, <Z0, D1>
          −   If T is k-anonymous with respect to an attribute set, it is also k-anonymous with respect to every subset of it
       Hospital Patients at <B0, S0, Z0, D0> and at <B0, S0, Z1, D0>
          B0        S0       Z0      Z1      D0
          1/21/76   Male     53715   5371*   Flu
          1/21/76   Male     53703   5370*   Broken Arm
          2/28/76   Male     53703   5370*   Bronchitis
          4/13/86   Female   53715   5371*   Hepatitis
          4/13/86   Female   53706   5370*   Sprained Ankle
          2/28/86   Female   53706   5370*   Hang Nail
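The rollup property can be illustrated on the <S0, Z0> frequency set that the example counts later: each group key is generalized one level and counts sharing the new key are summed, so the table is never rescanned. `rollup` and `zip_one_level` are hypothetical names.

```python
from collections import Counter

# Frequency set of Hospital Patients w.r.t. <S0, Z0>, as counted in the
# Basic Incognito Example (2/3) slides.
FREQ_S0_Z0 = Counter({("Male", "53715"): 1, ("Female", "53715"): 1,
                      ("Male", "53703"): 2, ("Female", "53706"): 2})


def rollup(freq, step):
    """Rollup property: derive the frequency set of a more general node by
    mapping every group key one level up and summing counts (no table scan)."""
    out = Counter()
    for key, count in freq.items():
        out[step(key)] += count
    return out


def zip_one_level(key):
    """<S0, Z0> key -> <S0, Z1> key, e.g. ('Male', '53715') -> ('Male', '5371*')."""
    sex, zipcode = key
    return (sex, zipcode[:4] + "*")


FREQ_S0_Z1 = rollup(FREQ_S0_Z0, zip_one_level)
print(FREQ_S0_Z1)
```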
Basic Incognito Example (1/3)
       Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
       Search 1-subsets: DOB (lattice B1 above B0)
       Hospital Patients
          DOB       Sex      Zipcode   Disease
          1/21/76   Male     53715     Flu
          1/21/76   Male     53703     Broken Arm
          2/28/76   Male     53703     Bronchitis
          4/13/86   Female   53715     Hepatitis
          4/13/86   Female   53706     Sprained Ankle
          2/28/76   Female   53706     Hang Nail
       SELECT COUNT(*) GROUP BY DOB
          DOB       Count
          1/21/76   2
          4/13/86   2
          2/28/76   2
       Every count is ≥ 2, so <B0> is 2-anonymous
Basic Incognito Example (1/3)
       Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
       Search 1-subsets: Sex (lattice S1 above S0), on the same Hospital Patients table
       SELECT COUNT(*) GROUP BY Sex
          Sex      Count
          Male     3
          Female   3
       Every count is ≥ 2, so <S0> is 2-anonymous
Basic Incognito Example (1/3)
       Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
       Search 1-subsets: Zipcode (lattice Z2 above Z1 above Z0), on the same Hospital Patients table
       SELECT COUNT(*) GROUP BY Zipcode
          Zipcode   Count
          53715     2
          53703     2
          53706     2
       Every count is ≥ 2, so <Z0> is 2-anonymous
Basic Incognito Example (2/3)
       Search all 2-subsets; shown here: the <S, Z> lattice (same Hospital Patients table)
          <S1, Z2>
          <S1, Z1>  <S0, Z2>
          <S1, Z0>  <S0, Z1>
          <S0, Z0>
       <S0, Z0>: SELECT COUNT(*) GROUP BY Sex, Zipcode
          Sex      Zipcode   Count
          Male     53715     1
          Female   53715     1
          Male     53703     2
          Female   53706     2
       Counts of 1, so <S0, Z0> is not 2-anonymous
Basic Incognito Example (2/3)
       Search all 2-subsets: <S1, Z0> (same Hospital Patients table)
       SELECT COUNT(*) GROUP BY S1, Zipcode
          S1   Zipcode   Count
          *    53715     2
          *    53703     2
          *    53706     2
       Every count is ≥ 2, so <S1, Z0> is 2-anonymous
Basic Incognito Example (2/3)
       Search all 2-subsets: <S0, Z1> (same Hospital Patients table)
       SELECT COUNT(*) GROUP BY Sex, Z1
          Sex      Z1      Count
          Male     5371*   1
          Female   5371*   1
          Male     5370*   2
          Female   5370*   2
       Counts of 1, so <S0, Z1> is not 2-anonymous
Basic Incognito Example (2/3)
       Search all 2-subsets: <S0, Z2> (same Hospital Patients table)
       SELECT COUNT(*) GROUP BY Sex, Z2
          Sex      Z2      Count
          Male     537**   3
          Female   537**   3
       Every count is ≥ 2, so <S0, Z2> is 2-anonymous
       2-anonymous <S, Z> nodes remaining in the lattice
          <S1, Z2>
          <S1, Z1>  <S0, Z2>
          <S1, Z0>
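The 2-subset search above can be sketched as a bottom-up pass over the <S, Z> lattice that uses the generalization property to skip checks. This is a simplification written for this walkthrough (brute-force counting, hypothetical names), not the full Incognito algorithm; on this table it checks the same four nodes as the slides and finds the same four 2-anonymous nodes.

```python
from collections import Counter
from itertools import product

ROWS = [("Male", "53715"), ("Male", "53703"), ("Male", "53703"),
        ("Female", "53715"), ("Female", "53706"), ("Female", "53706")]
MAX_LEVELS = (1, 2)  # <S, Z>: S0..S1, Z0..Z2
NODES = list(product(*(range(m + 1) for m in MAX_LEVELS)))


def is_k_anonymous(node, k):
    s, z = node
    counts = Counter(("*" if s else sex, zipcode[: 5 - z] + "*" * z)
                     for sex, zipcode in ROWS)
    return min(counts.values()) >= k


def search_2subset(k):
    """Bottom-up (by height) search of the <S, Z> lattice. When a node passes,
    the generalization property marks all of its generalizations as passing,
    so they are never checked. Returns (anonymous nodes, nodes checked)."""
    anonymous, checked = set(), []
    for height in range(sum(MAX_LEVELS) + 1):
        for node in (n for n in NODES if sum(n) == height):
            if node in anonymous:
                continue  # already implied by a passing descendant
            checked.append(node)
            if is_k_anonymous(node, k):
                anonymous |= {g for g in NODES
                              if all(a >= b for a, b in zip(g, node))}
    return anonymous, checked


anonymous, checked = search_2subset(2)
print(sorted(anonymous), checked)
```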
Basic Incognito Example (3/3)
       Search 3-subsets (same Hospital Patients table)
       Candidate <B, S, Z> nodes
          <B1, S1, Z2>
          <B1, S1, Z1>  <B1, S0, Z2>  <B0, S1, Z2>
          <B1, S1, Z0>
Basic Incognito Example (3/3)
       Candidate 3-subset lattice (5 nodes)
          <B1, S1, Z2>
          <B1, S1, Z1>  <B1, S0, Z2>  <B0, S1, Z2>
          <B1, S1, Z0>
       VS. the full 3-attribute lattice (12 nodes)
          <B1, S1, Z2>
          <B1, S1, Z1>  <B1, S0, Z2>  <B0, S1, Z2>
          <B1, S1, Z0>  <B1, S0, Z1>  <B0, S1, Z1>  <B0, S0, Z2>
          <B1, S0, Z0>  <B0, S1, Z0>  <B0, S0, Z1>
          <B0, S0, Z0>
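The reduction from 12 to 5 candidate nodes follows from the subset property alone. The sketch below (hypothetical names, brute-force counting in memory rather than Incognito's incremental graph generation) finds the 2-anonymous nodes of each 2-attribute lattice and keeps only those <B, S, Z> nodes whose three projections all passed.

```python
from collections import Counter
from itertools import combinations, product

# (DOB, Sex, Zipcode) of the six Hospital Patients tuples.
ROWS = [("1/21/76", "Male", "53715"), ("1/21/76", "Male", "53703"),
        ("2/28/76", "Male", "53703"), ("4/13/86", "Female", "53715"),
        ("4/13/86", "Female", "53706"), ("2/28/76", "Female", "53706")]
MAX_LEVELS = (1, 1, 2)  # B0..B1, S0..S1, Z0..Z2


def gen(attr, value, level):
    """Zipcode (attr 2) masks `level` trailing digits; DOB and Sex go to '*'."""
    if attr == 2:
        return value[: 5 - level] + "*" * level
    return value if level == 0 else "*"


def is_k_anonymous(attrs, levels, k):
    counts = Counter(tuple(gen(a, row[a], lv) for a, lv in zip(attrs, levels))
                     for row in ROWS)
    return min(counts.values()) >= k


def anonymous_nodes(attrs, k):
    """Brute force: all k-anonymous nodes of the lattice over `attrs`."""
    ranges = [range(MAX_LEVELS[a] + 1) for a in attrs]
    return {lv for lv in product(*ranges) if is_k_anonymous(attrs, lv, k)}


def candidates_3(k):
    """Subset property: keep a <B, S, Z> node only if every 2-attribute
    projection of it is k-anonymous."""
    ok = {pair: anonymous_nodes(pair, k) for pair in combinations(range(3), 2)}
    full = product(*(range(m + 1) for m in MAX_LEVELS))
    return {node for node in full
            if all(tuple(node[a] for a in pair) in good
                   for pair, good in ok.items())}


print(sorted(candidates_3(2)))  # 5 of the 12 <B, S, Z> nodes survive
```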
What Is the Problem?
       Incognito is a very good algorithm
          −   but…
       Checking k-anonymity for each node is still expensive!
          −   SELECT MIN(COUNT(*)) FROM T GROUP BY (QI attribute set)
Bitmap based Incognito
       Generalization
          −   bitwise OR operation
       Combination
          −   bitwise AND operation
       Checking k-anonymity
          −   bit-counting operation
Generalize 1-subset (single attr.)
       Hospital Patients (same six tuples as above): one bitmap per Zipcode value, tuple 1 leftmost
          53715   100100
          53710   000000
          53706   000011
          53703   011000
       Hierarchy: 5371* ← {53715, 53710}, 5370* ← {53706, 53703}, 537** ← {5371*, 5370*}
       Generalizing with OR: 5370* = 53706 OR 53703 = 000011 OR 011000 = 011011
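A sketch of 1-subset bitmap generalization, assuming bitmaps are stored as Python integers with tuple 1 as the leftmost of six bits so that `format(b, "06b")` prints them the way the slides draw them. `value_bitmaps` and `generalize` are hypothetical names; the zipcode column is copied from the table.

```python
# Convention for this sketch: in an n-bit bitmap, tuple 1 is the leftmost bit,
# so format(b, "06b") matches the slides.
ZIPCODES = ["53715", "53703", "53703", "53715", "53706", "53706"]


def value_bitmaps(column):
    """One bitmap per distinct value: the bit of tuple i is set iff it has that
    value. Values that never occur (53710 here) get no entry (all zeros)."""
    n = len(column)
    maps = {}
    for i, value in enumerate(column):
        maps[value] = maps.get(value, 0) | (1 << (n - 1 - i))
    return maps


def generalize(bitmaps, parent_of):
    """Generalization: OR together the bitmaps of all values with the same parent."""
    out = {}
    for value, bitmap in bitmaps.items():
        parent = parent_of(value)
        out[parent] = out.get(parent, 0) | bitmap
    return out


Z0 = value_bitmaps(ZIPCODES)
Z1 = generalize(Z0, lambda z: z[:4] + "*")   # 5371*, 5370*
Z2 = generalize(Z1, lambda z: z[:3] + "**")  # 537**
print({v: format(b, "06b") for v, b in Z1.items()})
```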
Combination and Generalization (six tuples, tuple 1 leftmost)
       Inputs
          < sex >:      Male 111000, Female 000111
          < zipcode >:  53703 011000, 53706 000011, 53715 100100
       Generate 2-subsets with bitwise AND (&&) → < S0, Z0 >
          Male, 53703     011000
          Male, 53706     000000
          Male, 53715     100000
          Female, 53703   000000
          Female, 53706   000011
          Female, 53715   000100
       Generalize with bitwise OR (||)
          < S1 >:  * = Male || Female = 111111
          < Z1 >:  5370* = 53703 || 53706 = 011011,  5371* = 100100
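The AND step of this slide, sketched with the same integer-bitmap convention (tuple 1 leftmost); the bitmaps are the ones listed on the slide and `combine` is a hypothetical name.

```python
from itertools import product

# Bitmaps from the slide (tuple 1 is the leftmost bit).
SEX = {"Male": 0b111000, "Female": 0b000111}
ZIP = {"53703": 0b011000, "53706": 0b000011, "53715": 0b100100}


def combine(left, right):
    """Combination: AND every pair of value bitmaps to get 2-subset bitmaps."""
    return {(a, b): left[a] & right[b] for a, b in product(left, right)}


S0_Z0 = combine(SEX, ZIP)
S1 = SEX["Male"] | SEX["Female"]  # generalizing Sex is an OR: 111111
for key, bitmap in S0_Z0.items():
    print(key, format(bitmap, "06b"))
```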
Generate <S0,Z0> using Bitmap
       Target: <S0, Z0>, the bottom of the <S, Z> lattice (six tuples, tuple 1 leftmost)
       S0 bitmaps: Male 111000, Female 000111
       Z0 bitmaps: 53703 011000, 53706 000011, 53715 100100
       <Male, 53703> = Male AND 53703 = 111000 AND 011000 = 011000
Generate <S0,Z0> using Bitmap
       <Male, 53706> = Male AND 53706 = 111000 AND 000011 = 000000
Generate <S0,Z0> using Bitmap
       <Male, 53715> = Male AND 53715 = 111000 AND 100100 = 100000
Generate <S0,Z0> using Bitmap
       All Male bitmaps of <S0, Z0> (Male = 111000)
          <Male, 53703> = 111000 AND 011000 = 011000
          <Male, 53715> = 111000 AND 100100 = 100000
          <Male, 53706> = 111000 AND 000011 = 000000
Generalize 2-subset Bitmaps
       From <S0, Z0> up to <S1, Z0> with bitwise OR
          <Male, 53703> 011000     <Male, 53706> 000000     <Male, 53715> 100000
          <Female, 53703> 000000   <Female, 53706> 000011   <Female, 53715> 000100
       <*, 53703> = <Male, 53703> OR <Female, 53703> = 011000 OR 000000 = 011000
Generalize 2-subset Bitmaps
       <*, 53706> = <Male, 53706> OR <Female, 53706> = 000000 OR 000011 = 000011
Generalize 2-subset Bitmaps
       <*, 53715> = <Male, 53715> OR <Female, 53715> = 100000 OR 000100 = 100100
Check k-anonymity
       Count the set bits of each <S1, Z0> bitmap
          <*, 53703>   011000   COUNT 2
          <*, 53706>   000011   COUNT 2
          <*, 53715>   100100   COUNT 2
       ☞ Satisfies k(2)-anonymity
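Continuing the bitmap sketch: OR-ing the <S0, Z0> bitmaps that share a zipcode yields <S1, Z0>, and a popcount per bitmap gives the group sizes. Skipping all-zero bitmaps is an assumption of this sketch: they stand for value combinations that never occur, which the frequency set does not contain. All names are hypothetical.

```python
# <S0, Z0> bitmaps from the slides (tuple 1 is the leftmost bit).
S0_Z0 = {
    ("Male", "53703"): 0b011000, ("Male", "53706"): 0b000000,
    ("Male", "53715"): 0b100000, ("Female", "53703"): 0b000000,
    ("Female", "53706"): 0b000011, ("Female", "53715"): 0b000100,
}


def popcount(x):
    """Number of set bits, i.e. the number of tuples in the group."""
    return bin(x).count("1")


def roll_up(bitmaps, lift):
    """Generalize 2-subset bitmaps: OR together bitmaps whose lifted keys match."""
    out = {}
    for key, bitmap in bitmaps.items():
        new_key = lift(key)
        out[new_key] = out.get(new_key, 0) | bitmap
    return out


def satisfies_k(bitmaps, k):
    """Every non-empty group needs >= k tuples; all-zero bitmaps are ignored."""
    return all(popcount(b) >= k for b in bitmaps.values() if b)


S1_Z0 = roll_up(S0_Z0, lambda key: ("*", key[1]))
print(satisfies_k(S0_Z0, 2), satisfies_k(S1_Z0, 2))  # False True
```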
Optimization Techniques
       1-Level Optimization
          −   Keep only 1-subset bitmaps for generating k-subset bitmaps
       Reusing Optimization
          −   Reuse intermediate (k-?)-subset bitmaps for generating k-subset bitmaps
       Pruning Optimization
          −   Stop the counting operation as soon as one bitmap does not satisfy k
          −   Then check a more generalized node
       Single Instruction Multiple Data
          −   Parallelize bitwise AND/OR operations using SIMD instructions
1-level Optimization
       Generalization hierarchies (each value generalizes to the next one)
          a:  a0 → a1 → a2
          g:  g0 → g1 → g2
          e:  e0 → e1 → e2 → e3
       <a2, g2, e1> = a2 ∧ g2 ∧ e1
       Reduces memory and disk space for bitmaps!
Reusing Optimization
       To generate <a2, g2, e1>, any of the following can be used
          −   a2 ∧ g2 ∧ e1
          −   <a2, g2> ∧ e1
          −   <a2, e1> ∧ g2
          −   <g2, e1> ∧ a2
       The 2-subset bitmaps have already been created in the previous step
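To make the 1-level versus reusing trade-off concrete, the following sketch counts AND operations for building <a2, g2> and then <a2, g2, e1>. The bit patterns are invented, and the caching scheme (extend the cached bitmap of the node minus its last attribute) is one possible reading of the slide, not the authors' implementation; all names are hypothetical.

```python
# Hypothetical bitmaps for one value of each attribute at the levels used on
# the slide (a2, g2, e1). The bit patterns are invented for illustration.
ONE_SUBSET = {"a2": 0b11110000, "g2": 0b10101010, "e1": 0b11001100}


class Ops:
    """Counts bitwise AND operations so the two strategies can be compared."""

    def __init__(self):
        self.ands = 0

    def and_(self, x, y):
        self.ands += 1
        return x & y


def one_level(node, ops):
    """1-level optimization: only 1-subset bitmaps are stored, so a k-subset
    bitmap is rebuilt with k-1 ANDs every time it is needed."""
    bitmap = ONE_SUBSET[node[0]]
    for value in node[1:]:
        bitmap = ops.and_(bitmap, ONE_SUBSET[value])
    return bitmap


def reusing(node, ops, cache):
    """Reusing optimization (one possible scheme): cache every subset bitmap
    and extend the cached bitmap of node[:-1] with a single AND."""
    if len(node) == 1:
        return ONE_SUBSET[node[0]]
    if node not in cache:
        cache[node] = ops.and_(reusing(node[:-1], ops, cache), ONE_SUBSET[node[-1]])
    return cache[node]


# Build the 2-subset <a2, g2> first (previous step), then <a2, g2, e1>.
ops1 = Ops()
one_level(("a2", "g2"), ops1)
r1 = one_level(("a2", "g2", "e1"), ops1)

ops2, cache = Ops(), {}
reusing(("a2", "g2"), ops2, cache)
r2 = reusing(("a2", "g2", "e1"), ops2, cache)
print(format(r1, "08b"), ops1.ands, ops2.ands)  # 10000000 3 2
```

Reusing saves one AND here at the cost of keeping <a2, g2> in memory, which is the space/time trade-off listed in the conclusion.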
Pruning Optimization
       A bitmap count of 1 does not satisfy k
          −   Counting stops there: the remaining bitmaps of this node, <Male, 53710>, … , <Female, 53715>, can be skipped
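Pruning can be sketched as an early exit in the counting loop: the first non-empty bitmap with fewer than k set bits proves the node fails, so the rest are never counted. The function name and the bitmap order below are hypothetical.

```python
def satisfies_k_pruned(bitmaps, k):
    """Return (satisfies, bitmaps_counted). Counting stops at the first
    non-empty bitmap with fewer than k set bits; empty bitmaps are value
    combinations that never occur and are ignored."""
    counted = 0
    for bitmap in bitmaps:
        counted += 1
        count = bin(bitmap).count("1")
        if 0 < count < k:
            return False, counted  # prune: the remaining bitmaps are skipped
    return True, counted


# Hypothetical counting order over the <S0, Z0> bitmaps (tuple 1 leftmost):
# <Male,53703>, <Male,53706>, <Male,53715>, <Female,53703>, <Female,53706>, <Female,53715>
S0_Z0 = [0b011000, 0b000000, 0b100000, 0b000000, 0b000011, 0b000100]
S1_Z0 = [0b011000, 0b000011, 0b100100]
print(satisfies_k_pruned(S0_Z0, 2))  # (False, 3)
print(satisfies_k_pruned(S1_Z0, 2))  # (True, 3)
```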
Single Instruction Multiple Data
       Using SIMD Instructions
          −   Bitwise AND/OR and bit-counting operations can be parallelized
       We implemented this using
          −   Intel Pentium 4 Streaming SIMD Extensions (SSE) technology
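Python cannot issue SSE instructions, so this sketch shows the related SWAR ("SIMD within a register") technique instead: a 64-bit popcount that adds bits in parallel inside one word, applied to bitmaps stored as lists of 64-bit words. It illustrates why bit counting parallelizes well; it is not the authors' SSE code, and the names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def popcount64(x):
    """SWAR popcount of one 64-bit word: count bits in 2-bit lanes, then
    4-bit lanes, then bytes, and sum the 8 byte counts with one multiply."""
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return ((x * 0x0101010101010101) & MASK64) >> 56


def and_count(a, b):
    """AND two equal-length bitmaps stored as lists of 64-bit words and
    return the number of set bits in the result."""
    return sum(popcount64(x & y) for x, y in zip(a, b))


print(popcount64(MASK64), and_count([MASK64, 0b1011], [0xFF, 0b0110]))  # 64 9
```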
Performance Evaluation
       Dataset
          −   Small (5MB) and big (60MB) census data
          −   QI attribute set (four columns)
                Generalization levels: 3, 3, 2, 4 respectively
          −   Index size: 2MB (40%) and 16MB (27%) relative to the data size
          −   Bitmap size: 200KB (4%) and 2MB (3%) relative to the data size
       Environment
          −   Pentium IV 2.0 GHz
          −   1GB memory, 7200rpm hard disk
          −   Oracle 10g R1 & Intel C++ Compiler 9.0
Performance Evaluation
       [Chart: Small Data. Series: 1-Level, Pruning, Reusing, Traditional; x-axis: 4000, 2000, 1000, 500, 100; y-axis: 0.000 to 25.000]
Performance Evaluation
       [Chart: Small Data (zoom in). Series: 1-Level, Pruning, Reusing; x-axis: 4000, 2000, 1000, 500, 100; y-axis: 0.000 to 1.400]
Performance Evaluation
       [Chart: Big Data. Series: 1-Level, Pruning, Reusing, Traditional; x-axis: 4000, 2000, 1000, 500, 100; y-axis: 0.000 to 1400.000]
Performance Evaluation
       [Chart: Big Data (zoom in). Series: 1-Level, Pruning, Reusing; x-axis: 4000, 2000, 1000, 500, 100; y-axis: 0.000 to 4.000]
Conclusion
       Incognito = a very innovative k-anonymity algorithm
          −   Still inefficient at checking k-anonymity for each node
          −   Expensive external sort or hash for counting (e.g. GROUP BY)
       Using bitmaps (bitwise AND/OR)
          −   Additional optimization opportunities: Reusing Optimization, Pruning Optimization, Single Instruction Multiple Data
          −   Space/time trade-off: 1-level / Reusing Optimization
