Implementation of Bitmap based
  Incognito and Performance
          Evaluation

    Hyunho Kang, Jaemyung Kim,
    Gapjoo Na, and Sangwon Lee

         Sungkyunkwan University
Table of Contents
       Introduction
       Existing Solutions
          −   Binary Search
          −   Incognito
       Bitmap based Incognito
       Optimization Techniques
       Performance Evaluation
       Conclusion




Implementation of Bitmap based Incognito and Performance Evaluation   2
Introduction
       Privacy Problem and Solution (Sweeney)
          −   Released microdata → join attack (re-identification)
          −   Solution: k-anonymization
       k-Anonymization Algorithms
          −   Full-domain binary search
          −   Incognito: one of the most efficient algorithms (Kristen LeFevre et al.)
       Problem of the Existing Incognito Algorithm
          −   Requires many repeated sorts over large volumes of data
          −   Solution: use a bitmap index structure
                  Completely eliminates the expensive sorts

Joining Attack
        Example - Joining Attack
        Voter Registration List                   Hospital Patients
        Name    DOB       Sex      Zipcode        DOB       Sex      Zipcode   Disease
        Andre   1/21/76   Male     53715          1/21/76   Male     53715     Flu
        Beth    1/10/81   Female   55410          1/21/76   Male     53703     Broken Arm
        Carol   10/1/44   Female   90210          2/28/76   Male     53703     Bronchitis
        Dan     2/21/84   Male     02174          4/13/86   Female   53715     Hepatitis
        Ellen   4/19/72   Female   02237          4/13/86   Female   53706     Sprained Ankle
                                                  2/28/76   Female   53706     Hang Nail

        Joined on (DOB, Sex, Zipcode):
                          Name    DOB       Sex      Zipcode   Disease
                          Andre   1/21/76   Male     53715     Flu




Joining Attack

       Voter Registration List                   Hospital Patients
       Name    DOB       Sex      Zipcode        DOB       Sex      Zipcode   Disease
       Andre   1/21/76   Male     53715          1/21/76   Male     537**     Flu
       Beth    1/10/81   Female   55410          1/21/76   Male     537**     Broken Arm
       Carol   10/1/44   Female   90210          2/28/76   Male     537**     Bronchitis
       Dan     2/21/84   Male     02174          4/13/86   Female   537**     Hepatitis
       Ellen   4/19/72   Female   02237          4/13/86   Female   537**     Sprained Ankle
                                                 2/28/76   Female   537**     Hang Nail

       Joined on (DOB, Sex, Zipcode):
                         Name    DOB       Sex      Zipcode   Disease
                         Andre   1/21/76   Male     537**     Flu OR Broken Arm
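The two joining-attack slides can be reproduced in a few lines. This is a minimal sketch, assuming the attacker masks the voter's zipcode the same way the hospital masked its own; the helper `mask_zip` and the `zip_digits` parameter are hypothetical names, not part of the original work.

```python
# Tables transcribed from the slides: (Name, DOB, Sex, Zipcode) and (DOB, Sex, Zipcode, Disease)
VOTERS = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]
HOSPITAL = [
    ("1/21/76", "Male", "53715", "Flu"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("4/13/86", "Female", "53706", "Sprained Ankle"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]


def mask_zip(zipcode, zip_digits):
    """Hypothetical helper: keep the first `zip_digits` digits and replace the rest by '*'."""
    return zipcode[:zip_digits] + "*" * (len(zipcode) - zip_digits)


def join_attack(voters, patients, zip_digits=5):
    """Link each voter to the diseases of all patients sharing (DOB, Sex, masked Zipcode)."""
    linked = {}
    for name, dob, sex, zipcode in voters:
        key = (dob, sex, mask_zip(zipcode, zip_digits))
        diseases = [disease for p_dob, p_sex, p_zip, disease in patients
                    if (p_dob, p_sex, mask_zip(p_zip, zip_digits)) == key]
        if diseases:
            linked[name] = diseases
    return linked


print(join_attack(VOTERS, HOSPITAL))                # {'Andre': ['Flu']}
print(join_attack(VOTERS, HOSPITAL, zip_digits=3))  # {'Andre': ['Flu', 'Broken Arm']}
```

With exact zipcodes the join pins Andre to a single disease; after generalizing to 537** the attacker can only narrow it down to two candidates.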




Basic Definitions (1/3)
       Quasi-Identifier Attribute Set (Q)
         −   minimal set of attributes in table T that can be joined with
             external information to re-identify individual records
         −   e.g. {Birthdate, Sex, Zipcode}
       Frequency Set
         −   a mapping from each unique combination of values of Q in T
             to the total number of tuples in T with these values of Q (the
             counts)




Basic Definitions (2/3)
       k-Anonymity (k-anonymous)
         −   Table T satisfies the k-anonymity property (is k-anonymous)
             with respect to attribute set Q if every count in the frequency
             set of T with respect to Q is greater than or equal to k.
         −   In SQL, table T is k-anonymous with respect to Q if the result of
                SELECT MIN(cnt)
                FROM (SELECT COUNT(*) AS cnt
                      FROM T
                      GROUP BY Q) freq
              is ≥ k
         −   e.g. (Q = {DOB, Sex, Zipcode})
                SELECT MIN(cnt)
                FROM (SELECT COUNT(*) AS cnt
                      FROM "Hospital Patients"
                      GROUP BY DOB, Sex, Zipcode) freq
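The frequency set and the k-anonymity test can be mirrored in Python. A sketch under the assumption that the table fits in memory as a list of dicts; `frequency_set` and `is_k_anonymous` are illustrative names, not functions from the original implementation.

```python
from collections import Counter

COLUMNS = ("DOB", "Sex", "Zipcode", "Disease")
HOSPITAL = [dict(zip(COLUMNS, row)) for row in [
    ("1/21/76", "Male", "53715", "Flu"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("4/13/86", "Female", "53706", "Sprained Ankle"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]]


def frequency_set(table, qi):
    """Map each distinct combination of values of the attributes in `qi` to its tuple count."""
    return Counter(tuple(row[attr] for attr in qi) for row in table)


def is_k_anonymous(table, qi, k):
    """Python analogue of the MIN(cnt) query: every count in the frequency set must be >= k."""
    return min(frequency_set(table, qi).values()) >= k


print(frequency_set(HOSPITAL, ("Sex",)))
print(is_k_anonymous(HOSPITAL, ("Sex",), 2))                   # True
print(is_k_anonymous(HOSPITAL, ("DOB", "Sex", "Zipcode"), 2))  # False
```

On the ungeneralized Hospital Patients table every (DOB, Sex, Zipcode) combination is unique, so the table is not 2-anonymous with respect to the full quasi-identifier.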

Basic Definitions (3/3)
       Generalization
          −   is defined by a (user-defined) function that maps each value of
              a domain Di to a value of a more general domain Dj
          −   Notation <D: Di <D Dj means that Dj is a generalization of Di




Example of Generalization (1/3)
         Domain and Value Generalization

                      Z2                      537**                   537** = f(5371*)
                      Z1              5371*          5370*            5371* = f(53715)
                 Zipcode (Z0)   53715   53710   53706   53703

     B1                     *                        S1          Person
     Birth (B0)   1/21/76   2/28/76   4/13/86        Sex (S0)    Male    Female
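The one-step generalization functions in the figure (5371* = f(53715), 537** = f(5371*)) can be written directly. A minimal sketch, assuming zipcodes are strings and each step masks one more trailing digit; `f_zipcode`, `f_birth`, `f_sex` and `generalize` are hypothetical names.

```python
def f_zipcode(value):
    """One step up the zipcode hierarchy: mask the rightmost unmasked digit (53715 -> 5371*)."""
    digits = value.rstrip("*")
    return digits[:-1] + "*" * (len(value) - len(digits) + 1)


def f_birth(value):
    """B0 -> B1: every birth date generalizes to '*'."""
    return "*"


def f_sex(value):
    """S0 -> S1: Male and Female both generalize to 'Person'."""
    return "Person"


def generalize(value, f, levels):
    """Apply the one-step generalization function f `levels` times (Z0 -> Z1 -> Z2 ...)."""
    for _ in range(levels):
        value = f(value)
    return value


print(f_zipcode("53715"))                 # 5371*
print(generalize("53715", f_zipcode, 2))  # 537**
print(generalize("53703", f_zipcode, 1))  # 5370*
print(generalize("2/28/76", f_birth, 1))  # *
print(generalize("Female", f_sex, 1))     # Person
```

The caller is responsible for not exceeding the top of each hierarchy (Z2, B1, S1); the functions themselves do not know where a hierarchy ends.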


Example of Generalization (2/3)
       Generalization Lattices for Two Attributes

            <B1, S1>                 <S1, Z2>                 <B1, Z2>
       <B1, S0>  <B0, S1>       <S1, Z1>  <S0, Z2>       <B1, Z1>  <B0, Z2>
            <B0, S0>            <S1, Z0>  <S0, Z1>       <B1, Z0>  <B0, Z1>
                                     <S0, Z0>                 <B0, Z0>

       Node <S0, Z2>            Node <S0, Z1>
       Sex      Zipcode         Sex      Zipcode
       Male     537**           Male     5370*
       Female   537**           Male     5371*
                                Female   5370*
                                Female   5371*
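A full-domain lattice is just the cross product of the attributes' generalization levels, and a node's height is the sum of its levels. The sketch below enumerates nodes by height and lists a node's direct generalizations; it assumes single-letter attribute names, and `lattice_by_height` and `direct_generalizations` are hypothetical helpers.

```python
from collections import defaultdict
from itertools import product


def lattice_by_height(max_level):
    """Enumerate the nodes of a full-domain generalization lattice, grouped by height.

    max_level maps a single-letter attribute name to its top level, e.g. {"S": 1, "Z": 2}.
    """
    attrs = list(max_level)
    by_height = defaultdict(list)
    for levels in product(*(range(max_level[a] + 1) for a in attrs)):
        node = tuple(f"{a}{lvl}" for a, lvl in zip(attrs, levels))
        by_height[sum(levels)].append(node)
    return dict(by_height)


def direct_generalizations(node, max_level):
    """Nodes one step above `node`: raise exactly one attribute by one level."""
    result = []
    for i, label in enumerate(node):
        attr, lvl = label[0], int(label[1:])
        if lvl < max_level[attr]:
            result.append(node[:i] + (f"{attr}{lvl + 1}",) + node[i + 1:])
    return result


for height, nodes in sorted(lattice_by_height({"S": 1, "Z": 2}).items()):
    print(height, nodes)
print(direct_generalizations(("S0", "Z0"), {"S": 1, "Z": 2}))
```

For <S, Z> this yields the four levels of the middle lattice above (heights 0 to 3); the direct generalizations are exactly the lattice's upward edges.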

Full-Domain Generalization Algorithm
       Binary search over the heights of the generalization lattice finds
        a k-anonymous generalization of minimum height
        - if no generalization of height h satisfies k-anonymity, then no
          generalization of height h' < h satisfies k-anonymity either
          (this monotonicity is what makes binary search valid)

          h: maximum height in the generalization lattice
          1) Check the generalizations at height ⌊h/2⌋
          2) If some generalization at this height satisfies k-anonymity
             2-1) check the generalizations at height ⌊h/4⌋
          3) Else
             3-1) check the generalizations at height ⌊3h/4⌋
          4) And so on…

                     <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>
                     <S0, Z0>

       This algorithm is proven to find a single minimal full-domain
        k-anonymization
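The steps above are an ordinary binary search over heights 0..h. The sketch below runs it on the <S, Z> lattice of the Hospital Patients data; it assumes S1 is written as "*" and that Zk masks the last k zipcode digits, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# (Sex, Zipcode) projection of the Hospital Patients table
ROWS = [("Male", "53715"), ("Male", "53703"), ("Male", "53703"),
        ("Female", "53715"), ("Female", "53706"), ("Female", "53706")]
MAX_S, MAX_Z = 1, 2  # top levels of the Sex and Zipcode hierarchies


def generalize_row(row, s_level, z_level):
    """Map a row to node <S s_level, Z z_level> (S1 shown as '*', Zk masks k digits)."""
    sex, zipcode = row
    sex = sex if s_level == 0 else "*"
    zipcode = zipcode[:len(zipcode) - z_level] + "*" * z_level
    return sex, zipcode


def is_k_anonymous(s_level, z_level, k):
    counts = Counter(generalize_row(row, s_level, z_level) for row in ROWS)
    return min(counts.values()) >= k


def k_anonymous_node_at(height, k):
    """Return some k-anonymous node (s, z) with s + z == height, or None."""
    for s, z in product(range(MAX_S + 1), range(MAX_Z + 1)):
        if s + z == height and is_k_anonymous(s, z, k):
            return s, z
    return None


def binary_search_min_height(k):
    """Binary search over heights 0..h; valid because of the monotonicity stated above."""
    lo, hi = 0, MAX_S + MAX_Z
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        node = k_anonymous_node_at(mid, k)
        if node is not None:
            best = (mid, node)  # a solution at this height: try lower heights
            hi = mid - 1
        else:
            lo = mid + 1        # nothing here, so nothing below either: go higher
    return best


print(binary_search_min_height(2))  # (1, (1, 0)), i.e. <S1, Z0>
```

With h = 3 the first probe is height ⌊3/2⌋ = 1, where <S1, Z0> is 2-anonymous, so the search then checks height 0, finds nothing, and stops.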

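Key Properties of Incognito
          Generalization Property: <Z0> → <Z1>
             (if T is k-anonymous with respect to <Z0>, it is also
              k-anonymous with respect to its generalization <Z1>)
          Rollup Property
             (the frequency set of a generalization can be computed from the
              frequency set of a more specific node, without rescanning T)
          Subset Property: <S1, Z0, D1> → <S1, Z0>, <S1, D1>, <Z0, D1>
             (if T is k-anonymous with respect to an attribute set, it is
              also k-anonymous with respect to every subset of it)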


                  Hospital Patients                               Hospital Patients
          B0        S0       Z0           D0              B0          S0     Z1           D0

        1/21/76   Male     53715   Flu                  1/21/76   Male     5371*   Flu

        1/21/76   Male     53703   Broken Arm           1/21/76   Male     5370*   Broken Arm

        2/28/76   Male     53703   Bronchitis           2/28/76   Male     5370*   Bronchitis

        4/13/86   Female   53715   Hepatitis            4/13/86   Female   5371*   Hepatitis

        4/13/86   Female   53706   Sprained Ankle       4/13/86   Female   5370*   Sprained Ankle

        2/28/76   Female   53706   Hang Nail            2/28/76   Female   5370*   Hang Nail
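The rollup property means a coarser frequency set is a sum over a finer one. A sketch assuming frequency sets are `Counter`s keyed by value tuples; it uses the <S0, Z0> counts of the Hospital Patients table rather than the <B0, S0, Z0> table above, and `rollup` is a hypothetical helper name.

```python
from collections import Counter

# Frequency set of <S0, Z0> over the Hospital Patients table
FREQ_S0_Z0 = Counter({("Male", "53715"): 1, ("Male", "53703"): 2,
                      ("Female", "53715"): 1, ("Female", "53706"): 2})


def rollup(freq, generalize_key):
    """Derive a more general frequency set by summing the counts of all groups
    that collapse onto the same generalized key (no pass over the base table)."""
    rolled = Counter()
    for key, count in freq.items():
        rolled[generalize_key(key)] += count
    return rolled


# <S0, Z0> -> <S0, Z1>: mask the last zipcode digit
freq_s0_z1 = rollup(FREQ_S0_Z0, lambda key: (key[0], key[1][:-1] + "*"))
# <S0, Z0> -> <S1, Z0>: suppress Sex
freq_s1_z0 = rollup(FREQ_S0_Z0, lambda key: ("*", key[1]))

print(dict(freq_s0_z1))
print(dict(freq_s1_z0))
```

The rolled-up counts agree with the <S0, Z1> and <S1, Z0> frequency sets computed directly from the table in the 2-subset example.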



Basic Incognito Example (1/3)
        Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
        Search 1-subsets

        Lattice:   B1
                   B0

        Hospital Patients
        DOB        Sex      Zipcode   Disease
        1/21/76    Male     53715     Flu
        1/21/76    Male     53703     Broken Arm
        2/28/76    Male     53703     Bronchitis
        4/13/86    Female   53715     Hepatitis
        4/13/86    Female   53706     Sprained Ankle
        2/28/76    Female   53706     Hang Nail

        DOB        Count
        1/21/76    2
        4/13/86    2
        2/28/76    2

        SELECT COUNT(*) FROM "Hospital Patients" GROUP BY DOB

Basic Incognito Example (1/3)
        Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
        Search 1-subsets

        Lattice:   S1
                   S0

        Hospital Patients
        DOB        Sex      Zipcode   Disease
        1/21/76    Male     53715     Flu
        1/21/76    Male     53703     Broken Arm
        2/28/76    Male     53703     Bronchitis
        4/13/86    Female   53715     Hepatitis
        4/13/86    Female   53706     Sprained Ankle
        2/28/76    Female   53706     Hang Nail

        Sex        Count
        Male       3
        Female     3

        SELECT COUNT(*) FROM "Hospital Patients" GROUP BY Sex

Basic Incognito Example (1/3)
        Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
        Search 1-subsets

        Lattice:   Z2
                   Z1
                   Z0

        Hospital Patients
        DOB        Sex      Zipcode   Disease
        1/21/76    Male     53715     Flu
        1/21/76    Male     53703     Broken Arm
        2/28/76    Male     53703     Bronchitis
        4/13/86    Female   53715     Hepatitis
        4/13/86    Female   53706     Sprained Ankle
        2/28/76    Female   53706     Hang Nail

        Zipcode    Count
        53715      2
        53703      2
        53706      2

        SELECT COUNT(*) FROM "Hospital Patients" GROUP BY Zipcode

Basic Incognito Example (2/3)
          Search all 2-subsets

                     <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>
                     <S0, Z0>

          Hospital Patients
          DOB        Sex       Zipcode   Disease
          1/21/76    Male      53715     Flu
          1/21/76    Male      53703     Broken Arm
          2/28/76    Male      53703     Bronchitis
          4/13/86    Female    53715     Hepatitis
          4/13/86    Female    53706     Sprained Ankle
          2/28/76    Female    53706     Hang Nail

          Sex        Zipcode     Count
          Male       53715       1
          Female     53715       1
          Male       53703       2
          Female     53706       2

          SELECT COUNT(*) FROM "Hospital Patients" GROUP BY Sex, Zipcode


Basic Incognito Example (2/3)
          Search all 2-subsets

                     <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>

          Hospital Patients
          DOB        Sex       Zipcode   Disease
          1/21/76    Male      53715     Flu
          1/21/76    Male      53703     Broken Arm
          2/28/76    Male      53703     Bronchitis
          4/13/86    Female    53715     Hepatitis
          4/13/86    Female    53706     Sprained Ankle
          2/28/76    Female    53706     Hang Nail

          S1         Zipcode     Count
          *          53715       2
          *          53703       2
          *          53706       2

          SELECT COUNT(*) FROM "Hospital Patients" GROUP BY S1, Zipcode


Basic Incognito Example (2/3)
          Search all 2-subsets

                     <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>

          Hospital Patients
          DOB        Sex       Zipcode   Disease
          1/21/76    Male      53715     Flu
          1/21/76    Male      53703     Broken Arm
          2/28/76    Male      53703     Bronchitis
          4/13/86    Female    53715     Hepatitis
          4/13/86    Female    53706     Sprained Ankle
          2/28/76    Female    53706     Hang Nail

          Sex        Z1          Count
          Male       5371*       1
          Female     5371*       1
          Male       5370*       2
          Female     5370*       2

          SELECT COUNT(*) FROM "Hospital Patients" GROUP BY Sex, Z1


Basic Incognito Example (2/3)
          Search all 2-subsets

                     <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>

          Hospital Patients
          DOB        Sex       Zipcode   Disease
          1/21/76    Male      53715     Flu
          1/21/76    Male      53703     Broken Arm
          2/28/76    Male      53703     Bronchitis
          4/13/86    Female    53715     Hepatitis
          4/13/86    Female    53706     Sprained Ankle
          2/28/76    Female    53706     Hang Nail

          Sex        Z2          Count
          Male       537**       3
          Female     537**       3

          SELECT COUNT(*) FROM "Hospital Patients" GROUP BY Sex, Z2
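The 2-subset walk above can be sketched as a bottom-up traversal that uses the generalization property: once a node passes, all of its generalizations are marked k-anonymous and never queried. This is a simplified illustration, not the authors' code; it assumes 5-digit zipcodes, S1 written as "*", and hypothetical function names. Nodes of equal height may be visited in a different order than on the slides.

```python
from collections import Counter
from itertools import product

ROWS = [("Male", "53715"), ("Male", "53703"), ("Male", "53703"),
        ("Female", "53715"), ("Female", "53706"), ("Female", "53706")]


def is_k_anonymous(s_level, z_level, k):
    """Group rows by node <S s_level, Z z_level> and compare the smallest count with k."""
    counts = Counter(
        (sex if s_level == 0 else "*", zipcode[:5 - z_level] + "*" * z_level)
        for sex, zipcode in ROWS)
    return min(counts.values()) >= k


def traverse_bottom_up(k, max_s=1, max_z=2):
    """Visit <S, Z> nodes by increasing height. A k-anonymous node marks every
    generalization of itself as k-anonymous, so those nodes are never queried."""
    nodes = sorted(product(range(max_s + 1), range(max_z + 1)), key=sum)
    anonymous, queried = set(), []
    for s, z in nodes:
        if (s, z) in anonymous:
            continue  # implied by the generalization property
        queried.append((s, z))
        if is_k_anonymous(s, z, k):
            anonymous.update((s2, z2) for s2, z2 in nodes if s2 >= s and z2 >= z)
    return queried, anonymous


queried, anonymous = traverse_bottom_up(2)
print(queried)            # [(0, 0), (0, 1), (1, 0), (0, 2)]
print(sorted(anonymous))  # [(0, 2), (1, 0), (1, 1), (1, 2)]
```

Only four of the six nodes are queried; the surviving set <S1, Z0>, <S1, Z1>, <S0, Z2>, <S1, Z2> matches the lattice on the last slide.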



Basic Incognito Example (3/3)
         Search 3-subsets

                         <B1, S1, Z2>
         <B1, S1, Z1>    <B1, S0, Z2>    <B0, S1, Z2>
         <B1, S1, Z0>

         Hospital Patients
         DOB        Sex      Zipcode   Disease
         1/21/76    Male     53715     Flu
         1/21/76    Male     53703     Broken Arm
         2/28/76    Male     53703     Bronchitis
         4/13/86    Female   53715     Hepatitis
         4/13/86    Female   53706     Sprained Ankle
         2/28/76    Female   53706     Hang Nail




Basic Incognito Example (3/3)

   Candidate nodes for 3-subsets (after subset-property pruning):
                    <B1, S1, Z2>
   <B1, S1, Z1>     <B1, S0, Z2>     <B0, S1, Z2>
   <B1, S1, Z0>

                                             VS.

   Full generalization lattice for <B, S, Z>:
                                  <B1, S1, Z2>
                <B1, S1, Z1>      <B1, S0, Z2>      <B0, S1, Z2>
        <B1, S1, Z0>   <B1, S0, Z1>   <B0, S1, Z1>   <B0, S0, Z2>
                <B1, S0, Z0>      <B0, S1, Z0>      <B0, S0, Z1>
                                  <B0, S0, Z0>
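The two pruning ideas of the example can be combined into a compact, simplified sketch of Incognito: subset-property pruning when generating candidates of size n from the k-anonymous nodes of size n-1, and generalization-property marking during a bottom-up check. It recomputes every frequency set from scratch (no rollup, no sorting strategy) and all names are hypothetical, so treat it as an illustration rather than the original algorithm's implementation.

```python
from collections import Counter
from itertools import combinations, product

# (DOB, Sex, Zipcode) projection of Hospital Patients; attributes abbreviated B, S, Z
ROWS = [("1/21/76", "Male", "53715"), ("1/21/76", "Male", "53703"),
        ("2/28/76", "Male", "53703"), ("4/13/86", "Female", "53715"),
        ("4/13/86", "Female", "53706"), ("2/28/76", "Female", "53706")]
ATTRS = ("B", "S", "Z")
MAX_LEVEL = {"B": 1, "S": 1, "Z": 2}


def gen_value(attr, value, level):
    """Zk masks the last k zipcode digits; B1 and S1 suppress the value to '*'."""
    if level == 0:
        return value
    return value[:-level] + "*" * level if attr == "Z" else "*"


def is_k_anonymous(node, k):
    """node is a tuple of (attr, level) pairs, e.g. (("S", 1), ("Z", 0))."""
    counts = Counter(
        tuple(gen_value(a, row[ATTRS.index(a)], lvl) for a, lvl in node)
        for row in ROWS)
    return min(counts.values()) >= k


def is_generalization(other, node):
    """True if `other` has the same attributes as `node` and no lower level for any of them."""
    return ([a for a, _ in other] == [a for a, _ in node]
            and all(lo <= hi for (_, lo), (_, hi) in zip(node, other)))


def incognito(k):
    candidates, anonymous = {}, {}
    for size in range(1, len(ATTRS) + 1):
        cands = []
        for subset in combinations(ATTRS, size):
            for levels in product(*(range(MAX_LEVEL[a] + 1) for a in subset)):
                node = tuple(zip(subset, levels))
                # Subset property: prune if any (size-1)-projection was not k-anonymous
                if size > 1 and any(
                        tuple(p for p in node if p != dropped) not in anonymous[size - 1]
                        for dropped in node):
                    continue
                cands.append(node)
        passed = set()
        # Bottom-up by height; the generalization property marks ancestors without a query
        for node in sorted(cands, key=lambda n: sum(lvl for _, lvl in n)):
            if node not in passed and is_k_anonymous(node, k):
                passed.update(c for c in cands if is_generalization(c, node))
        candidates[size], anonymous[size] = cands, passed
    return candidates, anonymous


candidates, anonymous = incognito(2)
print(len(candidates[3]), "3-attribute candidates out of 12 lattice nodes")
for node in sorted(anonymous[3]):
    print(node)
```

For k = 2 the sketch generates exactly the five 3-subset candidates shown in the pruned lattice, instead of all twelve nodes of the full lattice.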

What Is the Problem?
       Incognito Is a Very Efficient Algorithm
          −   but…
       Checking k-anonymity for each node is still expensive!
          −   SELECT MIN(cnt)
              FROM (SELECT COUNT(*) AS cnt
                    FROM T
                    GROUP BY (QI attr. set)) freq
          −   every such check groups (typically by sorting) the whole table




Bitmap based Incognito

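       Generalization
          −   bitwise OR of the bitmaps of all values that generalize to the same value
       Combination
          −   bitwise AND of the bitmaps being combined
       Checking k-anonymity
          −   bit counting: the number of 1-bits in a bitmap is the count of its group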
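The three bitmap primitives can be illustrated with Python integers used as bitmaps. This is a sketch only: it assumes bit i stands for row i+1 and that six rows fit in one int, which says nothing about how the paper actually stores its bitmap index. `from_bits`, `to_bits`, `generalize`, `combine` and `count` are hypothetical names.

```python
ROWS = 6  # Hospital Patients has six tuples


def from_bits(bits):
    """'111000' (row 1 first, as on the slides) -> int whose bit i marks row i+1."""
    return sum(1 << i for i, b in enumerate(bits) if b == "1")


def to_bits(bitmap, n=ROWS):
    """Inverse of from_bits: render row 1 first."""
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(n))


def generalize(*bitmaps):
    """Generalization: bitwise OR of the bitmaps of the values being merged."""
    result = 0
    for bitmap in bitmaps:
        result |= bitmap
    return result


def combine(a, b):
    """Combination: bitwise AND keeps only rows having both values."""
    return a & b


def count(bitmap):
    """Checking k-anonymity: a group's size is the number of set bits."""
    return bin(bitmap).count("1")


MALE, FEMALE = from_bits("111000"), from_bits("000111")
ZIP_53715 = from_bits("100100")

print(to_bits(generalize(MALE, FEMALE)))  # 111111
print(to_bits(combine(MALE, ZIP_53715)))  # 100000
print(count(combine(MALE, ZIP_53715)))    # 1
```

OR-ing Male and Female yields the single S1 group covering every row, while the AND for <Male, 53715> has only one set bit, so that group alone already violates k = 2.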




Generalize 1-subset (single attr.)
      Zipcode hierarchy:                537**
                                5371*            5370*
                           53715    53710    53706    53703

      Hospital Patients                                      Bitmaps
       DOB       Sex      Zipcode   Disease              53706  OR  53703  =  5370*
       1/21/76   Male     53715     Flu                  0          0         0
       1/21/76   Male     53703     Broken Arm           0          1         1
       2/28/76   Male     53703     Bronchitis           0          1         1
       4/13/86   Female   53715     Hepatitis            0          0         0
       4/13/86   Female   53706     Sprained Ankle       1          0         1
       2/28/76   Female   53706     Hang Nail            1          0         1
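Building one bitmap per distinct value and OR-ing siblings gives the generalized bitmaps without touching the rows again. A sketch under the same assumption as before (Python ints as bitmaps, bit i = row i+1); `value_bitmaps` and `generalize_bitmaps` are hypothetical helpers. A value with no rows, such as 53710, simply has no bitmap and contributes nothing to the OR.

```python
from collections import defaultdict

# Zipcode column of Hospital Patients, rows 1..6 in slide order
ZIPCODE = ["53715", "53703", "53703", "53715", "53706", "53706"]


def value_bitmaps(column):
    """One bitmap (a Python int) per distinct value; bit i is set if row i+1 has that value."""
    bitmaps = defaultdict(int)
    for i, value in enumerate(column):
        bitmaps[value] |= 1 << i
    return dict(bitmaps)


def generalize_bitmaps(bitmaps, parent_of):
    """OR together the bitmaps of all values that map to the same parent value."""
    out = defaultdict(int)
    for value, bitmap in bitmaps.items():
        out[parent_of(value)] |= bitmap
    return dict(out)


def to_bits(bitmap, n=len(ZIPCODE)):
    """Render row 1 first, matching the column layout on the slides."""
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(n))


z0 = value_bitmaps(ZIPCODE)
z1 = generalize_bitmaps(z0, lambda zipcode: zipcode[:4] + "*")
for value in sorted(z1):
    print(value, to_bits(z1[value]))
# 5370* 011011
# 5371* 100100
```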

Combination and Generalization


 Generate 2-subsets (AND)
     < sex >              < zipcode >            < S0, Z0 >
     Male     111000      53703   011000         Male, 53703       011000
     Female   000111      53706   000011         Male, 53706       000000
                          53715   100100         Male, 53715       100000
                                                 Female, 53703     000000
                                                 Female, 53706     000011
                                                 Female, 53715     000100

 Generalize (OR)
     < S1 >    *     = Male OR Female    = 111111
     < Z1 >    5370* = 53703 OR 53706    = 011011
               5371* =                     100100
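Combination and generalization at the 2-subset level, followed by the bit-counting check, can be sketched the same way. Assumptions as before: Python ints as bitmaps (bit i = row i+1), S1 written as "*", and hypothetical helper names. Empty AND results such as <Male, 53706> are dropped, because an all-zero bitmap is not a group in the frequency set and would otherwise make the minimum count 0.

```python
from collections import defaultdict

# Sex and Zipcode columns of Hospital Patients, rows 1..6 in slide order
SEX = ["Male", "Male", "Male", "Female", "Female", "Female"]
ZIPCODE = ["53715", "53703", "53703", "53715", "53706", "53706"]


def value_bitmaps(column):
    """One bitmap per distinct value; bit i is set if row i+1 has that value."""
    bitmaps = defaultdict(int)
    for i, value in enumerate(column):
        bitmaps[value] |= 1 << i
    return dict(bitmaps)


def combine(left, right):
    """Generate 2-subset bitmaps: AND every pair of value bitmaps, dropping empty results."""
    out = {}
    for left_value, left_bits in left.items():
        for right_value, right_bits in right.items():
            both = left_bits & right_bits
            if both:
                out[(left_value, right_value)] = both
    return out


def generalize_first(pair_bitmaps, parent_of):
    """Generalize the first attribute: OR the bitmaps whose first values share a parent."""
    out = defaultdict(int)
    for (first, second), bitmap in pair_bitmaps.items():
        out[(parent_of(first), second)] |= bitmap
    return dict(out)


def is_k_anonymous(bitmaps, k):
    """Each bitmap is one group of the frequency set; its popcount is the group's count."""
    return min(bin(bitmap).count("1") for bitmap in bitmaps.values()) >= k


s0_z0 = combine(value_bitmaps(SEX), value_bitmaps(ZIPCODE))
s1_z0 = generalize_first(s0_z0, lambda sex: "*")
print(is_k_anonymous(s0_z0, 2))  # False: <Male, 53715> covers a single row
print(is_k_anonymous(s1_z0, 2))  # True: every <*, zipcode> bitmap has two bits set
```

No GROUP BY or sort is involved: the <S1, Z0> bitmaps come from OR-ing existing <S0, Z0> bitmaps, and the k-anonymity check is a popcount per bitmap.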




Generate <S0,Z0> using Bitmap
                    <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>
                    <S0, Z0>

          S0                 Z0                         S0 AND Z0
          Male   Female      53703   53706   53715      <Male, 53703>   <Male, 53706>   <Male, 53715>
          1      0           0       0       1          0               0               1
          1      0           1       0       0          1               0               0
          1      0           1       0       0          1               0               0
          0      1           0       0       1          0               0               0
          0      1           0       1       0          0               0               0
          0      1           0       1       0          0               0               0


Generalize 2-subset Bitmaps
                    <S1, Z2>
          <S1, Z1>            <S0, Z2>
          <S1, Z0>            <S0, Z1>
                    <S0, Z0>

          <S0, Z0>
          <Male, 53703>  <Male, 53706>  <Male, 53715>  <Female, 53703>  <Female, 53706>  <Female, 53715>
          0              0              1              0                0                0
          1              0              0              0                0                0
          1              0              0              0                0                0
          0              0              0              0                0                1
          0              0              0              0                1                0
          0              0              0              0                1                0

          <S1, Z0>: OR of the Male and Female bitmaps for each zipcode
          <*, 53703>   <*, 53706>
          0            0
          1            0
          1            0
          0            0
          0            1
          0            1



Generalize 2-subset Bitmaps
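       Generalization lattice (top to bottom): <S1, Z2>; <S1, Z1>, <S0, Z2>; <S1, Z0>, <S0, Z1>; <S0, Z0>
       <S0, Z0> bitmaps (one bit per row of Hospital Patients, rows 1 to 6 read left to right):
          −   <Male, 53703> 011000     <Male, 53706> 000000     <Male, 53715> 100000
          −   <Female, 53703> 000000   <Female, 53706> 000011   <Female, 53715> 000100
       <S1, Z0> bitmaps, generalized with bitwise OR:
          −   <*, 53703> = <Male, 53703> OR <Female, 53703> = 011000
          −   <*, 53706> = <Male, 53706> OR <Female, 53706> = 000011
          −   <*, 53715> = <Male, 53715> OR <Female, 53715> = 100100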
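The OR step above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes each bitmap is a Python int whose bits, read left to right, stand for rows 1 to 6 of Hospital Patients, and it represents a domain generalization by a hypothetical `parent` function (S0 → S1 maps every sex to `*`).

```python
# Minimal sketch (not the authors' code): each bitmap is a Python int whose
# bits, read left to right, stand for rows 1..6 of Hospital Patients.
N_ROWS = 6


def bits(s: str) -> int:
    """Parse a left-to-right bit string such as '011000' into an int bitmap."""
    return int(s, 2)


def to_str(bitmap: int, n: int = N_ROWS) -> str:
    """Render an int bitmap back as an n-character bit string."""
    return format(bitmap, f"0{n}b")


def generalize(bitmaps: dict, attr: int, parent) -> dict:
    """Generalize attribute position `attr` of a k-subset bitmap set.

    `bitmaps` maps value tuples such as ('Male', '53703') to int bitmaps.
    `parent` (hypothetical) maps a value to its generalized value; all
    bitmaps that land on the same generalized tuple are merged with OR.
    """
    result = {}
    for key, bm in bitmaps.items():
        new_key = key[:attr] + (parent(key[attr]),) + key[attr + 1:]
        result[new_key] = result.get(new_key, 0) | bm
    return result


# <S0, Z0> bitmaps from the slides
s0_z0 = {
    ("Male", "53703"): bits("011000"),
    ("Male", "53706"): bits("000000"),
    ("Male", "53715"): bits("100000"),
    ("Female", "53703"): bits("000000"),
    ("Female", "53706"): bits("000011"),
    ("Female", "53715"): bits("000100"),
}

# S0 -> S1 suppresses sex to '*'
s1_z0 = generalize(s0_z0, 0, lambda sex: "*")
for key in sorted(s1_z0):
    print(key, to_str(s1_z0[key]))
```

The same helper performs Z0 → Z1 with `parent=lambda z: z[:4] + "*"`, which ORs the 53703 and 53706 bitmaps into a single 5370* bitmap.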



Check k-anonymity
       Generalization lattice (top to bottom): <S1, Z2>; <S1, Z1>, <S0, Z2>; <S1, Z0>, <S0, Z1>; <S0, Z0>
       <S0, Z0> bitmaps (one bit per row of Hospital Patients, rows 1 to 6 read left to right):
          −   <Male, 53703> 011000     <Male, 53706> 000000     <Male, 53715> 100000
          −   <Female, 53703> 000000   <Female, 53706> 000011   <Female, 53715> 000100
       <S1, Z0> bitmaps and their bit counts:
          −   <*, 53703> 011000 → COUNT 2
          −   <*, 53706> 000011 → COUNT 2
          −   <*, 53715> 100100 → COUNT 2
       ☞ Every count is ≥ 2, so <S1, Z0> satisfies k-anonymity for k = 2
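Checking a node then reduces to counting set bits, the bitmap counterpart of `SELECT MIN(COUNT(*)) ... GROUP BY`. A minimal sketch, assuming bitmaps are Python ints with one bit per row; `is_k_anonymous` is a hypothetical name, and empty bitmaps are skipped on the assumption that they stand for value combinations absent from the table (so they never appear in the frequency set).

```python
# Minimal sketch (not the authors' code): bitmaps are Python ints, one bit per row.


def popcount(bitmap: int) -> int:
    """Count the set bits, i.e. the number of rows in this group."""
    return bin(bitmap).count("1")


def is_k_anonymous(bitmaps: dict, k: int) -> bool:
    """Return True if every non-empty bitmap has at least k set bits.

    Empty bitmaps are value combinations that do not occur in the table,
    so they are not part of the GROUP BY frequency set and are skipped.
    """
    return all(popcount(bm) >= k for bm in bitmaps.values() if bm)


# <S1, Z0> bitmaps from the slide: every count is 2
s1_z0 = {("*", "53703"): 0b011000, ("*", "53706"): 0b000011, ("*", "53715"): 0b100100}

# <S0, Z0> bitmaps: <Male, 53715> and <Female, 53715> each cover one row
s0_z0 = {
    ("Male", "53703"): 0b011000,
    ("Male", "53706"): 0b000000,
    ("Male", "53715"): 0b100000,
    ("Female", "53703"): 0b000000,
    ("Female", "53706"): 0b000011,
    ("Female", "53715"): 0b000100,
}

print(is_k_anonymous(s1_z0, 2), is_k_anonymous(s0_z0, 2))
```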

Optimization Techniques
       ▪ 1-Level Optimization
          −   Keep only the 1-subset bitmaps and generate k-subset bitmaps from them
       ▪ Reusing Optimization
          −   Reuse intermediate (k-1)-subset bitmaps to generate k-subset bitmaps
       ▪ Pruning Optimization
          −   Stop counting as soon as one bitmap's count falls below k
          −   Then check a more generalized node
       ▪ Single Instruction Multiple Data
          −   Parallelize bitwise AND/OR operations using SIMD instructions


1-level Optimization
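       Generalization hierarchies (bottom to top): a0 → a1 → a2,   g0 → g1 → g2,   e0 → e1 → e2 → e3
       Only the 1-subset bitmaps are stored; a k-subset bitmap is generated from them with bitwise AND:

                                <a2, g2, e1> = a2 ∧ g2 ∧ e1

                        Reduces memory and disk space for bitmaps!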
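The 1-level idea (store only 1-subset bitmaps and AND them on demand) can be sketched as follows. The slide's attributes a, g and e have no concrete values, so this hypothetical `combine` helper is shown with the Sex (S0) and Zipcode (Z1) bitmaps from the Hospital Patients example (Male 111000, Female 000111, 5370* 011011, 5371* 100100); bitmaps are assumed to be Python ints.

```python
from itertools import product


def combine(*one_subsets: dict) -> dict:
    """Generate a k-subset bitmap set from k stored 1-subset bitmap sets.

    Each argument maps one attribute value (at its chosen generalization
    level) to an int bitmap. The bitmaps of every value combination are
    ANDed together; empty results are dropped because those combinations
    do not occur in the table.
    """
    result = {}
    for combo in product(*(d.items() for d in one_subsets)):
        values = tuple(value for value, _ in combo)
        bm = combo[0][1]
        for _, other in combo[1:]:
            bm &= other
        if bm:
            result[values] = bm
    return result


# Stored 1-subset bitmaps (rows 1..6 of Hospital Patients, left to right)
sex_s0 = {"Male": 0b111000, "Female": 0b000111}
zip_z1 = {"5370*": 0b011011, "5371*": 0b100100}

# <S0, Z1> is built on demand instead of being stored
for values, bm in combine(sex_s0, zip_z1).items():
    print(values, format(bm, "06b"))
```

Under this scheme a 3-subset such as <a2, g2, e1> is simply `combine(a2, g2, e1)`, trading extra AND work for not storing any multi-attribute bitmaps.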




Reusing Optimization
       ▪ To generate <a2, g2, e1>, any of these is equivalent:
          −   a2 ∧ g2 ∧ e1
          −   <a2, g2> ∧ e1
          −   <a2, e1> ∧ g2
          −   <g2, e1> ∧ a2
       ▪ The 2-subset bitmaps were already created in the previous step, so reusing one of them saves one AND per value combination
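A sketch of the reusing optimization, assuming a simple in-memory dictionary as the store of 2-subset bitmaps from the previous step. All names (`two_subset_cache`, `two_subset`, `and_combine`, `as_subset`) and the toy bitmaps for a2, g2 and e1 are hypothetical; the point is that <a2, g2, e1> is computed as <a2, g2> ∧ e1 and gives the same result as a2 ∧ g2 ∧ e1.

```python
from itertools import product


def as_subset(one: dict) -> dict:
    """Turn a 1-subset {value: bitmap} into {(value,): bitmap}."""
    return {(v,): bm for v, bm in one.items()}


def and_combine(left: dict, right: dict) -> dict:
    """AND every bitmap of `left` with every bitmap of `right`.

    Keys are value tuples and are concatenated; empty results are dropped.
    """
    out = {}
    for (lk, lb), (rk, rb) in product(left.items(), right.items()):
        bm = lb & rb
        if bm:
            out[lk + rk] = bm
    return out


# Hypothetical store of 2-subset bitmaps kept from the previous lattice level
two_subset_cache = {}


def two_subset(name_a: str, a: dict, name_b: str, b: dict) -> dict:
    """Return the 2-subset bitmaps for (name_a, name_b), computing them once."""
    key = (name_a, name_b)
    if key not in two_subset_cache:
        two_subset_cache[key] = and_combine(as_subset(a), as_subset(b))
    return two_subset_cache[key]


# Toy 1-subset bitmaps for attributes a, g, e (hypothetical values, 6 rows)
a2 = {"a_x": 0b110011, "a_y": 0b001100}
g2 = {"g_x": 0b101010, "g_y": 0b010101}
e1 = {"e_x": 0b111000, "e_y": 0b000111}

# Step k = 2 created <a2, g2>; step k = 3 reuses it: <a2, g2> AND e1
a2_g2 = two_subset("a2", a2, "g2", g2)
a2_g2_e1 = and_combine(a2_g2, as_subset(e1))
print(a2_g2_e1)
```

The cache trades memory for time: every 2-subset kept from the previous level must be stored, which is the opposite choice from the 1-level optimization.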




Pruning Optimization




               A bitmap count of 1 (< k) means the node does not satisfy k-anonymity,
         so counting stops and the remaining bitmaps <Male, 53710>, … , <Female, 53715> are skipped
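The pruning rule can be sketched as an early exit from the counting loop. The bitmaps in the slide's figure are not recoverable, so this hypothetical `check_with_pruning` uses the <S0, Z0> bitmaps from the Hospital Patients example with k = 2, and it also reports how many bitmaps were counted so the saving is visible. Empty bitmaps are assumed to be absent value combinations and are not treated as failures.

```python
def popcount(bitmap: int) -> int:
    """Number of rows covered by this bitmap (bitmaps are Python ints)."""
    return bin(bitmap).count("1")


def check_with_pruning(bitmaps: list, k: int):
    """Check one lattice node, stopping at the first failing bitmap.

    Returns (satisfied, counted): whether the node is k-anonymous and how
    many bitmaps were actually counted before deciding. Empty bitmaps
    (count 0) are combinations absent from the table and do not fail.
    """
    counted = 0
    for bm in bitmaps:
        counted += 1
        c = popcount(bm)
        if 0 < c < k:
            return False, counted  # prune: remaining bitmaps are never counted
    return True, counted


# <S0, Z0> bitmaps in slide order:
# <Male, 53703>, <Male, 53706>, <Male, 53715>, <Female, 53703>, <Female, 53706>, <Female, 53715>
s0_z0 = [0b011000, 0b000000, 0b100000, 0b000000, 0b000011, 0b000100]
print(check_with_pruning(s0_z0, 2))
```

On this input the check stops after the third bitmap, <Male, 53715>, whose count is 1; the caller would then move on to a more generalized node such as <S1, Z0>.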

Single Instruction Multiple Data
       ▪ Using SIMD instructions
          −   Bitwise AND/OR and bit-counting operations can be parallelized
       ▪ Our implementation uses
          −   Intel Pentium 4 Streaming SIMD Extensions (SSE) technology
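The source does not show the SSE code. As a stand-in that illustrates bit-parallel counting, the sketch below uses the classic SWAR ("SIMD within a register") popcount on 64-bit words. It is a swapped-in technique, not the authors' SSE implementation, and Python ints are not vector registers; bitmaps are assumed to be stored as lists of 64-bit words, so AND/OR also proceed one word (64 rows) at a time.

```python
# SWAR popcount: a software analogue of bit-parallel counting,
# NOT the authors' SSE code.
MASK64 = (1 << 64) - 1
M1 = 0x5555555555555555  # 01 01 01 ...
M2 = 0x3333333333333333  # 0011 0011 ...
M4 = 0x0F0F0F0F0F0F0F0F  # 00001111 ...
H01 = 0x0101010101010101


def popcount64(x: int) -> int:
    """Count set bits of a 64-bit word with bit-parallel lane arithmetic."""
    x &= MASK64
    x = x - ((x >> 1) & M1)            # each 2-bit lane holds its count (0..2)
    x = (x & M2) + ((x >> 2) & M2)     # each 4-bit lane holds its count (0..4)
    x = (x + (x >> 4)) & M4            # each byte holds its count (0..8)
    return ((x * H01) & MASK64) >> 56  # top byte = sum of all byte counts


def and_count(a: list, b: list) -> int:
    """AND two bitmaps stored as equal-length lists of 64-bit words, then count."""
    return sum(popcount64(x & y) for x, y in zip(a, b))


def or_words(a: list, b: list) -> list:
    """OR two word-list bitmaps (the generalization step), word by word."""
    return [x | y for x, y in zip(a, b)]


print(popcount64(0xDEADBEEFCAFEBABE), and_count([MASK64, 0b1111], [0x00FF, 0b0101]))
```

With SSE2 the same AND/OR would cover 128 bits per instruction (for example via the `_mm_and_si128` and `_mm_or_si128` intrinsics); the counting step still has to be done in software on a Pentium 4, which has no dedicated popcount instruction.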




Performance Evaluation
       ▪ Dataset
          −   Small (5 MB) and big (60 MB) census data
          −   QI attribute set (four columns)
                  ◦ Generalization levels: 3, 3, 2, and 4, respectively
          −   Index size: 2 MB (40%) and 16 MB (27%) of the data size
          −   Bitmap size: 200 KB (4%) and 2 MB (3%) of the data size
       ▪ Environment
          −   Pentium 4, 2.0 GHz
          −   1 GB memory, 7200 rpm hard disk
          −   Oracle 10g R1 & Intel C++ Compiler 9.0


Performance Evaluation
       [Chart: Small Data. Y axis 0 to 25; X axis 4000, 2000, 1000, 500, 100; series: 1-Level, Pruning, Reusing, Traditional]

Performance Evaluation
       [Chart: Small Data (zoom in). Y axis 0 to 1.4; X axis 4000, 2000, 1000, 500, 100; series: 1-Level, Pruning, Reusing]

Performance Evaluation
       [Chart: Big Data. Y axis 0 to 1400; X axis 4000, 2000, 1000, 500, 100; series: 1-Level, Pruning, Reusing, Traditional]

Performance Evaluation
       [Chart: Big Data (zoom in). Y axis 0 to 4; X axis 4000, 2000, 1000, 500, 100; series: 1-Level, Pruning, Reusing]

Conclusion
       ▪ Incognito is a very innovative k-anonymity algorithm
          −   Still inefficient at checking k-anonymity for each node
          −   Requires an expensive external sort or hash for counting (e.g. GROUP BY)
       ▪ Using bitmaps (bitwise AND/OR)
          −   Opens additional optimization opportunities
                  ◦ Reusing optimization
                  ◦ Pruning optimization
                  ◦ Single Instruction Multiple Data
          −   Space/time trade-off
                  ◦ 1-level optimization (less space) vs. reusing optimization (less time)


Implementation of Bitmap based Incognito and Performance Evaluation          49

More Related Content

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Featured

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Implementation of Bitmap based Incognito and Performance Evaluation

  • 1. Implementation of Bitmap based Incognito and Performance Evaluation Hyunho Kang, Jaemyung Kim, Gapjoo Na, and Sangwon Lee Sungkyunkwan University
  • 2. Table of Contents  Introduction  Existing Solutions − Binary Search − Incognito  Bitmap based Incognito  Optimization Techniques  Performance Evaluation  Conclusion Implementation of Bitmap based Incognito and Performance Evaluation 2
  • 3. Introduction  Privacy Problem and Solution (Sweeney) − Released microdata → Join attack (Re-identification) − Solution: k-anonymization  K-anonymization Algorithm − Full-domain binary search − Incognito: one of the most efficient algorithm (Kristen)  Problem of Existing Incognito Algorithm − Require many repeating sorts against large volume data − Solution: using bitmap index structure  Completely eliminate the expensive sort Implementation of Bitmap based Incognito and Performance Evaluation 3
  • 4. Joining Attack  Example - Joining Attack Voter Registration List Hospital Patients Name DOB Sex Zipcode DOB Sex Zipcode Disease Andre Andre 1/21/76 AndreMale 1/21/76 Male 53715 1/21/76 53715 Male 1/21/76 53715 1/21/76 Male Male Flu 53715 53715 Flu Flu Beth 1/10/81 Female 55410 1/21/76 Male 53703 Broken Arm Carol 10/1/44 Female 90210 2/28/76 Male 53703 Bronchitis Dan 2/21/84 Male 02174 4/13/86 Female 53715 Hepatitis Ellen 4/19/72 Female 02237 4/13/86 Female 53706 Sprained Ankle 2/28/86 Female 53706 Hang Nail Name DOB Sex Zipcode Disease Implementation of Bitmap based Incognito and Performance Evaluation 4
  • 5. Joining Attack Voter Registration List Hospital Patients Name DOB Sex Zipcode DOB Sex Zipcode Disease Andre 1/21/76 Male 53715 1/21/76 Male 537** Flu Andre 1/21/76AndreMale 1/21/76 53715 Male 1/21/76 537** Male Flu OR 537** Flu Beth 1/10/81 Female 55410 1/21/76 Male 537** Broken Broken Arm 1/21/76 Male 537** Broken Carol 10/1/44 Female 90210 2/28/76 Male Arm537** Bronchitis Dan 2/21/84 Male 02174 4/13/86 Female 537** Hepatitis Ellen 4/19/72 Female 02237 4/13/86 Female 537** Sprained Ankle 2/28/86 Female 537** Hang Nail Name DOB Sex Zipcode Disease Implementation of Bitmap based Incognito and Performance Evaluation 5
  • 6. Basic Definitions (1/3)  Quasi-Identifier Attribute Set (Q) − minimal set of attributes in table T that can be joined with external information to re-identify individual records − e.g. {Birthdate, Sex, Zipcode}  Frequency Set − a mapping from each unique combination of values of Q in T to the total number of tuples in T with these values of Q (the counts) Implementation of Bitmap based Incognito and Performance Evaluation 6
  • 7. Basic Definitions (2/3)  K-anonymity (K-anonymous) − To satisfy the k-anonymity property(or k-anonymous) with respect to attribute set Q if every count in the frequency set of T with respect to Q is greater than or equal to k. − In SQL, table T is k-anonymous if each SELECT MIN(COUNT(*)) FROM T GROUP BY (Subset of Quasi-Identifier) is ≥ k − e.g. SELECT MIN(COUNT(*)) FROM “Hospital Patients” GROUP BY DOB, Sex, Zipcode Implementation of Bitmap based Incognito and Performance Evaluation 7
  • 8. Basic Definitions (3/3)  Generalization − is defined by function (user-defined function) − Notation <D : Di <D Dj: Dj is generalization of Di Implementation of Bitmap based Incognito and Performance Evaluation 8
  • 9. Example of Generalization (1/3)  Domain and Value Generalization 5371* = f(53715) Z2 537** 537** = f(5371*) Z1 5371* 5370* Zipcode(Z0) 53715 53710 53706 53703 B1 S1 Person * Birth(B0) 1/21/76 2/28/76 4/13/86 Sex(S0) Male Female Implementation of Bitmap based Incognito and Performance Evaluation 9
  • 10. Example of Generalization (2/3)  Generalization Lattice for Two Attributes <B1, S1> <B1,S0> <B0, S1> <S1, Z2> Sex Zipcode <B0, S0> <S1, Z1> <S0, Z2> Male 537** Female 537** <B1, Z2> <S1, Z0> <S0, Z1> Sex Zipcode <B1, Z1> <B0, Z2> <S0, Z0> Male 5370* Male 5371* <B1, Z0> <B0, Z1> Female 5370* Female 5371* <B0, Z0> Implementation of Bitmap based Incognito and Performance Evaluation 10
  • 11. Table of Contents  Introduction  Existing Solutions − Binary Search − Incognito  Bitmap based Incognito  Optimization Techniques  Performance Evaluation  Conclusion Implementation of Bitmap based Incognito and Performance Evaluation 11
  • 12. Full-Domain Generalization Algorithm  Binary Search of the lattice finds solution of minimum height - if no generalization of height h satisfies k-anonymity, then no generalization of height h’ < h will satisfy k-anonymity. <S1, Z2> h : maximum height in the generalization lattice 1) Check generalization at height └h/2┘ <S0, Z2> 2) If this height satisfies k-anonymity <S1, Z1> 2-1) check generalization at height └h/4┘ 3) Else <S1, Z0> <S0, Z1> 3-1) check generalization at height └3h/4┘ 4) And so on… <S0, Z0>  This algorithm is proven to find a single minimal full- domain k-anonymization Implementation of Bitmap based Incognito and Performance Evaluation 12
  • 13. Key Properties of Incognito  Generalization Property: <Z0> →<Z1>  Rollup Property  Subset Property: <S1,Z0,D1> → <S1,Z0>, <S1,D1>, <Z0,D1> Hospital Patients Hospital Patients B0 S0 Z0 D0 B0 S0 Z1 D0 1/21/76 Male 53715 Flu 1/21/76 Male 5371* Flu 1/21/76 Male 53703 Broken Arm 1/21/76 Male 5370* Broken Arm 2/28/76 Male 53703 Bronchitis 2/28/76 Male 5370* Bronchitis 4/13/86 Female 53715 Hepatitis 4/13/86 Female 5371* Hepatitis 4/13/86 Female 53706 Sprained Ankle 4/13/86 Female 5370* Sprained Ankle 2/28/86 Female 53706 Hang Nail 2/28/86 Female 5370* Hang Nail Implementation of Bitmap based Incognito and Performance Evaluation 13
  • 14. Basic Incognito Example (1/3)  Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2  Search 1-subsets Hospital Patients DOB Sex Zipcode Disease 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm B1 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis B0 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail DOB Count 1/21/76 2 4/13/86 2 SELECT 2/28/76 2 COUNT(*) GROUP BY DOB Implementation of Bitmap based Incognito and Performance Evaluation 14
  • 15. Basic Incognito Example (1/3)  Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2  Search 1-subsets Hospital Patients DOB Sex Zipcode Disease 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm S1 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis S0 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail Sex Count Male 3 Female 3 SELECT COUNT(*) GROUP BY Sex Implementation of Bitmap based Incognito and Performance Evaluation 15
  • 16. Basic Incognito Example (1/3)  Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2  Search 1-subsets Hospital Patients DOB Sex Zipcode Disease 1/21/76 Male 53715 Flu Z2 1/21/76 Male 53703 Broken Arm Z1 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis Z0 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail Zipcode Count 53715 2 53703 2 SELECT COUNT(*) 53706 2 GROUP BY Zipcode Implementation of Bitmap based Incognito and Performance Evaluation 16
  • 17. Basic Incognito Example (2/3)  Search all 2-subsets Hospital Patients DOB Sex Zipcode Disease <S1, Z2> 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm <S1, Z1> <S0, Z2> 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis <S1, Z0> <S0, Z1> 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail <S0, Z0> Sex Zipcode Count SELECT Male 53715 1 COUNT(*) Female 53715 1 GROUP BY Male 53703 2 Sex, Zipcode Female 53706 2 Implementation of Bitmap based Incognito and Performance Evaluation 17
  • 18. Basic Incognito Example (2/3)  Search all 2-subsets Hospital Patients DOB Sex Zipcode Disease <S1, Z2> 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm <S1, Z1> <S0, Z2> 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis <S1, Z0> <S0, Z1> 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail SELECT S1 Zipcode Count COUNT(*) * 53715 2 GROUP BY * 53703 2 S1, Zipcode * 53706 2 Implementation of Bitmap based Incognito and Performance Evaluation 18
  • 19. Basic Incognito Example (2/3)  Search all 2-subsets Hospital Patients DOB Sex Zipcode Disease <S1, Z2> 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm <S1, Z1> <S0, Z2> 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis <S1, Z0> <S0, Z1> 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail Sex Z1 Count SELECT Male 5371* 1 COUNT(*) Female 5371* 1 GROUP BY Male 5370* 2 Sex, Z1 Female 5370* 2 Implementation of Bitmap based Incognito and Performance Evaluation 19
  • 20. Basic Incognito Example (2/3)  Search all 2-subsets Hospital Patients DOB Sex Zipcode Disease <S1, Z2> 1/21/76 Male 53715 Flu 1/21/76 Male 53703 Broken Arm <S1, Z1> <S0, Z2> 2/28/76 Male 53703 Bronchitis 4/13/86 Female 53715 Hepatitis <S1, Z0> 4/13/86 Female 53706 Sprained Ankle 2/28/76 Female 53706 Hang Nail SELECT Sex Z2 Count COUNT(*) Male 537** 3 GROUP BY Female 537** 3 Sex, Z2 Implementation of Bitmap based Incognito and Performance Evaluation 20
  • 21. Basic Incognito Example (3/3)  Search 3-subsets Hospital Patients DOB Sex Zipcode Disease 1/21/76 Male 53715 Flu <B1, S1, Z2> 1/21/76 Male 53703 Broken Arm 2/28/76 Male 53703 Bronchitis <B1, S1, Z1> <B1, S0, Z2> <B0, S1, Z2> 4/13/86 Female 53715 Hepatitis 4/13/86 Female 53706 Sprained Ankle <B1, S1, Z0> 2/28/76 Female 53706 Hang Nail Implementation of Bitmap based Incognito and Performance Evaluation 21
  • 22. Basic Incognito Example (3/3) <B1, S1, Z2> <B1, S1, Z1> <B1, S0, Z2> <B0, S1, Z2> <B1, S1, Z0> VS. <B1, S1, Z2> <B1, S1, Z1> <B1, S0, Z2> <B0, S1, Z2> <B1, S1, Z0> <B1, S0, Z1> <B0, S1, Z1> <B0, S0, Z2> <B1, S0, Z0> <B0, S1, Z0> <B0, S0, Z1> <B0, S0, Z0> Implementation of Bitmap based Incognito and Performance Evaluation 22
  • 23. Table of Contents  Introduction  Existing Solutions − Binary Search − Incognito  Bitmap based Incognito  Optimization Techniques  Performance Evaluation  Conclusion Implementation of Bitmap based Incognito and Performance Evaluation 23
  • 24. What Is the Problem?  Incognito Is Very Nice Algorithm − but…  Checking k-anonymity for each node is still expensive! − SELECT MIN(COUNT(*)) FROM T GROUP BY (QI Attr. Set) Implementation of Bitmap based Incognito and Performance Evaluation 24
  • 25. Bitmap based Incognito  Generalization − bitwise OR operation  Combination − bitwise AND operation  Checking k-anonymity − bit-counting operation Implementation of Bitmap based Incognito and Performance Evaluation 25
  • 26. Generalize 1-subset (single attr.) Hospital Patients DOB Sex Zipcode Disease 1/21/76 Male 53715 Flu ???? 0 1/21/76 Male 53703 Broken Arm ???? 1 2/28/76 Male 53703 Bronchitis 5370* 1 4/13/86 Female 53715 Hepatitis 5370* = 0 5370* 1 4/13/86 Female 53706 Sprained Ankle 5370* 1 2/28/76 Female 53706 Hang Nail OR 0 0 537** 0 1 0 1 0 0 1 0 5371* 5370* 1 0 53715 53710 53706 53703 Implementation of Bitmap based Incognito and Performance Evaluation 26
  • 27. Combination and Generalization Male, 53703 001100 && Male 111000 || * 111111 Male, 53706 000000 Female 000111 Male, 53715 100000 < S1 > < sex > Female, 53703 000000 53703 011000 && || 5370* 011011 Female, 53706 000011 53706 000011 Female, 53715 000100 53715 100100 5371* 100100 < S0, Z0 > < zipcode > < Z1 > Generate 2-subsets Generalize Implementation of Bitmap based Incognito and Performance Evaluation 27
  • 28. Generate <S0,Z0> using Bitmap <S1, Z2> S0 Male Female <Male, 53703> <S1, Z1> <S0, Z2> 1 0 0 1 0 1 1 0 1 <S1, Z0> <S0, Z1> 0 1 0 0 1 0 0 1 0 <S0, Z0> AND Z0 53703 53706 53715 0 0 1 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 Implementation of Bitmap based Incognito and Performance Evaluation 28
  • 29. Generate <S0,Z0> using Bitmap <S1, Z2> S0 Male Female <Male, 53706> <S1, Z1> <S0, Z2> 1 0 0 1 0 0 1 0 0 <S1, Z0> <S0, Z1> 0 1 0 0 1 0 0 1 0 <S0, Z0> AND Z0 53703 53706 53715 0 0 1 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 Implementation of Bitmap based Incognito and Performance Evaluation 29
  • 30. Generate <S0,Z0> using Bitmap
    <Male, 53715> = Male AND 53715 = 111000 AND 100100 = 100000
  • 31. Generate <S0,Z0> using Bitmap
    All Male bitmaps of <S0, Z0>: <Male, 53703> 011000, <Male, 53706> 000000, <Male, 53715> 100000
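To make the lattice on these slides concrete, here is a small sketch (hypothetical helper names) that enumerates the <Sex, Zipcode> nodes from the hierarchy heights Sex: S0 to S1 and Zipcode: Z0 to Z2, lowest height first, together with each node's direct generalizations.

```python
from itertools import product

def lattice(max_levels):
    """All nodes (one generalization level per attribute), lowest height first."""
    nodes = product(*(range(m + 1) for m in max_levels))
    return sorted(nodes, key=lambda node: (sum(node), node))

def direct_generalizations(node, max_levels):
    """Nodes reached by raising exactly one attribute by one level."""
    result = []
    for i, level in enumerate(node):
        if level < max_levels[i]:
            result.append(node[:i] + (level + 1,) + node[i + 1:])
    return result

MAX_LEVELS = (1, 2)   # Sex: S0..S1, Zipcode: Z0..Z2
for node in lattice(MAX_LEVELS):
    ups = ", ".join(f"<S{a},Z{b}>" for a, b in direct_generalizations(node, MAX_LEVELS))
    print(f"<S{node[0]},Z{node[1]}> -> {ups or '(top)'}")
```

Each edge raises exactly one attribute by one level, which is why <S0, Z0> can be generalized either to <S1, Z0> (OR over sex values) or to <S0, Z1> (OR over zipcode values).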
  • 32. Generalize 2-subset Bitmaps
    <S0, Z0> Female bitmaps: <Female, 53703> 000000, <Female, 53706> 000011, <Female, 53715> 000100
    <S0, Z0> to <S1, Z0>: <*, 53703> = <Male, 53703> OR <Female, 53703> = 011000 OR 000000 = 011000
  • 33. Generalize 2-subset Bitmaps
    <*, 53706> = <Male, 53706> OR <Female, 53706> = 000000 OR 000011 = 000011
  • 34. Generalize 2-subset Bitmaps
    <*, 53715> = <Male, 53715> OR <Female, 53715> = 100000 OR 000100 = 100100
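Generalizing a 2-subset node is again just OR over the bitmaps that collapse into the same generalized key. A minimal sketch, assuming the <S0, Z0> bitmaps from the combination step are given in slide notation (row 1 leftmost); `from_str`, `to_str` and `generalize_node` are hypothetical helpers.

```python
def from_str(bits):
    """Parse a slide-style bitmap (row 1 leftmost) into an int with row 1 at bit 0."""
    return sum(1 << i for i, c in enumerate(bits) if c == "1")

def to_str(bitmap, n=6):
    """Render a bitmap with row 1 leftmost, as on the slides."""
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(n))

s0z0 = {
    ("Male", "53703"): from_str("011000"),
    ("Male", "53706"): from_str("000000"),
    ("Male", "53715"): from_str("100000"),
    ("Female", "53703"): from_str("000000"),
    ("Female", "53706"): from_str("000011"),
    ("Female", "53715"): from_str("000100"),
}

def generalize_node(bitmaps, parents):
    """OR together bitmaps whose keys map to the same generalized key;
    `parents` holds one value-mapping function per attribute."""
    out = {}
    for key, bitmap in bitmaps.items():
        new_key = tuple(f(v) for f, v in zip(parents, key))
        out[new_key] = out.get(new_key, 0) | bitmap
    return out

# <S0, Z0> -> <S1, Z0>: Sex becomes "*", Zipcode is kept as is.
s1z0 = generalize_node(s0z0, (lambda s: "*", lambda z: z))
for key in sorted(s1z0):
    print(key, to_str(s1z0[key]))
# ('*', '53703') 011000
# ('*', '53706') 000011
# ('*', '53715') 100100
```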
  • 35. Check k-anonymity
    COUNT(<*, 53703>) = 2, COUNT(<*, 53706>) = 2, COUNT(<*, 53715>) = 2
    ☞ Satisfies k-anonymity for k = 2
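The check itself reduces to bit counting. A minimal sketch, assuming that all-zero bitmaps (value combinations that never occur, which GROUP BY would not produce either) are ignored; helper names are hypothetical.

```python
def from_str(bits):
    """Parse a slide-style bitmap (row 1 leftmost) into an int with row 1 at bit 0."""
    return sum(1 << i for i, c in enumerate(bits) if c == "1")

def popcount(bitmap):
    return bin(bitmap).count("1")   # int.bit_count() does the same on Python 3.10+

def satisfies_k(bitmaps, k):
    """A node is k-anonymous when every non-empty bitmap has at least k set bits.
    Assumes at least one non-empty bitmap (i.e. a non-empty table)."""
    return min(popcount(bm) for bm in bitmaps if bm) >= k

s1z0 = [from_str(b) for b in ("011000", "000011", "100100")]   # <*, 53703/53706/53715>
s0z0 = [from_str(b) for b in ("011000", "000000", "100000",
                              "000000", "000011", "000100")]
print(satisfies_k(s1z0, 2))   # True: every count is 2
print(satisfies_k(s0z0, 2))   # False: <Male, 53715> and <Female, 53715> have count 1
```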
  • 37. Optimization Techniques
    - 1-Level Optimization
      − Keep only 1-subset bitmaps for generating k-subset bitmaps
    - Reusing Optimization
      − Reuse intermediate (k-?)-subset bitmaps for generating k-subset bitmaps
    - Pruning Optimization
      − Stop the counting operation if a specific bitmap does not satisfy 'k'
      − Then check a more generalized node
    - Single Instruction Multiple Data
      − Parallelize bitwise AND/OR operations using SIMD instructions
  • 38. 1-level Optimization
    Generalization hierarchies: a0 ↑ a1 ↑ a2; g0 ↑ g1 ↑ g2; e0 ↑ e1 ↑ e2 ↑ e3
    <a2, g2, e1> = a2 ∧ g2 ∧ e1
    Reduces memory and disk space for bitmaps!
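Under the 1-level optimization only the 1-subset bitmaps are stored, and a node such as <a2, g2, e1> is built on demand by ANDing one bitmap per attribute for every value combination. The attributes a, g, e, their values and the four-row data below are hypothetical, chosen only to make the sketch runnable; bit 0 is row 1.

```python
from functools import reduce
from itertools import product
from operator import and_

# Hypothetical stored 1-subset bitmaps, already at the levels the node needs.
a2 = {"a-low": 0b0011, "a-high": 0b1100}
g2 = {"*": 0b1111}
e1 = {"e-x": 0b0101, "e-y": 0b1010}

def node_bitmaps(one_subset_bitmaps):
    """Build a node's bitmaps by ANDing one stored 1-subset bitmap per attribute
    for every combination of values; empty combinations are dropped."""
    out = {}
    for combo in product(*(d.items() for d in one_subset_bitmaps)):
        key = tuple(value for value, _ in combo)
        bitmap = reduce(and_, (bm for _, bm in combo))
        if bitmap:
            out[key] = bitmap
    return out

node = node_bitmaps([a2, g2, e1])   # <a2, g2, e1> = a2 ∧ g2 ∧ e1
for key, bitmap in node.items():
    print(key, format(bitmap, "04b"))   # printed with row 1 as the rightmost digit
```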
  • 39. Reusing Optimization
    - To generate <a2, g2, e1>, any of the following can be used:
      − a2 ∧ g2 ∧ e1
      − <a2, g2> ∧ e1
      − <a2, e1> ∧ g2
      − <g2, e1> ∧ a2
    - The 2-subset bitmaps were already created in the previous step
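The sketch below illustrates the reuse idea with a cache of 2-subset bitmaps from the previous step: building <a2, g2, e1> from the cached <a2, g2> needs only the final AND with e1. The data, the cache layout and the helper names are hypothetical.

```python
from itertools import product

def combine(left, right):
    """AND every bitmap in `left` with every bitmap in `right`, concatenating
    their value tuples; empty results are dropped."""
    out = {}
    for (lk, lb), (rk, rb) in product(left.items(), right.items()):
        if lb & rb:
            out[lk + rk] = lb & rb
    return out

# Hypothetical 1-subset bitmaps over four rows (bit 0 is row 1).
a2 = {("a-low",): 0b0011, ("a-high",): 0b1100}
g2 = {("*",): 0b1111}
e1 = {("e-x",): 0b0101, ("e-y",): 0b1010}

# Previous step: 2-subset bitmaps, kept in a cache keyed by node.
cache = {("a2", "g2"): combine(a2, g2)}

from_scratch = combine(combine(a2, g2), e1)   # a2 ∧ g2 ∧ e1
reused = combine(cache[("a2", "g2")], e1)     # <a2, g2> ∧ e1: only the last AND is new
print(from_scratch == reused)                 # True
```

The saving is paid for in memory for the cached bitmaps, which matches the space/time trade-off the conclusion lists for the 1-level and Reusing optimizations.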
  • 40. Pruning Optimization
    - A bitmap count of 1 does not satisfy k: the remaining counts can be skipped and the node generalized
    - Bitmaps: <Male, 53710>, … , <Female, 53715>
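A minimal sketch of the early exit, assuming the bitmaps of a node are counted one after another and the first non-empty bitmap with fewer than k rows ends the check. It reuses the <S0, Z0> bitmaps from the running example; helper names are hypothetical.

```python
def from_str(bits):
    """Parse a slide-style bitmap (row 1 leftmost) into an int with row 1 at bit 0."""
    return sum(1 << i for i, c in enumerate(bits) if c == "1")

def check_with_pruning(bitmaps, k):
    """Return (satisfies_k, bitmaps_examined). Counting stops at the first
    non-empty bitmap with fewer than k set bits; the caller would then move
    on to a more generalized node instead of finishing this one."""
    examined = 0
    for bitmap in bitmaps:
        examined += 1
        if 0 < bin(bitmap).count("1") < k:
            return False, examined
    return True, examined

# <S0, Z0> bitmaps in the order <Male, 53703>, <Male, 53706>, <Male, 53715>, ...
s0z0 = [from_str(b) for b in ("011000", "000000", "100000",
                              "000000", "000011", "000100")]
print(check_with_pruning(s0z0, 2))   # (False, 3): stops at <Male, 53715>
```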
  • 41. Single Instruction Multiple Data
    - Using SIMD instructions, bitwise AND/OR and bit-counting operations can be parallelized
    - We implemented this using Intel Pentium 4 Streaming SIMD Extensions (SSE) technology
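Python cannot issue SSE instructions, so as a stand-in the sketch below uses SWAR ("SIMD within a register") bit counting: a different, purely software technique that likewise processes many bits per arithmetic step. It is not the authors' SSE code; storing bitmaps as lists of 64-bit words and the helper names are assumptions.

```python
MASK64 = (1 << 64) - 1

def popcount64(x):
    """SWAR bit count of one 64-bit word: sums bits in 2-, 4-, then 8-bit lanes,
    then adds the eight byte counts with one multiply."""
    x = x - ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    return ((x * 0x0101010101010101) & MASK64) >> 56

def and_count(a_words, b_words):
    """AND two bitmaps stored as lists of 64-bit words and count the result,
    handling 64 rows per operation instead of one."""
    return sum(popcount64(a & b) for a, b in zip(a_words, b_words))

print(and_count([MASK64, 0b1111], [0xFF, 0b0101]))   # 10
```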
  • 43. Performance Evaluation
    - Dataset
      − Small (5MB) and big (60MB) census data
      − QI attribute set (four columns); generalization levels: 3, 3, 2, 4 respectively
      − Index size: 2MB (40%) and 16MB (27%)
      − Bitmap size: 200KB (4%) and 2MB (3%)
    - Environment
      − Pentium IV 2.0 GHz
      − 1GB memory, 7200rpm hard disk
      − Oracle 10g R1 & Intel C++ Compiler 9.0
  • 44. Performance Evaluation: Small Data
    [Chart: x-axis 4000, 2000, 1000, 500, 100; y-axis scale 0 to 25 (unlabeled); series: 1-Level, Pruning, Reusing, Traditional]
  • 45. Performance Evaluation: Small Data (zoom in)
    [Chart: x-axis 4000, 2000, 1000, 500, 100; y-axis scale 0 to 1.4 (unlabeled); series: 1-Level, Pruning, Reusing]
  • 46. Performance Evaluation: Big Data
    [Chart: x-axis 4000, 2000, 1000, 500, 100; y-axis scale 0 to 1400 (unlabeled); series: 1-Level, Pruning, Reusing, Traditional]
  • 47. Performance Evaluation: Big Data (zoom in)
    [Chart: x-axis 4000, 2000, 1000, 500, 100; y-axis scale 0 to 4 (unlabeled); series: 1-Level, Pruning, Reusing]
  • 49. Conclusion
    - Incognito is a very innovative k-anonymity algorithm
      − Still inefficient in checking k-anonymity for each node
      − Expensive external sort or hash for counting (e.g., GROUP BY)
    - Using bitmaps (bitwise AND/OR)
      − Additional optimization opportunities: Reusing Optimization, Pruning Optimization, Single Instruction Multiple Data
      − Space/time trade-off: 1-level / Reusing Optimization