Call Girl Number in Vashi Mumbai📲 9833363713 💞 Full Night Enjoy
Disclosing Small Geographic Areas while Protecting Privacy—GeoLeader
1. Disclosing Small Geographic Areas while Protecting
Privacy—GeoLeader
Khaled El Emam, Luk Arbuckle (Presenter)
Electronic Health Information Laboratory
Background Methods Evaluation Results
Comparing Data Set Information Loss
Abstract GeoLeader Algorithm Summary
Geographic information is often collected We modified the Leader spatial clustering Cropping GeoLeader GeoLeader results in less information loss
from patients in surveillance and screening algorithm to meet privacy criteria. The Baby Date of Birth quarter/year quarter/year than cropping, is better able to produce data
programs. Although this is useful for algorithm’s aim is to ensure that aggregated Mother Date of Birth quarter/year quarter/year sets that allow the detection of outbreak
displaying the geographic distribution of areas are not too small but at the same time Baby Sex unchanged unchanged clusters, and the accuracy of outbreak cluster
diseases or in cluster detection, geographic to limit the amount of aggregation and Postal Code 1st Character only GeoLeader groups detection for data sets aggregated using
information also makes it easier to re-identify maximize the utility of the data. Minimum Entropy 994,610.5 573,338.1 GeoLeader is almost the same as on the
individuals in the data, especially if the original data when there are few quasi-
geographic areas are small. The algorithm creates areas that are Comparing Outbreak Cluster Detection identifiers in the data set.
contiguous and compact. It does not require
We developed a privacy-preserving clustering a predefined number of aggregated areas. It The advantages of GeoLeader are that it takes
method that optimally aggregates small does not require the computation of all into account the other non-geographic
geographic areas while minimizing pairwise distances among the original areas, variables in a data set that can be used for re-
information loss. Its effectiveness is which would not scale for large data sets. It is identification, and results in less aggregation
demonstrated through a simulation in which the first aggregation algorithm that takes into of small areas. This means data sets will be of
outbreaks are detected equally well with account the population density in the areas higher utility for health services research and
aggregated and original geographic data. being aggregated. public health investigations.
Empirical Evaluation Limitations
Comparing data set information loss. We GeoLeader requires an adjacency matrix,
used GeoLeader to de-identify data from the which shows which areas are physically
provincial birth registry of Ontario. We adjacent to one another. This only needs to
compared the information loss for the whole be computed once, before spatial clustering.
data set between the use of the GeoLeader
algorithm and cropping of postal codes. The We only used the Kulldorff spatial scan
lower the entropy the higher the quality of statistic but it has been shown to work very
the resultant data set. well in practice. Our goal was to assess
general performance under different
Comparing outbreak cluster detection. We aggregation levels.
conducted a simulation study to evaluate
Example postal code and adjacent regions.
outbreak cluster detection before and after
using GeoLeader. Population distributions
Contributing Authors were preserved, for more realistic results. We
Khaled El Emam, Fida Dankar, Philip conducted a purely spatial analysis using a
AbdelMalik, Grant Middleton, Luk Arbuckle, Bernouilli probability model based on the
Sean Rose. Kulldorff scan statistic. Lines represent different numbers of max equivalence
classes (aggregation and suppression is used to ensure this).
Copyright 2013 CHEO Research Institute, 401 Smyth Rd., Ottawa, Canada, K1H8L1