Disclosing Small Geographic Areas while Protecting Privacy—GeoLeader


Luk Arbuckle

Poster for The Ontario Public Health Convention (TOPHC) 2013.


Khaled El Emam, Luk Arbuckle (Presenter)
Electronic Health Information Laboratory

Background

Abstract
Geographic information is often collected from patients in surveillance and screening programs. Although this is useful for displaying the geographic distribution of diseases or in cluster detection, geographic information also makes it easier to re-identify individuals in the data, especially if the geographic areas are small.

We developed a privacy-preserving clustering method that optimally aggregates small geographic areas while minimizing information loss. Its effectiveness is demonstrated through a simulation in which outbreaks are detected equally well with aggregated and original geographic data.

Contributing Authors
Khaled El Emam, Fida Dankar, Philip AbdelMalik, Grant Middleton, Luk Arbuckle, Sean Rose.

Methods

GeoLeader Algorithm
We modified the Leader spatial clustering algorithm to meet privacy criteria. The algorithm's aim is to ensure that aggregated areas are not too small but at the same time to limit the amount of aggregation and maximize the utility of the data.

The algorithm creates areas that are contiguous and compact. It does not require a predefined number of aggregated areas. It does not require the computation of all pairwise distances among the original areas, which would not scale for large data sets. It is the first aggregation algorithm that takes into account the population density in the areas being aggregated.

Empirical Evaluation
Comparing data set information loss. We used GeoLeader to de-identify data from the provincial birth registry of Ontario. We compared the information loss for the whole data set between the use of the GeoLeader algorithm and cropping of postal codes. The lower the entropy, the higher the quality of the resultant data set.

Comparing outbreak cluster detection. We conducted a simulation study to evaluate outbreak cluster detection before and after using GeoLeader. Population distributions were preserved, for more realistic results. We conducted a purely spatial analysis using a Bernoulli probability model based on the Kulldorff scan statistic.

Evaluation

Comparing Data Set Information Loss

                        Cropping              GeoLeader
Baby Date of Birth      quarter/year          quarter/year
Mother Date of Birth    quarter/year          quarter/year
Baby Sex                unchanged             unchanged
Postal Code             1st Character only    GeoLeader groups
Minimum Entropy         994,610.5             573,338.1

Comparing Outbreak Cluster Detection
[Figure: outbreak cluster detection results.] Lines represent different numbers of max equivalence classes (aggregation and suppression are used to ensure this).

Results

Summary
GeoLeader results in less information loss than cropping, is better able to produce data sets that allow the detection of outbreak clusters, and the accuracy of outbreak cluster detection for data sets aggregated using GeoLeader is almost the same as on the original data when there are few quasi-identifiers in the data set.

The advantages of GeoLeader are that it takes into account the other non-geographic variables in a data set that can be used for re-identification, and results in less aggregation of small areas. This means data sets will be of higher utility for health services research and public health investigations.

Limitations
GeoLeader requires an adjacency matrix, which shows which areas are physically adjacent to one another. This only needs to be computed once, before spatial clustering.

[Figure: Example postal code and adjacent regions.]

We only used the Kulldorff spatial scan statistic but it has been shown to work very well in practice. Our goal was to assess general performance under different aggregation levels.
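
The poster does not give the steps of GeoLeader, so the following is only a toy illustration of the general idea it describes: areas that are too small are merged with physically adjacent areas, using adjacency information computed once up front, so that merged areas stay contiguous and no number of groups has to be fixed in advance. This sketch is not the GeoLeader algorithm. The function name, the `min_population` threshold, the rule of merging into the least populated neighbouring group, and the sample data are all assumptions made for illustration.

```python
def aggregate_small_areas(population, adjacency, min_population):
    """Greedily merge under-populated areas into adjacent groups.

    Hypothetical helper, NOT the GeoLeader algorithm: it only shows how
    adjacency information keeps every merged group contiguous.
    population: dict mapping area name -> population count
    adjacency:  dict mapping area name -> iterable of adjacent area names
                (assumed symmetric)
    """
    group_of = {area: area for area in population}   # area -> group id
    members = {area: {area} for area in population}  # group id -> areas
    group_pop = dict(population)                     # group id -> population
    unmergeable = set()                              # small groups with no neighbours

    while True:
        small = [g for g in members
                 if group_pop[g] < min_population and g not in unmergeable]
        if not small:
            break
        # Take the smallest group first; ties are broken by name.
        g = min(small, key=lambda x: (group_pop[x], x))
        neighbours = {group_of[nb]
                      for area in members[g]
                      for nb in adjacency.get(area, ())
                      if group_of[nb] != g}
        if not neighbours:
            # An isolated area cannot be merged without breaking contiguity.
            unmergeable.add(g)
            continue
        # Assumed rule: merge into the least populated neighbouring group,
        # which tends to limit how much aggregation takes place.
        target = min(neighbours, key=lambda x: (group_pop[x], x))
        for area in members[g]:
            group_of[area] = target
        members[target] |= members.pop(g)
        group_pop[target] += group_pop.pop(g)

    return sorted(sorted(group) for group in members.values())


# Made-up example: four areas in a row, A - B - C - D.
population = {"A": 50, "B": 30, "C": 200, "D": 10}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
groups = aggregate_small_areas(population, adjacency, min_population=60)
print(groups)  # [['A', 'B'], ['C', 'D']]
```

In this sketch only adjacent groups are ever inspected, so no pairwise distances between all areas are computed. The real method additionally considers non-geographic quasi-identifiers and population density, which this toy ignores.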

Copyright 2013 CHEO Research Institute, 401 Smyth Rd., Ottawa, Canada, K1H8L1
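
The evaluation on this poster relies on a purely spatial analysis with a Bernoulli probability model based on the Kulldorff scan statistic. As a rough sketch of what that model scores, the function below computes the standard Bernoulli log-likelihood ratio for one candidate zone, comparing the case rate inside the zone with the rate outside it. The function and parameter names are hypothetical and the numbers are made up. A full scan statistic would also search over many candidate zones and assess significance by Monte Carlo replication, which is not shown here.

```python
import math


def _xlogy(x, y):
    # x * log(y), using the convention that 0 * log(0) is 0.
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(cases_in, pop_in, cases_total, pop_total):
    """Log-likelihood ratio of one candidate zone under a Bernoulli model.

    cases_in / pop_in:       cases and population inside the zone
    cases_total / pop_total: cases and population in the whole study region
    Returns 0.0 unless the rate inside the zone exceeds the rate outside.
    """
    c, n, C, N = cases_in, pop_in, cases_total, pop_total
    if not (0 <= c <= n <= N and c <= C <= N and C - c <= N - n):
        raise ValueError("inconsistent counts")
    if n == 0 or n == N:
        return 0.0  # an empty zone or the whole region is not a cluster
    rate_in = c / n
    rate_out = (C - c) / (N - n)
    if rate_in <= rate_out:
        return 0.0
    # Log-likelihood with separate rates inside and outside the zone.
    alternative = (_xlogy(c, rate_in) + _xlogy(n - c, 1 - rate_in)
                   + _xlogy(C - c, rate_out)
                   + _xlogy((N - n) - (C - c), 1 - rate_out))
    # Log-likelihood with a single rate everywhere.
    null = _xlogy(C, C / N) + _xlogy(N - C, 1 - C / N)
    return alternative - null


# Made-up zone: 10 of its 20 people are cases, against 20 cases in 100 overall.
llr = bernoulli_llr(cases_in=10, pop_in=20, cases_total=20, pop_total=100)
print(round(llr, 3))
```

The zone with the largest ratio is the most likely cluster. When small areas are aggregated, the set of zones that can be formed changes, which is why detection accuracy can differ between original and aggregated data.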
