sdcSpatial: Privacy protected density maps
Edwin de Jonge @edwindjonge
Statistics Netherlands Research & Development
useR! 2019, July 11 2019
@edwindjonge #sdcSpatial
sdcSpatial: Privacy protected maps
Take-home message: sdcSpatial has methods for:
·Creating a raster map: sdc_raster for population density, value
density and mean density, using the excellent raster package
by Hijmans (2019).
·Finding out which locations are sensitive: plot_sensitive,
is_sensitive.
·Adjusting a raster map to protect data: protect_smooth,
protect_quadtree.
·Removing sensitive locations: remove_sensitive.
Who am I and why sdcSpatial?
·Statistical consultant, Data Scientist @cbs.nl / Statistics NL
·Statistics Netherlands is the main producer of official statistics in the
Netherlands:
− Stats on demographics, economy (GDP), education,
environment, agriculture, finance etc.
− Part of the European Statistical System, ESS.
Motivation for sdcSpatial
·ESS has the European Statistics Code of Practice (predates
GDPR, the European law on data protection):
no individual information may be revealed.
SDC in sdcSpatial?
SDC = “Statistical Disclosure Control”
Collection of statistical methods to:
·Check if data is safe to be published
·Protect data by slightly altering (aggregated) data
− adding noise
− shifting mass
·Most SDC methods operate on records.
·sdcSpatial operates on locations.
Data
data(dwellings, package="sdcSpatial")
nrow(dwellings)
## [1] 90603
head(dwellings) # consumption/unemployed are simulated!
##        x      y consumption unemployed
## 1 149712 470104    2049.926      FALSE
## 2 149639 469906    1814.938      FALSE
## 3 149631 469888    2074.882      FALSE
## 4 149788 469831    1927.989      FALSE
## 5 149773 469834    2164.969      FALSE
## 6 149688 469898    1987.958      FALSE
Let’s create an sdc_raster
Creation:
library(sdcSpatial)
unemployed <- sdc_raster( dwellings[c("x", "y")] # realistic locations
                        , dwellings$unemployed   # simulated data!
                        , r = 500                # raster resolution of 500m
                        , min_count = 10         # min support
                        )
What has been created?
print(unemployed)
## logical sdc_raster object:
## resolution: 500 500 , max_risk: 0.95 , min_count: 10
## mean sensitivity score [0,1]: 0.4249471
42% of the data on this map is sensitive!
What is sensitivity?
Binary score (logical) per raster cell indicating whether it is unsafe to
publish.
Calculated:
a) Per location (x_i, y_i) (raster cell).
b) Using a risk function disclosure_risk r(x, y) ∈ [0, 1]: how
accurately can an attacker estimate the value of an individual?
If r(x_i, y_i) > max_risk, then (x_i, y_i) is sensitive.
c) Using a minimum number of observations.
If count_i < min_count, then (x_i, y_i) is sensitive.
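The two rules above can be combined in a few lines of base R. This is a minimal sketch and not the package's implementation: cell_is_sensitive is a hypothetical helper, the default thresholds are taken from the print output shown earlier, and the risks and counts are made-up values for three cells.

```r
# Hypothetical helper (not part of sdcSpatial): a cell is flagged as
# sensitive when its risk is too high OR it has too few observations.
cell_is_sensitive <- function(risk, count, max_risk = 0.95, min_count = 10) {
  risk > max_risk | count < min_count
}

# Three illustrative cells (made-up numbers)
risk  <- c(0.50, 0.99, 0.20)
count <- c(25, 40, 3)

sensitive <- cell_is_sensitive(risk, count)
print(sensitive)        # one logical flag per cell
print(mean(sensitive))  # share of flagged cells
```

Here the first cell passes both checks, the second fails on risk, and the third fails on the minimum count.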
Disclosure risks
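External (numeric):
$$ r(x, y) = \frac{\max_{i \in (x,y)} v_i}{\sum_{i \in (x,y)} v_i} \quad \text{with } v_i \in \mathbb{R} $$
Discrete (logical):
$$ r(x, y) = \frac{1}{n} \sum_{i \in (x,y)} v_i \quad \text{with } v_i \in \{0, 1\} $$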
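As a worked example, both risk measures can be evaluated for a single cell in base R. The helpers risk_external and risk_discrete are hypothetical names used only for illustration (they are not the package API), and the values are made up. The external risk sketch assumes non-negative values with a positive sum.

```r
# Hypothetical helpers illustrating the two formulas for one raster cell.

# External risk (numeric data): share of the cell total that comes from
# the largest contributor. Assumes non-negative values, positive sum.
risk_external <- function(v) max(v) / sum(v)

# Discrete risk (logical data): fraction of TRUE values in the cell.
risk_discrete <- function(v) mean(v)

consumption <- c(10, 20, 70)               # made-up numeric values
unemployed  <- c(TRUE, TRUE, TRUE, FALSE)  # made-up logical values

risk_external(consumption)  # 70 / 100 = 0.7
risk_discrete(unemployed)   # 3 / 4   = 0.75
```

With max_risk = 0.95 neither of these cells would be flagged on risk alone; a cell in which every value is TRUE would have a discrete risk of 1 and would be flagged.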
Types of raster density maps:
(Stored in unemployed$value):
Density can be area-based:
·Number of people per square ($count): population density.
·(Total) value per square ($sum): number of unemployed per
square.
Or density can be population-based:
·Mean value per square ($mean): unemployment rate per
square.
Note: All density types are valid, but (total) value per square
strongly interacts with population density
(e.g. https://xkcd.com/1138).
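The three density types can be illustrated with base R on simulated points. This is a sketch under stated assumptions: the point data, the 500 m cell size and the cell labelling via floor() are all made up for illustration and do not reproduce how the package builds its raster layers.

```r
set.seed(1)
n <- 200
pts <- data.frame(
  x = runif(n, 0, 2000),
  y = runif(n, 0, 2000),
  unemployed = runif(n) < 0.1   # simulated logical variable
)

r <- 500  # cell size in the same units as x and y
cell <- paste(floor(pts$x / r), floor(pts$y / r), sep = "_")

count <- tapply(pts$unemployed, cell, length)  # observations per cell
total <- tapply(pts$unemployed, cell, sum)     # number of TRUE per cell
avg   <- total / count                         # rate per cell

densities <- data.frame(
  cell  = names(count),
  count = as.vector(count),
  sum   = as.vector(total),
  mean  = as.vector(avg)
)
head(densities)
```

The sum column rises and falls with count, which is the interaction with population density noted above; the mean column is normalised by count and so does not.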
Plotting an sdc_raster
plot(unemployed, "mean")
[Figure: two raster panels, "unemployed" and "sensitive", each with a color scale from 0.0 to 1.0; x axis 150000 to 158000, y axis 460000 to 470000.]
How to reduce sensitivity?
Options:
a) Use a coarser raster: sdc_raster.
b) Apply spatial smoothing: protect_smooth method by Wolf
and Jonge (2018), Jonge and Wolf (2016).
c) Aggregate sensitive cells hierarchically with a quad tree until
they are no longer sensitive: protect_quadtree method by Suñé et al.
(2017).
d) Remove sensitive locations: remove_sensitive.
Option: coarser raster
unemployed_1km <- sdc_raster( dwellings[c("x", "y")]
                            , dwellings$unemployed
                            , r = 1e3) # 1km!
plot(unemployed_1km, "mean")
[Figure: two raster panels, "unemployed_1km" and "sensitive", each with a color scale from 0.0 to 1.0; x axis 150000 to 162000, y axis 455000 to 470000.]
Option: Coarsening
Pros
·Simple and easy to explain
Cons
·Detailed spatial patterns are removed
·Visually unattractive: “Blocky”
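Why coarsening helps with the min_count rule can be checked on simulated points in base R. This is a sketch under stated assumptions: uniformly scattered made-up points, a hypothetical helper share_in_small_cells, and cell labels derived with floor() so that each 1000 m cell is exactly the union of four 500 m cells. It only looks at the minimum count, not at the risk function.

```r
set.seed(42)
n <- 500
x <- runif(n, 0, 4000)
y <- runif(n, 0, 4000)

# Hypothetical helper: share of observations that fall in a cell holding
# fewer than min_count observations, for square cells of size r.
share_in_small_cells <- function(x, y, r, min_count = 10) {
  cell <- paste(floor(x / r), floor(y / r), sep = "_")
  n_in_cell <- as.vector(table(cell)[cell])  # cell count per observation
  mean(n_in_cell < min_count)
}

fine   <- share_in_small_cells(x, y, r = 500)
coarse <- share_in_small_cells(x, y, r = 1000)
print(c(fine = fine, coarse = coarse))
```

Because every coarse cell contains at least as many observations as any fine cell nested inside it, the share for the coarse grid can never exceed the share for the fine grid. The price is the loss of spatial detail listed under Cons.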
Option: KDE-smoothing
unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
plot(unemployed_smoothed, "mean")
[Figure: two raster panels, "unemployed_smoothed" and "sensitive", each with a color scale from 0.0 to 0.8; x axis 150000 to 158000, y axis 460000 to 470000.]
Option: KDE smoothing
Pros
·Often enhances spatial pattern visualization by removing spatial
noise.
·Produces a density map that can be used as a source for e.g. a contour
map.
Cons
·Does not remove all sensitive values (depends on bandwidth
bw).
·A fixed bandwidth is used for all locations, which may remove
detailed patterns: spatial processes often have location-dependent
bandwidths (= future work).
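The effect of smoothing on a mean density can be sketched in base R on a tiny grid. This is an illustration under assumptions, not the protect_smooth implementation: it uses a fixed 3x3 box filter with zero padding as a crude stand-in for a kernel with a fixed bandwidth, smooths the count and sum layers separately, and then takes their ratio. The helper box_smooth and all numbers are made up.

```r
# Hypothetical helper: 3x3 moving average with zero padding at the edges.
box_smooth <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  padded <- matrix(0, nr + 2, nc + 2)
  padded[2:(nr + 1), 2:(nc + 1)] <- m
  out <- matrix(0, nr, nc)
  for (di in 0:2) {
    for (dj in 0:2) {
      out <- out + padded[(1:nr) + di, (1:nc) + dj]
    }
  }
  out / 9
}

# Made-up layers: observations per cell and unemployed per cell
count <- matrix(c(2, 30, 25,
                  1, 40, 35,
                  0,  3, 20), nrow = 3, byrow = TRUE)
unemployed <- matrix(c(2, 3, 2,
                       1, 4, 3,
                       0, 0, 2), nrow = 3, byrow = TRUE)

raw_mean    <- unemployed / count  # 1 in the two sparse cells, NaN where empty
smooth_mean <- box_smooth(unemployed) / box_smooth(count)

print(round(raw_mean, 2))
print(round(smooth_mean, 2))
```

In the raw layer the two sparsely populated cells show a rate of 1, which reveals the status of everyone in them. After smoothing, each cell borrows mass from its neighbours and no cell reports an extreme rate. A wider window (larger bandwidth) protects more but blurs more, which is the trade-off named in the list above.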
Option: Quad tree
unemployed_100m <- sdc_raster( dwellings[c("x","y")], dwellings$unemployed
, r = 100) # use a finer raster
unemployed_qt <- protect_quadtree(unemployed_100m)
plot(unemployed_qt)
[Figure: two raster panels, "unemployed_qt" and "sensitive", each with a color scale from 0.0 to 0.8; x axis 150000 to 158000, y axis 460000 to 470000.]
Option: Quad tree
Pros
·Adapts to data density
·Adjusts until no sensitive data is left.
Cons
·Visually: “Blocky” / “Mondrian-like” result.
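The idea of hierarchical aggregation can be sketched in base R on a small count matrix. This is a simplified illustration, not the protect_quadtree algorithm or the method of Suñé et al.: quadtree_sketch is a hypothetical helper, it only applies the min_count rule, it treats empty cells as having too little support, and it assumes a square matrix whose side is a power of two.

```r
# Hypothetical helper: merge 2x2, then 4x4, ... blocks wherever a block
# still contains a cell whose support is below min_count.
quadtree_sketch <- function(count, min_count = 10) {
  n <- nrow(count)
  stopifnot(n == ncol(count))
  support <- count  # number of observations backing each cell's value
  value   <- count  # published value per cell
  block <- 2
  while (any(support < min_count) && block <= n) {
    for (i in seq(1, n, by = block)) {
      for (j in seq(1, n, by = block)) {
        rows <- i:(i + block - 1)
        cols <- j:(j + block - 1)
        if (any(support[rows, cols] < min_count)) {
          total <- sum(count[rows, cols])
          support[rows, cols] <- total
          value[rows, cols]   <- total / block^2  # spread evenly over block
        }
      }
    }
    block <- block * 2
  }
  list(value = value, support = support)
}

# Made-up counts: dense on the left, sparse on the right
count <- matrix(c(12, 15, 4, 2,
                  11, 20, 3, 1,
                  30, 14, 0, 4,
                  10, 25, 5, 6), nrow = 4, byrow = TRUE)

res <- quadtree_sketch(count)
print(res$value)
```

The dense left half keeps its original resolution, while each sparse 2x2 block on the right is replaced by its block average. This shows both the adaptivity listed under Pros and the source of the blocky look listed under Cons.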
Publish: visual interpolation
So in 5 lines we create a visually attractive map that is safe:
unemployed <- sdc_raster(dwellings[c("x","y")], dwellings$unemployed, r=500)
unemployed_smoothed <- protect_smooth(unemployed, bw = 1500)
unemployed_safe <- remove_sensitive(unemployed_smoothed)
mean_unemployed <- mean(unemployed_safe)
raster::filledContour(mean_unemployed, main="Unemployment rate")
[Figure: filled contour map titled "Unemployment rate" with a color scale from 0.0 to 0.8; x axis 150000 to 160000, y axis 458000 to 470000.]
The end
Thank you for your attention!
Questions?
Curious?
install.packages("sdcSpatial")
Feedback and suggestions?
https://github.com/edwindj/sdcSpatial/issues
References
Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and
Modeling. https://CRAN.R-project.org/package=raster.
Jonge, Edwin de, and Peter-Paul de Wolf. 2016. “Spatial
Smoothing and Statistical Disclosure Control.” In Privacy in
Statistical Databases, edited by Josep Domingo-Ferrer and Mirjana
Pejić-Bach, 107–17. Springer.
Suñé, E., C. Rovira, D. Ibáñez, and M. Farré. 2017. “Statistical
Disclosure Control on Visualising Geocoded Population Data Using
Quadtrees.”
http://nt17.pg2.at/data/x_abstracts/x_abstract_286.docx.
Wolf, Peter-Paul de, and Edwin de Jonge. 2018. “Spatial
Smoothing and Statistical Disclosure Control.” In Privacy in
Statistical Databases - PSD 2018, edited by Josep Domingo-Ferrer
and Francisco Montes Suay. Springer.