
A Lifecycle Approach to Information Privacy



Presented at the Simons Foundation, March 2013



  1. 1. Prepared for: New Directions in the Science of Differential Privacy, March 2013
     A Lifecycle Approach to Information Privacy
     Micah Altman, Director of Research, MIT Libraries; Non-Resident Senior Fellow, Brookings Institution
  2. 2. Collaborators*
     • Privacy Tools for Sharing Research Data Project: Edo Airoldi, Stephen Chong, Merce Crosas, Cynthia Dwork, Gary King, Phil Malone, Latanya Sweeney, Salil Vadhan
     • Research Support: Thanks to the National Science Foundation (award 1237235), the Sloan Foundation, the Massachusetts Institute of Technology, and Harvard University.
     * And co-conspirators
  3. 3. Related Work
     Reprints available
     • Comments on ANPRM: Human Subjects Protection
     • Privacy tools project proposal
  4. 4. Overarching challenges
     • Law is evolving
       – specification of technical requirements
       – new legal concepts: “Right to be forgotten”
     • Research is changing
       – evidence base shifting: reliant on big data, transactional data, new forms of data
       – conduct of research: distributed, collaborative, multi-institutional, multi-national
       – infrastructure is changing: cloud & distributed third-party computation & storage
     • Privacy analysis is advancing
       – new computational privacy solution concepts
       – new findings from reidentification experiments
       – new methods for estimating utility/privacy tradeoffs
  5. 5. Shifting social science evidence base
     How to deidentify without destroying utility?
     • The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [see Narayanan and Shmatikov 2008]
     • The “GIS Problem”: fine geo-spatial-temporal data are very difficult to mask when correlated with external data [see Zimmerman & Pavlik 2008; Zan et al. 2013; Srivatsa & Hicks 2012]
     • The “Facebook Problem”: masked network data can be identified if only a few nodes are controlled [see Backstrom et al. 2007]
     • The “Blog Problem”: pseudonymous communication can be linked through textual analysis [see Novak, Raghavan, and Tomkins 2004]
     Source: [Calabrese 2008; Real Time Rome Project 2007]
  6. 6. CUSP aims for the Leading Edge
     • Urban informatics: high-velocity localized social science
     • Leading edge data: sensors, crowd-sourcing
     • Leading edge privacy needs: privacy policy, privacy-aware information management, privacy ethics
  7. 7. Data InputOutput ApproachPublished Outputs* Jones * * 1961 021** Jones * * 1961 021** Jones * * 1972 9404** Jones * * 1972 9404** Jones * * 1972 9404*Modal Practice“The correlation between X andY was large and statisticallysignificant”Summary statisticsContingency tablePublic use sample microdataInformation VisualizationA Lifecycle Approach to Information Privacy
  8. 8. Questions Generated from Data I/O Model
     Solution concepts
     • Comparison of risks across concepts
     • Extension of solution concepts range
     Processing stage
     • How to apply DP to new analytic methods?
       – Bayesian methods
       – Data mining methods
       – Text analysis methods
     • How to apply DP to different types of “microdata”?
       – Network data
       – Text
       – Geospatial traces
       – Relations
     Questions about transformation
       – Imputation methods
       – Computation efficiency
       – Informational utility*
     Solution-concept table (columns: Disclosure; Deterministic; Probabilistic), entries in slide order:
     • Individual record linkage: k-anonymity; reidentification probability
     • Group attributes: k-anonymity + heterogeneity (e.g. l-diversity); threat analysis; SDC on skewed magnitude tables
     • Individual attributes: attribute disclosure; differential privacy; distributional privacy; Bayesian-optimal privacy
     • Specified columns/rows: private multiparty computation
     * See for example Dwork & Smith 2009: “My, what a large ε you have, grandma!”
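As a concrete reading of the k-anonymity entry in the solution-concept table, the following is a minimal sketch that measures k as the size of the smallest group of records sharing the same quasi-identifier values. The toy records are loosely modeled on the masked "Jones" rows from the Data Input/Output slide; the field names (`surname`, `birth_year`, `zip_prefix`) and the function name are assumptions made for illustration, not labels from the deck.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    identical values on every quasi-identifier column."""
    if not records:
        raise ValueError("k is undefined for an empty table")
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Hypothetical toy table; the column meanings are assumed, not from the slides.
RECORDS = [
    {"surname": "Jones", "birth_year": "1961", "zip_prefix": "021*"},
    {"surname": "Jones", "birth_year": "1961", "zip_prefix": "021*"},
    {"surname": "Jones", "birth_year": "1972", "zip_prefix": "9404*"},
    {"surname": "Jones", "birth_year": "1972", "zip_prefix": "9404*"},
    {"surname": "Jones", "birth_year": "1972", "zip_prefix": "9404*"},
]
QUASI_IDENTIFIERS = ("surname", "birth_year", "zip_prefix")

if __name__ == "__main__":
    # The smallest equivalence class has two rows, so this table is 2-anonymous.
    print(k_anonymity(RECORDS, QUASI_IDENTIFIERS))
```

A check like this covers only the deterministic record-linkage row of the table; it says nothing about attribute disclosure within a group, which is the gap l-diversity and the probabilistic concepts are meant to address.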
  9. 9. Information Life Cycle Model
     Stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access
  10. 10. Legal/Policy Frameworks
      Diagram labels: Contract; Intellectual Property; Access Rights; Confidentiality; Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap; TOU; ITAR; Export Restrictions
  11. 11. Questions Generated by Lifecycle Model
      • Which laws apply to each stage?
        – Are legal requirements consistent across stages?
      • How to align legal instruments
        – consent forms, SLAs, DUAs
      • Optimizing privacy risk/utility/cost across the research stages: when is it more efficient to …
        – apply disclosure limitation at the data collection stage?
        – use particular solution concepts at particular stages?
        – harmonize concepts/treatments across stages?
      • Policy design
        – Policies to internalize future/public stakeholder needs
        – Policy equilibrium under different privacy solution concepts
      • Information reuse
        – Bayesian priors
        – Scientific verification and replication
      • Infrastructure needs
        – Data acquisition, storage, dissemination
        – Identification, authorization, authentication
        – Metadata, protocols
      Diagram: the lifecycle stages (Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access) surrounded by Research Methods; Data Management Systems; Legal/Policy Frameworks; Statistical/Computational Frameworks
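One way to picture "disclosure limitation at the data collection stage" is randomized response, a technique the deck names on its system-design slide. The sketch below assumes a simple variant in which each respondent answers truthfully with probability `p_truth` and otherwise reports a fair coin flip; the function names and parameter values are illustrative assumptions, not part of the talk.

```python
import random

def randomized_response(truth, p_truth, rng):
    """Report the true yes/no answer with probability p_truth,
    otherwise report the outcome of a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_rate(responses, p_truth):
    """Invert the randomization to estimate the population 'yes' rate.
    E[observed] = p_truth * rate + (1 - p_truth) / 2."""
    observed = sum(responses) / len(responses)
    return (observed - (1.0 - p_truth) / 2.0) / p_truth

if __name__ == "__main__":
    rng = random.Random(2013)
    # Hypothetical population in which 30% would truthfully answer "yes".
    truths = [i < 3000 for i in range(10000)]
    responses = [randomized_response(t, 0.5, rng) for t in truths]
    print(round(estimate_rate(responses, 0.5), 3))
```

The collector never stores any individual's true answer, yet the aggregate rate remains estimable. That is the kind of collection-stage versus analysis-stage tradeoff the slide asks about.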
  12. 12. Questions on Differential Privacy from Information Lifecycle Analysis: Legal
      • Legal requirements: when does law …
        – require exact answers? (DP does not give exact answers.)
        – give safe harbor if linkages are ‘only’ probabilistic? (DP provides safe harbor in this case.)
        – require action based on “actual knowledge”? (How do we include strongly informative priors in DP? When is DP not actually “worst case”?)
        – require analysis of a specific unit of observation? (DP does not give answers for individual units.)
        – require balance of privacy and utility? (DP does not inherently balance, but uses minimax: it maximizes utility subject to a given privacy constraint. What is an appropriate choice of privacy constraint?)
      • Legal instruments: how to describe DP protections in a legally coherent way for …
        – service level agreements
        – consent/deposit terms
        – data usage agreements
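To make the remark that DP "does not give exact answers" concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. A count has sensitivity 1, so the noise scale is 1/ε. The data, function names, and ε value are assumptions for illustration, and the floating-point sampling used here is not hardened for production use.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential draws, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(values, predicate, epsilon, rng=None):
    """Return an epsilon-differentially-private count of the values
    matching `predicate`. Counting queries have sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    ages = [25, 37, 41, 58, 62, 70]  # hypothetical confidential data
    # The true count of ages >= 40 is 4; each released answer is 4 plus noise.
    print(dp_count(ages, lambda a: a >= 40, epsilon=1.0))
```

A smaller ε means larger noise and stronger protection. This is the "choice of privacy constraint" the slide raises: the mechanism maximizes accuracy only after someone has fixed ε.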
  13. 13. Questions on Differential Privacy from Information Lifecycle Analysis: System Design
      • System design: potential increased implementation cost of DP
        – Information security: hardening
        – Information security: certification & auditing
        – Model server development, provisioning, maintenance, reliability, availability
      • System design: information security tradeoffs of DP (interactive systems have larger vulnerability)
        – Availability risks: denial-of-service attacks
        – Availability/integrity risks: privacy budget exhaustion attacks
        – Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
        – Secrecy/privacy: breach of authentication/authorization layer
      • System design: optimizing privacy & utility across the lifecycle
        – When does limiting disclosive data collection (e.g. using randomized response, group-aggregated methods) dominate applying DP at the data analysis stage?
        – When do restricted virtual data enclaves + public synthetic data dominate public DP queries (of the same type)?
      • System design: information reuse
        – How do you incorporate informative priors in the DP privacy solution concept? (When does the “Terry Gross” problem apply?)
        – What is required to ensure scientific replication/verification of results produced by differentially private model servers?
        – How to do a DP query on confidential data linked with externally provided microdata?
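The "privacy budget exhaustion" risk can be sketched with a toy accountant that assumes basic sequential composition, where the ε values of answered queries add up against a fixed total. The class and method names below are hypothetical; a real model server would also need per-user quotas and authentication, which the slide lists as separate concerns.

```python
class BudgetExhausted(Exception):
    """Raised when a query would push total spending past the budget."""

class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = float(total_epsilon)
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon):
        """Charge one query's epsilon, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted("no privacy budget left for this query")
        self.spent += epsilon

if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    # A hostile client burns the shared budget with throwaway queries ...
    for _ in range(4):
        budget.spend(0.25)
    # ... so the next legitimate query is refused.
    try:
        budget.spend(0.25)
    except BudgetExhausted as err:
        print("refused:", err)
```

With a single shared budget, availability fails for every later user once the total is spent, which is why the slide classes this as an availability/integrity risk and not a privacy breach.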
  14. 14. Questions on Differential Privacy from Information Lifecycle Analysis: Policy Design
      • Policy design: “market failures” for privacy goods
        – Is there a market failure? How do we know?
        – What is the nature of the market failure?
          • Conditions on market structure/market power: barriers to entry? natural monopoly/network effect? first-mover advantage, path dependency?
          • Conditions on goods: excludability, rivalry, externality
          • Conditions on exchange: transaction costs, agency problems, bounded rationality, or informational asymmetry
      • Policy design: policy equilibria
        – When does enforcing a specific privacy concept yield a socially optimal solution?
        – When is DP a prisoner’s dilemma? (E.g., I contribute to a database for a small payment, since my unilateral entry does not affect the result, but the equilibrium is that the database is large and you learn substantially more about me than if the database were small.)
  15. 15. Urban Instrumentation and Confidentiality
      Specific data sources
      • Administrative records
      • Transactions
      • Traffic
      • Health
      • Mobile phones
      • Microenvironment
      • Crowdsource
      Possible nosy questions …
      • Were you fined?
      • What did you buy?
      • Where were you?
      • Are you sick?
      • How rich are you?
      • Do you have a meth lab?
      Categories
      • Infrastructure
      • Environment
      • People
      • Community: self-identified neighborhood, school district, voting precinct, election district, police beat, crime locations, grocery prices, produce availability
      Privacy implications
      • Business confidentiality
      • Security & safety: infrastructure chokepoints; police coverage; endangered species; animal testing labs; environmental hazards
      • Personal privacy
  16. 16. Interdisciplinary Research Required
      Disciplines: Law; Social Science; Public Policy; Data Collection Methods (Research Methodology); Data Management (Information Science); Statistics; Computer Science
      • Privacy-aware data-management systems
      • Methods for confidential data collection and management
      • Creative-Commons-like modular license plugins for privacy data use; consent; terms of service
      • Model legislation for modern privacy concepts
      • Privacy requirements taxonomy and classification
      • Game-theoretic/social-choice models of social privacy equilibria under different privacy policies
  17. 17. References
      • Backstrom, Lars, Cynthia Dwork, and Jon Kleinberg. "Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography." Proceedings of the 16th international conference on World Wide Web. ACM, 2007.
      • Dwork, C., and A. Smith. "Differential Privacy for Statistics: What we Know and What we Want to Learn." Journal of Privacy and Confidentiality (2009) 1(2), 135–154.
      • Narayanan, Arvind, and Vitaly Shmatikov. "Robust de-anonymization of large sparse datasets." Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.
      • Novak, Jasmine, Prabhakar Raghavan, and Andrew Tomkins. "Anti-aliasing on the web." Proceedings of the 13th international conference on World Wide Web. ACM, 2004.
      • Srivatsa, M., and M. Hicks. 2012. Deanonymizing mobility traces: using social network as a side-channel. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS 12). ACM, New York, NY, USA, 628-637. DOI=10.1145/2382196.2382262
      • Zan, Bin, Zhanbo Sun, Marco Gruteser, and Xuegang Ban. 2013. Linking anonymous location traces through driving characteristics. In Proceedings of the third ACM conference on Data and application security and privacy (CODASPY 13). ACM, New York, NY, USA, 293-300. DOI=10.1145/2435349.2435391
      • Zimmerman, D. L., and C. Pavlik. (2008). Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data. Geographical Analysis 40.1, 52 (25).
  18. 18. Discussion
      Personal Web: micahaltman.com
      Privacy Tools for Sharing Research
      micah_altman@alumni.brown.edu
      Twitter: @drmaltman