A Lifecycle Approach to Information Privacy

Presented at the Simons Foundation, March 2013

  • This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • ----- Meeting Notes (12/14/12 15:33) -----
    Common law -- no probability; fail by showing lack of direct harm
    Public corporation data breaches -- stock law

A Lifecycle Approach to Information Privacy: Presentation Transcript

  • Prepared for: New Directions in the Science of Differential Privacy, March 2013
    A Lifecycle Approach to Information Privacy
    Micah Altman <Micah_Altman@alumni.brown.edu>
    Director of Research, MIT Libraries
    Non-Resident Senior Fellow, Brookings Institution
  • Collaborators*
    • Privacy Tools for Sharing Research Data Project: Edo Airoldi, Stephen Chong, Merce Crosas, Cynthia Dwork, Gary King, Phil Malone, Latanya Sweeney, Salil Vadhan
    • Research Support: Thanks to the National Science Foundation (award 1237235), the Sloan Foundation, the Massachusetts Institute of Technology, and Harvard University.
    * And co-conspirators
  • Related Work
    Reprints available from: micahaltman.com
    • Comments on ANPRM: Human Subjects Protection, http://dataprivacylab.org/projects/irb/Vadhan.pdf
    • Privacy tools project proposal: http://privacytools.seas.harvard.edu/full-project-description
  • Overarching challenges
    • Law is evolving
      – specification of technical requirements
      – new legal concepts: “Right to be forgotten”
    • Research is changing
      – evidence base shifting: reliant on big data, transactional data, new forms of data
      – conduct of research distributive, collaborative, multi-institutional, multi-national
      – infrastructure is changing: cloud & distributed third-party computation & storage
    • Privacy analysis is advancing
      – new computational privacy solution concepts
      – new findings from reidentification experiments
      – new methods for estimating utility/privacy tradeoffs
  • Shifting social science evidence base
    How to deidentify without destroying utility?
    • The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [see Narayanan and Shmatikov 2008]
    • The “GIS Problem”: fine geo-spatial-temporal data are very difficult to mask when correlated with external data [see Zimmerman & Pavlik 2008; Zan et al. 2013; Srivatsa & Hicks 2012]
    • The “Facebook Problem”: it is possible to identify masked network data if only a few nodes are controlled [see Backstrom et al. 2007]
    • The “Blog Problem”: pseudonymous communication can be linked through textual analysis [see Novak, Raghavan, and Tomkins 2004]
    Source: [Calabrese 2008; Real Time Rome Project 2007]
  • CUSP aims for the Leading Edge
    • Urban informatics: high-velocity localized social science
    • Leading edge data: sensors, crowd-sourcing
    • Leading edge privacy needs: privacy policy, privacy-aware information management, privacy ethics
  • Data InputOutput ApproachPublished Outputs* Jones * * 1961 021** Jones * * 1961 021** Jones * * 1972 9404** Jones * * 1972 9404** Jones * * 1972 9404*Modal Practice“The correlation between X andY was large and statisticallysignificant”Summary statisticsContingency tablePublic use sample microdataInformation VisualizationA Lifecycle Approach to Information Privacy
  • Questions Generated from Data I/O Model
    Solution Concepts
    • Comparison of risks across concepts
    • Extension of solution concepts range
    Processing Stage
    • How to apply DP to new analytic methods?
      – Bayesian methods
      – Data mining methods
      – Text analysis methods
    • How to apply DP to different types of “Microdata”
      – Network data
      – Text
      – Geospatial traces
      – Relations
    Solution concepts by type of disclosure (table columns: Deterministic, Probabilistic)
    • Individual record linkage: k-anonymity; reidentification probability
    • Group attributes: k-anonymity + heterogeneity (e.g. l-diversity); threat analysis; SDC on skewed magnitude tables
    • Individual attributes: attribute disclosure; differential privacy; distributional privacy; Bayesian-optimal privacy
    • Specified columns/rows: private multiparty computation
    Questions about transformation
      – Imputation methods
      – Computation efficiency
      – Informational utility*
    * See for example: Dwork & Smith 2009
    * “My, what a large ε you have, grandma!”
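To make the k-anonymity entry in the solution-concept list concrete, here is a minimal sketch in Python. It assumes records are tuples and that the analyst has already chosen which columns count as quasi-identifiers. The toy records loosely echo the masked "Jones" rows from the Data Input/Output slide, and the diagnosis column is hypothetical. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter
from typing import Iterable, Sequence


def k_anonymity(records: Iterable[Sequence], quasi_identifier_columns: Sequence[int]) -> int:
    """Return the size of the smallest group of records that share
    the same values on the chosen quasi-identifier columns."""
    groups = Counter(
        tuple(record[i] for i in quasi_identifier_columns) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical toy records: (surname, birth year, ZIP prefix, diagnosis)
records = [
    ("Jones", 1961, "021", "flu"),
    ("Jones", 1961, "021", "asthma"),
    ("Jones", 1972, "9404", "flu"),
    ("Jones", 1972, "9404", "diabetes"),
    ("Jones", 1972, "9404", "flu"),
]

k = k_anonymity(records, quasi_identifier_columns=[0, 1, 2])
print(k)  # 2: the smallest group (Jones, 1961, "021") has two records
```

Note that this checks record linkage only. The 1972 group still reveals that a member has either flu or diabetes, which is the kind of attribute disclosure that the heterogeneity extensions (e.g. l-diversity) are meant to address.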
  • Information Life Cycle Model
    • Creation/Collection
    • Storage/Ingest
    • Processing
    • Internal Sharing
    • Analysis
    • External dissemination/publication
    • Re-use
    • Long-term access
  • Legal/Policy Frameworks
    [Figure: overlapping areas of law: Contract, Intellectual Property, Access Rights, Confidentiality]
    • Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark
    • Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity
    • Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified
    • FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws
    • Journal Replication Requirements; Funder Open Access
    • Contract; License; Click-Wrap; TOU
    • ITAR; Export Restrictions
  • Questions Generated by Lifecycle Model
    • Which laws apply to each stage:
      – are legal requirements consistent across stages?
    • How to align legal instruments:
      – consent forms, SLAs, DUAs
    • Optimizing privacy risk/utility/cost across the research stages… when is it more efficient to…
      – apply disclosure limitation at the data collection stage?
      – use particular solution concepts at particular stages?
      – harmonize concepts/treatments across stages?
    • Policy design
      – Policies to internalize future / public stakeholder needs
      – Policy equilibrium under different privacy solution concepts
    • Information reuse
      – Bayesian priors
      – Scientific verification and replication
    • Infrastructure needs
      – Data acquisition, storage, dissemination
      – Identification, authorization, authentication
      – Metadata, protocols
    [Figure: the lifecycle stages (Creation/Collection, Storage/Ingest, Processing, Internal Sharing, Analysis, External dissemination/publication, Re-use, Long-term access) framed by Research Methods, Data Management Systems, Legal / Policy Frameworks, and Statistical / Computational Frameworks]
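One standard example of applying disclosure limitation at the data collection stage is randomized response. The sketch below assumes the simple variant in which each respondent answers a yes/no question truthfully with probability `p_truth` and otherwise reports the opposite answer. The function names, the true rate of 0.3, and the sample size are all hypothetical choices for illustration. No individual answer is reliable, yet the population rate can still be estimated by inverting the known noise.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, else the opposite answer."""
    return truth if rng.random() < p_truth else not truth


def estimate_true_rate(observed_yes_rate: float, p_truth: float) -> float:
    """Invert observed = p*pi + (1 - p)*(1 - pi) to recover the true rate pi."""
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes the answers pure noise")
    return (observed_yes_rate - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


rng = random.Random(42)
p_truth = 0.75
true_rate = 0.3  # hypothetical share of respondents whose true answer is "yes"
n = 100_000

reports = [
    randomized_response(rng.random() < true_rate, p_truth, rng) for _ in range(n)
]
observed = sum(reports) / n
estimate = estimate_true_rate(observed, p_truth)
print(round(observed, 3), round(estimate, 3))
```

With these settings the observed "yes" rate should sit near 0.4 and the corrected estimate near 0.3. The price of protecting each answer at collection time is a larger variance in the estimate, which is one side of the risk/utility/cost tradeoff the slide asks about.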
  • Questions on Differential Privacy from Information Lifecycle Analysis: Legal
    • Legal requirements -- when does law …
      – require exact answers? (DP does not give exact answers.)
      – give safe harbor if linkages are ‘only’ probabilistic? (DP provides safe harbor in this case.)
      – require action based on “actual knowledge”? (How do we include strongly informative priors in DP? When is DP not actually “worst case”?)
      – require analysis of a specific unit of observation? (DP does not give answers for individual units.)
      – require balance of privacy and utility? (DP does not inherently balance, but uses minimax: it maximizes utility subject to a given privacy constraint. What is the appropriate choice of privacy constraint?)
    • Legal instruments -- how to describe DP protections in a legally coherent way for …
      – service level agreements
      – consent/deposit terms
      – data usage agreements
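The points that DP "does not give exact answers" and that the privacy constraint must be chosen can be illustrated with the Laplace mechanism for a counting query. This is a minimal sketch, not a hardened implementation: the count of 120 and the ε values are hypothetical, and the noise is drawn as the difference of two exponential variates, which follows a Laplace distribution with the stated scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP using the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
true_count = 120  # hypothetical exact answer
errors = {}
for epsilon in (0.1, 1.0, 10.0):
    releases = [dp_count(true_count, epsilon, rng) for _ in range(10_000)]
    errors[epsilon] = sum(abs(r - true_count) for r in releases) / len(releases)
    print(epsilon, round(errors[epsilon], 2))
```

The mean absolute error is roughly 1/ε, so a small ε gives strong protection but answers that are far from exact, while a large ε gives near-exact answers with a much weaker guarantee. Any legal requirement phrased in terms of exact answers or a fixed level of protection therefore maps onto a choice of ε.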
  • Questions on Differential Privacy from Information Lifecycle Analysis: System Design
    • System design: potential increased implementation cost of DP:
      – Information security -- hardening
      – Information security -- certification & auditing
      – Model server development, provisioning, maintenance, reliability, availability
    • System design: information security tradeoffs of DP… Interactive systems have larger vulnerability:
      – Availability risks: denial of service attack
      – Availability/integrity risks: privacy budget exhaustion attacks
      – Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
      – Secrecy/privacy: breach of authentication/authorization layer
    • System design: optimizing privacy & utility across lifecycle
      – When does limiting disclosive data collection (e.g. using randomized response, group aggregated methods) dominate applying DP to the data analysis stage?
      – When do restricted virtual data enclaves + public synthetic data dominate public DP queries (of the same type)?
    • System design: information reuse
      – How do you incorporate informative priors in the DP privacy solution concept? (When does the “Terry Gross” problem apply?)
      – What’s required for ensuring scientific replication/verification of results produced by differentially private model servers?
      – How to do a DP query on confidential data linked with externally provided microdata?
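The privacy budget exhaustion risk listed above can be sketched with a toy accountant. It assumes basic sequential composition, where the ε values of answered queries add up against a fixed total. The class name `PrivacyBudget`, the total of 1.0, and the per-query cost of 0.25 are hypothetical and not taken from any particular DP system.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivacyBudget:
    """Toy accountant using sequential composition: epsilons add up."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent

    def charge(self, epsilon: float) -> None:
        """Deduct epsilon for one query, or refuse if the budget cannot cover it."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted(f"requested {epsilon}, remaining {self.remaining}")
        self.spent += epsilon


# One shared budget for a public query server (hypothetical values).
budget = PrivacyBudget(total_epsilon=1.0)

# An attacker floods the server with queries until nothing is left.
answered = 0
try:
    while True:
        budget.charge(0.25)
        answered += 1
except BudgetExhausted:
    pass

# A legitimate researcher arriving afterwards is refused.
try:
    budget.charge(0.25)
    legitimate_refused = False
except BudgetExhausted:
    legitimate_refused = True

print(answered, budget.remaining, legitimate_refused)  # 4 0.0 True
```

Because a shared budget is a finite resource, any party allowed to query can deny service to everyone else. Per-user budgets or authenticated access reduce that exposure, at the cost of relying on the authentication/authorization layer the slide also flags as a risk.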
  • Questions on Differential Privacy from Information Lifecycle Analysis: Policy Design
    • Policy design: “market failures” for privacy goods
      – Is there a market failure, and how do we know?
      – What is the nature of the market failure:
        • Conditions on market structure/market power: Barriers to entry? Natural monopoly/network effect? First-mover advantage, path dependency?
        • Conditions on goods: excludability, rivalry, externality
        • Conditions on exchange: transaction costs, agency problems, bounded rationality, or informational asymmetry
    • Policy design: policy equilibria
      – When does enforcing a specific privacy concept yield a socially optimal solution?
      – When is DP a prisoner’s dilemma? (E.g. I contribute to a database for a small payment, since my unilateral entry does not affect the result, but the equilibrium is that the database is large and you learn substantially more about me than if the database were small.)
  • Urban Instrumentation and Confidentiality
    Specific data source
    • Administrative records
    • Transactions
    • Traffic
    • Health
    • Mobile phones
    • Microenvironment
    • Crowdsource
    Possible nosy questions…
    • Were you fined?
    • What did you buy?
    • Where were you?
    • Are you sick?
    • How rich are you?
    • Do you have a meth lab?
    Categories
    • Infrastructure
    • Environment
    • People
    • Community: self-identified neighborhood, school district, voting precinct, election district, police beat, crime locations, grocery prices, produce availability
    Privacy implications
    • Business confidentiality
    • Security & safety: infrastructure chokepoints; police coverage; endangered species; animal testing labs; environmental hazards
    • Personal privacy
  • Interdisciplinary Research Required
    [Figure: contributing disciplines: Law; Social Science; Public Policy; Data Collection Methods (Research Methodology); Data Management (Information Science); Statistics; Computer Science]
    • Privacy-aware data-management systems
    • Methods for confidential data collection and management
    • Creative-Commons-like modular license plugins for privacy data use; consent; terms of service
    • Model legislation for modern privacy concepts
    • Privacy requirements taxonomy and classification
    • Game theoretic/social-choice models of social privacy equilibria under different privacy policies
  • References
    • Backstrom, Lars, Cynthia Dwork, and Jon Kleinberg. "Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography." Proceedings of the 16th international conference on World Wide Web. ACM, 2007.
    • Dwork, C., and A. Smith. "Differential Privacy for Statistics: What we Know and What we Want to Learn." Journal of Privacy and Confidentiality 1(2), 2009, 135–154.
    • Narayanan, Arvind, and Vitaly Shmatikov. "Robust de-anonymization of large sparse datasets." Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.
    • Novak, Jasmine, Prabhakar Raghavan, and Andrew Tomkins. "Anti-aliasing on the web." Proceedings of the 13th international conference on World Wide Web. ACM, 2004.
    • Srivatsa, M., and M. Hicks. 2012. Deanonymizing mobility traces: using social network as a side-channel. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS 12). ACM, New York, NY, USA, 628-637. DOI=10.1145/2382196.2382262 http://doi.acm.org/10.1145/2382196.2382262
    • Zan, Bin, Zhanbo Sun, Marco Gruteser, and Xuegang Ban. 2013. Linking anonymous location traces through driving characteristics. In Proceedings of the third ACM conference on Data and application security and privacy (CODASPY 13). ACM, New York, NY, USA, 293-300. DOI=10.1145/2435349.2435391 http://doi.acm.org/10.1145/2435349.2435391
    • Zimmerman, D. L., and C. Pavlik. 2008. Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data. Geographical Analysis 40.1, 52 (25).
  • Discussion
    Personal Web: micahaltman.com
    Privacy Tools for Sharing Research Data: privacytools.seas.harvard.edu/
    E-mail: micah_altman@alumni.brown.edu
    Twitter: @drmaltman