Open ended data

Characterizing data utility and arguing for the underlying requirements for sharing private data in an open-ended fashion.


  1. A case for “open-ended” data
     Srinath Srinivasa, Web Science Lab, IIIT Bangalore
  2. Open-data concerns
     Utilization of data in a social system is influenced by three primary concerns: Transparency, Privacy and Security.
     Open-data initiatives focus on data elements that promote transparency, and exclude data that infringes on privacy (PII) or is sensitive with respect to (national) security.
     [Figure: the three concerns: Transparency, Privacy, Security]
  3. Open-data concerns
     ● Data elements that are critical for transparency concerns are called “open data.”
     ● Data elements that can potentially compromise collective security, and have to be tightly controlled, are called “closed data.” These are typically managed in the form of shared secrets.
     ● Private data is critical to the safety and well-being of individuals. But it may sometimes need to be disseminated in an “open-ended” fashion (i.e. not in the control of the owner of the data).
     [Figure: Transparency: open data; Privacy: open-ended data; Security: closed data (shared secrets)]
  4. Open-ended data
     Private data that may need to be disseminated in an “open-ended” fashion.
     Open-ended means:
     ● The owner of the data may not have knowledge of all recipients
     ● The owner of the data may not be able to unilaterally control dissemination
     Examples:
     ● Dissemination of a person's Aadhaar details by organizations to different state and non-state stakeholders
     ● Sharing of medical records among hospitals
     ● Sharing of exam records among universities
     Open-ended data dissemination is critically dependent on the data dissemination framework and the credibility of its decisions.
  5. Regulations for private data dissemination
     EU GDPR
     ● Right to access
     ● Right to be forgotten
     ● Privacy by design
     ● Data protection officers
     ● Breach notification regulations
     ● Data portability rights
     Indian data protection act (white paper)
     ● Technology agnosticism
     ● Holistic application (uniformity of legal framework)
     ● Informed consent
     ● Data minimization (no soliciting extraneous data)
     ● Controller accountability
     ● Structured enforcement
     ● Deterrent penalties
  6. Characterizing Data Utility
     ● Context of Utility: Utility of data is typically bounded within specific contexts. Taken out of context, the data element(s) may lose their utility.
     ● Stakeholder capacity: Utility is not an objective characteristic of data, but a characteristic of the association between the data and the stakeholder capacity.
     ● Divergent Aggregation of Utility: A given collection of data elements may be aggregated in different ways for different utilitarian contexts. There is no “one” correct aggregation.
     ● Confounding of Utilities: The utilization of data by one stakeholder may (positively or negatively) impact other stakeholders.
  7. Characterizing Data Utility (Examples)
     ● Context of Utility: Applicability of 5% GST is limited to specific contexts (restaurants, not even catering).
     ● Stakeholder capacity: Data about JEE cut-off marks for admission may be useless to a layperson, but very useful to a student applying for engineering.
     ● Divergent Aggregation of Utility: Open data about weather can be utilized in different contexts for different purposes (agriculture, aviation, traffic management, etc.).
     ● Confounding of Utilities: An advertiser's use of data about a person’s medical condition may result in negative utility for that person (the Target pregnancy ad example).
  8. Many Worlds on a Frame (MWF)
     A knowledge representation framework for publication and open-ended dissemination of private data.
     Essential building blocks:
     ● Worlds
     ● Actors
     ● Resources
     [Figure: worlds XIIT, Raju and NIRF with a Role Table; resources tagged to:Role, from:Raju and from:XIIT]
  9. Many Worlds on a Frame
     Resource
     ● Refers to all forms of data elements that are published and consumed in a technology-agnostic fashion.
     Actor
     ● Refers to consumers or producers of data. May be a human user or an application. All actors have login credentials or access keys to enter the Frame.
     World
     ● Refers to a semantic boundary in which certain data are relevant, and can be published and consumed by legitimate actors in an appropriate capacity.
  10. Many Worlds on a Frame: Actors and Worlds
      ● Every actor has an associated world with the same name.
      ● Actors publish and consume data only from their own worlds.
      ● Data flow between worlds is managed by worlds “participating” or “playing roles” in other worlds.
      [Figure: actor Raju and world Raju; resource tagged to:XIIT]
  11. Many Worlds on a Frame: Participations
      ● A world participating in another world is said to be playing a “role” in the other world.
      ● Each Role definition exports an “Interface” that can be used to publish or consume data via that role.
      ● When data elements are published or accessed via a role, that operation is said to have taken place in the “capacity” of that role.
      Each world holds two tables:
      Role Table:       Role | Interface | Players
      Privileges Table: Role | Constraints | Privileges
  12. Many Worlds on a Frame: Participations Example
      ● XIIT participates in NIRF in the role of “Affiliate Institution”. Through this role, it can interact with NIRF data using the following interfaces: getRankData(), uploadApplication()
      ● XIIT also participates in NIRF in the role of “Mentor Institution”, through which it can access the following interfaces: getMembers(), uploadReview()
      ● XIIT can hence interact with data in the NIRF world in two capacities, Affiliate Institution and Mentor Institution, with different privileges.
      NIRF Role Table
      Role            | Interface                          | Players
      Affiliate Instt | getRankData(), uploadApplication() | XIIT
      Mentor Instt    | getMembers(), uploadReview()       | XIIT
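A minimal sketch of the NIRF role table from the example above, assuming a role table can be modelled as a mapping from role name to an interface and a set of players. The class and method names (`World`, `Role`, `capacities_of`, `interface_for`) are hypothetical and not part of MWF as presented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """One row of a role table: role name, exported interface, players."""
    name: str
    interface: frozenset
    players: set = field(default_factory=set)


@dataclass
class World:
    """Hypothetical model of a world holding only its role table."""
    name: str
    roles: dict = field(default_factory=dict)

    def add_role(self, name, interface):
        self.roles[name] = Role(name, frozenset(interface))

    def capacities_of(self, player):
        """Sorted names of the roles that `player` plays in this world."""
        return sorted(r.name for r in self.roles.values() if player in r.players)

    def interface_for(self, player, role_name):
        """Interface functions available to `player` in the given capacity."""
        role = self.roles[role_name]
        if player not in role.players:
            raise PermissionError(f"{player} does not play {role_name} in {self.name}")
        return role.interface


# The NIRF role table from the slide.
nirf = World("NIRF")
nirf.add_role("Affiliate Instt", ["getRankData", "uploadApplication"])
nirf.add_role("Mentor Instt", ["getMembers", "uploadReview"])
nirf.roles["Affiliate Instt"].players.add("XIIT")
nirf.roles["Mentor Instt"].players.add("XIIT")
```

Here `nirf.capacities_of("XIIT")` lists both capacities, and `nirf.interface_for("XIIT", "Mentor Instt")` exposes only the functions exported by that role.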
  13. Many Worlds on a Frame: Participations
      ● Every role has an associated set of “privileges” and “constraints”.
      ● Constraints are represented in the form of required participations. Example: the role “Affiliate Institution” in NIRF may have the constraint “Recognized Institution” in the world called UGC. That is, only worlds that are “Recognized Institutions” in UGC are eligible to play the role of “Affiliate Institution” in NIRF.
      ● The set of privileges covers various aspects of system operation, such as create worlds, edit worlds, add data, read data, delete data, represent worlds, grant privileges, etc.
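The constraint rule above can be sketched as an eligibility check, assuming constraints are stored as lists of (world, role) pairs that a candidate world must already play. The dictionaries, the function names and the world "YIIT" are illustrative assumptions, not part of the slides.

```python
# participations[world][role] is the set of worlds playing that role.
participations = {
    "UGC": {"Recognized Institution": {"XIIT"}},
    "NIRF": {"Affiliate Institution": set()},
}

# constraints[(world, role)] lists the (world, role) participations required
# before a candidate may take up that role.
constraints = {
    ("NIRF", "Affiliate Institution"): [("UGC", "Recognized Institution")],
}


def plays(candidate, world, role):
    """True if `candidate` currently plays `role` in `world`."""
    return candidate in participations.get(world, {}).get(role, set())


def is_eligible(candidate, world, role):
    """True if `candidate` satisfies every required participation."""
    required = constraints.get((world, role), [])
    return all(plays(candidate, w, r) for (w, r) in required)


def join(candidate, world, role):
    """Add `candidate` as a player of `role`, enforcing the constraints."""
    if not is_eligible(candidate, world, role):
        raise PermissionError(f"{candidate} is not eligible for {role} in {world}")
    participations[world][role].add(candidate)


join("XIIT", "NIRF", "Affiliate Institution")  # allowed: XIIT is recognized in UGC
```

A world such as the hypothetical "YIIT", which plays no role in UGC, would be rejected by `join`.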
  14. Many Worlds on a Frame: Representations
      ● Actors (users or application programs) are associated with their own worlds, which they represent fully.
      ● Based on the roles they play in other worlds, they may also represent those worlds in their participations.
      ● Example: Raju plays the role of Director in world XIIT. The Director role (highlighted in red on the slide) allows Raju to access the NIRF world in the capacity of “Mentor Instt” by acting as a representative of XIIT. Raju (the user or application program) now has access to the interfaces for “Mentor Instt” exported by NIRF.
      ● Bala, who plays the role of Dean at XIIT, can access NIRF in the capacity of “Affiliate Instt” by representing XIIT.
      XIIT Privileges Table
      Role     | Constraints | Privileges
      Admin    |             | :all
      Chairman |             | :represent(:all)
      Director |             | :represent(Mentor Instt, NIRF)
      Dean     |             | :represent(Affiliate Instt, NIRF)
      [Figure: Raju (to:XIIT) plays Director; Bala (to:XIIT) plays Dean]
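As a rough illustration of the XIIT privileges table above, the sketch below checks whether an actor may represent XIIT in a given capacity. The encoding of privileges as tuples and the function `can_represent` are assumptions made for this example only.

```python
ALL = ":all"

# XIIT privileges table from the slide: role -> list of privileges.
# A privilege is either ":all" or a ("represent", scope) pair, where scope
# is ":all" or a (capacity, target world) pair.
xiit_privileges = {
    "Admin": [ALL],
    "Chairman": [("represent", ALL)],
    "Director": [("represent", ("Mentor Instt", "NIRF"))],
    "Dean": [("represent", ("Affiliate Instt", "NIRF"))],
}

# Which role each actor plays in XIIT (from the slide's example).
xiit_players = {"Raju": "Director", "Bala": "Dean"}


def can_represent(actor, capacity, target_world):
    """True if `actor` may act for XIIT as `capacity` in `target_world`."""
    role = xiit_players.get(actor)
    if role is None:
        return False
    for privilege in xiit_privileges[role]:
        if privilege == ALL:
            return True
        kind, scope = privilege
        if kind == "represent" and scope in (ALL, (capacity, target_world)):
            return True
    return False
```

With this table, Raju can represent XIIT as “Mentor Instt” in NIRF but not as “Affiliate Instt”, and the reverse holds for Bala.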
  15. Many Worlds on a Frame
      Resource Tagging
      The simplest interface for a role consists of the get() and put() functions.
      ● The get() function for role_id r in world w fetches all resources in the target world w that are tagged to:r, and tags them locally as from:w.
      ● The put() function for role_id r uploads to the target world all resources that are locally tagged to:r.
      Bots
      Bots are virtual actors associated with a world that can represent the world in some or all of its roles. The function of a bot is to represent its world in all other worlds where that world is playing a role, by calling the interface functions.
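The get() and put() tagging rules above can be sketched as follows. This is a simplified model under stated assumptions: a world is just a dictionary from resource name to a set of tags, and put() tags uploaded resources as from:<source world> in the target, which the slides suggest in a figure but do not state. The file names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Hypothetical world: resource name -> set of tags like 'to:Mentor Instt'."""
    name: str
    resources: dict = field(default_factory=dict)


def get(local, target, role):
    """Fetch resources tagged to:<role> in `target`; tag them from:<target> locally."""
    fetched = []
    for resource, tags in target.resources.items():
        if f"to:{role}" in tags:
            local.resources.setdefault(resource, set()).add(f"from:{target.name}")
            fetched.append(resource)
    return sorted(fetched)


def put(local, target, role):
    """Upload resources locally tagged to:<role>; tag them from:<local> in `target`
    (the from: tag on upload is an assumption)."""
    uploaded = []
    for resource, tags in local.resources.items():
        if f"to:{role}" in tags:
            target.resources.setdefault(resource, set()).add(f"from:{local.name}")
            uploaded.append(resource)
    return sorted(uploaded)


nirf = World("NIRF", {
    "members.csv": {"to:Mentor Instt"},
    "ranks.csv": {"to:Affiliate Instt"},
})
xiit = World("XIIT", {"review.pdf": {"to:Mentor Instt"}})

fetched = get(xiit, nirf, "Mentor Instt")
uploaded = put(xiit, nirf, "Mentor Instt")
```

A bot for XIIT could call these two functions periodically for every role that XIIT plays.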
  16. Many Worlds on a Frame
      Worlds can be located in or contained in another world; this is different from playing a role.
      Containment has the following semantics. If world w is contained in world c, then:
      ● All role players of c are entitled to at least the same roles and privileges in w
      ● If world c is inaccessible or invisible to actor a, then w and all worlds contained in c are also inaccessible or invisible to a.
      For any installation of MWF, there is an overarching container world (usually called UoD or Universe of Discourse).
      [Figure: worlds IISc and NIAS]
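A small sketch of the two containment rules above. It assumes, for illustration only, that NIAS is contained in IISc and IISc in UoD; the actors, roles and the `hidden` set are likewise hypothetical.

```python
# child world -> its container (assumed hierarchy: UoD > IISc > NIAS)
parent = {"IISc": "UoD", "NIAS": "IISc"}

# (actor, world) pairs for worlds explicitly hidden from an actor
hidden = {("Raju", "IISc")}

# (actor, world) -> roles the actor plays directly in that world
roles = {
    ("Bala", "IISc"): {"Admin"},
    ("Bala", "NIAS"): {"Reader"},
}


def containers(world):
    """All worlds that contain `world`, innermost first."""
    chain = []
    while world in parent:
        world = parent[world]
        chain.append(world)
    return chain


def is_visible(actor, world):
    """A world is visible only if neither it nor any container is hidden."""
    return all((actor, w) not in hidden for w in [world] + containers(world))


def effective_roles(actor, world):
    """Roles held in `world` plus roles inherited from every container."""
    result = set()
    for w in [world] + containers(world):
        result |= roles.get((actor, w), set())
    return result
```

Hiding IISc from Raju also hides NIAS from him, while Bala's Admin role in IISc carries over to NIAS.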
  17. MWF Grid
      ● An MWF grid is created over multiple installations or “sites”.
      ● The main site has the UoD, which is not contained in any other world.
      ● Every other site (called a grid node) has its own top-most container world, which is itself contained in one of the existing worlds of an existing site.
      [Figure: main site with UoD; grid node with top-most world W]
  18. Provenance
      All member sites of an MWF grid are part of a distributed ledger system (blockchain) that maintains a copy of the transaction logs.
      Each transaction entry contains at least the following information:
      ● Nature of the transaction
      ● World(s) involved in the transaction
      ● Resource(s) involved in the transaction
      ● Actor(s) involved in the transaction
      ● Capacity in which the transaction was performed
      ● Outcome of the transaction
      [Figure: blockchain illustration. Image source: Wikipedia]
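The transaction entry fields listed above could be recorded in a tamper-evident log along the following lines. This is only a single-node hash chain using SHA-256, not a distributed ledger: replication and consensus across grid sites are out of scope, and the field names and helper functions are assumptions.

```python
import hashlib
import json


def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log, nature, worlds, resources, actors, capacity, outcome):
    """Append a transaction entry linked to the previous entry's hash."""
    entry = {
        "nature": nature,
        "worlds": worlds,
        "resources": resources,
        "actors": actors,
        "capacity": capacity,
        "outcome": outcome,
        "prev": log[-1]["hash"] if log else None,
    }
    entry["hash"] = entry_hash(entry)  # computed before the "hash" key exists
    log.append(entry)
    return entry


def verify(log):
    """True if every entry's hash and back-link are intact."""
    prev = None
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append(log, "get", ["XIIT", "NIRF"], ["members.csv"], ["Raju"], "Mentor Instt", "ok")
append(log, "put", ["XIIT", "NIRF"], ["review.pdf"], ["Raju"], "Mentor Instt", "ok")
```

Changing any field of an earlier entry makes `verify(log)` fail, which is the property that breach notification or accountability layers could build on.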
  19. MWF and GDPR
      ● Right to access
        ○ Actors publish data in their own worlds and provide access by means of playing roles. (Further dissemination of their data is currently visible only via transaction logs.)
      ● Right to be forgotten
        ○ While worlds can discontinue their roles, MWF does not (as yet) account for the right to be forgotten for older data
      ● Privacy by design
        ○ Check
      ● Data protection officers
        ○ Implemented by means of roles
      ● Breach notification regulations
        ○ Can be implemented on top of provenance logging
      ● Data portability rights
        ○ Applies naturally to MWF, since all data pertinent to a person are managed in their world and can be ported based on their participations
  20. MWF and Indian Data Protection Act
      ● Technology agnosticism
        ○ Check (MWF is a formal, technology-agnostic model)
      ● Holistic application
        ○ Check (common framework for different kinds of worlds)
      ● Informed consent
        ○ Check (user data is stored in the user's world, and shared based on participation through informed consent)
      ● Data minimization (no soliciting extraneous data)
        ○ Check (role interfaces)
      ● Controller accountability
        ○ Check (enforceable by logging capacity and provenance)
      ● Structured enforcement
        ○ Check (world containment provides scalable semantics for structured enforcement and jurisdictions)
      ● Deterrent penalties
        ○ Can be implemented as a layer over MWF
  21. Conclusions
      ● The three concerns of data sharing (Transparency, Privacy and Security) lead to three modalities of openness: Open, Open-ended and Closed data
      ● MWF as a scalable formalism for open-ended dissemination of data
      Current projects implementing MWF:
      ● RootSet
        ○ Single-node implementation of a deprecated version of MWF
      ● Sandesh
        ○ Single-node MWF as an underlying formalism for semantic integration of open data
      ● Open City
        ○ Ongoing PoC project using MWF as a data-exchange platform for smart city implementations