
Open Data is Not Enough: Making Data Sharing Work


Latest version of this talk, presented at the American Chemical Society meeting

Published in: Data & Analytics
  • Data sharing among parties requires a zero-trust decentralized approach that prevents any party from misusing securely commingled data, but allows all parties to perform analytics on same. This involves removing any user access to the raw commingled data that is the root-cause of data breach, risk and misuse.

  1. 1. Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License. Open Data is Not Enough: Making Data Sharing Work. Mark A. Parsons (ORCID 0000-0002-7723-0950), Secretary General. American Chemical Society, San Diego, California, USA, 13 March 2016.
  2. 2. All of society’s grand challenges require diverse (often large) data to be shared and integrated across cultures, scales, and technologies.
  3. 3. Research Data Alliance Vision Researchers and innovators openly share data across technologies, disciplines, and countries to address the grand challenges of society. Mission RDA builds the social and technical bridges that enable open sharing of data.
  4. 4. Dynamics of Infrastructure (Edwards et al. 2007, Understanding Infrastructure: Dynamics, Tensions, and Design). • Infrastructures become “ubiquitous, accessible, reliable, and transparent” as they mature. • Systems → Networks → Inter-networks • First, “system-building, characterized by the deliberate and successful design of technology-based services.” • Then, “technology transfer across domains and locations results in variations on the original design, as well as the emergence of competing systems.” • Finally, “a process of consolidation characterized by gateways that allow dissimilar systems to be linked into networks.”
  5. 5. Not what, but When is infrastructure?
  6. 6. Not what, but When and Who is infrastructure?
  7. 7. Bridges and Gateways. “Gateways are often wrongly understood as ‘technologies,’ i.e. hardware or software alone. A more accurate approach conceives them as combining a technical solution with a social choice, i.e. a standard, both of which must be integrated into existing users’ communities of practice. Because of this, gateways rarely perform perfectly.” (Edwards et al. 2007)
  8. 8. Infrastructure is Relationships, interactions, and connections between people, technologies, and institutions
  9. 9. “Create - Adopt - Use” (in 12-18 months). Figure labels: Systems Interoperability; Adopted Policy; Sustainable Economics; Common Types, Standards, Metadata; Adopted Community Practice; Training, Education, Workforce. Slide credit: Fran Berman, Research Data Alliance. Traffic image: Mike Gonzalez.
  10. 10. Shared Principles • Openness • Consensus • Balance • Harmonization • Community Driven • Non-profit
  11. 11. Solving the problem must include adopters in the process. Image courtesy
  12. 12. Open problem solving is key. Figure courtesy
  13. 13. No defined architecture. Architecture figure courtesy
  14. 14. The RDA Community: 3700+ members from 110 countries (February 2016). 60+ Working and Interest Groups.
      Members by region: South America 1%; North America 34%; Europe 49%; Australasia 4%; Asia 9%; Africa 3%.
      Members by organizational type (Feb 2016): Press & Media 22; Policy/Funding Agency 58; Large Enterprise 85; IT Consultancy/Development 119; Small and Medium Enterprise 212; Other 198; Government/Public Services 583; Academia/Research 2447; TOTAL 3724.
      Membership by quarter: May - July 392; Aug - Oct 991; Nov - Jan 1274; Feb - Apr 1656; May - July 2048; Aug - Oct 2404; Nov - Jan 2636; Feb - Apr 2881; May - July 3126; Aug - Oct 3434; Nov - Jan 3698; Feb - Apr 3724.
  15. 15. RDA Organisational Members; RDA Affiliate Members. RDA Organisational & Affiliate members represent the interests of RDA’s organisational members and ensure that their input and needs play a role in guiding the programs and activities of the RDA.
  16. 16. RDA: Accelerate Data Sharing and Interoperability Across Cultures, Communities, Scales, Technologies. ▪ Technical parts of the data engine: data type registries reference model; wheat data interoperability framework. ▪ Rules of the road: common agreement on data citation; common practice for data repositories; principles of legal interoperability. ▪ Better drivers: summer schools in data science and cloud computing in the developing world (with CODATA); active data management plan development and monitoring. Figure labels: Policy and Practice; Systems Interoperability; Sustainable Economics; Common Types, Standards, Metadata; Training, Education, Workforce. Slide credit: Fran Berman, Research Data Alliance.
  17. 17. Working Glocally: Bridging across scales. Glocalization “means the simultaneity, the co-presence, of both universalizing and particularizing tendencies.” (Roland Robertson) Glocalism is playing at multiple scales at once.
  18. 18. The Wheat Data Interoperability WG. Active members: Alaux Michael (INRA, France), Aubin Sophie (INRA, France), Arnaud Elizabeth (Bioversity, France), Baumann Ute (Adelaide Uni, Australia), Buche Patrice (INRA, France), Cooper Laurel (Planteome, USA), Fulss Richard (CIMMYT, Mexico), Hologne Odile (INRA, France), Laporte Marie-Angélique (Bioversity, France), Larmand Pierre (IRD, France), Letellier Thomas (INRA, France), Lucas Hélène (INRA, France), Pommier Cyril (INRA, France), Protonotarios Vassilis (Agro-Know, Greece), Quesneville Hadi (INRA, France), Shrestha Rosemary (INRA, France), Subirats Imma (FAO of the United Nations, Italy), Aravind Venkatesan (IBC, France), Whan Alex (CSIRO, Australia). Co-chairs: Esther Dzalé Yeumo Kaboré (INRA, France), Richard Allan Fulss (CIMMYT, Mexico). Aims: contribute to the improvement of wheat-related data interoperability by (1) building a common interoperability framework (metadata, data formats, and vocabularies) and (2) providing guidelines for describing, representing, and linking wheat-related data. Contributors. Sponsors. Slide courtesy Esther Dzalé.
  19. 19. The deliverables. Guidelines: • Data exchange formats. Example: VCF (Variant Call Format) for sequence variation data, GFF3 for genome annotation data, etc. • Data description best practices: consistent use of ontologies, consistent use of external database cross-references. • Data sharing best practices: share data matrices along with relevant metadata (example: a trait along with its method, units, and scales, or environmental ones). • Useful tools and use cases that highlight data format and vocabulary issues. A portal of wheat-related ontologies and vocabularies, which allows access to the ontologies and vocabularies through APIs. A prototype implementation of use cases of wheat data integration within the AgroLD (Agronomic Linked Data) tool. Slide courtesy Esther Dzalé.
  20. 20. RDA Chemistry Data Interest Group • Involves International Union of Pure & Applied Chemistry • Building connections to instrument makers • Connecting to other RDA groups including Materials Data and Data Citation • Planning Working Groups
  21. 21. Some themes amidst the difference 1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything! 2. Certifying Trust in assertions, evidence, organisations, processes… 3. The value of Conversations, Relationships, and Mediation — an agile network effect.
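One concrete case of a persistent identifier for people is the ORCID iD, such as the one on the title slide (0000-0002-7723-0950). Its last character is a check digit computed with the ISO 7064 MOD 11-2 scheme, so a mistyped iD can often be caught before any lookup. The sketch below is illustrative only; the function names are made up for this example and are not part of any RDA or ORCID software.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    body, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_digit(body) == check


if __name__ == "__main__":
    # The iD from the title slide of this talk.
    print(is_valid_orcid("0000-0002-7723-0950"))
```

A check digit only catches transcription errors. Whether an identifier can be trusted to point at the right person or object still depends on the registry and community behind it, which is the second theme on the slide.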
  22. 22. An Area of Convergence and Agreement. Internet Domain: nodes with IP numbers; packages being exchanged; standardized protocols. Data Domain: objects with PID numbers; objects being exchanged; standardized protocols. Slide courtesy P. Wittenberg from L. Lannom from D. Clark
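The analogy on this slide can be sketched in a few lines: just as a name lookup returns a network address, a PID lookup returns the current location of a data object, so the identifier stays stable when the object moves. This is a toy in-memory model under stated assumptions, not a real resolver. The class names, the `pid:demo/0001` identifier, and the `example` URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """What a PID resolves to: a current location plus descriptive metadata."""
    location: str
    metadata: dict = field(default_factory=dict)


class PidRegistry:
    """Toy registry mapping persistent identifiers to records (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def register(self, pid: str, location: str, **metadata: str) -> None:
        # A PID is assigned once and never reused for a different object.
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        self._records[pid] = PidRecord(location, dict(metadata))

    def move(self, pid: str, new_location: str) -> None:
        # The object changes location; the identifier does not change.
        self._records[pid].location = new_location

    def resolve(self, pid: str) -> PidRecord:
        # Raises KeyError for an unknown PID.
        return self._records[pid]


if __name__ == "__main__":
    registry = PidRegistry()
    registry.register("pid:demo/0001", "https://repo-a.example/data/1", title="demo")
    registry.move("pid:demo/0001", "https://repo-b.example/archive/1")
    print(registry.resolve("pid:demo/0001").location)
```

The design choice worth noting is the indirection: citations and links carry only the PID, and the registry is the single place that has to be updated when a repository changes.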
  23. 23. Some themes amidst the difference 1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything! 2. Certifying Trust in assertions, evidence, organisations, processes… 3. The value of Conversations, Relationships, and Mediation — an agile network effect.
  24. 24. Increasing Complexity of Mediation From: C. Borgman, 2008, NSF Cyberlearning Report
  25. 25. Some themes amidst the difference 1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything! 2. Certifying Trust in assertions, evidence, organisations, processes… 3. The value of Conversations, Relationships, and Mediation — an agile network effect. Trust
  26. 26. Some amateur thoughts on trust and sharing and infrastructure. • When do we need to certify trust? Do we? • We must preserve the freedom to tinker. • Build in decentralization where possible. Any centralization must be community governed. • Trust is built through: shared experience (e.g., RDA Plenaries); shared perspectives (RDA is a forum for engagement and constructive disagreement); actual reuse and adoption (in RDA, consensus is defined through use); sustained performance (RDA seeks to build a broad coalition of international support).
  27. 27. Getting involved. Individuals: ✓ Observers ✓ Contributors ✓ Drivers. Organisations: ✓ Insight ✓ Adopt ✓ Drive. National level: ✓ Coordination & Knowledge Exchange, Strategy and/or Implementation. • Members • WGs-IGs-BoFs • Requests for Comments • Plenaries • Member • WGs-IGs-BoFs • RfCs • Funded projects • Adoption / Uptake • Papers & Events • Meetings & Fora • Training & Workshops • Uptake pilots
  28. 28. 12-16 September 2016 in Denver, Colorado, USA
  29. 29. Info: @resdatall

  30. 30. RDA Interest (IG) and Working Groups (WG) by Focus 1
  31. 31. RDA Interest (IG) and Working Groups (WG) by Focus 2