The NIH Data Commons
Digital Ecosystems for using and sharing FAIR Data
Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science
Office of Data Science (ADDS)
National Institutes of Health
The Data Commons is a platform that fosters the development of a digital ecosystem
That digital ecosystem allows transactions to occur on FAIR data at scale
The Data Commons is a platform that fosters development of a digital ecosystem
Treats products of research (data, software, methods, papers, etc.) as digital assets (objects)
Digital objects need to conform to FAIR principles
- Findable, Accessible, Interoperable, Reusable
Digital objects exist in a shared virtual space (initial)
- Find, Deposit, Manage, Share and Reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support them
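As a rough illustration of the ideas on this slide, the sketch below models a digital object and a shared space that supports Deposit and Find. Everything here is an assumption made for illustration: the `DigitalObject` fields, the `SharedSpace` class, and the example identifiers are hypothetical and are not part of any NIH Data Commons API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """A research product (data, software, method, paper) treated as a digital asset."""
    uid: str        # hypothetical unique identifier
    kind: str       # e.g. "data", "software", "method", "paper"
    title: str
    producer: str
    keywords: List[str] = field(default_factory=list)


class SharedSpace:
    """Toy in-memory stand-in for the shared virtual space (illustrative only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def deposit(self, obj: DigitalObject) -> None:
        # Reject duplicate identifiers so every object stays uniquely findable.
        if obj.uid in self._objects:
            raise ValueError(f"duplicate identifier: {obj.uid}")
        self._objects[obj.uid] = obj

    def find(self, keyword: str) -> List[DigitalObject]:
        # Case-insensitive exact match against each object's keywords.
        needle = keyword.lower()
        return [o for o in self._objects.values()
                if needle in (k.lower() for k in o.keywords)]


# Example: two producers deposit objects, a consumer finds them by keyword.
space = SharedSpace()
space.deposit(DigitalObject("obj-001", "data", "Example cohort variants",
                            "Lab A", ["genomics", "variants"]))
space.deposit(DigitalObject("obj-002", "software", "Example variant caller",
                            "Lab B", ["genomics", "pipeline"]))
hits = space.find("Genomics")
```

A real platform would add persistent identifiers, access control, and rich metadata; the point here is only that Producers deposit objects and Consumers find them through shared descriptive metadata.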
To understand the Data Commons Platform (and how it works for biomedical data), we need to use a Platform stack to help visualize the concept
NIH Data Commons - Platform Stack
https://datascience.nih.gov/commons
Digital Market Place, Bazaar, Community
Sangeet Paul Choudary – Platform Scale
Network/Community
Market Place
Technology
Data
NIH Data Commons Pilots
Current Data Commons Pilots
Reference Data Sets
Commons Stack Pilots
Cloud Credit Model
Resource Search & Index
• Explore feasibility of the Commons Platform (FW)
• Provide data objects to populate the Commons
• Facilitate collaboration and interoperability
• Provide access to cloud (IaaS) and PaaS/SaaS via credits
• Connecting credits to NIH grants
• Making large and/or high-value NIH-funded data sets and tools accessible in the cloud
• Developing Data & Software Indexing methods
• Leveraging BD2K efforts (bioCADDIE et al.)
• Collaborating with external groups
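To make the Resource Search & Index pilot more concrete, here is a minimal sketch of one common indexing approach, an inverted index over free-text descriptions. This is an assumed technique chosen for illustration, not the method used by the pilots or by bioCADDIE; the record IDs and descriptions are made up.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word in a description to the IDs of records containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record_id, description in records.items():
        for word in description.lower().split():
            index[word].add(record_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of records whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical data sets and tools described by short metadata strings.
records = {
    "ds-1": "human genome sequencing reads",
    "ds-2": "mouse genome assembly",
    "tool-1": "genome alignment software",
}
index = build_index(records)
all_genome = search(index, "genome")
mouse_only = search(index, "Mouse genome")
```

The same index covers both data sets and software here, which mirrors the slide's goal of indexing data and tools together so they can be found in one place.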
Data Commons Pilot – connecting the pieces
Co-location of large and/or highly utilized NIH-funded data on the cloud, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate and share digital objects within this environment and connect with others.
An NIH Wide Data Commons Pilot
[Diagram labels: Data Lake; Indexing; New large data projects; Messy data; Data Pond; Authorization/authentication layer]
Considerations
Metrics - understanding and accounting of data usage patterns
Cost - cloud storage, pay-for-use cloud compute (NIH credits)
Hybrid Clouds - mix of research and commercial clouds
Connecting - interoperability with other Commons and clouds
Consent - reconsenting data, dynamic consents
Standards - metadata, UIDs, APIs
A digital economy is characterized by making data a central currency to gain a business advantage
Organizations that are not born digital will be at a disadvantage in the new economy
Thank you
• ADDS Office
- Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurik
• CIT: Andrea Norris, Debbie Sinmao
• NCI: Warren Kibbe, Tony Kerlavage, Tanja Davidsen, Ian Fore
• NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni, Alison Yao
• The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
• Trans NIH BD2K Executive Committee & Working groups
• Many biomedical researchers, cloud providers, IT professionals
Stay in Touch
QR Business Card
LinkedIn
@VivienBonazzi
Slideshare
Blog
(Coming soon!)
Vivien Bonazzi
bonazziv@mail.nih.gov

NIH Data Commons - Note: Presentation has animations

Editor's Notes

  • #6 Framework helps visualize the concept of the platform