Data Commons Garvan - 2016

Presentation at the Garvan Institute, Sydney Australia - May 2016

Published in: Science

  1. The Data Commons: Digital Ecosystems for Sharing and Analyzing Biomedical Big Data. Vivien Bonazzi, Ph.D., Senior Advisor for Data Science, Office of Data Science (ADDS), National Institutes of Health
  2. Let's Talk About Biomedical Big Data
  3. What Makes Big Data Big? VOLUME, VELOCITY, VARIETY, VERACITY
  4. It's a signal of the coming Digital Economy. DATA has VALUE. DATA is CENTRAL to the Digital Economy. But it's more than this…
  5. An economy characterized by using data to gain a business advantage (yes, institutions are a business). Organizations that are not born digital will be at a disadvantage in the new economy.
  6. Organizations will be defined by their digital assets. Scientific digital assets: data, software, workflows.
  7. The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise.
  8. Make data the currency of an organization, usable in a digital ecosystem – Data Commons
  9. The problem with biomedical data. Digital assets include data.
  10. Challenges of Biomedical Data: The journal article is the end goal. Data is a means to an end (low value). Data is not FAIR (Findable, Accessible, Interoperable, Reusable). Limited e-infrastructures to
  11. The Problem With Biomedical DATA: https://www.youtube.com/watch?v=N2zK3sAtr-4
  12. What’s Changing?
  13. FAIR principles drive data to become the currency. Policies that promote data sharing via FAIR help change the culture.
  14. We also need a digital ecosystem that allows transactions to occur on FAIR data at scale.
  15. The Data Commons is a platform that fosters the development of a digital ecosystem.
  16. The Data Commons: a platform that fosters development of a digital ecosystem
      • Treats products of research (data, software, methods, papers, etc.) as digital assets (objects)
      • Digital objects need to conform to FAIR principles
      • Digital objects exist in a shared virtual space: find, deposit, manage, share and reuse digital assets
      • Enables interactions between producers and consumers of digital assets
      • Gives currency to digital assets and the people who develop and support them
  17. The Data Commons is a platform? that fosters the development of a digital ecosystem
  18. “A platform is a plug and play model that allows multiple participants (producers and consumers) to connect to it, interact with each other and create value” Sangeet Paul Choudary – Platform Scale
  19. “A lot of what we see today uses a platform approach” Sangeet Paul Choudary – Platform Scale
  20. The goal of a Data Commons Platform is to enable interactions between producers and consumers. Sangeet Paul Choudary – Platform Scale
  21. To understand the Data Commons Platform (and how it works for biomedical data), we need to use a platform stack to help visualize the concept.
  22. Platforms have 3 layers. Sangeet Paul Choudary – Platform Scale
  23. NIH Data Commons - Platform Stack (https://datascience.nih.gov/commons). Layers shown: Technology, Data, Network/marketplace
  24. NIH Data Commons - Platform Stack (https://datascience.nih.gov/commons)
  25. NIH Data Commons: Digital Asset Compliance (Making things FAIR)
      Initial Phase
      • Unique digital object identifiers resolvable to the original authoritative source
      • Machine readable
      • A minimal set of searchable metadata
      • Clear access rules (especially important for human subjects data)
      • An entry (with metadata) in one or more indices
      Future Phases
      • Standard, community based unique digital object identifiers
      • Conform to community approved standard metadata and ontologies for enhanced searching
      • Digital objects accessible via open standard APIs
  26. Data Commons Platform drives digital ecosystem
  27. The NIH Data Commons Pilot
  28. The NIH Data Commons Pilot: Co-location of large and/or highly utilized NIH funded data with storage and computing infrastructure, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community. Investigators will be able to collaborate and share digital objects within this environment and connect with others.
  29. NIH Nascent Commons Pilots
  30. An NIH Wide Data Commons Pilot
  31. Indexing
  32. Indexing
  33. Indexing. Authorization/authentication layer
  34. Considerations
      • Metrics: understanding and accounting of data usage patterns
      • Cost: cloud storage, pay-for-use cloud compute (NIH credits)
      • Hybrid clouds: mix of research and commercial clouds
      • Connecting: interoperability with other Commons, clouds
      • Consent: reconsenting data, dynamic consents
      • Standards: metadata, UIDs, APIs
  35. An Australian Commons Experiment?
  36. A Garvan Data Commons Platform?
      • Garvan DATA
      • NCI + Cloud
      • Analysis tools (incl. 3rd party)
      • Apps Store
      • Community: Research, Clinical, Public
      • API connectivity with other Commons
  37. An Australian Data Commons?
      • Australian DATA: Flora and Fauna
      • Commercial Cloud (NCI)
      • Analysis tools (incl. 3rd party)
      • Apps Store
      • Community: Research, Clinical, Public
      • API connectivity with other Commons
  38. “To achieve great things, two things are needed: a plan and not quite enough time” Leonard Bernstein
  39. Thank you
      • ADDS Office: Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
      • NCBI: Jim Ostell, David Lipman, George Komatsoulis
      • NHGRI: Valentina di Francesco, Kevin Lee, Eric Green
      • NIGMS: John Lorsch, Susan Gregurik
      • CIT: Andrea Norris, Debbie Sinmao, Stacy Charland
      • NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen, Ian Fore
      • NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni
      • The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
      • Trans NIH BD2K Executive Committee & Working groups
      • Many biomedical researchers, cloud providers, IT professionals
      • John Mattick and the Garvan Institute
  40. Stay in Touch: QR Business Card, LinkedIn, @Vivien.Bonazzi, Slideshare, Blog (Coming soon!). Vivien Bonazzi, bonazziv@mail.nih.gov
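
Illustration for slide 25: the initial-phase compliance criteria for digital objects could be checked roughly as follows. This is a minimal sketch, not the NIH Data Commons implementation. The `DigitalObject` class, its field names, the `REQUIRED_METADATA` set, and the example identifier and URL are all hypothetical; the slides do not specify a schema or a minimal metadata set.

```python
from dataclasses import dataclass, field

# Hypothetical minimal metadata set; the slides do not name specific fields.
REQUIRED_METADATA = ("title", "creator", "description")


@dataclass
class DigitalObject:
    """A research product (data, software, workflow) treated as a digital asset."""
    identifier: str          # unique ID, assumed to resolve to source_url
    source_url: str          # original authoritative source
    machine_readable: bool   # whether the object is in a machine-readable format
    metadata: dict = field(default_factory=dict)
    access_rule: str = ""    # illustrative values: "open", "controlled"
    indices: list = field(default_factory=list)  # indices holding an entry


def initial_phase_gaps(obj: DigitalObject) -> list:
    """Return the initial-phase criteria (slide 25) that the object does not meet."""
    gaps = []
    if not (obj.identifier and obj.source_url):
        gaps.append("unique identifier resolvable to the authoritative source")
    if not obj.machine_readable:
        gaps.append("machine readable")
    missing = [key for key in REQUIRED_METADATA if not obj.metadata.get(key)]
    if missing:
        gaps.append("minimal searchable metadata (missing: " + ", ".join(missing) + ")")
    if not obj.access_rule:
        gaps.append("clear access rules")
    if not obj.indices:
        gaps.append("entry in one or more indices")
    return gaps


def is_compliant(obj: DigitalObject) -> bool:
    """True when every initial-phase criterion is met."""
    return not initial_phase_gaps(obj)


# Usage: an object that lacks a description and has not been indexed yet.
example = DigitalObject(
    identifier="example:0001",
    source_url="https://example.org/objects/0001",
    machine_readable=True,
    metadata={"title": "Example cohort", "creator": "Example Lab"},
    access_rule="controlled",
)
for gap in initial_phase_gaps(example):
    print("Not yet met:", gap)
```

Reporting the unmet criteria, rather than a bare pass or fail, fits the phased approach on the slide: a depositor can see what remains before an object counts as compliant. The future-phase criteria (community identifiers, standard ontologies, open APIs) are left out of the sketch.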
