“ELIXIR and Open Data: View from an ELIXIR Node”, presentation given by Barend Mons, Scientific Director of NBIC, at the ELIXIR Launch event, 18 December 2013

Transcript

  • 1. “ELIXIR and Open Data: View from an ELIXIR Node”. Barend Mons, Prof. Biosemantics, LUMC; Scientific Director NBIC; Head of ELIXIR Node. 18 December 2013, ELIXIR Launch, Brussels. European Life Sciences Infrastructure for Biological Information, www.elixir-europe.org
  • 2. Outline
      • The Dutch Node
      • Data in the eScience era: Pattern Recognition and Excavation
      • Data Collection, Archiving and Reduction
      • Why is ELIXIR important for Open Data?
      • What are the needs of clinical institutes and industry: Open and Managed Data
      • Why training as part of ELIXIR?
  • 3. Data interoperability and exchange; Compute and storage infrastructure services; Training & Education. 3 Nodes and a Hub in NL (DTL) > Node in [diagram]
  • 4. The Data cycle in eScience. Data Stewardship covers the entire data cycle [diagram]
  • 5. Organism Cells and Organs
  • 6. The Data Challenge
      • Computer speed and storage capacity are doubling every 18 months, and this rate is steady
      • DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue doing so for this decade (Guy Cochrane, ENA, EMBL-EBI)
      Proper data stewardship and analysis may be THE limiting factor in eScience
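As a rough illustration of why slide 6 calls stewardship the limiting factor, the sketch below compares the two doubling rates quoted there. It assumes idealised exponential growth and takes 7 months as a midpoint of the 6-8 month range; the function names and default values are illustrative choices, not something from the talk.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


def data_to_capacity_gap(months: float,
                         capacity_doubling: float = 18.0,  # slide 6: compute and storage
                         data_doubling: float = 7.0) -> float:  # assumed midpoint of 6-8
    """How much faster sequence data has grown than capacity after `months`."""
    return growth_factor(months, data_doubling) / growth_factor(months, capacity_doubling)


if __name__ == "__main__":
    for years in (1, 3, 5, 10):
        gap = data_to_capacity_gap(12 * years)
        print(f"after {years:2d} years: data has outgrown capacity by a factor of {gap:,.1f}")
```

Under these assumptions the gap between data volume and capacity itself doubles roughly every 11.5 months (126 / 11), which is the arithmetic behind the slide's warning.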
  • 7. Simplified eScience. [diagram: All Legacy information; New dataset; New Insights; User] The Goal is Knowledge Discovery, not Data Collection
  • 8. X AREAL SURVEY, DEEP EXCAVATION. Pattern Recognition in Open Data and detailed Excavation should be separated
  • 9. The Explicitome: 10^14 individual explicit associations. How do we discover patterns in “Ridiculograms”?
  • 10. The Semantic Web approach to interoperability: n identical assertions with “n” different provenances become one Cardinal Assertion. The Unique Explicitome: 10^11 Cardinal Assertions
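One way to read slide 10 is that assertions stating the same thing are stored once, with their differing provenances kept alongside. A minimal sketch of that idea follows, assuming an assertion is a subject-predicate-object triple plus a provenance string. The data model, the function name and the example identifiers are hypothetical and are not taken from the talk or from any particular Semantic Web library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Assertion = Tuple[str, str, str, str]  # triple plus a provenance label


def collapse_to_cardinal(assertions: Iterable[Assertion]) -> Dict[Triple, Set[str]]:
    """Group identical triples into one entry and collect their provenances."""
    cardinal: Dict[Triple, Set[str]] = defaultdict(set)
    for subject, predicate, obj, provenance in assertions:
        cardinal[(subject, predicate, obj)].add(provenance)
    return dict(cardinal)


if __name__ == "__main__":
    # Made-up example data: the same association reported by three sources.
    raw = [
        ("gene:A", "associated_with", "disease:X", "paper:1"),
        ("gene:A", "associated_with", "disease:X", "paper:2"),
        ("gene:A", "associated_with", "disease:X", "database:3"),
        ("gene:B", "associated_with", "disease:X", "paper:2"),
    ]
    for triple, sources in collapse_to_cardinal(raw).items():
        print(triple, "stated by", len(sources), "source(s)")
```

Keying on the triple alone means that n repeated statements shrink to a single entry while the provenance set still records every place the statement came from.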
  • 11. We publish about fewer than a million LS concepts: 10^6 concept clusters (Knowlets)
  • 12. Zipping the Explicitome: 10^14 individual explicit associations → 10^11 CAs → 5 × 10^5 Knowlets. ≈99.999996% reduction of infoburden
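The size of each “zipping” step on slide 12 can be checked with a few lines of arithmetic. The sketch below takes the slide's three counts at face value; the helper name is illustrative. The overall result lands in the same 99.9999…% range as the slide's rounded figure, and the exact digits depend on which counts are used (slide 11 quotes 10^6 Knowlets, slide 12 quotes 5 × 10^5).

```python
def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (1.0 - after / before) * 100.0


# Counts as quoted on slide 12.
ASSOCIATIONS = 1e14         # individual explicit associations
CARDINAL_ASSERTIONS = 1e11  # unique (cardinal) assertions
KNOWLETS = 5e5              # concept clusters

if __name__ == "__main__":
    steps = [
        ("associations -> cardinal assertions", ASSOCIATIONS, CARDINAL_ASSERTIONS),
        ("cardinal assertions -> Knowlets", CARDINAL_ASSERTIONS, KNOWLETS),
        ("associations -> Knowlets (overall)", ASSOCIATIONS, KNOWLETS),
    ]
    for label, before, after in steps:
        print(f"{label}: {reduction_percent(before, after):.7f}% reduction")
```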
  • 13. In silico knowledge discovery for the millions. [cycle diagram: experimentation; In cerebro rationalisation and confirmational reading; Enrichment of the explicitome; In silico hypothesis generation] Reasoning takes place on aggregated and zipped data
  • 14. The Implicitome: > 1 M hypotheses hidden in the implicitome
  • 15. eScience & ELIXIR. [diagram: All Legacy information; New dataset; New Insights; User] The Goal is Knowledge Discovery, not Data Collection
  • 16. The Role of ELIXIR: Open versus Managed. [diagram labels: Clinical Data sets; Tools and standards for interoperability; Public Data sets; All Legacy information; Private Data sets; Data Collection(s) Anywhere; ELIXIR; Data sets everywhere]
  • 17. Not only “hardware”. For Big Data to become huge, however, there are still hurdles to leap. For one thing, the tools to analyse data are not yet good enough. And people with the skills to analyse data are scarce and will become scarcer. By 2018 there will be a “talent gap” of between 140,000 and 190,000 people, Only three things count: Experts, Experts, Experts
  • 18. Vision (personal, not necessarily ELIXIR)
      • Data Collections are not a goal in themselves
      • They ultimately serve Knowledge Discovery
      • E-datastewardship in a time of plenty is thus also data zipping
      • E-datastewardship should address the entire data cycle
      • ELIXIR is more about a trusted partner for the ‘Tools & Rules’ for data stewardship and interoperability than about data archiving.
      • Interoperability is for both humans and computers.
      • Open data needs to be talking to ‘closed data’
      • Data experts need the place in the egosystem they deserve
  • 19. SHARED knowledge is Double Knowledge. “Knowledge is like love: it multiplies when shared”