“ELIXIR and Open Data”
View from an ELIXIR Node”
Barend Mons

Prof. Biosemantics, LUMC, Scientific director NBIC,
Head of ...
Outline
• The Dutch Node
• Data in the eScience era: Pattern Recognition
and Excavation
• Data Collection, Archiving and: ...
Data interoperability
and exchange

Compute and storage
infrastructure services

Training & Education

3

Nodes and a Hub ...
The Data cycle in eScience

4

Data Stewardship covers the entire datacycle >>>>>>

?
Organism

Cells and Organs
The Data Challenge
• Computer speed and
storage capacity is
doubling every 18
months and this rate is
steady

• DNA sequen...
Simplified eScience

All
Legacy
information
New
dataset
New
Insights

User

7

The Goal is Knowledge Discovery, not Data C...
X

AREAL SURVEY

DEEP EXCAVATION

Pattern Recognition in Open Data and detailed Excavation should be separated
8
The Explicitome:

14
10
Individual explicit
associations

9

How do we discover patterns in „Ridiculograms‟?
The Semantic Web approach to interoperability

n identical
assertions

„n‟ different
provenances

Cardinal Assertion
10

T...
We publish about less than a million LS concepts

11

106 concept clusters (Knowlets)
Zipping the Explicitome

14
10
Individual explicit
associations

1011
CA‟s

5 x 105

Knowlets

12

≈99.999996% reduction o...
In silico knowledge discovery for the millions..

experimentation

In cerebro rationalisation
And confirmational reading

...
The Implicitome

14

> 1 M hypotheses hidden in the implicitome
eScience & ELIXIR

All
Legacy
information
New
dataset
New
Insights

User

15

The Goal is Knowledge Discovery, not Data Co...
The Role of ELIXIR: Open versus Managed
Clinical
Data sets

Tools and standards for

Public
Data sets

All
Legacy
informat...
Not only „hardware‟

For Big Data to become huge, however, there are still
hurdles to leap. For one thing, the tools to an...
Vision (personal, not necessarily ELIXIR)
• Data Collections are not a goal in themselves
• They ultimately serve Knowledg...
SHARED knowledge
is
Double Knowledge

19

„Knowledge is like love: it multiplies when shared‟
Upcoming SlideShare
Loading in …5
×

“ELIXIR and Open Data” View from an ELIXIR Node” presentation given by Barend Mons, Scientific Director NBIC at ELIXIR Launch event, 18th December 2013

649 views

Published on

“ELIXIR and Open Data” View from an ELIXIR Node” presentation given by Barend Mons, Scientific Director NBIC at ELIXIR Launch event, 18th December 2013

Published in: Technology, Education

“ELIXIR and Open Data” View from an ELIXIR Node” presentation given by Barend Mons, Scientific Director NBIC at ELIXIR Launch event, 18th December 2013

  1. 1. “ELIXIR and Open Data” View from an ELIXIR Node” Barend Mons Prof. Biosemantics, LUMC, Scientific director NBIC, Head of ELIXIR Node 18 December 2013 ELIXIR Launch, Brussels European Life Sciences Infrastructure for Biological Information www.elixir-europe.org
  2. 2. Outline • The Dutch Node • Data in the eScience era: Pattern Recognition and Excavation • Data Collection, Archiving and: Reduction • Why is ELIXIR important for Open Data? • What are the needs of clinical institutes and Industry: Open and Managed Data • Why training as part of ELIXIR? 2
  3. 3. Data interoperability and exchange Compute and storage infrastructure services Training & Education 3 Nodes and a Hub in NL (DTL) > Node in >>>>>>
  4. 4. The Data cycle in eScience 4 Data Stewardship covers the entire datacycle >>>>>> ?
  5. 5. Organism Cells and Organs
  6. 6. The Data Challenge • Computer speed and storage capacity is doubling every 18 months and this rate is steady • DNA sequence data is doubling every 6-8 months over the last 3 years and looks to continue for this decade 6 Guy Cochrane, ENA, EMBL-EBI Proper Data stewardship and analysis may be THE limiting factor in eScience
  7. 7. Simplified eScience All Legacy information New dataset New Insights User 7 The Goal is Knowledge Discovery, not Data Collection
  8. 8. X AREAL SURVEY DEEP EXCAVATION Pattern Recognition in Open Data and detailed Excavation should be separated 8
  9. 9. The Explicitome: 14 10 Individual explicit associations 9 How do we discover patterns in „Ridiculograms‟?
  10. 10. The Semantic Web approach to interoperability n identical assertions „n‟ different provenances Cardinal Assertion 10 The Unique Explicitome: 1011 Cardinal Assertions
  11. 11. We publish about less than a million LS concepts 11 106 concept clusters (Knowlets)
  12. 12. Zipping the Explicitome 14 10 Individual explicit associations 1011 CA‟s 5 x 105 Knowlets 12 ≈99.999996% reduction of infoburden
  13. 13. In silico knowledge discovery for the millions.. experimentation In cerebro rationalisation And confirmational reading 13 Enrichment of the explicitome In silico hypothesis generation Reasoning takes place on aggregated and zipped data
  14. 14. The Implicitome 14 > 1 M hypotheses hidden in the implicitome
  15. 15. eScience & ELIXIR All Legacy information New dataset New Insights User 15 The Goal is Knowledge Discovery, not Data Collection
  16. 16. The Role of ELIXIR: Open versus Managed Clinical Data sets Tools and standards for Public Data sets All Legacy information Private Data sets interoperability Data Collection(s) Anywhere 16 ELIXIR Data sets everywhere
  17. 17. Not only „hardware‟ For Big Data to become huge, however, there are still hurdles to leap. For one thing, the tools to analyse data are not yet good enough. And people with the skills to analyse data are scarce and will become scarcer. By 2018 there will be a “talent gap” of between 140,000 and 190,000 people, 17 Only three things count: Experts Experts Experts
  18. 18. Vision (personal, not necessarily ELIXIR) • Data Collections are not a goal in themselves • They ultimately serve Knowledge Discovery • E-datastewardship in a time of plenty is thus also data zipping • E-datastewardship should address the entire data cycle • ELIXIR is more about a trusted partner for the ‘Tools&Rules’ for data stewardship and interoperability than about data archiving. • Interoperability is for both humans and computers. • Open data needs to be talking to ‘closed data’ • Data experts need the place in the egosystem they deserve 18
  19. 19. SHARED knowledge is Double Knowledge 19 „Knowledge is like love: it multiplies when shared‟

×