Big Data

Barend Mons on Big Data at the SURFnet Relatiedagen 2012

Published in: Technology

  1. A curse of interdisciplinarity: ‘A challenge in the other discipline always seems “easy” because we are not hindered by knowledge.’ Barend Mons (DTL-DISC/ELIXIR), NBIC, LUMC.
  2. PPP 10/09/12
  3. ELIXIR: Safeguarding the results of life science research in Europe. European Life Sciences Infrastructure for Biological Information. www.elixir-europe.org
  4. DISC*: the connected data departments of DTL research hotels. Diagram labels: DTL, technology facilities, technology research, education & training. *) DISC = DTL Data Integration & Stewardship Centre
  5. What is bioinformatics? • The science of storing, retrieving and analysing large amounts of biological information • An interdisciplinary science involving biologists, biochemists, computer scientists and mathematicians • At the heart of modern biology
  6. Bioinformatics underpins life-science research: (1) Genomes contain genes (2) Genes are transcribed (3) Transcripts translate to protein sequences (4) Proteins form three-dimensional structures (5) Proteins interact with each other and with small molecules to form pathways (6) Pathways combine to build systems
  7. Life Science data: multi-omics, multi-technology, multi-organism, multi-dimensional
  8. From molecules to medicine. Diagram columns: Molecular components, Integration, Translation. Diagram labels: genomes, nucleotides, transcripts, proteins, domains, structures, small molecules, complexes, pathways, cells, tissues and organs, human individuals, human populations, biobanks, therapies, disease prevention, early diagnosis.
  9. What is ELIXIR? • An ESFRI research infrastructure of global significance • Unites Europe’s leading life science organisations in managing and safeguarding the vast amounts of data being generated every day by publicly funded research • A large-scale initiative that will provide the facilities necessary for Europe’s life-science researchers to make the most of our rapidly growing store of information about living systems, which is the foundation on which our understanding of life is built
  10. Why ELIXIR? • Creating a robust infrastructure for biological information is a bigger task than EMBL-EBI, or any individual organisation or nation, can take on alone • Biology has by far the largest research community: ~3 million life science researchers in Europe; >6 million web hits a day at EMBL-EBI alone • We need to involve other European partners
  11. The challenge • Computer speed and storage capacity are doubling every 18 months, and this rate is steady • DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue doing so for this decade (source: Guy Cochrane, ENA, EMBL-EBI)
  12. Europe has already paid for the science. Chart labels: annual cost of generating new protein structure data in labs around the world; annual cost of maintaining the data in a central database.
  13. ELIXIR’s mission: To build a sustainable European infrastructure for biological information, supporting life science research and its translation to: medicine, environment, bioindustries, society
  14. A distributed pan-European infrastructure
  15. Benefits. ELIXIR will contribute to European innovation by: • Optimising access and exploitation of life-science data • Ensuring longevity of the data, thereby protecting investments already made in research • Enhancing the quality of European research by supporting national efforts to increase the competence and number of bioinformatics users through training • Strengthening the global position and influence of Europe in life-science research in both academia and industry
  16. The scientific reason for ELIXIR • Data is an essential commodity for life-science research. • Ten years ago, finding the connection between a gene and a characteristic (e.g. drought tolerance, risk of heart disease) could take years; now it takes minutes. • Data analysis is now the bottleneck in life-science research. • ELIXIR is our only realistic hope of easing that bottleneck. (Image courtesy of Genome Research Ltd.)
  17. One societal reason for ELIXIR • The era of personal genome sequencing is upon us. • Sequence data will not cross national boundaries. • Every national health system will need expertise to interpret it and treat patients accordingly. • Individuals need to be sure that their personal biological data are in safe hands.
  18. The financial reason for ELIXIR • Europe has already spent the money to generate the data. • It will waste all this investment in research if the future of the data is not secured. • Industry, from SMEs to big multinationals, needs access to public data to analyse its proprietary data.
  19. Maintaining open access • Open access to life science is essential for advances in many areas of research • Open access to bioinformatics resources provides a valuable path to discovery, one that in many other areas of research is limited by commercial confidentiality • Charging for that data, or seeking to restrict access through exercising Intellectual Property (IP) rights, would impede progress (Mark Forster, Syngenta, member of the EMBL-EBI Industry Programme) • ELIXIR will guarantee that open access to biological data is maintained. Speaking with a single voice will strengthen Europe’s influence in such global discussions.
  20. 13 ELIXIR Countries
  21. Part two: eScience in LS • The way we discover knowledge has changed fundamentally over just a decade. BIGNORANCE
  22. The general challenge: Data has far outgrown institutional handling capacity. The Data Deluge is everywhere, but Life Sciences is particularly challenged and complex. The amount of digital data is exploding, with a staggering 1.8 zettabytes in 2011. The Issue: more and more, we write ‘about datasets’ that are too large to publish in narrative.
  23. Nanopublications & Cardinal Assertions. A Nanopublication is the smallest unit of publishable information, containing: (1) Assertion: a statement of concepts in terms of one or more ‘subject -> predicate -> object’ (triple) relationships. (2) Provenance: (a) Attribution: who made this assertion, when and where? (b) Supporting information: any other information which is relevant to the assertion (e.g. this assertion is only valid in humans under 18). A Cardinal Assertion aggregates all ‘n’ Nanopublications making the same assertion. It therefore has 1 assertion and ‘n’ provenances, eliminating redundancy. Diagram labels: 1 identical assertion, ‘n’ different provenances.
  24. Under the hood…
  25. Managing volume & complexity: Combining Cardinal Assertions with Concept Profiles reduces the amount of data by ≈99.999996%. Individual Nanopublications: >10^14. Individual Cardinal Assertions: >10^11. Individual Concept Profiles: ≈4 x 10^6.
  26. The LS concept web: 2 x 2 x 10^6 concepts (profiles)
  27. A dynamic Concept Web versus a static Ontology
  28. Figure legend: known reference pairs; non-co-occurrence pairs. Figure labels: more mutual information; no increase in concept overlap; including manual curation; more concepts in common; removal of low info paths.
  29. eScience: in silico reasoning and in cerebro validation. Expert Skype calls. Reading up.
  30. Organisation of the ecosystem. Diagram columns: Global Authority; Nanopublishers; App & Service Providers; Users. Diagram labels: Endorse; CA Space (OCS & ICS); Application development; Knowledge Management; Providers; Reasoning services; Best Practices; Academic & Commercial Users; ONS/INSs; technical and process consultancy; project delivery capacity; Knowledge Discovery; Original Data Owners; Assist & Certify.
  32. IN ANY CASE: regardless of how ‘sensitive’ your data is, it is malpractice to: - Generate data without a solid stewardship plan - Build impenetrable SILOS - Fail to record provenance - Store data in a non-interoperable format - Think that data = information. This holds EVEN if your only goal is the Nobel Prize (or, for the Dutch, a Spinoza Prize).
  33. Acceptance of Semantic Web Approach • Over the last decade, academic research organisations developed new methodologies and tools to address the Big Data problem. • Global agreement by leading scientists on a unique Nanopublication solution. • Hundreds of millions already invested in the basic technology. • Applicable as a technology across (STM) domains and industries. • Pharmaceutical companies are early adopters (Innovative Medicines Initiative).
  34. Acknowledging… The ‘Dutch Team’: • Herman van Haagen, MSc. (LUMC) • Dr. Peter Bram ‘t Hoen (LUMC) • Dr. Marco Roos (LUMC) • Dr. Erik Schultes (LUMC) • Prof. Johan den Dunnen (LUMC) • Prof. Gertjan van Ommen (LUMC) • Dr. Erik van Mulligen (EMC) • Dr. Jan Kors (EMC) • Dr. Martijn Schuemie (EMC) • Prof. Johan van der Lei (EMC) • Dr. Rob Hooft (NBIC) • Dr. Christine Chichester (NBIC) • Dr. Leon Mei (NBIC) • Kees Burger (NBIC) • Bharat Singh (NBIC/EMC) • Dr. Marc van Driel (NBIC) • Dr. Ruben Kok (NBIC) • Prof. Marcel Reinders (NBIC) • Prof. Jaap Heringa (NBIC) • Prof. Gert Vriend (NBIC) • Dr. Morris Swertz (BBMRI, CWA) • Dr. Andra Waagmeester (NBIC) • Dr. Kristina Hettne (LUMC) • Dr. Rene van Schaik (eScience Centre) • Drs. Albert Mons (PHORTOS consultants) • Mr. Drs. Arie Baak (PHORTOS consultants). CWA - Open PHACTS: • Prof. Amos Bairoch (SIB, Switzerland, CWA) • Prof. Carole Goble (Manchester, CWA, OPS) • Prof. Katy Borner (Indiana University, CWA) • Prof. Mark Musen (NCBO, Stanford, CWA, OPS) • Dr. Pascale Gaudet (UniProt, ISB, CWA) • Dr. Mike Conlon (VIVO, UF, CWA) • Prof. Maryann Martone (Force 11, USC, CWA) • Dr. Nigam Shah (NCBO, Stanford, CWA, OPS) • Dr. Mark Wilkinson (Canada, CWA) • Abel Packer (Brazil, Scielo, CWA, OPS) • Jan Velterop (ACKnowledge, CWA, OPS) • Albert Mons (CWA, NBIC) • Prof. Frank van Harmelen (FUA/LARKC, CWA, OPS) • Dr. Chris Evelo (Maastricht, CWA, OPS) • Dr. Antony Williams (RSC/ChemSpider, CWA, OPS) • Dr. Richard Kidd (RSC, OPS) • Dr. Paul Groth (FUA, CWA, OPS) • Dr. Michel Dumontier (Canada, CWA, OPS) • Dr. Andrew Gibson (UA, CWA, OPS) • Dr. Bryn Williams-Jones (Pfizer, OPS) • Dr. Ian Dix (Astra Zeneca, OPS) • Dr. Niklas Blomberg (Astra Zeneca, OPS) • Dr. Mike Barnes (GSK, OPS) • Prof. Jan-Erik Litton (CWA, BBMRI)
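
Slides 23 and 25 describe how Nanopublications making the same assertion collapse into one Cardinal Assertion with ‘n’ provenances, and quote a data reduction of ≈99.999996%. The following is a minimal Python sketch of that idea, not the speaker's implementation: the class names, the `aggregate` and `reduction_percent` helpers, and the example triples are all hypothetical. The reduction figure seems consistent with going from roughly 10^14 Nanopublications to roughly 4 x 10^6 Concept Profiles, which is the assumption used here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# A triple is (subject, predicate, object), as on slide 23.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Provenance:
    attribution: str      # who made the assertion, when and where
    supporting: str = ""  # any other information relevant to the assertion


@dataclass(frozen=True)
class Nanopublication:
    assertion: Triple
    provenance: Provenance


@dataclass
class CardinalAssertion:
    assertion: Triple              # exactly 1 assertion
    provenances: List[Provenance]  # 'n' provenances


def aggregate(nanopubs: Iterable[Nanopublication]) -> Dict[Triple, CardinalAssertion]:
    """Group nanopublications by identical assertion (hypothetical helper)."""
    grouped: Dict[Triple, List[Provenance]] = defaultdict(list)
    for nanopub in nanopubs:
        grouped[nanopub.assertion].append(nanopub.provenance)
    return {a: CardinalAssertion(a, provs) for a, provs in grouped.items()}


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which the number of items shrinks from before to after."""
    return 100.0 * (1.0 - after / before)


if __name__ == "__main__":
    # Invented example data, for illustration only.
    same = ("gene_X", "is_associated_with", "disease_Y")
    other = ("gene_Z", "encodes", "protein_W")
    nanopubs = [
        Nanopublication(same, Provenance("lab A, 2011")),
        Nanopublication(same, Provenance("lab B, 2012", "humans under 18 only")),
        Nanopublication(other, Provenance("lab C, 2012")),
    ]
    for ca in aggregate(nanopubs).values():
        print(ca.assertion, "provenances:", len(ca.provenances))

    # Assumed reading of slide 25: 10^14 nanopublications -> 4 x 10^6 profiles.
    print(f"{reduction_percent(1e14, 4e6):.6f}%")
```

Keying the grouping on the triple itself keeps every provenance while storing each assertion only once, which is the redundancy elimination slide 23 describes. How Concept Profiles are built from Cardinal Assertions is not specified in the slides and is not modelled here.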
