STI Summit 2011 - LS4 LS Khaos

  1. Linked Data and Life Sciences. Riga, STI Summit, 6-8 July 2011. José F. Aldana Montes
  2. Life Sciences Linked Data: Producing, Consuming
  3. Producing Life Sciences Linked Data (Problems)
     Most Linked Open Data is created and provided without the help of the original data provider who [...]
     Almost all Linked Open Data in Life Sciences is provided by Bio2RDF
  4. Producing Life Sciences Linked Data (Problems)
     • A database is a life's work for a biologist, who wants to publish it
       – but not to lose control of it
     • An RDF dump of the DB is cheap
       – but supporting queries and data analysis is expensive
       – where is the money coming from?
     • Biologists are very motivated to add value to the data
       – but they still lack up-to-date ICT skills
     • Help is wanted to kill Bio2RDF
     Almost all Linked Open Data in Life Sciences is provided by Bio2RDF
  5. Consuming Linked Data
     • The number of Linked Data repositories will keep growing
     • Using Linked Data in Life Sciences means linking data with existing tools that are de facto standards in certain subdomains:
       – Pathways http://sbmm.uma.es
       – Proteins
  6. Consuming Linked Data
     • Data analysis services (not only queries but also data mining, crawling, and reasoning) are needed to engage the community
       – Biomedical uses (pharmaceuticals testing, drug screening)
  7. Consuming Linked Data
     • Reasoning, which was removed to make data reuse possible, should be reintroduced in some cases over real, complex ontologies with large data sets
       – BioPax Level 3 (Level 4 under development)
         • OWL Species: DL
         • DL Expressivity: SHIF(D)
         • Consistent: Yes
       – BioPax Level 3 (4 officially identified databases; more DBs publish data as BioPax Level 3 instances)
         • Reactome Database: 1.54 GB, 2 980 230 triples
       – BioPax Level 2 (9 officially identified databases)
     • Before reasoning, data and ontologies should be cleaned up
  8. Consuming Linked Data
     • Reasoning services over real, complex ontologies with large data sets
       – Cost reduction in experiment design
       – Hypothesis demonstration/refutation
       – Privacy in reasoning with public + private data
  9. Consuming Linked Data
     • Reasoning for classification problems
       – Disease classification / diagnosis
       – Protein identification
       – Pathway alignment
  10. Consuming Linked Data
      • Digital data curation / cross-validation
  11. Consuming Linked Data
      • Domain-oriented (customizable) user interfaces
  12. Scalability Issues in Life Sciences
      • Real scenarios with rich ontologies are starting to appear:
        – BioPax Level 3 and 4: a complex OWL ontology (transitive, reflexive, inverse and functional properties; restrictions in most of the classes; 70 classes)
        – Big data sets in OWL format (from 20 MB to 45 GB of data)
        – Problems with the data:
          • undetected ABox (and even TBox) inconsistencies, because scalable reasoners are lacking
          • lack of SPARQL endpoints to query these data
  13. Summary: Are we losing the war?
      • Producing Linked Data in Life Sciences: some risks and some needs detected:
        – A motivating reward scheme for the data owner
        – Some specific infrastructure support (action, facility, institute, foundation, private…) could be useful
          • to engage data owners,
          • to contribute technical capability, and
          • to share costs
  14. Summary: Are we losing the war?
      • Consuming Linked Data in Life Sciences: opportunities
        – Linking data with existing tools that are de facto standards in certain LS subdomains
          • to multiply impact
        – Not only query services but also data analysis services (crawling, mining, reasoning, etc.) should be provided to the community
          • but this is expensive for the average DB owner
        – Data must be cleaned up, curated and cross-validated
          • main threat
        – The domain lacks specific user interfaces
          • this is related to the connection of LD to (de facto) standard tools
        – In this domain it makes sense to reason
          • but scalability is still an issue
  15. Linked Data and Life Sciences. José F. Aldana Montes, jfam@lcc.uma.es
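
The slide on scalability mentions transitive and functional properties and ABox inconsistencies that go undetected. As a rough illustration of what a reasoner has to do with such properties, here is a minimal pure-Python sketch over a handful of triples. It is not how any BioPax reasoner works. The `ex:` names are hypothetical and do not come from BioPax, and the functional-property check assumes that differently named individuals are distinct (a unique-name assumption that OWL itself does not make).

```python
from collections import defaultdict


def transitive_closure(triples, prop):
    """Return the input triples plus every (s, prop, o) implied by
    treating `prop` as a transitive property."""
    inferred = set(triples)
    successors = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            successors[s].add(o)

    # Naive fixpoint iteration: repeat until no new triple is derived.
    changed = True
    while changed:
        changed = False
        for s in list(successors):
            for mid in list(successors[s]):
                for o in list(successors.get(mid, ())):
                    if o not in successors[s]:
                        successors[s].add(o)
                        inferred.add((s, prop, o))
                        changed = True
    return inferred


def functional_violations(triples, prop):
    """Return subjects that have more than one value for `prop`.

    Assumption: distinct names denote distinct individuals, so two
    values for a functional property count as an inconsistency."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: objs for s, objs in values.items() if len(objs) > 1}


# Hypothetical instance data (ABox); none of these names are real BioPax terms.
EXAMPLE = {
    ("ex:stepA", "ex:partOf", "ex:pathway1"),
    ("ex:pathway1", "ex:partOf", "ex:network1"),
    ("ex:protein1", "ex:organism", "ex:human"),
    ("ex:protein1", "ex:organism", "ex:mouse"),
}

closed = transitive_closure(EXAMPLE, "ex:partOf")
conflicts = functional_violations(EXAMPLE, "ex:organism")
```

The fixpoint loop is deliberately naive. On data sets of the size quoted in the slides (millions of triples), this kind of repeated scanning is exactly where scalability becomes a problem.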
