STI Summit 2011 - LS4 LS Khaos
Published in: Technology, Education

Transcript

  • 1. Linked Data and Life Sciences. Riga STI Summit, 6-8 July 2011. José F. Aldana Montes
  • 2. Life Sciences Linked Data: Producing, Consuming
  • 3. Producing Life Sciences Linked Data (Problems)
      • Most Linked Open Data is created and provided without the help of the original data provider who
      • Almost all Linked Open Data in Life Sciences is provided by Bio2RDF
  • 4. Producing Life Sciences Linked Data (Problems)
      • A database is a life's work for a biologist, and he/she wants to publish it
          – but not to lose control of it
      • An RDF dump of the DB is cheap
          – but supporting queries and data analysis is expensive
          – where is the money coming from?
      • They are very motivated to add value to the data
          – but they still lack up-to-date ICT skills
      • Help is wanted to kill Bio2RDF
      • Almost all Linked Open Data in Life Sciences is provided by Bio2RDF
  • 5. Consuming Linked Data
      • The number of Linked Data repositories will keep growing
      • Using Linked Data in Life Sciences means linking data with existing tools that are de facto standards in certain subdomains:
          • Pathways http://sbmm.uma.es
          • Proteins
  • 6. Consuming Linked Data
      • Data analysis services (not only queries but also data mining, crawling, and reasoning) are needed to engage the community
          – Biomedical uses (pharmaceutical testing, drug screening)
  • 7. Consuming Linked Data
      • Reasoning, removed to make data reuse possible, should be re-introduced in some cases over real, complex ontologies with large data sets
          – BioPAX Level 3 (Level 4 under development)
              • OWL Species: DL
              • DL Expressivity: SHIF(D)
              • Consistent: Yes
          – BioPAX Level 3 (4 officially identified databases, more DBs with public data as BioPAX Level 3 instances)
              • Reactome Database: 1.54 GB, 2 980 230 triples
          – BioPAX Level 2 (9 officially identified databases)
      • Data and ontologies should be cleaned up beforehand
  • 8. Consuming Linked Data
      • Reasoning services over real, complex ontologies with large data sets
          – Cost reduction in experiment design
          – Hypothesis demonstration/refutation
          – Privacy in reasoning with public + private data
  • 9. Consuming Linked Data
      • Reasoning for classification problems
          – Disease classification / diagnosis
          – Protein identification
          – Pathway alignment
  • 10. Consuming Linked Data
      • Digital data curation / cross-validation
  • 11. Consuming Linked Data
      • Domain-oriented (customizable) user interfaces
  • 12. Scalability Issues in Life Sciences
      • Real scenarios with rich ontologies are starting to appear:
          – BioPAX Level 3/4: complex OWL ontology (transitive, reflexive, inverse and functional properties, restrictions in most of the classes, 70 classes)
          – Big data sets in OWL format (from 20 MB to 45 GB of data)
          – Problems with the data:
              • undetected ABox (and even TBox) inconsistencies because of the lack of scalable reasoners
              • lack of SPARQL endpoints to query these data
  • 13. Summary: Are we losing the war?
      • Producing Linked Data in Life Sciences: some risks and some needs detected
          – A motivating reward scheme for the data owner
          – Some specific infrastructure support (action, facility, institute, foundation, private…) could be useful
              • to engage data owners,
              • to contribute technical capability, and
              • to share costs
  • 14. Summary: Are we losing the war?
      • Consuming Linked Data in Life Sciences: opportunities
          – Linking data with existing tools that are de facto standards in certain LS subdomains
              • to multiply impact
          – Not only query services but also data analysis services (crawling, mining, reasoning, etc.) should be provided to the community
              • but this is expensive for the average DB owner
          – Data must be cleaned up, curated and cross-validated
              • main threat
          – The domain lacks specific user interfaces
              • this is related to the connection of LD to (de facto) standard tools
          – In this domain it makes sense to reason
              • but scalability is still an issue
  • 15. Linked Data and Life Sciences. José F. Aldana Montes, jfam@lcc.uma.es
