
BioSD Tutorial 2014 Edition


The tutorial about the BioSamples Database Linked Data, presented at SWAT4LS 2014

Published in: Health & Medicine


  1. BioSamples Database Linked Data (2014 edition)
     Marco Brandizi, Functional Genomics Team
     SWAT4LS Tutorial, Dec 9th, 2014
     Find this presentation on http://www.slideshare.net/mbrandizi
  2. Why a BioSamples Database (aka BioSD)?
     • A reference system for searching and browsing information about biological samples that are used, or usable, in biomedical experiments
     • Focused on the sample context (i.e., independent of the specific assay type/technology)
     • Supports heterogeneous experiments
       – Single place that assay repositories can link to (reference samples, authoritative source for repositories like Metagenomics/ENA/ArrayExpress)
       – Single place for searches and for related-to or same-as relationships (e.g., see the 'myEquivalents' project)
     • Common interfaces to access sample information and links to specific data/repositories (e.g., web, XML/REST, RDF)
  3. Why Linked Data for BioSD?
     • Potentially useful to application developers and Linked Data tools
     • Integration with similar/related data sets
     • Exploitation of ontologies
       – Standardisation
       – A little semantics goes a long way
       – Improved searching
     • As usual, open to unexpected uses
       – e.g., http://www.phyloviz.net/NGSonto
  4. The BioSD Model
     [Diagram; labels: Sample Groups, Submission, External links, Samples]
     http://www.ebi.ac.uk/biosamples
  5. The BioSD Model
     [Diagram; labels: Group's (or Submission's) samples; Sample's (or Group's) attribute types and values; External links]
  6. Changes to Linked Data Model
     • Main Entities: http://tinyurl.com/lo33ncc
     • Details about Sample Attributes: http://tinyurl.com/n5oyvyd
     Several improvements to the conversion software, aiming at more frequent auto-updates
  7. SPARQL Queries
  8. Find Samples and attributes

     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
     PREFIX sio: <http://semanticscience.org/resource/>

     SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
     WHERE {
       ?smp a biosd-terms:Sample;
            biosd-terms:has-bio-characteristic | sio:SIO_000332 ?pv. # is about
       ?pv rdfs:label ?pvLabel;
           biosd-terms:has-bio-characteristic-type ?pvType.
       ?pvType rdfs:label ?propTypeLabel.
     }

     • Exercise: use FILTER()/REGEX() to find organism=homo sapiens
     • Exercise: find the samples' repositories of provenance and their links
       – Hint: explore the sample's links (?smp) and see what RepositoryWebRecord looks like
     Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
     Exercise solution: see the examples on that page
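One way to approach the FILTER()/REGEX() exercise is to generate the query text and a request URL from a small script. The sketch below is not the official solution (that one is on the endpoint's examples page): the helper names `organism_query` and `request_url` are hypothetical, and it assumes that the attribute type is labelled "organism" and that the endpoint accepts the standard SPARQL protocol `query` parameter over GET.

```python
from urllib.parse import urlencode

# Endpoint quoted on the slide.
ENDPOINT = "http://www.ebi.ac.uk/rdf/services/biosamples/sparql"


def organism_query(organism: str, limit: int = 10) -> str:
    """Return the slide's query narrowed to one organism (hypothetical helper)."""
    # Escape backslashes and double quotes so the value stays a valid
    # SPARQL string literal. Regex metacharacters are NOT escaped.
    safe = organism.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {{
  ?smp a biosd-terms:Sample;
       biosd-terms:has-bio-characteristic ?pv.
  ?pv rdfs:label ?pvLabel;
      biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType rdfs:label ?propTypeLabel.
  # Assumption: the attribute type is labelled "organism".
  FILTER ( REGEX ( ?propTypeLabel, "^organism$", "i" ) )
  FILTER ( REGEX ( ?pvLabel, "{safe}", "i" ) )
}}
LIMIT {limit}
"""


def request_url(query: str) -> str:
    """Build a GET URL using the SPARQL protocol's 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(organism_query("homo sapiens")))
```

The printed URL can be pasted into a browser, or the query text alone can be pasted into the endpoint's web form. The LIMIT is there only to keep a first trial run small.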
  9. Samples about a given organism

     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

     SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
     WHERE {
       ?smp biosd-terms:has-bio-characteristic ?pv.
       ?pv biosd-terms:has-bio-characteristic-type ?pvType;
           rdfs:label ?pvLabel.
       ?pvType a ?pvTypeClass.
       # Listeria
       ?pvTypeClass rdfs:label ?propTypeLabel;
           # '*' gives you transitive closure, even when inference is disabled
           rdfs:subClassOf* <http://purl.obolibrary.org/obo/NCBITaxon_1637>
     }

     • Exercise: use the Bioportal service to first find all subclasses of 'alcohol' (obo:CHEBI_30879), and then search for samples annotated with such subclasses
       – Hint: use SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=KEY>
     Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
     Exercise solution: see one of the examples on that page
  10. Geo-located Samples/Sample Groups

      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
      PREFIX sio: <http://semanticscience.org/resource/>

      SELECT DISTINCT ?item ?latVal ?longVal
      WHERE {
        ?item biosd-terms:has-bio-characteristic ?latPv, ?longPv.

        ?latPv biosd-terms:has-bio-characteristic-type [ rdfs:label ?latLabel ];
               sio:SIO_000300 ?latVal. # sio:has value
        FILTER ( REGEX ( ?latLabel, "latitude", "i" ) ).

        ?longPv biosd-terms:has-bio-characteristic-type [ rdfs:label ?longLabel ];
                sio:SIO_000300 ?longVal. # sio:has value
        FILTER ( REGEX ( ?longLabel, "longitude", "i" ) ).
      }

      • Exercise: find all samples having an attribute of type temperature, with a numerical value and a unit specified. Hint: use sio:SIO_000221 (has unit), sio:SIO_000300 (has value)
      • Exercise: find samples/groups annotated with intervals, which use the properties biosd-terms:has-low-value and has-high-value and optionally have a unit
      Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
      Exercise solutions: see the examples on that page
  11. Expressed Genes and Samples
      • For http://purl.uniprot.org/uniprot/P04637 (P53 in Human)
      • Find the EFO classes for which it is up-regulated in the Atlas (p-value < 1E-9)
      • And show the Atlas expression value label. Hints:
        – Start from the example http://tinyurl.com/kvvhw6b
        – Use the Atlas endpoint: http://www.ebi.ac.uk/rdf/services/atlas/sparql
      • Find the samples having attributes that are instances of such EFO classes
      • And which come from a repository other than 'ArrayExpress'
      • Hints:
        – Use SERVICE <http://www.ebi.ac.uk/rdf/services/biosamples/sparql> and a sub-query
        – Search for property values linked to property types that are instances of the experimental factors found by the Atlas query
        – Then link to the samples, the samples to the submissions, the submissions to the web records
      ● OR JUST HAVE A LOOK: http://goo.gl/kOfE1r (will take a while...)
  12. New Ideas and Alike
  13. Geo-Samples, Google Map Integration
      • Exercise: from geo-located samples to Google Map. Think about how to do it:
        ● Gmaps supports the KML format (https://developers.google.com/kml)
        ● You can type a KML-returning URL into maps.google.com (or pass it via GET, q=<kml-url>)
        ● The SPARQL endpoint can return results in XML format
        ● There are online XSLT services: http://services.w3.org/xslt?xslfile=<url>&xmlfile=<url>
      http://tinyurl.com/kzd2pg4
      http://tinyurl.com/lf2623l
      http://tinyurl.com/lltqy2u
      http://goo.gl/maps/CMRrk
      Many thanks to Costanza Romano
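The slide suggests doing the XML-to-KML step with an online XSLT service. As an alternative illustration of the same transformation, here is a minimal Python sketch, not the approach used in the tutorial. It assumes the results are in the W3C SPARQL Query Results XML Format and that the variables are named `item`, `latVal` and `longVal`, as in the geo-location query; the function name `sparql_xml_to_kml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the SPARQL XML results format and of KML 2.2.
SR = "{http://www.w3.org/2005/sparql-results#}"
KML_NS = "http://www.opengis.net/kml/2.2"
K = "{" + KML_NS + "}"


def sparql_xml_to_kml(results_xml: str) -> str:
    """Turn SPARQL XML results with ?item ?latVal ?longVal into a KML document."""
    root = ET.fromstring(results_xml)
    ET.register_namespace("", KML_NS)  # serialise KML without a prefix
    kml = ET.Element(K + "kml")
    doc = ET.SubElement(kml, K + "Document")

    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            # A binding wraps a single <uri>, <literal> or <bnode> element.
            if len(binding):
                row[binding.get("name")] = (binding[0].text or "").strip()
        try:
            lat, lon = float(row["latVal"]), float(row["longVal"])
        except (KeyError, ValueError):
            continue  # skip rows without numeric coordinates

        placemark = ET.SubElement(doc, K + "Placemark")
        ET.SubElement(placemark, K + "name").text = row.get("item", "")
        point = ET.SubElement(placemark, K + "Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, K + "coordinates").text = f"{lon},{lat}"

    return ET.tostring(kml, encoding="unicode")
```

Rows whose latitude or longitude is not a plain number are dropped rather than repaired, which is a deliberate simplification, since free-text coordinate values would need their own cleaning step.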
  14. Search-by-Feature Similarity (ongoing)

      SELECT DISTINCT ?smp ?smpDescr (COUNT (DISTINCT ?pv) AS ?score)
      WHERE {
        {
          ?smp a biosd-terms:Sample;
               rdfs:comment ?smpDescr.
          ?smp biosd-terms:has-bio-characteristic ?pv.
          ?pv biosd-terms:has-bio-characteristic-type ?pvType.
          ?pvType a <http://purl.obolibrary.org/obo/NCBITaxon_10090>.
        }
        UNION {
          ?smp a biosd-terms:Sample;
               rdfs:comment ?smpDescr.
          ...
          ?pvType a <http://purl.obolibrary.org/obo/NCBITaxon_10090>.
        }
        UNION ...
      }
      GROUP BY ?smp ?smpDescr
      HAVING (COUNT (DISTINCT ?pv) > 0)
      ORDER BY DESC (COUNT (DISTINCT ?pv))

      • Many thanks to AbdulShakur Abdullah, Eric Hillaert, Prasad Nuli (https://github.com/CapStoneEBI2014/biosd_similarity_search)
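To make the scoring in that query concrete, the sketch below reproduces its aggregation in plain Python, not the project's actual code. The input is assumed to be the (sample, property value) pairs matched by any UNION branch, and the function name `similarity_scores` is hypothetical.

```python
from collections import defaultdict


def similarity_scores(matches):
    """Score samples by how many distinct property values matched.

    `matches` is an iterable of (sample, property_value) pairs, one per
    row produced by any UNION branch (assumed input shape).
    """
    distinct = defaultdict(set)
    for sample, prop_value in matches:
        distinct[sample].add(prop_value)  # COUNT (DISTINCT ?pv), per GROUP BY ?smp

    # HAVING (COUNT (DISTINCT ?pv) > 0)
    scored = [(smp, len(pvs)) for smp, pvs in distinct.items() if len(pvs) > 0]
    # ORDER BY DESC (score); ties keep first-seen order.
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    rows = [("s1", "pv1"), ("s2", "pv2"), ("s1", "pv1"), ("s1", "pv3")]
    print(similarity_scores(rows))
```

A property value matched by several branches is counted once per sample, which is what the DISTINCT in the aggregate is for.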
  15. More (possibly for the hackathon)
      • Continuing with the similarity search
      • Improving linkage with other data sets
        – e.g., targeting samples in ArrayExpress/Atlas
        – e.g., links to EPMC data sets (PMID->PMC conversion), Bio2RDF publications, LLD publications
      • Aiming at supporting similar datasets
        – Interested in the ongoing HCLS work on HL7->RDF
        – Collaborating with the European biobank community
          ● Interested in the BBMRI-ontology (http://tinyurl.com/qjttyge)
      • Visualisations/widgets
        – Geo-located samples on a map
        – Samples on body map
        – Using the BioJS library
  16. Acknowledgements
      • BioSD Team - Alvis Brazma, Tony Burdett, Adam Faulconbridge, Mike Gostev, Helen Parkinson, Rui Perreria, Ugis Sarkans, Drashtti Vasant
      • Tony Burdett for the help with Zooma
      • Simon Jupp, Andy Jenkinson, James Malone, for their great help with developing and setting up BioSD/RDF
        – The rest of the Linked Data team @EBI (http://www.ebi.ac.uk/rdf)
      • BiomedBridges FP7 project (http://www.biomedbridges.eu), for funding us
  17. And you all!
      Contact info:
      www.ebi.ac.uk/biosamples
      www.marcobrandizi.info
      Sorry, we have grown to ~4M samples, yet we don't have all of them, not even this year...
      (Sources: http://en.wikipedia.org/wiki/File:Assorted_computer_mice_-_MfK_Bern.jpg, http://tinyurl.com/otfnhk6, http://tinyurl.com/odkadvn, http://tinyurl.com/pyrqrdf)
  18. Extras
  19. BioSD Data (External Data Sources)
      SPARQL Source: http://tinyurl.com/o95xa5v
      Tag Cloud made with http://www.wordle.net (2013)

      submissions   sampleGroups   samples
      126490        126492         3925151

      Computed on v20141205, SPARQL Source: http://tinyurl.com/ocyb2ld
      Total number of triples is 190637851 (http://tinyurl.com/pkyvmnc)
  20. BioSD Data (Common Attribute Types)
      SPARQL Source: http://goo.gl/wk0RHp
      Tag Cloud made with http://www.wordle.net (2013)
  21. Main Ontologies used in BioSD / Linked Data
      • See the documentation page: http://www.ebi.ac.uk/rdf/documentation/biosamples
      • biosd-terms (http://tiny.cc/biosd_terms)
        – a small application ontology defining specific classes and properties, e.g., sample, sample group, has-knowledgeable-person
      • Experimental Factor Ontology (EFO)
        – mainly to define/annotate sample attributes
      • Ontology for Biomedical Investigations (OBI)
      • Information Artifact Ontology (IAO)
      • Semanticscience Integrated Ontology (SIO)
        – to define main classes in BioSD/RDF
      • Bibliographic Ontology (BIBO)
        – we link publications about submissions/sample sets
      • Dublin Core, schema.org, FOAF
        – for general categories and in the Linked Data spirit
      • Linked automatically by Zooma: many more (e.g., CHEBI, NCBI-Tax, GO)
