
BioSamples Database Linked Data, SWAT4LS Tutorial


Presentation used for the SWAT4LS tutorial on EBI RDF/Linked Data. It covers the BioSamples Database dataset.


    1. BioSamples Database Linked Data
       Marco Brandizi, Functional Genomics Team
       SWAT4LS Tutorial, Dec 9th, 2013
       Find this presentation at http://tiny.cc/bsdswt13
    2. Why a BioSamples Database (aka BioSD)?
       • A reference system for searching and browsing information about biological samples used, or usable, in biomedical experiments
       • Focused on the sample context (i.e., independent of the specific assay type/technology)
       • Supports heterogeneous experiments
         – A single place that assay repositories can link to (reference samples, authoritative source for repositories like Metagenomics/ENA/ArrayExpress)
         – A single place for searches and for related-to or same-as relationships (e.g., see the 'myEquivalents' project)
       • Allows for consistency/standardisation of sample attributes/annotations
       • Common IT interfaces to access sample information and links to specific data/repositories (e.g., web, XML/REST, RDF)
    3. Why Linked Data for BioSD?
       • Yet another type of interface, potentially useful to application developers and Linked Data tools
       • Integration with similar/related data-sets (see the example queries below!)
       • Exploitation of ontologies (see below!)
         – Standardisation
         – A little semantics goes a long way
       • Enhanced modelling of certain aspects
         – e.g., numbers, intervals, dates and units are detected from string value labels and triplified
       • Who knows?
         – Apps!
         – See the Hackathon ideas below!
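As a rough illustration of the value detection mentioned on slide 3, the sketch below classifies a string label as a date, an interval, a number with an optional unit, or a plain string. The patterns and the function name `parse_value_label` are assumptions made for this example; the slides do not show the rules the BioSD converter actually applies.

```python
import re
from datetime import date

# Hypothetical patterns for illustration only, not the BioSD converter's rules.
NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
ISO_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
INTERVAL_RE = re.compile(rf"^({NUMBER})\s*(?:-|to)\s*({NUMBER})\s*(\S.*)?$")
VALUE_RE = re.compile(rf"^({NUMBER})\s*(\S.*)?$")


def parse_value_label(label):
    """Classify a free-text attribute value as date, interval, number or string."""
    text = label.strip()

    # Dates are checked first, otherwise "2013-12-09" would look like an interval.
    if ISO_DATE_RE.match(text):
        try:
            return {"kind": "date", "value": date.fromisoformat(text)}
        except ValueError:
            # Right shape but not a real calendar date: keep it as a string.
            return {"kind": "string", "value": text}

    match = INTERVAL_RE.match(text)
    if match:
        return {
            "kind": "interval",
            "low": float(match.group(1)),
            "high": float(match.group(2)),
            "unit": match.group(3),  # None when no unit follows the numbers
        }

    match = VALUE_RE.match(text)
    if match:
        return {
            "kind": "number",
            "value": float(match.group(1)),
            "unit": match.group(2),  # None when the label is a bare number
        }

    return {"kind": "string", "value": text}
```

Each recognised kind could then be emitted as typed triples instead of a single string literal, which is what makes numeric and interval queries possible later on.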
    4. The BioSD Model
       Sample Groups
       Submission
       External links
       Samples
       http://www.ebi.ac.uk/biosamples
    5. The BioSD Model
       Group's (or Submission's) samples
       Sample's (or Groups') attribute types and values
       External links
    6. BioSD Data (External Data Sources)
       SPARQL Source: http://tinyurl.com/o95xa5v
       Tag Cloud made with http://www.wordle.net
       SPARQL Source: http://tinyurl.com/ocyb2ld
    7. BioSD Data (Common Attribute Types)
       SPARQL Source: http://tinyurl.com/pjgdtzs
       Tag Cloud made with http://www.wordle.net
    8. BioSD Linked Data Model (Main Entities)
       Please have a look at: http://tinyurl.com/lo33ncc
    9. BioSD Linked Data Model (Sample Attributes)
       Please have a look at: http://tinyurl.com/n5oyvyd
    10. SPARQL Queries
    11. Find Samples and attributes

        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
        PREFIX sio: <http://semanticscience.org/resource/>

        SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
        WHERE {
          ?smp a biosd-terms:Sample;
               biosd-terms:has-bio-characteristic | sio:SIO_000332 ?pv. # is about
          ?pv rdfs:label ?pvLabel;
              biosd-terms:has-bio-characteristic-type ?pvType.
          ?pvType rdfs:label ?propTypeLabel.
        }

        • Exercise: use FILTER()/REGEX() to find organism=homo sapiens
        • Exercise: find sample provenance repositories and their links
          – Hint: explore the sample's links (?smp) and see what a RepositoryWebRecord looks like
        Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
        Exercise solution: see the examples on that page
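The first exercise can also be tried from a script. The sketch below is one possible answer, not the official solution: it adds two FILTER()/REGEX() clauses to the slide's query and sends it with the Python standard library. It assumes the endpoint accepts the standard SPARQL 1.1 Protocol (GET with a `query` parameter) and can return SPARQL JSON results; the helper names `run_select` and `bindings_to_rows`, the `LIMIT`, and the assumption that the attribute type is labelled "organism" are all choices made for this example.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on the slide; its current availability is not guaranteed.
ENDPOINT = "http://www.ebi.ac.uk/rdf/services/biosamples/sparql"

# One possible answer to the organism exercise (an assumption, not the
# official solution): filter on the attribute type label and the value label.
ORGANISM_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
WHERE {
  ?smp a biosd-terms:Sample;
       biosd-terms:has-bio-characteristic ?pv.
  ?pv rdfs:label ?pvLabel;
      biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType rdfs:label ?propTypeLabel.
  FILTER ( REGEX ( STR(?propTypeLabel), "^organism$", "i" ) )
  FILTER ( REGEX ( STR(?pvLabel), "homo sapiens", "i" ) )
}
LIMIT 20
"""


def run_select(query, endpoint=ENDPOINT, timeout=60):
    """Send a SELECT query over HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Calling `bindings_to_rows(run_select(ORGANISM_QUERY))` would then yield one dictionary per result row, provided the endpoint is reachable.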
    12. Samples about a given organism

        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>

        SELECT DISTINCT ?smp ?pvLabel ?propTypeLabel
        WHERE {
          ?smp biosd-terms:has-bio-characteristic ?pv.
          ?pv biosd-terms:has-bio-characteristic-type ?pvType;
              rdfs:label ?pvLabel.
          ?pvType a ?pvTypeClass.
          # Listeria
          ?pvTypeClass rdfs:label ?propTypeLabel;
            # '*' gives you transitive closure, even when inference is disabled
            rdfs:subClassOf* <http://purl.obolibrary.org/obo/NCBITaxon_1637>
        }

        • Exercise: use the BioPortal service to first find all subclasses of 'alcohol' (obo:CHEBI_30879), then search for samples annotated with such subclasses
          – Hint: use SERVICE <http://sparql.bioontology.org/ontologies/sparql/?apikey=KEY>
        Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
        Exercise solution: see one of the examples on that page
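To make the comment about `rdfs:subClassOf*` concrete, the sketch below computes the same zero-or-more-steps closure over a small in-memory list of subclass links. The function name `subclass_closure` and the `ex:` class identifiers are hypothetical; only `obo:NCBITaxon_1637` comes from the slide.

```python
from collections import deque


def subclass_closure(root, subclass_of):
    """Return root plus every class that reaches it via subclass links.

    `subclass_of` is an iterable of (child, parent) pairs. The result mirrors
    what `?c rdfs:subClassOf* <root>` matches: zero or more steps, so the
    root itself is always included.
    """
    children = {}
    for child, parent in subclass_of:
        children.setdefault(parent, set()).add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:  # also protects against cycles
                seen.add(child)
                queue.append(child)
    return seen


# Toy hierarchy with made-up identifiers under the Listeria class.
links = [
    ("ex:SpeciesA", "obo:NCBITaxon_1637"),
    ("ex:StrainA1", "ex:SpeciesA"),
    ("ex:Unrelated", "ex:SomethingElse"),
]
listeria_classes = subclass_closure("obo:NCBITaxon_1637", links)
```

In the query, the triple store does this walk itself, which is why samples annotated with a species or strain under Listeria are found even when no reasoner has materialised the inferred subclass links.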
    13. Geo-located Samples/Sample Groups

        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
        PREFIX sio: <http://semanticscience.org/resource/>

        SELECT DISTINCT ?item ?latVal ?longVal
        WHERE {
          ?item biosd-terms:has-bio-characteristic ?latPv, ?longPv.
          ?latPv biosd-terms:has-bio-characteristic-type [ rdfs:label ?latLabel ];
                 sio:SIO_000300 ?latVal. # sio:has value
          FILTER ( REGEX ( ?latLabel, "latitude", "i" ) ).
          ?longPv biosd-terms:has-bio-characteristic-type [ rdfs:label ?longLabel ];
                  sio:SIO_000300 ?longVal. # sio:has value
          FILTER ( REGEX ( ?longLabel, "longitude", "i" ) ).
        }

        • Exercise: find all samples having an attribute of type temperature, with a numerical value and a unit specified. Hint: use sio:SIO_000221 (has unit) and sio:SIO_000300 (has value)
        • Exercise: find samples/groups annotated with intervals, which use the properties biosd-terms:has-low-value and has-high-value and optionally have a unit
        Try it at: http://www.ebi.ac.uk/rdf/services/biosamples/sparql
        Exercise solutions: see the examples on that page
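The result columns of the geo-location query (`?item`, `?latVal`, `?longVal`) map naturally onto points on a map. The sketch below turns such rows into a GeoJSON FeatureCollection. GeoJSON is a format chosen for this example and is not mentioned in the slides; the rows are assumed to be plain dictionaries keyed by variable name, and the function name `rows_to_geojson` is hypothetical.

```python
def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection from latitude/longitude result rows.

    Rows with missing, non-numeric or out-of-range coordinates are skipped,
    since free-text sample annotations cannot be assumed to be clean.
    """
    features = []
    for row in rows:
        try:
            lat = float(row["latVal"])
            lon = float(row["longVal"])
        except (KeyError, TypeError, ValueError):
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        features.append(
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"item": row.get("item")},
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Made-up rows for illustration; they are not real BioSD records.
example_rows = [
    {"item": "http://example.org/sample/1", "latVal": "51.5", "longVal": "-0.12"},
    {"item": "http://example.org/sample/2", "latVal": "north", "longVal": "3"},
    {"item": "http://example.org/sample/3", "latVal": "95", "longVal": "3"},
]
collection = rows_to_geojson(example_rows)
```

Serialising `collection` with `json.dumps` gives a document that most web mapping libraries can load directly.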
    14. Expressed Genes and Samples
        • For http://purl.uniprot.org/uniprot/P04637 (P53 in Human)
        • Find the EFO classes for which it is up-regulated in the Atlas (p-value < 1E-9)
        • And show the Atlas expression value label
        • Hints:
          – Start from the example http://tinyurl.com/kvvhw6b
          – Use the Atlas endpoint: http://www.ebi.ac.uk/rdf/services/atlas/sparql
        • Find the samples having attributes that are instances of such EFO classes
        • And which come from a repository other than 'ArrayExpress'
        • Hints:
          – Use SERVICE <http://www.ebi.ac.uk/rdf/services/biosamples/sparql> and a sub-query
          – Search for property values linked to property types that are instances of the experimental factors found by the Atlas
          – Then link to the samples, the samples to the submissions, and the submissions to the web records
        • OR JUST HAVE A LOOK: http://tinyurl.com/ln3m7nv (will take a while...)
    15. Ideas for the Hackathon
        • Refer to http://tinyurl.com/mo7wgye for details
        • From geo-located samples (samples annotated with latitude/longitude) to Google Maps, e.g., by using Exhibit (http://www.simile-widgets.org/exhibit/)
        • Take similar datasets (e.g., MAASTRO, Breast Cancer Data, your data), unify the schemas (e.g., using CONSTRUCT), define federated queries
        • Use the Shape or OpenPHACTS validator to define sensible rules for BioSD and similar data-sets, e.g., must contain an organism, should have a treatment
        • Design/build an App (or Web widget) that asks for eligibility criteria, i.e., attribute type/value pairs, and translates them into a SPARQL query (or a more complex search based on SPARQL) to find samples
          – Use common ontologies for auto-completion over property types
          – Use string-based auto-completion for values
          – Consider numerical values, intervals, units
          – Do approximate matching, i.e., matching 8/10 of the specified pairs is good
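The last idea, translating eligibility criteria into SPARQL with approximate matching, could start from something like the sketch below. It is only one possible design: the criteria are passed in through a VALUES block, labels are compared case-insensitively, and a HAVING clause keeps samples that satisfy at least `min_matches` criteria. The function names and the exact-label comparison are assumptions, and such a query may be slow on a large store.

```python
PREFIXES = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biosd-terms: <http://rdf.ebi.ac.uk/terms/biosd/>
"""


def sparql_string(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def criteria_query(criteria, min_matches):
    """Build a query for samples matching at least `min_matches` criteria.

    `criteria` is a list of (attribute type label, attribute value label)
    pairs, e.g. [("organism", "Homo sapiens")].
    """
    pairs = list(criteria)
    if not 1 <= min_matches <= len(pairs):
        raise ValueError("min_matches must be between 1 and the number of criteria")

    # One VALUES row per criterion: an index plus lower-cased type and value.
    rows = "\n".join(
        f"    ({index} {sparql_string(ctype.lower())} {sparql_string(value.lower())})"
        for index, (ctype, value) in enumerate(pairs, start=1)
    )
    return PREFIXES + f"""
SELECT ?smp (COUNT(DISTINCT ?crit) AS ?hits)
WHERE {{
  VALUES (?crit ?wantedType ?wantedValue) {{
{rows}
  }}
  ?smp biosd-terms:has-bio-characteristic ?pv.
  ?pv rdfs:label ?pvLabel;
      biosd-terms:has-bio-characteristic-type ?pvType.
  ?pvType rdfs:label ?typeLabel.
  FILTER ( LCASE(STR(?typeLabel)) = ?wantedType &&
           LCASE(STR(?pvLabel)) = ?wantedValue )
}}
GROUP BY ?smp
HAVING (COUNT(DISTINCT ?crit) >= {min_matches})
ORDER BY DESC(?hits)
"""
```

For the "8 out of 10" case, the caller would pass ten pairs and `min_matches=8`. Ontology-based auto-completion and support for numeric values, intervals and units would need additional patterns on top of this.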
    16. Acknowledgements
        • BioSD Team: Alvis Brazma, Tony Burdett, Adam Faulconbridge, Mike Gostev, Helen Parkinson, Rui Perreria, Ugis Sarkans, Drashtti Vasant
        • Tony Burdett for the help with Zooma
        • Simon Jupp, Andy Jenkinson, James Malone, for their great help with developing and setting up BioSD/RDF
          – The rest of the Linked Data team @EBI (http://www.ebi.ac.uk/rdf)
        • BiomedBridges FP7 project (http://www.biomedbridges.eu), for funding us
    17. And you all!
        Sorry, we have 2.7M samples, but not all of them...
        (Source: http://en.wikipedia.org/wiki/File:Assorted_computer_mice_-_MfK_Bern.jpg)
        Contact info: www.ebi.ac.uk/biosamples, www.marcobrandizi.info
    18. Extras
    19. Main Ontologies used in BioSD / Linked Data
        • biosd-terms (http://tiny.cc/biosd_terms)
          – a small application ontology defining specific classes and properties, e.g., sample, sample group, has-knowledgeable-person
        • Experimental Factor Ontology (EFO)
          – mainly to define/annotate sample attributes
        • Ontology for Biomedical Investigations (OBI)
        • Information Artifact Ontology (IAO)
        • Semanticscience Integrated Ontology (SIO)
          – to define the main classes in BioSD/RDF
        • Bibliographic Ontology (BIBO)
          – we link publications about submissions/sample sets
        • Dublin Core, schema.org, FOAF
          – for general categories and in the Linked Data spirit
        • Linked automatically by Zooma: many more (e.g., CHEBI, NCBI-Tax, GO)
    20. BioSD → RDF Conversion
        github.com/EBIBioSamples/biosd2rdf