Semantic Variation Graphs: the case for RDF & SPARQL

Presentation given to the GA4GH Data Working Group. It starts with an introduction to RDF, followed by how genomic variation graphs can be modelled in RDF. It then shows how SPARQL can be used to query this data.

Published in: Science

  1. Jerven Bolleman, Swiss-Prot Group. Semantic Variation Graphs: the case for RDF & SPARQL
  3. Resource Description Framework: Subject, Predicate, Object
  6. Resource Description Framework: lots of SPARQL databases, such as Virtuoso Universal Server
  7. Resource Description Framework serializations: RDF Turtle, RDFa inside HTML, N-Triples, RDF/Thrift, JSON-LD, RDF/XML
  9. Nodes and Edges are Resources • Resource → Identified by a URI – http://purl.uniprot.org/core/ – urn:guid:21EC2020-3AEA-4069-A2DD-08002B30309D – mailto:help@uniprot.org – urn:isbn:978-3-16-148410-0 • Nice if public but not a requirement
  10. Terminal edges are literals • String (xsd:string) "P53" • Date (xsd:date & xsd:dateTime) "1987-08-13"^^xsd:date • Numbers (xsd:int & xsd:decimal & …) 1 or "1"^^xsd:integer or -1.1 or "-1.1"^^xsd:decimal • Language string "Switzerland"@en "Suisse"@fr "Schweiz"@de "Svizzera"@it
  11. Others use it too, and their data can be cross-queried
  12. One party evolves data format / everyone evolves data format. Protocol Buffers (Google's data interchange format); GFF
  14. Variation Graph as RDF: 4 nodes (ACTG, AC, T, GA)
  15. Variation Graph as RDF: 4 nodes
      base <uri of vg schema>
      prefix node:<uri of vg graph>
      node:1 a <Node> ; rdf:value "ACTG" .
      node:2 a <Node> ; rdf:value "AC" .
      node:3 a <Node> ; rdf:value "T" .
      node:4 a <Node> ; rdf:value "GA" .
  16. Variation Graph as RDF: 4 nodes
      base <uri of vg schema>
      prefix node:<uri of vg graph>
      node:1 <linksForwardToForward> node:2 , node:3 .
      node:2 <linksForwardToForward> node:4 .
      node:3 <linksForwardToForward> node:4 .
  17. Variation Graph as RDF: 4 nodes → 1 Path
      base <uri of vg schema>
      prefix n:<uri of vg graph>
      path:1 a <Path> ;
        rdfs:label "Genome of patient a" ;
        rdfs:comment "Paths through VG make linear sequences, e.g. a reference genome or a patient assembly" .
  18. Variation Graph as RDF: 4 nodes → 1 Path → 3 Steps
      base <uri of vg schema>
      prefix n:<uri of vg graph>
      step:1 a <Step> ; <node> node:1 ; <rank> 1 ; <path> path:1 .
      step:2 a <Step> ; <node> node:2 ; <rank> 2 ; <path> path:1 .
  20. Build a "FASTA" from a VG
      PREFIX vg:<http://example.org/vg/>
      PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT ?path (group_concat(?sequence; separator="") as ?pathSeq)
      WHERE {
        [] vg:path ?path ;
           vg:node ?node ;
           vg:rank ?rank .
        ?node rdf:value ?sequence
      }
      GROUP BY ?path
      ORDER BY ?rank
  24. Querying a Variation Graph • SPARQL: a standard query language • See the VG wiki for more examples • VG 1000 Genomes → 50 GB on disk in DB • VG 100,000 Genomes → ±2 TB on disk in DB
  25. Summary • RDF – simple data model – consistent identifiers – anyone can say anything about anything • SPARQL – graph query language – wide-scale commercial deployment – HTTP|REST in the box – in clinical use – federated queries on user demand – can be used for variation graphs
  26. Questions?
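
The node and step triples from slides 15 and 18, and the rank-ordered concatenation that the "FASTA" query on slide 20 aims for, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the vg implementation: the tuple layout, the `objects` and `path_sequence` helpers, and the third step (node:4 at rank 3, which the slides announce but do not list) are all hypothetical. SPARQL does not define the order in which GROUP_CONCAT joins its values, so the sketch sorts the steps by rank explicitly.

```python
# Triples as (subject, predicate, object) tuples, mirroring slides 15 and 18.
# Steps are deliberately listed out of order to show that rank drives the result.
TRIPLES = [
    ("node:1", "rdf:value", "ACTG"),
    ("node:2", "rdf:value", "AC"),
    ("node:3", "rdf:value", "T"),
    ("node:4", "rdf:value", "GA"),
    # Assumed third step: the slides say "3 Steps" but show only two.
    ("step:3", "vg:node", "node:4"),
    ("step:3", "vg:rank", 3),
    ("step:3", "vg:path", "path:1"),
    ("step:1", "vg:node", "node:1"),
    ("step:1", "vg:rank", 1),
    ("step:1", "vg:path", "path:1"),
    ("step:2", "vg:node", "node:2"),
    ("step:2", "vg:rank", 2),
    ("step:2", "vg:path", "path:1"),
]


def objects(triples, subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def path_sequence(triples, path):
    """Concatenate node sequences along a path, ordered by step rank."""
    steps = [s for s, p, o in triples if p == "vg:path" and o == path]
    ranked = sorted(steps, key=lambda step: objects(triples, step, "vg:rank")[0])
    parts = []
    for step in ranked:
        node = objects(triples, step, "vg:node")[0]
        parts.append(objects(triples, node, "rdf:value")[0])
    return "".join(parts)


if __name__ == "__main__":
    # Print one FASTA-style record for the path.
    print(">path:1")
    print(path_sequence(TRIPLES, "path:1"))
```

Sorting inside `path_sequence` plays the role that an ordered subquery would play in SPARQL: the sequence is assembled from rank, not from the order in which triples happen to be stored.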

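
The linksForwardToForward edges on slide 16 give two routes through the four nodes, one per variant. As a rough sketch of what that means for the sequences the graph encodes, the following enumerates every walk from node 1 and spells it out. The dictionaries and the `walks` and `spell` helpers are hypothetical names chosen for illustration, and the code assumes an acyclic graph with forward-to-forward links only.

```python
# Node sequences (slide 15) and forward-to-forward links (slide 16).
NODES = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}
EDGES = {1: [2, 3], 2: [4], 3: [4]}


def walks(node, edges):
    """Yield every walk from `node` to a node without outgoing links.

    Assumes the graph is acyclic; a cycle would recurse forever.
    """
    successors = edges.get(node, [])
    if not successors:
        yield [node]
        return
    for nxt in successors:
        for tail in walks(nxt, edges):
            yield [node] + tail


def spell(walk, nodes):
    """Concatenate the sequences of the nodes visited by a walk."""
    return "".join(nodes[n] for n in walk)


ALL_WALKS = list(walks(1, EDGES))
ALL_SEQUENCES = [spell(w, NODES) for w in ALL_WALKS]

if __name__ == "__main__":
    for walk, sequence in zip(ALL_WALKS, ALL_SEQUENCES):
        print(walk, sequence)
```

A path as modelled on slides 17 and 18 corresponds to picking one of these walks and recording it as ranked steps.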