Linked Data for integrating life-science databases

Published in: Technology, Education

  1. RDF
  2. etc.
  3. [Figure: chart with vertical-axis ticks B, 38B, 75B, 113B, 150B and horizontal-axis years 1982 to 2010]
  4. ID; Gene Ontology, EC, etc.
  5. RDF
      • UniProt
      • PDBJ, DDBJ
      • Bio2RDF BioGateway RDF
  6. UniProt RDF
      • UniProt
      • UniProt RDF
  7. UniProt

      | Name | Description | Source | File size | #triples |
      |---|---|---|---|---|
      | uniprot | Protein annotation data | UniProt consortium | 14G | 3.3 B |
      | uniref | Clusters of proteins with similar sequences | UniProt consortium | 7G | 900M |
      | uniparc | Non-redundant archive of UniProt sequences | UniProt consortium | 65G | 1B |
      | citations | Literature citations | UniProt consortium | 1355M | 10,177,308 |
      | taxonomy | Classification of organisms | UniProt consortium | 421M | 5,041,437 |
      | journals | Journals | UniProt consortium | 3M | 34,850 |
      | pathways | Pathways | UniProt consortium | 1000K | 8,865 |
      | keywords | Keywords | UniProt consortium | 940K | 8,449 |
      | locations | Subcellular locations | UniProt consortium | 468K | 4,476 |
      | tissues | Tissues | UniProt consortium | 572K | 7,439 |
      | components | Cellular components (Organelles) | UniProt consortium | 6K | 43 |
      | go | Gene ontology | SBI | 25M | 263,944 |
      | enzymes | Classification of enzymes | GO consortium | 4M | 4,476 |
      | core.owl | Classes and properties for UniProt RDF | UniProt consortium | 152K | |
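  8. Triple stores (source: http://esw.w3.org/LargeTripleStores)

      | Triple store | Language | #triples |
      |---|---|---|
      | Sesame | Java | 70 M |
      | 4store | C | 15 B |
      | 5store | C | |
      | Virtuoso | C | 15.4 B |
      | Jena | Java | 1.7 B |
      | Bigdata | Java | 12.7 B |
      | ARC | PHP | |
      | AllegroGraph | Lisp | 1B |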
  9. Protein UniProt Components encodedIn core.owl

      <owl:ObjectProperty rdf:about="encodedIn">
        <rdfs:label rdf:datatype="&xsd;string">encoded in</rdfs:label>
        <rdfs:comment rdf:datatype="&xsd;string">The subcellular location where a protein is encoded.</rdfs:comment>
        <rdfs:domain rdf:resource="Protein"/>
        <rdfs:range rdf:resource="Subcellular_Location"/>
      </owl:ObjectProperty>
  10. RDF purl
      • http://purl.uniprot.org/{database}/{identifier}
      • UniProt: http://purl.uniprot.org/core/
      • Gene URI: http://purl.uniprot.org/core/Gene (type)
  11. PDBJ, DDBJ RDF
      • PDBJ: 47 4.7B
      • http://www.pdbj.org/rdf ID
      • DDBJ (INSD: International Nucleotide Sequence Database): 1.2 76 7.6B
      • mulgara (http://mulgara.org/)
  12. RDF
      • KEGG Taxonomy: 23,238
      • KEGG GENES Cyanobacteria: 708,745
      • KEGG OC: 10,384,602
      • hmmer Pfam-A vs Cyano: 11,881,212
      • hmmer Pfam-B vs Cyano: 7,007,154
      • Kazusa Annotation: 2,807,879
  13. 1
      • Synechococcus
      • 1.0e-20
      • Pfam
  14. 1 SPARQL

      SPARQL
      PREFIX hmmer: <http://hmmer.janelia.org/>
      PREFIX kegg: <http://www.kegg.jp/>
      PREFIX kg: <http://www.kegg.jp/entry/>
      PREFIX pfam: <http://pfam.sanger.ac.uk/>
      PREFIX kt: <http://www.kegg.jp/taxon/>
      SELECT ?pfam1, ?pfam2, COUNT(DISTINCT(?org))
      WHERE {
        GRAPH <hmmer_pfam_a_cyano> {
          ?gene hmmer:hit ?n1 .
          ?gene hmmer:hit ?n2 .
          ?n1 pfam:pfam_id ?pfam1 .
          ?n1 hmmer:i-evalue ?eval1 .
          ?n2 pfam:pfam_id ?pfam2 .
          ?n2 hmmer:i-evalue ?eval2 .
        }
        GRAPH <http://www.kegg.jp/genes> {
          ?gene kegg:belongs_to ?org .
        }
        GRAPH <http://www.kegg.jp/taxonomy> {
          ?org kegg:belongs_to kt:Synechococcus .
        }
        FILTER (?eval1 < 1.0e-10 && ?eval2 < 1.0e-10 && ?pfam1 != ?pfam2)
      };
  15. 10

      | Domain I | Domain II | #genes | #species |
      |---|---|---|---|
      | RNA_pol_Rpb2_3 | RNA_pol_Rpb2_1 | 9 | 9 |
      | G6PD_N | G6PD_C | 9 | 9 |
      | 5_3_exonuc_N | 5_3_exonuc | 9 | 9 |
      | HIT | DcpS_C | 9 | 9 |
      | Glyco_hydro_38 | Glyco_hydro_38C | 9 | 9 |
      | RNA_pol_Rpb2_6 | RNA_pol_Rpb2_3 | 9 | 9 |
      | GARS_N | GARS_C | 9 | 9 |
      | DSHCT | DEAD | 9 | 9 |
      | adh_short | KR | 12 | 9 |
      | EFG_C | EFG_IV | 10 | 9 |
      | .... | | | |

      171 9 Synechococcus
  16. 2
      • KEGG OC
      • Cyanobacteria
      • Kazusa Annotation PubMed
      • KO: KEGG Orthology
  17. 2 SPARQL

      SPARQL
      PREFIX kegg: <http://www.kegg.jp/>
      PREFIX kg: <http://www.kegg.jp/entry/>
      PREFIX kt: <http://www.kegg.jp/taxon/>
      PREFIX kns: <http://a.kazusa.or.jp/ns/>
      SELECT ?oc, ?gene, ?ko, COUNT(DISTINCT(?pm))
      WHERE {
        GRAPH <http://www.kegg.jp/oc> {
          ?gene kegg:belongs_to ?oc .
        }
        GRAPH <http://www.kegg.jp/genes> {
          ?gene kegg:belongs_to ?taxon .
          ?gene kegg:linked_to ?cb_gene .
          OPTIONAL {
            ?gene kg:ortholog ?ko .
          }
        }
        GRAPH <http://www.kegg.jp/taxonomy> {
          ?taxon kegg:belongs_to kt:Cyanobacteria .
        }
        GRAPH <http://kazusa.or.jp/cyanobase> {
          ?cb_gene ?p1 ?bm .
          ?bm ?p2 ?pm .
        }
      };
  18. PubMed ID 10

      | OC | #gene with PMID | #PMID |
      |---|---|---|
      | Genes_537709 | 3 | 1296 |
      | Genes_565278 | 3 | 761 |
      | Genes_710476 | 2 | 527 |
      | Genes_189668 | 1 | 497 |
      | Genes_710587 | 1 | 479 |
      | Genes_710480 | 1 | 416 |
      | Genes_711471 | 1 | 407 |
      | Genes_71824 | 1 | 393 |
      | Genes_75617 | 5 | 381 |
      | Genes_711511 | 1 | 376 |
  19. Semantic Web
      • URI
      • W3C
  20. Semantic Web
      • SPARQL ->
      • ->
      • ->
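
Slide 10 gives the purl URI pattern http://purl.uniprot.org/{database}/{identifier}. As a rough illustration of how such URIs might be assembled, here is a minimal Python sketch. The helper name `uniprot_purl` is hypothetical and not part of any UniProt API, and the database name "uniprot" and accession "P12345" are placeholder values chosen for the example; only the template itself and the URI http://purl.uniprot.org/core/Gene come from the slides.

```python
from urllib.parse import quote

# URI template quoted from slide 10.
PURL_TEMPLATE = "http://purl.uniprot.org/{database}/{identifier}"


def uniprot_purl(database: str, identifier: str) -> str:
    """Fill the slide's URI template (hypothetical helper, not a UniProt API)."""
    if not database or not identifier:
        raise ValueError("database and identifier must be non-empty")
    # Percent-encode both parts so unexpected characters cannot break the URI.
    return PURL_TEMPLATE.format(
        database=quote(database, safe=""),
        identifier=quote(identifier, safe=""),
    )


# "uniprot" and "P12345" are placeholder values chosen for illustration.
protein_uri = uniprot_purl("uniprot", "P12345")
# The class URI shown on the slide fits the same pattern with database "core".
gene_class_uri = uniprot_purl("core", "Gene")
print(protein_uri)
print(gene_class_uri)
```

Keeping the template in one constant means every URI in a pipeline is built the same way, which matters when the URIs serve as join keys across datasets.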
