Why sparql tohu
Presentation given at the 2nd DBCLS RDF Summit, discussing Swiss-Prot's RDF representation and why we have one.

Published in: Science

  1. RDF/SPARQL: a UniProtKB/Swiss-Prot practical perspective
     Jerven Bolleman, Developer, Swiss-Prot Group
  2. Our Goals
     • Provide core Bioinformatics resources
       – UniProtKB/
       – …
     • Provide services and infrastructure
       – Vital-IT: HPC for the life sciences
       – …
  3. Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation (Famiglietti et al.)
     We annotate a lot of diseases and variants!
     http://europepmc.org/abstract/MED/24848695
  4. Why provide a public SPARQL endpoint?
     • A 10-person wet laboratory cannot afford:
  5. Why provide a public SPARQL endpoint?
     • A 10-person wet laboratory cannot afford:
       – to host its own in-house database holding all, or even a fraction, of the life-science data.
  6. Why provide a public SPARQL endpoint?
     • A 10-person wet laboratory cannot afford:
       – to host its own in-house database holding all, or even a fraction, of the life-science data.
       – not to have access to, and use, existing life-science information.
  7. Not CPU time... but brain time: the right kind of optimisation.
  8. Why provide a public SPARQL endpoint?
     • Classical SQL can be provided on the web, but:
       – it is not practical
       – there is no federation
       – standards conformance is poor
     • Local SQL is expensive
     • Local JSON is no better
     • Nor is local XML
  9. Data Integration: traditional
     Pathway.txt → Pathway Parser → Pathway Schema
     UniProt.txt → UniProt Parser → UniProt Schema
     Both schemas, plus Own Lab Data, feed a Data warehouse accessed through SQL queries.
     Cost: $ $ $ $ $ $
  10. Data Integration: RDF/SPARQL
      Pathway.rdf, UniProt.rdf and Own Lab Data → Triple Store → SPARQL Queries
      Cost: $ $?
  11. Why not some other graph database?
      • Ecosystem: RDF enables sharing and reuse of data at low cost
      • Identity
      • Precision
      • Standards
  12. Why provide a public SPARQL endpoint?
      • Document-centric REST is not enough
        – Swiss-Prot available as REST (over e-mail!!) since 1986
        – expasy.ch since 1993
        – www.uniprot.org since 2002
      • Most users use a GUI, not a CLI
      • Developers build GUIs on top of a CLI
  13. © 2015 SIB
  14. [Figure: queries per month, January to September 2015, on a log scale (100 to 1'000'000), broken down by query form: ask, select, construct, describe.]
      Queries per month in 2015; peak: 4 million per month.
  15. Real users
      • A mix of hard analytics and super-specific queries
      • Estimated at somewhere between 400 and 1200 real humans per month
      • We know they are real because they take holidays ;)
  16. Questions?
  17. help@uniprot.org
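Slides 4 to 8 and 14 argue that a small lab should query shared data over the web rather than host it locally. A minimal client-side sketch of that, in Python with only the standard library, following the SPARQL 1.1 Protocol: the query travels in a `query` URL parameter and the `Accept` header selects the result format (JSON results for SELECT/ASK, an RDF serialisation such as Turtle for CONSTRUCT/DESCRIBE). The endpoint URL, the example query and the helper names (`build_query_request`, `parse_select_results`, `parse_ask_result`) are assumptions for illustration, not taken from the slides.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current public UniProt endpoint; the slides do not state a URL.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"

# SELECT and ASK answer with a result table or a boolean (requested as JSON here);
# CONSTRUCT and DESCRIBE answer with an RDF graph, so request Turtle for those.
JSON_RESULTS = "application/sparql-results+json"
TURTLE = "text/turtle"


def build_query_request(
    query: str,
    endpoint: str = UNIPROT_ENDPOINT,
    accept: str = JSON_RESULTS,
) -> urllib.request.Request:
    """Build, but do not send, a SPARQL 1.1 Protocol query-via-GET request (hypothetical helper)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def parse_select_results(body: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one {variable: value} dict per row."""
    doc = json.loads(body)
    return [
        {var: term["value"] for var, term in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def parse_ask_result(body: str) -> bool:
    """An ASK query answers with a single boolean."""
    return bool(json.loads(body)["boolean"])


# Hypothetical example SELECT using the UniProt core vocabulary.
EXAMPLE_QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE { ?protein a up:Protein . }
LIMIT 5
"""
```

To actually run `EXAMPLE_QUERY`, pass `build_query_request(EXAMPLE_QUERY)` to `urllib.request.urlopen` and feed the decoded response body to `parse_select_results`. The design point is that nothing beyond an HTTP client is needed on the lab side: no local database, parser or schema.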

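Slides 9 and 10 contrast per-source parsers and schemas with loading RDF files into one triple store. A toy sketch in plain Python (not a real triple store) of why the RDF route is cheap: each source is already a set of (subject, predicate, object) triples, "loading" them together is a set union, and a SPARQL-style basic graph pattern is answered by a naive nested-loop join on shared identifiers. All IRIs, data and function names (`extend`, `match`) are hypothetical.

```python
from typing import Optional

# Toy illustration only: hypothetical IRIs and data, not real UniProt content.
Triple = tuple[str, str, str]

UNIPROT_RDF: set[Triple] = {
    ("ex:protein1", "ex:name", "Example kinase"),
    ("ex:protein1", "ex:organism", "ex:human"),
}
PATHWAY_RDF: set[Triple] = {
    ("ex:pathway7", "ex:participant", "ex:protein1"),
    ("ex:pathway7", "ex:label", "Example signalling pathway"),
}
LAB_RDF: set[Triple] = {
    ("ex:sample42", "ex:overexpresses", "ex:protein1"),
}

# Combining the three sources needs no per-source schema: just a set union.
STORE: set[Triple] = UNIPROT_RDF | PATHWAY_RDF | LAB_RDF


def extend(pattern: Triple, triple: Triple, solution: dict[str, str]) -> Optional[dict[str, str]]:
    """Match one triple pattern against one triple, respecting existing bindings."""
    bound = dict(solution)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if bound.get(term, value) != value:
                return None  # conflicts with an earlier binding
            bound[term] = value
        elif term != value:  # constant that does not match
            return None
    return bound


def match(patterns: list[Triple], store: set[Triple]) -> list[dict[str, str]]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in store:
                new = extend(pattern, triple, solution)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return solutions


# Equivalent SPARQL:
#   SELECT * WHERE { ?sample  ex:overexpresses ?protein .
#                    ?pathway ex:participant   ?protein .
#                    ?pathway ex:label         ?label . }
QUERY_PATTERNS: list[Triple] = [
    ("?sample", "ex:overexpresses", "?protein"),
    ("?pathway", "ex:participant", "?protein"),
    ("?pathway", "ex:label", "?label"),
]
```

`match(QUERY_PATTERNS, STORE)` links a lab sample to a pathway through the shared protein identifier, even though the two facts come from different files. A real store would index triples instead of scanning them for every pattern and would use full IRIs; the point is only that adding Own Lab Data needed no new parser or schema.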