The UniProt SPARQL endpoint: in production
© 2015 SIB
Why provide a public SPARQL endpoint
• A 10-person wet laboratory cannot afford:
– to host their own databases holding all, or even a fraction, of life science data.
– not to have access to, and use, existing life science information.
• Classical SQL can be provided on the web, but:
– it is not practical
– no federation
– no standards adherence
• Document-centric REST is not enough
– Swiss-Prot available as REST (over e-mail!) since 1986
– www.uniprot.org since 2002
help@uniprot.org
14,821,380,921
Node 1
64 CPU cores
256 GB RAM
2.5 TB consumer SSD
Load Balancer = Apache mod_balancer
Node 2
64 CPU cores
256 GB RAM
2.5 TB consumer SSD
Node 1 and Node 2, each running:
– Tomcat + Sesame + UI
– Virtuoso 7.2 (+)
Load Balancer = Apache mod_balancer
14,821,380,921
15,288,484,658
Node 1 and Node 2, each running:
– Tomcat + Sesame + UI
– Virtuoso 7.2 (+)
– Load Balancer = Apache mod_balancer
2 independent datacentres
Dedicated machine for loading and testing
• Loading RDF data is a “solved” problem
– 500,000 triples per second is easy
• that’s what our machine plus Virtuoso 7.2 and some tricks achieves
– 1,000,000 per second is possible (xz decompression limit on our machine)
• N-Quads or RDF/XML
– higher rates need parallel readers
• or even lighter-weight parsers
– highest observed rate
• 2.5 million per second on 1/4 Exadata
– could be pushed higher
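To put these loading rates in perspective, here is a minimal sketch that converts a sustained rate into wall-clock loading time. It assumes that 15,288,484,658 (a figure shown on an earlier slide without a label) is a triple count, and that each rate can be sustained for the whole load; the function name `load_hours` is made up for illustration.

```python
def load_hours(triples: int, triples_per_second: float) -> float:
    """Return the hours needed to load `triples` at a sustained rate."""
    if triples_per_second <= 0:
        raise ValueError("rate must be positive")
    return triples / triples_per_second / 3600


# Assumption: this figure from an earlier slide is the number of triples.
TOTAL_TRIPLES = 15_288_484_658

# Rates quoted on the slide, in triples per second.
RATES = {
    "easy": 500_000,
    "xz-limited": 1_000_000,
    "1/4 Exadata": 2_500_000,
}

for label, rate in RATES.items():
    print(f"{label}: {load_hours(TOTAL_TRIPLES, rate):.1f} h")
```

Under these assumptions a full reload at 500,000 triples per second takes roughly eight and a half hours, which is why a dedicated loading machine is practical.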
OpenLink Software: Virtuoso 7.2
• very responsive to issues
• performance is good and getting better
• not quite fully SPARQL 1.1 compliant yet (corner cases)
• anytime query
– do not accept the default settings
– spend some time securing your setup
– (jails, cgroups, etc.)
Challenges as a public endpoint provider
• Query load is unpredictable
• Simple data discovery queries are hard
– 1 TB+ of DB files
– e.g. from monitoring services
• Query timeouts are not sufficient
– aim for 100% utilisation
– what can HTTP reasonably support?
– we want to be able to answer hard questions
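As an illustration of why a "simple" discovery query can be hard, the sketch below sends a query listing every distinct predicate, which may force the store to touch every triple. The endpoint URL `https://sparql.uniprot.org/sparql` is an assumption (the slides only name www.uniprot.org), the helper names are hypothetical, and the timeout is a client-side one, not a server setting.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; not stated in the slides.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Short to write, but potentially a scan over the whole store.
DISCOVERY_QUERY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o }"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, timeout_seconds: float = 30.0) -> dict:
    """Send the query and give up on the client side after the timeout."""
    request = build_request(query)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)
```

Calling `run_query(DISCOVERY_QUERY)` needs network access; a well-behaved monitoring client would likely prefer a cheaper probe than this one.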
Entries in UniProtKB over time
Growing by 400 million triples a month
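A quick worked check of that growth figure, as a sketch only: it assumes the two unlabeled numbers shown on the architecture slides (14,821,380,921 and 15,288,484,658) are triple counts taken at different times, and that growth stays linear. The function name `projected_triples` is invented for illustration.

```python
def projected_triples(current: int, monthly_growth: int, months: int) -> int:
    """Linear projection of the triple count after `months` months."""
    return current + monthly_growth * months


# Assumption: both figures from the earlier slides are triple counts.
EARLIER = 14_821_380_921
LATER = 15_288_484_658
MONTHLY_GROWTH = 400_000_000  # "400 million triples a month"

difference = LATER - EARLIER
in_a_year = projected_triples(LATER, MONTHLY_GROWTH, 12)

print(f"difference between the two figures: {difference:,}")
print(f"projected count after 12 months:    {in_a_year:,}")
```

The difference between the two figures is about 467 million, which is in the same range as the stated monthly growth.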
Queries per month in 2015
[Chart: queries per month, 2015-01 to 2015-08; axis ticks at 100, 10'000 and 1'000'000; series: ask, select, construct, describe]
peak: 4 million per month
Real users
A mix of hard analytics and very specific queries
Estimate: somewhere between 300 and 1000 real humans per month
We know they are real because they take holidays ;)
Public monitoring key aid in quality assurance
✔ SPARQLES
• Public monitoring also hard
– often lower uptime than what is being monitored
– robots.txt
– not enough community support
– service description
• not being parsed
– HEAD last modified?
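Two of the monitoring points above (robots.txt and HEAD last modified) can be sketched with the Python standard library. This is only an illustration of what a polite monitor might do, not a description of how any existing monitoring service works; the function names and the example user agent are hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.robotparser import RobotFileParser


def may_probe(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check already-downloaded robots.txt rules before probing a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def last_modified(url: str, timeout_seconds: float = 10.0) -> Optional[str]:
    """Issue a HEAD request and return the Last-Modified header, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.headers.get("Last-Modified")
```

A monitor could call `may_probe` first and fall back to the cheap `last_modified` check instead of running a full query when the data has not changed.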
Key-Value orientated SPARQL endpoint anyone?
• assume 400 million named graphs
– average 50 triples
• max 5000 triples
– get the whole named graph
• single IO operation
CONSTRUCT {}
FROM uniprot:P05067
WHERE {}
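The query above is shorthand. A minimal sketch of how a client might spell it out in full is shown below, assuming each entry really does live in its own named graph (the slide's hypothesis) and that the `uniprot:` prefix expands to `http://purl.uniprot.org/uniprot/`. The function name `whole_graph_query` is made up for illustration.

```python
import re

# Assumed expansion of the `uniprot:` prefix used on the slide.
UNIPROT_PREFIX = "http://purl.uniprot.org/uniprot/"


def whole_graph_query(accession: str) -> str:
    """Build a CONSTRUCT query returning every triple in one named graph."""
    # Only allow plain accession characters so nothing can be injected.
    if not re.fullmatch(r"[A-Z0-9]+", accession):
        raise ValueError(f"unexpected accession: {accession!r}")
    graph = UNIPROT_PREFIX + accession
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        f"FROM <{graph}>\n"
        "WHERE { ?s ?p ?o }"
    )


print(whole_graph_query("P05067"))
```

The accession acts as the key and the named graph as the value, so a store that recognises this query shape could in principle answer it with a single read.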
