Live DBpedia querying with high availability


Slides from talk at the DBpedia Community Meeting in Leipzig, 2014

Published in: Internet

  1. Live DBpedia querying with high availability. Ruben Verborgh
  2. “If DBpedia goes down, nobody complains. If the BBC Linked Data Platform goes down, that’s a problem for live applications.” Don’t we want live applications on top of DBpedia? It’s a vicious cycle: no apps because of downtime, downtime acceptable because there are no apps.
  3. Despite all the traffic it receives, DBpedia has an uptime of around 95%, making it one of the more reliable endpoints. Still, 5% downtime means DBpedia is unavailable for 1.5 days of a 30-day month.
  4. Public endpoints like DBpedia are hosted voluntarily, as-is and free of charge. Since DBpedia is hosted for free, we should not complain too hard when it is unavailable.
  5. The majority of the information on the Web is hosted voluntarily, as-is and free of charge. Wouldn’t we complain if that information were unavailable for 1.5 days a month?
  6. There’s a difference! DBpedia offers a SPARQL interface, which is much more expensive than HTTP.
  7. Exactly. If DBpedia offers information as-is and free of charge, why choose such an expensive interface? Why does it commit itself to such a strong engagement? (The BBC doesn’t do it!)
  8. Why offer to answer complex queries if you cannot reliably answer simple ones?
  9. SELECT * { ?s ?p ?o } LIMIT 1
  10. What other interfaces to RDF are available? Ordered from low to high server demand: data dump, dereferencing, Triple Pattern Fragments, SPARQL endpoint.
  11. A Triple Pattern Fragments interface is much less powerful than SPARQL, and therefore also much less demanding on the server, like other HTTP interfaces. Servers of Triple Pattern Fragments have very high availability.
  12. How do we handle complex queries? Clients execute them! SELECT ?person ?city WHERE { ?person a dbpedia-owl:Artist; dbpedia-owl:birthPlace ?city. ?city foaf:name "York"@en. } Execution time: 2–3 seconds.
  13. [Diagram: data processing on the server]
  14. [Diagram: data processing on the server and the client]
  15. Server CPU usage remains low. [Fig. 3.3: Query timeouts; Fig. 3.5: Server processor usage per core; x-axis: number of clients, from 1 to 100]
  16. Cache reuse becomes very high. [Fig. 3.2: Server network traffic; Fig. 3.4: Cache network traffic; x-axis: number of clients, from 1 to 100]
  17. Don’t build intelligent servers, because scaling them is expensive. Build servers that enable clients to be intelligent.
  18. By offering a much simpler interface, DBpedia can be available like all sites. This lets us build Web applications on top of live DBpedia data. Let’s make simple things work reliably, then worry about the complex.
  19. Live DBpedia querying with high availability. Ruben Verborgh
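
The idea that clients execute complex queries against a server that only answers single triple patterns can be sketched roughly as follows. This is a minimal illustration, not the actual Triple Pattern Fragments client: the in-memory `TRIPLES` list, the `ex:` resources, and the function names `fetch_fragment` and `evaluate` are all assumptions made up for this example, and `fetch_fragment` stands in for an HTTP request to a fragments server. A real client would also use the count metadata of each fragment rather than downloading whole fragments to decide which pattern to start with.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Bindings = Dict[str, str]

# Hypothetical toy dataset standing in for the data behind the server.
TRIPLES: List[Triple] = [
    ("ex:alice", "rdf:type", "dbpedia-owl:Artist"),
    ("ex:alice", "dbpedia-owl:birthPlace", "ex:york"),
    ("ex:bob", "rdf:type", "dbpedia-owl:Artist"),
    ("ex:bob", "dbpedia-owl:birthPlace", "ex:leeds"),
    ("ex:carol", "rdf:type", "dbpedia-owl:Scientist"),
    ("ex:carol", "dbpedia-owl:birthPlace", "ex:york"),
    ("ex:york", "foaf:name", '"York"@en'),
    ("ex:leeds", "foaf:name", '"Leeds"@en'),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fetch_fragment(pattern: Pattern) -> List[Triple]:
    """Stand-in for the server: it can only match ONE triple pattern."""
    return [
        triple for triple in TRIPLES
        if all(is_var(p) or p == v for p, v in zip(pattern, triple))
    ]


def substitute(pattern: Pattern, bindings: Bindings) -> Pattern:
    """Replace already-bound variables in a pattern by their values."""
    s, p, o = (bindings.get(term, term) for term in pattern)
    return (s, p, o)


def extend(pattern: Pattern, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Bind the pattern's variables to the triple, or None on a conflict."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term) and result.setdefault(term, value) != value:
            return None
    return result


def evaluate(patterns: List[Pattern], bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Client-side join: all the query logic lives here, not on the server."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    candidates = [substitute(p, bindings) for p in patterns]
    fragments = [fetch_fragment(c) for c in candidates]
    # Start with the pattern that has the fewest matches.
    i = min(range(len(candidates)), key=lambda k: len(fragments[k]))
    rest = candidates[:i] + candidates[i + 1:]
    for triple in fragments[i]:
        extended = extend(candidates[i], triple, bindings)
        if extended is not None:
            yield from evaluate(rest, extended)


# The query from slide 12, written as three triple patterns.
QUERY: List[Pattern] = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]

results = [(b["?person"], b["?city"]) for b in evaluate(QUERY)]
print(results)
```

The server side of this sketch is a single filter over the data, which is why it stays cheap and cacheable; the join over `?person` and `?city` happens entirely in the client.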