Initial Usage Analysis of DBpedia's Triple Pattern Fragments

Slides for my talk at the 5th International Usage Analysis and the Web of Data (USEWOD) Workshop.

  1. 1. Initial Usage Analysis
 of DBpedia's
 Triple Pattern Fragments.
 Ruben Verborgh, Erik Mannens & Rik Van de Walle
  2. 2. What role will the Semantic Web play
 for the future generation? Will it be remotely as important
 as the Web now is to us?
  3. 3. There used to be no applications
 because there was no data. Linked Data more or less solved
 this chicken-and-egg problem.
  4. 4. There are no applications
 because data is not queryable. SPARQL endpoints are unreliable.
 Data dumps are not live.
  5. 5. We analyzed DBpedia’s low-cost
 Triple Pattern Fragments interface
 between Nov 2014 and Feb 2015. Over 4M requests were made.
 There was 1 minute of downtime.
  6. 6. Web interfaces to triples. Four months of fragments. Extending the analysis.
  7. 7. Web interfaces to triples. Four months of fragments. Extending the analysis.
  8. 8. Web interfaces act as gateways
 between clients and databases. [Diagram: Client, Web interface, Database.] The interface hides the database schema. The interface restricts the kind of queries.
  9. 9. No sane Web developer or admin
 would give direct database access. [Diagram: Client, Web interface, Database.] The client must know the database schema. The client can ask any query.
  10. 10. SPARQL endpoints happily give
 direct access to the database. [Diagram: Client, SPARQL protocol, Triple store.] The client must know the database schema. The client can ask any query.
  11. 11. SPARQL interfaces are expensive,
 so we have an availability problem. There are few SPARQL endpoints
 because hosting them is expensive. Many of the endpoints that exist
 suffer from low availability. You already give data for free.
 Do you have to pay for query time as well?
  12. 12. Data dumps allow you to set up
 your own private SPARQL endpoint. But then we no longer query the Web… No usage statistics whatsoever. Not everybody can do this:
 mobile devices, non-technical users, …
  13. 13. A Triple Pattern Fragments interface
 acts as a gateway to an RDF source. [Diagram: Client, TPF interface, RDF source.] The interface hides the database schema. The interface restricts the kind of queries.
  14. 14. A Triple Pattern Fragments interface
 acts as a gateway to an RDF source. Clients can only ask ?s ?p ?o patterns. Complex SPARQL queries
 are decomposed client-side. Low server cost, highly cacheable,
 but higher bandwidth and query time.
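
The client-side decomposition described on slide 14 can be illustrated with a minimal sketch. It is not the actual TPF client library: the in-memory `TRIPLES` list stands in for the RDF source, `fragment` plays the role of the server (which answers only single triple patterns), and `solve` is a hypothetical client that evaluates a multi-pattern query by substituting known bindings and issuing one pattern at a time. All names and the toy data are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables

# Hypothetical toy data standing in for an RDF source such as DBpedia.
TRIPLES: List[Triple] = [
    ("dbr:Ghent", "rdf:type", "dbo:City"),
    ("dbr:Ghent", "dbo:country", "dbr:Belgium"),
    ("dbr:Lyon", "rdf:type", "dbo:City"),
    ("dbr:Lyon", "dbo:country", "dbr:France"),
    ("dbr:Belgium", "rdf:type", "dbo:Country"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(pattern: Pattern) -> Iterator[Triple]:
    """'Server' side: the only operation offered is one triple pattern lookup."""
    for triple in TRIPLES:
        if all(is_var(p) or p == t for p, t in zip(pattern, triple)):
            yield triple


def solve(patterns: List[Pattern],
          bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """'Client' side: evaluate a list of patterns using only fragment()."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    # Replace already-bound variables before asking the server.
    bound = tuple(bindings.get(term, term) for term in first)
    for triple in fragment(bound):
        extended = dict(bindings)
        consistent = True
        for term, value in zip(bound, triple):
            # setdefault binds a new variable or returns the existing binding.
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from solve(rest, extended)


query = [
    ("?city", "rdf:type", "dbo:City"),
    ("?city", "dbo:country", "dbr:Belgium"),
]
results = list(solve(query))
print(results)  # [{'?city': 'dbr:Ghent'}]
```

The sketch makes the trade-off on the slide visible: the server does very little work per request, while the client issues one lookup per candidate binding, which costs bandwidth and query time.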
  15. 15. Web interfaces to triples. Four months of fragments. Extending the analysis.
  16. 16. In mid-October 2014, we started
 an official TPF interface for DBpedia. Will this interface be used? How will clients use it? Will the availability be sufficient
 for live application usage?
  17. 17. The server is deployed virtually; its availability is monitored externally. Amazon Elastic Compute Cloud
 c3.2xlarge machine (8 CPUs, 15 GB RAM). Compressed HDT format as backend. Pingdom for analysis.
  18. 18. 4.5 million Triple Pattern Fragments
 of DBpedia 2014@en were requested.
  19. 19. The TPF client library consumed most,
 followed by crawlers and Chrome.
  20. 20. Turtle is the most popular content type,
 but it is being surpassed by TriG.
  21. 21. Most requests come from Europe;
 the US and China follow.
  22. 22. The “type” fragment was popular,
 but it’s hard to conclude anything.
  23. 23. A quarter of all requests were cached
 (but we could cache everything).
  24. 24. During four months,
 the API had 99.9994% availability. We deeply apologize for
 that one minute of downtime
 in November.
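
The availability figure on slide 24 can be checked with simple arithmetic. The sketch below assumes the measurement window covers the full calendar months from 1 November 2014 through 28 February 2015; the slides only say "between Nov 2014 and Feb 2015", so the exact boundaries are an assumption, as is the helper name `availability_percent`.

```python
from datetime import date


def availability_percent(start: date, end: date, downtime_minutes: float) -> float:
    """Percentage of the window [start, end) during which the service was up."""
    total_minutes = (end - start).days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# Assumed window: 1 Nov 2014 up to (not including) 1 Mar 2015, i.e. 120 days.
value = availability_percent(date(2014, 11, 1), date(2015, 3, 1), downtime_minutes=1)
print(f"{value:.4f}%")  # 99.9994%
```

Under that assumption, one minute of downtime out of 172,800 minutes gives the 99.9994% stated on the slide.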
  25. 25. Web interfaces to triples. Four months of fragments. Extending the analysis.
  26. 26. We don’t know exactly
 which clients executed queries. Was the TPF client used standalone? As a library of another application? Also hard for SPARQL endpoints.
  27. 27. The analysis did not give insight
 into which queries clients executed. Good for privacy! We can try reconstructing SPARQL queries,
 but maybe clients did something else. We only know this with SPARQL endpoints,
 not with data dumps or Linked Data documents.
  28. 28. We could learn from the human Web:
 can clients give explicit feedback? “This is the query I executed.
 It took me 5 seconds.” Potential source of insights,
 but clients need a gain. Will this be representative/truthful?
  29. 29. Web interfaces to triples. Four months of fragments. Extending the analysis.
  30. 30. We have a >99.999% available API
 to the most popular RDF data source. No more excuses not to build apps. So where are they? Is something else holding us back?
  31. 31. We need to think differently
 about how to build Linked Data apps. The paradigm of querying a database
 and waiting for the results
 does not scale to the Web. Live data requires new interfaces
 and new visualizations.
  32. 32. We need developers to build bridges
 from data to end users. Now that the chicken-and-egg problem
 and the availability problems are solved,
 we need to tackle fundamental questions. Where are the killer apps
 the next generation is waiting for?
  33. 33. Initial Usage Analysis
 of DBpedia's
 Triple Pattern Fragments.
 @RubenVerborgh
 ruben.verborgh.org
 linkeddatafragments.org