WESO CAEPIA-20111108

  1. 1. Query Expansion Methods and Performance Evaluation for Reusing Linking Open Data of the European Public Procurement Notices José María Álvarez Rodríguez WESO-Universidad de Oviedo http://purl.org/weso/moldeas/ Tecnologías de Linked Data y sus aplicaciones en España (TLDE) CAEPIA 2011-Tenerife (Spain) 8th of November, 2011 Code: TSI-020100-2010-919
  2. 2. Overview: Use case & Context; SPARQL & Performance; Next Steps
  3. 3. Objective: creation of a pan-European e-procurement platform
  4. 4. E-procurement Long Tail: TED; BOE (official bulletin of the Spanish Government); BOPA (official bulletin of the Asturian Government)
  5. 5. To be able to answer… Which public procurement notices are relevant to Dutch companies (SMEs only) that want to tender for contracts announced by local authorities, with a total value lower than 170K € and a two-year duration, to procure “Road bridge construction work” in the Dutch-speaking region of Flanders (Belgium)?
  6. 6. Structuring public procurement notices. Providing new semantic-based services. LOD enrichment. Easing the access to the published data using the LOD approach. Transforming government classifications. [Diagram labels not recoverable]
  7. 7. Preliminary Results [figure not recoverable]
  8. 8. Semantic-based Services Problem of «Query Expansion» depending on the kind of information variable
  9. 9. Methods of «Query Expansion» [figure not recoverable]
  10. 10. Remembering… Which public procurement notices are relevant to Dutch companies (SMEs only) that want to tender for contracts announced by local authorities, with a total value lower than 170K € and a two-year duration, to procure “Road bridge construction work” in the Dutch-speaking region of Flanders (Belgium)?
  11. 11. Query… cpv:45221111-3; NL. Properties: ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification [remaining diagram labels not recoverable]
  12. 12. Applying Query Expansion… cpv:45221111-3; NL. Properties: ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification [remaining diagram labels not recoverable]
  13. 13. Example of SPARQL query: SELECT DISTINCT * WHERE { ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> . ?ppn ppn:nutsCode ?nutsCode . ?ppn cpv:codeIn2008 ?cpvCode . ?ppn ppn:hasDuration ?duration . ?ppn dc:identifier ?id . ?ppn dc:date ?date . ?ppn ppn:hasAmount ?amount . FILTER(?cpvCode = cpv:45221111-3 ... ) . FILTER((xsd:double(?amount) >= xsd:long(170000)) && (xsd:double(?amount) <= xsd:long(200000))) . FILTER(?nutsCode = nuts:B3 ... ) . FILTER((xsd:long(?duration) >= xsd:long(2)) && (xsd:long(?duration) <= xsd:long(3))) . }
  14. 14. Context Performance of SPARQL Queries ~30 sec.
  15. 15. Hardware: DELL PC, 2 GB RAM and 30 GB hard disk. Software: VirtualBox (version 4.0.6); Linux 2.6.35-22-server #33-Ubuntu 2 SMP x86_64 GNU/Linux; Ubuntu 10.10; OpenLink Virtuoso Opensource-6-20110218
  16. 16. Question? How can the query execution time be decreased without modifying the hardware and without using any vendor-specific feature?
  17. 17. Triple store: 25 graphs, 20 M RDF triples. But… 8 graphs, 11 M RDF triples
  18. 18. Focus on… the generation of SPARQL queries
  19. 19. Let’s start… 9 SPARQL queries, 3 executions
  20. 20. [Table not recoverable]
  21. 21. Simple SPARQL query: SELECT DISTINCT * WHERE { ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> . ?ppn ppn:nutsCode ?nutsCode . ?ppn cpv:codeIn2008 ?cpvCode . ?ppn ppn:hasDuration ?duration . ?ppn dc:identifier ?id . ?ppn dc:date ?date . ?ppn ppn:hasAmount ?amount . FILTER(?cpvCode = cpv:15331137) . FILTER(?nutsCode = nuts:UK) . }
  22. 22. Simple Query: 1 CPV code, 1 NUTS code. Time: ~3.29 sec.
  23. 23. T1 Rewrite SPARQL queries: match triples from specific to general; filter as soon as possible
  24. 24. T2 Use the LIMIT clause: value set to 10,000
  25. 25. Rewritten SPARQL query: SELECT DISTINCT * WHERE { ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> . ?ppn cpv:codeIn2008 ?cpvCode . FILTER(?cpvCode = cpv:15331137) . ?ppn ppn:nutsCode ?nutsCode . FILTER(?nutsCode = nuts:UK) . ?ppn ppn:hasDuration ?duration . ?ppn dc:identifier ?id . ?ppn dc:date ?date . ?ppn ppn:hasAmount ?amount . } LIMIT 10000
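A minimal Python sketch of how T1 and T2 might be applied when generating queries: each FILTER is emitted directly after the triple pattern that binds its variable, and a LIMIT is appended. The helper name `build_simple_query` is hypothetical, and prefix declarations are omitted, as on the slides.

```python
PPN_TYPE = "<http://purl.org/weso/ppn/def#ppn>"


def build_simple_query(cpv_code: str, nuts_code: str, limit: int = 10000) -> str:
    """Build a query with filtered patterns first and a LIMIT at the end."""
    lines = [
        "SELECT DISTINCT * WHERE {",
        f"  ?ppn rdf:type {PPN_TYPE} .",
        # T1: constrained patterns first, each filtered immediately
        # (order follows the rewritten query on the slide).
        "  ?ppn cpv:codeIn2008 ?cpvCode .",
        f"  FILTER(?cpvCode = {cpv_code}) .",
        "  ?ppn ppn:nutsCode ?nutsCode .",
        f"  FILTER(?nutsCode = {nuts_code}) .",
        # Unconstrained patterns last.
        "  ?ppn ppn:hasDuration ?duration .",
        "  ?ppn dc:identifier ?id .",
        "  ?ppn dc:date ?date .",
        "  ?ppn ppn:hasAmount ?amount .",
        "}",
        f"LIMIT {limit}",  # T2: cap the result size
    ]
    return "\n".join(lines)


query = build_simple_query("cpv:15331137", "nuts:UK")
print(query)
```

Whether this ordering helps depends on the triple store's own optimizer; the slides report no significant gain for the simple query.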
  26. 26. Results T2: 1 CPV code, 1 NUTS code. Time: ~3.26 sec.
  27. 27. Evaluation: there is no significant change or gain in execution time… and we are interested in “enhanced queries”
  28. 28. T3 Execution of enhanced queries
  29. 29. Enhanced SPARQL query: SELECT DISTINCT * WHERE { ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> . ?ppn ppn:nutsCode ?nutsCode . ?ppn cpv:codeIn2008 ?cpvCode . ?ppn ppn:hasDuration ?duration . ?ppn dc:identifier ?id . ?ppn dc:date ?date . ?ppn ppn:hasAmount ?amount . FILTER(?cpvCode = cpv:15331137 || ?cpvCode = cpv:48611000 || ?cpvCode = cpv:48611000 || ?cpvCode = cpv:50531510 || ?cpvCode = cpv:15871210) . FILTER(?nutsCode = nuts:B3 || ?nutsCode = nuts:PL || ?nutsCode = nuts:RO) . }
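The sets of codes in an enhanced query can be expressed in SPARQL 1.0 as a disjunction of equality tests. The sketch below shows one way to generate such a FILTER from a list of expanded codes; the function name `disjunction_filter` is an assumption, not part of the original work.

```python
def disjunction_filter(variable: str, codes: list[str]) -> str:
    """Return a FILTER matching `variable` against any of `codes`."""
    if not codes:
        raise ValueError("at least one code is required")
    unique = list(dict.fromkeys(codes))  # drop duplicates, keep order
    tests = " || ".join(f"{variable} = {code}" for code in unique)
    return f"FILTER({tests})"


cpv_filter = disjunction_filter(
    "?cpvCode",
    ["cpv:15331137", "cpv:48611000", "cpv:50531510", "cpv:15871210"],
)
nuts_filter = disjunction_filter("?nutsCode", ["nuts:B3", "nuts:PL", "nuts:RO"])
print(cpv_filter)
print(nuts_filter)
```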
  30. 30. Results T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  31. 31. T4 Rewrite SPARQL queries + Use the LIMIT clause
  32. 32. Results T4 (wrt T3): 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.55 sec.
  33. 33. Info: 8 graphs, 11 M RDF triples
  34. 34. T5 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM)
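As an illustration of the FROM step, the following sketch inserts one FROM clause per graph in front of the WHERE keyword of a generated query. The graph URIs are placeholders (the slides do not name the graphs) and `add_from_clauses` is a hypothetical helper.

```python
def add_from_clauses(query: str, graph_uris: list[str]) -> str:
    """Insert one FROM clause per graph URI before the first WHERE keyword."""
    head, sep, tail = query.partition("WHERE")
    if not sep:
        raise ValueError("query has no WHERE keyword")
    from_block = "".join(f"FROM <{uri}>\n" for uri in graph_uris)
    return f"{head.rstrip()}\n{from_block}{sep}{tail}"


# Placeholder graph URIs, for illustration only.
graphs = ["http://example.org/graph/a", "http://example.org/graph/b"]
base = "SELECT DISTINCT * WHERE { ?ppn ppn:nutsCode ?nutsCode . }"
restricted = add_from_clauses(base, graphs)
print(restricted)
```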
  35. 35. Results T5 (wrt T3): 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  36. 36. T6 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries
  37. 37. Results T6 (wrt T3): 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~20.60 sec.
  38. 38. T6-1 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split the enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
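The ad-hoc map/reduce step could look roughly like the sketch below: the map phase runs every simple query in its own thread, and the reduce phase merges the partial results with a set union, which also removes duplicates across queries. The slides do not describe the actual implementation; `run_query` is a stand-in for a call to the SPARQL endpoint and returns fake identifiers so the example runs without a triple store.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_query(query: str) -> set[str]:
    """Stand-in for an endpoint call (hypothetical): returns fake results."""
    return {f"notice-for-{query}", "notice-shared"}


def run_parallel(queries: list[str], workers: int = 4) -> set[str]:
    """Map: run each simple query in a thread. Reduce: union the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_query, queries))
    return reduce(set.union, partial_results, set())


simple_queries = ["q1", "q2", "q3", "q4"]
merged = run_parallel(simple_queries)
print(len(merged))
```

Threads are a reasonable fit here on the assumption that the client mostly waits on the endpoint; the achievable gain still depends on how the triple store handles concurrent requests.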
  39. 39. Results T6-1 (wrt T3): 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~11.93 sec.
  40. 40. T7 Rewrite SPARQL queries + Use the LIMIT clause + Split the enhanced query into simple queries
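A possible way to perform the T7 split is sketched here: one simple query per CPV code, each keeping the full set of NUTS codes. The CPV codes are placeholders, `split_by_cpv` is a hypothetical helper, and only the two filtered patterns are included to keep the example short.

```python
def split_by_cpv(cpv_codes: list[str], nuts_codes: list[str],
                 limit: int = 10000) -> list[str]:
    """Return one simple query per CPV code; every query keeps all NUTS codes."""
    nuts_test = " || ".join(f"?nutsCode = {n}" for n in nuts_codes)
    queries = []
    for cpv in cpv_codes:
        queries.append(
            "SELECT DISTINCT * WHERE { "
            "?ppn cpv:codeIn2008 ?cpvCode . "
            f"FILTER(?cpvCode = {cpv}) . "
            "?ppn ppn:nutsCode ?nutsCode . "
            f"FILTER({nuts_test}) . "
            f"}} LIMIT {limit}"
        )
    return queries


cpv_codes = [f"cpv:CODE{i}" for i in range(1, 6)]  # five placeholder codes
nuts_codes = ["nuts:B3", "nuts:PL", "nuts:RO"]
simple_queries = split_by_cpv(cpv_codes, nuts_codes)
print(len(simple_queries))
```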
  41. 41. Results T7 (wrt T3): 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~15.81 sec.
  42. 42. T7-1 Rewrite SPARQL queries + Use the LIMIT clause + Split the enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
  43. 43. Results T7-1 (wrt T3): 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~10.55 sec.
  44. 44. T8 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries
  45. 45. Results T8 (wrt T3): 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~32.34 sec.
  46. 46. T8-1 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split the enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
  47. 47. Results T8-1 (wrt T3): 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~18.45 sec.
  48. 48. T9 Rewrite SPARQL queries + Use the LIMIT clause + Split the enhanced query into simple queries (1 CPV code + 1 NUTS code)
  49. 49. Results T9 (wrt T3): 1 CPV code per query (5 in total), 1 NUTS code per query (3 in total), 15 simple queries. Time: ~22.462 sec.
  50. 50. T9-1 Rewrite SPARQL queries + Use the LIMIT clause + Split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + Parallelization of query execution (ad-hoc map/reduce)
  51. 51. Results T9-1 (wrt T3): 1 CPV code per query (5 in total), 1 NUTS code per query (3 in total), 15 simple queries. Time: ~12.77 sec.
  52. 52. T10 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries (1 CPV code + 1 NUTS code)
  53. 53. Results T10 (wrt T3): 1 CPV code per query (5 in total), 1 NUTS code per query (3 in total), 4 graphs, 60 simple queries. Time: ~71.17 sec.
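The number of simple queries in each test follows from which dimensions the enhanced query is split along (graphs, CPV codes, NUTS codes). The short check below reproduces the counts reported on the slides with placeholder values; the mapping of tests to dimensions is read off the slide descriptions.

```python
from itertools import product

cpv_codes = [f"cpv{i}" for i in range(1, 6)]    # 5 placeholder CPV codes
nuts_codes = [f"nuts{i}" for i in range(1, 4)]  # 3 placeholder NUTS codes
graphs = [f"graph{i}" for i in range(1, 5)]     # 4 placeholder graphs

# One simple query per combination of the dimensions a test splits on.
splits = {
    "T6": list(product(graphs)),
    "T7": list(product(cpv_codes)),
    "T8": list(product(graphs, cpv_codes)),
    "T9": list(product(cpv_codes, nuts_codes)),
    "T10": list(product(graphs, cpv_codes, nuts_codes)),
}
counts = {name: len(combos) for name, combos in splits.items()}
print(counts)  # {'T6': 4, 'T7': 5, 'T8': 20, 'T9': 15, 'T10': 60}
```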
  54. 54. T10-1 Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + Parallelization of query execution (ad-hoc map/reduce)
  55. 55. Results T10-1 (wrt T3): 1 CPV code per query (5 in total), 1 NUTS code per query (3 in total), 4 graphs, 60 simple queries. Time: ~35.13 sec.
  56. 56. Table of Results [table contents not recoverable]
  57. 57. Discussion • The number of queries is a key factor • A larger number of CPV codes implies a longer execution time • Parallelization improves execution time • T7-1 is the best execution in terms of time: rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries + parallelize query execution
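The comparison can be checked with a few lines of Python using the approximate times reported on the result slides, with T3 (the single enhanced query) as the baseline. The speedup is simply the baseline time divided by the time of each test.

```python
# Approximate execution times in seconds, as reported on the result slides.
times = {
    "T3": 20.65, "T4": 20.55, "T5": 20.65, "T6": 20.60, "T6-1": 11.93,
    "T7": 15.81, "T7-1": 10.55, "T8": 32.34, "T8-1": 18.45,
    "T9": 22.462, "T9-1": 12.77, "T10": 71.17, "T10-1": 35.13,
}

baseline = times["T3"]
speedup = {name: round(baseline / t, 2) for name, t in times.items()}
best = min(times, key=times.get)
print(best, speedup[best])  # T7-1 1.96
```

A speedup below 1 (for example T10) means the test was slower than the single enhanced query.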
  58. 58. Further Steps • Distribute graphs across different nodes (HW improvement) • Use other triple stores (SW comparison) • Add SPARQL 1.1 new features (expressiveness improvement) • Cache queries (SW improvement)