WESO CAEPIA-20111108

Published in: Technology

  1. 1. Query Expansion Methods and Performance Evaluation for Reusing Linking Open Data of the European Public Procurement Notices. José María Álvarez Rodríguez, WESO-Universidad de Oviedo (http://purl.org/weso/moldeas/). Tecnologías de Linked Data y sus aplicaciones en España (TLDE), CAEPIA 2011, Tenerife (Spain), 8th of November, 2011. Code: TSI-020100-2010-919
  2. 2. Overview: Use case & Context; SPARQL & Performance; Next Steps
  3. 3. Objective: Creation of a pan-European e-procurement platform
  4. 4. E-procurement Long Tail: TED; BOE (official bulletin of the Spanish Government); BOPA (official bulletin of the Asturian Government)
  5. 5. To be able to answer… Which public procurement notices are relevant to Dutch companies (only SMEs) that want to tender for contracts announced by local authorities with a total value lower than 170K € to procure “Road bridge construction work” with a two-year duration in the Dutch-speaking region of Flanders (Belgium)?
  6. 6. Structuring public procurement notices. Providing new semantic-based services. LOD enrichment. Easing the access to the published data using the LOD approach. Transforming government classifications
  7. 7. Preliminary Results [figure]
  8. 8. Semantic-based Services: the problem of «Query Expansion», depending on the kind of information variable
  9. 9. Methods of «Query Expansion» [diagram]
  10. 10. Remembering… Which public procurement notices are relevant to Dutch companies (only SMEs) that want to tender for contracts announced by local authorities with a total value lower than 170K € to procure “Road bridge construction work” with a two-year duration in the Dutch-speaking region of Flanders (Belgium)?
  11. 11. Query… [diagram; surviving labels: cpv:45221111-3, NL, ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification]
  12. 12. Applying Query Expansion… [diagram; surviving labels: cpv:45221111-3, NL, ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification]
  13. 13. Example of SPARQL query
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = cpv:45221111-3 ... ) .
        FILTER((xsd:double(?amount) = xsd:long(170000)) || (xsd:double(?amount) = xsd:long(200000))) .
        FILTER(?nutsCode = nuts:B3 ... ) .
        FILTER((xsd:long(?duration) = xsd:long(2)) || (xsd:long(?duration) = xsd:long(3))) .
      }
  14. 14. Context: Performance of SPARQL queries: ~30 sec.
  15. 15. Hardware: DELL PC, 2 GB RAM and 30 GB hard disk. Software: VirtualBox (version 4.0.6); Linux 2.6.35-22-server #33-Ubuntu 2 SMP x86_64 GNU/Linux; Ubuntu 10.10; OpenLink Virtuoso Opensource-6-20110218
  16. 16. Question: How can the query execution time be decreased without modifying the hardware and without using any vendor-specific feature?
  17. 17. Triple store: 25 graphs, 20 M RDF triples. But… 8 graphs, 11 M RDF triples
  18. 18. Focus on… the generation of SPARQL queries
  19. 19. Let’s start… 9 SPARQL queries, 3 executions
  21. 21. Simple SPARQL query
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = cpv:15331137) .
        FILTER(?nutsCode = nuts:UK) .
      }
  22. 22. Simple Query: 1 CPV code, 1 NUTS code. Time: ~3.29 sec.
  23. 23. T1: Rewrite SPARQL queries: match triples from specific to general; filter as soon as possible
  24. 24. T2: Use the LIMIT clause (value set to 10,000)
  25. 25. Rewrite SPARQL query
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn cpv:codeIn2008 ?cpvCode .
        FILTER(?cpvCode = cpv:15331137) .
        ?ppn ppn:nutsCode ?nutsCode .
        FILTER(?nutsCode = nuts:UK) .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
      }
      LIMIT 10000
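The T1 and T2 rewrites could be generated programmatically. The following is a minimal sketch, not the author's generator: the function name `build_rewritten_query` and its parameters are hypothetical, and PREFIX declarations are left out because the slides do not show them. It places each FILTER directly after the triple pattern that binds its variable and appends the LIMIT clause.

```python
# Class IRI taken from the slide; everything else here is an illustrative assumption.
PPN_CLASS = "<http://purl.org/weso/ppn/def#ppn>"


def build_rewritten_query(cpv_code: str, nuts_code: str, limit: int = 10000) -> str:
    """Return a query with the constrained patterns first, each FILTER placed
    right after the pattern that binds its variable, and a final LIMIT."""
    lines = [
        "SELECT DISTINCT * WHERE {",
        f"  ?ppn rdf:type {PPN_CLASS} .",
        "  ?ppn cpv:codeIn2008 ?cpvCode .",
        f"  FILTER(?cpvCode = {cpv_code}) .",
        "  ?ppn ppn:nutsCode ?nutsCode .",
        f"  FILTER(?nutsCode = {nuts_code}) .",
        "  ?ppn ppn:hasDuration ?duration .",
        "  ?ppn dc:identifier ?id .",
        "  ?ppn dc:date ?date .",
        "  ?ppn ppn:hasAmount ?amount .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


query = build_rewritten_query("cpv:15331137", "nuts:UK")
print(query)
```

Keeping the ordering in one place makes it easy to compare the original and rewritten forms on the same inputs.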
  26. 26. Results T2: 1 CPV code, 1 NUTS code. Time: ~3.26 sec.
  27. 27. Evaluation: There are no significant changes in execution time or gain… and we are interested in “enhanced queries”
  28. 28. T3: Execution of enhanced queries
  29. 29. Enhanced SPARQL query
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = {cpv:15331137, cpv:48611000, cpv:48611000, cpv:50531510, cpv:15871210}) .
        FILTER(?nutsCode = {nuts:B3, nuts:PL, nuts:RO}) .
      }
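The `?var = {a, b, c}` notation in the enhanced query is a shorthand for a set of alternatives rather than SPARQL syntax. One way to expand it, sketched here under assumptions, is a chain of equality tests joined with `||`. The helper `or_filter` is hypothetical, and removing repeated values is a choice made for this sketch, not something stated on the slide.

```python
from typing import Iterable


def or_filter(variable: str, values: Iterable[str]) -> str:
    """Expand a set of alternative values into one FILTER of ||-joined equalities."""
    unique = list(dict.fromkeys(values))  # drop repeats, keep first-seen order
    clauses = [f"{variable} = {value}" for value in unique]
    return "FILTER(" + " || ".join(clauses) + ") ."


cpv_filter = or_filter(
    "?cpvCode",
    ["cpv:15331137", "cpv:48611000", "cpv:48611000", "cpv:50531510", "cpv:15871210"],
)
nuts_filter = or_filter("?nutsCode", ["nuts:B3", "nuts:PL", "nuts:RO"])
print(cpv_filter)
print(nuts_filter)
```

SPARQL 1.1 also offers an `IN` operator for this purpose; the `||` form is used here because it does not depend on SPARQL 1.1 support.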
  30. 30. Results T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  31. 31. T4: Rewrite SPARQL queries + Use the LIMIT clause
  32. 32. Results T4 wrt T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.55 sec.
  33. 33. Info: 8 graphs, 11 M RDF triples
  34. 34. T5: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM)
  35. 35. Results T5 wrt T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  36. 36. T6: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries
  37. 37. Results T6 wrt T3: 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~20.60 sec.
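The T6 split, one simple query per named graph, might look like the sketch below. The graph IRIs are placeholders (the slides do not name the graphs), `query_for_graph` is a hypothetical helper, and the WHERE body is shortened for readability.

```python
# Placeholder graph IRIs: the real graph names are not given in the slides.
GRAPHS = [f"http://example.org/graph/{i}" for i in range(1, 5)]

# Shortened WHERE body; a real query would carry the full pattern and filters.
WHERE_BODY = (
    "  ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .\n"
    "  ?ppn cpv:codeIn2008 ?cpvCode ."
)


def query_for_graph(graph_iri: str, where_body: str, limit: int = 10000) -> str:
    """Restrict one copy of the query to a single named graph via FROM."""
    return (
        "SELECT DISTINCT *\n"
        f"FROM <{graph_iri}>\n"
        "WHERE {\n"
        f"{where_body}\n"
        "}\n"
        f"LIMIT {limit}"
    )


queries = [query_for_graph(graph, WHERE_BODY) for graph in GRAPHS]
print(len(queries), "queries")
print(queries[0])
```

With four graphs this yields four simple queries, matching the count reported for T6.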
  38. 38. T6-1: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
  39. 39. Results T6-1 wrt T3: 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~11.93 sec.
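The slides do not show how the ad-hoc map/reduce was implemented. As one possible reading, the sketch below runs the simple queries concurrently in a thread pool (map) and merges the partial result sets, dropping duplicates by identifier (reduce). `run_query` is a hypothetical stand-in for the call to the SPARQL endpoint, and the `id` key is an assumption modelled on the `dc:identifier` binding.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_query(query: str) -> List[Row]:
    """Hypothetical stand-in for an endpoint call; returns one fake row per query."""
    return [{"id": f"{query}-row"}]


def run_parallel(
    queries: List[str],
    execute: Callable[[str], List[Row]] = run_query,
    max_workers: int = 4,
) -> List[Row]:
    # Map step: execute every simple query concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(execute, queries))
    # Reduce step: merge partial results, keeping one row per identifier.
    merged: Dict[str, Row] = {}
    for rows in partial_results:
        for row in rows:
            merged[row["id"]] = row
    return list(merged.values())


results = run_parallel(["q1", "q2", "q1"])
print(results)
```

Threads suit this case because the work is dominated by waiting on the endpoint rather than by local computation.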
  40. 40. T7: Rewrite SPARQL queries + Use the LIMIT clause + Split enhanced query into simple queries
  41. 41. Results T7 wrt T3: 1 CPV code per query (of 5), 3 NUTS codes, 5 simple queries. Time: ~15.81 sec.
  42. 42. T7-1: Rewrite SPARQL queries + Use the LIMIT clause + Split enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
  43. 43. Results T7-1 wrt T3: 1 CPV code per query (of 5), 3 NUTS codes, 5 simple queries. Time: ~10.55 sec.
  44. 44. T8: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries
  45. 45. Results T8 wrt T3: 1 CPV code per query (of 5), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~32.34 sec.
  46. 46. T8-1: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split enhanced query into simple queries + Parallelization of query execution (ad-hoc map/reduce)
  47. 47. Results T8-1 wrt T3: 1 CPV code per query (of 5), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~18.45 sec.
  48. 48. T9: Rewrite SPARQL queries + Use the LIMIT clause + Split enhanced query into simple queries (1 CPV code + 1 NUTS code)
  49. 49. Results T9 wrt T3: 1 CPV code per query (of 5), 1 NUTS code per query (of 3), 15 simple queries. Time: ~22.462 sec.
  50. 50. T9-1: Rewrite SPARQL queries + Use the LIMIT clause + Split enhanced query into simple queries (1 CPV code + 1 NUTS code) + Parallelization of query execution (ad-hoc map/reduce)
  51. 51. Results T9-1 wrt T3: 1 CPV code per query (of 5), 1 NUTS code per query (of 3), 15 simple queries. Time: ~12.77 sec.
  52. 52. T10: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split into simple queries (1 CPV code + 1 NUTS code)
  53. 53. Results T10 wrt T3: 1 CPV code per query (of 5), 1 NUTS code per query (of 3), 4 graphs, 60 simple queries. Time: ~71.17 sec.
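The number of simple queries in T6 to T10 follows from which dimensions are split: CPV codes (5), NUTS codes (3) and named graphs (4). The sketch below enumerates the combinations with placeholder code and graph names, all hypothetical, and reproduces the counts reported on the result slides.

```python
from itertools import product

# Placeholder inputs; only the sizes (5, 3, 4) come from the slides.
CPV_CODES = [f"cpv:code{i}" for i in range(1, 6)]
NUTS_CODES = ["nuts:B3", "nuts:PL", "nuts:RO"]
GRAPHS = [f"graph{i}" for i in range(1, 5)]


def split(per_cpv: bool, per_nuts: bool, per_graph: bool):
    """Return one (cpv_group, nuts_group, graph) tuple per generated simple query."""
    cpv_groups = [[c] for c in CPV_CODES] if per_cpv else [CPV_CODES]
    nuts_groups = [[n] for n in NUTS_CODES] if per_nuts else [NUTS_CODES]
    graphs = GRAPHS if per_graph else [None]  # None means no FROM clause
    return list(product(cpv_groups, nuts_groups, graphs))


STRATEGIES = {
    "T6": dict(per_cpv=False, per_nuts=False, per_graph=True),
    "T7": dict(per_cpv=True, per_nuts=False, per_graph=False),
    "T8": dict(per_cpv=True, per_nuts=False, per_graph=True),
    "T9": dict(per_cpv=True, per_nuts=True, per_graph=False),
    "T10": dict(per_cpv=True, per_nuts=True, per_graph=True),
}

counts = {name: len(split(**flags)) for name, flags in STRATEGIES.items()}
print(counts)  # {'T6': 4, 'T7': 5, 'T8': 20, 'T9': 15, 'T10': 60}
```

The query count grows multiplicatively with each split dimension, which is consistent with T10 being the slowest configuration.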
  54. 54. T10-1: Rewrite SPARQL queries + Use the LIMIT clause + Named Graphs (FROM) + Split enhanced query into simple queries (1 CPV code + 1 NUTS code) + Parallelization of query execution (ad-hoc map/reduce)
  55. 55. Results T10-1 wrt T3: 1 CPV code per query (of 5), 1 NUTS code per query (of 3), 4 graphs, 60 simple queries. Time: ~35.13 sec.
  56. 56. Table of Results [table]
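The times reported on the individual result slides can be tabulated against the T3 baseline. The sketch below does so. The gain formula, the relative reduction in time with respect to T3, is an assumption, since the slides mention a gain without defining it.

```python
# Execution times in seconds, as reported on the result slides.
TIMES = {
    "T3": 20.65, "T4": 20.55, "T5": 20.65, "T6": 20.60, "T6-1": 11.93,
    "T7": 15.81, "T7-1": 10.55, "T8": 32.34, "T8-1": 18.45,
    "T9": 22.462, "T9-1": 12.77, "T10": 71.17, "T10-1": 35.13,
}
BASELINE = TIMES["T3"]


def gain_percent(seconds: float, baseline: float = BASELINE) -> float:
    """Assumed definition: relative time reduction with respect to the baseline."""
    return (baseline - seconds) / baseline * 100.0


gains = {name: gain_percent(seconds) for name, seconds in TIMES.items()}
best = min(TIMES, key=TIMES.get)

for name, seconds in TIMES.items():
    print(f"{name:<6} {seconds:>7.2f} s {gains[name]:>+8.1f} %")
print("fastest:", best)
```

A negative gain marks a configuration that is slower than the single enhanced query of T3.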
  57. 57. Discussion
      • The number of queries is a key factor
      • More CPV codes imply more execution time
      • Parallelization improves execution time
      • T7-1 is the best execution in terms of time:
          • Rewrite SPARQL queries
          • Use the LIMIT clause
          • Split enhanced query into simple queries
          • Parallelization of query execution
  58. 58. Further Steps
      • Distribute graphs across different nodes (HW improvement)
      • Use of other triple stores (SW comparison)
      • Add SPARQL 1.1 new features (expressiveness improvement)
      • Cache of queries (SW improvement)
