WESO CAEPIA-20111108
Transcript

  • 1. Query Expansion Methods and Performance Evaluation for Reusing Linking Open Data of the European Public Procurement Notices. José María Álvarez Rodríguez, WESO, Universidad de Oviedo. http://purl.org/weso/moldeas/. Tecnologías de Linked Data y sus aplicaciones en España (TLDE; Linked Data Technologies and Their Applications in Spain), CAEPIA 2011, Tenerife (Spain), 8 November 2011. Code: TSI-020100-2010-919
  • 2. Overview: use case & context; SPARQL & performance; next steps
  • 3. Objective: creation of a pan-European e-procurement platform
  • 4. E-procurement long tail: TED (Tenders Electronic Daily), BOE (official bulletin of the Spanish Government), BOPA (official bulletin of the Asturian Government)
  • 5. To be able to answer… Which public procurement notices are relevant to Dutch companies (SMEs only) that want to tender for contracts announced by local authorities, with a total value lower than €170K, to procure “Road bridge construction work”, with a two-year duration, in the Dutch-speaking region of Flanders (Belgium)?
  • 6. (Diagram) Structuring public procurement notices; transforming government classifications; LOD enrichment; easing access to the published data using the LOD approach; providing new semantic-based services
  • 7. Preliminary results (diagram not recoverable from the transcript)
  • 8. Semantic-based services: the problem of «query expansion», which varies depending on the kind of information
  • 9. Methods of «query expansion» (diagram not recoverable from the transcript)
  • 10. Remembering… Which public procurement notices are relevant to Dutch companies (SMEs only) that want to tender for contracts announced by local authorities, with a total value lower than €170K, to procure “Road bridge construction work”, with a two-year duration, in the Dutch-speaking region of Flanders (Belgium)?
  • 11. Query… (diagram; recoverable labels: cpv:45221111-3, NL, ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification)
  • 12. Applying query expansion… (diagram; recoverable labels: cpv:45221111-3, NL, ppn:nutsCode, ppn:hasDuration, cpv:CodeIn2008, ppn:hasAmount, org:classification)
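The expansion step on slides 11 and 12 widens each constraint of the question before a query is generated. The transcript does not preserve the expansion methods themselves (slide 9), so the following is only a minimal Python sketch under an assumption: a hypothetical lookup table `RELATED` maps each seed value to the values it should be expanded to. The pairs in the table reuse codes from the enhanced query on slide 29 purely for illustration; they are not claimed to be the output of the author's method.

```python
from typing import Dict, Iterable, List

# Hypothetical expansion table: seed value -> values it is expanded to.
# Entries are illustrative only (codes borrowed from the enhanced query on slide 29).
RELATED: Dict[str, List[str]] = {
    "cpv:15331137": ["cpv:48611000", "cpv:50531510", "cpv:15871210"],
    "nuts:B3": ["nuts:PL", "nuts:RO"],
}


def expand(seeds: Iterable[str], related: Dict[str, List[str]] = RELATED) -> List[str]:
    """Return each seed followed by its related values, dropping duplicates
    while preserving first-seen order."""
    expanded: List[str] = []
    seen = set()
    for seed in seeds:
        for value in [seed, *related.get(seed, [])]:
            if value not in seen:
                seen.add(value)
                expanded.append(value)
    return expanded


print(expand(["cpv:15331137"]))
# ['cpv:15331137', 'cpv:48611000', 'cpv:50531510', 'cpv:15871210']
print(expand(["nuts:B3"]))
# ['nuts:B3', 'nuts:PL', 'nuts:RO']
```

Values without an entry in the table are passed through unchanged, so an unexpanded constraint still produces a valid (single-valued) filter later on.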
  • 13. Example of SPARQL query:
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = cpv:45221111-3 ... ) .
        FILTER((xsd:double(?amount) = xsd:long(170000)) (xsd:double(?amount) = xsd:long(200000))) .
        FILTER(?nutsCode = nuts:B3 ... ) .
        FILTER((xsd:long(?duration) = xsd:long(2)) (xsd:long(?duration) = xsd:long(3))) .
      }
  • 14. Context: performance of SPARQL queries (~30 s)
  • 15. Hardware & software: Dell PC with 2 GB RAM and a 30 GB hard disk; VirtualBox (version 4.0.6); Linux 2.6.35-22-server #33-Ubuntu 2 SMP x86_64 GNU/Linux (Ubuntu 10.10); OpenLink Virtuoso Opensource-6-20110218
  • 16. Question: how can query execution time be decreased without modifying the hardware or using any vendor-specific feature?
  • 17. Triple store: 25 graphs, 20 M RDF triples. But… 8 graphs, 11 M RDF triples
  • 18. Focus on… the generation of SPARQL queries
  • 19. Let's start… 9 SPARQL queries, 3 executions
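Slide 19 reports 9 queries with 3 executions each, and the results slides give approximate times. Assuming each reported time is the mean over the executions (the transcript does not say so), a measurement harness could look like the sketch below; `run` stands for any zero-argument callable that sends one query to the triple store, and `mean_execution_time` is a hypothetical name.

```python
import statistics
import time
from typing import Callable, List


def mean_execution_time(run: Callable[[], object], executions: int = 3) -> float:
    """Execute `run` several times and return the mean wall-clock time in seconds."""
    if executions < 1:
        raise ValueError("executions must be >= 1")
    durations: List[float] = []
    for _ in range(executions):
        start = time.perf_counter()
        run()  # e.g. send one SPARQL query and wait for the full result
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Usage with a stand-in workload instead of a real endpoint call.
print(f"{mean_execution_time(lambda: sum(range(100_000))):.4f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring intervals.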
  • 21. Simple SPARQL query:
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = cpv:15331137) .
        FILTER(?nutsCode = nuts:UK) .
      }
  • 22. Simple query: 1 CPV code, 1 NUTS code. Time: ~3.29 s
  • 23. T1: Rewrite SPARQL queries: match triple patterns from specific to general; filter as soon as possible
  • 24. T2: Use the LIMIT clause (value set to 10,000)
  • 25. Rewritten SPARQL query:
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn cpv:codeIn2008 ?cpvCode .
        FILTER(?cpvCode = cpv:15331137) .
        ?ppn ppn:nutsCode ?nutsCode .
        FILTER(?nutsCode = nuts:UK) .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
      }
      LIMIT 10000
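A sketch of a generator for the rewritten query (T1 + T2, plus the optional FROM clauses of T5), assuming the rdf:, cpv:, ppn:, dc: and nuts: PREFIX declarations are supplied separately. Each FILTER is placed immediately after the triple pattern that binds its variable, following the pattern order used on slide 25; `build_rewritten_query` and the graph IRI in the usage line are hypothetical.

```python
from typing import Sequence

# Class IRI used on the slides; PREFIX declarations are assumed to be added elsewhere.
PPN_CLASS = "<http://purl.org/weso/ppn/def#ppn>"


def build_rewritten_query(cpv_code: str, nuts_code: str,
                          graphs: Sequence[str] = (), limit: int = 10000) -> str:
    """T1: FILTER right after the pattern that binds the filtered variable.
    T2: append a LIMIT. T5 (optional): restrict the dataset with FROM clauses."""
    lines = ["SELECT DISTINCT *"]
    lines += [f"FROM <{g}>" for g in graphs]  # dataset clauses go before WHERE
    lines += [
        "WHERE {",
        f"  ?ppn rdf:type {PPN_CLASS} .",
        "  ?ppn cpv:codeIn2008 ?cpvCode .",
        f"  FILTER(?cpvCode = {cpv_code}) .",
        "  ?ppn ppn:nutsCode ?nutsCode .",
        f"  FILTER(?nutsCode = {nuts_code}) .",
        "  ?ppn ppn:hasDuration ?duration .",
        "  ?ppn dc:identifier ?id .",
        "  ?ppn dc:date ?date .",
        "  ?ppn ppn:hasAmount ?amount .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


print(build_rewritten_query("cpv:15331137", "nuts:UK",
                            graphs=["http://example.org/graph/1"]))
```

Generating the text from one function keeps the pattern order and LIMIT identical across all variants, so timing differences between T1, T2 and T5 come only from the options that are switched on.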
  • 26. Results T2: 1 CPV code, 1 NUTS code. Time: ~3.26 s
  • 27. Evaluation: there are no significant changes in execution time or gain… and we are interested in “enhanced queries”
  • 28. T3: Execution of enhanced queries
  • 29. Enhanced SPARQL query:
      SELECT DISTINCT * WHERE {
        ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
        ?ppn ppn:nutsCode ?nutsCode .
        ?ppn cpv:codeIn2008 ?cpvCode .
        ?ppn ppn:hasDuration ?duration .
        ?ppn dc:identifier ?id .
        ?ppn dc:date ?date .
        ?ppn ppn:hasAmount ?amount .
        FILTER(?cpvCode = cpv:15331137 || ?cpvCode = cpv:48611000 ||
               ?cpvCode = cpv:48611000 || ?cpvCode = cpv:50531510 ||
               ?cpvCode = cpv:15871210) .
        FILTER(?nutsCode = nuts:B3 || ?nutsCode = nuts:PL || ?nutsCode = nuts:RO) .
      }
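The enhanced query tests each variable against several alternative codes. In SPARQL 1.0 this is expressed as a disjunction of equalities (`=` binds tighter than `||`, so no extra parentheses are needed). A small generator for such a FILTER, with `any_of` as a hypothetical helper name, might be:

```python
from typing import Sequence


def any_of(var: str, values: Sequence[str]) -> str:
    """SPARQL 1.0 FILTER keeping solutions where `var` equals any of `values`.
    Duplicates are removed (first occurrence wins) so each code is tested once."""
    unique = list(dict.fromkeys(values))
    if not unique:
        raise ValueError("at least one value is required")
    return "FILTER(" + " || ".join(f"{var} = {v}" for v in unique) + ") ."


print(any_of("?nutsCode", ["nuts:B3", "nuts:PL", "nuts:RO"]))
# FILTER(?nutsCode = nuts:B3 || ?nutsCode = nuts:PL || ?nutsCode = nuts:RO) .
```

Deduplicating the input matters here because the enhanced query lists cpv:48611000 twice. SPARQL 1.1, listed among the further steps, offers `FILTER(?nutsCode IN (nuts:B3, nuts:PL, nuts:RO))` as a more compact equivalent.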
  • 30. Results T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 s
  • 31. T4: Rewrite SPARQL queries + use the LIMIT clause
  • 32. Results T4 (compared with T3): 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.55 s
  • 33. Info: 8 graphs, 11 M RDF triples
  • 34. T5: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM)
  • 35. Results T5 (compared with T3): 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 s
  • 36. T6: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries
  • 37. Results T6 (compared with T3): 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~20.60 s
  • 38. T6-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries + parallelize query execution (ad-hoc map/reduce)
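T6-1 parallelizes the simple queries with an ad-hoc map/reduce. The transcript gives no implementation details, so this is a minimal sketch under assumptions: `run_query` is a hypothetical callable that sends one SPARQL query to the endpoint and returns its result rows as tuples, and threads are used because the work is dominated by waiting on the triple store rather than by local computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Sequence, Set, Tuple

Row = Tuple[str, ...]  # one result row (the bound values) of a SELECT query


def run_parallel(queries: Sequence[str],
                 run_query: Callable[[str], Iterable[Row]],
                 max_workers: int = 4) -> Set[Row]:
    """Map: send each simple query to the endpoint on a worker thread.
    Reduce: merge the partial results with a set union, which also drops
    duplicate rows, as SELECT DISTINCT does."""
    merged: Set[Row] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for rows in pool.map(run_query, queries):
            merged.update(rows)
    return merged


# Usage with a stand-in for the endpoint call.
fake_endpoint = lambda q: [("shared-notice",), (f"notice-for-{q}",)]
print(sorted(run_parallel(["q1", "q2"], fake_endpoint)))
```

Merging by set union reproduces the answer of the single enhanced query only while no simple query is cut off by its LIMIT 10000 and, for the graph-based splits, while each notice's triples live in a single graph; otherwise the split and unsplit versions can return different rows.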
  • 39. Results T6-1 (compared with T3): 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~11.93 s
  • 40. T7: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries
  • 41. Results T7 (compared with T3): 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~15.81 s
  • 42. T7-1: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries + parallelize query execution (ad-hoc map/reduce)
  • 43. Results T7-1 (compared with T3): 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~10.55 s
  • 44. T8: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries
  • 45. Results T8 (compared with T3): 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~32.34 s
  • 46. T8-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries + parallelize query execution (ad-hoc map/reduce)
  • 47. Results T8-1 (compared with T3): 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~18.45 s
  • 48. T9: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries (1 CPV code + 1 NUTS code)
  • 49. Results T9 (compared with T3): 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 15 simple queries. Time: ~22.462 s
  • 50. T9-1: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + parallelize query execution (ad-hoc map/reduce)
  • 51. Results T9-1 (compared with T3): 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 15 simple queries. Time: ~12.77 s
  • 52. T10: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries (1 CPV code + 1 NUTS code)
  • 53. Results T10 (compared with T3): 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 4 graphs, 60 simple queries. Time: ~71.17 s
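The number of simple queries reported for T6 to T10 follows from the dimensions along which the enhanced query is split: per named graph, per CPV code, per NUTS code, or combinations of them. The sketch below enumerates the splits with placeholder codes and hypothetical graph IRIs and reproduces those counts; `split_queries` is a hypothetical name.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# One simple query = (CPV codes it filters on, NUTS codes it filters on, graph or None).
Split = Tuple[Tuple[str, ...], Tuple[str, ...], Optional[str]]


def split_queries(cpv: Sequence[str], nuts: Sequence[str], graphs: Sequence[str] = (),
                  per_cpv: bool = False, per_nuts: bool = False) -> List[Split]:
    """Enumerate the simple queries produced by splitting an enhanced query.
    Each enabled dimension multiplies the number of queries by its size."""
    cpv_groups = [(c,) for c in cpv] if per_cpv else [tuple(cpv)]
    nuts_groups = [(n,) for n in nuts] if per_nuts else [tuple(nuts)]
    graph_groups: List[Optional[str]] = list(graphs) if graphs else [None]
    return list(product(cpv_groups, nuts_groups, graph_groups))


# Placeholder values: 5 CPV codes, 3 NUTS codes, 4 named graphs (hypothetical names).
CPV_CODES = [f"cpv:code{i}" for i in range(1, 6)]
NUTS_CODES = ["nuts:B3", "nuts:PL", "nuts:RO"]
GRAPHS = [f"http://example.org/graph/{i}" for i in range(1, 5)]

strategies = {
    "T6": dict(graphs=GRAPHS),
    "T7": dict(per_cpv=True),
    "T8": dict(graphs=GRAPHS, per_cpv=True),
    "T9": dict(per_cpv=True, per_nuts=True),
    "T10": dict(graphs=GRAPHS, per_cpv=True, per_nuts=True),
}
for name, options in strategies.items():
    print(name, len(split_queries(CPV_CODES, NUTS_CODES, **options)))
# prints one line per strategy: T6 4, T7 5, T8 20, T9 15, T10 60
```

The multiplication makes the cost visible: adding the 4 graphs to the per-CPV, per-NUTS split turns 15 queries into 60, which matches T10 being the slowest configuration.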
  • 54. T10-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + parallelize query execution (ad-hoc map/reduce)
  • 55. Results T10-1 (compared with T3): 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 4 graphs, 60 simple queries. Time: ~35.13 s
  • 56. Table of results (table not recoverable from the transcript)
  • 57. Discussion
      • The number of queries is a key factor.
      • More CPV codes imply more execution time.
      • Parallelization improves execution time.
      • T7-1 is the best execution in terms of time:
          • rewrite SPARQL queries
          • use the LIMIT clause
          • split the enhanced query into simple queries
          • parallelize query execution
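The discussion points can be checked against the reported figures. This snippet only restates the approximate times from the results slides and computes the speedup of each parallel variant over its sequential counterpart, plus the gain of the fastest variant over the single enhanced query (T3).

```python
# Approximate execution times in seconds, as reported on the results slides.
SEQUENTIAL = {"T6": 20.60, "T7": 15.81, "T8": 32.34, "T9": 22.462, "T10": 71.17}
PARALLEL = {"T6-1": 11.93, "T7-1": 10.55, "T8-1": 18.45, "T9-1": 12.77, "T10-1": 35.13}
BASELINE_T3 = 20.65  # single enhanced query (slide 30)


def speedup(before: float, after: float) -> float:
    """How many times faster the `after` run is than the `before` run."""
    return before / after


for name, seq_time in SEQUENTIAL.items():
    par_time = PARALLEL[f"{name}-1"]
    print(f"{name}: {seq_time:.2f} s -> {par_time:.2f} s, speedup x{speedup(seq_time, par_time):.2f}")
# T6: 20.60 s -> 11.93 s, speedup x1.73
# T7: 15.81 s -> 10.55 s, speedup x1.50
# T8: 32.34 s -> 18.45 s, speedup x1.75
# T9: 22.46 s -> 12.77 s, speedup x1.76
# T10: 71.17 s -> 35.13 s, speedup x2.03

best = min(PARALLEL, key=PARALLEL.get)
print(f"Best: {best}, x{speedup(BASELINE_T3, PARALLEL[best]):.2f} vs T3")
# Best: T7-1, x1.96 vs T3
```

Parallelization gives roughly a 1.5x to 2x gain in every configuration, but the fastest result still comes from the split with the fewest queries (T7-1), consistent with the number of queries being the key factor.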
  • 58. Further steps
      • Distribute graphs across different nodes (HW improvement)
      • Use other triple stores (SW comparison)
      • Add new SPARQL 1.1 features (expressiveness improvement)
      • Cache queries (SW improvement)