Query Expansion Methods and
Performance Evaluation
for
Reusing Linking Open Data of the
European Public Procurement Notices
José María Álvarez Rodríguez
WESO-Universidad de Oviedo
http://purl.org/weso/moldeas/
Tecnologías de Linked Data y sus aplicaciones en España (TLDE)
CAEPIA 2011-Tenerife (Spain)
8th of November, 2011
Code: TSI-020100-2010-919
E-procurement
Long Tail
TED
BOE (official bulletin of the Spanish Government)
BOPA (official bulletin of the Asturian Government)
To be able to answer…
Which public procurement notices are
relevant to Dutch companies (only SMEs) that
want to tender for contracts announced by
local authorities with a total value lower than
170K € to procure “Road bridge construction
work” and a two year duration in the Dutch-
speaking region of Flanders (Belgium)?
Structuring public procurement notices
Providing new semantic-based services
LOD enrichment
Easing the access to the published data using the LOD approach
Transforming government classifications
Remembering…
Which public procurement notices are
relevant to Dutch companies (only SMEs) that
want to tender for contracts announced by
local authorities with a total value lower than
170K € to procure “Road bridge construction
work” and a two year duration in the Dutch-
speaking region of Flanders (Belgium)?
Results T8 wrt T3
1 CPV Code (5)
3 NUTS Codes
4 Graphs
20 simple queries
Time: ~32.34 sec.
T8-1
Rewrite SPARQL queries
+
Use the LIMIT clause
+
Named Graphs (FROM)
+
Split enhanced query into simple queries
+
Parallelization of query execution
(ad-hoc map/reduce)
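The T8-1 combination listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the `ex:` vocabulary, the graph URIs, the placeholder CPV codes, and the `fake_execute` stand-in for a real SPARQL endpoint call are all hypothetical. It builds one simple query per (CPV code, named graph) pair, each with FROM and LIMIT, runs them in a thread pool (map), and merges the partial results into one set (reduce).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical inputs: five expanded CPV codes and four named graphs.
CPV_CODES = [f"cpv-{i}" for i in range(1, 6)]
GRAPHS = [f"http://example.org/graph/{i}" for i in range(1, 5)]


def build_query(cpv, graph, limit=100):
    """Build one simple query restricted to a single named graph."""
    return (
        "PREFIX ex: <http://example.org/ppn#>\n"  # assumed vocabulary
        f"SELECT ?notice FROM <{graph}>\n"
        f'WHERE {{ ?notice ex:cpvCode "{cpv}" . }}\n'
        f"LIMIT {limit}"
    )


def fake_execute(query):
    """Stand-in for an endpoint call: returns one fake notice per CPV code."""
    cpv = query.split('"')[1]
    return {f"notice-for-{cpv}"}


def map_reduce(queries, execute, workers=4):
    """Run the simple queries in parallel and merge their result sets."""
    merged = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rows in pool.map(execute, queries):  # map step
            merged |= rows                       # reduce step
    return merged


queries = [build_query(c, g) for c, g in product(CPV_CODES, GRAPHS)]
notices = map_reduce(queries, fake_execute)
```

With five CPV codes and four graphs this produces 20 simple queries. A thread pool is a reasonable fit here because each query mostly waits on the endpoint rather than on the local CPU.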
Results T8-1 wrt T3
1 CPV Code (5)
3 NUTS Codes
4 Graphs
20 simple queries
Time: ~18.45 sec.
T9
Rewrite SPARQL queries
+
Use the LIMIT clause
+
Split enhanced query into simple queries
(1 CPV code+1 NUTS code)
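The split into simple queries with one CPV code and one NUTS code each can be illustrated with a short sketch. The `ex:cpvCode` and `ex:nutsCode` properties, the prefix URI, and the placeholder code values are assumptions made for the example, not the vocabulary actually used in the dataset.

```python
from itertools import product


def split_query(cpv_codes, nuts_codes, limit=100):
    """Return one simple SPARQL query per (CPV code, NUTS code) pair."""
    simple_queries = []
    for cpv, nuts in product(cpv_codes, nuts_codes):
        simple_queries.append(
            "PREFIX ex: <http://example.org/ppn#>\n"  # assumed vocabulary
            "SELECT ?notice\n"
            "WHERE {\n"
            f'  ?notice ex:cpvCode "{cpv}" ;\n'
            f'          ex:nutsCode "{nuts}" .\n'
            "}\n"
            f"LIMIT {limit}"
        )
    return simple_queries


# Placeholder values: five expanded CPV codes and three NUTS codes.
cpv_codes = [f"cpv-{i}" for i in range(1, 6)]
nuts_codes = [f"nuts-{i}" for i in range(1, 4)]

simple_queries = split_query(cpv_codes, nuts_codes)
print(len(simple_queries))  # 5 x 3 = 15
```

Each generated query matches a single pair of codes, so the endpoint evaluates small, selective patterns instead of one large query over all the alternatives.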
Results T9 wrt T3
1 CPV Code (5)
1 NUTS Code (3)
15 simple queries
Time: ~22.462 sec.
T9-1
Rewrite SPARQL queries
+
Use the LIMIT clause
+
Split enhanced query into simple queries
(1 CPV code+1 NUTS code)
+
Parallelization of query execution
(ad-hoc map/reduce)
Results T9-1 wrt T3
1 CPV Code (5)
1 NUTS Code (3)
15 simple queries
Time: ~12.77 sec.
T10
Rewrite SPARQL queries
+
Use the LIMIT clause
+
Named Graphs (FROM)
+
Split into simple queries
(1 CPV code+1 NUTS code)
Results T10 wrt T3
1 CPV Code (5)
1 NUTS Code (3)
4 Graphs
60 simple queries
Time: ~71.17 sec.
T10-1
Rewrite SPARQL queries
+
Use the LIMIT clause
+
Named Graphs (FROM)
+
Split enhanced query into simple queries
(1 CPV code+1 NUTS code)
+
Parallelization of query execution
(ad-hoc map/reduce)
Results T10-1 wrt T3
1 CPV Code (5)
1 NUTS Code (3)
4 Graphs
60 simple queries
Time: ~35.13 sec.
Table of Results
Discussion
• The number of queries is a key factor
• More CPV codes imply a longer execution
time
• Parallelization improves execution
time
• T7-1 achieves the best execution
time
• Rewrite SPARQL queries
• Use the LIMIT clause
• Split enhanced query into simple queries
• Parallelization of query execution
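The effect of parallelization can be quantified from the times reported on the results slides for T8, T9, and T10 and their parallel variants. The short calculation below only restates those figures as speedup ratios; the variable names are illustrative.

```python
# Approximate execution times in seconds, as reported on the results slides.
sequential = {"T8": 32.34, "T9": 22.462, "T10": 71.17}
parallel = {"T8-1": 18.45, "T9-1": 12.77, "T10-1": 35.13}

# Speedup = sequential time / parallel time for each pair of tests.
speedups = {
    name: sequential[name] / parallel[f"{name}-1"] for name in sequential
}

for name, speedup in speedups.items():
    print(f"{name} -> {name}-1: {speedup:.2f}x")
# T8 -> T8-1: 1.75x
# T9 -> T9-1: 1.76x
# T10 -> T10-1: 2.03x
```

In these three cases the parallel variant is roughly 1.75 to 2 times faster than its sequential counterpart.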
Further Steps
• Distribute graphs in different nodes
(HW improvement)
• Use of other triple stores
(SW comparison)
• Add SPARQL 1.1 new features
(Expressiveness improvement)
• Cache of queries (SW improvement)