ParlBench: a SPARQL-benchmark for electronic publishing applications
Tatiana Tarasova, Maarten Marx
University of Amsterdam
Information and Language Processing Systems
May 26, 2013
Workshop on Benchmarking RDF Systems, ESWC 2013
[Figure: data set domains: Media, Publications, Life Sciences, Cross-Domain, Geographic, Government, ?]
The ParlBench Benchmark
Goal:
→ test the performance of RDF stores in the setting of e-publishing applications
Components:
→ real-world data: Dutch parliamentary proceedings, members and political parties
→ vocabulary: Parliamentary Proceedings (ParliPro) [2] + a mix of existing vocabularies
→ 19 analytical SPARQL queries grouped into 4 micro-benchmarks: Average, Count, Factual and Top 10
Performance metrics:
→ loading time
→ query response time
Outline
1 The ParlBench Benchmark
Data Sets
Queries
2 ParlBench experimental run on Virtuoso
The ParlBench Data Sets I
PoliticalMashup: characteristics
→ Dutch parliamentary proceedings (1814-2013),
political parties and politicians
→ richly structured XML documents (∼54,000)
→ URIs of concepts
→ metadata: who said what and when
→ links to Wikipedia
Linked PoliticalMashup: design choices
→ keep the URIs and linking structure
→ re-use existing vocabularies
→ link to the Linked Open Data cloud
→ separate the structure from the text
The ParlBench Data Sets II
parties: Dutch political parties
members: members of the Dutch parliament
proceedings: structure of the Dutch parliamentary proceedings
paragraphs: content of speeches of the parliamentary meetings
tagged entities: links from the paragraphs to DBpedia
# of triples
parties:         510
members:         33,885
proceedings:     ∼36.5M
paragraphs:      ∼11.25M
tagged entities: ∼34.4M
total:           ∼82.2M
RDF Data Model
Parliamentary Proceedings: ParliPro [2], DC and DC Terms [8]
[Diagram. Classes: Parliamentary Proceedings, Topic, Stage Direction, Scene, Speech, Paragraph, Parliament Member, Political Party. Relations: has part, references member, references party.]
RDF Data Model
Parliament Member: FOAF [4], Bio [3] and DBpedia Ontology [5]
[Diagram. Nodes: Parliament Member, DBpedia resource, Biography. Relations: same as (to the DBpedia resource), biography (to the Biography).]
RDF Data Model
Parties: ParliPro [2]
[Diagram. Nodes: Political Party, DBpedia resource. Relation: same as.]
RDF Data Model
Paragraphs: ParliPro [2]
[Diagram. Nodes: Paragraph, content of the paragraph. Relation: has text.]
RDF Data Model
Tagged Entities: MUTO [6], FOAF [4], Basic WGS84 [7]
[Diagram. Nodes: Paragraph, Tag, DBpedia resource. Relation: has auto meaning (to the DBpedia resource). A DBpedia resource is a Person, an Organization or a Spatial Thing.]
19 ParlBench queries: 4 micro-benchmarks
→ 3 Average, e.g.
A0: Retrieve the average number of people who spoke per topic.
→ 5 Count, e.g.
C4: Count the speeches of the female speaker in topics where only one woman spoke.
→ 6 Factual, e.g.
F3: What is the percentage of female speakers?
→ 5 Top 10, e.g.
T4: Retrieve the top 10 longest topics (by number of paragraphs).
ParlBench experimental run
Test Machine
→ MacBook Pro + Mac OS X Lion 10.7.6 x64
→ CPUs: 2.8 GHz Intel Core i7 (2x2 cores)
→ Memory: 8GB
ParlBench experimental run
System Under Test
→ Virtuoso Open Source Edition v.06.01.3, native RDF store
→ default Virtuoso index scheme
→ configuration for large data sets loading
ParlBench experimental run
Experimental set-up
→ 8 test collections: Parties, Members and scaled Proceedings (from 1% to 100%)
→ single-user mode
→ 1 run = 10 permutations of the 19 queries (190 queries)
→ warm-up period: 5 runs (950 queries)
→ measuring period: 3 runs (570 queries)
→ query response time: mean over all permutations of all measured runs (10 × 3 = 30 executions per query)
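The set-up above can be sketched in a few lines of Python. This is only an illustration, not the authors' scripts: the function names, the `execute` callback and the choice to build each permutation with a random shuffle are assumptions. Only the numbers (10 permutations per run, 5 warm-up runs, 3 measured runs) come from the slide.

```python
import random
import statistics
import time


def one_run(queries, execute, rng, permutations=10):
    """One run: `permutations` shuffled passes over all queries.

    Returns the measured wall-clock times per query name.
    """
    timings = {name: [] for name in queries}
    order = list(queries)
    for _ in range(permutations):
        rng.shuffle(order)  # assumption: each permutation is a random shuffle
        for name in order:
            start = time.perf_counter()
            execute(queries[name])  # hypothetical callback that runs the query
            timings[name].append(time.perf_counter() - start)
    return timings


def measure(queries, execute, warmup_runs=5, measured_runs=3, seed=0):
    """Warm up, then collect the timings of the measured runs per query."""
    rng = random.Random(seed)
    for _ in range(warmup_runs):
        one_run(queries, execute, rng)  # warm-up timings are discarded
    samples = {name: [] for name in queries}
    for _ in range(measured_runs):
        for name, values in one_run(queries, execute, rng).items():
            samples[name].extend(values)
    return samples


def mean_response_times(samples):
    """Mean response time per query over all measured executions."""
    return {name: statistics.mean(values) for name, values in samples.items()}
```

With 19 queries this issues 19 × 10 × (5 + 3) = 1,520 query executions in total, of which the last 570 are measured, giving 30 samples per query.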
Scaling of proceedings
Scaling factor   # of triples
1%               ∼0.5M
2%               ∼1M
4%               ∼1.9M
8%               ∼3.9M
16%              ∼7.6M
32%              ∼15M
64%              ∼23M
100%             ∼36.5M
Loading Time, log2 (time, sec)
[Plot: loading time in seconds (y-axis, log2 scale, 1 to 4096) against size of proceedings in % (x-axis: 1, 2, 4, 8, 16, 32, 64, 100).]
Query Response Time by Micro-Benchmarks, log2 (SUM(time), sec)
[Plot: sum of execution time in seconds (y-axis, log2 scale, 0.25 to 256) against size of proceedings in % (x-axis: 1, 2, 4, 8, 16, 32, 64, 100), one series per micro-benchmark: average, count, factual, top10.]
Query Response Time on the Largest Collection (∼36M)
[Bar chart: response time in seconds (y-axis, 0 to 170) per query (x-axis), grouped by micro-benchmark: average, count, factual, top10. The 19 bar labels are paired with the 19 queries in the order in which they appear on the chart.]
Query   Time, sec
A0      45.9422
A1      39.5885
A2      47.1268
C0      2.4212
C1      10.6883
C2      1.4383
C3      0.8649
C4      30.0118
F0      7.9996
F1      78.1858
F2      22.3778
F3      22.4192
F4      0.1053
F5      48.8887
T0      0.8357
T1      10.2813
T2      41.6915
T3      0.9241
T4      168.1313
T4: Retrieve the top 10 longest topics (by number of paragraphs).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX parlipro: <http://purl.org/vocab/parlipro#>
SELECT ?topic (COUNT(?par) AS ?numOfPars)
WHERE {
  ?topic rdf:type parlipro:Topic .
  ?speech rdf:type parlipro:Speech .
  ?speech dcterms:hasPart ?par .
  ?par rdf:type parlipro:Paragraph .
  # A speech is part of a topic directly, via a stage direction, or via a scene.
  { ?topic dcterms:hasPart ?speech . }
  UNION
  { ?topic dcterms:hasPart ?sd .
    ?sd rdf:type parlipro:StageDirection .
    ?sd dcterms:hasPart ?speech . }
  UNION
  { ?topic dcterms:hasPart ?scene .
    ?scene rdf:type parlipro:Scene .
    ?scene dcterms:hasPart ?speech . }
}
GROUP BY ?topic
ORDER BY DESC(?numOfPars)
LIMIT 10
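To make the shape of T4 concrete, the following Python sketch evaluates the same logic over a small in-memory list of triples. It is a hand-written illustration and not how an RDF store evaluates the query. The function name, the string-based triple representation and the tie-breaking by topic name are assumptions introduced here.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
HAS_PART = "dcterms:hasPart"


def top_topics_by_paragraphs(triples, limit=10):
    """Return up to `limit` (topic, paragraph count) pairs, largest first."""
    types, parts = {}, {}
    for s, p, o in set(triples):  # an RDF graph is a set of triples
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p == HAS_PART:
            parts.setdefault(s, []).append(o)

    def is_a(node, cls):
        return cls in types.get(node, ())

    counts = Counter()
    for topic in (n for n in types if is_a(n, "parlipro:Topic")):
        speeches = []
        for child in parts.get(topic, []):
            # UNION branch 1: the speech is a direct part of the topic
            if is_a(child, "parlipro:Speech"):
                speeches.append(child)
            # UNION branches 2 and 3: the speech sits inside a stage direction or a scene
            for container in ("parlipro:StageDirection", "parlipro:Scene"):
                if is_a(child, container):
                    speeches.extend(
                        g for g in parts.get(child, []) if is_a(g, "parlipro:Speech")
                    )
        n = sum(
            1
            for speech in speeches
            for par in parts.get(speech, [])
            if is_a(par, "parlipro:Paragraph")
        )
        if n:  # topics without any matching paragraph yield no result row
            counts[topic] = n
    # ORDER BY DESC(count) LIMIT n; ties broken by name (an assumption)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

The three `UNION` branches become three ways of collecting speeches for a topic, and `GROUP BY` with `COUNT` becomes the per-topic sum.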
Characteristics of ParlBench queries
Micro-benchmarks: Average (A0 A1 A2), Count (C0 C1 C2 C3 C4), Factual (F0 F1 F2 F3 F4 F5), Top 10 (T0 T1 T2 T3 T4)
SPARQL features (one + per query that uses the feature):
FILTER + + + + + + + +
UNION + + + + + + + + +
LIMIT + + + + + + +
ORDER BY + + + + + + +
GROUP BY + + + + + + + + + + + +
COUNT + + + + + + + + + + + + + + + + +
DISTINCT + + + +
AVG + + +
negation +
OPTIONAL + +
subquery + + + + + + +
blank node scoping + + + + + + + + +
# of triple patterns per query:
A0 10, A1 9, A2 12
C0 5, C1 5, C2 5, C3 6, C4 13
F0 8, F1 16, F2 6, F3 6, F4 2, F5 4
T0 2, T1 4, T2 9, T3 3, T4 11
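T2: Retrieve the top 10 topics with the most speeches.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX parlipro: <http://purl.org/vocab/parlipro#>
SELECT ?topic (COUNT(?speech) AS ?numOfSpeeches)
WHERE {
  ?topic rdf:type parlipro:Topic .
  ?speech rdf:type parlipro:Speech .
  # A speech is part of a topic directly, via a stage direction, or via a scene.
  { ?topic dcterms:hasPart ?speech . }
  UNION
  { ?topic dcterms:hasPart ?sd .
    ?sd rdf:type parlipro:StageDirection .
    ?sd dcterms:hasPart ?speech . }
  UNION
  { ?topic dcterms:hasPart ?scene .
    ?scene rdf:type parlipro:Scene .
    ?scene dcterms:hasPart ?speech . }
}
GROUP BY ?topic
ORDER BY DESC(?numOfSpeeches)
LIMIT 10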
Conclusion
→ SPARQL-benchmark for e-publishing applications
→ large collections of real data
→ intuitive analytical queries
→ micro-benchmarks for SPARQL features analysis
Future work
→ enlarge the data sets
- votes in proceedings
- interlink proceedings with the Dutch legislation data set [1] (>280M triples)
- tagged entities: more tags
→ extend the queries
- SPARQL 1.1: path expressions
- Linked Open Data integration scenario
→ run the benchmark on other RDF stores
Thank you!
ParlBench resources
→ data access:
→ resolvable URIs
→ RDF data dumps at http://data.politicalmashup.nl/RDF/data/
→ experimental run:
website describing an experimental run
http://data.politicalmashup.nl/RDF/
public SPARQL-endpoint to a test collection
http://data.politicalmashup.nl/sparql/
→ scripts are available at
http://data.politicalmashup.nl/RDF/scripts/
→ ParliPro vocabulary:
RDF representation http://purl.org/vocab/parlipro#
HTML representation
http://data.politicalmashup.nl/RDF/vocabularies/parlipro
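As a rough illustration of how a client could time a query against the public SPARQL endpoint listed above, the sketch below builds a standard SPARQL Protocol GET URL and times an arbitrary fetch function. The helper names `build_query_url` and `timed` are hypothetical, and whether the endpoint is still online is not something this sketch assumes.

```python
import time
from urllib.parse import urlencode

# Endpoint URL as given on the slide
ENDPOINT = "http://data.politicalmashup.nl/sparql/"


def build_query_url(endpoint, query):
    """SPARQL Protocol GET request: the query text goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def timed(fetch, url):
    """Call fetch(url) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fetch(url)
    return result, time.perf_counter() - start
```

A real client could pass something like `lambda u: urllib.request.urlopen(u).read()` as `fetch`. Note that this measures client-side time including network latency, which is not the same as the server-side response times reported in the experimental run.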
Thank you!
Questions?
References I
[1] Dutch national regulations in CEN MetaLex, http://doc.metalex.eu/
[2] The Parliamentary Proceedings (ParliPro) Vocabulary, http://purl.org/vocab/parlipro#
[3] BIO: A vocabulary for biographical information, http://vocab.org/bio
[4] The Friend of a Friend Vocabulary (FOAF), http://xmlns.com/foaf/0.1/
[5] The DBpedia Ontology, http://dbpedia.org/ontology/
[6] The Modular Unified Tagging Ontology (MUTO), http://muto.socialtagging.org/
[7] Basic Geo (WGS84 lat/long) Vocabulary, http://www.w3.org/2003/01/geo/wgs84_pos#
References II
[8] Dublin Core Metadata Element Set, http://purl.org/dc/elements/1.1/, and Dublin Core collection description Terms, http://purl.org/dc/terms/
Statistics of the benchmark data sets
dataset           # of triples   size    # of files
members           33,885         14M     3,583
parties           510            612K    151
proceedings       36,503,688     4.15G   51,233
paragraphs        11,250,295     5.77G   51,233
tagged entities   34,449,033     2.57G   34,755
TOTAL             82,237,411     ∼13G    140,955
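The totals in this table can be checked with a short calculation. The snippet below is only a sanity check with the figures copied from the table; the variable names are illustrative.

```python
# (number of triples, number of files) per data set, copied from the table
DATASETS = {
    "members": (33_885, 3_583),
    "parties": (510, 151),
    "proceedings": (36_503_688, 51_233),
    "paragraphs": (11_250_295, 51_233),
    "tagged entities": (34_449_033, 34_755),
}

total_triples = sum(triples for triples, _ in DATASETS.values())
total_files = sum(files for _, files in DATASETS.values())

# Both sums agree with the TOTAL row of the table.
assert total_triples == 82_237_411
assert total_files == 140_955
```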
Statistics of the ParlBench data sets
Number of classes: 9
Number of properties: 25
Number of instances per class:
Member: 3,583
Party: 151
Proceedings: 51,233
Topic: 102,289
Stage Direction: 1,776,598
Scene: 189,226
Speech: 2,495,969
Paragraph: 11,211,520
Tagged Entity: 11,383,787
Parliamentary Proceedings: example of encoding
[Diagram. Resources and their rdf:type, linked top-down by dcterms:hasPart:]
pm:nl.proc.ob.d.h-tk-19992000-2432-2483 (parlipro:ParliamentaryProceedings)
pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1 (parlipro:Topic)
pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1.7 (parlipro:Scene)
pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1.7.30 (parlipro:Speech)
pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1.7.30.1 (parlipro:Paragraph)
Other properties shown: parlipro:refMember (pm:nl.m.02547), parlipro:refParty (pm:nl.p.gl), dc:date (1999-12-08)
…
Members: example of encoding
[Diagram of the member pm:nl.m.02547 (rdf:type Parliament Member). Properties and values shown:]
owl:sameAs: nl-dbpedia:Marijke_Vos, en-dbpedia:Marijke_Vos
foaf:gender: dbpedia-ont:Female
bio:biography: _:bio (rdf:type bio:Biography)
foaf:birthday: 1957-05-04
dbpedia-ont:birthPlace: Leidschendam
Paragraphs and Tagged Entities: example of encoding
[Diagram in two parts.]
Paragraph: pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1.7.30.1 (rdf:type parlipro:Paragraph), has text "Blijkbaar is er nu het een en ander mis in de relatie tussen de Europese Unie en de Russische Federatie. ..." (English: "Apparently a few things are now wrong in the relationship between the European Union and the Russian Federation. ...")
Tagged Entity: pm:nl.proc.ob.d.h-tk-19992000-2432-2483.1.7.30.1 (rdf:type parlipro:Paragraph) muto:hasTag _:tag; _:tag muto:hasAutoMeaning nl-dbpedia:Rusland; nl-dbpedia:Rusland rdf:type geo:SpatialThing
