The document presents YABench, a framework for assessing the correctness and performance of RDF stream processors. It generates configurable RDF streams, includes an oracle to validate results, and measures engine performance. YABench was validated against an existing benchmark and used to benchmark C-SPARQL and CQELS. The experiments revealed insights such as CQELS having better precision/recall for simple queries while C-SPARQL was more memory efficient and had lower delay for complex queries. A "gracious mode" was introduced to estimate timing discrepancies between engines and the oracle.
1. YABench: A Comprehensive Framework for RDF Stream Processor Correctness and Performance Assessment
Maxim Kolchin, Peter Wetz, Elmar Kiesling, A Min Tjoa
ITMO University, Russia | TU Wien, Austria
The 16th International Conference on Web Engineering 2016, Lugano, Switzerland
2. RDF Stream Processing (RSP)
RDF stream - a potentially infinite sequence of time-varying data elements encoded in RDF
Continuous query - a query registered over streams, which are in most cases observed through windows
Query results - as in SPARQL, results can be tuples, an RDF dataset, or a new RDF stream
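The window concept above can be illustrated with a minimal Python sketch (the tuple-based stream encoding and all names here are illustrative, not YABench's actual API): a time-based window selects exactly those stream elements whose timestamps fall within its scope.

```python
# Minimal sketch of a time-based window over a timestamped stream.
# Each stream element is (timestamp, triple); triples are plain tuples here.

def window(stream, t_start, t_end):
    """Return elements whose timestamp lies in [t_start, t_end)."""
    return [(t, triple) for (t, triple) in stream if t_start <= t < t_end]

stream = [
    (0, ("station1", "hasTemperature", 21)),
    (5, ("station2", "hasTemperature", 24)),
    (10, ("station1", "hasTemperature", 22)),
    (15, ("station2", "hasTemperature", 25)),
]

# A window of size 10 starting at t=0 captures the first two observations.
first = window(stream, 0, 10)
# Sliding the window forward by 5 drops the oldest element and admits a newer one.
second = window(stream, 5, 15)
```

A continuous query conceptually re-evaluates over each such window as the stream advances.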
3. State of the art
■ LSBench (2012)
■ SRBench (2012)
■ CSRBench (2013)
■ CityBench (2015)
Details can be found at the W3C RSP Community Group's wiki: https://www.w3.org/community/rsp/wiki/RSP_Benchmarking
4. Our contribution
■ We propose a benchmarking framework for RDF Stream Processing
engines that focuses on correctness and performance
○ Stream generator (generates configurable RDF stream)
○ Oracle (validates correctness of the results)
○ Runner (measures performance of an RSP engine)
■ We benchmark two window-based RDF stream processing engines:
○ C-SPARQL
○ CQELS
6. Architecture
1. Define tests,
2. Generate data streams,
3. Run the tests with a given engine (performance metrics are collected in a separate process),
4. Finally, validate the results with the oracle.
8. Validation against CSRBench
We validated the correctness-checking functionality of YABench by reproducing the CSRBench* benchmark.
CSRBench defines 7 queries for the C-SPARQL, CQELS, and SPARQLstream engines.
Datasets, test configurations, and results are available online: github.com/YABench/csrbench-validation
*Daniele Dell’Aglio, et al. “On Correctness in RDF Stream Processor Benchmarking”, 2013
9. Validation against CSRBench (C-SPARQL)
C-SPARQL results, CSRBench vs. YABench:

Query  CSRBench  YABench
Q1     ✓         ✓
Q2     ✓         ✓
Q3     ✓         ✓*
Q4     ✓         ✓
Q5     ✗         ✗
Q6     ✓         ✓*
Q7     ✓         ✓*

* - the results are the same, but due to timing discrepancies some results occasionally appear in the subsequent window
10. Validation against CSRBench (CQELS)
CQELS results, CSRBench vs. YABench:

Query  CSRBench  YABench
Q1     ✓         ✓
Q2     ✓         ✓
Q3     ✓         ✓
Q4     ✗         ✗
Q5     ✓         ✓
Q6     ✗         ✗
Q7     ✗         ✗**

** - the query did not execute successfully on the CQELS engine; it crashed before returning the query results
11. Benchmark
We reuse the queries introduced by CSRBench, but we are able to parametrize them, e.g. window size, window slide, filter values, etc.
Measure:
- Precision and recall,
- Window size, result size, and delay,
- Memory and CPU usage, number of threads
We run each test 10 times to compute the distribution of precision/recall.
Detailed results are available online: github.com/YABench/yabench-one
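The oracle's correctness check can be sketched as a set comparison between expected and actual window results (a simplified illustration; YABench's oracle additionally accounts for window timing):

```python
# Sketch: precision and recall of an engine's window result against
# the oracle's expected result, both treated as sets of bindings.

def precision_recall(expected, actual):
    expected, actual = set(expected), set(actual)
    true_positives = len(expected & actual)
    precision = true_positives / len(actual) if actual else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall

expected = {("station1", 21), ("station2", 24)}
actual = {("station1", 21), ("station3", 19)}  # one correct, one spurious

p, r = precision_recall(expected, actual)
# p == 0.5 (one of two reported bindings is correct)
# r == 0.5 (one of two expected bindings was reported)
```

Repeating each test 10 times then yields a distribution of such precision/recall values rather than a single point estimate.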
12. Benchmark: Data Stream Model
A data stream is generated based on:
■ Number of weather stations,
■ Time interval between two observations of a single station,
■ Duration of the stream,
■ A seed for the randomization function
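The data stream model above can be sketched as follows (parameter names and value ranges are illustrative; YABench's generator emits RDF, whereas this sketch emits plain tuples):

```python
import random

def generate_stream(num_stations, interval, duration, seed):
    """Yield (timestamp, station, temperature) observations.

    Every station emits one observation every `interval` time units
    for the whole `duration`; the seed makes runs reproducible.
    """
    rng = random.Random(seed)
    stream = []
    for t in range(0, duration, interval):
        for s in range(num_stations):
            temp = round(rng.uniform(-10.0, 35.0), 1)
            stream.append((t, f"station{s}", temp))
    return stream

# Two runs with the same seed produce identical streams, which is what
# makes repeated benchmark runs comparable.
a = generate_stream(num_stations=2, interval=5, duration=20, seed=42)
b = generate_stream(num_stations=2, interval=5, duration=20, seed=42)
```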
13. Benchmark: Queries
Experiment 1: SELECT + FILTER
Experiment 2: SELECT + AVG + FILTER
Experiment 3: joining of triples from different timestamps
Experiment 4: demonstrates the gracious mode, which is implemented by the oracle to compensate for the engines' timing discrepancies
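The aggregation pattern of Experiment 2 (average per station within a window, then a filter) can be sketched in Python (illustrative only; the actual experiments use continuous queries over RDF):

```python
from collections import defaultdict

def avg_filter(observations, threshold):
    """Average temperature per station; keep stations whose average exceeds threshold."""
    sums = defaultdict(lambda: [0.0, 0])
    for station, temp in observations:
        sums[station][0] += temp
        sums[station][1] += 1
    return {s: total / n for s, (total, n) in sums.items() if total / n > threshold}

window = [("station1", 20.0), ("station1", 24.0), ("station2", 10.0)]
result = avg_filter(window, threshold=15.0)
# station1's average of 22.0 passes the filter; station2's 10.0 does not
```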
24. Architecture: Gracious mode
In this mode the oracle tries to adjust its window scope to match the scope of the actual window, by moving the left and right borders back and/or forth as long as precision and recall improve.
This allows us to:
(a) confirm our assumption about why precision and recall are low,
(b) reconstruct and visualize the actual window borders
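The border-adjustment idea can be sketched as a small search (simplified; the actual oracle's search strategy may differ): shift the oracle window's borders in small steps and keep the shift that best matches the engine's reported result.

```python
def f1(expected, actual):
    """Harmonic mean of precision and recall over two binding sets."""
    expected, actual = set(expected), set(actual)
    tp = len(expected & actual)
    if not actual or not expected:
        return 0.0
    p, r = tp / len(actual), tp / len(expected)
    return 2 * p * r / (p + r) if p + r else 0.0

def in_window(stream, start, end):
    return {x for (t, x) in stream if start <= t < end}

def gracious_match(stream, actual, start, end, max_shift=5):
    """Try shifting both borders by up to max_shift; keep the best-scoring window."""
    best = (f1(in_window(stream, start, end), actual), start, end)
    for ds in range(-max_shift, max_shift + 1):
        for de in range(-max_shift, max_shift + 1):
            expected = in_window(stream, start + ds, end + de)
            score = f1(expected, actual)
            if score > best[0]:
                best = (score, start + ds, end + de)
    return best  # (best F1, reconstructed start, reconstructed end)

stream = [(0, "a"), (5, "b"), (10, "c"), (15, "d")]
# The engine reported {"b", "c"}: inconsistent with the nominal window [0, 10),
# but consistent with a slightly shifted window.
score, s, e = gracious_match(stream, {"b", "c"}, 0, 10)
```

The reconstructed borders (s, e) both confirm the timing-discrepancy hypothesis and let the actual window scope be visualized.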
25. Experiment 4: gracious vs. non-gracious modes (C-SPARQL)
(a) In non-gracious (default) mode, (b) In gracious mode
26. Experiment 4: gracious vs. non-gracious modes (CQELS)
(a) In non-gracious (default) mode, (b) In gracious mode
27. Conclusion
■ We built a framework for benchmarking RSP engines that allows assessing their correctness and performance
■ We ran a benchmark which revealed several insights:
○ CQELS shows better precision/recall for simple queries,
○ C-SPARQL is slightly more memory efficient than CQELS,
○ C-SPARQL outperforms CQELS in terms of delay for more complex queries, which is mainly caused by a different reporting strategy
■ By introducing the gracious mode we are able to estimate the extent of the timing discrepancies