Dataset of 40GB from Twitter
GridRS - PUCRS
4 x 3.52 GHz (Intel Xeon)
2 GB RAM
Latency: time for a tuple to traverse the operator graph
Throughput: no. of tuples processed per sec.
Tuple loss: tuples emitted by the sources that never reach the sink
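The three metrics above can be sketched as small helper functions. This is a minimal illustration, not Tempest's actual code; all names are hypothetical.

```python
def latency_ms(emitted_at: float, arrived_at: float) -> float:
    """Latency: time for a tuple to traverse the operator graph,
    from emission at the source to arrival at the sink (in ms)."""
    return (arrived_at - emitted_at) * 1000.0

def throughput(processed: int, window_s: float) -> float:
    """Throughput: number of tuples processed per second."""
    return processed / window_s

def tuple_loss(emitted: int, received: int) -> int:
    """Tuple loss: tuples emitted by the sources that never
    reached the sink."""
    return emitted - received
```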
5 runs per test.
Every 3s, each operator reports its status with
the no. of tuples processed.
The PerfMon sink collects a tuple every
100ms, and sends the average latency every
3s (and cleans up the collected tuples).
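The monitoring scheme above can be sketched as follows: the sink samples at most one tuple every 100ms and, every 3s, emits the average latency of the sampled tuples and clears them. A minimal sketch only; the class and method names are illustrative, not Tempest's API.

```python
class PerfMonSink:
    """Samples one tuple every 100 ms; reports the average latency
    every 3 s and cleans up the collected samples."""
    SAMPLE_EVERY = 0.1  # seconds
    REPORT_EVERY = 3.0  # seconds

    def __init__(self, clock):
        self.clock = clock          # injectable clock, e.g. time.monotonic
        self.samples = []
        now = clock()
        self.last_sample = now
        self.last_report = now

    def on_tuple(self, emitted_at):
        """Called for every arriving tuple; returns the average latency
        when a 3 s reporting window closes, otherwise None."""
        now = self.clock()
        if now - self.last_sample >= self.SAMPLE_EVERY:
            self.samples.append(now - emitted_at)  # per-tuple latency
            self.last_sample = now
        if now - self.last_report >= self.REPORT_EVERY:
            avg = sum(self.samples) / len(self.samples) if self.samples else 0.0
            self.samples.clear()                   # clean up collected tuples
            self.last_report = now
            return avg
        return None
```

Injecting the clock keeps the sampling logic testable without real waiting.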
Number of nodes
Number of operator instances
The system was able to process more
data as more nodes were added
On the other hand, distributing the
load increased the latency
The scheduler has to reduce the
communication over the network:
the communication between workers
in the same node has to happen
through main memory
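One way to act on this conclusion is a locality-aware placement pass: co-locate the most heavily communicating operator instances so their traffic stays in main memory. A greedy sketch under assumed inputs (edge list with traffic estimates, uniform per-node slot capacity); this is not Tempest's actual scheduler.

```python
def colocate(edges, capacity, n_nodes):
    """Greedy locality-aware placement: assign operator instances to
    nodes, preferring the node that already hosts their heaviest peer.

    edges: list of (src_op, dst_op, traffic) tuples
    capacity: max operator instances per node
    Returns a dict mapping operator -> node index."""
    placement = {}
    load = [0] * n_nodes
    # Handle the heaviest-traffic edges first
    for src, dst, _traffic in sorted(edges, key=lambda e: -e[2]):
        for op in (src, dst):
            if op not in placement:
                peer = dst if op == src else src
                node = placement.get(peer)
                # Fall back to the least-loaded node if the peer is
                # unplaced or its node is full
                if node is None or load[node] >= capacity:
                    node = min(range(n_nodes), key=load.__getitem__)
                placement[op] = node
                load[node] += 1
    return placement
```

The heaviest edge (A, B) ends up intra-node, while lighter edges spill across the network only when a node fills up.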
Source code @ github.com/mayconbordin/tempest