2. Spark / Storm
Spark Storm
Implemented in Scala Clojure, Java
Delivery Semantics
Exactly once At least once. Exactly
once with Trident
API
Python, Java, Scala Java, Scala, Clojure,
Python, etc. Trident:
Java, Scala, Clojure.
Processing Model
Batch. Micro-batches
with Spark Streaming.
~ 500ms
Record at a time/
Trident allows for
micro-batches.
Latency 1 - 2 seconds sub-seconds
7. Metric
Throughput: amount of data that is being
processed.
● By changing batch size
● By changing load (i.e. Scaling up)
● Programs used for benchmarking will be wordcount.
19. Takeaways
● Setting the batch interval in spark streaming
should be done by monitoring processing times
and load size
● For Storm as numbers of producers increase so
does throughput and spout latency.
20. Would like to add:
● Increase number of producers. Use real data.
● Add a graph as a second use case.
● Dashboard to monitor live streaming.
21. Gabriela Choy
Bsc. in Chem. Engineering. ULA, Vnzla.
Msc in Statistics. UT Dallas
Previously: Worked in Device Reliability Engineering at View, Inc.
About Me