With Apache Beam, you can process massive out-of-order streams (or standard batch use cases too) by defining high-level transformation pipelines that you can then run on a variety of backends, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
This talk introduces a new feature of the Beam programming model: stateful processing with processing-time and event-time timers. This enhancement unlocks new use cases and efficiencies, such as:
- Microservice-like workflows ("register this user, remind them after a day, and expire their sign-up after a week")
- Customized output control ("only output when the signal has changed by more than 0.3")
- Carefully batched RPCs ("write as many items as possible at the same time, but no more than 500")
- Stream joins with custom output triggering ("join these two streams on an arbitrary join predicate with correct exactly-once results")
In this talk, you will learn how to use Beam to develop complex, stateful pipelines that implement scenarios like those above, finely tailored to your precise use case.