This document gives an overview of Samza, an open-source stream processing framework, covering its streaming APIs, flexible deployment options, and convergence of stream and batch processing. The key points are:
- Samza applications can be written with either the low-level API or the newer high-level API; the latter supports pipeline composition and built-in transformations.
- Samza applications can be deployed either embedded in another application or on a cluster managed by YARN, and a new ZooKeeper-based coordination model enables flexible deployments.
- Samza aims to unify stream and batch processing by allowing streaming applications to read from and write to batch systems like HDFS, without code changes.
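
To make the last point concrete, here is a minimal sketch of the idea that one pipeline definition can run over either an unbounded stream or a bounded batch input. It is written in Python for brevity and does not use Samza's actual API: the `Pipeline` class, its `map`/`filter`/`run` methods, and the sample data are all hypothetical stand-ins chosen only to illustrate pipeline composition and source swapping.

```python
from typing import Callable, Iterable, Iterator, List


class Pipeline:
    """Hypothetical stand-in for a high-level streaming API (not Samza's classes)."""

    def __init__(self) -> None:
        # Each stage lazily turns one iterator of records into another.
        self._stages: List[Callable[[Iterator], Iterator]] = []

    def map(self, fn: Callable) -> "Pipeline":
        self._stages.append(lambda records: (fn(r) for r in records))
        return self

    def filter(self, predicate: Callable) -> "Pipeline":
        self._stages.append(lambda records: (r for r in records if predicate(r)))
        return self

    def run(self, source: Iterable) -> Iterator:
        # The source may be bounded (a list) or unbounded (a generator);
        # the stages never need to know which.
        records = iter(source)
        for stage in self._stages:
            records = stage(records)
        return records


def user_pipeline() -> Pipeline:
    # Composed once, independent of where the input comes from.
    return (
        Pipeline()
        .map(str.strip)
        .filter(lambda line: line != "")        # drop blank lines
        .map(lambda line: line.split(",")[0])   # keep the user field
    )


def live_events() -> Iterator[str]:
    # Stand-in for an unbounded stream (assumption: CSV lines "user,page").
    yield "alice,/home\n"
    yield "\n"
    yield "bob,/cart\n"


# Stand-in for a bounded input such as a file stored on HDFS.
batch_file = ["carol,/home\n", "dave,/search\n"]

pipeline = user_pipeline()
stream_users = list(pipeline.run(live_events()))
batch_users = list(pipeline.run(batch_file))
```

The pipeline logic is identical in both runs; only the source handed to `run` differs. In the sketch this is done by passing a different iterable, whereas a real deployment would more plausibly select the input system through configuration.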