Stream processing applications built on Apache Apex run on Hadoop clusters and typically power analytics use cases where availability, flexible scaling, high throughput, low latency and correctness are essential. These applications consume data from a variety of sources, including streaming sources like Apache Kafka, Kinesis or JMS, file based sources or databases. Processing results often need to be stored in external systems (sinks) for downstream consumers (pub-sub messaging, real-time visualization, Hive and other SQL databases etc.). Apex has the Malhar library with a wide range of connectors and other operators that are readily available to build applications. We will cover key characteristics like partitioning and processing guarantees, generic building blocks for new operators (write-ahead-log, incremental state saving, windowing etc.) and APIs for application specification.
Thomas Weise is Apache Apex PMC member and architect/co-founder at DataTorrent.
Chinmay Kolhatkar is a Committer for Apache Apex and a Software Engineer at DataTorrent
If you'd like to learn more about Apache Apex and DataTorrent, feel free to schedule a 30-minute consultation with our team here: https://www.datatorrent.com/30-minutes-consult/