SlideShare a Scribd company logo
From Batch to
Streaming ET(L) with
Apache Apex
Thomas Weise
Apache Apex PMC Chair
thw@apache.org @thweise @atrato_io
Stream Data Processing with Apache Apex
2
Mobile Devices
Logs
Sensor Data
Social
Databases
CDC
Oper1 Oper2 Oper3
Real-time visualization,
storage, etc
Data Delivery & Storage Transform / Analytics
SQL
Declarative
API
DAG API
SAMOA
Beam
Operator
Library
SAMOA
Beam
(roadmap)
Data Sources
https://www.slideshare.net/ashishtadose1/realtime-adtech-reporting-targeting-with-apache-apex
Use Case
3
Batch processing with several
hours till insight:
● Available data stale, does
no longer apply to current
situation
● Current data stuck in batch
pipeline
● Complex batch processing
orchestration with many
different components
● Hours of delay translate to
high cost due to inability to
make timely campaign
adjustments.
Batch pipeline
> 5 hours
Processing with Apex, reuse
batch ingestion:
● Existing ingestion
mechanism (files in S3,
shared with other
pipelines)
● Migrate transform logic to
Apex
● Enable reporting from
application state
(“Queryable State”)
● Reduced latency, valuable
as intermediate step.
Batch ingest +
streaming transforms
~ 20 minutes
Streaming source and
processing:
● Data comes directly from
Kafka clusters
● Significantly reduced
latency
● Balanced load (no
ingestion spikes)
● Reduced resource
consumption with Apex
support for multi-cluster
Kafka consumers
● Reporting meets SLA
requirements
End-to-end stream
processing
seconds
4
Phased Transition
Phase 1 (batch ingest)
https://www.slideshare.net/ApacheApex/real-time-insights-for-advertising-tech5
Phase 2 (hybrid)
6
Real-time Streaming
7
Real-time Dashboard
https://www.slideshare.net/ashishtadose1/realtime-adtech-reporting-targeting-with-apache-apex8
Pipeline Transformations
9
Kafka/
Files
Decompress
& Parse
Decompress
& Parse
Decompress
& Parse
Enrich
& Map
Enrich
& Map
Enrich
& Map
Dimensional
Compute
Dimensional
Compute
Dimensional
Compute
Query
Results
Visualization
Input Tuples
Input Tuples
Input Tuples
Parsed
Tuples
Parsed
Tuples
Parsed
Tuples
Enriched
Tuples
Enriched
Tuples
Enriched
Tuples
Partial
Aggregates
Partial
Aggregates
Partial
Aggregates
Visualization
Results
Visualization
Query
Aggregate
Query
Aggregate
Results
https://www.slideshare.net/ApacheApex/actionable-insights-with-apache-apex-at-apache-big-data-2017-by-devendra-tagare
Store
Store
Dimension Computation
10
hour advertiser location cost revenue impr clicks
10:00 6 9 => 10 22 3
10:00 Burger King 4 6 12 2
10:00 Subway 2 3 => 4 10 2
10:00 CA 4 6 15 3
10:00 WA 2 3 => 4 7 1
10:00 Burger King CA 2 3 5 1
10:00 Burger King WA 2 3 7 1
10:00 Subway CA 2 3 10 2
10:00 Subway WA 0 => 1
Advertiser: Subway
Location: WA
Cost: 2
Revenue: 1
Impressions: 5
Clicks: 1
Time: 10:15:30
● 6 geographically distributed data centers
● 10 PB of data under management
● 50 TB/day of data generated from auction & client logs
● 40+ billion ad impressions and 350+ billion bids per day
● Average data inflow of 450K events/sec
● 64 Kafka Input partitions, 32 instances of in-memory distributed store
● 1.2 TB of memory for the Apex application
Scale
11
● State Management & Fault tolerance
○ Exactly-once, Checkpointing and Windowing
○ Fine grained recovery, low-latency SLA support
○ Queryable state
● Processing based on event time
○ Accuracy, Repeatable/Replay
● Native Streaming
○ Low latency + high throughput, efficient resource utilization
○ Pipelined processing (data in motion)
● Scalability
○ Process more data by adding compute resources, no platform/architecture limits
○ Dynamic scaling and resource allocation, elasticity
● Library of connectors and transformations
○ Time to value
Why Apex
12
Apex Library
13
Stateful Transformations
• Windowing: sliding, tumbling, session
• Accumulations: sum, merge, join, sort, top n, …
• Triggering, Watermarks
• Dimensional Aggregations (with state management for historical
data + query)
• Deduplication
RDBMS
• JDBC
• MySQL
• Oracle
• MemSQL
NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase, CouchDB
• Redis, MongoDB
• Geode, Kudu
Messaging
• Kafka
• JMS (ActiveMQ etc.)
• Kinesis, SQS
• Flume, NiFi
• MQTT
File Systems
• HDFS / Hive
• Local File
• S3
• FTP
Stateless Transformations
• Parsers: XML, JSON, CSV, Avro
• Filter
• Enrich
• Configurable POJO schema
• Map, FlatMap (custom Java function)
• Script (JavaScript, Jython)
Other
• Elastic Search
• Solr
• Twitter
• WebSocket / HTTP
• SMTP
How to build it
14
Example Application (Twitter)
● Top N hashtags
● Tweet stats time series
● Queryable state
● WebSocket Pub/Sub
● Visualization with Grafana
Source code: https://github.com/tweise/apex-samples/tree/master/twitter
15
Real-time Visualization
16
Top Hashtags
● Keyed sum accumulation (5 minute window, count trigger)
● TopN accumulation of upstream windowed counts
17
Queryable State
A set of operators in the library that support real-time queries of operator state.
18
Hashtag
Extractor
TopN
Window
Twitter Feed
Input
Operator
CountByKey
Window
Snapshot
Server Result
Pub/Sub
Broker
HTTPWebSocket
Query
Input
● Pub/Sub server: https://github.com/atrato/pubsub-server
● Grafana data source: https://github.com/atrato/apex-grafana-datasource-server
Queryable State
● Snapshot server
○ Stateful operator that holds last received list of objects
○ Receives query and emits the list as JSON formatted query result
● Source schema configured, result fields via query
● Predefined schemas (Apex library): “Snapshot”, “Dimensional”
19
Demo
20
● Apex runner in Apache Beam
● Iterative processing
● Integrated with Apache Samoa, opens up ML
● Integrated with Apache Calcite, enables SQL
● Scalable, incremental state management
● User defined control tuples (watermarks, batch control, …)
● Enhanced support for Batch Processing
● Support for Mesos and Kubernetes
● Encrypted Streams
● Support for Python
Apex - Recent Additions & Roadmap
21
Resources
22
• http://apex.apache.org/
• Powered by Apex - http://apex.apache.org/powered-by-apex.html
• Learn more - http://apex.apache.org/docs.html
• Getting involved - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - https://www.meetup.com/topics/apache-apex/
• Examples - https://github.com/apache/apex-malhar/tree/master/examples
• Slideshare - http://www.slideshare.net/ApacheApex/presentations

More Related Content

What's hot

Leveraging Microservices and Apache Kafka to Scale Developer Productivity
Leveraging Microservices and Apache Kafka to Scale Developer ProductivityLeveraging Microservices and Apache Kafka to Scale Developer Productivity
Leveraging Microservices and Apache Kafka to Scale Developer Productivity
confluent
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
confluent
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
confluent
 
What's New in Confluent Platform 5.5
What's New in Confluent Platform 5.5What's New in Confluent Platform 5.5
What's New in Confluent Platform 5.5
confluent
 
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
confluent
 
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
confluent
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
confluent
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
confluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Yaroslav Tkachenko
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
Zach Cox
 
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
confluent
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Database
confluent
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
confluent
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
confluent
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
confluent
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
confluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 

What's hot (20)

Leveraging Microservices and Apache Kafka to Scale Developer Productivity
Leveraging Microservices and Apache Kafka to Scale Developer ProductivityLeveraging Microservices and Apache Kafka to Scale Developer Productivity
Leveraging Microservices and Apache Kafka to Scale Developer Productivity
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
 
What's New in Confluent Platform 5.5
What's New in Confluent Platform 5.5What's New in Confluent Platform 5.5
What's New in Confluent Platform 5.5
 
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
3 Ways to Deliver an Elastic, Cost-Effective Cloud Architecture (ANZ)
 
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
When to KSQL & When to Live the KStream (Dani Traphagen, Confluent) Kafka Sum...
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
 
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
So You’ve Inherited Kafka? Now What? (Alon Gavra, AppsFlyer) Kafka Summit Lon...
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Database
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
 
What is Apache Kafka®?
What is Apache Kafka®?What is Apache Kafka®?
What is Apache Kafka®?
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 

Similar to From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017

From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
DataWorks Summit
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
Apache Apex
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
DataWorks Summit/Hadoop Summit
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
Apache Apex
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
Fabian Hueske
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
Taro L. Saito
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
StreamNative
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Apache Apex
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
Amazon Web Services
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
Big Data Spain
 
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
HostedbyConfluent
 

Similar to From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017 (20)

From Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache ApexFrom Batch to Streaming ET(L) with Apache Apex
From Batch to Streaming ET(L) with Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
Sub-Second SQL Search, Aggregations and Joins with Kafka and Rockset | Dhruba...
 

More from Thomas Weise

Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
Thomas Weise
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Thomas Weise
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
Thomas Weise
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache Apex
Thomas Weise
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Thomas Weise
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
Thomas Weise
 

More from Thomas Weise (6)

Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019Streaming your Lyft Ride Prices - Flink Forward SF 2019
Streaming your Lyft Ride Prices - Flink Forward SF 2019
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache Apex
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
DataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application MeetupDataTorrent Presentation @ Big Data Application Meetup
DataTorrent Presentation @ Big Data Application Meetup
 

Recently uploaded

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 

Recently uploaded (20)

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 

From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017

  • 1. From Batch to Streaming ET(L) with Apache Apex Thomas Weise Apache Apex PMC Chair thw@apache.org @thweise @atrato_io
  • 2. Stream Data Processing with Apache Apex 2 Mobile Devices Logs Sensor Data Social Databases CDC Oper1 Oper2 Oper3 Real-time visualization, storage, etc Data Delivery & Storage Transform / Analytics SQL Declarative API DAG API SAMOA Beam Operator Library SAMOA Beam (roadmap) Data Sources
  • 4. Batch processing with several hours till insight: ● Available data stale, does no longer apply to current situation ● Current data stuck in batch pipeline ● Complex batch processing orchestration with many different components ● Hours of delay translate to high cost due to inability to make timely campaign adjustments. Batch pipeline > 5 hours Processing with Apex, reuse batch ingestion: ● Existing ingestion mechanism (files in S3, shared with other pipelines) ● Migrate transform logic to Apex ● Enable reporting from application state (“Queryable State”) ● Reduced latency, valuable as intermediate step. Batch ingest + streaming transforms ~ 20 minutes Streaming source and processing: ● Data comes directly from Kafka clusters ● Significantly reduced latency ● Balanced load (no ingestion spikes) ● Reduced resource consumption with Apex support for multi-cluster Kafka consumers ● Reporting meets SLA requirements End-to-end stream processing seconds 4 Phased Transition
  • 5. Phase 1 (batch ingest) https://www.slideshare.net/ApacheApex/real-time-insights-for-advertising-tech5
  • 9. Pipeline Transformations 9 Kafka/ Files Decompress & Parse Decompress & Parse Decompress & Parse Enrich & Map Enrich & Map Enrich & Map Dimensional Compute Dimensional Compute Dimensional Compute Query Results Visualization Input Tuples Input Tuples Input Tuples Parsed Tuples Parsed Tuples Parsed Tuples Enriched Tuples Enriched Tuples Enriched Tuples Partial Aggregates Partial Aggregates Partial Aggregates Visualization Results Visualization Query Aggregate Query Aggregate Results https://www.slideshare.net/ApacheApex/actionable-insights-with-apache-apex-at-apache-big-data-2017-by-devendra-tagare Store Store
  • 10. Dimension Computation 10 hour advertiser location cost revenue impr clicks 10:00 6 9 => 10 22 3 10:00 Burger King 4 6 12 2 10:00 Subway 2 3 => 4 10 2 10:00 CA 4 6 15 3 10:00 WA 2 3 => 4 7 1 10:00 Burger King CA 2 3 5 1 10:00 Burger King WA 2 3 7 1 10:00 Subway CA 2 3 10 2 10:00 Subway WA 0 => 1 Advertiser: Subway Location: WA Cost: 2 Revenue: 1 Impressions: 5 Clicks: 1 Time: 10:15:30
  • 11. ● 6 geographically distributed data centers ● 10 PB of data under management ● 50 TB/day of data generated from auction & client logs ● 40+ billion ad impressions and 350+ billion bids per day ● Average data inflow of 450K events/sec ● 64 Kafka Input partitions, 32 instances of in-memory distributed store ● 1.2 TB of memory for the Apex application Scale 11
  • 12. ● State Management & Fault tolerance ○ Exactly-once, Checkpointing and Windowing ○ Fine grained recovery, low-latency SLA support ○ Queryable state ● Processing based on event time ○ Accuracy, Repeatable/Replay ● Native Streaming ○ Low latency + high throughput, efficient resource utilization ○ Pipelined processing (data in motion) ● Scalability ○ Process more data by adding compute resources, no platform/architecture limits ○ Dynamic scaling and resource allocation, elasticity ● Library of connectors and transformations ○ Time to value Why Apex 12
  • 13. Apex Library 13 Stateful Transformations • Windowing: sliding, tumbling, session • Accumulations: sum, merge, join, sort, top n, … • Triggering, Watermarks • Dimensional Aggregations (with state management for historical data + query) • Deduplication RDBMS • JDBC • MySQL • Oracle • MemSQL NoSQL • Cassandra, HBase • Aerospike, Accumulo • Couchbase, CouchDB • Redis, MongoDB • Geode, Kudu Messaging • Kafka • JMS (ActiveMQ etc.) • Kinesis, SQS • Flume, NiFi • MQTT File Systems • HDFS / Hive • Local File • S3 • FTP Stateless Transformations • Parsers: XML, JSON, CSV, Avro • Filter • Enrich • Configurable POJO schema • Map, FlatMap (custom Java function) • Script (JavaScript, Jython) Other • Elastic Search • Solr • Twitter • WebSocket / HTTP • SMTP
  • 14. How to build it 14
  • 15. Example Application (Twitter) ● Top N hashtags ● Tweet stats time series ● Queryable state ● WebSocket Pub/Sub ● Visualization with Grafana Source code: https://github.com/tweise/apex-samples/tree/master/twitter 15
  • 17. Top Hashtags ● Keyed sum accumulation (5 minute window, count trigger) ● TopN accumulation of upstream windowed counts 17
  • 18. Queryable State A set of operators in the library that support real-time queries of operator state. 18 Hashtag Extractor TopN Window Twitter Feed Input Operator CountByKey Window Snapshot Server Result Pub/Sub Broker HTTPWebSocket Query Input ● Pub/Sub server: https://github.com/atrato/pubsub-server ● Grafana data source: https://github.com/atrato/apex-grafana-datasource-server
  • 19. Queryable State ● Snapshot server ○ Stateful operator that holds last received list of objects ○ Receives query and emits the list as JSON formatted query result ● Source schema configured, result fields via query ● Predefined schemas (Apex library): “Snapshot”, “Dimensional” 19
  • 21. ● Apex runner in Apache Beam ● Iterative processing ● Integrated with Apache Samoa, opens up ML ● Integrated with Apache Calcite, enables SQL ● Scalable, incremental state management ● User defined control tuples (watermarks, batch control, …) ● Enhanced support for Batch Processing ● Support for Mesos and Kubernetes ● Encrypted Streams ● Support for Python Apex - Recent Additions & Roadmap 21
  • 22. Resources 22 • http://apex.apache.org/ • Powered by Apex - http://apex.apache.org/powered-by-apex.html • Learn more - http://apex.apache.org/docs.html • Getting involved - http://apex.apache.org/community.html • Download - http://apex.apache.org/downloads.html • Follow @ApacheApex - https://twitter.com/apacheapex • Meetups - https://www.meetup.com/topics/apache-apex/ • Examples - https://github.com/apache/apex-malhar/tree/master/examples • Slideshare - http://www.slideshare.net/ApacheApex/presentations