Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink

AthenaX: Streaming
Processing Platform
@Uber
Bill Liu, Haohui Mai
APRIL 11, 2017
Speakers
Bill Liu
• Senior Software Engineer @ Uber
Haohui Mai, @wheat9
• PMC, Apache Hadoop & Storm
• Senior Software Engineer @ Uber
• Uber: Transport A → B on demand, reliably
• Dynamic marketplace
• Example: UberEATS
Uber business is real-time
Challenges
Infra.: Reliability & scalability
• 99.99% SLA on latency
• At-least-once processing
• Billions of messages
• Multiple PB / day
Solutions: Productivity
• Audiences: majority of employees use SQL actively
• Abstractions: Flink / DSL?
• Integrations: data management, monitoring, reporting, etc.
Building streaming applications
[Diagram labels: Thrift, 01001…]
• Framework-specific
• Ad-hoc management over the lifecycles
The AthenaX approach
SELECT AVG(…)
FROM eats_order
WHERE …
Write SQLs to build streaming applications
[Diagram labels: 01001…, Generic tables, Compilation, Thrift]
• Decouple business logic from the framework
• Unified integration & management
• Write SQLs to build streaming applications
• Insight: generic table
• Reliable, scalable processing based on Apache Flink
• Develop & deploy streaming applications in production in hours instead of weeks
AthenaX: Streaming processing platform @ Uber
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Example
Real-time dashboard for restaurants
[Chart: average prep. time over time]
SELECT meal_id, AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id, HOP(proctime(),
  INTERVAL '1' MINUTE,
  INTERVAL '15' MINUTE)
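The hopping-window semantics of this query can be illustrated outside Flink. The following is a minimal plain-Python sketch, not AthenaX or Flink code. It assumes events arrive as `(timestamp_seconds, meal_id, meal_prep_time)` tuples (a hypothetical layout), that windows are aligned to multiples of the slide, and that the window size is a multiple of the slide. In Calcite's `HOP(time, slide, size)` the first interval is the slide and the second is the window size, so each event falls into 15 overlapping windows.

```python
from collections import defaultdict

SLIDE = 60        # seconds: a new window starts every minute
SIZE = 15 * 60    # seconds: each window covers 15 minutes


def window_starts(ts, slide=SLIDE, size=SIZE):
    """Return the start times of all hopping windows [start, start + size) containing ts."""
    last_start = ts - (ts % slide)            # latest window starting at or before ts
    first_start = last_start - size + slide   # earliest window that still covers ts
    return range(first_start, last_start + 1, slide)


def hop_avg(events, slide=SLIDE, size=SIZE):
    """Average meal_prep_time per (meal_id, window_start)."""
    sums = defaultdict(lambda: [0.0, 0])      # key -> [running sum, count]
    for ts, meal_id, prep_time in events:
        for start in window_starts(ts, slide, size):
            acc = sums[(meal_id, start)]
            acc[0] += prep_time
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical events: (timestamp in seconds, meal_id, prep time in minutes)
events = [(0, "burger", 10.0), (30, "burger", 20.0), (70, "burger", 30.0)]
averages = hop_avg(events)
```

A real streaming engine emits each window's result incrementally as time advances and then discards its state; this sketch computes all windows at once only to show which events contribute to which window.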
Example (cont.)
Building streaming processing applications with SQL
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id, HOP(proctime(),
  INTERVAL '1' MINUTE,
  INTERVAL '15' MINUTE)
Example (cont.)
SELECT * FROM (
  SELECT EXPECTED_TIME(meal_id) AS e, meal_id,
    AVG(meal_prep_time) AS t
  FROM eats_order
  GROUP BY meal_id, HOP(proctime(),
    INTERVAL '1' MINUTE,
    INTERVAL '15' MINUTE)
Building streaming processing applications with SQL
Tables are more generic than analytical stores
RPC
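The idea that a table need not be an analytical store can be sketched as a small interface. This is an illustrative plain-Python sketch, not the AthenaX API: `TableSink`, `InMemoryTable`, `RpcTable`, and `emit` are hypothetical names, and the RPC is stood in for by an ordinary callable.

```python
from typing import Callable, Dict, List, Protocol

Row = Dict[str, object]


class TableSink(Protocol):
    """Anything that can accept rows produced by a query."""

    def write(self, row: Row) -> None: ...


class InMemoryTable:
    """Stand-in for an analytical store: rows are simply appended."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, row: Row) -> None:
        self.rows.append(row)


class RpcTable:
    """A 'table' whose writes are forwarded to a remote procedure."""

    def __init__(self, call: Callable[[Row], None]) -> None:
        self._call = call  # hypothetical RPC stub, e.g. a notification service

    def write(self, row: Row) -> None:
        self._call(row)


def emit(rows: List[Row], sink: TableSink) -> None:
    """Deliver query results to whichever table backs the output."""
    for row in rows:
        sink.write(row)


results = [{"meal_id": "burger", "e": 12.0, "t": 20.0}]
store = InMemoryTable()
notified: List[Row] = []
emit(results, store)                       # results land in a store
emit(results, RpcTable(notified.append))   # same results trigger a call
```

Because the query only sees the `write` method, the same SQL could in principle feed a dashboard store or invoke a service without changes.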
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
The case of UberEATS
• Three-way marketplace
• Real-time metrics
• Estimated Time to Delivery (ETD)
• Transactions
• Demand forecasts
Predicting the ETD
• Key metric: time to prepare a meal (t_prep)
• Learn a function f: (order status) → t_prep periodically
• Predict the ETD for current orders using f
• AthenaX extracts features for both learning and prediction
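The learn-then-predict loop described on this slide can be sketched with a toy model. The slides do not say which model or features are used, so everything concrete here is an assumption: a single numeric feature (items in the order), an ordinary least-squares line as a stand-in for f, and an ETD formed by adding an externally supplied travel time to the predicted prep time.

```python
from statistics import mean


def learn_f(samples):
    """Fit t_prep ~ a * x + b by ordinary least squares.

    samples: list of (x, t_prep) pairs, where x is a hypothetical numeric
    feature extracted from the order status (here: items in the order).
    """
    xs = [x for x, _ in samples]
    ts = [t for _, t in samples]
    x_bar, t_bar = mean(xs), mean(ts)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (t - t_bar) for x, t in samples)
    a = cov / var
    b = t_bar - a * x_bar
    return lambda x: a * x + b


def predict_etd(f, x, travel_minutes):
    """Assumed ETD: predicted prep time plus a given travel time."""
    return f(x) + travel_minutes


# Offline features (historical): items in order -> observed prep minutes
history = [(1, 7.0), (2, 9.0), (3, 11.0)]
f = learn_f(history)            # re-run periodically as new history arrives
etd = predict_etd(f, 4, 10.0)   # online feature for a current order
```

The point of the sketch is the split: the same feature definition feeds the periodic fit (offline) and the per-order prediction (online).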
Architecture of the ETD service
Prediction service
Order status (Kafka)
AthenaX
Data warehouse
Feature / Model (Cassandra)
Online features
Offline features
Machine learning
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id,
HOP(proctime(),
INTERVAL '1' MINUTE,
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Architecture
SQL
Catalog
Query planner
Optimizer
Deployment
Monitoring
Flink job
AthenaX runtime
Flink on YARN
HDFS
AthenaX Flink
Executing AthenaX applications
• Compilation + code generation
• Flink SQL APIs: SQL → logical plans → Flink applications
• Leverage the Volcano optimizer in Apache Calcite
• Challenges: exposing streaming semantics
Query planner
Optimizer
Deployment
Monitoring
Compile SQLs to Flink applications
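The step from a logical plan to a Flink plan can be pictured as a rewrite over a tree of operators. The sketch below is plain Python and only illustrative: the operator names are the ones shown on the "Compiling SQL" slide of this deck, while `Node`, `RULES`, `plan`, and `ops` are hypothetical helpers. It applies a fixed one-to-one mapping, whereas a Volcano-style optimizer chooses among alternative plans by cost.

```python
from dataclasses import dataclass
from typing import List, Optional

# Logical -> physical operator names as they appear on the "Compiling SQL"
# slide; the rule table itself is an illustrative assumption.
RULES = {
    "LogicalTableScan": "DataStreamScan",
    "LogicalProject": "DataStreamCalc",
    "LogicalAggregate": "DataStreamAggregate",
}


@dataclass
class Node:
    op: str
    child: Optional["Node"] = None


def plan(node: Optional[Node]) -> Optional[Node]:
    """Rewrite a logical plan bottom-up into a physical plan."""
    if node is None:
        return None
    return Node(RULES[node.op], plan(node.child))


def ops(node: Optional[Node]) -> List[str]:
    """List operator names from the root down to the leaf."""
    out = []
    while node is not None:
        out.append(node.op)
        node = node.child
    return out


# Logical plan for: SELECT AVG(meal_prep_time) FROM eats_order GROUP BY ...
logical = Node("LogicalProject",
               Node("LogicalAggregate",
                    Node("LogicalProject",
                         Node("LogicalTableScan"))))
physical = plan(logical)
```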
AthenaX as a self-serving platform
• Metadata / catalog management
• Job management
• Monitoring
• Resource management and elastic scaling
• Failure recovery
Query planner
Optimizer
Deployment
Monitoring
Self-serving production support end-to-end
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Current status
• Pilot jobs in production
• In the process of full-scale rollouts
• Based on Apache Flink 1.3-SNAPSHOT
• Projection, filtering, group windows, UDF
• Streaming joins not yet supported
Embrace the community
• Group window support for streaming SQL
  • CALCITE-1603, CALCITE-1615
  • FLINK-5624, FLINK-5710, FLINK-6011, FLINK-6012
• Stability fixes
  • FLINK-3679, FLINK-5631
• Table abstractions for Cassandra / JDBC (WIP)
• Available in the upcoming 1.3 release
Contributions to the upstream
Agenda
• Motivating example
• Case study: ETD in UberEATS
• Implementation
• Current status
• Conclusion
Conclusion
• AthenaX: write SQLs to build streaming applications
• Treat table as a generic concept
• Productivity: development → production in hours
• The AthenaX approach
• SQL on streams as a platform
• Self-serving production support end-to-end
Thank you
Compiling SQL
LogicalTableScan
LogicalProject
LogicalAggregate
LogicalProject
SELECT AVG(meal_prep_time)
FROM eats_order
GROUP BY meal_id,
HOP(proctime(),
INTERVAL '1' MINUTE,
val eats = getEatsOrder()
eats.window(Slide
  .over("15.minutes")
  .every("1.minute"))
  .avg("meal_prep_time")
Parsing
DataStreamScan
DataStreamCalc
DataStreamAggregate
DataStreamCalc
Planning
01001…
Lazy deserialization
Example of SQL optimization
SELECT
AVG(meal_prep_time)
FROM eats_order
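The deck names lazy deserialization next to this optimization example, but the details are not preserved in the transcript. One plausible reading, shown here as a plain-Python sketch under that assumption, is that a query touching only `meal_prep_time` need not decode the other fields of each record. The fixed-width binary layout, `FIELDS`, and `LazyRow` are hypothetical; the real wire format mentioned in the talk is Thrift.

```python
import struct

# Hypothetical fixed-width record layout (little-endian):
#   order_id: int64 | meal_id: int64 | meal_prep_time: float64
FIELDS = {
    "order_id": ("<q", 0),
    "meal_id": ("<q", 8),
    "meal_prep_time": ("<d", 16),
}
RECORD = struct.Struct("<qqd")


class LazyRow:
    """Wraps raw bytes and decodes a field only when it is first read."""

    def __init__(self, buf: bytes) -> None:
        self._buf = buf
        self._cache = {}
        self.decoded = 0  # number of fields actually decoded

    def __getitem__(self, name: str):
        if name not in self._cache:
            fmt, offset = FIELDS[name]
            self._cache[name] = struct.unpack_from(fmt, self._buf, offset)[0]
            self.decoded += 1
        return self._cache[name]


def avg_prep_time(rows):
    """SELECT AVG(meal_prep_time): touches a single column."""
    values = [row["meal_prep_time"] for row in rows]
    return sum(values) / len(values)


rows = [LazyRow(RECORD.pack(i, 7, t)) for i, t in enumerate([10.0, 20.0])]
result = avg_prep_time(rows)
```

After the query runs, each row has decoded exactly one of its three fields, which is the saving the optimization is after.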
1 of 29

Recommended

Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re... by
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
267 views29 slides
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ... by
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...
Flink forward SF 2017: Elizabeth K. Joseph and Ravi Yadav - Flink meet DC/OS ...Flink Forward
501 views26 slides
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow by
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward
2K views27 slides
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami... by
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...Flink Forward
963 views49 slides
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces... by
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...
Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Proces...Flink Forward
1.1K views19 slides
Taking a look under the hood of Apache Flink's relational APIs. by
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
2.7K views36 slides

More Related Content

What's hot

Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ... by
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward
787 views40 slides
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La... by
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...confluent
850 views32 slides
Portable Streaming Pipelines with Apache Beam by
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beamconfluent
3.4K views46 slides
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overview by
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overviewFlink Forward SF 2017: Kenneth Knowles - Back to Sessions overview
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overviewFlink Forward
465 views35 slides
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi... by
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...confluent
3.2K views36 slides
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn? by
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Flink Forward
6.9K views18 slides

What's hot(20)

Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ... by Flink Forward
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World ...
Flink Forward787 views
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La... by confluent
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
confluent850 views
Portable Streaming Pipelines with Apache Beam by confluent
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
confluent3.4K views
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overview by Flink Forward
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overviewFlink Forward SF 2017: Kenneth Knowles - Back to Sessions overview
Flink Forward SF 2017: Kenneth Knowles - Back to Sessions overview
Flink Forward465 views
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi... by confluent
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
confluent3.2K views
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn? by Flink Forward
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward6.9K views
Apache Beam @ GCPUG.TW Flink.TW 20161006 by Randy Huang
Apache Beam @ GCPUG.TW Flink.TW 20161006Apache Beam @ GCPUG.TW Flink.TW 20161006
Apache Beam @ GCPUG.TW Flink.TW 20161006
Randy Huang366 views
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink... by Flink Forward
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Flink Forward468 views
How to use Standard SQL over Kafka: From the basics to advanced use cases | F... by HostedbyConfluent
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
HostedbyConfluent356 views
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ... by confluent
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent667 views
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ... by Flink Forward
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Flink Forward150 views
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ... by Flink Forward
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward881 views
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins by Spark Summit
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit858 views
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ... by HostedbyConfluent
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
HostedbyConfluent1.8K views
Capture the Streams of Database Changes by confluent
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
confluent6.9K views
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea... by Flink Forward
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward368 views
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline... by Provectus
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Provectus595 views
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre... by Flink Forward
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward San Francisco 2019: Building production Flink jobs with Airstre...
Flink Forward617 views
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha... by Flink Forward
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Virtual Flink Forward 2020: Production-Ready Flink and Hive Integration - wha...
Flink Forward161 views
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ... by Flink Forward
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward1.1K views

Similar to Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink

HBaseCon2015-final by
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
127 views35 slides
Flink in Zalando's World of Microservices by
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices Zalando Technology
5.1K views54 slides
Flink in Zalando's world of Microservices by
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices ZalandoHayley
138 views54 slides
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ... by
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
4.3K views35 slides
eHarmony @ Hbase Conference 2016 by vijay vangapandu. by
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.Vijaykumar Vangapandu
698 views38 slides
Big Data Analytics Platforms by KTH and RISE SICS by
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
114 views38 slides

Similar to Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink(20)

HBaseCon2015-final by Maryann Xue
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
Maryann Xue127 views
Flink in Zalando's World of Microservices by Zalando Technology
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
Zalando Technology5.1K views
Flink in Zalando's world of Microservices by ZalandoHayley
Flink in Zalando's world of Microservices   Flink in Zalando's world of Microservices
Flink in Zalando's world of Microservices
ZalandoHayley138 views
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ... by HBaseCon
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon4.3K views
What's New in .Net 4.5 by Malam Team
What's New in .Net 4.5What's New in .Net 4.5
What's New in .Net 4.5
Malam Team3K views
What's New in IBM Streams V4.1 by lisanl
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
lisanl724 views
Bay Area Impala User Group Meetup (Sept 16 2014) by Cloudera, Inc.
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
Cloudera, Inc.1.2K views
Stinger.Next by Alan Gates of Hortonworks by Data Con LA
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Data Con LA1.4K views
HBaseCon2016-final by Maryann Xue
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-final
Maryann Xue323 views
data Artisans Product Announcement by Flink Forward
data Artisans Product Announcementdata Artisans Product Announcement
data Artisans Product Announcement
Flink Forward17.7K views
vCloud Automation Center and Pivotal Cloud Foundry – Better PaaS Solution (VM... by VMware Tanzu
vCloud Automation Center and Pivotal Cloud Foundry – Better PaaS Solution (VM...vCloud Automation Center and Pivotal Cloud Foundry – Better PaaS Solution (VM...
vCloud Automation Center and Pivotal Cloud Foundry – Better PaaS Solution (VM...
VMware Tanzu3.6K views
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014 by cdmaxime
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime2.2K views
Containerized architectures for deep learning by Antje Barth
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
Antje Barth164 views
Elasticsearch + Cascading for Scalable Log Processing by Cascading
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log Processing
Cascading1K views

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin... by
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
853 views56 slides
Evening out the uneven: dealing with skew in Flink by
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
2.5K views35 slides
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i... by
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
185 views13 slides
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ... by
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
576 views34 slides
Introducing the Apache Flink Kubernetes Operator by
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
771 views37 slides
Autoscaling Flink with Reactive Mode by
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
922 views17 slides

More from Flink Forward(20)

Building a fully managed stream processing platform on Flink at scale for Lin... by Flink Forward
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward853 views
Evening out the uneven: dealing with skew in Flink by Flink Forward
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward2.5K views
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i... by Flink Forward
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward185 views
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ... by Flink Forward
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward576 views
Introducing the Apache Flink Kubernetes Operator by Flink Forward
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward771 views
Autoscaling Flink with Reactive Mode by Flink Forward
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward922 views
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli... by Flink Forward
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward266 views
One sink to rule them all: Introducing the new Async Sink by Flink Forward
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward312 views
Tuning Apache Kafka Connectors for Flink.pptx by Flink Forward
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward428 views
Flink powered stream processing platform at Pinterest by Flink Forward
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward224 views
Apache Flink in the Cloud-Native Era by Flink Forward
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward173 views
Where is my bottleneck? Performance troubleshooting in Flink by Flink Forward
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward535 views
Using the New Apache Flink Kubernetes Operator in a Production Deployment by Flink Forward
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward655 views
The Current State of Table API in 2022 by Flink Forward
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward173 views
Dynamic Rule-based Real-time Market Data Alerts by Flink Forward
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward745 views
Exactly-Once Financial Data Processing at Scale with Flink and Pinot by Flink Forward
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward688 views
Processing Semantically-Ordered Streams in Financial Services by Flink Forward
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward168 views
Tame the small files problem and optimize data layout for streaming ingestion... by Flink Forward
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward807 views
Batch Processing at Scale with Flink & Iceberg by Flink Forward
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward591 views
Welcome to the Flink Community! by Flink Forward
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward127 views

Recently uploaded

Cross-network in Google Analytics 4.pdf by
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdfGA4 Tutorials
6 views7 slides
[DSC Europe 23] Ivana Sesic - Use of AI in Public Health.pptx by
[DSC Europe 23] Ivana Sesic - Use of AI in Public Health.pptx[DSC Europe 23] Ivana Sesic - Use of AI in Public Health.pptx
[DSC Europe 23] Ivana Sesic - Use of AI in Public Health.pptxDataScienceConferenc1
5 views15 slides
Ukraine Infographic_22NOV2023_v2.pdf by
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdfAnastosiyaGurin
1.4K views3 slides
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...StatsCommunications
5 views26 slides
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init... by
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...
[DSC Europe 23][Cryptica] Martin_Summer_Digital_central_bank_money_Ideas_init...DataScienceConferenc1
5 views18 slides
Data Journeys Hard Talk workshop final.pptx by
Data Journeys Hard Talk workshop final.pptxData Journeys Hard Talk workshop final.pptx
Data Journeys Hard Talk workshop final.pptxinfo828217
10 views18 slides

Recently uploaded(20)

Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... by StatsCommunications

Flink Forward SF 2017: Bill Liu & Haohui Mai - AthenaX : Uber’s streaming processing platform on Flink

  • 1. AthenaX: Streaming Processing Platform @Uber Bill Liu, Haohui Mai APRIL 11, 2017
  • 2. Speakers • Bill Liu: Senior Software Engineer @ Uber • Haohui Mai, @wheat9: Senior Software Engineer @ Uber; PMC, Apache Hadoop & Storm
  • 3. Uber business is real-time • Uber: Transport A → B on demand reliably • Dynamic marketplace • Example: UberEATS
  • 4. Challenges Infra.: Reliability & scalability • 99.99% SLA on latency • At-least-once processing • Billions of messages • Multiple PB / day Solutions: Productivity • Audiences: majority of employees use SQL actively • Abstractions: Flink / DSL? • Integrations: data management, monitoring, reporting, etc.
  • 5. Building streaming applications Thrift 01001… • Framework-specific • Ad-hoc management over the life cycles
  • 6. The AthenaX approach SELECT AVG(…) FROM eats_order WHERE … Write SQLs to build streaming applications 01001… Generic tables Compilation Thrift • Decouple business logic from the framework • Unified integration & management
  • 7. AthenaX: Streaming processing platform @ Uber • Write SQLs to build streaming applications • Insight: generic table • Reliable, scalable processing based on Apache Flink • Develop & deploy streaming applications in production in hours instead of weeks
  • 8. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  • 9. Example Real-time dashboard for restaurants (chart: avg. prep. time over time) SELECT meal_id, AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE, INTERVAL '15' MINUTE)
  • 10. Example (cont.) Building streaming processing applications with SQL SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE, INTERVAL '15' MINUTE)
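To make the windowing in the query above concrete, here is a minimal Python sketch of what `GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE, INTERVAL '15' MINUTE)` computes: a 15-minute window that advances every minute, so each order contributes to 15 overlapping windows. This is an illustration only, not how AthenaX or Flink implement it. The names `hop_windows` and `hop_avg` are hypothetical, processing time is passed in as an explicit millisecond timestamp, and windows are assumed to be aligned to multiples of the slide interval.

```python
from collections import defaultdict

MINUTE_MS = 60_000


def hop_windows(ts_ms, slide_ms, size_ms):
    """Return the start times of every window [start, start + size) that contains ts_ms."""
    start = ts_ms - (ts_ms % slide_ms)  # latest aligned window start <= ts_ms
    starts = []
    while start > ts_ms - size_ms:      # window still covers ts_ms
        starts.append(start)
        start -= slide_ms
    return starts


def hop_avg(events, slide_ms=MINUTE_MS, size_ms=15 * MINUTE_MS):
    """Average meal_prep_time per (meal_id, window_start_ms).

    events: iterable of (proc_time_ms, meal_id, meal_prep_time) tuples.
    """
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for ts, meal_id, prep in events:
        for start in hop_windows(ts, slide_ms, size_ms):
            entry = acc[(meal_id, start)]
            entry[0] += prep
            entry[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}
```

Calling `hop_avg` on a list of `(timestamp, meal_id, prep_time)` tuples yields one average per meal per window start, which is the shape of data a dashboard like the one on slide 9 would plot.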
  • 11. Example (cont.) SELECT * FROM ( SELECT EXPECTED_TIME(meal_id) AS e, meal_id, AVG(meal_prep_time) AS t FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE, INTERVAL '15' MINUTE) Building streaming processing applications with SQL Tables are more generic than analytical stores RPC
  • 12. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  • 13. The case of UberEATS • Three-way marketplace • Real-time metrics • Estimated Time to Delivery (ETD) • Transactions • Demand forecasts
  • 14. The case of UberEATS • Three-way marketplace • Real-time metrics • Estimated Time to Delivery (ETD) • Transactions • Demand forecasts
  • 15. Predicting the ETD • Key metric: time to prepare a meal (t_prep) • Learn a function f: (order status) → t_prep periodically • Predict the ETD for current orders using f • AthenaX extracts features for both learning and prediction
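The learn-then-predict loop on this slide can be sketched as follows. The talk does not say what model f is, so this sketch assumes the simplest possible stand-in: f maps a meal to the mean of its observed preparation times. The function names, the `travel_minutes` input, and the `default_prep` fallback are all hypothetical and not taken from the slides.

```python
from collections import defaultdict


def learn_prep_time(history):
    """Fit the stand-in model f: meal_id -> mean observed prep time in minutes.

    history: iterable of (meal_id, observed_prep_minutes) pairs.
    In the talk's setting this step would be rerun periodically on fresh features.
    """
    totals = defaultdict(lambda: [0.0, 0])  # meal_id -> [sum, count]
    for meal_id, minutes in history:
        totals[meal_id][0] += minutes
        totals[meal_id][1] += 1
    return {meal_id: s / n for meal_id, (s, n) in totals.items()}


def predict_etd(f, meal_id, elapsed_minutes, travel_minutes, default_prep=20.0):
    """Estimate minutes until delivery: remaining prep time plus travel time.

    Unknown meals fall back to default_prep (an assumed value).
    """
    remaining_prep = max(f.get(meal_id, default_prep) - elapsed_minutes, 0.0)
    return remaining_prep + travel_minutes
```

A real model would take richer order-status features than the meal identifier. The point of the sketch is only the split between a periodic learning step and a per-order prediction step, both fed by the same extracted features.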
  • 16. Architecture of the ETD service Prediction service Order status (Kafka) AthenaX Data warehouse Feature / Model (Cassandra) Online features Offline features Machine learning SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE,
  • 17. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  • 19. Executing AthenaX applications • Compilation + Code generation • Flink SQL APIs: SQL → Logical plans → Flink applications • Leverage the Volcano optimizer in Apache Calcite • Challenges: exposing streaming semantics Query planner Optimizer Deployment Monitoring Compile SQLs to Flink applications
  • 20. AthenaX as a self-serving platform • Metadata / catalog management • Job management • Monitoring • Resource management and elastic scaling • Failure recovery Query planner Optimizer Deployment Monitoring Self-serving production support end-to-end
  • 21. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  • 22. Current status • Pilot jobs in production • Full-scale rollout in progress • Based on Apache Flink 1.3-SNAPSHOT • Supported: projection, filtering, group windows, UDFs • Streaming joins not yet supported
  • 23. Embrace the community • Group window support for streaming SQL • CALCITE-1603, CALCITE-1615 • FLINK-5624, FLINK-5710, FLINK-6011, FLINK-6012 • Stability fixes • FLINK-3679, FLINK-5631 • Table abstractions for Cassandra / JDBC (WIP) • Available in the upcoming 1.3 release Contributions to the upstream
  • 24. Agenda • Motivating example • Case study: ETD in UberEATS • Implementation • Current status • Conclusion
  • 25. Conclusion • AthenaX: write SQLs to build streaming applications • Treat table as a generic concept • Productivity: development → production in hours • The AthenaX approach • SQL on streams as a platform • Self-serving production support end-to-end
  • 26. Thank you
  • 28. Compiling SQL SELECT AVG(meal_prep_time) FROM eats_order GROUP BY meal_id, HOP(proctime(), INTERVAL '1' MINUTE, • Parsing: LogicalTableScan, LogicalProject, LogicalAggregate, LogicalProject • Planning: DataStreamScan, DataStreamCalc, DataStreamAggregate, DataStreamCalc • val eats = getEatsOrder() eats.window(Slide.over("15.minutes").every("1.minute")).avg("meal_prep_time") 01001…
  • 29. Lazy deserialization Example of SQL optimization SELECT AVG(meal_prep_time) FROM eats_order