Streaming and Social Media
Joe Olson
Senior Manager, Big Data Analytics
Apache Road Show Chicago - May 2019
Agenda
United and the Airline Industry
How Streaming Model Presents Opportunity
Apache Flink
Q & A
About United Airlines
• 1,348 aircraft (779 mainline, 569 regional), with 250+ on order (supply chain)
• 158M passengers in 2018 (public-facing web site, mobile app, time- and geospatial-based inventory, loyalty program, surveys, ancillary sales)
• 4,900 daily departures (scheduling, operations, weather, route planning)
• 355 airports served in 48 countries (baggage claim, check-ins)
• 88,000 employees worldwide (scheduling, pay)
• Constantly in motion! The future (and the past) is always changing.
• A data scientist's / data engineer's dream.
Source: https://hub.united.com/corporate-fact-sheet/
Business Goals
• Improve Customer Experience
  - How can we reduce friction when booking a reservation or maneuvering through an airport?
  - How can we deliver a consistent message across all channels (mobile app, web site, social media, etc.)?
• Improve Employee Experience
  - How can we keep employees better informed of the current situation so they can relay it to customers?
  - What are we learning from our surveys about what the customer base says is / isn't working?
• Revenue Generation
  - What personalized offers can we make to our customers?
  - Are our offers competitive with the rest of the industry?
• Improve Operational Reliability
  - How can we better prepare for weather or other operational interruptions?
  - How can we manage the fleet better and ensure spare parts are where they need to be?
Industry Ideas – Customer Experience
Use Case – Improve Customer Experience via Social Media
• Social media represents a unique opportunity for any service company:
  - Connect with customers in a familiar environment.
  - Consistent messaging and brand management.
  - Build community and advocacy.
  - Direct issues to the appropriate channels so they can be handled expeditiously.
Use Case – Customer Experience
• Can we use social media as a giant issue-tracking database?
• Obstacles:
  - Who am I talking to?
  - Is there an issue? If so, what is the issue?
  - What is the current state of the issue? How did it get there?
  - Are there any recommendations on how to handle the issue?
  - Who is best equipped to handle this issue?
All of these need to be overcome within a few seconds of receiving a notification.
Use Case – Customer Experience
• Actions
  - Identification (Who am I talking to?)
  - Classification, prioritization (Is there an issue? What is it? How important is it?)
  - State determination (What is the current state of the issue? How did it get there?)
  - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
  - Routing (Who is best equipped to handle this issue?)
Conclusion: several enrichments + state lookup
Other needs: low latency, fault tolerance, high availability, elasticity…
Stream Processing Engine
• Apache Flink - Stateful Computations over Data Streams
What about enrichment?
Stream Processing Engine - Enrichment
• Enrichment options:
  - Option 1: Data lives in an external database or service and is looked up from a map function
  - Option 2: Data arrives as a second stream
[Diagram, Option #1: two parallel pipelines, each Social Media Messages → Source → Map (keyBy) → Map]
Stream Processing Engine - Enrichment
• Option #1 issues
  - Synchronous requests are slow and error-prone, and they jam up the pipeline
  - Resources are wasted while waiting for the service to respond
• What about asynchronous requests?
  - AsyncFunction in the DataStream API since Flink 1.2
    • A queue of promises
    • Emitter on a different thread
  - The client needs to support async requests
Stream Processing Engine - Enrichment
• Async call ("(un)orderedWait" stands for either unorderedWait or orderedWait):
    DataStream<Tuple2<String, String>> result =
        AsyncDataStream.(un)orderedWait(stream,
            new MyAsyncFunction(),
            1000, TimeUnit.MILLISECONDS, 100);
  – MyAsyncFunction: our AsyncFunction
  – 1000, TimeUnit.MILLISECONDS: the timeout, the maximum time until a request is considered failed
  – 100: the capacity, the maximum number of queued-up requests
  – unorderedWait: emit results in order of completion
  – orderedWait: emit results in order of arrival
Timeout: an exception is thrown by default. The timeout handler can be overridden.
Capacity exceeded: back pressure.
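The timeout and capacity behaviour described above can be modelled without Flink. The following is a minimal plain-Java sketch (Java 9+ for orTimeout), not Flink's implementation: lookupProfile, the FALLBACK value, and the class name are hypothetical stand-ins chosen for illustration. A semaphore plays the role of the capacity limit, and results are emitted in arrival order, as with orderedWait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Plain-Java model of asynchronous enrichment; this is NOT Flink's API.
class AsyncEnrichmentSketch {

    // Value emitted when a request times out or fails (an assumption for illustration).
    static final String FALLBACK = "TIMEOUT_OR_ERROR";

    // Hypothetical stand-in for an asynchronous client of an external service.
    static CompletableFuture<String> lookupProfile(String userId) {
        return CompletableFuture.supplyAsync(() -> "profile-of-" + userId);
    }

    // Enriches each input with at most `capacity` requests in flight.
    // Results come back in arrival order, similar to orderedWait.
    static List<String> enrichOrdered(List<String> inputs,
                                      Function<String, CompletableFuture<String>> client,
                                      long timeoutMillis, int capacity) {
        Semaphore inFlight = new Semaphore(capacity);
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String input : inputs) {
            // Blocks the producer when capacity is exhausted: back pressure.
            inFlight.acquireUninterruptibly();
            CompletableFuture<String> enriched = client.apply(input)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    // Custom handling instead of letting the failure propagate.
                    .exceptionally(error -> FALLBACK)
                    .thenApply(profile -> input + " -> " + profile);
            // Free one slot as soon as this request finishes.
            enriched.whenComplete((result, error) -> inFlight.release());
            pending.add(enriched);
        }
        List<String> output = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            output.add(future.join()); // waits in arrival order
        }
        return output;
    }
}
```

An unordered variant would instead emit each result from its completion callback, trading ordering for lower latency.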
Stream Processing Engine - Enrichment
• Option #2 - joining streams
[Diagram: two Social Media Messages pipelines (Source → Map (keyBy)) and one Events pipeline (Source → Map (keyBy)) feed a Join]
Stream Processing Engine - Joining
• Window join
  - Only elements within the same window can be joined
    • Tumbling window
    • Sliding window
    • Session window
• Interval join
  - Joins elements that share a common key, where elements of stream B have event timestamps that lie in a relative time interval to the event timestamps of elements in stream A
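The join conditions above can be written out as a small plain-Java sketch. It is a batch model over in-memory lists, not Flink's streaming operator: the Event type, its fields, and the method names are assumptions made for illustration. The interval join keeps a pair (a, b) when the keys match and a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound. tumblingWindowStart shows how an element is assigned to a tumbling window.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the join conditions; NOT Flink's streaming operator.
class IntervalJoinSketch {

    // Hypothetical event: a key, an event timestamp in milliseconds, and a payload.
    static final class Event {
        final String key;
        final long timestamp;
        final String payload;

        Event(String key, long timestamp, String payload) {
            this.key = key;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Start of the tumbling window of the given size that contains a
    // non-negative timestamp. Two elements can be window-joined only if
    // they share a key and fall into the same window.
    static long tumblingWindowStart(long timestamp, long sizeMillis) {
        return timestamp - (timestamp % sizeMillis);
    }

    // Interval join: pair a with b when the keys are equal and b's timestamp
    // lies in [a.timestamp + lowerBound, a.timestamp + upperBound].
    static List<String> intervalJoin(List<Event> streamA, List<Event> streamB,
                                     long lowerBound, long upperBound) {
        List<String> joined = new ArrayList<>();
        for (Event a : streamA) {
            for (Event b : streamB) {
                boolean sameKey = a.key.equals(b.key);
                boolean inInterval = b.timestamp >= a.timestamp + lowerBound
                        && b.timestamp <= a.timestamp + upperBound;
                if (sameKey && inInterval) {
                    joined.add(a.payload + "|" + b.payload);
                }
            }
        }
        return joined;
    }
}
```

With bounds of -2000 and 2000, for example, a social media message would be paired only with events for the same key that occurred within two seconds before or after it.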
Stream Processing Engine - State
• Managing state
  - Ability to store and retrieve information about a key.
[Diagram: client-server architecture vs. stateful streaming]
Stream Processing Engine - State
• Operates on key-value pairs on a keyed stream
• Several possible back ends, all easily configurable at cluster creation time:
  - Memory (very small state)
  - File on disk
  - RocksDB (very large state)
[Diagram: Keyed Stream → <Key> <Value>]
Stream Processing Engine - State
• Types of state
  - ValueState<T> - use this when the state is a single value
  - ListState<T> - use this when the state is a list of items
  - ReducingState<T> - a single value that represents an aggregation of all values added to the state
  - AggregatingState<IN, OUT> - similar to ReducingState, but the aggregated type (OUT) may differ from the type of the elements added (IN)
  - MapState<UK, UV> - a mapping. Can use put(UK, UV) or get(UK). Also iterable.
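To make the difference between the state types concrete, here is a plain-Java sketch of per-key value and reducing state. It only models the semantics with in-memory maps. In Flink the state objects are obtained from the runtime and scoped to the current key automatically, which is not shown. The class and method names are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of keyed state semantics; NOT Flink's state API.
class KeyedStateSketch {

    // Models ValueState<T>: a single value per key, overwritten on update.
    static final class KeyedValueState<K, T> {
        private final Map<K, T> values = new HashMap<>();

        void update(K key, T value) {
            values.put(key, value);
        }

        T value(K key) {
            return values.get(key); // null when nothing was stored for the key
        }
    }

    // Models ReducingState<T>: every added value is folded into one
    // aggregate per key using the supplied reduce function.
    static final class KeyedReducingState<K, T> {
        private final Map<K, T> aggregates = new HashMap<>();
        private final BinaryOperator<T> reducer;

        KeyedReducingState(BinaryOperator<T> reducer) {
            this.reducer = reducer;
        }

        void add(K key, T value) {
            // First value is stored as-is; later values are combined with it.
            aggregates.merge(key, value, reducer);
        }

        T get(K key) {
            return aggregates.get(key);
        }
    }
}
```

For the social media use case, a reducing state keyed by customer could count open messages, while a value state could hold the latest status of the customer's issue.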
Stream Processing Engine – Queryable State
• Ability to query state from outside a Flink cluster via an API
[Diagram: Flink compute cluster holding the <K>, <V> state of a keyed stream]
State - Other Issues
• Fault tolerance and high availability: savepointing (HDFS / S3, etc.)
Stream Processing - Other Issues
• Elasticity:
  - Flink Active (Flink controls resource allocation) / Reactive (external entity controls resource allocation) mode.
  - FLIP-6
  - Idea: cluster manager creates and destroys task managers based on demand.
  - Flink Forward San Francisco 2019: Future of Apache Flink Deployments: Containers, Kubernetes and More - Till Rohrmann
Use Case – Customer Experience
• Actions
  - Identification (Who am I talking to?)
  - Classification, prioritization (Is there an issue? What is it?)
  - State determination (What is the current state of the issue, and how did it get there?)
  - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
  - Routing (Who is best equipped to handle this issue?)
• Some of these are machine learning / model-type applications.
• How to switch model versions without interrupting the stream?
  - Control stream!
Stream Processing Engine – Interacting With Models
• Control stream:
[Diagram: Social Media Messages Source → Map (keyBy) and Control Stream Source → Map (keyBy) are joined with Connect into a CoFlatMap, which holds Model A and Model B in list state]
Make sure the output stream records which model version was used!
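The control-stream idea can be sketched in plain Java as an operator with two inputs: one for messages and one for control commands. This is a model of the pattern, not Flink's CoFlatMap API. The class name, the string-based models, and the "version:result" output format are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of a two-input operator that switches models on command.
// NOT Flink's CoFlatMapFunction; names and formats here are hypothetical.
class ModelSwitchSketch {

    // Registered models by version; each maps a message to a classification.
    private final Map<String, Function<String, String>> models = new HashMap<>();
    private String activeVersion;

    // Registers a model. The first registered model becomes the active one.
    void register(String version, Function<String, String> model) {
        models.put(version, model);
        if (activeVersion == null) {
            activeVersion = version;
        }
    }

    // Input 2 (control stream): switch the active model version.
    void onControl(String version) {
        if (!models.containsKey(version)) {
            throw new IllegalArgumentException("Unknown model version: " + version);
        }
        activeVersion = version;
    }

    // Input 1 (message stream): score with the active model and tag the
    // output with the model version that produced it.
    String onMessage(String message) {
        if (activeVersion == null) {
            throw new IllegalStateException("No model registered");
        }
        return activeVersion + ":" + models.get(activeVersion).apply(message);
    }
}
```

Because the switch arrives as an ordinary element on the second input, the message input never has to stop, and every output carries the version that produced it.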
Apache Communities
• Twitter: @ApacheFlink
• Mailing lists
  - news@flink.apache.org
  - community@flink.apache.org
  - user@flink.apache.org
  - dev@flink.apache.org
  - issues@flink.apache.org
• Stack Overflow: apache-flink tag
• GitHub
  - https://github.com/apache/flink
Apache Flink
Thank You!
We’re hiring!
- Data Engineers
- Data Scientists

More Related Content

What's hot

Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, How
Pat Patterson
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
HostedbyConfluent
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
HostedbyConfluent
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
DATAVERSITY
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
confluent
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
confluent
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
Guido Schmutz
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
confluent
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
HostedbyConfluent
 
Why SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, ClouderaWhy SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, Cloudera
HostedbyConfluent
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Data Con LA
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
confluent
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streaming
confluent
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data PipelinesETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
confluent
 

What's hot (20)

Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, How
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Why SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, ClouderaWhy SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, Cloudera
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streaming
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data PipelinesETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
 

Similar to Streaming and Social Media

Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016
Stanford University
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
DataWorks Summit
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data projectMichael Peacock
 
Jazz for Service Management
Jazz for Service ManagementJazz for Service Management
Jazz for Service Management
IBM Danmark
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward
 
Divya 3 yrs exp in qa engg
Divya 3 yrs exp in qa enggDivya 3 yrs exp in qa engg
Divya 3 yrs exp in qa engg
Divya Lakshmi.B
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
Splunk
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Prolifics
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
Evan Chan
 
Increase payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categoriesIncrease payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categories
Bhaskar Jayaraman
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
DataWorks Summit/Hadoop Summit
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Alexander Oppel
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
confluent
 
Split my monolith! Workshop
Split my monolith! Workshop Split my monolith! Workshop
Split my monolith! Workshop
martinsson
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
VMware Tanzu
 

Similar to Streaming and Social Media (20)

Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data project
 
Jazz for Service Management
Jazz for Service ManagementJazz for Service Management
Jazz for Service Management
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Divya 3 yrs exp in qa engg
Divya 3 yrs exp in qa enggDivya 3 yrs exp in qa engg
Divya 3 yrs exp in qa engg
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
 
Rohit_Gupta
Rohit_GuptaRohit_Gupta
Rohit_Gupta
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
Increase payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categoriesIncrease payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categories
 
AlBaraaAhmed_20160523
AlBaraaAhmed_20160523AlBaraaAhmed_20160523
AlBaraaAhmed_20160523
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
 
Rohit Gupta
Rohit GuptaRohit Gupta
Rohit Gupta
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
IBM Rational HATS Overview 2013
IBM Rational HATS Overview 2013IBM Rational HATS Overview 2013
IBM Rational HATS Overview 2013
 
Split my monolith! Workshop
Split my monolith! Workshop Split my monolith! Workshop
Split my monolith! Workshop
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
 

Recently uploaded

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Streaming and Social Media

  • 1. Streaming and Social Media Joe Olson Senior Manager, Big Data Analytics Apache Road Show Chicago - May 2019
  • 2. Agenda United and the Airline Industry How Streaming Model Presents Opportunity Apache Flink 4 Q & A
  • 3. 2 About United Airlines…..  1,348 aircraft (779 mainline, 569 regional) with 250+ on order (supply chain)  158M passengers in 2018 (public facing web site, mobile app, time / geospatial based inventory, loyalty program, surveys, ancillary sales)  4900 daily departures (scheduling, operations, weather, route planning)  355 airports served, in 48 countries (baggage claim, check-ins)  88,000 employees worldwide (scheduling, pay)  Constantly in motion! Future (and past) always changing.  A data scientist / data engineer dream. Source: https://hub.united.com/corporate-fact-sheet/
  • 4. Business Goals
      - Improve Customer Experience
          - How can we reduce friction when booking a reservation? Maneuvering through an airport?
          - How can we deliver a consistent message across all channels (mobile app, web site, social media, etc.)?
      - Improve Employee Experience
          - How can we keep employees better informed of the current situation so they can relay it to the customers?
          - What are we learning from our surveys about what the customer base says is / isn't working?
      - Revenue Generation
          - What personalized offers can we make to our customers?
          - Are our offers competitive with the rest of the industry?
      - Improve Operational Reliability
          - How can we better prepare for weather or other operational interruptions?
          - How can we manage the fleet better and ensure spare parts are where they need to be?
  • 5. Industry Ideas – Customer Experience
  • 6. Use Case – Improve Customer Experience Via Social Media
      - Social media represents a unique opportunity for any service company
          - Connect with customers in a familiar environment.
          - Consistent messaging and brand management.
          - Build community and advocacy.
          - Direct issues to appropriate channels so they can be handled expediently.
  • 7. Use Case – Customer Experience
      - Can we use social media as a giant issue tracking database?
      - Obstacles:
          - Who am I talking to?
          - Is there an issue? If so, what is the issue?
          - What is the current state of the issue? How did it get there?
          - Are there any recommendations on how to handle the issue?
          - Who is best equipped to handle this issue?
      All of these need to be overcome within a few seconds of receiving a notification.
  • 8. Use Case – Customer Experience
      - Actions
          - Identification (Who am I talking to?)
          - Classification, prioritization (Is there an issue? What is it? How important is it?)
          - State determination (What is the current state of the issue? How did it get there?)
          - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
          - Routing (Who is best equipped to handle this issue?)
      Conclusion: several enrichments + state lookup
      Other needs: low latency, fault tolerance, high availability, elasticity
  • 9. Stream Processing Engine
      - Apache Flink - Stateful Computations over Data Streams
      What about enrichment?
  • 10. Stream Processing Engine - Enrichment
      - Enrichment options:
          - Option 1: Data lives in an external database or service and is looked up from a map operator
          - Option 2: Data arrives as a second stream
      Diagram, Option #1: Social Media Messages -> Source -> Map (keyBy) -> Map
  • 11. Stream Processing Engine - Enrichment
      - Option #1 issues
          - Synchronous requests are slow and error-prone, and they stall the pipeline
          - Resources are wasted while waiting for the service to respond
      - What about asynchronous?
          - AsyncFunction in the DataStream API since Flink 1.2
              - A queue of promises
              - Emitter on a different thread
          - Client needs to support async requests
  • 12. Stream Processing Engine - Enrichment
      - Async call (here "(un)orderedWait" stands for either unorderedWait or orderedWait):
          DataStream<Tuple2<String, String>> result = AsyncDataStream.(un)orderedWait(stream, new MyAsyncFunction(), 1000, TimeUnit.MILLISECONDS, 100);
      - Arguments:
          - MyAsyncFunction: our AsyncFunction
          - timeout: max time until a request is considered failed
          - capacity: max number of queued-up requests
      - unorderedWait: emit results in order of completion
      - orderedWait: emit results in order of arrival
      - Timeout: exception thrown. Can override the exception handler.
      - Capacity exceeded: back pressure.
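The timeout and the "emit in order of arrival" behaviour described on this slide can be illustrated without a Flink cluster. What follows is a minimal plain-Java sketch built on CompletableFuture, not the Flink AsyncDataStream API. The class AsyncEnrichmentSketch, the lookupTier call and the ":gold" / ":unknown" values are hypothetical stand-ins for a real async client, and capacity / back pressure is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration of async enrichment; NOT the Flink AsyncDataStream API.
class AsyncEnrichmentSketch {

    // Hypothetical async client call: answers after delayMs with a made-up customer tier.
    static CompletableFuture<String> lookupTier(String userId, long delayMs) {
        return CompletableFuture.supplyAsync(
                () -> userId + ":gold",
                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
    }

    // "orderedWait" analogue: start every request first, then emit in arrival order.
    // A request slower than timeoutMs is replaced by a fallback value.
    static List<String> enrichOrdered(List<String> userIds, long[] delaysMs, long timeoutMs) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < userIds.size(); i++) {
            String id = userIds.get(i);
            pending.add(lookupTier(id, delaysMs[i])
                    .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                    .exceptionally(err -> id + ":unknown"));
        }
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            out.add(f.join()); // blocks only after all requests are already in flight
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("alice", "bob", "carol");
        // The third lookup is slower than the timeout, so it falls back.
        System.out.println(enrichOrdered(ids, new long[] {50, 10, 2000}, 500));
    }
}
```

Because all requests are issued before any result is awaited, the total wait is bounded by the slowest request (or the timeout) rather than by the sum of all requests, which is the point of going asynchronous.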
  • 13. Stream Processing Engine - Enrichment
      - Option #2 - joining streams
      Diagram: Social Media Messages -> Source -> Map (keyBy) and Events -> Source -> Map (keyBy), both feeding a Join
  • 14. Stream Processing Engine - Joining
      - Window join
          - Only elements within the same window can be joined
              - Tumbling window
              - Sliding window
              - Session window
      - Interval join
          - Joins elements that share a common key, where elements of stream B have event timestamps that lie in a relative time interval around the event timestamps of elements in stream A
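The interval-join condition can be written out as a small plain-Java sketch: an element b of stream B joins an element a of stream A when the keys match and a.timestamp + lowerMs <= b.timestamp <= a.timestamp + upperMs. This is an illustration of the semantics only, not the Flink API; the Event class, the field names and the sample bounds are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of interval-join semantics; NOT the Flink API.
class IntervalJoinSketch {

    // Minimal event: join key, event-time timestamp in milliseconds, and a payload.
    static final class Event {
        final String key;
        final long timestamp;
        final String payload;

        Event(String key, long timestamp, String payload) {
            this.key = key;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Pairs each element of a with every element of b that has the same key and
    // a timestamp within [a.timestamp + lowerMs, a.timestamp + upperMs], bounds inclusive.
    static List<String> intervalJoin(List<Event> a, List<Event> b, long lowerMs, long upperMs) {
        List<String> joined = new ArrayList<>();
        for (Event left : a) {
            for (Event right : b) {
                long delta = right.timestamp - left.timestamp;
                if (left.key.equals(right.key) && delta >= lowerMs && delta <= upperMs) {
                    joined.add(left.payload + "|" + right.payload);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> messages = List.of(new Event("u1", 1000, "tweet"));
        List<Event> events = List.of(
                new Event("u1", 1500, "delay"),   // same key, inside the interval
                new Event("u1", 9000, "late"),    // same key, outside the interval
                new Event("u2", 1200, "other"));  // different key
        System.out.println(intervalJoin(messages, events, -1000, 1000));
    }
}
```

A real engine would not scan all pairs as this nested loop does; it would keep per-key buffers and discard elements once they can no longer match.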
  • 15. Stream Processing Engine - State
      - Managing state
          - Ability to store and retrieve information about a key.
      Diagram: Client - Server vs. Stateful Streaming
  • 16. Stream Processing Engine - State
      - Operate on a key-value pair on a keyed stream
      - Several possible back ends, all easily configurable at cluster create time:
          - Memory (very small state)
          - File on disk
          - RocksDB (very large state)
      Diagram: Keyed Stream, <Key> <Value>
  • 17. Stream Processing Engine - State
      - Types of state
          - ValueState<T> - use this when the state is a single value
          - ListState<T> - use this when the state is a list of items
          - ReducingState<T> - single value that represents an aggregation of all values added to state
          - AggregatingState<IN, OUT> - similar to ReducingState, but the aggregated (OUT) type may differ from the type of the elements added (IN)
          - MapState<UK, UV> - mapping. Can use put(UK, UV) or get(UK). Also iterable.
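To make the difference between these state types concrete, here is a plain-Java sketch that emulates per-key state with ordinary maps. It only mirrors the idea of ValueState, ListState and ReducingState; it does not use the Flink state API, and the class KeyedStateSketch, the status strings and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java emulation of keyed state; NOT the Flink state API.
class KeyedStateSketch {

    private final Map<String, String> status = new HashMap<>();        // like ValueState<String>
    private final Map<String, List<String>> history = new HashMap<>(); // like ListState<String>
    private final Map<String, Integer> count = new HashMap<>();        // like ReducingState<Integer> with sum

    // Called once per message for the message's key; returns the status held before the update.
    String onMessage(String customerId, String text, String newStatus) {
        String previous = status.getOrDefault(customerId, "NEW");
        status.put(customerId, newStatus);                                      // overwrite the single value
        history.computeIfAbsent(customerId, k -> new ArrayList<>()).add(text);  // append to the list
        count.merge(customerId, 1, Integer::sum);                               // fold into the aggregate
        return previous;
    }

    String statusOf(String customerId) {
        return status.getOrDefault(customerId, "NEW");
    }

    List<String> historyOf(String customerId) {
        return history.getOrDefault(customerId, List.of());
    }

    int countOf(String customerId) {
        return count.getOrDefault(customerId, 0);
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch();
        state.onMessage("c1", "bag lost", "OPEN");
        state.onMessage("c1", "still lost", "ESCALATED");
        System.out.println(state.statusOf("c1") + " " + state.countOf("c1") + " " + state.historyOf("c1"));
    }
}
```

In a stream processor the key is implicit (the runtime scopes state to the current key of the keyed stream), whereas this sketch passes customerId explicitly.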
  • 18. Stream Processing Engine – Queryable State
      - Ability to query state from outside a Flink cluster via an API
      Diagram: Flink Compute Cluster, Keyed Stream, <K> <V>
  • 19. State - Other Issues
      - Fault tolerance and high availability: savepointing to HDFS / S3, etc.
  • 20. Stream Processing - Other Issues
      - Elasticity:
          - Flink Active (Flink controls resource allocation) / Reactive (external entity controls resource allocation) mode.
          - FLIP-6
          - Idea: cluster manager creates and destroys task managers based on demand.
          - Flink Forward San Francisco 2019: "Future of Apache Flink Deployments: Containers, Kubernetes and More", Till Rohrmann
  • 21. Use Case – Customer Experience
      - Actions
          - Identification (Who am I talking to?)
          - Classification, prioritization (Is there an issue? What is it?)
          - State determination (What is the current state of the issue, and how did it get there?)
          - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
          - Routing (Who is best equipped to handle this issue?)
      - Some of these are machine learning / model type applications.
      - How to switch model versions without interrupting the stream?
          - Control Stream!
  • 22. Stream Processing Engine – Interacting With Models
      - Control Stream
      Diagram: Social Media Messages -> Source -> Map (KeyBy) and Control Stream -> Source -> Map (KeyBy), joined by Connect into a CoFlatMap holding Model A, Model B and List State
      Make sure the output stream contains which model version was used!
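The control-stream idea can be sketched in plain Java as one object with two handlers, one per input, loosely mirroring the two sides of a CoFlatMap. This is an illustration under stated assumptions, not Flink code: the class ControlStreamSketch, the two toy "models" (simple keyword checks) and the "version:label" output format are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration of switching models via a control stream; NOT Flink code.
class ControlStreamSketch {

    private final Map<String, Function<String, String>> models;
    private String activeVersion;

    ControlStreamSketch(Map<String, Function<String, String>> models, String initialVersion) {
        this.models = models;
        this.activeVersion = initialVersion;
    }

    // Control-stream side: switch the active model without stopping message processing.
    void onControl(String version) {
        if (!models.containsKey(version)) {
            throw new IllegalArgumentException("unknown model version: " + version);
        }
        activeVersion = version;
    }

    // Message side: classify with the active model and tag the output with its version.
    String onMessage(String text) {
        return activeVersion + ":" + models.get(activeVersion).apply(text);
    }

    // Two toy classifiers standing in for real model versions.
    static ControlStreamSketch demo() {
        Map<String, Function<String, String>> models = new HashMap<>();
        models.put("A", t -> t.contains("delay") ? "ISSUE" : "OK");
        models.put("B", t -> (t.contains("delay") || t.contains("lost")) ? "ISSUE" : "OK");
        return new ControlStreamSketch(models, "A");
    }

    public static void main(String[] args) {
        ControlStreamSketch op = demo();
        System.out.println(op.onMessage("bag lost")); // classified by model A
        op.onControl("B");                            // control event arrives
        System.out.println(op.onMessage("bag lost")); // classified by model B
    }
}
```

Tagging every output with the model version, as the slide recommends, makes it possible to tell afterwards which results came from which model when the two disagree.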
  • 23. Apache Communities
      - Twitter: @ApacheFlink
      - Mailing lists
          - news@flink.apache.org
          - community@flink.apache.org
          - user@flink.apache.org
          - dev@flink.apache.org
          - issues@flink.apache.org
      - Stack Overflow: apache-flink tag
      - GitHub
          - https://github.com/apache/flink
  • 24. Thank You! We're hiring!
      - Data Engineers
      - Data Scientists