Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Acorns"

FINDING BAD ACORNS
ANDREW GAO & JEFF SHARPE
FLINK FORWARD 2018
Developing a Fraud Defense Platform
Fraud Defense at the Teller Using Flink
Our journey to build a Fraud Decisioning Platform and use Flink to build out the use cases
DEVELOPING A FRAUD DEFENSE PLATFORM
OUR USERS
• Fraud Operator
• Customer
• Data Scientist
• Data Analyst
• Engineer
• Product Owner
ARCHITECTURE
DATA ACTIONS
MAGIC!
RUNNING ON KUBERNETES
PROS
• Community support for Docker/Kubernetes
• Resilient
• Easy to tear down and bring back
• Maximizing resource efficiency
CONS
• Maintaining your own Kubernetes solution
• Containing blast radius
• Edge cases when combining a number of technology solutions
Developing on Kubernetes has been challenging but very rewarding
FRAUD DEFENSE AT THE TELLER
A FLINK MONOLITH
• Problem: Develop a stream processing workflow for two legacy batch data sources
• First Attempt: Do everything in Flink and take advantage of Flink Connected Streams
[Diagram: Using Flink operators to build our application workflow, steps 1 to 4]
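To make the Connected Streams idea concrete, here is a minimal plain-Python sketch (not the Flink API) of two inputs sharing per-key state. The record shapes, the `account_id` key, the `segment` field, and the `ConnectedProcessor` class are all hypothetical, chosen only for illustration; the slides do not describe the actual data sources or operators.

```python
from collections import defaultdict


class ConnectedProcessor:
    """Toy stand-in for a two-input operator with per-key state."""

    def __init__(self):
        # Per-key state: latest profile, plus transactions seen before it.
        self.profiles = {}
        self.pending = defaultdict(list)

    def on_profile(self, profile):
        """Handle a record from the first input (hypothetical profiles)."""
        key = profile["account_id"]
        self.profiles[key] = profile
        # Flush any transactions that arrived before their profile.
        buffered = self.pending.pop(key, [])
        return [self._join(tx, profile) for tx in buffered]

    def on_transaction(self, tx):
        """Handle a record from the second input (hypothetical transactions)."""
        key = tx["account_id"]
        profile = self.profiles.get(key)
        if profile is None:
            # No profile yet: hold the transaction in keyed state.
            self.pending[key].append(tx)
            return []
        return [self._join(tx, profile)]

    @staticmethod
    def _join(tx, profile):
        return {
            "account_id": tx["account_id"],
            "amount": tx["amount"],
            "segment": profile["segment"],
        }


proc = ConnectedProcessor()
early = proc.on_transaction({"account_id": "a1", "amount": 50.0})
flushed = proc.on_profile({"account_id": "a1", "segment": "retail"})
later = proc.on_transaction({"account_id": "a1", "amount": 75.0})
```

The point of the sketch is only that both inputs read and write the same state for a given key, so either side can arrive first.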
PROS
• Cheap
• Not a lot of code/config
• Scalability / availability
• Deployments are a breeze
CONS
• Not truly stateless
• Start-up time
AWS Lambda is a good fit for our use case and works well with our underlying technologies
CUSTOM WINDOWS FOR OPTIMIZATION AND PORTABILITY
• 90 Day Storage Window
• 30 Day Virtual View
• 90 Day Filtered View
• Most-Recent-Beyond-24-Hours Window
• 24 Hour Offset Dynamic Window
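The slides name these windows without defining them, so the following is only a sketch of one plausible reading, written in plain Python rather than against the Flink API: a single stored window with a retention period, and "views" that are cheap filters over it instead of separate copies. The `StorageWindow` class, its method names, and the interpretation of each view (by age relative to `now`) are assumptions.

```python
from datetime import datetime, timedelta


class StorageWindow:
    """One stored window per key; views are filters over the same events."""

    def __init__(self, retention=timedelta(days=90)):
        self.retention = retention
        self.events = []  # (timestamp, payload) pairs, sorted by timestamp

    def add(self, timestamp, payload, now):
        self.events.append((timestamp, payload))
        self.events.sort(key=lambda e: e[0])
        self.evict(now)

    def evict(self, now):
        # Drop anything older than the retention period.
        cutoff = now - self.retention
        self.events = [e for e in self.events if e[0] >= cutoff]

    def view(self, now, min_age=timedelta(0), max_age=None):
        # Events whose age lies between min_age and max_age, oldest first.
        max_age = self.retention if max_age is None else max_age
        return [e for e in self.events if min_age <= now - e[0] <= max_age]

    def most_recent_beyond(self, now, offset):
        # Newest event that is at least `offset` old, or None.
        older = self.view(now, min_age=offset)
        return older[-1] if older else None


now = datetime(2018, 4, 10, 12, 0)
window = StorageWindow()
for days, label in [(100, "expired"), (45, "old"), (10, "recent"),
                    (2, "two-days"), (0.125, "three-hours")]:
    window.add(now - timedelta(days=days), label, now)

last_30_days = window.view(now, max_age=timedelta(days=30))
beyond_24h = window.view(now, min_age=timedelta(hours=24))
anchor = window.most_recent_beyond(now, timedelta(hours=24))
```

Under this reading, the 30-day view and the 24-hour offset view cost no extra storage, which is one way custom windows could serve the "optimization" goal in the slide title.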
USING JYTHON TO BRIDGE THE GAP TO DATA SCIENTISTS
[Diagram labels: Flink, Jython Adapter, .py scripts, Windows, Data, Feature]
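As an illustration of what a data scientist's `.py` feature file might look like, here is a minimal sketch. The function names, the `FEATURES` registry, the `compute_features` entry point, and the shape of the window records are all hypothetical; the slides do not show the contract between the Jython Adapter and the scripts. The code avoids Python 3-only syntax so that it could in principle also run under Jython 2.7.

```python
def transaction_count(window):
    """Number of transactions in the window."""
    return len(window)


def average_amount(window):
    """Mean transaction amount, or 0.0 for an empty window."""
    if not window:
        return 0.0
    return sum(tx["amount"] for tx in window) / float(len(window))


def max_amount(window):
    """Largest transaction amount, or 0.0 for an empty window."""
    return max([tx["amount"] for tx in window]) if window else 0.0


# Hypothetical registry an adapter could iterate over.
FEATURES = {
    "transaction_count": transaction_count,
    "average_amount": average_amount,
    "max_amount": max_amount,
}


def compute_features(window):
    """Apply every registered feature function to one window of records."""
    return dict((name, fn(window)) for name, fn in FEATURES.items())


sample_window = [{"amount": 20.0}, {"amount": 100.0}, {"amount": 60.0}]
features = compute_features(sample_window)
```

Keeping each feature as a small pure function of the window is what would let the same script be unit tested on its own and then loaded by the streaming job.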
GITFLOW AND JYTHON IMPROVE TRACEABILITY
[Diagram labels: Commit, Pull Request, JUnit Tests, Build, Failed, Denied, Merge, Develop, Feature JAR v1.0.42, Maven Import, JUnit Tests, Build, Flink Job JAR]
FEATURES EXIST TO FEED MODELS
[Diagram labels: Feature, Feature, Model, Model, Score]
H2O, TensorFlow, Seldon (whatever)
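The feature-to-score step can be sketched with a toy model. This is not H2O, TensorFlow, or Seldon; it is a hand-written logistic function with made-up weights, used only to show a feature dictionary going in and a single score coming out. `WEIGHTS`, `BIAS`, and the feature names are hypothetical.

```python
import math

# Hypothetical coefficients; a real model would be trained, not hand-set.
WEIGHTS = {"transaction_count": 0.1, "average_amount": 0.002}
BIAS = -2.0


def score(features):
    """Map a feature dictionary to a score between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


low = score({"transaction_count": 1, "average_amount": 20.0})
high = score({"transaction_count": 40, "average_amount": 900.0})
```

Because the model only sees the feature dictionary, the scoring backend can be swapped without touching the feature code.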
BREAKING UP THE MONOLITH
• Problem: Back pressure leading to delayed transactions
• Solution: Break up the monolith Flink app into small Queryable State apps
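The split into small queryable apps can be pictured with a plain-Python sketch (not Flink's Queryable State client): each small app owns one piece of keyed state and exposes a read-only lookup, and a caller combines the answers. The `QueryableStateApp` class, the `deposits` and `withdrawals` apps, and `net_flow` are hypothetical names used only for illustration.

```python
class QueryableStateApp:
    """Toy app that owns one piece of keyed state, readable from outside."""

    def __init__(self, name):
        self.name = name
        self._state = {}

    def process(self, key, value):
        # Keep a running total per key as this app's only state.
        self._state[key] = self._state.get(key, 0.0) + value

    def query(self, key, default=0.0):
        # Read-only lookup; callers never mutate the state directly.
        return self._state.get(key, default)


deposits = QueryableStateApp("deposits")
withdrawals = QueryableStateApp("withdrawals")

deposits.process("a1", 500.0)
deposits.process("a1", 250.0)
withdrawals.process("a1", 100.0)


def net_flow(key):
    """Combine the state of two independent apps at decision time."""
    return deposits.query(key) - withdrawals.query(key)
```

Since each app consumes its own input independently, a slow one no longer holds up the others, which is the property the slide contrasts with back pressure in the monolith.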
CHIPMUNKS
Features Used
• Connected Streams
• Flink Keyed State
• Checkpointing/Savepointing
• Queryable State
Issues
• Flink Versioning (FLINK-7783, FLINK-8487)
• Keyed Source Function
• Kafka Offsets
We had a lot of fun and success using Flink, but not without a few hiccups
QUESTIONS?

Editor's Notes

  1. Jeff intro. Andrew intro. We are part of the Forest teams (very high-level intro). Forest is a Kubernetes-based fraud decisioning platform on which you can deploy multiple fraud use cases, with the goal of being able to rapidly spin up fraud apps. Running in production since September 2017.
  2. Our talk today: first, briefly cover our journey building out the Forest platform on Kubernetes and, at a high level, how we used Flink with Kubernetes. Then cover a specific use case we have on the platform and do a deep dive on what’s inside our Flink app.
  3. Customers first. If one day you take a look at your bank account and it’s empty, you would be upset; however, if your account was locked for no reason, you would be upset too. This balance between catching and stopping fraud and providing a great customer experience is a recurring theme that we have to deal with. If we wanted to stop fraud completely, we could just stop letting people take their money. On a similar note, we have a limited number of fraud operators and do not have the manpower to call every single person up and ask them. The primary directive of the platform is to empower data scientists and data analysts by building the tools on the platform to help create the models needed to make decisions. This includes having access to all the data in a fast and easy-to-understand format, and seeing how their models are performing and whether the features are being calculated as expected. When they need to refit the model, they need to be able to do the data transformations quickly so we can turn a refreshed model around. Lastly, as we are developing a fraud platform, we need to keep in mind the engineers who will be developing the fraud apps: it should be something that engineers enjoy developing on. When you have a feature/model/action repository, it’s very easy to develop and turn around fraud apps. To help us balance these different needs, we have our product owners to help bridge the gap.
  5. 14 EC2 instances: 6 m4.10xlarge for general minions, 5 m4.2xlarge for Kafka nodes, 3 m4.large for masters. Ansible to provision. 200+ pods. Flink apps in Java/Scala/Kotlin. Microservices in Golang.
  6. Holy smokes, that's a lot. Zookeeper/Kafka/Flink/NiFi: Kappa architecture. Kafka is our primary messaging bus throughout the platform. NiFi is one of the tools we use to grab data from different sources in the company. Flink does the calculations and applies the needed transformations. Minio/Istio to handle HTTP communications throughout the platform. EFK = Elasticsearch / Fluentd / Kibana: Docker logs; managed AWS service. Influx / Prometheus / Grafana: metrics reporting and dashboards for platform health and fraud health. Drill / Zeppelin / S3 for data analysts to view transactions. Why are we switching from Influx to Prometheus?
  7. Holy smokes, that's a lot. Zookeeper/Kafka/Flink/NiFi. Kafka is our primary messaging bus throughout the platform. NiFi is one of the tools we use to grab data from different sources in the company. Flink does the calculations and applies the needed transformations. Minio/Istio to handle HTTP communications throughout the platform. EFK = Elasticsearch / Fluentd / Kibana: Docker logs; managed AWS service. Influx / Prometheus / Grafana: metrics reporting and dashboards for platform health and fraud health. Drill / Zeppelin / S3 for data analysts to view transactions. Why are we switching from Influx to Prometheus?
  8. Kubernetes has been a challenge. If a task manager goes down, it will auto-heal. If your configurations are set up correctly, you can just delete pods and they'll come back. Unless your configurations are completely fleshed out, though, the blast radius of a failure can ripple outward. We hit a situation where Docker logs could not make it out to the Kubernetes logs because the Docker machines were dying. Developed an internal tool for CI/CD and deployment.
  9. Use cases tell us the resources they need and we provision them a Flink cluster: 1 Job Manager per cluster, 5 Task Managers per cluster, RocksDB backend, checkpoints/savepoints persisted on S3. Job deployment options.
  10. Considerations: people obviously don't want to wait too long at the teller, but we want to respond with the most data we have available on the customer.
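The trade-off in these notes (answer quickly, but with as much customer data as possible) can be sketched as a lookup with a time budget. This is a minimal illustration in plain Python, not the platform's code: `gather_within_budget`, the two lookup functions, the feature names, and the budget values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def gather_within_budget(lookups, budget_seconds):
    """Run all lookups in parallel and keep only those that finish in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(lookups)))
    futures = {pool.submit(fn): name for name, fn in lookups.items()}
    done, _not_done = wait(futures, timeout=budget_seconds)
    pool.shutdown(wait=False)  # do not block the response on slow lookups
    return {futures[f]: f.result() for f in done if f.exception() is None}


def fast_lookup():
    # Hypothetical source that answers immediately.
    return {"txn_count_30d": 12}


def slow_lookup():
    time.sleep(1.0)  # simulates a source that misses the budget
    return {"device_risk": 0.9}


available = gather_within_budget(
    {"account_history": fast_lookup, "device": slow_lookup},
    budget_seconds=0.2,
)
```

Here `available` holds only the data that arrived inside the budget, so the decision is made on a partial view of the customer instead of keeping them waiting.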
  11. Two data streams need to share state: the data stream from online interactions and all other customer interactions, and the data stream that we receive from the branch. We need to calculate features, apply the ML model, and respond in real time.
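The idea of two streams sharing per-customer state can be sketched as follows. This is plain Python standing in for a keyed two-input stream operator, not the Flink API; the class names, the feature definitions, and `toy_model` are assumptions made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class CustomerState:
    """Per-customer state shared by both streams (hypothetical layout)."""
    recent_amounts: deque = field(default_factory=lambda: deque(maxlen=50))


class SharedStateScorer:
    """Plain-Python stand-in for a keyed, two-input stream operator."""

    def __init__(self, model):
        self.model = model                       # callable: features -> score
        self.state = defaultdict(CustomerState)  # keyed by customer id

    def on_account_event(self, customer_id, amount):
        # Stream 1: online and other customer interactions only update state.
        self.state[customer_id].recent_amounts.append(amount)

    def on_teller_event(self, customer_id, amount):
        # Stream 2: a branch transaction reads the shared state,
        # computes features, and is scored immediately.
        history = self.state[customer_id].recent_amounts
        mean = sum(history) / len(history) if history else 0.0
        features = {
            "history_count": len(history),
            "history_mean": mean,
            "amount_over_mean": amount / mean if mean else 0.0,
        }
        return self.model(features)


def toy_model(features):
    # Hypothetical rule standing in for a trained ML model.
    return 1.0 if features["amount_over_mean"] > 5 else 0.0


scorer = SharedStateScorer(toy_model)
for amt in (20.0, 30.0, 40.0):
    scorer.on_account_event("cust-1", amt)
score_small = scorer.on_teller_event("cust-1", 50.0)
score_large = scorer.on_teller_event("cust-1", 500.0)
```

In this sketch a teller withdrawal far above the customer's usual activity is flagged, while one close to it is not; the point is only that the branch stream reads state that the other stream wrote.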
  12. Developed in Python; evaluating Golang. Developed an internal tool for CI/CD and deployment.
  13. Teller transactions have a real-time SLA, and Connected Streams is the culprit. Break up one Flink app into smaller Flink Queryable State apps: Flink apps as functions. Disparate data streams cause back pressure. In our case, we have all the account-level activity for a given customer from one source, and on the other we have the data from the teller machine. Not all transactions are equal, due to their source; however, in an ML world we still want to examine every transaction. This results in back pressure and uneven transaction flow.
  14. One Alvin for each data source. A scurry of Alvins builds out our feature repository. Theodore builds his own features, adds on the features from Alvin, and then passes them down. Why did we break Simon out? So that we can replace it with anything, such as Seldon.
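One way to read the Alvin / Theodore / Simon split is sketched below in plain Python. The roles are an interpretation of the notes above (Alvin keeps features per data source, Theodore merges features and passes them on, Simon scores), and every class, method, and feature name is hypothetical. Simon is modeled as a plain callable to show why it can be swapped for another scorer.

```python
from typing import Callable, Dict

Features = Dict[str, float]


class Alvin:
    """One per data source: keeps per-customer features for that source."""

    def __init__(self, source: str):
        self.source = source
        self.features: Dict[str, Features] = {}

    def update(self, customer_id: str, amount: float) -> None:
        f = self.features.setdefault(customer_id, {"count": 0.0, "total": 0.0})
        f["count"] += 1
        f["total"] += amount

    def query(self, customer_id: str) -> Features:
        # Prefix with the source name so different Alvins do not collide.
        stored = self.features.get(customer_id, {})
        return {f"{self.source}_{k}": v for k, v in stored.items()}


class Theodore:
    """Builds its own features, adds each Alvin's, and passes them on."""

    def __init__(self, alvins, simon: Callable[[Features], float]):
        self.alvins = alvins
        self.simon = simon  # any callable works, so the scorer is replaceable

    def handle(self, customer_id: str, amount: float) -> float:
        features: Features = {"teller_amount": amount}
        for alvin in self.alvins:
            features.update(alvin.query(customer_id))
        return self.simon(features)


def simon_rule(features: Features) -> float:
    # Hypothetical stand-in for the model-serving step.
    seen = features.get("online_count", 0.0)
    return 1.0 if seen == 0 and features["teller_amount"] > 1000 else 0.0


online = Alvin("online")
online.update("cust-1", 25.0)
theodore = Theodore([online], simon_rule)
known_customer = theodore.handle("cust-1", 5000.0)
unknown_customer = theodore.handle("cust-2", 5000.0)
```

Because `Theodore` only depends on a callable, replacing `simon_rule` with a client for an external model server would not change the feature-building code.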
  15. https://issues.apache.org/jira/browse/FLINK-7783 https://issues.apache.org/jira/browse/FLINK-8487