SlideShare a Scribd company logo
1 of 31
FINDING BAD
ACORNS
ANDREW GAO
&
JEFF SHARPE
FLINK FORWARD 2018
ANDREW GAO JEFF SHARPE
Developing a Fraud
Defense Platform
Fraud Defense at the
Teller Using Flink
Our journey to build a Fraud Decisioning Platform and use
Flink to build out the use cases
DEVELOPING A FRAUD DEFENSE PLATFORM
OUR USERS
Fraud
Operator
Customer
Data
Scientist
Data
Analyst
Engineer
Product
Owner
OUR USERS
Fraud
Operator
Customer
Data
Scientist
Data
Analyst
Engineer
Product
Owner
ARCHITECTURE
DATA ACTIONS
MAGIC!
RUNNING ON
RUNNING ON
PROS
• Community support for
Docker/Kube
• Resilient
• Easy to tear down and bring
back
• Maximizing resource efficiency
CONS
• Maintaining your own
Kubernetes solution
• Containing blast radius
• Edge cases when combining #
of technology solutions
Developing on Kubernetes has been challenging but very
rewarding
FRAUD DEFENSE AT THE TELLER
A FLINK MONOLITH
• Problem: Develop a stream processing workflow for
two legacy batch data sources
• First Attempt: Do everything in Flink and take
advantage of Flink Connected Streams
1
2 3
Using Flink operators to build our application workflow
4
PROS
• Cheap
• Not a lot of
Code/Config
• Scalability / Availability
• Deployments are a
breeze
CONS
• Not truly stateless
• Start-up time
AWS Lambda is a good fit for our use case and works well
with our underlying technologies
1
2 3
Using Flink operators to build our application workflow
4
90 Day Storage Window
CUSTOM WINDOWS FOR OPTIMIZATION
AND PORTABILITY
30 Day Virtual View
90 Day Filtered View
CUSTOM WINDOWS FOR OPTIMIZATION
AND PORTABILITY
Most-Recent-Beyond-24-Hours Window
24 Hour Offset Dynamic Window
1
2 3
Using Flink operators to build our application workflow
4
USING JYTHON TO BRIDGE THE GAP TO
DATA SCIENTISTS
Flink
Jython Adapter
.py .py .py .py
Windows
Data
Featur
e
Featur
e
Featur
e
Featur
e
Featur
e
Featur
e
Featur
e
Featur
e
.py .py .py .py
Data
GITFLOW AND JYTHON IMPROVE
TRACEABILITY
Featur
e JAR
v1.0.42
Junit
Tests
Pull
Request
Merge
Build
Develop Denied
Failed
Maven
Import
Junit
Tests
Build
Flink
Job
JAR
Commit
1
2 3
Using Flink operators to build our application workflow
4
FEATURES EXIST TO FEED MODELS
FeatureFeature
Model Model Score
H20 Tensor Flow Seldon (whatever)
BREAKING UP THE MONOLITH
• Problem: Back Pressure leading to Delayed Transactions
• Solution: Break up the monolith Flink App into small Queryable State
Apps
CHIPMUNKS
•Connected Streams
•Flink Keyed State
•Checkpointing/Savepointing
•Queryable State
Features Used
•Flink Versioning (FLINK-7783, FLINK-8487)
•Keyed Source Function
•Kafka Offsets
Issues
We had a lot of fun and success using Flink, but not without a
few hiccups
Developing a Fraud
Defense Platform
Fraud Defense at the
Teller Using Flink
Our journey to build a Fraud Decisioning Platform and use
Flink to build out the use cases
QUESTIONS?

More Related Content

What's hot

End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For ArchitectsKevin Brockhoff
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Chandresh Pancholi
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Flink Forward
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backendSebastian Poxhofer
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...Flink Forward
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityElasticsearch
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumChengKuan Gan
 
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023Steve Pember
 

What's hot (20)

End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For Architects
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backend
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
 
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023
 

Similar to Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Acorns"

DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel Aviv
DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel AvivDevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel Aviv
DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel AvivAmazon Web Services
 
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps ZNetLive
 
GE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe ConversionGE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe Conversionguatham
 
Orchestrate Your End-to-end Mainframe Application Release Pipeline
Orchestrate Your End-to-end Mainframe Application Release PipelineOrchestrate Your End-to-end Mainframe Application Release Pipeline
Orchestrate Your End-to-end Mainframe Application Release PipelineDevOps.com
 
From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]Dynatrace
 
Accelerate User Driven Innovation [Webinar]
Accelerate User Driven Innovation [Webinar]Accelerate User Driven Innovation [Webinar]
Accelerate User Driven Innovation [Webinar]Dynatrace
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsWeaveworks
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsSonja Schweigert
 
DevOps adoption in the enterprise
DevOps adoption in the enterpriseDevOps adoption in the enterprise
DevOps adoption in the enterpriseSanjeev Sharma
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWeaveworks
 
Top 5 benefits of docker
Top 5 benefits of dockerTop 5 benefits of docker
Top 5 benefits of dockerJohn Zaccone
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdWeaveworks
 
Transform Digital Business with DevOps
Transform Digital Business with DevOpsTransform Digital Business with DevOps
Transform Digital Business with DevOpsDaniel Oh
 
Continuous testing for Agile and DevOps teams
Continuous testing for Agile and DevOps teamsContinuous testing for Agile and DevOps teams
Continuous testing for Agile and DevOps teamsLaurent PY
 
Securing Red Hat OpenShift Containerized Applications At Enterprise Scale
Securing Red Hat OpenShift Containerized Applications At Enterprise ScaleSecuring Red Hat OpenShift Containerized Applications At Enterprise Scale
Securing Red Hat OpenShift Containerized Applications At Enterprise ScaleDevOps.com
 
Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015WaveMaker, Inc.
 
Continuous Deployment To The Cloud
Continuous Deployment To The CloudContinuous Deployment To The Cloud
Continuous Deployment To The CloudMarcin Grzejszczak
 
IBM JavaOne Community Keynote 2017
IBM JavaOne Community Keynote 2017IBM JavaOne Community Keynote 2017
IBM JavaOne Community Keynote 2017John Duimovich
 
Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10MagaliDavidCruz
 

Similar to Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Acorns" (20)

DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel Aviv
DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel AvivDevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel Aviv
DevOps on Steroids Featuring Red Hat & Alantiss - Pop-up Loft Tel Aviv
 
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps
Webinar by ZNetLive & Plesk- Winning the Game for WebOps and DevOps
 
GE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe ConversionGE Capital Legacy Modernization and Mainframe Conversion
GE Capital Legacy Modernization and Mainframe Conversion
 
Orchestrate Your End-to-end Mainframe Application Release Pipeline
Orchestrate Your End-to-end Mainframe Application Release PipelineOrchestrate Your End-to-end Mainframe Application Release Pipeline
Orchestrate Your End-to-end Mainframe Application Release Pipeline
 
From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]
 
Accelerate User Driven Innovation [Webinar]
Accelerate User Driven Innovation [Webinar]Accelerate User Driven Innovation [Webinar]
Accelerate User Driven Innovation [Webinar]
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
 
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOpsHybrid and Multi-Cloud Strategies for Kubernetes with GitOps
Hybrid and Multi-Cloud Strategies for Kubernetes with GitOps
 
DevOps adoption in the enterprise
DevOps adoption in the enterpriseDevOps adoption in the enterprise
DevOps adoption in the enterprise
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
 
Open by Design
Open by DesignOpen by Design
Open by Design
 
Top 5 benefits of docker
Top 5 benefits of dockerTop 5 benefits of docker
Top 5 benefits of docker
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and Linkerd
 
Transform Digital Business with DevOps
Transform Digital Business with DevOpsTransform Digital Business with DevOps
Transform Digital Business with DevOps
 
Continuous testing for Agile and DevOps teams
Continuous testing for Agile and DevOps teamsContinuous testing for Agile and DevOps teams
Continuous testing for Agile and DevOps teams
 
Securing Red Hat OpenShift Containerized Applications At Enterprise Scale
Securing Red Hat OpenShift Containerized Applications At Enterprise ScaleSecuring Red Hat OpenShift Containerized Applications At Enterprise Scale
Securing Red Hat OpenShift Containerized Applications At Enterprise Scale
 
Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015Docker & aPaaS: Enterprise Innovation and Trends for 2015
Docker & aPaaS: Enterprise Innovation and Trends for 2015
 
Continuous Deployment To The Cloud
Continuous Deployment To The CloudContinuous Deployment To The Cloud
Continuous Deployment To The Cloud
 
IBM JavaOne Community Keynote 2017
IBM JavaOne Community Keynote 2017IBM JavaOne Community Keynote 2017
IBM JavaOne Community Keynote 2017
 
Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10Meetup Openshift Geneva 03/10
Meetup Openshift Geneva 03/10
 

More from Flink Forward

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 

More from Flink Forward (20)

Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Acorns"

Editor's Notes

  1. Jeff Intro Andrew Intro We are part of the Forest teams(very high level intro) Kubernetes-based fraud decisioning platform that you can deploy multiple fraud use cases on With the goal of being able to rapidly spin up fraud apps Running in Production since September 2017
  2. Our talk today: Talk briefly about our journey building out this Forest platform using Kubernetes as well as talk about how we used Flink with Kubernetes at a high level Then talk about a specific use case we have on the platform and do a deep dive on what’s inside our Flink app
  3. Customers First If one day you take a look at your bank account and its empty However if your account was locked for no reason you would be upset This sense of balance between catching stopping fraud and providing a great customer experience is a common trend that we have to deal with If we wanted to stop fraud completely we could just stop letting people take their money On a similar note, we have a limited number of fraud operators Do not have the manpower to call every single person up and ask them Primary directive of the platform is to empower Data Scientists/ Data Analysts by building the tools on the platform to help create the models needed to make decisions This includes having access to all the data in a fast and easy-to-understand format Seeing how their models are performing, and whether the features are being calculated as expected When they need to refit the model they need to be able to do the data transformations quickly so we can turn a refreshed model around Lastly as we are developing a fraud platform, we need to keep in mind the engineers/developers that will be developing the fraud app it should be something that engineers enjoy to develop on When you have a feature/model/action repository its very easy to develop turn around fraud apps To help us balance these different needs we have our product owners to help bridge the gap
  4. Customers First If one day you take a look at your bank account and its empty However if your account was locked for no reason you would be upset This sense of balance between catching stopping fraud and providing a great customer experience is a common trend that we have to deal with If we wanted to stop fraud completely we could just stop letting people take their money On a similar note, we have a limited number of fraud operators Do not have the manpower to call every single person up and ask them Primary directive of the platform is to empower Data Scientists/ Data Analysts by building the tools on the platform to help create the models needed to make decisions This includes having access to all the data in a fast and easy-to-understand format Seeing how their models are performing, and whether the features are being calculated as expected When they need to refit the model they need to be able to do the data transformations quickly so we can turn a refreshed model around Lastly as we are developing a fraud platform, we need to keep in mind the engineers/developers that will be developing the fraud app it should be something that engineers enjoy to develop on When you have a feature/model/action repository its very easy to develop turn around fraud apps To help us balance these different needs we have our product owners to help bridge the gap
  5. 14 EC2s 6 m4.10xlarge for general minions 5 m4.2xlarge for kafka nodes 3 m4.large for masters Ansible to provision 200+ pods Flink apps in Java/Scala/Kotlin Microservices in Golang
  6. Holy smokes that’s a lot Zookeeper/Kafka/Flink/Nifi Kappa Architecture Kafka is our primary messaging bus throughout the platform Nifi is one of the tools we use to grab data from different sources in the company Flink does the calculations and applies needed transformations Minio/Istio to handle http communications throughout the platform EFK = ElasticSearch / FluentD / Kibana Docker logs Managed AWS service Influx / Prometheus / Grafana Metrics reporting and Dashboards Platform health Fraud health Drill / zeppelin / s3 for data analysts to view transactions Why are we switching from influx to prometheus
  7. Holy smokes that’s a lot Zookeeper/Kafka/Flink/Nifi Kafka is our primary messaging bus throughout the platform Nifi is one of the tools we use to grab data from different sources in the company Flink does the calculations and applies needed transformations Minio/Istio to handle http communications throughout the platform EFK = ElasticSearch / FluentD / Kibana Docker logs Managed AWS service Influx / Prometheus / Grafana Metrics reporting and Dashboards Platform health Fraud health Drill / zeppelin / s3 for data analysts to view transactions Why are we switching from influx to prometheus
  8. Kubernetes has been a challenge If a task manager goes down, it will auto-heal If your configurations are set up correctly you can just delete pods and they’ll come back Unless your configurations are completely fleshed out, the blast radius on failure can be rippling Situation where docker logs could not make it out to kubernetes logs because the docker machines were dying Developed internal tool for ci/cd and deployment
  9. Use cases tell us the resources they need and we provision them a flink cluster 1 Job Manager per cluster 5 Task Managers per cluster RocksDB backend Checkpoint/Savepoint persist on S3 Job Deployment Options
  10. Considerations People obviously don’t want to wait too long But we want to respond with the most data we have available on the customer
  11. Two data streams need to share state Data stream from online interactions / all other customer interactions Data stream that we receive from the branch Need to calculate Features Need to apply ML model Need to respond in real-time
  12. Developed in python, evaluating golang Developed internal tool for ci/cd and deployment
  13. Teller transactions have a real-time SLA Connected Streams is the culprit Break Up One Flink App into Smaller Flink Queryable State Apps Flink Apps as Functions Disparate Data Streams: Back Pressure In our case: we have all the account level activity for a given customer from one source and on the other we have the data from the teller machine Not all transactions are equal due to their source. However in a ML world we still want to examine every transaction Results in back pressure and uneven transaction flow
  14. Alvin for each data source Scurry of Alvins build out our feature repository Theodore builds his own features, adds on features for Alvin and the passes it down Why did we break Simon out? We can replace it with anything such as Seldon
  15. https://issues.apache.org/jira/browse/FLINK-7783 https://issues.apache.org/jira/browse/FLINK-8487