
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Acorns"

Within fintech catching fraudsters is one of the primary opportunities for us to use streaming applications to apply ML models in real-time. This talk will be a review of our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink and AWS Lambda. We will share our learnings and experiences to common problems such as custom windowing, breaking down a monolith app to small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.

Transcript

  1. FINDING BAD ACORNS / ANDREW GAO & JEFF SHARPE / FLINK FORWARD 2018
  2. ANDREW GAO / JEFF SHARPE
  3. Developing a Fraud Defense Platform / Fraud Defense at the Teller Using Flink: our journey to build a fraud decisioning platform and build out the use cases with Flink
  4. DEVELOPING A FRAUD DEFENSE PLATFORM
  5. OUR USERS: Fraud Operator / Customer / Data Scientist / Data Analyst / Engineer / Product Owner
  6. OUR USERS: Fraud Operator / Customer / Data Scientist / Data Analyst / Engineer / Product Owner
  7. ARCHITECTURE: DATA, ACTIONS, MAGIC!
  8. RUNNING ON
  9. RUNNING ON
  10. PROS: community support for Docker/Kubernetes; resilient; easy to tear down and bring back; maximizes resource efficiency. CONS: maintaining your own Kubernetes solution; containing the blast radius; edge cases when combining many technologies. Developing on Kubernetes has been challenging but very rewarding.
  11. FRAUD DEFENSE AT THE TELLER
  12. A FLINK MONOLITH. Problem: develop a stream-processing workflow for two legacy batch data sources. First attempt: do everything in Flink and take advantage of Flink connected streams.
  13. Using Flink operators to build our application workflow (diagram, steps 1-4)
  14. PROS: cheap; not a lot of code/config; scalability and availability; deployments are a breeze. CONS: not truly stateless; start-up time. AWS Lambda is a good fit for our use case and works well with our underlying technologies. (A Lambda invocation sketch follows the transcript.)
  15. Using Flink operators to build our application workflow (diagram, steps 1-4)
  16. CUSTOM WINDOWS FOR OPTIMIZATION AND PORTABILITY: a 90-day storage window exposing a 30-day virtual view and a 90-day filtered view. (A window sketch follows the transcript.)
  17. CUSTOM WINDOWS FOR OPTIMIZATION AND PORTABILITY: a most-recent-beyond-24-hours window and a 24-hour offset dynamic window
  18. Using Flink operators to build our application workflow (diagram, steps 1-4)
  19. USING JYTHON TO BRIDGE THE GAP TO DATA SCIENTISTS: a Flink Jython adapter loads .py files that turn windowed data into features
  20. GITFLOW AND JYTHON IMPROVE TRACEABILITY: commit, JUnit tests, pull request (denied/failed changes loop back), merge to develop, build a versioned feature JAR (e.g. v1.0.42), Maven import, JUnit tests, build the Flink job JAR
  21. Using Flink operators to build our application workflow (diagram, steps 1-4)
  22. FEATURES EXIST TO FEED MODELS: features feed models, models emit a score; the scoring backend can be H2O, TensorFlow, Seldon (whatever)
  23. BREAKING UP THE MONOLITH. Problem: back pressure leading to delayed transactions. Solution: break up the monolith Flink app into small queryable state apps.
  24. CHIPMUNKS
  25. Features used: connected streams, Flink keyed state, checkpointing/savepointing, queryable state. Issues: Flink versioning (FLINK-7783, FLINK-8487), keyed source function, Kafka offsets. We had a lot of fun and success using Flink, but not without a few hiccups.
  26. Developing a Fraud Defense Platform / Fraud Defense at the Teller Using Flink: our journey to build a fraud decisioning platform and build out the use cases with Flink. QUESTIONS?
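
Slides 16 and 17 name the custom windows but show no code. As a minimal illustrative sketch of the first idea in Java, not the speakers' implementation: one keyed 90-day store from which a 30-day virtual view is computed on the fly. The Tuple2 input shape (accountId, amount), the spend-sum feature, and the class name are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Keeps 90 days of (timestamp -> amount) per key, but computes the
// emitted feature over a 30-day "virtual view" of that same store.
// Assumes event-time timestamps are attached to the stream.
public class NinetyDayStoreThirtyDayView
        extends KeyedProcessFunction<String, Tuple2<String, Double>, Double> {

    private static final long DAY_MS = 24L * 60 * 60 * 1000;
    private transient MapState<Long, Double> history;

    @Override
    public void open(Configuration conf) throws Exception {
        history = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("history", Long.class, Double.class));
    }

    @Override
    public void processElement(Tuple2<String, Double> txn, Context ctx,
                               Collector<Double> out) throws Exception {
        long now = ctx.timestamp();
        history.put(now, txn.f1);

        double sum30 = 0.0;
        List<Long> expired = new ArrayList<>();
        for (Long ts : history.keys()) {
            if (now - ts > 90 * DAY_MS) {
                expired.add(ts);              // fell out of the 90-day store
            } else if (now - ts <= 30 * DAY_MS) {
                sum30 += history.get(ts);     // the 30-day virtual view
            }
        }
        for (Long ts : expired) {
            history.remove(ts);
        }
        out.collect(sum30);                   // e.g. a 30-day spend feature
    }
}
```

Because the 30-day view is derived rather than stored separately, other views (the 90-day filtered view, or slide 17's most-recent-beyond-24-hours lookup) can presumably share the same state, which would explain the "optimization and portability" framing.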
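
Slide 14 weighs AWS Lambda without showing the integration. One plausible wiring, a sketch rather than the talk's code, is Flink's async I/O calling Lambda so that invocation latency does not stall the stream; the function name fraud-model-scorer and the JSON payload shape are invented here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.InvokeRequest;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Scores a JSON feature vector by invoking a (hypothetical) Lambda
// function through Flink's async I/O, so a slow invocation or cold
// start holds up only its own in-flight request, not the pipeline.
public class LambdaScorer extends RichAsyncFunction<String, String> {

    private transient AWSLambda lambda;

    @Override
    public void open(Configuration conf) {
        lambda = AWSLambdaClientBuilder.defaultClient();
    }

    @Override
    public void asyncInvoke(String featuresJson, ResultFuture<String> result) {
        InvokeRequest req = new InvokeRequest()
                .withFunctionName("fraud-model-scorer")   // invented name
                .withPayload(featuresJson);
        // A dedicated executor would be better than the common pool here.
        CompletableFuture
                .supplyAsync(() -> lambda.invoke(req))
                .thenAccept(res -> result.complete(Collections.singleton(
                        StandardCharsets.UTF_8.decode(res.getPayload())
                                              .toString())));
    }
}

// Usage sketch, capped at 100 in-flight calls with a 2-second timeout:
// AsyncDataStream.unorderedWait(featureJson, new LambdaScorer(),
//                               2, TimeUnit.SECONDS, 100);
```

Capping in-flight requests (the last argument) is also one lever against the back-pressure problems slide 23 describes.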

Editor's Notes

  • Jeff intro
    Andrew intro
    We are part of the Forest teams (very high-level intro)
    A Kubernetes-based fraud decisioning platform on which you can deploy multiple fraud use cases
    With the goal of being able to rapidly spin up fraud apps
    Running in production since September 2017
  • Our talk today:
    Talk briefly about our journey building out this Forest platform on Kubernetes, and about how we used Flink with Kubernetes at a high level
    Then talk about a specific use case we have on the platform and do a deep dive on what’s inside our Flink app
  • Customers First
    If one day you take a look at your bank account and it's empty, you would expect us to have caught the fraud
    However, if your account was locked for no reason you would be upset
    This balance between catching and stopping fraud and providing a great customer experience is a tension we constantly have to deal with
    If we wanted to stop fraud completely, we could just stop letting people take their money
    On a similar note, we have a limited number of fraud operators
    We do not have the manpower to call every single person up and ask them
    The primary directive of the platform is to empower data scientists and data analysts by building the tools on the platform to help create the models needed to make decisions
    This includes having access to all the data in a fast and easy-to-understand format
    Seeing how their models are performing, and whether the features are being calculated as expected
    When they need to refit the model, they need to be able to do the data transformations quickly so we can turn a refreshed model around
    Lastly, as we are developing a fraud platform, we need to keep in mind the engineers/developers that will be developing the fraud apps
    It should be something that engineers enjoy developing on
    When you have a feature/model/action repository, it's very easy to develop and turn around fraud apps
    To help us balance these different needs, we have our product owners to help bridge the gap
  • 14 EC2 instances
    6 m4.10xlarge for general minions
    5 m4.2xlarge for kafka nodes
    3 m4.large for masters
    Ansible to provision
    200+ pods
    Flink apps in Java/Scala/Kotlin
    Microservices in Golang

  • Holy smokes that’s a lot
    Zookeeper/Kafka/Flink/Nifi
    Kappa Architecture
    Kafka is our primary messaging bus throughout the platform
    NiFi is one of the tools we use to grab data from different sources in the company
    Flink does the calculations and applies the needed transformations
    Minio/Istio to handle HTTP communications throughout the platform
    EFK = Elasticsearch / Fluentd / Kibana
    Docker logs
    Managed AWS service
    Influx / Prometheus / Grafana
    Metrics reporting and Dashboards
    Platform health
    Fraud health
    Drill / Zeppelin / S3 for data analysts to view transactions

    Why are we switching from Influx to Prometheus?
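
Since the note above names Kafka as the primary messaging bus in a Kappa-style setup, here is a minimal sketch of the usual Flink-side wiring; the broker address, group id, and topic name are assumptions, and FlinkKafkaConsumer011 is the Kafka 0.11 connector current around the time of this talk.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

// Minimal Kafka -> Flink wiring of the kind a Kappa-style platform
// builds on: every app reads its input as a stream off the bus.
public class KafkaIngest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // assumption
        props.setProperty("group.id", "fraud-decisioning");    // assumption

        DataStream<String> txns = env.addSource(
                new FlinkKafkaConsumer011<>("transactions",    // assumed topic
                        new SimpleStringSchema(), props));

        txns.print();  // stand-in for the real transformations
        env.execute("kafka-ingest");
    }
}
```
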
  • Kubernetes has been a challenge
    If a task manager goes down, it will auto-heal
    If your configurations are set up correctly you can just delete pods and they’ll come back
    Unless your configurations are completely fleshed out, the blast radius on failure can be rippling
    We hit a situation where Docker logs could not make it out to the Kubernetes logs because the Docker machines were dying
    We developed an internal tool for CI/CD and deployment
  • Use cases tell us the resources they need, and we provision them a Flink cluster
    1 Job Manager per cluster
    5 Task Managers per cluster
    RocksDB backend
    Checkpoint/Savepoint persist on S3
    Job Deployment Options
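
The backend and checkpoint settings listed above can be expressed per job as well as in flink-conf.yaml; a sketch with an assumed S3 bucket and checkpoint interval:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Per-job equivalent of the cluster settings above: RocksDB keyed
// state with incremental checkpoints persisted to S3.
public class CheckpointConfig {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.setStateBackend(new RocksDBStateBackend(
                "s3://fraud-flink/checkpoints", true));  // assumed bucket
        env.enableCheckpointing(60_000);  // checkpoint every minute

        // ... build and execute the job here ...
    }
}
```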

  • Considerations
    People obviously don’t want to wait too long
    But we want to respond with the most data we have available on the customer
  • Two data streams need to share state
    Data stream from online interactions / all other customer interactions
    Data stream that we receive from the branch

    Need to calculate Features
    Need to apply ML model
    Need to respond in real-time
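
The shape described above, two keyed streams sharing per-customer state, is what Flink's connected streams provide (slides 12 and 13). A hedged sketch with invented tuple types and trivial decision logic, not the speakers' implementation:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

// Both inputs are hypothetical (customerId, payload) tuples: stream 1
// carries general account activity, stream 2 carries teller events.
// Keyed state written by one stream is visible when the other fires,
// which is how the two legacy sources share a customer profile.
public class TellerDecisioner extends
        CoProcessFunction<Tuple2<String, String>, Tuple2<String, String>, String> {

    private transient ValueState<String> profile;

    @Override
    public void open(Configuration conf) {
        profile = getRuntimeContext().getState(
                new ValueStateDescriptor<>("profile", String.class));
    }

    // Account activity updates the shared per-customer profile.
    @Override
    public void processElement1(Tuple2<String, String> activity, Context ctx,
                                Collector<String> out) throws Exception {
        profile.update(activity.f1);
    }

    // A teller transaction is decisioned against whatever profile
    // has accumulated so far.
    @Override
    public void processElement2(Tuple2<String, String> teller, Context ctx,
                                Collector<String> out) throws Exception {
        String p = profile.value();
        out.collect(p == null ? "NO_PROFILE:" + teller.f0
                              : "SCORED:" + teller.f0);
    }
}

// Wiring sketch: activity.keyBy(t -> t.f0)
//        .connect(teller.keyBy(t -> t.f0))
//        .process(new TellerDecisioner());
```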

  • Developed in Python, evaluating Golang
    We developed an internal tool for CI/CD and deployment
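
To make the "Flink Jython adapter" from slide 19 concrete, here is a minimal sketch of evaluating a Python feature function inside a Flink operator via Jython's embedding API; the inline script and function name are illustrative stand-ins for the .py files shipped with the job.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.python.core.PyFloat;
import org.python.core.PyObject;
import org.python.util.PythonInterpreter;

// Evaluates a data scientist's Python feature function inside the
// Flink JVM via Jython, so .py files can ship alongside the job JAR.
public class JythonFeature extends RichMapFunction<Double, Double> {

    private transient PyObject feature;

    @Override
    public void open(Configuration conf) {
        PythonInterpreter interp = new PythonInterpreter();
        // In practice this would be loaded from a .py resource in the JAR.
        interp.exec("def velocity_feature(amount):\n"
                  + "    return amount * 2.0\n");   // illustrative feature
        feature = interp.get("velocity_feature");
    }

    @Override
    public Double map(Double amount) {
        PyObject result = feature.__call__(new PyFloat(amount));
        return result.asDouble();
    }
}
```
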
  • Teller transactions have a real-time SLA
    Connected Streams is the culprit

    Break Up One Flink App into Smaller Flink Queryable State Apps
    Flink Apps as Functions


    Disparate Data Streams: Back Pressure
    In our case, we have all the account-level activity for a given customer from one source, and the data from the teller machine on the other
    Not all transactions are equal, due to their source. However, in an ML world we still want to examine every transaction
    This results in back pressure and uneven transaction flow
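
"Small Flink queryable state apps" and "Flink apps as functions" rest on Flink's queryable state API. A sketch of the client side, with the state name, host, and key invented for illustration; inside the job, the matching descriptor would be registered with desc.setQueryable("risk-score").

```java
import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

// Client side of a "Flink app as a function": another service asks a
// small Flink app for the latest per-customer value it has computed.
public class RiskScoreLookup {
    public static void main(String[] args) throws Exception {
        // The same descriptor the job registered as queryable state.
        ValueStateDescriptor<Double> desc =
                new ValueStateDescriptor<>("riskScore", Double.class);

        QueryableStateClient client =
                new QueryableStateClient("taskmanager-host", 9069); // assumed

        CompletableFuture<ValueState<Double>> future = client.getKvState(
                JobID.fromHexString(args[0]),  // job id of the state app
                "risk-score",                  // queryable state name
                "customer-42",                 // assumed key
                BasicTypeInfo.STRING_TYPE_INFO,
                desc);

        System.out.println("score = " + future.get().value());
    }
}
```
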
  • An Alvin for each data source
    A scurry of Alvins builds out our feature repository
    Theodore builds his own features, adds on features from Alvin, and then passes it down
    Why did we break Simon out?
    We can replace it with anything such as Seldon
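
The point that "Simon" can be swapped for Seldon (or anything else) implies the scoring stage sits behind a narrow seam. A hypothetical illustration of that seam, not the team's actual contract:

```java
import java.util.Map;

// A narrow scoring seam: the feature apps (the "Alvins" and
// "Theodore") produce a feature map, and any backend (a Lambda,
// Seldon, H2O, TensorFlow Serving) can implement the scorer.
public interface ModelScorer {
    double score(Map<String, Double> features);
}
```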

  • https://issues.apache.org/jira/browse/FLINK-7783
    https://issues.apache.org/jira/browse/FLINK-8487
