Leveraging Spark ML for
Real-Time Credit Card Approvals
Case study from a large financial institution
Anand Venugopal
Saurabh Dutta
Impetus – StreamAnalytix
#Ent6SAIS
Agenda
• Use case background
• Existing system challenges and new goals
• Solution details and lessons learnt
• Q&A
Background - Customer
• 50M+ credit cards
• 50+ countries
• ~4M per year
Background – Use Case
• Acquire legitimate, responsible customers
• Decision: Approve? Credit limit? APR?
• Sub-second response time to make a decision
Problem Statement
• Prospective customer submits application
• Determining card eligibility: logic execution
• Decision + communication
Core logic: Risk scoring
• Estimate debt repayment risk
• To limit credit risk of the lender
• Ensure individual’s financial well-being
Risk scoring factors
• History
• Credit usage
• Loans
• Other Credit cards
• Job type
• Income band
• Debt Ratio
• Credit Scores
Decision flow
• Receive request
• Call CB REST API
• Get Acxiom data
• Get SOW data
• CB derivations
• SOW derivations
• Decision
• Line
• Price
• Respond
Models
Risk Score: 0-1
Multiple model types
• Approve/ Decline
– Segment
– Geography
• Line
– Credit Limit
• Price
– APR
• Decision Tree
– Numerous instances
• Regression
• K-means
Decision tree – Approve ? Y/N
[Figure: decision tree with nodes 1 to 9. Splits shown: Salary >= 50,000 vs. Salary < 50,000; Other Loans = Y vs. N (under both salary branches); Debt Ratio < 0.7 vs. > 0.7]
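A tree like the one above reads naturally as nested conditions. The sketch below is only an illustration: it uses the split variables and thresholds named on the slide, but the nesting order under each branch and every leaf outcome are assumptions, since the slide does not state them. The function name `approve` is hypothetical.

```python
def approve(salary, other_loans, debt_ratio):
    """Toy approve/decline tree using the split variables from the slide.

    All leaf outcomes are hypothetical; the slide does not show them.
    """
    if salary >= 50_000:
        if not other_loans:
            return True            # assumed leaf: higher salary, no other loans
        return debt_ratio < 0.7    # assumed leaf: other loans accepted at low debt ratio
    # Salary < 50,000 branch
    if other_loans:
        return False               # assumed leaf: lower salary with other loans
    return debt_ratio < 0.7        # assumed leaf


print(approve(salary=60_000, other_loans=True, debt_ratio=0.4))
```

In the talk's setup the tree is learned from training data rather than hand-written; the hand-coded version only shows how a record travels from the root to a leaf.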
Regression: Credit line ($)
[Figure: credit limit (y-axis, marked at $500 and $1,500) plotted against risk model score (x-axis)]
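A minimal sketch of turning a risk score into a credit line, assuming a straight line between the two dollar amounts on the chart and assuming that a higher score means higher risk and therefore a lower limit. The slide gives no fitted coefficients, so `credit_line` and its default endpoints are illustrative only.

```python
def credit_line(risk_score, low=500.0, high=1500.0):
    """Map a risk score in [0, 1] to a credit limit in dollars.

    Assumption: score 0 gets the highest limit, score 1 the lowest.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk score must be in [0, 1]")
    return high - (high - low) * risk_score


for score in (0.0, 0.5, 1.0):
    print(score, credit_line(score))
```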
Clustering model: Price
[Figure: clusters labeled 22% APR and 18% APR]
Existing system
• Built using traditional technologies
• Microsoft .NET stack
– C#
– MS SQL Server
Top challenges with existing system
• Everything on single box: not scalable, not flexible
• Model training on limited data: limits accuracy
• Data scientists work in isolation: siloed tools
• Model management: manual and cumbersome
Primary goals for the new system
• Ease of use for stakeholders (self-service)
• Scale: Build models on huge datasets
• Fast decision response for the end-customer
• Unified, collaborative platform
• Data Lineage / Audit capability
Proposed tools
• Spark Streaming
• Spark ML
• Kafka
• HDFS
• HBase
• Visual Spark Platform - StreamAnalytix
Spark Streaming
• Write streaming jobs
• Extension of core Spark API
– Scalable
– High throughput
– Fault tolerance
• Receives input and divides into batches
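The last bullet, dividing input into batches, can be illustrated without Spark. The sketch below is plain Python, not the Spark Streaming API; the function name `micro_batches`, the event tuples, and the 2-second interval are illustrative assumptions.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp_seconds, payload) events into fixed-width time batches."""
    batches = defaultdict(list)
    for ts, payload in events:
        # Integer division assigns each event to the interval covering its timestamp.
        batches[int(ts // batch_interval)].append(payload)
    # Return the non-empty batches in time order.
    return [batches[key] for key in sorted(batches)]


events = [(0.2, "app-1"), (0.9, "app-2"), (2.1, "app-3"), (5.0, "app-4")]
result = micro_batches(events, batch_interval=2)
print(result)
```

Each inner list stands for one batch that the engine would hand to the job as a unit.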
Spark ML
• Spark’s machine learning module
– DataFrame-based API
• Algorithms
– Classification
– Regression
– Clustering
– Collaborative Filtering
• Utilities
– Feature Selection
– Feature Transformations
– Hyper Parameter Tuning
– Model Evaluation
– Linear Algebra, Statistics
Spark based architecture - Training
[Diagram: Training Data Source (HDFS) → Spark (Spark ML) → Model Repository (HDFS)]
Model training pipeline
Read from HDFS
Model training pipeline
Data Quality
• Data validation
– Null checks
– Invalid characters (e.g. ♡⚐♯♣)
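The two checks on this slide can be sketched in a few lines. This is an illustration in plain Python rather than the pipeline's actual operator; the whitelist of allowed characters, the field names, and the function name `validate` are assumptions.

```python
import re

# Assumed whitelist: letters, digits, whitespace and a few punctuation marks.
VALID_TEXT = re.compile(r"[A-Za-z0-9\s.,'\-]*")


def validate(record, required_fields):
    """Return a list of problems found in one record (empty list means clean)."""
    problems = []
    for field in required_fields:
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: null")
        elif isinstance(value, str) and not VALID_TEXT.fullmatch(value):
            problems.append(f"{field}: invalid characters")
    return problems


good = {"name": "Jack", "job_type": "Engineer"}
bad = {"name": "Ev♡a", "job_type": None}
print(validate(good, ["name", "job_type"]))
print(validate(bad, ["name", "job_type"]))
```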
Model training pipeline
Score = 200
Status = Approved
Eliminate Outliers
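One reading of the slide's example is that a record with Score = 200 cannot be valid when risk scores lie between 0 and 1, so it is dropped before training. The sketch below assumes that reading; the range filter, the record layout, and the name `drop_outliers` are illustrative, and the talk's learnings also mention model-based outlier analysis, which is not shown here.

```python
def drop_outliers(records, low=0.0, high=1.0):
    """Keep only records whose score falls inside the expected range."""
    return [r for r in records if low <= r["score"] <= high]


records = [
    {"score": 0.35, "status": "Approved"},
    {"score": 200, "status": "Approved"},   # the outlier from the slide
    {"score": 0.82, "status": "Rejected"},
]
clean = drop_outliers(records)
print(len(clean))
```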
Model training pipeline
Impute missing values
• Mean
• Median
• Most frequent
• Constant
• Model-based imputation

Name  Age
Jack  37
Eva   ?
Dirk  42
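The simple strategies on this slide can be sketched with the standard library, using the ages from the example table. Model-based imputation is left out. The function name `impute` and its parameters are assumptions for illustration, not part of the original pipeline.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean", fill_value=None):
    """Replace None entries using one of the simple strategies from the slide."""
    observed = [v for v in values if v is not None]
    if strategy == "constant":
        replacement = fill_value
    else:
        stat = {"mean": mean, "median": median, "most_frequent": mode}[strategy]
        replacement = stat(observed)
    return [replacement if v is None else v for v in values]


ages = [37, None, 42]   # Jack, Eva, Dirk
print(impute(ages, "mean"))
print(impute(ages, "constant", fill_value=0))
```

With only two observed ages, the mean and the median coincide; on skewed columns the median is usually the more robust choice.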
Model training pipeline
Feature Selection
Transformation
Model Selection
Hyperparameters
Core logic of training model
Model Evaluation
Spark based architecture - Scoring
[Diagram: User Session → Kafka → Spark Streaming. Other components shown: 3rd Party Providers, Bank’s internal repository, YARN, ML Models, HBase, Kafka, HDFS]
Model scoring pipeline
Read from Kafka
Model scoring pipeline
External WS calls
Model scoring pipeline
Internal DB Lookup
Model scoring pipeline
Conditional Filter and model execution
Model scoring pipeline
Decision based on model’s score
Model scoring pipeline
Rejected
Approved
Pending
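Routing a scored application into these three outcomes could look like the sketch below. The two cut-off values and the function name `decide` are hypothetical; the slides do not state the thresholds the bank used, and the sketch assumes that a higher score means higher risk.

```python
def decide(risk_score, approve_below=0.3, reject_above=0.7):
    """Map a risk score in [0, 1] to one of the three outcomes on the slide.

    Both thresholds are illustrative assumptions.
    """
    if risk_score < approve_below:
        return "Approved"
    if risk_score > reject_above:
        return "Rejected"
    return "Pending"   # borderline scores are held for further review


for score in (0.1, 0.5, 0.9):
    print(score, decide(score))
```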
Model scoring pipeline
Line & Price Models
Lineage/ Audit
The journey of a single data record through the scoring pipeline
Deployment
• Transport: Kafka
• Compute: Spark
• Storage: HDFS + Hive
• Exploration: BI Tools
• StreamAnalytix
– 2 nodes with sticky session
– Load balancer
– ZooKeeper
– Tomcat
– MySQL
– RabbitMQ
Project Details
• Q4 2017
• 3 months from start to finish
• 3x faster than originally planned
• Team size: 4
• Apache Spark 2.1
• On-premise Hadoop Cluster with YARN
Learnings
• Consistent data format
• Add timeouts to third-party API calls
• Optimize stragglers
• Avoid excessive logging
• Checkpointing
• Outlier Analysis
– Using models
• Hyperparameter tuning + metric evaluation
• Caching
– useNodeIdCache
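The timeout learning matters most when a sub-second decision depends on an external lookup. Below is a minimal sketch of bounding such a call, assuming a fallback value is acceptable when the provider is slow. `call_with_timeout`, the stand-in lookup functions, and the timing values are all hypothetical; with a real HTTP client, the client's own timeout setting would usually be the first choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn() in a worker thread; return fallback if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return fallback
    finally:
        # Do not wait for the slow call; the worker finishes in the background.
        pool.shutdown(wait=False)


def slow_bureau_lookup():
    time.sleep(0.5)            # stands in for a slow third-party response
    return {"score": 0.4}


def fast_bureau_lookup():
    return {"score": 0.4}


print(call_with_timeout(slow_bureau_lookup, timeout_s=0.05, fallback=None))
print(call_with_timeout(fast_bureau_lookup, timeout_s=1.0, fallback=None))
```

A timed-out application could then be routed to a pending state rather than blocking the whole batch.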
Goals: Recap
• Ease of use for stakeholders (self-service)
• Scale: Build models on huge datasets
• Fast decision response for the end-customer
• Unified, collaborative platform
• Data Lineage / Audit capability
Q&A
Visit Impetus StreamAnalytix booth #209
