For credit card companies, illegitimate card usage is a serious problem, which creates a need to accurately distinguish fraudulent transactions from legitimate ones. Fraud can hit any organization hard, especially those in financial services. The threat can originate internally or externally, and the effects can be devastating: loss of consumer confidence, incarceration for those involved, and even the downfall of a corporation. Regular fraud prevention measures are constantly being put to the test by attempts to beat the system.
Fraud detection is the task of predicting whether a card has been used by the cardholder. One way to recognize fraudulent card usage is to leverage Machine Learning (ML) models. To detect fraudulent transactions more dynamically, one can train ML models on a dataset that includes credit card transaction information as well as card and demographic information about the account owner. That is the goal of this project, which leverages Databricks.
2. Agenda
▪ Background
▪ Introduction
▪ Use Cases
▪ Deep Dive
▪ Machine Learning
▪ Databricks Advantages
▪ MLFlow Advantages
▪ Fraud Detection using ML
▪ Microservices & ML in Databricks
▪ Demo
3. About Us
Badrish Davay
● Tech Evangelist
● Love exploring data trends and predicting outcomes
● Building data pipelines and ML platforms
● Help data scientists use seamless platforms for execution
Neil Allen
● Big data fanatic
● Passionate about solving problems with data-driven solutions
● Exploring cutting edge technology to provide great outcomes
4. Maryam Esmaeilkhanian
● Big Data, ML and DL Enthusiast
● Passionate about implementing
data-driven, analytical ML
pipelines
● Designer and developer of
cutting edge ML solutions for
complex business problems
● Deriving insights from
multidisciplinary data in
the financial domain
Special Thanks
5. Julie Fergerson
CEO of Merchant Risk Council
“What happens in every economic
downturn is that the attacks start
to become more successful, so
over the next two to three years, I
fully expect credit card fraud
numbers to increase in a pretty
meaningful way.”
6. What is Credit Card Fraudulent Activity?
• Fraudulent activities are defined as any transactions not initiated or
otherwise authorized by the cardholder.
• Stolen credit card or credit card information.
• Customers will be charged for items that they did not purchase.
7. Acceleration of in-depth 6-W analysis: what, who, when,
where, why and what-if
▪ Perform in-depth analysis to answer
even the deepest analytical
questions.
▪ Leverage Machine Learning tools to
identify credit card fraud trends.
▪ Drill down into individual trends and
perform “what if analysis.”
8. Bringing together the full suite of model development,
execution, analysis and machine learning tools
Fraud Activity Visualization
● Visualization shown in milliseconds
Machine Learning
● Dynamic features and competing models
● Early detection
Monitoring
● Data quality and model health
What-if analysis
● Input, assumption, and scenario gaming to
answer common what-ifs
Collaboration
● Best in class collaboration abilities
End to end orchestration/execution
● Detect fraud on available features and trends
● Automatic real time detection
9. What is at stake?
Ease of Use
One stop shop for users to
train the models and
orchestrate the execution.
Real-time detection
Detect fraudulent activities in
real time.
Deep Analytics and Modeling
Leverage powerful machine
learning tools to perform
“what if” analysis and analyze
the newest hypotheses.
Security
Adhere to security and
compliance requirements of
company-wide policies and
mandates.
Notification service
Notify card holder of detected
fraudulent transactions.
10. Impact of Fraud & Fraudulent Activities on
Financial Organizations
• The threat can originate from internal or external sources
• Effects of fraud and fraudulent activities on financial services:
• Loss of consumer confidence
• Incarceration for those involved
• Regular fraud prevention measures are constantly being put to the
test by attempts to beat the system.
12. Use Case #1
▪ Cardholder swipes a credit card at a
gas station in New York
▪ The card's magnetic stripe data is stolen
▪ The card is then used at a geographically
distant location within a short amount of
time
▪ The prediction model will use spatial
and temporal features among others
to determine fraudulent activity
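One spatial and temporal feature that fits this use case is the speed the card would have needed to travel between two consecutive transactions. The following is a minimal sketch, not the presenters' implementation: the dictionary layout, the coordinates, and the 900 km/h threshold are all assumptions chosen for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev_txn, curr_txn):
    """Speed the card would need to travel between two transactions."""
    distance = haversine_km(prev_txn["lat"], prev_txn["lon"],
                            curr_txn["lat"], curr_txn["lon"])
    hours = (curr_txn["time"] - prev_txn["time"]).total_seconds() / 3600
    if hours <= 0:
        # Same timestamp at different places is physically impossible.
        return math.inf if distance > 0 else 0.0
    return distance / hours


# Hypothetical transactions: New York, then Los Angeles 30 minutes later.
new_york = {"lat": 40.7128, "lon": -74.0060, "time": datetime(2021, 1, 1, 12, 0)}
los_angeles = {"lat": 34.0522, "lon": -118.2437, "time": datetime(2021, 1, 1, 12, 30)}

MAX_PLAUSIBLE_KMH = 900.0  # assumed threshold, roughly airliner cruise speed
speed = implied_speed_kmh(new_york, los_angeles)
is_suspicious = speed > MAX_PLAUSIBLE_KMH
```

In a trained model the implied speed would more likely be one input feature among many than a hard rule on its own.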
13. Use Case #2
▪ Credit card information is stolen
during online transaction
▪ After a period of time, some
transactions happen with irregular
amounts or during an irregular time
of the day
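The "irregular amount" and "irregular time of day" signals in this use case can be expressed as simple per-cardholder features. The sketch below is illustrative only: the function names, the sample history, and the cut-offs (a z-score above 3, fewer than 5% of past transactions in that hour) are assumptions, not values from the presentation.

```python
from statistics import mean, stdev


def amount_zscore(history, amount):
    """How many standard deviations `amount` lies from the cardholder's past mean."""
    if len(history) < 2:
        return 0.0
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (amount - mean(history)) / sigma


def hour_is_unusual(past_hours, hour, min_share=0.05):
    """True if fewer than `min_share` of past transactions happened in this hour."""
    if not past_hours:
        return False
    share = sum(1 for h in past_hours if h == hour) / len(past_hours)
    return share < min_share


# Hypothetical cardholder history: amounts in dollars and hour of day (0-23).
past_amounts = [20.0, 35.0, 25.0, 30.0, 40.0, 22.0, 28.0]
past_hours = [9, 12, 13, 18, 19, 12, 20]

# New transaction: $950 at 3 a.m.
z = amount_zscore(past_amounts, 950.0)
flagged = z > 3 or hour_is_unusual(past_hours, 3)
```

Both values could be fed to a classifier as features instead of being thresholded by hand.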
14. Use Case #3
▪ Credit card information is stolen
physically during a routine visit to a
grocery store, and it is not noticed by the
cardholder
▪ A very small transaction takes place a
short time later to ensure the purchase
will go through
▪ Later, a very large transaction will take
place
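The "small test charge followed by a large purchase" pattern can be searched for directly in a cardholder's transaction sequence. This is a rough sketch under stated assumptions: the amount limits, the 24-hour window, and the record layout are hypothetical and would need tuning against real data.

```python
from datetime import datetime, timedelta


def probe_then_large(transactions, small_max=5.0, large_min=500.0,
                     window=timedelta(hours=24)):
    """Return (probe, large) pairs: a small charge followed by a large one within `window`."""
    ordered = sorted(transactions, key=lambda t: t["time"])
    pairs = []
    for i, probe in enumerate(ordered):
        if probe["amount"] > small_max:
            continue  # not a small probe charge
        for later in ordered[i + 1:]:
            if later["time"] - probe["time"] > window:
                break  # everything after this is outside the window
            if later["amount"] >= large_min:
                pairs.append((probe, later))
    return pairs


# Hypothetical sequence: grocery visit, small probe charge, then a large purchase.
transactions = [
    {"amount": 54.20, "time": datetime(2021, 3, 1, 10, 0)},
    {"amount": 1.00, "time": datetime(2021, 3, 1, 10, 40)},
    {"amount": 1899.99, "time": datetime(2021, 3, 1, 15, 5)},
]

suspicious_pairs = probe_then_large(transactions)
```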
16. What is Machine Learning (ML)?
Machine Learning uses adaptive
algorithms that learn patterns
automatically from datasets and
later make predictions based on
what they learned from the training
data.
Machine Learning
● Unsupervised Learning
● Supervised Learning
  ● Regression
  ● Classification
    ● Binary classification
    ● Multi-class classification
    ● Multi-label classification
17. Machine Learning Workflow
Step 1: Split data to train and test
● Using Sklearn to split the data into train and test datasets
Step 2: Train the ML model
● Using Sklearn to train a classifier
● SVM, Decision Tree, Random Forest
Step 3: Hyperparameters Tuning
● Using grid search/Hyperopt to find the best hyperparameters
Step 4: Test the model and compare results
● Evaluating the models using Sklearn and comparing the results
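The workflow on this slide (split, train, tune, evaluate) could look roughly like the following with scikit-learn. This is a sketch only: the data is synthetic (generated with `make_classification`), the parameter grids are deliberately tiny, and the F1 score is an assumed choice of metric for imbalanced fraud labels. Grid search is used here; Hyperopt would be an alternative for the tuning step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for transaction data: about 10% of rows are labelled fraud (1).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

# Split the data into train and test sets, keeping the class ratio in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Candidate classifiers, each with a small illustrative hyperparameter grid.
candidates = {
    "svm": (SVC(), {"C": [0.1, 1, 10]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

scores = {}
for name, (estimator, grid) in candidates.items():
    # Train and tune with cross-validated grid search on the training set.
    search = GridSearchCV(estimator, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    # Evaluate the refitted best estimator on the held-out test set.
    scores[name] = f1_score(y_test, search.predict(X_test))

# Pick the model with the highest test-set F1 score.
best_model = max(scores, key=scores.get)
```

Which classifier wins depends entirely on the data, so no particular result should be read into this synthetic example.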
18. How to detect fraudulent activities?
• Detecting whether a card has been used by the
cardholder
• One of the methods to recognize fraud:
• Leveraging Machine Learning (ML)
• Use a dataset of credit card transactions together with demographic and personal information about the cardholder
• Train ML models to predict whether a transaction is fraudulent or not
• Select the best model based on the evaluation on the testing dataset
19. Imagine if we can make this happen ...
ML Model
Not Fraud
Fraud
20. Databricks Advantages
Databricks for ML
● Security features: Private and solitary workspace
● One common platform: Common platform for any part of pipeline
● Track experiments automatically: MLFlow
● Work on projects collaboratively: Share notebooks with colleagues
● Easy access to computing resources: Easily create a cluster with desired resources
21. Training and Evaluating Different ML models
Input features
● Transaction: Transaction amount, Date and Time of the transaction, Merchant Category, Location
● Employment Information: Employment status, Monthly average balance in bank account
● Credit Card Information: Credit Limit, Monthly average percentage of credit usage
Models
● SVM
● Random Forest
● Decision Tree
Pipeline
● Data Processing & transformation
● Training
● Hyperparameters tuning
● Find the best model by evaluation
● Deploy the best model
22. MLFlow Advantages for ML Life Cycle
(1) Track ML Experiments: Input, output, hyperparameters, plots, etc.
(2) Run Functions with access to code: Run directly from GitHub repo
(3) Serialize ML model: Deployable format (MLlib, Pickle, Spark ML, etc.)
(4) Register your best model easily
(5) Deploy ML model to an API endpoint
23. What is a microservice and why is it useful?
• A microservice is a gateway to a specific functional
aspect of an application/service.
• Advantages:
• Standardization
• Independently deployable
• Reusability
• Easily composable
• Fault tolerance
24. Microservice Control Flow
Transaction and contextual information → ML Inference Pipeline → Fraud / Not Fraud / error code
● Notify client: Fraud / Not Fraud
● Notify Bank: Fraud / Not Fraud
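The control flow in this diagram can be sketched as a single handler function. Everything here is hypothetical: `handle_transaction`, the `Verdict` enum, and the callback signatures are illustrative names, and the routing (both client and bank receive the Fraud / Not Fraud verdict, while errors are returned without notification) is one reading of the diagram, not a confirmed design.

```python
from enum import Enum


class Verdict(Enum):
    FRAUD = "fraud"
    NOT_FRAUD = "not_fraud"
    ERROR = "error"


def handle_transaction(transaction, predict, notify_client, notify_bank):
    """Score one transaction and route the verdict.

    `predict` stands in for the ML inference pipeline; `notify_client` and
    `notify_bank` stand in for the notification services (all assumed interfaces).
    """
    try:
        is_fraud = bool(predict(transaction))
    except Exception:
        # The pipeline failed: return an error code and notify nobody.
        return Verdict.ERROR
    verdict = Verdict.FRAUD if is_fraud else Verdict.NOT_FRAUD
    notify_client(transaction, verdict)
    notify_bank(transaction, verdict)
    return verdict


# Usage with stub components; a real service would call a deployed model endpoint.
sent = []


def stub_predict(txn):
    return txn["amount"] > 1000  # placeholder rule, not a trained model


def stub_notify_client(txn, verdict):
    sent.append(("client", verdict))


def stub_notify_bank(txn, verdict):
    sent.append(("bank", verdict))


result = handle_transaction({"amount": 2500.0}, stub_predict,
                            stub_notify_client, stub_notify_bank)
```

Passing the pipeline and notifiers in as callables keeps the handler independently testable, in line with the microservice advantages listed earlier.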
25. AWS in Databricks
Training ML Model
Deploy ML Model
REST Endpoint
Model Serving
S3
Amazon SageMaker
Amazon EC2