a talk
Ryan Wang (@ryw90)
If it weighs the same as a duck
Detecting fraud with Python and machine learning
Outline
• Why do we use machine learning?
• Overview of our pipeline
• What does it take to update a model?
What is Stripe?
• Collect payments via API
• Most users charge credit cards
import stripe

stripe.Charge.create(
    amount=100,  # in the smallest currency unit (cents for USD)
    currency='usd',
    source={
        'object': 'card',
        'number': '4242 4242 4242 4242',
        # ...
    },
)
Things fraudsters do
• Typical fraudster buys stolen credit cards then:
• Creates fake Stripe accounts
• Buys goods from legitimate Stripe users
• Others test / brute force credentials
Witches easier to spot than fraud
Stopping fraud v1
• Manual rules and aggressive blacklisting
• Scaling issues
• Hard to control precision
• Complexity grows quickly
• Little generalization
• But important infrastructure built
• Tools for manual investigation
• Graph search
Stopping fraud v2
• Tree-based models to estimate p(fraud | features)
• Target composite outcome
• Disputes,
• Manual tags
• Information from card networks
• Python as glue
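The composite target can be sketched as a small labeling function. This is a minimal illustration, not the pipeline's actual code: the field names `disputed`, `manually_tagged_fraud`, and `network_fraud_report` are hypothetical stand-ins for the three signals listed, and combining them with a logical OR is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ChargeOutcome:
    # All three field names are hypothetical; they mirror the signals on the slide.
    disputed: bool = False               # cardholder disputed the charge
    manually_tagged_fraud: bool = False  # an analyst tagged the charge as fraud
    network_fraud_report: bool = False   # the card network reported fraud


def is_fraud(outcome: ChargeOutcome) -> int:
    """Return 1 if any signal marks the charge as fraudulent, else 0 (assumed OR rule)."""
    return int(
        outcome.disputed
        or outcome.manually_tagged_fraud
        or outcome.network_fraud_report
    )


labels = [
    is_fraud(ChargeOutcome()),
    is_fraud(ChargeOutcome(disputed=True)),
    is_fraud(ChargeOutcome(network_fraud_report=True)),
]
print(labels)  # [0, 1, 1]
```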
[Diagram: Qualitative feedback, Feature engineering, Model training, Model evaluation, Model deployment]
In order of work required
• Model evaluation
• Feature engineering
• Model training
• Qualitative feedback
• Monitoring / deployment
What does it take to update a model?
Feature engineering aka counting stuff
Types of features
• Static features useful on the margin
• Card from risky country?
• Billing details consistent?
• Dynamic features really useful
• Velocity of charges from email recently?
• Utilize network information
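A velocity feature such as "charges from this email recently" amounts to counting earlier events in a time window. The sketch below is one way to do that in plain Python; the tuple layout, the function name `email_velocity`, and the one-hour default window are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def email_velocity(charges, window_seconds=3600):
    """Count, for each charge, the prior charges from the same email in the window.

    `charges` is a list of (charge_id, email, unix_timestamp) tuples.
    Charges are processed in time order, so a charge never counts later ones.
    """
    seen = defaultdict(list)  # email -> timestamps processed so far (kept sorted)
    features = {}
    for charge_id, email, ts in sorted(charges, key=lambda c: c[2]):
        history = seen[email]
        # First prior timestamp that is no older than ts - window_seconds.
        start = bisect_left(history, ts - window_seconds)
        features[charge_id] = len(history) - start
        history.append(ts)
    return features


charges = [
    ("ch_1", "a@example.com", 0),
    ("ch_2", "a@example.com", 600),
    ("ch_3", "b@example.com", 900),
    ("ch_4", "a@example.com", 1200),
    ("ch_5", "a@example.com", 10000),
]
velocity = email_velocity(charges)
print(velocity)
```

Processing in time order matters here: a feature computed with knowledge of later charges would not be available when a live charge is scored.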
Feature pipeline
• Slow Hadoop jobs compute features
• Sampling doesn’t really help
• Luigi manages dependencies
• Only re-run jobs with changes
• Load results to database
• http://www.github.com/spotify/luigi
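The idea of running dependencies first and skipping work that is already done can be shown with a toy runner. This is not Luigi's API; `run`, `graph`, `is_complete`, and `execute` are hypothetical names, and the completion check stands in for whatever change detection the real pipeline uses.

```python
def run(task, graph, is_complete, execute):
    """Run `task` after its dependencies, skipping anything already complete.

    graph: dict mapping a task name to the list of task names it depends on
    is_complete: callable(name) -> bool
    execute: callable(name) that performs the work
    Returns the names of the tasks actually executed, in order.
    """
    executed = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        if is_complete(name):
            return  # already done: neither re-run it nor descend into its dependencies
        for dep in graph.get(name, []):
            visit(dep)
        execute(name)
        executed.append(name)

    visit(task)
    return executed


graph = {
    "joined_features": [
        "static_features",
        "dynamic_card_features",
        "dynamic_email_features",
        "outcomes",
    ],
}
done = {"static_features", "outcomes"}  # pretend these outputs already exist
executed = run("joined_features", graph, lambda name: name in done, done.add)
print(executed)
```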
[Diagram: Raw charges, Static features, Card features, Email features, Outcomes, Joined features, Training]
Feature pipeline (cont.)
import luigi

# redshift, FeatureTask and ScaldingJob are helpers not shown on the slide.
@redshift('transactionfraud.features')
class JoinFeatures(luigi.WrapperTask):
    def requires(self):
        # Upstream feature jobs whose outputs are joined together.
        components = [
            'static_features',
            'dynamic_card_features',
            'dynamic_email_features',
            'outcomes',
        ]
        return [FeatureTask(c) for c in components]

    def job(self):
        return ScaldingJob(
            job='JoinFeatures',
            output=self.output().path,
            **self.requires()
        )
Feature pipeline (cont.)
import com.twitter.scalding._
import com.stripe.thrift.Charge

class DynamicIpFeatures(args: Args) extends Job(args) {
  val charges = load[Charge](args("charges"))
  val historicalCounts = getHistoricalCounts(charges)
  historicalCounts
    .map { case (chargeId, counts) =>
      IpFeatures(
        chargeId = chargeId,
        feature1 = counts.feature1,
        feature2 = counts.feature2,
        ...
      )
    }
    .save
}
The curious case of email
Model debugging
• Added dynamic email features to model
• Velocity of charges from email recently?
• Quantitative measures good
• High feature importance
• Overall model performance improved
• Weird issues in staging
• Systematic false positives
• High velocity did not yield higher p(fraud)
Model debugging (cont.)
• Old fashioned data analysis reveals…
• Likelihood of fraud much higher when email undefined than when defined
• p(fraud | email undefined) = ~14%
• p(fraud | email defined) = ~5%
• In other words, email missing “predictive” of fraud
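The "old fashioned data analysis" here is a comparison of fraud rates between charges with and without an email. A minimal sketch follows, assuming each row is an `(email_or_None, is_fraud)` pair; the function name and the toy rows are illustrative and are not the talk's data.

```python
def fraud_rate_by_email_presence(rows):
    """Return the empirical fraud rate for rows with and without an email.

    `rows` is an iterable of (email_or_None, is_fraud) pairs.
    A rate is None when its group has no rows.
    """
    counts = {True: [0, 0], False: [0, 0]}  # has_email -> [fraud_count, total]
    for email, is_fraud in rows:
        bucket = counts[email is not None]
        bucket[0] += int(is_fraud)
        bucket[1] += 1

    def rate(has_email):
        fraud, total = counts[has_email]
        return fraud / total if total else None

    return {"email defined": rate(True), "email undefined": rate(False)}


# Toy rows, made up for illustration only.
rows = [
    ("a@example.com", False),
    ("b@example.com", False),
    ("c@example.com", True),
    ("d@example.com", False),
    (None, True),
    (None, False),
    (None, True),
    (None, False),
]
rates = fraud_rate_by_email_presence(rows)
print(rates)  # {'email defined': 0.25, 'email undefined': 0.5}
```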
Model debugging (cont.)
• Email attribute of Customer
• If credit card declined during customer creation*, fails with `CardError`
• Fraud correlated with decline, thus missing email
stripe.Customer.create(
    source={
        'object': 'card',
        # Test card for declines
        'number': '4000000000000002',
        'exp_year': '2016',
        'exp_month': 1,
    }
)
* Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
Model debugging (cont.)
• Data is generated according to:
• stripe.Customer.create → Card declined (correlated with fraud) → No customer (customer.email)
• Apply this model on live traffic:
• Attempt charge without email → P(fraud | no email) >> P(fraud | email) → Model blocks charge
Is the model any good?
Model evaluation
• Topmodel
• Flask app that charts and organizes output from binary classifiers
• Cross between a lab notebook and Kaggle
• Feedback / PRs appreciated!
• https://github.com/stripe/topmodel
Model evaluation (cont.)
• Regularly generate ground truth and benchmark existing models
• Newly trained models automatically compared
# Ground-truth labels and the time range they cover
test_y, test_start, test_end = topmodel_integration.retrieve_actuals(path)
# Features for the same time range
test_X = query_to_df(
    model.spec.sql_query(), test_start, test_end)
metadata = model.metadata()
results = model.score_and_format(test_y, test_X)
topmodel_integration.send_dataframe_to_s3(results, metadata)
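Benchmarking a binary classifier against ground truth comes down to metrics such as precision and recall at a score threshold. The sketch below computes both in plain Python; it is not Topmodel's code, and the function name and toy scores are assumptions for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when charges scoring >= threshold are flagged.

    Returns None for a metric whose denominator is zero.
    """
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and actual:
            tp += 1  # fraud, correctly flagged
        elif flagged:
            fp += 1  # legitimate charge, wrongly flagged
        elif actual:
            fn += 1  # fraud that slipped through
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Toy labels and scores, made up for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
for threshold in (0.5, 0.85):
    print(threshold, precision_recall(y_true, scores, threshold))
```

Raising the threshold in this toy data trades recall for precision, which is the trade-off a comparison between an existing model and a newly trained one would need to expose.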
Model evaluation (cont.)
• Maintaining reproducibility annoying
• Originally stored pickled models on S3
• But wrapper code sometimes changes
• But sklearn sometimes changes
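One common mitigation for this kind of drift is to record library versions next to the pickled model and check them at load time. The sketch below assumes that approach; it is not described in the talk, and `dump_with_versions`, `load_checked`, and the version strings are hypothetical.

```python
import pickle
import platform


def dump_with_versions(model, library_versions):
    """Pickle `model` together with the versions it was trained under."""
    payload = {
        "model": model,
        "python_version": platform.python_version(),
        "library_versions": dict(library_versions),
    }
    return pickle.dumps(payload)


def load_checked(blob, current_versions):
    """Unpickle and raise if a recorded library version differs from the current one."""
    payload = pickle.loads(blob)
    mismatched = {
        name: (recorded, current_versions.get(name))
        for name, recorded in payload["library_versions"].items()
        if current_versions.get(name) != recorded
    }
    if mismatched:
        raise RuntimeError(f"version mismatch: {mismatched}")
    return payload["model"]


# A plain dict stands in for a trained model; "1.0" is a placeholder version.
blob = dump_with_versions({"weights": [1, 2]}, {"scikit-learn": "1.0"})
model = load_checked(blob, {"scikit-learn": "1.0"})
print(model)  # {'weights': [1, 2]}
```

In this sketch the model is unpickled before the check runs, so a real setup would probably store the version metadata separately from the model bytes.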
Summary
• Python glues together whole pipeline
• Adding a simple feature can be hard
• Spend a lot of time on feature engineering and model evaluation
Questions?

Detecting fraud with Python and machine learning

  • 1. a talk Ryan Wang (@ryw90) If it weighs the same as a duck Detecting fraud with Python and machine learning
  • 2. Outline • Why do we use machine learning? • Overview of our pipeline • What does it take to update a model?
  • 3. What is Stripe? • Collect payments via API • Most users charge credit cards
        import stripe

        stripe.Charge.create(
            amount=100,  # in the smallest currency unit (cents for USD)
            currency='usd',
            source={
                'object': 'card',
                'number': '4242 4242 4242 4242',
                # ...
            },
        )
  • 4. Things fraudsters do • Typical fraudster buys stolen credit cards then: • Creates fake Stripe accounts • Buys goods from legitimate Stripe users • Others test / brute force credentials
  • 5. Witches easier to spot than fraud
  • 6. Stopping fraud v1 • Manual rules and aggressive blacklisting • Scaling issues • Hard to control precision • Complexity grows quickly • Little generalization • But important infrastructure built • Tools for manual investigation • Graph search
  • 7. Stopping fraud v2 • Tree-based models to estimate p(fraud | features) • Target composite outcome • Disputes • Manual tags • Information from card networks • Python as glue
  • 8. [Diagram: Qualitative feedback, Feature engineering, Model training, Model evaluation, Model deployment] In order of work required • Model evaluation • Feature engineering • Model training • Qualitative feedback • Monitoring / deployment
  • 9. What does it take to update a model?
  • 10. Feature engineering aka counting stuff
  • 11. Types of features • Static features useful on the margin • Card from risky country? • Billing details consistent? • Dynamic features really useful • Velocity of charges from email recently? • Utilize network information
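A dynamic feature such as "velocity of charges from email recently" amounts to counting earlier charges from the same email inside a trailing time window. The following is a minimal sketch in plain Python, assuming charges arrive as (timestamp, email) pairs sorted by time. The function name `email_velocity` and the 24-hour window are illustrative assumptions, not details from the talk.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def email_velocity(charges, window=timedelta(hours=24)):
    """For each charge, count earlier charges from the same email within `window`.

    `charges` is an iterable of (timestamp, email) pairs sorted by timestamp.
    Only charges earlier in the sequence are counted, so the feature does not
    include the charge being scored.
    """
    recent = defaultdict(deque)  # email -> timestamps still inside the window
    counts = []
    for ts, email in charges:
        seen = recent[email]
        # Drop timestamps that have fallen out of the trailing window.
        while seen and ts - seen[0] > window:
            seen.popleft()
        counts.append(len(seen))
        seen.append(ts)
    return counts


# Toy data, invented for illustration.
t0 = datetime(2015, 1, 1, 12, 0)
charges = [
    (t0, "a@example.com"),
    (t0 + timedelta(minutes=5), "a@example.com"),
    (t0 + timedelta(minutes=10), "b@example.com"),
    (t0 + timedelta(hours=30), "a@example.com"),
]
velocities = email_velocity(charges)
```

In production this kind of count would be computed by the batch feature jobs rather than in memory; the sketch only shows the counting logic.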
  • 12. Feature pipeline • Slow Hadoop jobs compute features • Sampling doesn’t really help • Luigi manages dependencies • Only re-run jobs with changes • Load results to database • http://www.github.com/spotify/luigi [Diagram: Raw charges, Static features, Card features, Email features, Joined features, Training outcomes]
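  • 13. Feature pipeline (cont.)
        import luigi

        # `redshift`, `FeatureTask` and `ScaldingJob` are not part of Luigi;
        # they are presumably helpers from the speaker's own codebase.
        @redshift('transactionfraud.features')
        class JoinFeatures(luigi.WrapperTask):
            def requires(self):
                components = [
                    'static_features',
                    'dynamic_card_features',
                    'dynamic_email_features',
                    'outcomes',
                ]
                # Return a dict so it can be unpacked with ** in job().
                return {c: FeatureTask(c) for c in components}

            def job(self):
                return ScaldingJob(
                    job='JoinFeatures',
                    output=self.output().path,
                    **self.requires()
                )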
  • 14. Feature pipeline (cont.)
        import com.twitter.scalding._
        import com.stripe.thrift.Charge

        class DynamicIpFeatures(args: Args) extends Job(args) {
          val charges = load[Charge](args("charges"))
          val historicalCounts = getHistoricalCounts(charges)

          historicalCounts
            .map { case (chargeId, counts) =>
              IpFeatures(
                chargeId = chargeId,
                feature1 = counts.feature1,
                feature2 = counts.feature2,
                ...
              )
            }
            .save
        }
  • 15. The curious case of email
  • 16. Model debugging • Added dynamic email features to model • Velocity of charges from email recently? • Quantitative measures good • High feature importance • Overall model performance improved • Weird issues in staging • Systematic false positives • High velocity did not yield higher p(fraud)
  • 17. Model debugging (cont.) • Old-fashioned data analysis reveals… • Likelihood of fraud much higher when email undefined than when defined • p(fraud | email undefined) = ~14% • p(fraud | email defined) = ~5% • In other words, a missing email is “predictive” of fraud
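The comparison on this slide is a grouped rate: split charges by whether the email is defined and take the fraud rate within each group. A minimal sketch follows, assuming records are (email, is_fraud) pairs with a missing email represented as `None`. The function name and the toy records are invented for illustration and are not the figures from the talk.

```python
def fraud_rate_by_email_presence(records):
    """Return fraud rates keyed by whether the email is defined (True/False).

    `records` is an iterable of (email, is_fraud) pairs, where a missing
    email is represented as None.
    """
    totals = {True: 0, False: 0}
    frauds = {True: 0, False: 0}
    for email, is_fraud in records:
        defined = email is not None
        totals[defined] += 1
        frauds[defined] += int(is_fraud)
    # Skip groups with no observations to avoid dividing by zero.
    return {k: frauds[k] / totals[k] for k in totals if totals[k]}


# Toy data, invented purely for illustration.
records = [
    ("a@example.com", False),
    ("b@example.com", False),
    ("c@example.com", True),
    ("d@example.com", False),
    (None, True),
    (None, False),
]
rates = fraud_rate_by_email_presence(records)
```

A large gap between the two rates is a cue to ask how the missing values arise before trusting the feature.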
  • 18. Model debugging (cont.) • Email attribute of Customer • If credit card declined during customer creation*, fails with `CardError` • Fraud correlated with decline, thus missing email
        stripe.Customer.create(
            source={
                'object': 'card',
                # Test card for declines
                'number': '4000000000000002',
                'exp_year': '2016',
                'exp_month': 1,
            }
        )
        * Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
  • 19. Model debugging (cont.) • Data is generated according to: stripe.Customer.create → Card declined (correlated with fraud) → No customer (customer.email) • Apply this model on live traffic: Attempt charge without email → P(fraud | no email) >> P(fraud | email) → Model blocks charge
  • 20. Is the model any good?
  • 21. Model evaluation • Topmodel • Flask app that charts and organizes output from binary classifiers • Cross between a lab notebook and Kaggle • Feedback / PRs appreciated! • https://github.com/stripe/topmodel
  • 22.
  • 23. Model evaluation (cont.) • Regularly generate ground truth and benchmark existing models • Newly trained models automatically compared
        test_y, test_start, test_end = topmodel_integration.retrieve_actuals(path)
        test_X = query_to_df(model.spec.sql_query(), test_start, test_end)
        metadata = model.metadata()
        results = model.score_and_format(test_y, test_X)
        topmodel_integration.send_dataframe_to_s3(results, metadata)
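Comparing a newly trained model against a benchmark on the same ground truth comes down to computing the same metrics for each. The sketch below computes precision and recall at a fixed score threshold in plain Python. It is not Topmodel's code or the speaker's; the function name, labels, and scores are assumptions made up for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when charges scoring >= threshold are flagged as fraud."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if not y and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y and not p)
    # Define the metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels (1 = fraud) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
precision, recall = precision_recall(y_true, scores, threshold=0.5)
```

Running the same function over the scores of an old and a new model, on identical ground truth, gives directly comparable numbers.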
  • 24. Model evaluation (cont.) • Maintaining reproducibility is annoying • Originally stored pickled models on S3 • But wrapper code sometimes changes • But sklearn sometimes changes
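One common way to soften the pickling problem is to record the library versions next to the pickled model and check them before loading. This is a minimal sketch of that idea and not the approach described in the talk: the function names are hypothetical, a plain dict stands in for a trained model, and the version strings are made up. In real use the mapping might hold something like `sklearn.__version__`.

```python
import pickle


def dump_with_versions(model, versions):
    """Pickle `model` alongside the library versions it was trained under.

    The model is pickled separately so the version metadata can be read
    without unpickling the model itself.
    """
    payload = {"versions": dict(versions), "model_bytes": pickle.dumps(model)}
    return pickle.dumps(payload)


def load_checked(blob, current_versions):
    """Unpickle the model only if every recorded version matches the current one."""
    payload = pickle.loads(blob)
    mismatched = {
        name: (recorded, current_versions.get(name))
        for name, recorded in payload["versions"].items()
        if current_versions.get(name) != recorded
    }
    if mismatched:
        raise RuntimeError(f"version mismatch: {mismatched}")
    return pickle.loads(payload["model_bytes"])


# A dict stands in for a trained model; the version string is made up.
blob = dump_with_versions({"weights": [0.1, 0.2]}, {"sklearn": "0.15.2"})
model = load_checked(blob, {"sklearn": "0.15.2"})
```

A version check does not catch changes in wrapper code, which would need its own identifier (for example a commit hash) stored in the same metadata.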
  • 25. Summary • Python glues together whole pipeline • Adding a simple feature can be hard • Spend a lot of time on feature engineering, model evaluation