a talk
Ryan Wang (@ryw90)
If it weighs the same as a duck
Detecting fraud with Python and machine learning
Outline
• Why do we use machine learning?
• Overview of our pipeline
• What does it take to update a model?
What is Stripe?
• Collect payments via API
• Most users charge credit cards
import stripe

stripe.Charge.create(
    amount=100,  # amount in cents
    currency='usd',
    source={
        'object': 'card',
        'number': '4242 4242 4242 4242',
        ...
    }
)
Things fraudsters do
• Typical fraudster buys stolen credit cards then:
• Creates fake Stripe accounts
• Buys goods from legitimate Stripe users
• Others test / brute force credentials
Witches are easier to spot than fraud
Stopping fraud v1
• Manual rules and aggressive blacklisting
• Scaling issues
• Hard to control precision
• Complexity grows quickly
• Little generalization
• But important infrastructure built
• Tools for manual investigation
• Graph search
Stopping fraud v2
• Tree-based models to estimate p(fraud | features)
• Target composite outcome
• Disputes,
• Manual tags
• Information from card networks
• Python as glue
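As a rough sketch of what "tree-based models to estimate p(fraud | features)" can look like with scikit-learn (not Stripe's actual code; the file path, column names, and model choice here are illustrative):

# Sketch: estimate p(fraud | features) with a tree-based classifier.
# Assumes a table of engineered features plus a binary is_fraud label
# built from the composite outcome (disputes, manual tags, card
# network information). Path and column names are made up.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('charge_features.csv')
X = df.drop(columns=['charge_id', 'is_fraud'])
y = df['is_fraud']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Predicted probability of fraud for each held-out charge.
p_fraud = clf.predict_proba(X_test)[:, 1]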
[Diagram: Qualitative feedback → Feature engineering → Model training → Model evaluation → Model deployment]
In order of work required
• Model evaluation
• Feature engineering
• Model training
• Qualitative feedback
• Monitoring / deployment
What does it take to update a model?
Feature engineering aka counting stuff
Types of features
• Static features useful on the margin
• Card from risky country?
• Billing details consistent?
• Dynamic features really useful
• Velocity of charges from email recently?
• Utilize network information
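A toy sketch of one dynamic feature, the recent charge velocity per email (the real versions run as Hadoop jobs over full history; the DataFrame and column names here are assumptions):

# Sketch of a dynamic "velocity" feature: for each charge, how many
# charges the same email made in the preceding 24 hours. Assumes a
# pandas DataFrame of charges with 'email' and 'created' columns.
import pandas as pd

def email_velocity_24h(charges):
    charges = charges.sort_values('created')

    def count_window(group):
        times = group['created']
        return pd.Series(
            [((times >= t - pd.Timedelta(hours=24)) & (times < t)).sum()
             for t in times],
            index=group.index,
        )

    # Counts are aligned back to the original charge rows.
    return charges.groupby('email', group_keys=False).apply(count_window)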
Feature pipeline
• Slow Hadoop jobs compute features
• Sampling doesn’t really help
• Luigi manages dependencies
• Only re-run jobs with changes
• Load results to database
• http://www.github.com/spotify/luigi
[Diagram: Raw charges → static features, card features, email features → joined features + outcomes → training]
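A minimal, generic Luigi sketch of the dependency handling described above: each task declares what it requires() and what it output()s, and Luigi skips any task whose output already exists. Task and file names are illustrative, not the actual feature jobs.

import luigi

class StaticFeatures(luigi.Task):
    def output(self):
        return luigi.LocalTarget('static_features.tsv')

    def run(self):
        with self.output().open('w') as f:
            f.write('charge_id\trisky_country\n')  # placeholder job

class JoinedFeatures(luigi.Task):
    def requires(self):
        # Luigi runs StaticFeatures first, but only if its output is missing.
        return StaticFeatures()

    def output(self):
        return luigi.LocalTarget('joined_features.tsv')

    def run(self):
        with self.input().open('r') as inp, self.output().open('w') as out:
            out.write(inp.read())  # placeholder join

if __name__ == '__main__':
    luigi.run()  # e.g. python pipeline.py JoinedFeatures --local-scheduler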
Feature pipeline (cont.)
import luigi

# `redshift`, `FeatureTask`, and `ScaldingJob` are internal Stripe helpers.
@redshift('transactionfraud.features')
class JoinFeatures(luigi.WrapperTask):
    def requires(self):
        components = [
            'static_features',
            'dynamic_card_features',
            'dynamic_email_features',
            'outcomes',
        ]
        return [FeatureTask(c) for c in components]

    def job(self):
        return ScaldingJob(
            job='JoinFeatures',
            output=self.output().path,
            **self.requires()
        )
Feature pipeline (cont.)
import com.twitter.scalding._
import com.stripe.thrift.Charge

class DynamicIpFeatures(args: Args) extends Job(args) {
  val charges = load[Charge](args("charges"))
  val historicalCounts = getHistoricalCounts(charges)

  historicalCounts
    .map { case (chargeId, counts) =>
      IpFeatures(
        chargeId = chargeId,
        feature1 = counts.feature1,
        feature2 = counts.feature2,
        ...
      )
    }
    .save
}
The curious case of email
Model debugging
• Added dynamic email features to model
• Velocity of charges from email recently?
• Quantitative measures good
• High feature importance
• Overall model performance improved
• Weird issues in staging
• Systematic false positives
• High velocity did not yield higher p(fraud)
Model debugging (cont.)
• Old fashioned data analysis reveals…
• Likelihood of fraud much higher when email undefined than when defined
• p(fraud | email undefined) = ~14%
• p(fraud | email defined) = ~5%
• In other words, a missing email is “predictive” of fraud
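The analysis behind those numbers is just a conditional rate comparison; roughly (the DataFrame and column names are assumptions, and the percentages come from the slide, not from this code):

# Compare fraud rates when the email is missing vs. present.
# Assumes a DataFrame with 'email' and a binary 'is_fraud' column.
import pandas as pd

df = pd.read_csv('charge_features.csv')
missing = df['email'].isna()

p_no_email = df.loc[missing, 'is_fraud'].mean()
p_with_email = df.loc[~missing, 'is_fraud'].mean()
print(f'p(fraud | email undefined) = {p_no_email:.1%}')
print(f'p(fraud | email defined)   = {p_with_email:.1%}')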
Model debugging (cont.)
• Email attribute of Customer
• If the credit card is declined during customer creation*, the call fails with `CardError`
• Fraud correlated with decline, thus missing email
stripe.Customer.create(
    source={
        'object': 'card',
        # Test card for declines
        'number': '4000000000000002',
        'exp_year': '2016',
        'exp_month': 1,
    }
)
* Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
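The mechanism in miniature, as a hedged sketch (the email value is made up; the exception class is stripe.error.CardError in the Python library): the decline aborts customer creation, so the later charge carries no email.

import stripe

try:
    customer = stripe.Customer.create(
        email='jenny@example.com',  # illustrative
        source={
            'object': 'card',
            'number': '4000000000000002',  # test card that always declines
            'exp_year': '2016',
            'exp_month': 1,
        },
    )
except stripe.error.CardError:
    # Decline: no Customer object, and therefore no email, ever gets stored.
    customer = None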
Model debugging (cont.)
• Data is generated according to: stripe.Customer.create → card declined (correlated with fraud) → no Customer, hence no customer.email
• Apply this model on live traffic: attempt charge without email → p(fraud | no email) >> p(fraud | email) → model blocks charge
Is the model any good?
Model evaluation
• Topmodel
• Flask app that charts and organizes output from binary classifiers
• Cross between a lab notebook and Kaggle
• Feedback / PRs appreciated!
• https://github.com/stripe/topmodel
Model evaluation (cont.)
• Regularly generate ground truth and benchmark existing models
• Newly trained models automatically compared
test_y, test_start, test_end = topmodel_integration.retrieve_actuals(path)
test_X = query_to_df(
    model.spec.sql_query(), test_start, test_end)
metadata = model.metadata()
results = model.score_and_format(test_y, test_X)
topmodel_integration.send_dataframe_to_s3(results, metadata)
Model evaluation (cont.)
• Maintaining reproducibility annoying
• Originally stored pickled models on S3
• But wrapper code sometimes changes
• But sklearn sometimes changes
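For context, the original approach was roughly the following (bucket and key names are made up; the point is that the pickle only loads cleanly against the exact wrapper code and sklearn versions it was written with):

# Original approach, sketched: pickle the trained model and push it to S3.
import pickle
import boto3

blob = pickle.dumps(clf)  # clf: a trained sklearn classifier
boto3.client('s3').put_object(
    Bucket='fraud-models',        # illustrative bucket
    Key='models/fraud-v2.pkl',    # illustrative key
    Body=blob,
)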
Summary
• Python glues together whole pipeline
• Adding a simple feature can be hard
• Spend a lot of time on feature engineering and model evaluation
Questions?
