SlideShare a Scribd company logo
Three lessons learned
from building a production
machine learning system
Michael Manapat
Stripe
@mlmanapat
Fraud
• Card numbers are stolen by hacking, malware, etc.
• “Dumps” are sold in “carding” forums
• Fraudsters use numbers in dumps to buy goods,
which they then resell
• Cardholders dispute transactions
• Merchant ends up bearing cost of fraud
• We train binary classifiers to predict fraud
• We use open source tools
• Scalding/Summingbird for feature generation
• scikit-learn for model training

(eventually: github.com/stripe/brushfire)
1
Don’t treat models as
black boxes
Early ML at Stripe
• Focused on training with more and more data and
adding more and more features
• Didn’t think much about
• ML algorithms (tuning hyperparameters, e.g.)
• The deeper reasons behind any particular set of
results
Substantial reduction in fraud rate
Product development
From a product standpoint:
• We were blocking high risk charges and surfacing
just the decision
• We wanted to provide Stripe users insight into our
actions—reasons for scores
Score reasons
X = 5, Y = 3: score = 0.1
Which feature is “driving” the score more?
X < 10
Y < 5 X < 15
0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
True False
Score reasons
X = ?, Y = 3:

(20/70) * 0.1 + (10/70) * 0.5 + (40/70) * 0.9 = 0.61
Score Δ = |holdout - original| = |0.61 - 0.1| = 0.51
Now producing richer reasons with multiple predicates
X < ?
Y < 5 X < ?
0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
Model introspection
If a model didn’t look good in validation, it wasn’t clear what
to do (besides trying more features/data)
What if we used our “score reasons” to debug model issues?
• Take all false positives (in validation data or in
production) and group by generated reason
• Were a substantial fraction of the false positives
driven by a few features?
• Did all the comparisons in the explanation
predicates make sense? (Were they comparisons a
human might make for fraud?)
• Our models were overfit!
Actioning insights
• Hyperparameter optimization
• Feature selection
Precision
Recall
Summary
• Don’t treat models as black boxes
• Thinking about the learning process (vs. just
features and data) can yield significant payoffs
• Tooling for introspection can accelerate model
development/“debugging”
Julia Evans, Alyssa Frazee, Erik Osheim, Sam Ritchie,
Jocelyn Ross, Tom Switzer
2
Have a plan for
counterfactual
evaluation
• December 31st, 2013
• Train a binary classifier for
disputes on data from Jan
1st to Sep 30th
• Validate on data from Oct
1st to Oct 31st (need to wait
~60 days for labels)
• Based on validation data, pick
a policy for actioning scores:

block if score > 50
Questions (1)
• Business complains about high false positive rate:
what would happen if we changed the policy to
"block if score > 70"?
• What are the production precision and recall of the
model?
• December 31st, 2014. We
repeat the exercise from a
year earlier
• Train a model on data
from Jan 1st to Sep 30th
• Validate on data from Oct
1st to Oct 31st (need to
wait ~60 days for labels)
• Validation results look
~ok (but not great)
• We put the model into
production and the results
are terrible
Questions (2)
• Why did the validation results for the new model
look so much worse?
• How do we know if the retrained model really is
better than the original model?
Counterfactual evaluation
• Our model changes reality (the world is different
because of its existence)
• We can answer some questions (around model
comparisons) with A/B tests
• For all these questions, we want an approximation
of the charge/outcome distribution that would exist
if there were no model
One approach
• Probabilistically reverse a
small fraction of our block
decisions
• The higher the score, the lower
probability we let the charge
through
• Weight samples by 1 / P(allow)
• Get information on the area we
want to improve on
ID Score p(Allow)
Original
Action
Selected
Action
Outcome
1 10 1.0 Allow Allow OK
2 45 1.0 Allow Allow Fraud
3 55 0.30 Block Block -
4 65 0.20 Block Allow Fraud
5 100 0.0005 Block Block -
6 60 0.25 Block Allow OK
ID Score P(Allow) Weight
Original
Action
Selected
Action
Outcome
1 10 1.0 1 Allow Allow OK
2 45 1.0 1 Allow Allow Fraud
4 65 0.20 5 Block Allow Fraud
6 60 0.25 4 Block Allow OK
Evaluating the "block if score > 50" policy
Precision = 5 / 9 = 0.56
Recall = 5 / 6 = 0.83
• The propensity function controls the exploration/
exploitation tradeoff
• Precision, recall, etc. are estimators
• Variance of the estimators decreases the more we
allow through
• Bootstrap to get error bars (pick rows from the table
uniformly at random with replacement)
• Li, Chen, Kleban, Gupta: "Counterfactual Estimation
and Optimization of Click Metrics for Search Engines"
Summary
• Have a plan for counterfactual evaluation before
you productionize your first model
• You can back yourself into a corner (with no data to
retrain on) if you address this later
• You should be monitoring the production
performance of your model anyway (cf. next lesson)
Alyssa Frazee, Julia Evans, Roban Kramer, Ryan
Wang
3
Invest in production
monitoring for your
models
Production vs. data stack
• Ruby/Mongo vs. Scala/Hadoop/Thrift
• Some issues
• Divergence between production and training
definitions
• Upstream changes to library code in production
feature generation can change feature definitions
• True vs. “True”
Domain-specific
scoring
service
(business
logic)
“Pure”
model
evaluation
service
Aggregation
jobs
Aggregation jobs keep
track of
• Overall action rate and
rate per Stripe user
• Score distributions
• Feature distributions (%
null, p50/p90 for
numerical values, etc.)
Logged
scoring
requests
Aggregation
jobs (get all
aggregates per
model)
Logged
scoring
requests
Domain-specific
scoring
service
(business
logic)
“Pure”
model
evaluation
service
Summary
• Monitor the production inputs to and outputs of
your models
• Have dashboards that can be watched on deploys
and alerting for significant anomalies
• Bake the monitoring into generic ML infrastructure
(so that each ML application isn’t redoing this)
Steve Mardenfeld, Tom Switzer
• Don’t treat models as black boxes
• Have a plan for counterfactual evaluation before
productionizing your first model
• Build production monitoring for action rates, score
distributions, and feature distributions (and bake
into ML infra)

Thanks
Stripe is hiring data scientists, engineers, and
engineering managers!
mlm@stripe.com | @mlmanapat

More Related Content

What's hot

Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Andreas Grabner
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Andreas Grabner
 
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersHugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Andreas Grabner
 
Agile Data
Agile DataAgile Data
Agile Data
odsc
 
DMCA#21: reactive-programming
DMCA#21: reactive-programmingDMCA#21: reactive-programming
DMCA#21: reactive-programming
Olivier Destrebecq
 
Database DevOps Anti-patterns
Database DevOps Anti-patternsDatabase DevOps Anti-patterns
Database DevOps Anti-patterns
Alex Yates
 
How to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance TipsHow to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance Tips
Andreas Grabner
 
Getting CI right for SQL Server
Getting CI right for SQL ServerGetting CI right for SQL Server
Getting CI right for SQL Server
Alex Yates
 
MLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf
 
DevOps 101 for data professionals
DevOps 101 for data professionalsDevOps 101 for data professionals
DevOps 101 for data professionals
Alex Yates
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
Work-Bench
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and Events
Randy Shoup
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
Andreas Grabner
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
Randy Shoup
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy Checks
Andreas Grabner
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance Problems
Andreas Grabner
 
Workbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFWorkbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFKyle Lambert
 
Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013Nick Galbreath
 
Web and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the NewsWeb and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the News
Andreas Grabner
 
DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018
Michael Man
 

What's hot (20)

Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
 
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and ScalabiltyDocker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalabilty
 
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersHugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
 
Agile Data
Agile DataAgile Data
Agile Data
 
DMCA#21: reactive-programming
DMCA#21: reactive-programmingDMCA#21: reactive-programming
DMCA#21: reactive-programming
 
Database DevOps Anti-patterns
Database DevOps Anti-patternsDatabase DevOps Anti-patterns
Database DevOps Anti-patterns
 
How to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance TipsHow to keep you out of the News: Web and End-to-End Performance Tips
How to keep you out of the News: Web and End-to-End Performance Tips
 
Getting CI right for SQL Server
Getting CI right for SQL ServerGetting CI right for SQL Server
Getting CI right for SQL Server
 
MLconf NYC Ted Willke
MLconf NYC Ted WillkeMLconf NYC Ted Willke
MLconf NYC Ted Willke
 
DevOps 101 for data professionals
DevOps 101 for data professionalsDevOps 101 for data professionals
DevOps 101 for data professionals
 
Building Scalable Prediction Services in R
Building Scalable Prediction Services in RBuilding Scalable Prediction Services in R
Building Scalable Prediction Services in R
 
Scaling Your Architecture with Services and Events
Scaling Your Architecture with Services and EventsScaling Your Architecture with Services and Events
Scaling Your Architecture with Services and Events
 
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
BTD2015 - Your Place In DevTOps is Finding Solutions - Not Just Bugs!
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
 
HSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy ChecksHSPS 2015 - SharePoint Performance Santiy Checks
HSPS 2015 - SharePoint Performance Santiy Checks
 
Four Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance ProblemsFour Practices to Fix Your Top .NET Performance Problems
Four Practices to Fix Your Top .NET Performance Problems
 
Workbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDFWorkbooks-Convert-OMG-LOL!.PDF
Workbooks-Convert-OMG-LOL!.PDF
 
Making operations visible - devopsdays tokyo 2013
Making operations visible  - devopsdays tokyo 2013Making operations visible  - devopsdays tokyo 2013
Making operations visible - devopsdays tokyo 2013
 
Web and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the NewsWeb and App Performance: Top Problems to avoid to keep you out of the News
Web and App Performance: Top Problems to avoid to keep you out of the News
 
DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018DevSecOps - London Gathering : June 2018
DevSecOps - London Gathering : June 2018
 

Viewers also liked

DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
Hakka Labs
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
Hakka Labs
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
Hakka Labs
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
Hakka Labs
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
Hakka Labs
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
Hakka Labs
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
Hakka Labs
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
Hakka Labs
 
DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with Ourselves
Hakka Labs
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
Hakka Labs
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
Hakka Labs
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
Hakka Labs
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
Hakka Labs
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
Hakka Labs
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
Hakka Labs
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data Structures
Hakka Labs
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
Hakka Labs
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
Hakka Labs
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
Hakka Labs
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
Benjamin Bengfort
 

Viewers also liked (20)

DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
 
DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with Ourselves
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data Structures
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 

Similar to DataEngConf SF16 - Three lessons learned from building a production machine learning system

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
Roger Barga
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem
Dataiku
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
Pierre Gutierrez
 
Why Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOpsWhy Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOps
dpaulmerrill
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
Amazon Web Services
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Alok Singh
 
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
Future Processing
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx
kpcp
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
Pierre Gutierrez
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
SrujanaMerugu1
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
Justin Basilico
 
A New Model for Testing
A New Model for TestingA New Model for Testing
A New Model for Testing
SQALab
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
Marc Berman
 
New model
New modelNew model
New model
TEST Huddle
 
A New Model For Testing
A New Model For TestingA New Model For Testing
A New Model For Testing
TEST Huddle
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
Product School
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
Subrat Panda, PhD
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
Michaela Greiler
 
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Why-What-How Consulting, LLC
 

Similar to DataEngConf SF16 - Three lessons learned from building a production machine learning system (20)

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem Before Kaggle : from a business goal to a Machine Learning problem
Before Kaggle : from a business goal to a Machine Learning problem
 
Before Kaggle
Before KaggleBefore Kaggle
Before Kaggle
 
Why Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOpsWhy Automated Testing Matters To DevOps
Why Automated Testing Matters To DevOps
 
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
[QE 2018] Paul Gerrard – Automating Assurance: Tools, Collaboration and DevOps
 
10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx10 -- Overfitting and Underfitting.pptx
10 -- Overfitting and Underfitting.pptx
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
ML Application Life Cycle
ML Application Life CycleML Application Life Cycle
ML Application Life Cycle
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
A New Model for Testing
A New Model for TestingA New Model for Testing
A New Model for Testing
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
New model
New modelNew model
New model
 
A New Model For Testing
A New Model For TestingA New Model For Testing
A New Model For Testing
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
Business process simulations: from GREAT! to good, Razvan Radulian, Sept 2013
 

More from Hakka Labs

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
Hakka Labs
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
Hakka Labs
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris Wiggins
Hakka Labs
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
Hakka Labs
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
Hakka Labs
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
Hakka Labs
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeed
Hakka Labs
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
Hakka Labs
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
Hakka Labs
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
Hakka Labs
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
Hakka Labs
 

More from Hakka Labs (11)

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris Wiggins
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeed
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

DataEngConf SF16 - Three lessons learned from building a production machine learning system

  • 1. Three lessons learned from building a production machine learning system Michael Manapat Stripe @mlmanapat
  • 2. Fraud • Card numbers are stolen by hacking, malware, etc. • “Dumps” are sold in “carding” forums • Fraudsters use numbers in dumps to buy goods, which they then resell • Cardholders dispute transactions • Merchant ends up bearing cost of fraud
  • 3. • We train binary classifiers to predict fraud • We use open source tools • Scalding/Summingbird for feature generation • scikit-learn for model training
 (eventually: github.com/stripe/brushfire)
  • 4. 1 Don’t treat models as black boxes
  • 5. Early ML at Stripe • Focused on training with more and more data and adding more and more features • Didn’t think much about • ML algorithms (tuning hyperparameters, e.g.) • The deeper reasons behind any particular set of results Substantial reduction in fraud rate
  • 6. Product development From a product standpoint: • We were blocking high risk charges and surfacing just the decision • We wanted to provide Stripe users insight into our actions—reasons for scores
  • 7. Score reasons X = 5, Y = 3: score = 0.1 Which feature is “driving” the score more? X < 10 Y < 5 X < 15 0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40) True False
  • 8. Score reasons X = ?, Y = 3:
 (20/70) * 0.1 + (10/70) * 0.5 + (40/70) * 0.9 = 0.61 Score Δ = |holdout - original| = |0.61 - 0.1| = 0.51 Now producing richer reasons with multiple predicates X < ? Y < 5 X < ? 0.1 (20) 0.3 (30) 0.5 (10) 0.9 (40)
  • 9. Model introspection If a model didn’t look good in validation, it wasn’t clear what to do (besides trying more features/data) What if we used our “score reasons” to debug model issues?
  • 10. • Take all false positives (in validation data or in production) and group by generated reason • Were a substantial fraction of the false positives driven by a few features? • Did all the comparisons in the explanation predicates make sense? (Were they comparisons a human might make for fraud?) • Our models were overfit!
  • 11. Actioning insights • Hyperparameter optimization • Feature selection Precision Recall
  • 12. Summary • Don’t treat models as black boxes • Thinking about the learning process (vs. just features and data) can yield significant payoffs • Tooling for introspection can accelerate model development/“debugging” Julia Evans, Alyssa Frazee, Erik Osheim, Sam Ritchie, Jocelyn Ross, Tom Switzer
  • 13. 2 Have a plan for counterfactual evaluation
  • 14. • December 31st, 2013 • Train a binary classifier for disputes on data from Jan 1st to Sep 30th • Validate on data from Oct 1st to Oct 31st (need to wait ~60 days for labels) • Based on validation data, pick a policy for actioning scores:
 block if score > 50
  • 15. Questions (1) • Business complains about high false positive rate: what would happen if we changed the policy to "block if score > 70"? • What are the production precision and recall of the model?
  • 16. • December 31st, 2014. We repeat the exercise from a year earlier • Train a model on data from Jan 1st to Sep 30th • Validate on data from Oct 1st to Oct 31st (need to wait ~60 days for labels) • Validation results look ~ok (but not great) • We put the model into production and the results are terrible
  • 17. Questions (2) • Why did the validation results for the new model look so much worse? • How do we know if the retrained model really is better than the original model?
  • 18. Counterfactual evaluation • Our model changes reality (the world is different because of its existence) • We can answer some questions (around model comparisons) with A/B tests • For all these questions, we want an approximation of the charge/outcome distribution that would exist if there were no model
  • 19. One approach • Probabilistically reverse a small fraction of our block decisions • The higher the score, the lower probability we let the charge through • Weight samples by 1 / P(allow) • Get information on the area we want to improve on
  • 20. ID Score p(Allow) Original Action Selected Action Outcome 1 10 1.0 Allow Allow OK 2 45 1.0 Allow Allow Fraud 3 55 0.30 Block Block - 4 65 0.20 Block Allow Fraud 5 100 0.0005 Block Block - 6 60 0.25 Block Allow OK
  • 21. ID Score P(Allow) Weight Original Action Selected Action Outcome 1 10 1.0 1 Allow Allow OK 2 45 1.0 1 Allow Allow Fraud 4 65 0.20 5 Block Allow Fraud 6 60 0.25 4 Block Allow OK Evaluating the "block if score > 50" policy Precision = 5 / 9 = 0.56 Recall = 5 / 6 = 0.83
  • 22. • The propensity function controls the exploration/ exploitation tradeoff • Precision, recall, etc. are estimators • Variance of the estimators decreases the more we allow through • Bootstrap to get error bars (pick rows from the table uniformly at random with replacement) • Li, Chen, Kleban, Gupta: "Counterfactual Estimation and Optimization of Click Metrics for Search Engines"
  • 23. Summary • Have a plan for counterfactual evaluation before you productionize your first model • You can back yourself into a corner (with no data to retrain on) if you address this later • You should be monitoring the production performance of your model anyway (cf. next lesson) Alyssa Frazee, Julia Evans, Roban Kramer, Ryan Wang
  • 25. Production vs. data stack • Ruby/Mongo vs. Scala/Hadoop/Thrift • Some issues • Divergence between production and training definitions • Upstream changes to library code in production feature generation can change feature definitions • True vs. “True”
  • 26. Domain-specific scoring service (business logic) “Pure” model evaluation service Aggregation jobs Aggregation jobs keep track of • Overall action rate and rate per Stripe user • Score distributions • Feature distributions (% null, p50/p90 for numerical values, etc.) Logged scoring requests
  • 27. Aggregation jobs (get all aggregates per model) Logged scoring requests Domain-specific scoring service (business logic) “Pure” model evaluation service
  • 28. Summary • Monitor the production inputs to and outputs of your models • Have dashboards that can be watched on deploys and alerting for significant anomalies • Bake the monitoring into generic ML infrastructure (so that each ML application isn’t redoing this) Steve Mardenfeld, Tom Switzer
  • 29. • Don’t treat models as black boxes • Have a plan for counterfactual evaluation before productionizing your first model • Build production monitoring for action rates, score distributions, and feature distributions (and bake into ML infra)

  • 30. Thanks Stripe is hiring data scientists, engineers, and engineering managers! mlm@stripe.com | @mlmanapat