Predictive Models
Energizer Bunny Models
Big Data Week - Randal Cox
Fill in Sparse Data
Use Data Indirectly
Predictive Models
Card Risk
Store Risk
Transaction Risk
Predictive Models
Transaction Risk
The Goal
[Figure labels: Weekend; Swiped; >$100; ND, SD; Gas Station; IL, IA, WI; Far Away]
The Goal
Overcoming Sparse Data
Use Variables Smarter
Transaction: $$$, PIN OK?, Card Present?, Timestamp
Merchant: Type, State, Country, Postal
Card Holder: Postal
shops far away
shops odd hours
shops special stores
The Power
Cardholders shop near home
Fraudsters don't know where that is, or can't spend there
Distance: Home => POS
The Price
POS Country + POS Postal => Lat, Lon
Home Country + Home Postal => Lat, Lon
Lat/Lon for World-Wide Postal Codes
http://download.geonames.org/export/zip/
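A minimal sketch of turning a postal-code file into a lookup table, assuming the tab-separated column layout described in the GeoNames readme (country code, postal code, place name, three admin name/code pairs, latitude, longitude, accuracy). The function name `load_postal_centroids` and the two sample rows are illustrative, not from the talk; the coordinates are rounded placeholders.

```python
import csv

def load_postal_centroids(lines):
    """Map (country_code, postal_code) -> (lat, lon) from GeoNames-style rows.

    Assumes tab-separated columns with latitude at index 9 and
    longitude at index 10. Postal codes stay strings so that
    leading zeros (e.g. "07028") are preserved.
    """
    centroids = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 11:
            continue  # skip short or malformed rows
        try:
            lat, lon = float(row[9]), float(row[10])
        except ValueError:
            continue  # skip rows without numeric coordinates
        centroids[(row[0], row[1])] = (lat, lon)
    return centroids

# Illustrative rows only; real data would come from the downloaded file.
SAMPLE_ROWS = [
    "US\t60601\tChicago\tIllinois\tIL\tCook\t031\t\t\t41.886\t-87.618\t4",
    "US\t07028\tGlen Ridge\tNew Jersey\tNJ\tEssex\t013\t\t\t40.805\t-74.204\t4",
]
table = load_postal_centroids(SAMPLE_ROWS)
print(table[("US", "07028")])
```

Keying on the (country, postal) pair matters because the same postal string can exist in more than one country.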
Haversine Formula
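The haversine formula gives the great-circle distance between two lat/lon points. A short sketch follows; the function name `haversine_km` and the choice of kilometres and a mean Earth radius of 6371.0088 km are assumptions made here, not details from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Home => POS distance for two arbitrary points.
print(haversine_km(41.886, -87.618, 40.805, -74.204))
```

Feeding it the home and POS coordinates from a postal-code lookup yields the "Distance: Home => POS" variable.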
[Figure: time series from 1/1/04 to 12/2/04, y-axis 0 to 800,000]
Day of Week
SUN MON TUE WED THU FRI SAT
[Figure: Fraud (0 to 100) by Hour of day (0 to 24)]
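Day-of-week and hour variables can be derived from the transaction timestamp alone. A small sketch, assuming the timestamp is already a Python `datetime` in the cardholder's local time; the function name `time_features` and the weekend definition (Saturday and Sunday) are choices made here.

```python
from datetime import datetime

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

def time_features(ts):
    """Derive day-of-week, hour, and a weekend flag from a timestamp."""
    weekday = ts.weekday()  # Monday is 0, Sunday is 6
    return {
        "day_of_week": DAYS[weekday],
        "hour": ts.hour,
        "is_weekend": weekday >= 5,
    }

print(time_features(datetime(2004, 1, 1, 14, 30)))
```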
Risky Features
The Goal
[Figure labels: Weekend; Swiped; >$100; ND, SD; Gas Station; IL, IA, WI; Far Away]
Ordinality
Change Ordinality
40,000 US Postal Codes
1,190,000 World Postal Codes
3,000 Merchant Types
If Postal Code IS IN 50137, 32567, 57776, 24942, 98349, 13627, 80203,
84726, 43571, 25004, 12964, 75051, 88595, 13774,
64499, 33074, 54021, 45055, 92628, 32086, 36079,
99780, 93639, 88560, 58274, 55403, 30811, 19056,
47370, 71138, 29481, 45319, 89154, 50058, 56051,
57791, 43611, 97630, 95894, 46504, 07028, 30477,
19726, 68781, 05763, 59462, 92198, 50432, 90055,
20626, .... 1000 more
If PROXY > 5.3
One Solution
Fraud Rates
Risky
MCC    Fraud of Auths     Fraud Rate
5734   57 of 7210         79.06 bp
5542   256 of 203053      12.61 bp
5732   25 of 3951         63.28 bp
6011   47 of 20565        22.85 bp
5691   24 of 4199         57.16 bp
5999   39 of 18494        21.09 bp
5967   16 of 1320         121.21 bp
5972   9 of 43            2093.02 bp
0000   20 of 6141         32.57 bp
5964   15 of 2584         58.05 bp
5651   27 of 12836        21.03 bp

Safe
MCC    Fraud of Auths     Fraud Rate
9399   4 of 11786         3.39 bp
5921   7 of 20044         3.49 bp
4121   5 of 16751         2.98 bp
7230   3 of 11866         2.53 bp
5912   27 of 61428        4.40 bp
5411   134 of 255195      5.25 bp
4784   0 of 7360          0.00 bp
5812   55 of 128521       4.28 bp
7832   0 of 9076          0.00 bp
7841   1 of 10970         0.91 bp
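The fraud rates in the tables above are fraud counts divided by authorization counts, expressed in basis points (1 bp = 0.01%). A minimal sketch of that arithmetic; the function name `fraud_rate_bp` is hypothetical.

```python
def fraud_rate_bp(fraud, auths):
    """Fraud rate in basis points: 10,000 * fraud / auths."""
    if auths <= 0:
        raise ValueError("auths must be positive")
    return 10_000 * fraud / auths

# Two rows taken from the tables: MCC 5734 and MCC 5972.
print(round(fraud_rate_bp(57, 7210), 2))
print(round(fraud_rate_bp(9, 43), 2))
```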
Bulk:      If Fraud Score > 600 then Decline
LOW RISK:  If Fraud Score > 900 and MCC == 9399 then Decline
HIGH RISK: If Fraud Score > 200 and MCC == 5734 then Decline
Can We Use the Fraud Rate Instead?
LOW RISK:  If Fraud Score > 900 and MCC Fraud < 0.02% then Decline
HIGH RISK: If Fraud Score > 200 and MCC Fraud > 10% then Decline
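The two rules above can be written against the MCC's fraud rate rather than the MCC itself. A sketch using the thresholds from the slide; the function name `should_decline` and the convention of passing the rate as a fraction (0.02% = 0.0002, 10% = 0.10) are assumptions. Only the two rules shown are implemented, so mid-range rates fall through to "no decline" here.

```python
def should_decline(fraud_score, mcc_fraud_rate):
    """Apply the two fraud-rate rules; mcc_fraud_rate is a fraction."""
    # Low-risk merchant types: decline only on a very high score.
    if fraud_score > 900 and mcc_fraud_rate < 0.0002:
        return True
    # High-risk merchant types: decline on a much lower score.
    if fraud_score > 200 and mcc_fraud_rate > 0.10:
        return True
    return False

print(should_decline(250, 0.2))
```

The rule text no longer names any merchant type, so it does not need editing when the risky set changes.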
The Problem of Small Numbers
Fraud Rate = 1/4
P[2 x 1] = 6%
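One way to see why a rate built on a handful of observations is unreliable is to put an interval around it. The sketch below uses the Wilson score interval, which is a technique swapped in here for illustration and not something the slides mention; the function name `wilson_interval` and the 95% level (z = 1.96) are assumptions.

```python
import math

def wilson_interval(fraud, auths, z=1.96):
    """Wilson score interval for a proportion; returns (low, high)."""
    if auths <= 0:
        raise ValueError("auths must be positive")
    p = fraud / auths
    z2 = z * z
    denom = 1 + z2 / auths
    center = (p + z2 / (2 * auths)) / denom
    half = z * math.sqrt(p * (1 - p) / auths + z2 / (4 * auths * auths)) / denom
    return center - half, center + half

# 1 fraud in 4 authorizations versus 134 in 255195 (MCC 5411).
print(wilson_interval(1, 4))
print(wilson_interval(134, 255195))
```

With 1 of 4 the interval spans a large part of the 0 to 1 range, while the high-volume MCC is pinned down tightly.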
Removing Variables
$$Z = \frac{M_1 - M_2}{\mathrm{SD}_1 + \mathrm{SD}_2}$$
[Figure: fraud-rate axis at 2 bp, 4 bp, 6 bp, 8 bp, 10 bp]
RISKY: Z = 10.7    SAFE: Z = -6.2
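A sketch of the Z formula above under one possible reading: M1 and SD1 are the fraud rate of a category and its standard error, M2 and SD2 are the same for a baseline population. The slides do not define these terms, so that reading, the binomial standard error, and the baseline counts in the example are all assumptions.

```python
import math

def rate_and_sd(fraud, auths):
    """Fraud rate and its binomial standard error (assumed meaning of M, SD)."""
    p = fraud / auths
    return p, math.sqrt(p * (1 - p) / auths)

def z_score(fraud, auths, base_fraud, base_auths):
    """Z = (M1 - M2) / (SD1 + SD2), following the slide's denominator."""
    m1, sd1 = rate_and_sd(fraud, auths)
    m2, sd2 = rate_and_sd(base_fraud, base_auths)
    denom = sd1 + sd2
    if denom == 0:
        return 0.0  # no variation on either side; treat as neutral
    return (m1 - m2) / denom

# Hypothetical baseline of 1,000 frauds in 1,000,000 authorizations (10 bp).
print(z_score(57, 7210, 1_000, 1_000_000))   # MCC 5734, well above baseline
print(z_score(1, 10970, 1_000, 1_000_000))   # MCC 7841, well below baseline
```

A category with few observations gets a large SD1 and therefore a Z near zero, which addresses the small-numbers problem.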
Z-Score for
• POS Country
• POS State
• POS Postal
• POS MCC
• Distance Intervals
• Entry Mode
• Authentication
• Time of Day
• Day of Week
• Merchants
If Postal Code IS IN 50137, 32567, 57776, 24942, 98349, 13627, 80203,
84726, 43571, 25004, 12964, 75051, 88595, 13774,
64499, 33074, 54021, 45055, 92628, 32086, 36079,
99780, 93639, 88560, 58274, 55403, 30811, 19056,
47370, 71138, 29481, 45319, 89154, 50058, 56051,
57791, 43611, 97630, 95894, 46504, 07028, 30477,
19726, 68781, 05763, 59462, 92198, 50432, 90055,
20626, .... 1000 more
vs
If Postal Code Z-Score > 8.3 and Fraud Score > 300
then DECLINE
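The indirect version of the rule looks up a Z-score for the postal code instead of testing membership in a long list. A sketch with the thresholds from the slide; the table contents, the function name `decline_by_postal_z`, and the choice to treat unseen postal codes as Z = 0 are hypothetical.

```python
# Hypothetical Z-scores; a real table would be computed from fraud data.
POSTAL_Z = {"50137": 9.1, "07028": 8.9, "60601": -2.0}

def decline_by_postal_z(postal, fraud_score, z_table,
                        z_cut=8.3, score_cut=300):
    """Decline when the postal code's Z-score and the fraud score are both high."""
    z = z_table.get(postal, 0.0)  # unseen codes treated as neutral (assumption)
    return z > z_cut and fraud_score > score_cut

print(decline_by_postal_z("50137", 450, POSTAL_Z))
```

Refreshing the table changes which postal codes trigger the rule while the rule itself stays untouched.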
Model Durability
Approach      Time to Decay
Direct Use    Weeks to Months
Z-Score       Years
Operationalize
Rolling Fraud Rate and Z-score
Update Tables in your Rules Environment
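A rolling fraud rate can be kept per key (postal code, MCC, and so on) by counting events inside a sliding window. The sketch below is one simple way to do that; the class name `RollingFraudRate`, the 90-day default window, and the requirement that events arrive in date order are assumptions, not details from the talk.

```python
from collections import defaultdict, deque
from datetime import date, timedelta

class RollingFraudRate:
    """Per-key fraud and authorization counts over a sliding window of days."""

    def __init__(self, window_days=90):
        self.window = timedelta(days=window_days)
        self.events = defaultdict(deque)           # key -> (day, is_fraud) in date order
        self.counts = defaultdict(lambda: [0, 0])  # key -> [fraud, auths]

    def add(self, key, day, is_fraud):
        """Record one authorization; events are assumed to arrive in date order."""
        self.events[key].append((day, is_fraud))
        self.counts[key][0] += int(is_fraud)
        self.counts[key][1] += 1

    def expire(self, today):
        """Drop events older than the window and adjust the counts."""
        cutoff = today - self.window
        for key, queue in self.events.items():
            counts = self.counts[key]
            while queue and queue[0][0] < cutoff:
                _, was_fraud = queue.popleft()
                counts[0] -= int(was_fraud)
                counts[1] -= 1

    def rate(self, key):
        """Current fraud rate for the key, or None if the window is empty."""
        fraud, auths = self.counts[key]
        return fraud / auths if auths else None

tracker = RollingFraudRate(window_days=30)
tracker.add("5734", date(2016, 1, 1), True)
tracker.add("5734", date(2016, 2, 15), False)
tracker.add("5734", date(2016, 2, 20), False)
tracker.expire(date(2016, 3, 1))
print(tracker.rate("5734"))
```

The resulting rates (and Z-scores derived from them) are what would be pushed into the lookup tables of the rules environment on each refresh.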
• Space and Time
• No Direct Features
• Add Z-Score
Energizer Bunny Models
randal@rippleshot.com
rippleshot.com

BDW Chicago 2016 - Randal Cox, Chief Scientist & Co-Founder, Rippleshot - Energizer Bunny Models: Variable Indirection for Eternally Robust Models
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 

BDW Chicago 2016 - Randal Cox, Chief Scientist & Co-Founder, Rippleshot - Energizer Bunny Models: Variable Indirection for Eternally Robust Models

Editor's Notes

  1. Machine learning models are everywhere, doing everything. I’m guessing they are transforming the businesses of everyone here, right? And these models work great.
  2. Until they don’t. No matter how good your model, pretty soon the real world works its way in and the model starts to tank. That’s expensive, since building a good model still takes a bunch of your time and attention.
  3. I’m Randal Cox, the chief scientist and co-founder at Rippleshot. I want to talk today about ways to keep your models running a lot longer – for years even.
  4. About us. Rippleshot detects payment card data breaches, like Target or Home Depot. We trace fraudulent purchases back in time
  5. About us. Rippleshot detects payment card data breaches, like Target or Home Depot. We trace fraudulent purchases back in time
  6. … to where those cards all visited the same location. That’s where the card was stolen. Think of it like tracing food poisoning back to the greasy spoon.
  7. Rippleshot builds a lot of machine learning models
  8. We predict which **cards** are going to be used fraudulently soon, based on where and how they shop. We make models that predict whether a store is likely to be breached soon. And in real time, we build payment card decline rules to stop fraud spends right NOW.
  9. Let’s consider the model that’s most important to card issuers. This model stops suspicious transactions in real time before the bank or merchant incurs any loss at all. Getting that right is very hard and really important. Let’s look at a concrete example.
  10. We are a decision tree shop. You’re probably using tree-like rules all the time. In this example, a payment far from home at a gas station is likely to be fraud, though even more likely on a weekday. Some nearby states with big dollar purchases are also risky.
  11. That said, all of what I’m saying today is equally applicable to other modeling techniques like neural nets.
  12. The reason I’m up here is I’ve been asked to share some of my big data insights. I’ve only got two, really. So this should be a short talk. I have some techniques for filling out feeble data, and a way to use those variables indirectly that makes your models last longer.
  13. Models are only as good as their data, and sometimes the data is TERRIBLE. One of our clients gave us VERY LITTLE about each transaction. It’s hard to make a great model out of that, so you’re going to have to augment this data.
  14. Luckily we know something about fraudster behavior. He does things card holders do not.
  15. Basically, he often spends far from the consumer’s home. He shops at odd hours when there is less scrutiny. He likes launderable goods, like big-screen TVs. Let’s look at the where first.
  16. Let’s look at where first. You and I shop close to our homes usually. But the fraudsters might not know where home is or just don’t have presence on the ground there. So,
  17. Distance between home and the point of sale is incredibly predictive of fraud. It’s often my #1 variable in card-present models.
  18. Distance is a little hard to compute. You need clean country and postal codes for home and the POS. Then you need to look up the latitude and longitude of those postal codes, and then run some modestly complicated math to get the distance.
  19. Luckily the lat/lons for all worldwide postal codes are available for free. And the Haversine formula is a Google search away.
  20. Distance was a big win, let’s look at time. Here is the legitimate spend on a large cohort of cards. It’s almost like a pulse with a 1-week period and a 1-month automatic payments period. More regular than my heartbeat.
  21. But the fraud spend is often REALLY different. Huge upswings over week-long periods and, even in more regular periods, out of phase from the consumers.
  22. The fraud signal is likely to be drowned out by legitimate Friday payments, for example. But the fraudsters are often busy on days when card holders are not.
  23. Same thing with the time of day. Fraudsters seem to like the dark better than the sun.
  24. So, we now have Day of Week and Hour of Day as new features. The original data set included some things like states, postal codes, and merchant types (groceries or gas stations).
  25. A lot of modelers will use these variables DIRECTLY in the model like in that earlier example.
  26. Don’t do that
  27. There are two chief reasons for this: the problems of ordinality and of change.
  28. Ordinality is just a fancy way of saying there are too many possible values for these variables to use directly. If you feed any modeling tech a column with more than a million possible categories, it is going to barf. Like, game over, usually. If you’re lucky, it will just perform poorly.
  29. There is another disadvantage here. Splitting on a large list like postal codes makes for HUGE rules. Some environments impose character limits on decline expressions. It would be much better to have some proxy to postal codes to make the expression shorter.
  30. The other problem is change. The fraudsters know you are trying to catch them, so they change as fast as possible. Unfortunately, you usually don’t get the post-it note about it. If you model directly on the state or merchant category, you’re locked in until you can build the next model, and that might take months. Fraud is faster than that.
  31. One way forward is to replace those primary variables – one layer of indirection. Instead of a postal code, give the model the fraud rate at this postal code.
  32. Here is a table of merchant categories. One MCC has an astounding 20% fraud rate in this data set. And another never has fraud. How would you roll that information into a fraud model?
  33. Let’s say the rest of your variables can be used to make a fraud score and you want to add this MCC data. For the data as a whole, you usually decline at a score of 600. You might be tempted to just decline more often in the risky MCC and maybe require a higher fraud score for the safe MCC. But the fraudsters will move from 5734 to 5735 the week after you implement your model.
  34. Better to use the fraud rate instead. Then you can update a table of fraud rates for all your MCCs and not change your model ITSELF.
  35. So that’s a step up, but you need to be careful. If the number of transactions is small, you can get a high fraud rate by chance. Let’s say your real fraud rate is 25%: roll a 1 on a 4-sided die. But if you only have two records, you might get unlucky and roll two 1’s on the two dice. There is about a 6% chance (1/4 × 1/4) you get two ones and think your fraud rate is 100%, a huge error!
  36. There is a simple way around this. If you hate math, close your eyes for the next slide
  37. For everyone else, z-scores encapsulate how sure we are that our observed rate is different from background. Really, we’re comparing the global rate (all MCCs) with the rate at this MCC. <click> So, if you’re running 4 bp in fraud overall, and this MCC is at 6 bp, just divide that 2 bp difference by the sum of the standard deviations for those two curves. If the width of the curve is very large, then the z-score decreases a lot; you’re not so sure about the result. If the width is small (i.e., you have a large number of transactions), you get a small number in the denominator and a larger z-score.
  38. The math phobes can open their eyes now. The bottom line is that a z-score above 3 means you can be very sure there is a lot of fraud going on at this MCC. If the z-score is less than -3, then you can be sure that fraud is really avoiding this MCC. So our tree model might pay a lot of attention to an MCC with a 10.7 z-score.
  39. I set up z-score tables for lots of primary variables, and then discard the primary variables. And some of the added variables. As fraud changes, update the z-scores; it’s like updating the model, but for less work. Usually, I keep a running fraud rate for, say, the POS state during the last month. I update this TABLE once a week. There is another advantage here. Comparing against z-scores makes for more compact rule text.
  40. There is another disadvantage here. Splitting on a large list like postal codes makes for HUGE rules. Some environments impose character limits on decline expressions. It would be much better to have some proxy to postal codes to make the expression shorter.
  41. Using this approach makes models last dramatically longer. Sure, lots of people will just retrain their model frequently, but that’s more work than updating a table. Also, in my hands most modeling technologies actually perform better with z-scores even directly out of the gate. It’s easier to do numeric comparisons for splits than split by a bunch of categories. The cleaner split usually means better capture.
  42. And this approach is not very hard. I keep a rolling calculator of the z-score for those frauds for, say, grocery stores over the last month. Every two weeks, I update the table and leave the model alone.
  43. So, in quick summary, we’ve added space and time variables, removed specific features that might change, and replaced them with z-scores.
  44. The upshot is you get models that last years, not months.
  45. Any questions?
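
Notes 17 to 19 describe the home-to-POS distance feature: look up a latitude and longitude for each postal code, then apply the Haversine formula. A minimal Python sketch of that step follows. The function names, the Earth-radius constant, and the tiny lookup table are assumptions for illustration; in practice the table would be loaded from a postal code file such as the GeoNames download listed in the slides, and the coordinates below are rough placeholders, not authoritative values.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before sqrt/asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical lookup: (country, postal code) -> (lat, lon).
# Coordinates are approximate and illustrative only.
POSTAL_LATLON = {
    ("US", "60601"): (41.89, -87.62),
    ("US", "58102"): (46.92, -96.83),
}


def home_to_pos_km(home_key, pos_key, table=POSTAL_LATLON):
    """Distance feature; returns None when either postal code is unknown."""
    home = table.get(home_key)
    pos = table.get(pos_key)
    if home is None or pos is None:
        return None
    return haversine_km(home[0], home[1], pos[0], pos[1])
```

Returning `None` for an unknown postal code keeps dirty country or postal fields visible to the modeler instead of silently turning them into a distance of zero.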
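
Notes 20 to 24 add Day of Week and Hour of Day as features derived from the transaction time stamp. One way this might look in Python is sketched below. The time stamp format and the `is_weekend` flag are assumptions, and the sketch ignores time zones, which a real feed would need to settle (card holder local time versus POS local time).

```python
from datetime import datetime

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Derive day-of-week and hour-of-day features from a time stamp string.

    The default format is an assumption; adjust it to the actual feed.
    """
    ts = datetime.strptime(timestamp, fmt)
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
    }
```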
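
Notes 31 to 39 replace a raw category such as MCC with a z-score that compares the category's fraud rate to the global rate, kept in a table that is refreshed without retraining the model. The sketch below is one possible reading of that idea, not the speaker's actual implementation. It follows the notes in dividing the rate difference by the sum of the two curve widths (a textbook two-proportion test would use the square root of the summed variances instead). It also assumes each width is a binomial standard error computed from the global rate, so that a tiny sample showing 0% or 100% fraud does not get a width of zero. The counts come from the fraud-rate table in the slides; treating those four rows as the whole population is an assumption made only to keep the example small.

```python
import math


def rate_width(rate, n):
    """Binomial standard error of a fraud rate over n authorizations."""
    return math.sqrt(rate * (1.0 - rate) / n)


def fraud_z_score(frauds, auths, global_frauds, global_auths):
    """z-score of a category's fraud rate against the global fraud rate."""
    if auths == 0 or global_auths == 0:
        return 0.0  # no evidence either way
    global_rate = global_frauds / global_auths
    diff = frauds / auths - global_rate
    # Widths use the global rate (assumption): few authorizations give a
    # wide curve, hence a z-score near zero.
    denom = rate_width(global_rate, auths) + rate_width(global_rate, global_auths)
    return diff / denom if denom > 0 else 0.0


def build_z_table(counts):
    """counts: {category: (frauds, auths)} -> {category: z-score}."""
    total_frauds = sum(f for f, _ in counts.values())
    total_auths = sum(a for _, a in counts.values())
    return {
        cat: fraud_z_score(f, a, total_frauds, total_auths)
        for cat, (f, a) in counts.items()
    }


def proxy_for(category, z_table, default=0.0):
    """Model input for a category; unseen categories get a neutral score."""
    return z_table.get(category, default)


# (frauds, auths) per MCC, copied from the fraud-rate table in the slides.
MCC_COUNTS = {
    "5734": (57, 7210),
    "5542": (256, 203053),
    "5732": (25, 3951),
    "6011": (47, 20565),
}

Z_TABLE = build_z_table(MCC_COUNTS)
```

A decline rule can then test the proxy against a single threshold, in the style of the slide's `If PROXY > 5.3`, instead of listing categories. Refreshing the table means rerunning `build_z_table` on recent counts while the model and its rules stay unchanged.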