Predictive Models
Energizer
Bunny
Models
Big Data Week - Randal Cox
Fill in Sparse Data
Use Data Indirectly
Predictive Models
Card Risk
Store Risk
Transaction Risk
Predictive Models
Transaction Risk
The Goal
Weekend Swiped >$100 ND, SD
Gas Station IL, IA, WI
Far Away
The Goal
Overcoming Sparse Data
Use Variables Smarter
Transaction
$$$
PIN OK?
Card Present?
Time stamp
Merchant
Type
State
Country
Postal
Card Holder
Postal
shops far away
shops odd hours
shops special stores
The Power
Cards near home
Fraudsters don’t know
or can’t spend there
Distance
Home => POS
The Price
POS Country
POS Postal
Home Country
Home Postal
Lat, Lon Lat, Lon
Lat/Lon For
World-Wide
Postal Codes
http://download.geonames.org/export/zip/
Haversine
Formula
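A minimal sketch of the Home => POS distance step, assuming both postal codes have already been resolved to latitude and longitude. The `POSTAL_LATLON` dictionary and the `home_to_pos_km` helper are hypothetical stand-ins for a table loaded from the GeoNames postal code files; the coordinates are approximate and for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical lookup: (country, postal code) -> (lat, lon), approximate values.
POSTAL_LATLON = {
    ("US", "60601"): (41.89, -87.62),  # Chicago, IL
    ("US", "58102"): (46.88, -96.79),  # Fargo, ND
}


def home_to_pos_km(home, pos, table=POSTAL_LATLON):
    """Distance between home and POS postal codes, or None if either is unknown."""
    if home not in table or pos not in table:
        return None
    return haversine_km(*table[home], *table[pos])
```

Returning `None` for unknown codes is a design choice of this sketch: dirty or missing postal codes are common, and the model should see "unknown" rather than a fabricated distance.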
[Figure: two time-series charts covering 1/1/04 to 12/2/04, y-axis 0 to 800,000: legitimate spend, then fraud spend]
Day of Week
SUN MON TUE
WED THU FRI SAT
[Figure: Fraud by Hour; x-axis hour of day (0 to 24), y-axis 0 to 100]
Risky Features
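A small sketch of how the two time features could be derived from the transaction time stamp. It assumes the stamp is an ISO-8601 string already expressed in the local time of the point of sale; the function name and the `is_weekend` flag are illustrative additions, not part of the original data set.

```python
from datetime import datetime

# datetime.weekday() numbers Monday as 0 and Sunday as 6.
DAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def time_features(timestamp):
    """Return day-of-week and hour-of-day features for an ISO-8601 time stamp."""
    dt = datetime.fromisoformat(timestamp)
    day = DAYS[dt.weekday()]
    return {
        "day_of_week": day,
        "hour_of_day": dt.hour,                # 0 to 23
        "is_weekend": day in ("SAT", "SUN"),   # illustrative extra feature
    }
```

For example, `time_features("2004-01-01T03:15:00")` yields Thursday at hour 3.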
The Goal
Weekend Swiped >$100 ND, SD
Gas Station IL, IA, WI
Far Away
Ordinality
Change
Ordinality
40,000 US Postal Codes
1,190,000 World Postal Codes
3,000 Merchant Types
If Postal Code IS IN 50137, 32567, 57776, 24942, 98349, 13627, 80203,
84726, 43571, 25004, 12964, 75051, 88595, 13774,
64499, 33074, 54021, 45055, 92628, 32086, 36079,
99780, 93639, 88560, 58274, 55403, 30811, 19056,
47370, 71138, 29481, 45319, 89154, 50058, 56051,
57791, 43611, 97630, 95894, 46504, 07028, 30477,
19726, 68781, 05763, 59462, 92198, 50432, 90055,
20626, .... 1000 more
If PROXY > 5.3
One Solution
Fraud Rates

Risky
MCC    Fraud Auths       Fraud Rate
5734    57 of   7210       79.06 bp
5542   256 of 203053       12.61 bp
5732    25 of   3951       63.28 bp
6011    47 of  20565       22.85 bp
5691    24 of   4199       57.16 bp
5999    39 of  18494       21.09 bp
5967    16 of   1320      121.21 bp
5972     9 of     43     2093.02 bp
0000    20 of   6141       32.57 bp
5964    15 of   2584       58.05 bp
5651    27 of  12836       21.03 bp

Safe
MCC    Fraud Auths       Fraud Rate
9399     4 of  11786        3.39 bp
5921     7 of  20044        3.49 bp
4121     5 of  16751        2.98 bp
7230     3 of  11866        2.53 bp
5912    27 of  61428        4.40 bp
5411   134 of 255195        5.25 bp
4784     0 of   7360        0.00 bp
5812    55 of 128521        4.28 bp
7832     0 of   9076        0.00 bp
7841     1 of  10970        0.91 bp
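The rates in the table are plain ratios expressed in basis points (1 bp = 0.01%). A minimal sketch of that arithmetic follows; the `(mcc, is_fraud)` record layout and both function names are assumptions made for illustration.

```python
from collections import Counter


def fraud_rate_bp(fraud_count, auth_count):
    """Fraud rate in basis points: fraudulent auths per 10,000 auths."""
    return 10_000 * fraud_count / auth_count


def rates_by_mcc(transactions):
    """Aggregate (mcc, is_fraud) pairs into a dict of MCC -> fraud rate in bp."""
    auths, frauds = Counter(), Counter()
    for mcc, is_fraud in transactions:
        auths[mcc] += 1
        if is_fraud:
            frauds[mcc] += 1
    return {mcc: fraud_rate_bp(frauds[mcc], n) for mcc, n in auths.items()}
```

Checking against the slide: `fraud_rate_bp(57, 7210)` rounds to 79.06 bp, and `fraud_rate_bp(9, 43)` rounds to 2093.02 bp, which is the roughly 20% MCC.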
If Fraud Score > 600
Decline
If Fraud Score > 900 and MCC == 9399
Decline
If Fraud Score > 200 and MCC == 5734
Decline
Bulk
LOW
RISK
HIGH
RISK
Can We Use the Fraud Rate Instead?
If Fraud Score > 900 and MCC Fraud < 0.02%
Decline
If Fraud Score > 200 and MCC Fraud > 10%
Decline
LOW
RISK
HIGH
RISK
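One way to read the three rules together is as a single score threshold that moves with the MCC fraud rate. The sketch below takes the thresholds (600, 900, 200) and the rate cut-offs (0.02%, 10%) from the slides; the table contents, the function name, and the choice to treat unknown MCCs as bulk are assumptions.

```python
# Hypothetical rate table (fractions, not bp), refreshed outside the model.
MCC_FRAUD_RATE = {
    "5972": 9 / 43,         # about 20.9%
    "4784": 0 / 7360,       # no fraud observed
    "5411": 134 / 255195,   # about 0.05%
}


def should_decline(fraud_score, mcc, rates=MCC_FRAUD_RATE):
    """Decline when the fraud score exceeds a threshold chosen by MCC fraud rate."""
    rate = rates.get(mcc)
    if rate is None:
        threshold = 600     # assumption: unknown MCCs get the bulk threshold
    elif rate < 0.0002:     # low risk: MCC fraud rate below 0.02%
        threshold = 900
    elif rate > 0.10:       # high risk: MCC fraud rate above 10%
        threshold = 200
    else:
        threshold = 600     # bulk
    return fraud_score > threshold
```

The rule text never names an MCC, so refreshing `MCC_FRAUD_RATE` changes behaviour without touching the rule itself.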
The Problem of
Small Numbers
Fraud Rate = 1/4
P[2 x 1] = 6%
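The 6% figure is (1/4) squared, or 6.25% before rounding. A short sketch of the same arithmetic using the binomial distribution shows how quickly the risk of a misleading 100% rate fades as the sample grows; the function names are illustrative.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k frauds in n independent records at true rate p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_all_fraud(p, n):
    """Chance that every one of n records is fraud, so the observed rate is 100%."""
    return binom_pmf(n, n, p)
```

With a true rate of 1/4, two records give `prob_all_fraud(0.25, 2) == 0.0625`, while ten records give a chance below one in a million.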
Removing Variables
Z = (M1 - M2) / (SD1 + SD2)
[Figure: fraud-rate distributions on an axis from 2 bp to 10 bp]
RISKY: Z = 10.7    SAFE: Z = -6.2
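A hedged sketch of the z-score as written on the slide, with M1 the fraud rate for one segment (for example one MCC) and M2 the global rate. The slide does not say how the SDs are obtained; here each is taken to be the binomial standard error sqrt(p(1-p)/n), which is an assumption. The denominator follows the slide and sums the two SDs; a textbook two-sample z-test would use the square root of the sum of their squares instead.

```python
import math


def rate_and_sd(fraud_count, auth_count):
    """Observed fraud rate and its binomial standard error (assumed SD)."""
    p = fraud_count / auth_count
    return p, math.sqrt(p * (1 - p) / auth_count)


def z_score(seg_fraud, seg_auths, global_fraud, global_auths):
    """Z = (M1 - M2) / (SD1 + SD2), segment versus global fraud rate."""
    m1, sd1 = rate_and_sd(seg_fraud, seg_auths)
    m2, sd2 = rate_and_sd(global_fraud, global_auths)
    denom = sd1 + sd2
    if denom == 0:
        return 0.0  # no spread on either side: treat as no evidence
    return (m1 - m2) / denom
```

One caveat of this sketch: a segment with zero observed fraud has SD1 = 0, so its z-score rests entirely on the global SD and may overstate confidence for small segments.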
Z-Score for
POS Country
POS State
POS Postal
POS MCC
Distance Intervals
Entry Mode
Authentication
Time of Day
Day of Week
Merchants
If Postal Code IS IN 50137, 32567, 57776, 24942, 98349, 13627, 80203,
84726, 43571, 25004, 12964, 75051, 88595, 13774,
64499, 33074, 54021, 45055, 92628, 32086, 36079,
99780, 93639, 88560, 58274, 55403, 30811, 19056,
47370, 71138, 29481, 45319, 89154, 50058, 56051,
57791, 43611, 97630, 95894, 46504, 07028, 30477,
19726, 68781, 05763, 59462, 92198, 50432, 90055,
20626, .... 1000 more
If Postal Code Z-Score > 8.3 and FraudScore > 300
then DECLINE
vs
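The compact form of the rule can be sketched as a table lookup followed by two numeric comparisons. The thresholds 8.3 and 300 come from the slide; the `POSTAL_Z` table, its values, and the default of 0.0 for unseen postal codes are illustrative assumptions.

```python
# Hypothetical z-score table keyed by POS postal code; values are made up.
POSTAL_Z = {"50137": 9.1, "32567": 0.4}


def decline(postal_code, fraud_score, z_table=POSTAL_Z, z_cut=8.3, score_cut=300):
    """Decline when the postal code z-score and the fraud score both exceed their cuts."""
    z = z_table.get(postal_code, 0.0)  # assumption: unseen codes look like background
    return z > z_cut and fraud_score > score_cut
```

The rule text stays the same length whether the table holds fifty postal codes or a million.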
Model Durability
Approach      Time to Decay
Direct Use    Weeks to Months
Z-Score       Years
Operationalize
Rolling Fraud Rate and Z-score
Update Tables in your Rules Environment
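A sketch of the table refresh, assuming transactions arrive as `(date, key, is_fraud)` tuples where `key` is whatever primary variable is being replaced (POS state, MCC, postal code). The 30-day window, the record layout, and the use of the binomial standard error for each SD are assumptions; the z-score itself follows the slide's (M1 - M2) / (SD1 + SD2).

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def build_zscore_table(transactions, as_of, window_days=30):
    """Recompute key -> z-score from transactions in the trailing window."""
    start = as_of - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # key -> [frauds, auths]
    total_fraud = total_auths = 0
    for txn_date, key, is_fraud in transactions:
        if not (start < txn_date <= as_of):
            continue  # outside the rolling window
        counts[key][0] += int(is_fraud)
        counts[key][1] += 1
        total_fraud += int(is_fraud)
        total_auths += 1
    if total_auths == 0:
        return {}
    m2 = total_fraud / total_auths
    sd2 = math.sqrt(m2 * (1 - m2) / total_auths)
    table = {}
    for key, (frauds, auths) in counts.items():
        m1 = frauds / auths
        sd1 = math.sqrt(m1 * (1 - m1) / auths)
        denom = sd1 + sd2
        table[key] = (m1 - m2) / denom if denom > 0 else 0.0
    return table
```

Running this on a schedule and pushing the result into the rules environment is the whole update; the decline rules and the model are left untouched.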
• Space and Time
• No Direct Features
• Add Z-Score
Energizer
Bunny
Models
randal@rippleshot.com
rippleshot.com

BDW Chicago 2016 - Randal Cox, Chief Scientist & Co-Founder, Rippleshot - Energizer Bunny Models: Variable Indirection for Eternally Robust Models

Editor's Notes

  1. Machine learning models are everywhere, doing everything. I’m guessing they are transforming the businesses of everyone here, right? And these models work great.
  2. Until they don’t. No matter how good your model, pretty soon the real world works its way in and the model starts to tank. That’s expensive, since building a good model still takes a bunch of your time and attention.
  3. I’m Randal Cox, the chief scientist and co-founder at Rippleshot. I want to talk today about ways to keep your models running a lot longer – for years even.
  4. About us. Rippleshot detects payment card data breaches, like Target or Home Depot. We trace fraudulent purchases back in time
  5. About us. Rippleshot detects payment card data breaches, like Target or Home Depot. We trace fraudulent purchases back in time
  6. … to where those cards all visited the same location. That’s where the card was stolen. Think of it like tracing food poisoning back to the greasy spoon.
  7. Rippleshot builds a lot of machine learning models
  8. We predict which **cards** are going to be used fraudulently soon, based on where and how they shop. We make models that predict if a store is likely to be breached soon. And in real time, we build payment card decline rules to stop fraud spends right NOW.
  9. Let’s consider the model that’s most important to card issuers. This model stops suspicious transactions in real time before the bank or merchant incurs any loss at all. Getting that right is very hard and really important. Let’s look at a concrete example.
  10. We are a decision tree shop. You’re probably using tree-like rules all the time. In this example, a payment far from home at a gas station is likely to be fraud, though even more likely on a weekday. Some nearby states with big dollar purchases are also risky.
  11. That said, all of what I’m saying today is equally applicable to other modeling techniques like neural nets.
  12. The reason I’m up here is I’ve been asked to share some of my big data insights. I’ve only got two, really. So this should be a short talk. I have some techniques for filling in feeble data, and a way to use those variables indirectly that makes your models last longer.
  13. Models are only as good as their data, and sometimes the data is TERRIBLE. One of our clients gave us VERY LITTLE about each transaction. It’s hard to make a great model out of that, so you’re going to have to augment this data.
  14. Luckily we know something about Fraudster behavior. He does things card holders do not.
  15. Basically, he often spends far from the consumer’s home. He shops at odd hours when there is less scrutiny. He likes launderable goods, like big-screen TVs. Let’s look at the where first.
  16. Let’s look at where first. You and I shop close to our homes usually. But the fraudsters might not know where home is or just don’t have presence on the ground there. So,
  17. Distance between home and the point of sale is incredibly predictive of fraud. It’s often my #1 variable in card present models
  18. Distance is a little hard to compute. You need clean country and postal codes for home and the POS. Then you need to look up the latitude and longitude of those postal codes, and then run some modestly complicated math to get the distance.
  19. Luckily the lat/lons for all worldwide postal codes are available for free. And the Haversine formula is a Google search away.
  20. Distance was a big win, let’s look at time. Here is the legitimate spend on a large cohort of cards. It’s almost like a pulse with a 1-week period and a 1-month automatic payments period. More regular than my heartbeat.
  21. But the fraud spend is often REALLY different. There are huge upswings over week-long periods and, even in more regular periods, the fraud is out of phase with the consumers.
  22. The fraud signal is likely to be drowned out by legitimate Friday payments, for example. But the fraudsters are often busy on days when card holders are not.
  23. Same thing with the time of day. Fraudsters seem to like the dark better than the sun.
  24. So, we now have Day of Week and Hour of Day as new features. The original data set included some things like states, postal codes, and merchant types (groceries or gas stations).
  25. A lot of modelers will use these variables DIRECTLY in the model like in that earlier example.
  26. Don’t do that
  27. There are two chief reasons for this: the problems of ordinality and of change.
  28. Ordinality (more often called cardinality) is just a fancy way of saying there are too many possible values for these variables to use directly. If you feed any modeling tech a column with more than a million possible categories, it is going to barf. Like, game over, usually. If you’re lucky, it will just perform poorly.
  29. There is another disadvantage here. Splitting on a large list like postal codes makes for HUGE rules. Some environments impose character limits on decline expressions. It would be much better to have some proxy to postal codes to make the expression shorter.
  30. The other problem is change. The fraudsters know you are trying to catch them, so they change as fast as possible. Unfortunately, you usually don’t get the post-it note about it. If you model directly on the state or merchant category, you’re locked in until you can build the next model, and that might take months. Fraud is faster than that.
  31. One way forward is to replace those primary variables – one layer of indirection. Instead of a postal code, give the model the fraud rate at this postal code.
  32. Here is a table of merchant categories. One MCC has an astounding 20% fraud rate in this data set. And another never has fraud. How would you roll that information into a fraud model?
  33. Let’s say the rest of your variables can be used to make a fraud score and you want to add this MCC data. For the data as a whole, you usually decline at a score of 600. You might be tempted to just decline more often in the risky MCC and maybe require a higher fraud score for the safe MCC. But the fraudsters will move from 5734 to 5735 the week after you implement your model.
  34. Better to use the fraud rate instead. Then you can update a table of fraud rates for all your MCCs and not change your model ITSELF.
  35. So that’s a step up, but you need to be careful. If the number of transactions is small, you can get a high fraud rate by chance. Let’s say your real fraud rate is 25% - roll a 1 on a 4-sided die. But if you only have two records, you might get unlucky and roll two 1’s on the two dice. There is a 6% chance you get two ones and think your fraud rate is 100% - a huge error!
  36. There is a simple way around this. If you hate math, close your eyes for the next slide
  37. For everyone else, z-scores encapsulate how sure we are that our observed rate is different from background. Really, we’re comparing the global rate (all MCCs) with the rate at this MCC. <click> So, if you’re running 4 bp in fraud overall, and this MCC is at 6 bp, just divide that 2 bp difference by the sum of the standard deviations for those two curves. If the width of the curve is very large, then the z-score decreases a lot: you’re not so sure about the result. If the width is small (i.e., you have a large number of transactions), you get a small number in the denominator and a large z-score.
  38. The math phobes can open their eyes now. The bottom line is that a z-score above 3 means you can be very sure there is a lot of fraud going on at this MCC. If the z-score is less than -3, then you can be sure that fraud is really avoiding this MCC. So our tree model might pay a lot of attention to an MCC with a 10.7 z-score.
  39. I set up z-score tables for lots of primary variables, and then discard the primary variables, along with some of the added variables. As fraud changes, update the z-scores: it’s like updating the model, but for less work. Usually, I keep a running fraud rate for, say, the POS state during the last month. I update this TABLE once a week. There is another advantage here. Comparing against z-scores makes for more compact rule text.
  40. There is another disadvantage here. Splitting on a large list like postal codes makes for HUGE rules. Some environments impose character limits on decline expressions. It would be much better to have some proxy to postal codes to make the expression shorter.
  41. Using this approach makes models last dramatically longer. Sure, lots of people will just retrain their model frequently, but that’s more work than updating a table. Also, in my hands most modeling technologies actually perform better with z-scores even directly out of the gate. It’s easier to do numeric comparisons for splits than split by a bunch of categories. The cleaner split usually means better capture.
  42. And this approach is not very hard. I keep a rolling calculator of the z-score for those frauds for, say, grocery stores over the last month. Every two weeks, I update the table and leave the model alone.
  43. So, in quick summary, we’ve added space and time variables, removed specific features that might change, and replaced them with z-scores.
  44. The upshot is you get models that last years, not months.
  45. Any questions?