To begin, the talk will briefly discuss how Rippleshot uses payment card transaction data and fraud records to trace back in time to where cards were stolen by hackers. Through the analysis of common points of purchase, Rippleshot is able to inform issuer partners which locations have been breached, which cards are most likely to be used fraudulently in the near term and should therefore be reissued, and which transactions should be declined.
In addition, the presentation will showcase a technique used extensively by Rippleshot in modeling. Instead of using primary variables like postal codes as indicators in models, they prefer to use risk indices derived from the primary variables, in order to future-proof the models.
For example, the fraudsters might be working out of a postal code in Florida right now, but they will change that soon in response to law enforcement or issuer declines. If that happens, a model using the postal code directly would quickly fail, requiring expensive rebuilding and, quite possibly, expensive model failures in the meantime.
Instead, Rippleshot experts collect the fraud rates for that postal code and use those as indicator variables, as opposed to the postal code itself. They can then update this fraud rate table continuously, without having to update the entire model as frequently. In turn, this makes for a far more robust response to dynamic fraud behaviors.
MCC fraud rates, risky vs. safe merchant categories (fraud count out of total auths; rates in basis points, 1 bp = 0.01%):

Risky MCCs:
MCC   Fraud  Auths    Fraud Rate
5734     57    7210     79.06 bp
5542    256  203053     12.61 bp
5732     25    3951     63.28 bp
6011     47   20565     22.85 bp
5691     24    4199     57.16 bp
5999     39   18494     21.09 bp
5967     16    1320    121.21 bp
5972      9      43   2093.02 bp
0000     20    6141     32.57 bp
5964     15    2584     58.05 bp
5651     27   12836     21.03 bp

Safe MCCs:
MCC   Fraud  Auths    Fraud Rate
9399      4   11786      3.39 bp
5921      7   20044      3.49 bp
4121      5   16751      2.98 bp
7230      3   11866      2.53 bp
5912     27   61428      4.40 bp
5411    134  255195      5.25 bp
4784      0    7360      0.00 bp
5812     55  128521      4.28 bp
7832      0    9076      0.00 bp
7841      1   10970      0.91 bp
Decline rules using the MCC directly:
Bulk:       If Fraud Score > 600, Decline
Low risk:   If Fraud Score > 900 and MCC == 9399, Decline
High risk:  If Fraud Score > 200 and MCC == 5734, Decline
Can We Use the Fraud Rate Instead?
Low risk:   If Fraud Score > 900 and MCC Fraud < 0.02%, Decline
High risk:  If Fraud Score > 200 and MCC Fraud > 10%, Decline
Machine learning models are everywhere, doing everything. I’m guessing they are transforming the businesses of everyone here, right?
And these models work great.
Until they don’t.
No matter how good your model, pretty soon the real world works its way in and the model starts to tank.
That’s expensive, since a good model takes a bunch of your time and attention to build.
I’m Randal Cox, the chief scientist and co-founder at Rippleshot.
I want to talk today about ways to keep your models running a lot longer – for years even.
About us.
Rippleshot detects payment card data breaches, like Target or Home Depot.
We trace fraudulent purchases back in time
… to where those cards all visited the same location. That’s where the card was stolen.
Think of it like tracing food poisoning back to the greasy spoon.
Rippleshot builds a lot of machine learning models
We predict which **cards** are going to be used fraudulently soon, based on where and how they shop.
We make models that predict if a store is likely to be breached soon.
And in real-time, we build payment card decline rules to stop fraud spends right NOW.
Let’s consider the model that’s most important to card issuers.
This model stops suspicious transactions in real time, before the bank or merchant incurs any loss at all. That is both very hard and very important to get right.
Let’s look at a concrete example.
We are a decision tree shop. You’re probably using tree-like rules all the time.
In this example, a payment far from home at a gas station is likely to be fraud, though even more likely on a weekday. Some nearby states with big dollar purchases are also risky.
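As a toy sketch of that kind of tree rule (every threshold and field name below is invented for illustration, not our production logic):

```python
def looks_fraudulent(txn: dict) -> bool:
    # Toy tree-style rule mirroring the example above; thresholds are made up.
    if txn["km_from_home"] > 500 and txn["mcc_group"] == "gas_station":
        if txn["day_of_week"] < 5:   # weekday: even more likely to be fraud
            return True
        return txn["amount"] > 200   # weekend: only flag big-dollar purchases
    return False
```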
That said, all of what I’m saying today is equally applicable to other modeling techniques like neural nets.
The reason I’m up here is I’ve been asked to share some of my big data insights. I’ve only got two, really. So this should be a short talk.
- I have some techniques for filling out feeble data
- and a way to use those variables indirectly that makes your models last longer
Models are only as good as their data, and sometimes the data is TERRIBLE
One of our clients gave us VERY LITTLE about each transaction.
It’s hard to make a great model out of that, so you’re going to have to augment this data.
Luckily, we know something about fraudster behavior. Fraudsters do things card holders do not.
Basically:
- they often spend far from the consumer’s home.
- they shop at odd hours, when there is less scrutiny.
- they like launderable goods, like big-screen TVs.
Let’s look at the “where” first. You and I usually shop close to our homes. But the fraudsters might not know where home is, or just don’t have a presence on the ground there. So,
Distance between home and the point of sale is incredibly predictive of fraud. It’s often my #1 variable in card-present models.
Distance is a little hard to compute.
You need clean country and postal codes for home and the POS. Then you need to look up the latitude and longitude of those postal codes, and then run some modestly complicated math to get the distance.
Luckily, the lat/lons for all worldwide postal codes are available for free, and the Haversine formula is a Google search away.
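Here’s a minimal sketch of that distance feature, assuming a tiny, made-up postal-code-to-lat/lon lookup (in practice you’d load a full table, e.g. the free GeoNames postal data):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Hypothetical lookup: postal code -> (lat, lon); real tables run to millions of rows.
ZIP_LATLON = {"60601": (41.8853, -87.6216), "33101": (25.7743, -80.1937)}

home = ZIP_LATLON["60601"]  # cardholder's home postal code (Chicago)
pos = ZIP_LATLON["33101"]   # point-of-sale postal code (Miami)
print(f"{haversine_km(*home, *pos):.0f} km from home")  # ~1900 km
```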
Distance was a big win, let’s look at time.
Here is the legitimate spend on a large cohort of cards. It’s almost like a pulse with a 1-week period and a 1-month automatic payments period.
More regular than my heartbeat.
But the fraud spend, is often REALLY different.
Huge upswings over week-long periods, and even its more regular cycles are out of phase with the consumers’.
The fraud signal is likely to be drowned out by legitimate Friday payments, for example. But the fraudsters are often busy on days when card holders are not.
Same thing with the time of day. Fraudsters seem to like the dark better than the sun.
So, we now have Day of Week and Hour of Day as new features.
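Deriving those two features is cheap. A sketch with pandas, assuming a hypothetical auth_ts timestamp column:

```python
import pandas as pd

# Hypothetical transactions frame with an authorization timestamp column.
txns = pd.DataFrame({"auth_ts": pd.to_datetime([
    "2024-03-01 02:13:00", "2024-03-02 14:40:00", "2024-03-08 03:05:00"])})

txns["day_of_week"] = txns["auth_ts"].dt.dayofweek  # 0 = Monday ... 6 = Sunday
txns["hour_of_day"] = txns["auth_ts"].dt.hour       # 0-23; fraud skews late-night
```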
The original data set included some things like states and postal codes and merchant types (groceries or gas stations)
A lot of modelers will use these variables DIRECTLY in the model like in that earlier example.
Don’t do that
There are two chief reasons for this: the problems of cardinality and of change.
Cardinality is just a fancy way of saying there are too many possible values for these variables to use directly.
If you feed any modeling tech a column with more than a million possible categories, it is going to barf. Like, game over, usually. If you’re lucky, it will just perform poorly.
There is another disadvantage here. Splitting on a large list like postal codes makes for HUGE rules. Some environments impose character limits on decline expressions. It would be much better to have some proxy to postal codes to make the expression shorter.
The other problem is change. The fraudsters know you are trying to catch them, so they change as fast as possible. Unfortunately, you usually don’t get the post-it-note about it.
If you model directly on the state or merchant category, you’re locked in until you can build the next model, and that might take months. Fraud is faster than that.
One way forward is to replace those primary variables – one layer of indirection.
Instead of a postal code, give the model the fraud rate at this postal code.
Here is a table of merchant categories. One MCC has an astounding 20% fraud rate in this data set. And another never has fraud.
How would you roll that information into a fraud model?
Let’s say the rest of your variables can be used to make a fraud score and you want to add this MCC data. For the data as a whole, you usually decline at a score of 600
You might be tempted to just decline more often in the risky MCC and maybe require a higher fraud score for the safe MCC.
But the fraudsters will move from 5734 to 5735 the week after you implement your model.
Better to use the fraud rate instead. Then you can update a table of fraud rates for all your MCCs and not change your model ITSELF.
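A sketch of that one layer of indirection, with rates seeded from the slide’s numbers (the table and fallback rate here are illustrative):

```python
# Hypothetical fraud-rate lookup (fractions, not basis points),
# refreshed on whatever schedule you like, independent of the model.
mcc_fraud_rate = {"5734": 0.0079, "5972": 0.2093, "5411": 0.0005}
GLOBAL_RATE = 0.0004  # fallback for MCCs with no history yet

def mcc_risk_feature(mcc: str) -> float:
    # The model sees this rate, never the raw MCC, so when fraudsters
    # move from 5734 to 5735 you update the table, not the model.
    return mcc_fraud_rate.get(mcc, GLOBAL_RATE)
```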
So that’s a step up, but you need to be careful.
If the number of transactions is small, you can get a high fraud rate by chance.
Let’s say your real fraud rate is 25% - roll a 1 on a 4-sided die. But if you only have two records, you might get unlucky and roll two 1’s on the two dice. There is a 6% chance you get two ones and think your fraud rate is 100% - a huge error!
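The arithmetic behind that claim, as a one-line check:

```python
# Chance that both of 2 transactions come up fraud when the true rate is 25%:
print(0.25 ** 2)  # 0.0625 -> about a 6% chance of seeing a "100% fraud rate"
```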
There is a simple way around this.
If you hate math, close your eyes for the next slide
For everyone else, z-scores encapsulate how sure we are that our observed rate is different from background.
Really, we’re comparing the global rate (all MCCs) with the rate at this MCC.
<click>
So, if you’re running 4 bp in fraud overall, and this MCC is at 6 bp, just divide that 2 bp difference by the sum of the standard deviations for those two curves.
If the width of the curve is very large, then the z-score decreases a lot – you’re not so sure about the result. If the width is small (i.e., you have a large number of transactions), you get a small number in the denominator.
The math phobes can open their eyes now.
The bottom line is that a z-score above 3 means you can be very sure there is a lot of fraud going on at this MCC.
If the z-score is less than -3, then you can be sure that fraud is really avoiding this MCC.
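Here’s a minimal sketch of that calculation. I’ve written it in the textbook two-proportion form, where the denominator combines the standard errors of the two observed rates; the MCC 5972 counts come from the slide, and the global totals are made up:

```python
from math import sqrt

def rate_z_score(fraud, auths, global_fraud, global_auths):
    """How many standard errors an MCC's fraud rate sits from the global rate."""
    p_mcc = fraud / auths
    p_all = global_fraud / global_auths
    # More auths -> smaller standard error -> the same rate gap
    # yields a larger (more confident) z-score.
    se = sqrt(p_mcc * (1 - p_mcc) / auths + p_all * (1 - p_all) / global_auths)
    return (p_mcc - p_all) / se

# MCC 5972 from the slide (9 frauds in 43 auths) vs. a made-up global 4 bp rate:
print(rate_z_score(9, 43, 600, 1_500_000))  # ~3.4, above the "very sure" bar of 3
```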
So our tree model might pay a lot of attention to an MCC with a 10.7 z-score.
I set up z-score tables for lots of primary variables, and then discard the primary variables themselves, along with some of the added variables.
As fraud changes, update the z-scores. It’s like updating the model, but with less work. Usually, I keep a running fraud rate for, say, the POS state during the last month, and I update this TABLE once a week.
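A sketch of that refresh, assuming a pandas frame with hypothetical auth_ts, pos_state, and is_fraud columns:

```python
import pandas as pd

def refresh_state_fraud_table(txns: pd.DataFrame, asof: pd.Timestamp) -> pd.Series:
    """Recompute last-30-day fraud rates per POS state; the model is untouched."""
    window = txns[txns["auth_ts"] >= asof - pd.Timedelta(days=30)]
    return window.groupby("pos_state")["is_fraud"].mean()

# Run weekly, e.g. from a scheduler, and overwrite the lookup the model reads:
# state_fraud_rate = refresh_state_fraud_table(txns, pd.Timestamp.now())
```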
There is another advantage here. Comparing against z-scores makes for more compact rule text.
Using this approach makes models last dramatically longer. Sure, lots of people will just retrain their model frequently, but that’s more work than updating a table.
Also, in my hands most modeling technologies actually perform better with z-scores even directly out of the gate. It’s easier to do numeric comparisons for splits than split by a bunch of categories. The cleaner split usually means better capture.
And this approach is not very hard. I keep a rolling calculation of the fraud z-score for, say, grocery stores over the last month.
Every two weeks, I update the table and leave the model alone.
So, in quick summary, we’ve
added space and time variables
removed specific features that might change
and replaced them with z-scores
The upshot is you get models that last years, not months.