Traditional online fraud prevention techniques have high rates of false positives. That means they identify as fraudulent, and turn away, a large number of good customers. You can increase your sales by simply using some of the newer, more accurate tools.
8. Positive - Scammer
Negative – Good customer
False Positive – We classify someone as a
scammer when they aren’t
Lose customers
False Negative – We classify someone as a good
customer when they are a scammer
Lose money
9. New Disease - Alexitis
• Very rare – only affects 1 in a million
people
• Luckily, we have a test that is 99%
accurate
• If they have Alexitis, test is positive 99% of
the time
• If they don’t have Alexitis, test is negative
99% of the time
10. I’ve just tested positive for
Alexitis. What are the
chances I actually have
it?
11. 99%, right? I’m screwed!
Would you believe .01%?
                 Has Alexitis          Does not have Alexitis     Total
Test Positive    1 (true positive)     10,000 (false positive)    10,001
Test Negative    0 (false negative)    989,999 (true negative)    989,999
Total            1                     999,999                    1,000,000
Paradox of the False Positive
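The cells of the table above can be rebuilt from three numbers: population size, how rare the condition is, and test accuracy. This is a minimal sketch that assumes, as the Alexitis slide does, that the test is 99% accurate in both directions. The function name `confusion_counts` is made up for illustration.

```python
def confusion_counts(population, prevalence, accuracy):
    """Expected counts for a test that is equally accurate on both groups."""
    sick = population * prevalence
    healthy = population - sick
    tp = sick * accuracy       # sick and test positive
    fn = sick - tp             # sick but test negative
    tn = healthy * accuracy    # healthy and test negative
    fp = healthy - tn          # healthy but test positive
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


counts = confusion_counts(1_000_000, 1 / 1_000_000, 0.99)
share = counts["tp"] / (counts["tp"] + counts["fp"])

# Rounded to whole people, these match the table cells.
print({name: round(value) for name, value in counts.items()})
print(f"Share of positives who really have it: {share:.4%}")
```

The share comes out at roughly one in ten thousand, which is the 0.01% figure from the slide.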
12. Conditional Probability
If you live in the United States, you
probably speak English
If you speak English, you probably don’t
live in the United States
13. IF YOU ARE TESTING FOR
SOMETHING THAT RARELY OCCURS,
YOUR TOOLS HAVE TO BE
REALLY, REALLY GOOD
Remember
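How good is "really, really good"? A rough sketch: take the Alexitis numbers and ask what false positive rate the test would need before a positive result is at least a coin flip. The function names and the 50% target are assumptions picked for illustration, not something from the slides.

```python
def posterior(prevalence, sensitivity, specificity):
    """P(condition | positive test) by Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def max_false_positive_rate(prevalence, sensitivity, target):
    """Largest false positive rate that still gives P(condition | positive) >= target."""
    return prevalence * sensitivity * (1 - target) / ((1 - prevalence) * target)


prevalence = 1 / 1_000_000   # 1 in a million, as on the Alexitis slide
sensitivity = 0.99           # catches 99% of real cases

fpr = max_false_positive_rate(prevalence, sensitivity, target=0.5)
print(f"Allowed false positive rate: {fpr:.2e}")
print(f"Check: posterior = {posterior(prevalence, sensitivity, 1 - fpr):.3f}")
```

Under these assumptions the test can only afford about one false alarm per million healthy people, compared with the one per hundred that a 99% accurate test produces.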
22. IP Geo-Location
Problem 2: Other Carriers
[Figure: envelope numbered 891889-11 next to a lookup table: 891888 United States, 891889 Nigeria, 891891 Luxembourg, 9999 FedEx]
23. IP Geolocation
• With “honest” users, IP Geolocation can be
somewhat accurate
• Nation: 95% - 99%
• City: 50% - 80%
• In terms of fraud prevention, it will only
catch the most clueless of fraudsters
• Essentially useless for mobile data
25. Proxy Detection
• Can catch known proxies
• Suffers from same database issues as
IP Geolocation
• ANY machine on the internet can be a
proxy
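A minimal sketch of why database-driven proxy detection only catches known proxies. The network list is hypothetical (these are reserved documentation ranges, not real proxies), and `is_known_proxy` is an illustrative name rather than any vendor's API.

```python
import ipaddress

# Hypothetical "known proxy" database. These are reserved documentation
# ranges, used here only as stand-ins.
KNOWN_PROXY_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_proxy(ip):
    """True only if the address falls inside a network already in the database."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in KNOWN_PROXY_NETWORKS)


print(is_known_proxy("192.0.2.44"))    # inside a listed range
print(is_known_proxy("203.0.113.7"))   # not listed, so it passes the check
```

The second address passes whether or not it is actually relaying traffic. A freshly compromised home machine acting as a proxy looks exactly like this to the lookup.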
26. Cookies
Once I find out you are a scammer, I sneak
into your house and put an X on your
envelopes, with invisible ink
[Figure: envelopes numbered 891889-11 and 891899-11, each marked with an X]
27. Cookies
• Will work if the scammer does nothing to
prevent it
• Can be prevented with a single click
• Useful for tracking customers, almost
useless for tracking fraudsters
29. Behavior Detection
• Very difficult to measure accurately
• Highly subject to false positives
• Almost any behavior that appears
suspicious can also have a legitimate
purpose
30. Browser Fingerprinting
I am going to measure the unique
characteristics of the paper, so I can
recognize the bad letters
31. Browser Fingerprinting
• Somewhat effective technique for tracking people
online
• Measures unique characteristics of your browser
(fonts, plug-ins, etc.) that are reported to web server
• Not well known among general public
• Generally not completely unique
• Will lead to false positives
• Not useful for mobile
• Trivial to circumvent
• Clean browser install
• Virtual machine
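A simplified sketch of the idea, assuming a fingerprint is just a hash of a few attributes the browser reports. Real fingerprinting uses many more signals. The attribute names and values here are invented for illustration.

```python
import hashlib


def fingerprint(attributes):
    """Hash a browser's reported attributes into a short identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values, for illustration only.
scammer = {
    "user_agent": "ExampleBrowser/1.0",
    "fonts": "Arial,Courier,Comic Sans",
    "plugins": "pdf,flash",
}
stock_install = {
    "user_agent": "ExampleBrowser/1.0",
    "fonts": "Arial,Courier",
    "plugins": "pdf",
}
good_customer = dict(stock_install)   # a different person with the default setup
scammer_in_vm = dict(stock_install)   # the scammer after a clean install or in a VM

# Circumvention: the scammer no longer matches their old fingerprint.
print(fingerprint(scammer) == fingerprint(scammer_in_vm))
# False positive: an innocent customer now shares the scammer's fingerprint.
print(fingerprint(good_customer) == fingerprint(scammer_in_vm))
```

The same clean install that hides the scammer also makes them look identical to every good customer with a default setup, which is where the false positives come from.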
33. Transactional Data Strengths
• Does not require user involvement or
knowledge
• Usually quick
• Can encompass many data points
• Does not affect the user experience
• Can be tested on sample data
34. Transactional Data Weaknesses
• Generally easy to work around
• Significant false positive rate
• Difficult to aggregate across platforms
36. Identity-Based Fraud Prevention
• In the real world, we want to know who we
are dealing with
• Personal recommendations are extremely
important
• Social context is extremely important
• However, online we have no identity
framework to leverage
37. FUNDAMENTALLY WE HAVE
BEEN SOLVING THE WRONG
PROBLEM
WE DON’T HAVE A TRANSACTION
PROBLEM, WE HAVE AN IDENTITY
PROBLEM
however
38. “No man is just of his own free
will [...] he will always do wrong
when he gets the chance. If
anyone who had the liberty [of
the ring of Gyges] neither
wronged nor robbed his
neighbor, men would think him
a most miserable idiot.”
- Plato
42. Extreme Identity: DoD Top Secret
Clearance
• Takes 1-2 years
• Involves ~ 40 pages of
documentation
• Leverages numerous federal
databases
• Involves dozens of interviews
with people who have known
you for
51. BeehiveID Advantages
• Ultra-low friction
• Selfies are easy!
• Uniqueness through biometrics
• NO private information whatsoever
• Supports trust through
connections between people
• One-step integration
52. Summary
• Classification problems are inherently fuzzy
• When the thing you are looking for is rare, you have to
be really precise
• Transactional data is dependent upon data effectively
provided by the scammers
• Results in high false positives, losing customers
• Is easy to circumvent by scammers
• Identity is the foundation of trust in the real world, and
can be used for trust online, with the right tools
• Must be low-friction
• Must preserve privacy
Let’s say you are a banker. You are concerned about a few different things. You need your bank to make money so you need deposits and you need to make loans. You need people to come in and make accounts. You know that some people may be doing things that are less than legal, but as long as they are making deposits you might not care. Or maybe you care deeply about that. In any case, you only have so much time to try to figure out why people are using your bank. You need those deposits.
But the government has told you exactly how much you need to care. They give you parameters on what you need to care about – cash deposits over $10,000, for example. But they don’t let you get off that easily – you also have to report “suspicious” activity.
Here are some examples of the complexities of determining suspicious activity. All of these could possibly be classified in any different way, depending upon context. A corporate check could be from a shell company doing money laundering and a $9,999 deposit could be legit.
FinCEN’s guidelines: http://www.fincen.gov/statutes_regs/guidance/pdf/msb_prevention_guide.pdf (fascinating stuff)
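The bright-line part of the rule is easy to write down; the "suspicious" part is not. Here is a hedged sketch using only the $10,000 cash threshold mentioned above. The `Deposit` type and function name are made up for illustration and are not a real compliance check.

```python
from dataclasses import dataclass

CASH_THRESHOLD = 10_000  # dollars, the example threshold from the notes above


@dataclass
class Deposit:
    amount: float
    is_cash: bool


def exceeds_cash_threshold(deposit):
    """Mechanical rule only: a cash deposit over the threshold."""
    return deposit.is_cash and deposit.amount > CASH_THRESHOLD


deposits = [
    Deposit(12_000, is_cash=True),    # caught by the rule
    Deposit(9_999, is_cash=True),     # just under: legit, or structured to dodge the rule?
    Deposit(50_000, is_cash=False),   # corporate check: real business, or a shell company?
]
for deposit in deposits:
    print(deposit, exceeds_cash_threshold(deposit))
```

The last two deposits pass the mechanical rule either way. Deciding whether they are suspicious takes context that the transaction itself does not carry.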
Binary classification is the process of separating things into two categories. In the graph on the right, a simple equation can perfectly separate the two classes. We want things to be this way, but unfortunately they rarely are.
In most real-world classification processes, the boundaries are much fuzzier, and the best we can do is catch some of the things on either side.
We can define “positive” and “negative” however we want. But since we are talking about fraud prevention/detection, we will define “positive” as someone being a scammer. That means negative is a good outcome – a good customer. You can think of it kind of like disease detection.
Using this definition, we want to avoid false positives because that means we are turning away good customers. We also want to avoid false negatives because it means we are letting in the scammers. Generally, when you try to optimize one, you make the other worse.
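The trade-off can be seen with a toy example. This is a sketch with invented risk scores and labels, assuming the classifier calls anyone at or above a threshold a scammer.

```python
# Hypothetical risk scores (0 = looks safe, 1 = looks risky) paired with
# the truth about whether the person is a scammer.
scored = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, False),
    (0.40, True), (0.55, False), (0.60, True), (0.70, False),
    (0.80, True), (0.95, True),
]


def errors_at(threshold, data):
    """Return (false positives, false negatives) when score >= threshold means 'scammer'."""
    fp = sum(1 for score, is_scammer in data if score >= threshold and not is_scammer)
    fn = sum(1 for score, is_scammer in data if score < threshold and is_scammer)
    return fp, fn


for threshold in (0.3, 0.5, 0.75):
    fp, fn = errors_at(threshold, scored)
    print(f"threshold {threshold}: {fp} good customers rejected, {fn} scammers accepted")
```

Raising the threshold rejects fewer good customers but lets more scammers through, and lowering it does the reverse. With overlapping scores like these, no threshold gets both numbers to zero.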
99% accurate is another way of saying 1% inaccurate. If we test 1,000,000 people, only one of them will actually have Alexitis. But since we are 1% inaccurate, we will falsely say about 10,000 of the other 999,999 people have it (0.01 * 999,999 ≈ 10,000).
Put in math terms, P(Alexitis | Positive) = 1 / 10,001 ≈ 0.01%
This is called the paradox of the false positive and it occurs in populations where the probability of an event is low.
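To see how much the rarity matters, here is a small sketch that holds the test at 99% accuracy and varies only how common the condition is. It assumes the test is equally accurate for people with and without the condition. The prevalence values other than 1 in a million are picked for illustration.

```python
def posterior(prevalence, accuracy=0.99):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)


for prevalence in (1e-6, 1e-3, 0.1, 0.5):
    print(f"prevalence {prevalence:g}: P(condition | positive) = {posterior(prevalence):.4%}")
```

The same test that is nearly useless at 1 in a million becomes trustworthy once the condition is common, which is why the paradox only bites when the event is rare.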
Conditional probability is actually quite simple, but most people don’t think about it when they are predicting what outcomes will happen.
Here are some other ones:
Here is a very simplified diagram of how the internet works. (ha)
Contrary to popular belief, the internet is not a series of tubes or even a series of wires. You may think of an internet connection as kind of like a phone call, but it is not. It is a series of distributed packets.
We’ll be using some mail analogies in the coming slides
The LA Times did an experiment where they had both anonymous comments and Facebook comments available for articles. The difference in the level of discourse was “stunning.” When people see their real name and face next to a comment, the civility of the discourse changes dramatically.
Ultimately, a security clearance is about trying to figure out who you are.