Deep Learning for Fraud Detection
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC member for Apache Drill, ZooKeeper & others
VP of the Apache Incubator at the Apache Software Foundation
Email: tdunning@apache.org, tdunning@maprtech.com
Twitter: @ted_dunning
Goals for Today
• Explore the state of the art for deep-learning and fraud detection
• Separate at least some of the wheat from the chaff
• Provide some realistic guidance for getting results
• Play with cool stuff!
Agenda
• Motivation
• What are neural networks and deep learning?
• It can be simpler than you think
• But, no free lunch / you get what you pay for / other clever aphorism
• Some experiments
• Where to go from here
Motivation For Advanced Modeling in Fraud
• Neural networks have completely dominated credit card fraud detection since the late ’80s
– Random forests and tree ensembles are often used in other kinds of fraud and churn models
• The reason is that rule-based systems simply don’t work
– Well, they do work at first
– Fraudsters change tactics, you add rules, interaction mayhem ensues
• And learning algorithms really do work
– Fraudsters change tactics, you add features and retrain
So learning is good
But good learning is hard
And finding good features is really hard
Some Sample Features
• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System (AVS) or Card Verification Value (CVV2) mismatch
• Unusual region for card
• Unusual time-of-day relative to history
• Magstripe use if chip available
• (hundreds more)
Sequence-Based Features
• Plausible pattern matching (rent a car, pay for gas at airport)
• Probe transactions (gas in wrong place, pizza, big charge)
• Previous transaction at compromised merchant
• Card velocity
Key Problems
• Good guys need data … that means that fraudsters get first chance at bat
• Good guys are careful and test systems before releasing
• Bad guys have many low-risk transactions and can change methods quickly
• In some areas, fraudsters adapt techniques in hours
Making up features is easy
Finding features that add real lift is very hard
What are neural networks and deep learning?
• Start simple … imagine we have 20 features, 0 or 1
– Let’s yell “Fraud” if any of the features is a 1
– Houston, we have a model
• But this model isn’t any better than a rule
• Also doesn’t have any interesting Greek letters
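That “yell Fraud” rule is easy to write down. Here is a minimal illustrative sketch (the function names are ours, not the talk’s); framing it as a thresholded sum also shows the shape that learning will exploit on the next slides:

```python
def naive_model(features):
    """Yell 'Fraud' if any of the binary features is a 1."""
    return 1 if any(features) else 0

# The same rule as a threshold on an unweighted sum: a linear model
# with every weight pinned at 1. Learning will adjust these weights.
def weighted_model(features, weights=None):
    weights = weights if weights is not None else [1.0] * len(features)
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 1.0 else 0
```

Once the weights can differ per feature, the question becomes how to learn them from data.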
Real-world Intrudes
• We assumed all features are equally good
– What if some are kind of poor or weak?
• Can we weight different features more or less?
– Can we learn these weights from data?
Learning Works
• Yes. We can learn these models
• How we measure error is important
• We must have good features
• Even good features may need transformation
– Take logs of times and monetary values
– Subtract means, scale, bin values (sketched below)
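For concreteness, a minimal numpy sketch of those transformations; the values and bin edges are invented for the example:

```python
import numpy as np

# Toy transaction amounts and inter-arrival times (invented data).
amount = np.array([5.0, 12.5, 980.0, 7.25, 15000.0])
dt_sec = np.array([30.0, 3600.0, 5.0, 86400.0, 2.0])

# Take logs of times and monetary values: compresses heavy tails.
log_amount = np.log1p(amount)
log_dt = np.log1p(dt_sec)

# Subtract means and scale (standardize).
z_amount = (log_amount - log_amount.mean()) / log_amount.std()

# Bin values into coarse buckets (here, quartiles).
bins = np.quantile(log_amount, [0.25, 0.5, 0.75])
binned = np.digitize(log_amount, bins)   # 0..3 quartile bucket per charge
```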
Not Good Enough
• We need combinations of models
• Simple linear combinations aren’t subtle enough
• Enter multi-level models
– Can we learn a model that uses combinations of inputs
– Where each of those combinations is a model that we learn?
Yes, Virginia, There IS a Santa Claus
Each circle is a sum and a (soft) threshold; arrows are multiplication by a learned weight.
Errors on Output Can Propagate
Each circle sends error back along each arrow; arrows weight the back-propagating errors. (Figure: inputs feeding a hidden layer.)
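The whole two-slide story fits in a few lines of numpy. This is a generic sketch, not the talk’s code; the toy data, layer sizes, and learning rate are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 20 binary features; label is 1 if any "strong" feature fires.
X = rng.integers(0, 2, size=(1000, 20)).astype(float)
y = (X[:, :5].sum(axis=1) > 0).astype(float).reshape(-1, 1)

# Learned weights: the arrows in the diagram.
W1 = rng.normal(0, 0.1, size=(20, 8))   # input -> hidden
W2 = rng.normal(0, 0.1, size=(8, 1))    # hidden -> output

for step in range(2000):
    # Forward pass: each unit is a weighted sum plus a soft threshold.
    h = sigmoid(X @ W1)            # hidden-layer activations
    p = sigmoid(h @ W2)            # predicted probability of fraud

    # Backward pass: output error propagates back through the weights.
    d_out = (p - y) / len(X)                # gradient of mean cross-entropy
    d_hid = (d_out @ W2.T) * h * (1 - h)    # error weighted by the arrows

    W2 -= 1.0 * (h.T @ d_out)      # gradient-descent updates
    W1 -= 1.0 * (X.T @ d_hid)

print("final accuracy:", ((p > 0.5) == y).mean())
```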
Success!
Triumph!
World domination!
World domination!
With some reservations because features are hard
Turtles All the Way Down – We Wish
• This learning works well for just a few layers
• This is still a big deal …
– with cool features, we can build real systems
• With many layers, the learning no longer converges
• Well … until recently
Model Learning in an Ideal World
• If we could just learn the features
– Maybe unsupervised, maybe supervised
– And at the same time learn the model
• Presumably we could build models quicker
• And more easily
• And we wouldn’t have to dirty our minds with pedestrian domain knowledge
Example 1 – (not very) Deep Auto-encoder
• Let’s take an example where we can learn features
• Data is EKG traces
• We want to find anomalies
– No supervised training
Spot the Anomaly
Anomaly?
Maybe not!
Where’s Waldo?
This is the real anomaly
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
• The real world is rarely so accommodating
$x \sim m(t) + \mathcal{N}(0, \varepsilon)$
We Do Windows
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding shapes from the dictionary (a sketch follows)
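A minimal sketch of the windowing-plus-clustering idea. The window width, step, and k are arbitrary here, and the signal is a synthetic stand-in; the real demo lives in the repo cited on the k-means caveats slide below:

```python
import numpy as np
from sklearn.cluster import KMeans

def windows(signal, width=32, step=8):
    """Slice a 1-d signal into overlapping windows."""
    idx = np.arange(0, len(signal) - width, step)
    return np.stack([signal[i:i + width] for i in idx])

# Toy quasi-periodic "EKG-like" signal.
t = np.linspace(0, 100, 10_000)
signal = np.sin(2 * np.pi * t) + 0.3 * np.sin(6 * np.pi * t)

W = windows(signal)
W = W - W.mean(axis=1, keepdims=True)        # remove per-window offset

# The cluster centroids are the dictionary of common shapes.
dictionary = KMeans(n_clusters=20, n_init=5).fit(W)

# Encode windows by their nearest shape; reconstruction error is the
# distance to that shape.
labels = dictionary.predict(W)
recon = dictionary.cluster_centers_[labels]
err = np.linalg.norm(W - recon, axis=1)
print("mean reconstruction error:", err.mean())
```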
Most Common Shapes (for EKG)
Reconstructed Signal
(Figure: original signal, reconstructed signal, and reconstruction error; error is < 1 bit/sample.)
An Anomaly
The original technique for finding 1-d anomalies works against the reconstruction error
Close-up of anomaly
Not what you want your heart to do. And not what the model expects it to do.
A Different Kind of Anomaly
Some k-means Caveats
• But Eamonn Keogh says that k-means can’t work on time-series
• That is silly … and also kind of correct: k-means does have limits
– Other kinds of auto-encoders are much more powerful
• More fun and code demos at
– https://github.com/tdunning/k-means-auto-encoder
• Keogh’s paper: http://www.cs.ucr.edu/~eamonn/meaningless.pdf
The Limits of Clustering as Auto-encoder
• Clustering is like trying to tile your sample distribution
• Can be used to approximate a signal
• Filling a d-dimensional region with k clusters should give $\varepsilon \approx k^{-1/d}$
• If d is large, this is no good: with $d = 10$, halving the error takes $2^{10} \approx 1000\times$ more clusters
(Figure: time-series training data, first 2000 samples, plotted against time; test data shown with its reconstruction error.)
(Figure: reconstruction error for time-series data as the number of centroids grows from 0 to 2000; MAV error from 0.00 to 0.15 for training data and held-out data.)
Moral For Auto-encoders
• The simplest auto-encoders can be good models
• For more complex spaces/signals, more elaborate models may be required
– Winner take (absolutely) all may be problematic
– In particular, models that allow sparse linear combination may be better
• Consider deep learning, recurrent networks, denoising
How Does Clustering Do Reconstruction?
(Figure: input nodes x1 x2 … xn-1 xn.)
For normalized cluster centroids, dot-product and distance are equivalent
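The equivalence is one line of algebra: for a centroid $c$ with $\|c\| = 1$,

```latex
\|x - c\|^2 = \|x\|^2 - 2\, x \cdot c + \|c\|^2 = \|x\|^2 + 1 - 2\, x \cdot c
```

so minimizing distance over centroids is the same as maximizing the dot-product $x \cdot c$.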
(Same figure.) Winner takes all with k-means
(Figure: input x1 … xn, a hidden layer of clusters, and a reconstruction x'1 … x'n.)
Dot-product scales the centroid to reconstruct
AKA - Neural Network
(The same figure read as a network: input layer, hidden layer of clusters, reconstruction layer.)
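To make the “AKA” concrete, here is winner-take-all reconstruction written as that one-hidden-layer network. This is a sketch only; the centroid matrix C is random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)

# k = 20 unit-norm centroids in n = 32 dimensions (stand-in values).
C = rng.normal(size=(20, 32))
C /= np.linalg.norm(C, axis=1, keepdims=True)

x = rng.normal(size=32)          # an input window

# Hidden layer: dot-products with the centroids. With unit-norm
# centroids, the largest dot-product marks the nearest centroid.
a = C @ x

# Winner-take-all activation: a one-hot hidden layer holding the
# winning dot-product, zeros elsewhere.
h = np.where(np.arange(len(a)) == np.argmax(a), a, 0.0)

# Reconstruction: the dot-product scales the winning centroid.
x_hat = C.T @ h
print("reconstruction error:", np.linalg.norm(x - x_hat))
```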
What If … We Had More Layers?
(Figure: a deeper stack of layers, A → B → A'.)
Other Thoughts
• What if we allow more than one cluster to be active?
– k-sparse learning!
• Well, almost
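“Almost”, because k-sparse auto-encoders keep the k strongest activations rather than just the winner. A minimal sketch of that activation (k and the values are arbitrary):

```python
import numpy as np

def k_sparse(a, k=3):
    """Keep the k largest hidden activations, zero the rest."""
    out = np.zeros_like(a)
    top = np.argpartition(a, -k)[-k:]   # indices of the k largest entries
    out[top] = a[top]
    return out

a = np.array([0.1, 0.9, 0.3, 0.7, 0.05, 0.6])
print(k_sparse(a, k=2))   # only the two strongest clusters stay active
```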
The Point of Deep Learning
• It isn’t just many hidden layers in a neural network
• The goal is to eliminate feature engineering by learning features as well as the classifier
Experiment 3 – Card Velocity
• Most features so far are inherent in the data
• Few are true sequence features
• Card velocity is a pure combination
– Starting point can be anywhere
– The issue is where the next point is relative to starting point
Card Velocity
Non-fraud steps are reasonable in terms of distance and time.
Fraudulent use of a card by multiple attackers results in big, fast jumps.
Synthetic Data Example
• Generate random point
• Take four small steps
• If fraud, second step can be large
• Result is five positions, each in 3-d on surface of a sphere
– Data shape is N x (5 x 3)
• Add secondary features containing step size … N x 4 (generator sketched below)
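A sketch of a generator matching that description; the step sizes and fraud rate are our assumptions, not the talk’s exact values:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unit(n):
    """n random points on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def walk(n, fraud):
    """Five positions on the sphere; fraud makes the second step big."""
    pos = [random_unit(n)]
    for step in range(4):
        scale = np.where(fraud & (step == 1), 1.0, 0.05)  # big jump if fraud
        nxt = pos[-1] + scale[:, None] * rng.normal(size=(n, 3))
        pos.append(nxt / np.linalg.norm(nxt, axis=1, keepdims=True))
    return np.stack(pos, axis=1)                    # shape (n, 5, 3)

n = 100_000
fraud = rng.random(n) < 0.01
X = walk(n, fraud)                                  # raw features: N x (5 x 3)

# Secondary step-size features: N x 4 chord lengths between positions.
steps = np.linalg.norm(np.diff(X, axis=1), axis=2)
```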
The Truth is Out There
• With the right feature (step-size), it is trivial to spot the fraud
• Here we show the step size between positions
• Fraud cases take a big jump that others don’t
• But they can be anywhere
But Dimensionality Bites Hard
• With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy
• Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress, could do better with lots more tuning)
• The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise diffs combined non-linearly into a distance (written out below)
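Concretely, with positions $p_i = (x_i, y_i, z_i)$ on the sphere, the step-size feature the model must discover is the chord length

```latex
d_i = \sqrt{(x_{i+1}-x_i)^2 + (y_{i+1}-y_i)^2 + (z_{i+1}-z_i)^2},
\qquad i = 1, \dots, 4
```

Nothing in the 15 raw coordinates hints at this particular combination; the model has to find it on its own.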
(Figure: AUC and precision vs. data size, from 10^4 to 10^6 samples.)
We have a bona fide revolution
But old tricks still pay
Greenfield Problem Landscape
Mature Problem Landscape
Summary
• There is too much to say in 40 minutes; let’s talk some more at the MapR booth
• Deep learning, especially with systems like TensorFlow, has huge promise
• Deep learning trades feature engineering for learning-architecture engineering
• There are powerful middle grounds
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014–2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
© 2014 MapR Technologies 75
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
http://bit.ly/mapr-ebook-streams
© 2014 MapR Technologies 76
Thank You!
© 2014 MapR Technologies 77
Q&A
Engage with us!
@mapr maprtech
tdunning@maprtech.com
MapR, maprtech, mapr-technologies
