Hadoop and R Go to the Movies

  1. © 2014 MapR Technologies
  2. Agenda
      A sample problem
      A general approach
      Complications arise
      Light is cast on the villains
      Who flee from the scene
  3. Agenda Script
      A sample problem
      A general approach
      Complications arise
      Light is cast on the villains
      Who flee from the scene
  4. Model Building in a Nutshell
      Gather data
      Build models
      Predict future
      World domination!
      Fight fraud
      Save the planet ✔
  5. A Sample Problem
  6. Modeling Energy Use
      • Modeling office and home energy use can save energy
      • Guides retrofits
      • Finds bad leaks
      • Increases awareness and understanding of problems
      • Demonstrated savings of 20% or more
      • Savings = less CO2 = less warming of the planet
  7. Modeling Energy Use
      See ASHRAE RP-1050: http://bit.ly/1ovwGfy
  8. Modeling Energy Use (or not)
  9. Modeling Energy Use (complete hash)
  10. Some Notes on the Method
      • Can’t change the method, since it is an ASHRAE standard
      • Small changes in a cutoff can have a ragged effect on model fit
        – Linear methods are out of the question
        – Gradient-based methods get trapped in local minima
      • All parameters interact strongly
        – Can’t solve for one at a time
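Slide 10's point about cutoffs is easier to see with the model written out. The sketch below assumes the standard five-parameter change-point form (the "5P model" named on slide 29): a flat baseline between a heating cut point and a cooling cut point, plus a linear "wing" on each side. Function and parameter names are hypothetical, and the fitting details of the ASHRAE RP-1050 method are not reproduced here.

```python
from typing import Sequence


def five_param_model(temp: float, baseline: float, heat_cut: float,
                     cool_cut: float, heat_slope: float,
                     cool_slope: float) -> float:
    """Hypothetical 5P change-point model: energy use vs. outdoor temperature.

    Flat at `baseline` between the two cut points, rising linearly below
    `heat_cut` (heating wing) and above `cool_cut` (cooling wing).
    """
    heating = heat_slope * max(0.0, heat_cut - temp)
    cooling = cool_slope * max(0.0, temp - cool_cut)
    return baseline + heating + cooling


def sum_squared_error(params: Sequence[float], temps: Sequence[float],
                      energy: Sequence[float]) -> float:
    """Fit criterion over all five parameters (the 5-D search space)."""
    return sum((five_param_model(t, *params) - e) ** 2
               for t, e in zip(temps, energy))


# Usage with synthetic data generated from known (illustrative) parameters.
temps = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
true_params = (10.0, 12.0, 22.0, 2.0, 3.0)  # baseline, 2 cut points, 2 slopes
energy = [five_param_model(t, *true_params) for t in temps]
print(sum_squared_error(true_params, temps, energy))  # 0.0
```

Because of the `max(0, ·)` terms, this loss is only piecewise smooth in the cut points: its slope with respect to a cut point changes abruptly each time that cut point crosses an observed temperature. That is one concrete reason gradient-based fitting struggles here.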
  11. Evolutionary Algorithms
      • Basic algorithm:
          fill population with random solutions
          do {
              keep best x% of solutions
              mutate survivors to fill population
          } until happy with results
      • Works great
      • Converges very slowly
        – If mutation is too small, it takes many, many steps to find the best solution and can get trapped
        – If mutation is too big, it keeps jumping away from the optimum
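The basic loop on slide 11 translates almost line for line into Python. A minimal sketch, assuming real-valued parameter vectors, Gaussian mutation with one fixed size `sigma`, a fixed generation budget in place of "until happy", and a toy `sphere` loss standing in for the real energy-model fit (all names and defaults are hypothetical):

```python
import random
from typing import Callable, List

Vector = List[float]


def sphere(v: Vector) -> float:
    """Toy loss standing in for the real model-fit error."""
    return sum(x * x for x in v)


def evolve_fixed(loss: Callable[[Vector], float], dim: int,
                 pop_size: int = 100, keep_frac: float = 0.2,
                 sigma: float = 0.1, generations: int = 200,
                 seed: int = 0) -> Vector:
    """Basic evolutionary loop with one constant mutation size `sigma`."""
    rng = random.Random(seed)
    # fill population with random solutions (the [-5, 5] range is arbitrary)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
           for _ in range(pop_size)]
    n_keep = max(1, int(pop_size * keep_frac))
    for _ in range(generations):  # fixed budget instead of "until happy"
        # keep best x% of solutions
        survivors = sorted(pop, key=loss)[:n_keep]
        # mutate survivors to fill population
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            children.append([x + rng.gauss(0.0, sigma) for x in parent])
        pop = survivors + children
    return min(pop, key=loss)


best = evolve_fixed(sphere, dim=5)
print(sphere(best))
```

Here `sigma` is the only lever for trading step count against overshoot, which is exactly the dilemma in the slide's last two bullets.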
  12. Doesn’t work in practice
  13. Meta-Evolutionary Algorithms
      • Meta-mutation algorithm:
          fill population with random solutions
          do {
              keep best x% of solutions
              mutate survivors to fill population
              use mutation size to set mutation rate per candidate
          } until happy with results
      • Works great
      • Converges very fast
        – If a small jump works, we get more of that
        – If a big jump works, we get more of that
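One common way to realize the meta-mutation idea is self-adaptation as used in evolution strategies: each candidate carries its own mutation size, a child perturbs that size before using it, and sizes that produce good solutions survive along with them. This is a sketch under that assumption, not necessarily the exact scheme used in the talk; `tau`, the initial ranges, and the toy `sphere` loss are illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Candidate = Tuple[Vector, float]  # (solution, that candidate's own mutation size)


def sphere(v: Vector) -> float:
    """Toy loss standing in for the real model-fit error."""
    return sum(x * x for x in v)


def evolve_meta(loss: Callable[[Vector], float], dim: int,
                pop_size: int = 100, keep_frac: float = 0.2,
                generations: int = 200, seed: int = 0) -> Candidate:
    """Evolutionary loop where each candidate mutates its own step size."""
    rng = random.Random(seed)
    # step-size learning rate; a value of order 1/sqrt(dim) is a common heuristic
    tau = 1.0 / math.sqrt(dim)
    # fill population with random solutions, each with a step size in [1e-3, 1]
    pop: List[Candidate] = [
        ([rng.uniform(-5.0, 5.0) for _ in range(dim)],
         10.0 ** rng.uniform(-3.0, 0.0))
        for _ in range(pop_size)
    ]
    n_keep = max(1, int(pop_size * keep_frac))
    for _ in range(generations):
        # keep best x% of solutions; their step sizes survive with them
        survivors = sorted(pop, key=lambda c: loss(c[0]))[:n_keep]
        children: List[Candidate] = []
        while len(survivors) + len(children) < pop_size:
            parent, sigma = rng.choice(survivors)
            # mutate the mutation size first (log-normal), then use it
            child_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            child = [x + rng.gauss(0.0, child_sigma) for x in parent]
            children.append((child, child_sigma))
        pop = survivors + children
    return min(pop, key=lambda c: loss(c[0]))


best, best_sigma = evolve_meta(sphere, dim=5)
print(sphere(best), best_sigma)
```

Because step sizes are selected together with solutions, the population can shrink its steps near an optimum or grow them when big jumps pay off, which is the behavior slide 15 describes.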
  15. Meta-Evolutionary Algorithms
      • The algorithm may go the wrong way
      • It may take wrong-size steps
      • But it quickly learns to correct itself
      • Bad strategies die out along with bad solutions
  16. But There’s a Rub
      • This new algorithm may be gangbusters
        – But it comes with new knobs to turn
      • How can we tell which way to turn them?
      • How do we make sense of a seething mass of 5-dimensional spiders?
  17. We need to look inside
  18. Demo Reel Synopsis
      • Constant mutation rate failure example
      • Meta-mutation succeeds
      • Meta-mutation can handle highly correlated narrow valleys
      • Very complex landscapes can be navigated
      • Strategy shifts fluidly to find solutions
  19. Let’s put on a show!
  20. Not quite that simple
      • Current problem is 5-dimensional
      • Problem parameters don’t make sense directly
      • So we need to show the human face of the problem (that is where we started!)
      • We also need dynamics to understand how the algorithm gets where it goes
  21. Main-line Model and Visualization Flow
      (diagram: Data repo, grep, Solver, JSON model, d3 + twistd; labeled Conventional and Scalable)
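Slide 20 asks for dynamics, and slide 21 shows solver output travelling as a JSON model to a d3 front end served with twistd. A minimal sketch of the solver side, assuming a JSON-lines log with one record per generation; the schema (`generation`, `candidates`, `params`, `loss`) and the sample values are hypothetical, not the format used in the talk.

```python
import json
from typing import Dict, List, Sequence


def generation_record(generation: int,
                      population: Sequence[Sequence[float]],
                      losses: Sequence[float]) -> str:
    """One generation as a single JSON line (hypothetical schema)."""
    record: Dict[str, object] = {
        "generation": generation,
        "candidates": [
            {"params": list(params), "loss": loss}
            for params, loss in zip(population, losses)
        ],
    }
    return json.dumps(record)


def read_log(lines: Sequence[str]) -> List[dict]:
    """Parse a JSON-lines log back into per-generation records, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


log = [
    generation_record(0, [[10.0, 12.0, 22.0, 2.0, 3.0]], [4.5]),
    generation_record(1, [[10.1, 12.2, 21.9, 2.0, 3.1]], [1.25]),
]
records = read_log(log)
print(records[1]["candidates"][0]["loss"])  # 1.25
```

One line per generation keeps the log appendable while the solver runs, and a d3 script can replay the records in order to animate the population.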
  22. How does R make video?
  25. Diagnostic Visualizations
      (diagram, labeled Scalable: Solver, JSON model, Logs, ScaleR, ffmpeg)
  26. Of Note
      • RevoScaleR solves most of the parallelism issues
      • We still want to run arbitrary R
      • Some legacy functions are particularly unfriendly to HDFS
        – png(filename): requires conventional file access
        – system(command): assumes conventional file access
        – ffmpeg(1): assumes conventional file access
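Slide 26 lists the pieces of the video pipeline that need conventional file access: `png(filename)` to write frames, `system(command)` to shell out, and ffmpeg(1) to assemble the video. A Python sketch of that last step follows. It assumes numbered frames such as `frames/frame0001.png` already exist, that `ffmpeg` is on the PATH, and that the build includes the libx264 encoder; file names, frame rate, and codec are illustrative, not taken from the talk. Per slide 27, on MapR the same POSIX paths would work because the cluster is also reachable over NFS.

```python
import subprocess
from typing import List


def ffmpeg_command(frame_pattern: str, output: str, fps: int = 10) -> List[str]:
    """Build an ffmpeg argument list turning numbered PNG frames into H.264 video."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-framerate", str(fps),  # frame rate of the image sequence
        "-i", frame_pattern,     # printf-style pattern, e.g. frames/frame%04d.png
        "-c:v", "libx264",       # H.264 video encoder
        "-pix_fmt", "yuv420p",   # pixel format most players accept
        output,
    ]


def render_video(frame_pattern: str, output: str, fps: int = 10) -> None:
    """Run ffmpeg on ordinary POSIX paths (requires ffmpeg on PATH)."""
    subprocess.run(ffmpeg_command(frame_pattern, output, fps), check=True)


print(" ".join(ffmpeg_command("frames/frame%04d.png", "diagnostic.mp4")))
```

Passing an argument list to `subprocess.run` (rather than one shell string) avoids quoting problems with paths that contain spaces or shell metacharacters.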
  27. Simple Solution
      • MapR provides both HDFS and NFS access to the cluster
      • All path names are the same
      • MapReduce programs can use legacy POSIX code
  28. Diagnostic Videos
      • 5D x 100 can get trapped in a local minimum
        – ’470 example
      • 5D x 500 avoids trapping issues
        – ’470 quiescence and resurgence
      • 3D x 500 and 3D x 100 also avoid trapping
      • Need to distinguish an empty house from an occupied one
        – ’771 shows a poor fit to either regime, a classic real-world issue
  29. Lessons I Learned by Watching Movies
      • Lower-dimensional problems are easier
        – Evolve baseline level and cut-points, solve for wing slopes
        – Hybrid solutions are not “cheating”
      • Real-world data always has surprises, and I am always surprised by this
      • Can use 5P models as cluster “centroids” to handle 2-state homes
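The hybrid idea on slide 29 can be made concrete. Assuming the standard five-parameter change-point form (flat baseline, heating wing below one cut point, cooling wing above the other), once the baseline and both cut points are fixed the model is linear in the two wing slopes, so they can be found by ordinary least squares and only three parameters need to be evolved. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def wing_features(temps: Sequence[float], heat_cut: float,
                  cool_cut: float) -> Tuple[List[float], List[float]]:
    """Degrees below the heating cut point and above the cooling cut point."""
    heat = [max(0.0, heat_cut - t) for t in temps]
    cool = [max(0.0, t - cool_cut) for t in temps]
    return heat, cool


def solve_wing_slopes(temps: Sequence[float], energy: Sequence[float],
                      baseline: float, heat_cut: float,
                      cool_cut: float) -> Tuple[float, float]:
    """Least-squares wing slopes for a fixed baseline and fixed cut points.

    With heat_cut <= cool_cut no observation lies on both wings, so the
    two-slope least-squares problem splits into two one-slope problems.
    """
    if heat_cut > cool_cut:
        raise ValueError("expected heat_cut <= cool_cut")
    heat, cool = wing_features(temps, heat_cut, cool_cut)
    resid = [e - baseline for e in energy]

    def slope(x: List[float]) -> float:
        denom = sum(v * v for v in x)
        # No observations on this wing: the slope is unconstrained, report 0.
        if denom == 0.0:
            return 0.0
        return sum(v * r for v, r in zip(x, resid)) / denom

    return slope(heat), slope(cool)


def reduced_loss(params3: Sequence[float], temps: Sequence[float],
                 energy: Sequence[float]) -> float:
    """3-D loss for the evolutionary search: slopes are solved, not evolved."""
    baseline, heat_cut, cool_cut = params3
    s_heat, s_cool = solve_wing_slopes(temps, energy, baseline,
                                       heat_cut, cool_cut)
    heat, cool = wing_features(temps, heat_cut, cool_cut)
    return sum((baseline + s_heat * h + s_cool * c - e) ** 2
               for h, c, e in zip(heat, cool, energy))
```

An evolutionary loop using `reduced_loss` would need to keep the two cut points ordered, for example by sorting them before each evaluation. The baseline also enters the model linearly and could be solved the same way; the sketch follows the slide's split and leaves it to evolution.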
  30. And there’s a PRIZE in every box!
