© 2014 MapR Technologies 1
Talk Overview
• Agile Real-time Stats
• R + Storm
github.com/allenday/R-Storm
• DEMO
• How to d...
© 2014 MapR Technologies 2© 2014 MapR Technologies
Architecting R into the Storm
Application Development Process
© 2014 MapR Technologies 3
Allen (me) and Sungwook @ MapR
• Allen Day, Principal Data Scientist [ @allenday ]
7yr Hadoop d...
© 2014 MapR Technologies 4
What’s Storm? What’s R?
• What’s Storm?
– Processes a data stream. Akin to UNIX pipe + tee & me...
© 2014 MapR Technologies 5
R outside, Storm inside: not practical. Why?
• Model-building and QA is done
on data snapshots
...
© 2014 MapR Technologies 6
Storm outside, R inside: a good fit
• Enables separation of concerns
– Independently manage mod...
© 2014 MapR Technologies 7© 2014 MapR Technologies
Q: Who really likes statistics?
A: Baseball fans
A: Team Managers = Por...
© 2014 MapR Technologies 8
© 2014 MapR Technologies 9
Fresh Local Data Tonight!
© 2014 MapR Technologies 10
Famous Vintage Data
Oakland Athletics
2002 Season
20 consecutive
wins – the current
record
Obl...
© 2014 MapR Technologies 11© 2014 MapR Technologies
Goal: Detect “Moneyball” 2002 Winning Streak
© 2014 MapR Technologies 12
Methods:
Change Point Detection
Find natural breakpoints in a
time-series set of data points
R...
© 2014 MapR Technologies 13
GIFs to
MapR
Filesystem
Methods: R+Storm Demo Architecture
Storm Bolt
R online
change point
de...
© 2014 MapR Technologies 14© 2014 MapR Technologies
50-game sliding
window/buffer to
detect change points
Cumulative histo...
© 2014 MapR Technologies 15
Methods Details: How it’s done
• Uses R-Storm binding github.com/allenday/R-Storm
– Storm pack...
© 2014 MapR Technologies 16
Methods Details: Easy integration
R: lambda function
storm = Storm$new();
storm$lambda = funct...
© 2014 MapR Technologies 17
Results
• Change points are identified, but none for winning streak
– Not using score differen...
© 2014 MapR Technologies 18
Q&A
@allenday maprtech
allenday@mapr.com
Engage with us!
MapR
maprtech
linkedin.com/in/allenday
Upcoming SlideShare
Loading in...5
×

R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose

2,792

Published on

Architecting R into the Storm Application Development Process
~~~~~
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.

In this presentation, Allen will build a bridge from basic real-time business goals to the technical design of solutions. We will take an example of a real-world use case, compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution.

Published in: Sports
0 Comments
16 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,792
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
0
Comments
0
Likes
16
Embeds 0
No embeds

No notes for slide
  • FILL IN RED WITH CORRECT DETAILS
  • FILL IN RED WITH CORRECT DETAILS
  • Transcript of "R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose"

    1. 1. © 2014 MapR Technologies 1 Talk Overview • Agile Real-time Stats • R + Storm github.com/allenday/R-Storm • DEMO • How to do it? • Q & A @allenday Agile Methods Advanced Statistics Continuous Real-time Delivery github.com/allenday/hadoop-summit-r-storm-demo-public
    2. 2. © 2014 MapR Technologies 2© 2014 MapR Technologies Architecting R into the Storm Application Development Process
    3. 3. © 2014 MapR Technologies 3 Allen (me) and Sungwook @ MapR • Allen Day, Principal Data Scientist [ @allenday ] 7yr Hadoop dev, 12yr R dev/author PhD, Human Genetics, UCLA Medicine • Sungwook Yoon, Data Scientist Spark & Security Expert PhD, Computer Engineering, Purdue • MapR [ @mapr ] Distributes open source components for Hadoop Adds major technology for performance, HA, industry standard APIs
    4. 4. © 2014 MapR Technologies 4 What’s Storm? What’s R? • What’s Storm? – Processes a data stream. Akin to UNIX pipe + tee & merge commands – Runs on a cluster. Fault-tolerant and designed to scale out – Used for: real-time analytics & machine learning • What’s R? – Programming language with advanced statistics libraries – Does not scale out. Can scale up – Used for: prototyping, data modeling, visualization How to combine these?
    5. 5. © 2014 MapR Technologies 5 R outside, Storm inside: not practical. Why? • Model-building and QA is done on data snapshots • However, R => Hadoop is realistic. Key difference: referenced data can be static – Use MapR snapshots for dev and QA – See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) R Storm User
    6. 6. © 2014 MapR Technologies 6 Storm outside, R inside: a good fit • Enables separation of concerns – Independently manage modeling, ops timelines, and version control – Integrate as needed • Enables role specialization – R built-ins allow faster iteration and more concise stats-type code – Do DevOps with specific SW engineering tech, e.g. Java Storm R User
    7. 7. © 2014 MapR Technologies 7© 2014 MapR Technologies Q: Who really likes statistics? A: Baseball fans A: Team Managers = Portfolio Managers
    8. 8. © 2014 MapR Technologies 8
    9. 9. © 2014 MapR Technologies 9 Fresh Local Data Tonight!
    10. 10. © 2014 MapR Technologies 10 Famous Vintage Data Oakland Athletics 2002 Season 20 consecutive wins – the current record Obligatory movie ref… I’m from LA LET’S GO DODGERS!
    11. 11. © 2014 MapR Technologies 11© 2014 MapR Technologies Goal: Detect “Moneyball” 2002 Winning Streak
    12. 12. © 2014 MapR Technologies 12 Methods: Change Point Detection Find natural breakpoints in a time-series set of data points R packages implement this: changepoint: more sensitve, but not streaming bcp: streaming, but less sensitive
    13. 13. © 2014 MapR Technologies 13 GIFs to MapR Filesystem Methods: R+Storm Demo Architecture Storm Bolt R online change point detector Storm Bolt (write to Jetty) Oakland A’s Data (accelerated) Jetty Webserver Browser (D3.js) Us  github.com/allenday/hadoop-summit-r-storm-demo-public
    14. 14. © 2014 MapR Technologies 14© 2014 MapR Technologies 50-game sliding window/buffer to detect change points Cumulative history with detected break points Raw data (score difference between A’s and opponent) Demo
    15. 15. © 2014 MapR Technologies 15 Methods Details: How it’s done • Uses R-Storm binding github.com/allenday/R-Storm – Storm package on CRAN cran.r-project.org/web/packages/Storm Storm (dev team) R (stats team) Storm (dev team, pure Java) Producer Consumer
    16. 16. © 2014 MapR Technologies 16 Methods Details: Easy integration R: lambda function storm = Storm$new(); storm$lambda = function(s) { t = s$tuple; t$output = vector(length=1); t$output[1] = “tada!” s$emit(t) } Storm: extend ShellBolt public static class MyRBolt extends ShellBolt implements IRichBolt { public RBolt() { super("Rscript", ”my.R"); } }
    17. 17. © 2014 MapR Technologies 17 Results • Change points are identified, but none for winning streak – Not using score difference, anyway • Time to integrate with the modeling team! – Send @kunpognr or @allenday a pull request on GitHub • Applicable to many other use cases, e.g. – Security (fraud detection, intrusion detection) – Marketing (intent to purchase / social media streams) – Customer Support (help desk voice calls) Discussion
    18. 18. © 2014 MapR Technologies 18 Q&A @allenday maprtech allenday@mapr.com Engage with us! MapR maprtech linkedin.com/in/allenday

    ×