Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
© 2014 MapR Technologies 1
Talk Overview
• Agile Real-time Stats
• R + Storm
github.com/allenday/R-Storm
• DEMO
• How to d...
© 2014 MapR Technologies 2© 2014 MapR Technologies
Architecting R into the Storm
Application Development Process
© 2014 MapR Technologies 3
Allen (me) and Sungwook @ MapR
• Allen Day, Principal Data Scientist [ @allenday ]
7yr Hadoop d...
© 2014 MapR Technologies 4
What’s Storm? What’s R?
• What’s Storm?
– Processes a data stream. Akin to UNIX pipe + tee & me...
© 2014 MapR Technologies 5
R outside, Storm inside: not practical. Why?
• Model-building and QA is done
on data snapshots
...
© 2014 MapR Technologies 6
Storm outside, R inside: a good fit
• Enables separation of concerns
– Independently manage mod...
© 2014 MapR Technologies 7© 2014 MapR Technologies
Q: Who really likes statistics?
A: Baseball fans
A: Team Managers = Por...
© 2014 MapR Technologies 8
© 2014 MapR Technologies 9
Fresh Local Data Tonight!
© 2014 MapR Technologies 10
Famous Vintage Data
Oakland Athletics
2002 Season
20 consecutive
wins – the current
record
Obl...
© 2014 MapR Technologies 11© 2014 MapR Technologies
Goal: Detect “Moneyball” 2002 Winning Streak
© 2014 MapR Technologies 12
Methods:
Change Point Detection
Find natural breakpoints in a
time-series set of data points
R...
© 2014 MapR Technologies 13
GIFs to
MapR
Filesystem
Methods: R+Storm Demo Architecture
Storm Bolt
R online
change point
de...
© 2014 MapR Technologies 14© 2014 MapR Technologies
50-game sliding
window/buffer to
detect change points
Cumulative histo...
© 2014 MapR Technologies 15
Methods Details: How it’s done
• Uses R-Storm binding github.com/allenday/R-Storm
– Storm pack...
© 2014 MapR Technologies 16
Methods Details: Easy integration
R: lambda function
storm = Storm$new();
storm$lambda = funct...
© 2014 MapR Technologies 17
Results
• Change points are identified, but none for winning streak
– Not using score differen...
© 2014 MapR Technologies 18
Q&A
@allenday maprtech
allenday@mapr.com
Engage with us!
MapR
maprtech
linkedin.com/in/allenday
Upcoming SlideShare
Loading in …5
×

R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose

4,763 views

Published on

Architecting R into the Storm Application Development Process
~~~~~
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.

In this presentation, Allen will build a bridge from basic real-time business goals to the technical design of solutions. We will take an example of a real-world use case, compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution.

Published in: Sports
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hey guys! Who wants to chat with me? More photos with me here 👉 http://www.bit.ly/katekoxx
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose

  1. 1. © 2014 MapR Technologies 1 Talk Overview • Agile Real-time Stats • R + Storm github.com/allenday/R-Storm • DEMO • How to do it? • Q & A @allenday Agile Methods Advanced Statistics Continuous Real-time Delivery github.com/allenday/hadoop-summit-r-storm-demo-public
  2. 2. © 2014 MapR Technologies 2© 2014 MapR Technologies Architecting R into the Storm Application Development Process
  3. 3. © 2014 MapR Technologies 3 Allen (me) and Sungwook @ MapR • Allen Day, Principal Data Scientist [ @allenday ] 7yr Hadoop dev, 12yr R dev/author PhD, Human Genetics, UCLA Medicine • Sungwook Yoon, Data Scientist Spark & Security Expert PhD, Computer Engineering, Purdue • MapR [ @mapr ] Distributes open source components for Hadoop Adds major technology for performance, HA, industry standard APIs
  4. 4. © 2014 MapR Technologies 4 What’s Storm? What’s R? • What’s Storm? – Processes a data stream. Akin to UNIX pipe + tee & merge commands – Runs on a cluster. Fault-tolerant and designed to scale out – Used for: real-time analytics & machine learning • What’s R? – Programming language with advanced statistics libraries – Does not scale out. Can scale up – Used for: prototyping, data modeling, visualization How to combine these?
  5. 5. © 2014 MapR Technologies 5 R outside, Storm inside: not practical. Why? • Model-building and QA is done on data snapshots • However, R => Hadoop is realistic. Key difference: referenced data can be static – Use MapR snapshots for dev and QA – See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) R Storm User
  6. 6. © 2014 MapR Technologies 6 Storm outside, R inside: a good fit • Enables separation of concerns – Independently manage modeling, ops timelines, and version control – Integrate as needed • Enables role specialization – R built-ins allow faster iteration and more concise stats-type code – Do DevOps with specific SW engineering tech, e.g. Java Storm R User
  7. 7. © 2014 MapR Technologies 7© 2014 MapR Technologies Q: Who really likes statistics? A: Baseball fans A: Team Managers = Portfolio Managers
  8. 8. © 2014 MapR Technologies 8
  9. 9. © 2014 MapR Technologies 9 Fresh Local Data Tonight!
  10. 10. © 2014 MapR Technologies 10 Famous Vintage Data Oakland Athletics 2002 Season 20 consecutive wins – the current record Obligatory movie ref… I’m from LA LET’S GO DODGERS!
  11. 11. © 2014 MapR Technologies 11© 2014 MapR Technologies Goal: Detect “Moneyball” 2002 Winning Streak
  12. 12. © 2014 MapR Technologies 12 Methods: Change Point Detection Find natural breakpoints in a time-series set of data points R packages implement this: changepoint: more sensitve, but not streaming bcp: streaming, but less sensitive
  13. 13. © 2014 MapR Technologies 13 GIFs to MapR Filesystem Methods: R+Storm Demo Architecture Storm Bolt R online change point detector Storm Bolt (write to Jetty) Oakland A’s Data (accelerated) Jetty Webserver Browser (D3.js) Us  github.com/allenday/hadoop-summit-r-storm-demo-public
  14. 14. © 2014 MapR Technologies 14© 2014 MapR Technologies 50-game sliding window/buffer to detect change points Cumulative history with detected break points Raw data (score difference between A’s and opponent) Demo
  15. 15. © 2014 MapR Technologies 15 Methods Details: How it’s done • Uses R-Storm binding github.com/allenday/R-Storm – Storm package on CRAN cran.r-project.org/web/packages/Storm Storm (dev team) R (stats team) Storm (dev team, pure Java) Producer Consumer
  16. 16. © 2014 MapR Technologies 16 Methods Details: Easy integration R: lambda function storm = Storm$new(); storm$lambda = function(s) { t = s$tuple; t$output = vector(length=1); t$output[1] = “tada!” s$emit(t) } Storm: extend ShellBolt public static class MyRBolt extends ShellBolt implements IRichBolt { public RBolt() { super("Rscript", ”my.R"); } }
  17. 17. © 2014 MapR Technologies 17 Results • Change points are identified, but none for winning streak – Not using score difference, anyway • Time to integrate with the modeling team! – Send @kunpognr or @allenday a pull request on GitHub • Applicable to many other use cases, e.g. – Security (fraud detection, intrusion detection) – Marketing (intent to purchase / social media streams) – Customer Support (help desk voice calls) Discussion
  18. 18. © 2014 MapR Technologies 18 Q&A @allenday maprtech allenday@mapr.com Engage with us! MapR maprtech linkedin.com/in/allenday

×