
Big data real time R - useR! 2013 - David Smith



  1. Big-data, real-time R? Yes, you can! David Smith, Revolution Analytics (@revodavid)
  3. Real-time deployment
     1. Data distillation
     2. Model development and validation
     3. Model deployment
     4. Real-time model scoring
     5. Model refresh
  4. "Big Data" (photo: Sarah&Boston, flickr: pocheco, Creative Commons BY-SA 2.0)
  5. Step 1, data distillation in Hadoop: unstructured data (log files, sensor streams, language text) is loaded into HDFS and distilled with map-reduce (RHadoop rmr) into structured data for an analytics data mart.
  6. Step 2, the model development cycle: feature selection, sampling, aggregation, variable transformation, model estimation, model refinement, and model comparison/benchmarking turn structured data into a predictive model. (R white paper)
  7. Big-data predictive models with ScaleR
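  8. Step 3, deployment options for turning factors into scores: when the factors are known in advance, scores can be computed in batch into lookup tables; when the factors are unknown, scoring runs in a SQL / rules engine, in code (C++, Java, R, Hadoop), or in a PMML engine.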
  9. Step 4, real-time scoring: factors (user ID, browser, time/date/location, previous purchases, friend data, any known information) feed a predictive model (decision tree, logistic regression, neural network, k-means clustering, or ensemble model) and scoring rules, which return a prediction or selection: product of most interest, offer of most likely sale, most relevant link, forecast sale value, or optimal bid. (Image: "IO VAPOURA" by Jaya Prime, CC-BY 2.0)
  10. Step 5, model refresh: factors, scores, and actual outcomes feed back into the model.
  11. Big data vs. real time (chart: data size/rate ranging from kilobytes/sec and megabytes/sec up to gigabytes, terabytes, petabytes, and exabytes, against latency ranging from milliseconds and seconds to minutes and hours)
  12. Real-world examples: Revolution Analytics case studies
  13. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; coupon in the mail; "just moved" promo email; webstore recommendation; browsing catalog.
  14. UpStream: attribution modeling
  15. UpStream workflow (UpStream data format; custom variables in PMML):
      • ETL: marketing channel data, behavioral variables, promotional data, overlay data
      • Exploratory data analysis
      • Time-to-event models
      • GAM survival models
      • Scoring for inference and scoring for prediction: 5 billion scores per day per retailer
  16. ACI: top-20 mutual fund company, $125B in assets, research- and data-driven, innovative
  17. Collaboration, speed, deployment process, adoption, results. Architecture diagram components: data feeds (market data from Thomson Reuters QA-Direct, American Century quant proprietary data, additional third-party data vendors); the rACI package (with RevoR), comprising a data acquisition function library, a model building function library, an analytics function library, and a portfolio optimization and simulation API; live analytics; production model generation and trading processes.
  19. Revolution Analytics: the leading enterprise provider of software and services for Open Source R. +1 650 646 9545, Twitter: @RevolutionR
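Slide 5 describes distilling unstructured logs in HDFS with map-reduce (via RHadoop's rmr package) into structured records. A minimal sketch of the map, shuffle, and reduce phases in plain Python follows, run on an in-memory list rather than HDFS. The `user_id,page,load_ms` log format and all function names are hypothetical, and this is not rmr's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw log format: "user_id,page,load_ms"

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (user_id, load_ms) for each well-formed line; malformed lines are skipped."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        yield parts[0], int(parts[2])

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key, as a map-reduce framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, dict]:
    """Distil each user's raw events into one structured record."""
    return {
        user: {"visits": len(vals), "mean_load_ms": sum(vals) / len(vals)}
        for user, vals in groups.items()
    }

def distill(lines: Iterable[str]) -> Dict[str, dict]:
    return reduce_phase(shuffle(map_phase(lines)))

logs = ["u1,home,120", "u2,cart,300", "u1,search,80", "garbage line"]
print(distill(logs))
```

In a real cluster the shuffle is done by the framework across machines; here it is an explicit function so the three phases are visible.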
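Slide 6's development cycle (sampling, model estimation, comparison/benchmarking) can be illustrated with a toy hold-out benchmark. This sketch assumes a single numeric predictor and compares a mean-only baseline with an ordinary least-squares line by test-set mean squared error; the function names, synthetic data, and 25% hold-out fraction are illustrative choices, not taken from the talk.

```python
import random

def train_test_split(rows, test_frac=0.25, seed=1):
    """Random hold-out sample: returns (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_line(train):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

def benchmark(rows, candidates):
    """Fit each candidate on the training split and score it on the hold-out split."""
    train, test = train_test_split(rows)
    return {name: mse(fit(train), test) for name, fit in candidates.items()}

rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(40)]
print(benchmark(data, {"mean": fit_mean, "line": fit_line}))
```

Adding a candidate (for example, a model on transformed variables) is just another entry in the `candidates` dictionary, which keeps the comparison step uniform.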
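Slide 8 contrasts batch lookup tables (factors known in advance) with scoring engines (unknown factors). A hedged sketch: precompute scores for every combination of a few hypothetical categorical factors, serve them with a dictionary lookup, and fall back to live scoring for combinations missing from the table. The `score` function and its factor values are made-up stand-ins for a fitted model.

```python
from itertools import product

SEGMENTS = ["new", "returning"]
CHANNELS = ["email", "web"]

def score(segment: str, channel: str) -> float:
    """Hypothetical model: unknown channels get no boost; unknown segments raise KeyError."""
    base = {"new": 0.10, "returning": 0.30}[segment]
    boost = {"email": 0.05, "web": 0.02}.get(channel, 0.0)
    return round(base + boost, 4)

def build_lookup_table():
    """Batch step: score every combination of factors known in advance."""
    return {(s, c): score(s, c) for s, c in product(SEGMENTS, CHANNELS)}

def serve(table, segment, channel):
    """Real-time step: constant-time lookup, with live scoring for unseen combinations."""
    key = (segment, channel)
    if key in table:
        return table[key]
    return score(segment, channel)

table = build_lookup_table()
print(serve(table, "returning", "email"), serve(table, "new", "tv"))
```

The lookup path needs no model code at serving time, which is why it suits factors that can be enumerated ahead of time.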
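Slide 9 lists logistic regression among the models used for real-time scoring, followed by scoring rules that select, for example, the product of most interest. A minimal sketch, assuming one logistic-regression model per product with invented coefficients and a simple "highest probability, above a threshold" rule:

```python
import math

# Hypothetical per-product logistic-regression models (invented coefficients).
MODELS = {
    "blender": {"intercept": -2.0, "weights": {"previous_purchases": 0.4, "is_mobile": -0.3}},
    "toaster": {"intercept": -1.5, "weights": {"previous_purchases": 0.1, "is_mobile": 0.2}},
}

def predict(model, factors):
    """Purchase probability; factors missing from the request count as 0."""
    z = model["intercept"] + sum(
        w * factors.get(name, 0.0) for name, w in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def best_offer(factors, min_prob=0.0):
    """Scoring rule: the product with the highest probability, or None below min_prob."""
    probs = {item: predict(m, factors) for item, m in MODELS.items()}
    item = max(probs, key=probs.get)
    return item if probs[item] >= min_prob else None

print(best_offer({"previous_purchases": 5, "is_mobile": 0}))
```

Scoring a request is a handful of multiplications per model, which is what makes millisecond-scale responses feasible once the coefficients have been estimated offline.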
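Slide 10 closes the loop by feeding actual outcomes back into the model. One way to do that, assumed here rather than stated on the slide (a periodic batch refit is equally plausible), is an online stochastic-gradient update of a logistic regression on log-loss. The function names and learning rate are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def refresh(weights, bias, batch, lr=0.1):
    """One SGD pass on log-loss over (factors, outcome) pairs, outcome in {0, 1}.

    Returns updated (weights, bias); the input weights dict is not mutated.
    """
    w = dict(weights)
    b = bias
    for factors, outcome in batch:
        p = sigmoid(b + sum(w.get(k, 0.0) * v for k, v in factors.items()))
        error = p - outcome  # gradient of log-loss w.r.t. the linear score
        b -= lr * error
        for k, v in factors.items():
            w[k] = w.get(k, 0.0) - lr * error * v
    return w, b

def log_loss(weights, bias, batch):
    """Average negative log-likelihood of the observed outcomes."""
    total = 0.0
    for factors, outcome in batch:
        p = sigmoid(bias + sum(weights.get(k, 0.0) * v for k, v in factors.items()))
        total -= outcome * math.log(p) + (1 - outcome) * math.log(1 - p)
    return total / len(batch)

outcomes = [({"x": 1.0}, 1), ({"x": -1.0}, 0)] * 20
new_w, new_b = refresh({"x": 0.0}, 0.0, outcomes)
print(new_w, new_b)
```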
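Slides 13 to 15 frame attribution as splitting credit for a purchase across marketing touchpoints. UpStream's own approach, per slide 15, uses time-to-event and GAM survival models; the sketch below swaps in a much simpler, rule-based time-decay attribution purely to show what an attribution output looks like. The half-life parameter and channel names are illustrative assumptions.

```python
def time_decay_attribution(touchpoints, conversion_time, half_life_days=7.0):
    """Split one conversion's credit across channels, weighting recent touches more.

    touchpoints: list of (channel, time_in_days); touches after the conversion are ignored.
    Returns a dict channel -> share of credit (shares sum to 1), or {} if nothing qualifies.
    """
    weights = {}
    for channel, t in touchpoints:
        age = conversion_time - t
        if age < 0:
            continue
        # A touch loses half its weight every half_life_days before the purchase.
        weights[channel] = weights.get(channel, 0.0) + 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {channel: w / total for channel, w in weights.items()}

touches = [("tv_ad", 0.0), ("coupon", 7.0), ("promo_email", 14.0)]
print(time_decay_attribution(touches, conversion_time=14.0))
```

A survival-model approach instead estimates how each touchpoint changes the time until purchase, which lets the data, rather than a fixed half-life, decide the weights.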
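As a rough scale check on slide 15, assuming the 5 billion daily scores per retailer were spread evenly over 24 hours (real traffic peaks would demand more):

$$
\frac{5 \times 10^{9}\ \text{scores/day}}{86{,}400\ \text{s/day}} \approx 5.8 \times 10^{4}\ \text{scores/s}
$$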