Big data real time R - useR! 2013 - David Smith

Published in: Technology, Business
  • Fast, scalable, in production.
  • Data as the “new oil”: a valuable commodity. Big Data is crude oil: messy, hard to get at, and full of contaminants.
  • Model development process. It is not just about computational speed; it is also about the developer's productivity.
  • Start off with stuff we know in real time.
  • Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  • Outcome is “buying” instead of “dying”
  • From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.

    1. Big-data, real-time R? Yes, you can! David Smith, Revolution Analytics, @revodavid
    2. Buzzword Bingo! REAL TIME; BIG DATA; PREDICTIVE ANALYTICS
    3. Real-time Deployment: 1. Data distillation; 2. Model development and validation; 3. Model deployment; 4. Real-time model scoring; 5. Model refresh
    4. “Big Data”. Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
    5. 1. Data Distillation in Hadoop. Diagram labels: Unstructured Data; Analytics Data Mart; Structured Data; Log Files; Sensor Streams; Language Text; HDFS Load; Map-Reduce; RHadoop rmr
    6. 2. The Model Development Cycle. Diagram labels: Feature Selection; Sampling; Aggregation; Variable Transformation; Model Estimation; Model Refinement; Model Comparison / Benchmarking; Predictive Model; Structured Data. R White Paper bit.ly/r-is-hot
    7. Big-Data Predictive Models with ScaleR
    8. 3: Deployment Options. Unknown factors: SQL / Rules Engine; Code (C++, Java, R, Hadoop); PMML Engine. Factors known in advance: Batch Lookup Tables. Factors; Scores
    9. 4. Real-Time Scoring. Factors: User ID; Browser; Time/Date / Location; Previous purchases; Friend data; Any known information. Predictive Model: Decision Tree; Logistic Regression; Neural Network; K-means clustering; Ensemble Model. Scores: Product of most interest; Offer of most likely sale; Most relevant link; Forecast sale value; Optimal Bid. Prediction or Selection; Scoring Rules. Photo: ”IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
    10. 5. Model refresh. Factors; Scores; Actual Outcomes
    11. Big Data; Real Time. Kilobytes/Sec; Megabytes/Sec; Gigabytes  Terabytes; Petabytes  Exabytes. Seconds; Milliseconds; Minutes; Minutes  Hours
    12. Real-World Examples: Revolution Analytics Case Studies
    13. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; Coupon in the mail; “Just moved” promo email; Webstore recommendation; Browsing catalog
    14. UpStream: Attribution Modeling
    15. ETL; Marketing channel data; Behavioral variables; Promotional data; Overlay data; Exploratory data analysis; Time-to-event models; GAM survival models; Scoring for inference; Scoring for prediction; 5 billion scores per day per retailer. UPSTREAM DATA FORMAT; CUSTOM VARIABLES (PMML)
    16. ACI: Top-20 mutual fund company; $125B assets; Research and data-driven; Innovative
    17. Collaboration; Speed; Deployment Process; Adoption; Results. Diagram labels: Analytics Function Library; rACI Package (w/ RevoR); Model Building Function Library; Data Acquisition Function Library; Portfolio Optimization and Simulation API; Market Data from Thomson Reuters (QA-Direct); American Century Quant Proprietary Data; Additional 3rd Party Data Vendors; Live Analytics; PRODUCTION MODEL GENERATION AND TRADING PROCESSES; Data Feeds
    18. PREDICTIVE ANALYTICS; BIG DATA; REAL TIME
    19. Big-Data, Real-Time R? Yes, you can! David Smith, @revodavid. Revolution Analytics: the leading enterprise provider of software and services for Open Source R. www.revolutionanalytics.com, +1 650 646 9545, Twitter: @RevolutionR
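
Slide 8 appears to contrast two deployment routes: batch lookup tables when the factors are known in advance, and scoring code or an engine when they are not. Slide 9 lists logistic regression among the model types. The sketch below is one possible reading of that split, not the method shown in the talk. It is written in Python for brevity; the coefficients, factor names, and function names are all hypothetical and stand in for a model fitted offline (for example in R).

```python
import math
from itertools import product

# Hypothetical coefficients standing in for a logistic regression fitted offline.
INTERCEPT = -2.0
COEFFICIENTS = {
    "returning_customer": 1.2,
    "mobile_browser": -0.4,
    "previous_purchases": 0.3,
}


def score(factors):
    """Return the predicted probability for one dict of factor values."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in factors.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def build_lookup_table():
    """Batch route: precompute scores for factor combinations known in advance."""
    table = {}
    for returning, mobile, purchases in product((0, 1), (0, 1), range(4)):
        table[(returning, mobile, purchases)] = score({
            "returning_customer": returning,
            "mobile_browser": mobile,
            "previous_purchases": purchases,
        })
    return table


LOOKUP = build_lookup_table()


def real_time_score(returning, mobile, purchases):
    """Serve from the lookup table when possible, else score on the fly."""
    key = (returning, mobile, purchases)
    if key in LOOKUP:
        return LOOKUP[key]
    # Unknown factor combination: fall back to evaluating the model directly.
    return score({
        "returning_customer": returning,
        "mobile_browser": mobile,
        "previous_purchases": purchases,
    })
```

A call such as `real_time_score(1, 0, 2)` is answered from the precomputed table, while `real_time_score(1, 0, 10)` falls outside it and is computed at request time. The table trades memory for latency, which only pays off when the set of factor combinations is small and known ahead of time.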
