Real-time Big Data Analytics: From Deployment to Production

  • Get out your buzzword bingo cards!
  • Data as the “new oil”, a valuable commodity. Big Data is crude oil: messy, hard to get at, with contaminants in it.
  • Start off with stuff we know in real time.
  • Model development process. Not just about computational speed; also about developer productivity.
  • Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  • Outcome is “buying” instead of “dying”
  • From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.
  • Real-time Big Data Analytics: From Deployment to Production

    1. 1. David Smith, Revolution Analytics, @revodavid. Real-Time Big Data Analytics: From Deployment to Production
    2. 2. 2
    3. 3. Buzzword Bingo! REAL TIME / BIG DATA / PREDICTIVE ANALYTICS
    4. 4. Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
    5. 5. Predictive Analytics Model. Factors: user ID; browser; time/date/location; any known information; previous purchases; friend data. Predictive Model: decision tree; logistic regression; neural network; K-means clustering; scoring rules; ensemble model. Scores: product of most interest; offer of most likely sale; most relevant selection or link; prediction; forecast sale value; optimal bid. Photo: “IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
    6. 6. Real-time Deployment: 1. Data distillation; 2. Model development and validation; 3. Model deployment; 4. Real-time model scoring; 5. Model refresh. Photo: "CLOCK" by Heiko Klingele, flickr.com/photos/divdax/3458668053/, CC-BY 2.0
    7. 7. 1. Data Distillation in Hadoop. Unstructured data (log files, sensor streams, language, text) → HDFS load → Map-Reduce (rmr) → structured data → analytics data mart
    8. 8. 2. The Model Development Cycle. Structured Data feeds a cycle of feature selection, sampling, aggregation, variable transformation, model estimation, model refinement, and model comparison/benchmarking, producing a Predictive Model. R White Paper: bit.ly/r-is-hot
    9. 9. 3. Deployment Options. Unknown factors: SQL / rules engine; code (C++, Java, R, Hadoop); PMML engine. Factors known in advance: batch lookup tables. Output: scores
    10. 10. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; coupon in the mail; “Just moved” promo email; webstore recommendation; browsing catalog
    11. 11. UpStream: Attribution Modeling
    12. 12. 4. Model Scoring. Exploratory data analysis; time-to-event models; GAM survival models. UPSTREAM DATA FORMAT: ETL; marketing channel data; behavioral variables; promotional data; overlay data. CUSTOM VARIABLES (PMML): scoring for inference; scoring for prediction; 5 billion scores per day per retailer
    13. 13. 5. Model refresh: Factors, Scores, Actual Outcomes
    14. 14. Big Data: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes. Real Time: Seconds, Milliseconds, Minutes, Minutes, Hours
    15. 15. PREDICTIVE ANALYTICS / BIG DATA / REAL TIME
    16. 16. Real-Time Big Data Predictive Analytics: From Deployment to Production. David Smith, @revodavid. The leading enterprise provider of software and services for Open Source R. Booth 618 / Office Hours Weds 1:30PM. www.revolutionanalytics.com, +1 650 646 9545, Twitter: @RevolutionR
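
Slide 9 contrasts two deployment paths: batch lookup tables when the factors are known in advance, and a live scoring engine when they are not. The following is a minimal Python sketch of that split, not the presenter's implementation. The coefficient values, factor names, and function names are all hypothetical, and a hand-written logistic regression stands in for whatever model would actually be exported (for example via PMML).

```python
import math

# Hypothetical logistic-regression coefficients. In practice these would come
# from a model fitted offline and exported to the production system.
COEFFICIENTS = {"intercept": -2.0, "previous_purchases": 0.4, "is_evening": 0.7}


def score(factors):
    """Return a purchase probability from a dict of numeric factors."""
    z = COEFFICIENTS["intercept"]
    for name, value in factors.items():
        # Factors the model does not know about contribute nothing.
        z += COEFFICIENTS.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


def build_lookup_table(known_users):
    """Batch path: precompute scores for users whose factors are known in advance."""
    return {user_id: score(factors) for user_id, factors in known_users.items()}


def realtime_score(user_id, factors, lookup_table):
    """Serve a precomputed score if one exists, otherwise score on the fly."""
    if user_id in lookup_table:
        return lookup_table[user_id]
    return score(factors)


# Batch job: one known user, scored ahead of time.
known_users = {"u1": {"previous_purchases": 3, "is_evening": 0}}
table = build_lookup_table(known_users)

# Request time: "u1" is a table lookup, "u2" needs live scoring.
cached = realtime_score("u1", {}, table)
fresh = realtime_score("u2", {"previous_purchases": 0, "is_evening": 1}, table)
```

The lookup path costs a dictionary access per request, which is why it suits factors that do not change between the batch run and the request; anything that depends on request-time information has to go through the live scoring path.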

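Slide 13 (model refresh) pairs factors and scores with actual outcomes. One way to act on that pairing, sketched here under assumptions, is to track prediction error over a sliding window and flag the model for refresh when the error drifts too high. The class name, window size, threshold, and the choice of the Brier score (mean squared error between predicted probability and 0/1 outcome) are illustrative and do not come from the slides.

```python
from collections import deque


class RefreshMonitor:
    """Track squared error of recent scores against actual outcomes."""

    def __init__(self, window=1000, threshold=0.25):
        # Only the most recent `window` errors are kept.
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score, outcome):
        """Store the squared error for one (predicted probability, 0/1 outcome) pair."""
        self.errors.append((score - outcome) ** 2)

    def brier(self):
        """Mean squared error over the current window (0.0 if empty)."""
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_refresh(self):
        """True when the windowed error exceeds the threshold."""
        return self.brier() > self.threshold


monitor = RefreshMonitor(window=4, threshold=0.25)

# Scores that agree with the outcomes: low error, no refresh needed.
for s, y in [(0.9, 1), (0.2, 0), (0.8, 1), (0.1, 0)]:
    monitor.record(s, y)
refresh_while_accurate = monitor.needs_refresh()

# Scores that disagree with the outcomes push the old window out.
for s, y in [(0.9, 0), (0.8, 0), (0.1, 1), (0.2, 1)]:
    monitor.record(s, y)
refresh_after_drift = monitor.needs_refresh()
```

A fixed threshold is the simplest possible trigger; a production system might instead compare against the error measured at validation time.
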