Real-time Big Data Analytics: From Deployment to Production
Speaker notes
  • Get out your buzzword bingo cards!
  • Data as “new oil”: a valuable commodity. Big Data is crude oil: messy, hard to get at, with contaminants in it.
  • Start off with stuff we know in real time.
  • Model development process: not just about computational speed, but also about developer productivity.
  • Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  • Outcome is “buying” instead of “dying”
  • From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.
  • Transcript

    • 1. Real-Time Big Data Analytics: From Deployment to Production. David Smith, Revolution Analytics (@revodavid)
    • 2.
    • 3. Buzzword Bingo! REAL TIME, BIG DATA, PREDICTIVE ANALYTICS
    • 4. Photo: Sarah&Boston (Flickr: pocheco), Creative Commons BY-SA 2.0
    • 5. Predictive analytics model. Factors: user ID, browser, time/date/location, previous purchases, friend data, any known information. Predictive model: decision tree, logistic regression, neural network, K-means clustering, scoring rules, ensemble model. Scores and selection: product of most interest, offer of most likely sale, most relevant selection or link, prediction, forecast sale value, optimal bid. Image: ”IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
    • 6. Real-time deployment: (1) data distillation, (2) model development and validation, (3) model deployment, (4) real-time model scoring, (5) model refresh. Image: "CLOCK" by Heiko Klingele, flickr.com/photos/divdax/3458668053/, CC-BY 2.0
    • 7. Step 1, data distillation in Hadoop: unstructured data (log files, sensor streams, language text) → load into HDFS → Map-Reduce (via rmr) → structured data → analytics data mart
    • 8. Step 2, the model development cycle: starting from structured data, iterate over sampling, aggregation, variable transformation, feature selection, model estimation, model comparison / benchmarking, and model refinement to arrive at a predictive model. R white paper: bit.ly/r-is-hot
    • 9. Step 3, deployment options: factors go in, scores come out. Factors known in advance: batch scoring into lookup tables. Unknown factors: a SQL / rules engine, code (C++, Java, R, Hadoop), or a PMML engine.
    • 10. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; coupon in the mail; “Just moved” promo email; webstore recommendation; browsing catalog
    • 11. UpStream: Attribution Modeling
    • 12. Step 4, model scoring at Upstream. Data: marketing channel data, promotional data per retailer, overlay data, behavioral variables, custom variables, ETL. Modeling: exploratory data analysis, time-to-event models, GAM survival models. Format: PMML. Scoring: scoring for inference, scoring for prediction, 5 billion scores per day.
    • 13. Step 5, model refresh: factors, scores, and actual outcomes
    • 14. Big Data vs. Real Time. Data scale: kilobytes/sec, megabytes/sec, gigabytes, terabytes, petabytes, exabytes. Latency: milliseconds, seconds, minutes, hours.
    • 15. PREDICTIVE ANALYTICS, BIG DATA, REAL TIME
    • 16. Real-Time Big Data Predictive Analytics: From Deployment to Production. David Smith (@revodavid). Revolution Analytics, the leading enterprise provider of software and services for Open Source R. Booth 618 / Office Hours Weds 1:30 PM. www.revolutionanalytics.com, +1 650 646 9545, Twitter: @RevolutionR
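Slide 7's data-distillation step turns raw, messy records into structured rows. The sketch below is a single-process Python imitation of the map and reduce phases, not rmr or Hadoop code: the log format (`timestamp,user_id,event_type`), the sample lines, and the names `map_phase` and `reduce_phase` are all hypothetical. The map phase skips malformed lines (the "contaminants" of the crude-oil metaphor) and emits key/count pairs; the reduce phase sums them into one feature row per user.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator

# Hypothetical raw records in "timestamp,user_id,event_type" form.
RAW_LOGS = [
    "2013-02-27T10:00:01,u1,web_click",
    "2013-02-27T10:00:05,u2,email_click",
    "2013-02-27T10:01:10,u1,web_click",
    "truncated line",
    "2013-02-27T10:02:00,u1,order",
]

Pair = tuple[tuple[str, str], int]


def map_phase(lines: Iterable[str]) -> Iterator[Pair]:
    """Emit ((user_id, event_type), 1) for each well-formed line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed records instead of failing the job
        _, user_id, event_type = parts
        yield (user_id, event_type), 1


def reduce_phase(pairs: Iterable[Pair]) -> dict[str, dict[str, int]]:
    """Sum counts per key, then pivot into one structured row per user."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    rows: dict[str, dict[str, int]] = defaultdict(dict)
    for (user_id, event_type), count in totals.items():
        rows[user_id][event_type] = count
    return dict(rows)


if __name__ == "__main__":
    for user, features in sorted(reduce_phase(map_phase(RAW_LOGS)).items()):
        print(user, features)
```

On a real cluster the framework would shuffle the mapped pairs by key across machines before reducing; the in-memory dictionary here stands in for that step.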
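Slide 5 describes real-time scoring as factors in, a predictive model in the middle, and scores plus a selection out. As an illustration only, the sketch below uses logistic regression (one of the model types listed on the slide) to score two candidate offers and pick the one most likely to convert. The offer names, factor names, and coefficients are made up; in practice the coefficients would come from the model-development step.

```python
import math

# Made-up coefficients for two hypothetical offers; real values would be
# estimated during model development and exported to the scoring system.
OFFER_MODELS: dict[str, dict[str, float]] = {
    "blender_coupon": {"intercept": -2.0, "mobile": 0.4, "prior_kitchen_purchases": 0.8, "evening": 0.3},
    "free_shipping": {"intercept": -1.5, "mobile": -0.2, "prior_kitchen_purchases": 0.1, "evening": 0.5},
}


def logistic_score(coefs: dict[str, float], factors: dict[str, float]) -> float:
    """P(purchase) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))); missing factors count as 0."""
    z = coefs["intercept"] + sum(
        weight * factors.get(name, 0.0)
        for name, weight in coefs.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def select_offer(factors: dict[str, float]) -> tuple[str, float]:
    """Score every candidate offer and return the highest-probability one."""
    scores = {offer: logistic_score(c, factors) for offer, c in OFFER_MODELS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    visitor = {"mobile": 1.0, "prior_kitchen_purchases": 2.0, "evening": 1.0}
    print(select_offer(visitor))
```

For the sample visitor the linear term is z = 0.3 for `blender_coupon` (probability about 0.57) and z = -1.0 for `free_shipping` (about 0.27), so the coupon is selected.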
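Slide 9 contrasts two deployment routes: when the factors are known in advance, scores can be computed in batch and served from a lookup table; when they are not, the model has to run at request time (as code, a rules engine, or a PMML engine). Below is a minimal Python sketch of a scorer that combines both, assuming a hypothetical `HybridScorer` class, a table keyed on `(customer_id, product_id)`, and a stand-in `toy_live_model` function in place of a real engine.

```python
from collections.abc import Callable, Hashable


class HybridScorer:
    """Serve batch-precomputed scores when available, else score live."""

    def __init__(self, lookup_table: dict[Hashable, float],
                 live_model: Callable[[dict[str, float]], float]) -> None:
        self.lookup_table = lookup_table  # output of an offline batch scoring run
        self.live_model = live_model      # stand-in for generated code or a PMML engine

    def score(self, key: Hashable, factors: dict[str, float]) -> tuple[float, str]:
        """Return (score, source), where source is "lookup" or "live"."""
        if key in self.lookup_table:
            return self.lookup_table[key], "lookup"
        return self.live_model(factors), "live"


def toy_live_model(factors: dict[str, float]) -> float:
    """Hypothetical real-time model: base rate plus a page-view effect, capped at 1."""
    return min(1.0, 0.1 + 0.05 * factors.get("page_views", 0.0))


if __name__ == "__main__":
    batch_scores = {("c1", "p1"): 0.42}  # keyed on (customer_id, product_id)
    scorer = HybridScorer(batch_scores, toy_live_model)
    print(scorer.score(("c1", "p1"), {}))                   # factors known in advance
    print(scorer.score(("c2", "p1"), {"page_views": 4.0}))  # factors known only now
```

The lookup path costs a dictionary access per request, which is why precomputing is attractive whenever the factors allow it; the live path trades that speed for coverage of cases the batch run could not anticipate.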
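Slide 12's figure of 5 billion scores per day is easier to size infrastructure against as a per-second rate. Assuming the load were spread evenly over a 24-hour day (any uneven traffic would push peak rates above this average):

```latex
\frac{5 \times 10^{9}\ \text{scores/day}}{86\,400\ \text{s/day}} \approx 5.79 \times 10^{4}\ \text{scores/s}
```

That is roughly 58,000 scores per second on average.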
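Slide 13 reduces model refresh to comparing the scores a model produced with the outcomes that actually happened. One way to decide when to refresh, sketched here as an assumption rather than the talk's method, is to track the Brier score (mean squared gap between predicted probability and the 0/1 outcome) over a rolling window and flag a refresh once it exceeds the validation-time baseline by a tolerance. The class name `RefreshMonitor`, the window size, and the tolerance are hypothetical.

```python
from collections import deque


class RefreshMonitor:
    """Compare live scores with actual outcomes and flag when to re-estimate.

    Uses the Brier score (mean squared gap between predicted probability and
    the 0/1 outcome) over the most recent `window` scored events.
    """

    def __init__(self, baseline_brier: float, window: int = 1000,
                 tolerance: float = 0.02) -> None:
        self.baseline = baseline_brier  # Brier score measured at validation time
        self.tolerance = tolerance      # degradation allowed before a refresh
        self.errors: deque[float] = deque(maxlen=window)

    def record(self, score: float, outcome: int) -> None:
        """Log one (predicted probability, actual 0/1 outcome) pair."""
        self.errors.append((score - outcome) ** 2)

    def current_brier(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_refresh(self) -> bool:
        """True once the window is full and accuracy has degraded past tolerance."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.current_brier() > self.baseline + self.tolerance


if __name__ == "__main__":
    monitor = RefreshMonitor(baseline_brier=0.10, window=4)
    for score, outcome in [(0.9, 1), (0.8, 1), (0.9, 0), (0.85, 0)]:
        monitor.record(score, outcome)
    print(monitor.current_brier(), monitor.needs_refresh())
```

Waiting for a full window avoids triggering a refresh on the first few unlucky outcomes; the bounded `deque` keeps memory constant no matter how many scores flow through.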
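Slides 10 and 11 pose the attribution question: which of several touches (TV ad, coupon, promo email, webstore recommendation) earned credit for the blender purchase. The talk's Upstream work lists time-to-event and GAM survival models; the sketch below swaps in a much simpler time-decay heuristic purely to show what an attribution output looks like. Each touch's weight halves for every `half_life_days` before the purchase, and weights are normalised so the credit sums to 1. The function name, the 7-day half-life, and the day offsets are illustrative assumptions.

```python
def time_decay_attribution(touches: list[tuple[str, float]],
                           half_life_days: float = 7.0) -> dict[str, float]:
    """Split credit for one purchase across the touches that preceded it.

    touches: (channel, days_before_purchase) pairs. A touch's raw weight is
    2 ** (-days / half_life_days), so it halves every half-life; weights are
    normalised to sum to 1 and summed per channel.
    """
    if not touches:
        return {}
    weights = [(channel, 2.0 ** (-days / half_life_days)) for channel, days in touches]
    total = sum(w for _, w in weights)
    credit: dict[str, float] = {}
    for channel, w in weights:
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit


if __name__ == "__main__":
    # Hypothetical path to the blender purchase from slide 10.
    blender_path = [
        ("TV ad", 21.0),
        ("coupon in the mail", 14.0),
        ("just-moved promo email", 7.0),
        ("webstore recommendation", 0.0),
    ]
    for channel, share in time_decay_attribution(blender_path).items():
        print(f"{channel}: {share:.3f}")
```

With touches 21, 14, 7 and 0 days before purchase, the raw weights are 1/8, 1/4, 1/2 and 1 (total 1.875), so the webstore recommendation receives 1/1.875, about 0.53 of the credit. A survival model would instead estimate how each touch shifts the time until purchase, which this heuristic does not attempt.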