Your SlideShare is downloading. ×
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Real-time Big Data Analytics: From Deployment to Production

9,622

Published on

As the Big Data market has evolved, the focus has shifted from data operations (storage, access and processing of data) to data science (understanding, analyzing and forecasting from data). And as new …

As the Big Data market has evolved, the focus has shifted from data operations (storage, access and processing of data) to data science (understanding, analyzing and forecasting from data). And as new models are developed, organizations need a process for deploying analytics from research into the production environment. In this talk, we'll describe the five stages of real-time analytics deployment:

Data distillation
Model development
Model validation and deployment
Model refresh
Real-time model scoring

We'll review the technologies supporting each stage, and how Revolution Analytics software works with the entire analytics stack to bring Big Data analytics to real-time production environments.

0 Comments
10 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
9,622
On Slideshare
0
From Embeds
0
Number of Embeds
19
Actions
Shares
0
Downloads
368
Comments
0
Likes
10
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Revolution Confidential R eal-Time P redic tive A nalytic s with B ig Data from deployment to produc tionDavid M S mith @ revodavidV P Marketing and C ommunityR evolution Analytic s
  • 2. Today we’ll dis c us s : Revolution Confidential What is “real-time predictive analytics with big data?”  Case study: UpStream Software Real-time big-data stack Five phases of deployment to production Just what do we mean by “Big Data” and “Real-Time”, anyway? Recommendations Q&A 2
  • 3. R eal-Time P redic tive A nalytic s with B ig Data Revolution Confidential Real Time  Milliseconds? Seconds? Hours? Days?  In production, continuously updated Big Data  Size, flow rate, diversity  Conflict with ‘real time’ Predictive Analytics  Rear view: description, aggregation, tabulation  Prediction, inference, statistical modeling 3
  • 4. Revolution Confidential C as e S tudy www.upstreamsoftware.com“Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and fast performance — we saw about a 4x From: “How Big Data is Changing Retail performance improvement on 50 Marketing Analytics” million records. It works brilliantly.” http://bit.ly/upstream-webinarJohn Wallace, CEO Upstream Software
  • 5. UpS tream S oftware Revolution ConfidentialUpStream’s Big Data Analytics engineanalyzes and optimizes marketing mix forUpStream’s retailer clients  Customer analytics and segmentation  Revenue/ action attribution among channels and marketing programs  Customized Next Best Action per individual prospect or customer  Event-Triggered Marketing: responses to consumer actions 5
  • 6. B ig Data Revolution Confidential Demographics: consumer, product, market Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile) 6
  • 7. P redic tive A nalytic s Revolution Confidential • Time-to-event models UPSTREAM DATA • Exploratory data analysis CUSTOM VARIABLES FORMAT (PMML) • GAM survival models • ETL • Scoring for inference • N marketing channels • Scoring for prediction • Behavioral variables • 5 billion scores per day • Promotional data per customer • Overlay data
  • 8. R eal Time Revolution Confidential http://www.infoworld.com/d/big-data/williams-sonoma-uses-big-data-zero-in-customers-206641 November 8, 2012 8
  • 9. P redic tive vs Des c riptive A nalytic s Revolution Confidential Retail sales are influenced by stores near the home Newer customers are more sensitive to marketing Google keywords perform worse than you think Display advertising performs better than you think Online sales benefit more from seasonal events than retail stores 9
  • 10. R eal-time Revolution ConfidentialB ig DataP redic tiveA nalytic sS tac k 10
  • 11. Data L ayer Revolution Confidential Structured data  RDBMS, NoSQL, Hbase, Impala Unstructured data  Hadoop MapReduce Streaming data  Web, social, sensors, operational systems Some descriptive analytics done here 11
  • 12. A nalytic s layer Revolution Confidential Predictive analytics technology  Development environment: build models  Production environment:  Deploy real-time scoring  Engine for dynamic analytics Local data mart  Static, periodically updated from data layer  Improves performance 12
  • 13. Integration L ayer Revolution Confidential Connective tissue between end-user applications and analytics engine Engines for flow control, real-time scoring  Rules engine / CEP engine API for Dynamic Analytics  Brokers communication between app developers and data scientists 13
  • 14. Dec is ion L ayer Revolution Confidential End-user interface to analytics system  Customers  Operations  Business analysts  C-suite Familiar & simple interfaces  Supports a range of end-user technologies  Level of UI complexity dictated by need 14
  • 15. R eal-Time Deployment Revolution ConfidentialFive phases of deploying real-time predictiveanalytics with big data to production:1. Data Distillation2. Model Development3. Validation and Deployment4. Real-time Scoring5. Model refresh 16
  • 16. P has e 1: Data Dis tillation Revolution ConfidentialThe data in the Data Layer isn’t yet ready forpredictive modeling. We need to:  Extract features from unstructured text  Topic modeling, sentiment analysis  Combine disparate data sources  Filter for populations of interest  Select relevant features and outcomes for modeling  Export to the data mart for predictive modeling 17
  • 17. E xample: Data Dis tillation in Hadoop Revolution Confidential Log Files Event Streams HDFS Load Map-Reduce Structured Data rmr Scraped Text Unstructured Analytics Data Data Mart Revolution R Enterprise for Hadoop: bit.ly/r-hadoop 18
  • 18. A utomating the E xtrac tion P roc es s Revolution Confidential This process needs to be repeatable and maintainable Create a re-usable R script:  ASCII: rxDataStep  SQL: rxImport, ROracle, RMySQL  NoSQL: Rcassandra, rmongodb  Hadoop: rmr, rhbase, Rhive  Appliances: nza, teradataR, PL/R  Feeds: rjson, XML, RCurl, twitteR, python R script runs from Analytics Layer  Processing: in Data Layer  Output: structured file for Data Mart 19
  • 19. Data Revolution ConfidentialDis tillationP roc es s XD F 20
  • 20. P has e 2: Model development Revolution Confidential Goal: create a predictive model that is  Powerful  Robust  Comprehensible  Implementable Key requirements for Data Scientists:  Flexibility  Productivity  Speed  Reproducibility 21
  • 21. T he Model Development C yc le Revolution Confidential Download the White Paper R is Hot Variable bit.ly/r-is-hot Transformation Feature Selection Model PredictiveData Mart Sampling Estimation Aggregation Model Model Model Comparison / Refinement Benchmarking 22
  • 22. Tec hnology c ons iderations Revolution Confidential Moving Big Data is slow  So just move it once, to data mart near the development and production environments Static data needed for modeling cycle  But have a plan to refresh and update Data layer not optimized for predictive analytics  But good for descriptive (data distillation) Revolution R Enterprise optimizations for the analytics layer  R; XDF file format; Parallel External Memory Algorithms 23
  • 23. P redic tive Modeling A lgorithms Revolution Confidential Algorithm Example Applications Big Data ETL, data distillation, record/variable selection, Data Step variable transformation ✓✓ Descriptive Statistics Exploratory Data Analysis, Data Validation ✓✓ Tables & Cubes Reporting, contingency analysis ✓✓Correlation / Covariance Factor Analysis, Value at Risk ✓✓ Linear regression Forecasting, Net present value estimation ✓✓ Logistic Regression Response modeling, offer selection ✓✓ Generalized Linear Models Capital reserve estimation, climate modeling ✓✓ K-means clustering Customer Segmentation ✓✓ Decision Trees Dynamic pricing, classification, variable importance ✓✓ Model Prediction Real-time Scoring (decisions, offers, actions) ✓✓ R CRAN packages Everything else Parallel & distributed Simulations, By-Group analysis, ensemble models, computing with R custom applications ✓ 24
  • 24. High performanc e with dis tributed c lus ters Revolution Confidential Data Scientists / Modelers Grid computing cluster Revolution R Enterprise Platform LSF Microsoft HPC Server Microsoft Azure Ad-hoc grids (SNOW) Shared SMP server Hadoop / HDFS 25
  • 25. P has e 3: Model Validation and Deployment Revolution Confidential Scoring rules map parameters to outputs  Parameters: information known at real time  prospect ID, product, webpage, …  Scores: values inferred in real time  prices, recommendations, actions, … Validation: backtest with historical data Deployment: code running in real time 26
  • 26. Validation for produc tion Revolution Confidential Real Outcomes Historical Data Scores Scoring Rules Refresh data mart Rebuild model, withholding a validation set  Random sampling Create accuracy measure  ROC, positive rate, false positive rate Measure and monitor 27
  • 27. Deployment with R c ode Revolution Confidential OutcomesHistorical Scores R Model R Model Data Data New Object Object Scoring rules captured as “R model objects” Move R code directly from development environment to production server  No recoding: Lowest cost, greatest speed & reliability  Most flexibility (not limited in model choice)  Easy to validate with custom accuracy measures 28
  • 28. Managing and S c aling Deployed R C ode Revolution Confidential Revolution R Enterprise includes RevoDeployR  Code management & tracking  Security & resource allocation  Scalability for real-time demands  Web Services API Scoring and dynamic analytics  Data visualization  custom reporting  Ad-hoc analytics  Interactive data apps 29
  • 29. C ode C onvers ion for S c oring Revolution Confidential Business rules (e.g. IBM ILOG)  Create directly with R code from model object Recode in other languages  SQL  C++, etc. Low latency Slow and costly deployment Potential for errors 30
  • 30. S c oring via P MML Revolution Confidential Some predictive models can be expressed in the PMML standard (www.dmg.org)  Not all, but growing all the time ScoresR Model PMML Data PMML Object pmml Easily generate PMML with R “pmml” package PMML Scoring Engine: Zementis Adapa Databases and appliances may support PMML  But supported standards vary 31
  • 31. P has e 4: R eal-Time S c oring Revolution Confidential Scoring triggered by Decision Layer, brokered by Integration Layer  R code  Revolution R Enterprise Server  In-appliance (e.g IBM PureData System for Analytics)  SQL  In-database  Rules  Compiled code  Bespoke engines May be using hardware from data layer  But not the actual data 32
  • 32. R eal-Time Revolution ConfidentialS c oring:review iLog SQL Near Real Time 33
  • 33. P has e 5: Model R efres h Revolution Confidential Deployed scoring rules no longer connected to Data Layer or Data Mart  Enables real-time performance  Need to refresh using recent data Model refresh process  Use data extract script  Re-run model script  Statistical review OR automated validation 34
  • 34. S c heduled B atc h Updates Revolution Confidential  Periodic model refresh  Weekly  Daily Revolution R Enterprise  Hourly Production Server Cluster  Automated Scheduler validate/deploy process Excel RevoDeployR Server Web Services API 35
  • 35. R e-developing the model Revolution Confidential Even then, the underlying structure of the data may change:  Important variables become non-significant  Non-significant variables become important!  New data sources become available Model accuracy drift is a warning sign Return to Phase 2 or Phase 1 36
  • 36. Revolution Confidential Kilobytes / second Megabytes / secondHow big is‘B ig data’? MegabytesGigabytes  Terabytes Petabytes  Exabytes 38
  • 37. Revolution Confidential Seconds or less MillisecondsHow fas t is‘R eal time’? Daily  Hourly Hours  Minutes Daily  Hourly Weeks  Days 39
  • 38. R ec ommendations Revolution Confidential Architect your Predictive Analytics Stack  Best of breed vs single-vendor stack  Diversify data sources and decision apps R = Predictive Analytics Revolution R Enterprise provides:  Analytics Layer: big data, performance  Integration Layer: scalability, reliability Get Help to Get Started  Consider an SI partner for architecture, best practices and implementation advice 40
  • 39. R es ourc es Revolution Confidential Revolution R Enterprise : R for Big Data  www.revolutionanalytics.com/products Revolution Analytics Consulting Services  www.revolutionanalytics.com/services  Contact us: bit.ly/hey-revo Rhadoop : Connecting R and Hadoop  bit.ly/r-hadoop Contact David Smith  david@revolutionanalytics.com  @revodavid  blog.revolutionanalytics.com 41
  • 40. T hank you. Revolution Confidential The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR 42

×