SlideShare a Scribd company logo
Revolution Confidential




    R eal-Time P redic tive
   A nalytic s with B ig Data
   from deployment to produc tion




David M S mith @ revodavid
V P Marketing and C ommunity
R evolution Analytic s
Today we’ll dis c us s :                 Revolution Confidential




 What is “real-time predictive analytics with
  big data?”
   Case study: UpStream Software
 Real-time big-data stack
 Five phases of deployment to production
 Just what do we mean by “Big Data” and
  “Real-Time”, anyway?
 Recommendations
 Q&A

                                                           2
R eal-Time P redic tive A nalytic s with B ig Data
                                             Revolution Confidential




 Real Time
   Milliseconds? Seconds? Hours? Days?
   In production, continuously updated
 Big Data
   Size, flow rate, diversity
   Conflict with ‘real time’
 Predictive Analytics
   Rear view: description, aggregation, tabulation
   Prediction, inference, statistical modeling
                                                               3
Revolution Confidential




                                C as e S tudy


                             www.upstreamsoftware.com


“Given that our data sets are already
   in the terabytes and are growing
 rapidly, we depend on Revolution R
    Enterprise’s scalability and fast
 performance — we saw about a 4x           From: “How Big Data is Changing Retail
   performance improvement on 50                                Marketing Analytics”
 million records. It works brilliantly.”             http://bit.ly/upstream-webinar

John Wallace, CEO Upstream Software
UpS tream S oftware                       Revolution Confidential




UpStream’s Big Data Analytics engine
analyzes and optimizes marketing mix for
UpStream’s retailer clients
   Customer analytics and segmentation
   Revenue/ action attribution among channels and
    marketing programs
   Customized Next Best Action per individual
    prospect or customer
   Event-Triggered Marketing: responses to
    consumer actions
                                                            5
B ig Data                                       Revolution Confidential




 Demographics: consumer, product, market
 Actions: web clicks, email clicks, mobile app usage,
  call center logs, social, search …
 Outcomes: impressions, touches, orders (retail, online,
  mobile)

                                                                  6
P redic tive A nalytic s                                                      Revolution Confidential




                             • Time-to-event models
 UPSTREAM DATA               • Exploratory data analysis                 CUSTOM VARIABLES
 FORMAT                                                                            (PMML)
                             • GAM survival models




  •   ETL                                                  • Scoring for inference
  •   N marketing channels                                 • Scoring for prediction
  •   Behavioral variables
                                                           • 5 billion scores per day
  •   Promotional data                                       per customer
  •   Overlay data
R eal Time                                                                                       Revolution Confidential




        http://www.infoworld.com/d/big-data/williams-sonoma-uses-big-data-zero-in-customers-206641
        November 8, 2012                                                                                           8
P redic tive vs Des c riptive A nalytic s    Revolution Confidential




 Retail sales are influenced by stores near the
  home

 Newer customers are more sensitive to marketing

 Google keywords perform worse than you think

 Display advertising performs better than you think

 Online sales benefit more from seasonal events
  than retail stores
                                                               9
R eal-time     Revolution Confidential



B ig Data
P redic tive
A nalytic s
S tac k




                                10
Data L ayer                                Revolution Confidential




 Structured data
   RDBMS, NoSQL, Hbase, Impala
 Unstructured data
   Hadoop MapReduce
 Streaming data
   Web, social, sensors, operational systems
 Some descriptive analytics done here
                                                            11
A nalytic s layer                                  Revolution Confidential




 Predictive analytics technology
   Development environment: build models
   Production environment:
      Deploy real-time scoring
      Engine for dynamic analytics
 Local data mart
   Static, periodically updated from data layer
   Improves performance

                                                                    12
Integration L ayer                      Revolution Confidential




 Connective tissue between end-user
  applications and analytics engine
 Engines for flow control, real-time scoring
   Rules engine / CEP engine
 API for Dynamic Analytics
   Brokers communication between app developers
    and data scientists

                                                         13
Dec is ion L ayer                               Revolution Confidential




 End-user interface to analytics system
     Customers
     Operations
     Business analysts
     C-suite
 Familiar & simple interfaces
   Supports a range of end-user technologies
   Level of UI complexity dictated by need
                                                                 14
R eal-Time Deployment                   Revolution Confidential




Five phases of deploying real-time predictive
analytics with big data to production:

1.   Data Distillation
2.   Model Development
3.   Validation and Deployment
4.   Real-time Scoring
5.   Model refresh

                                                         16
P has e 1: Data Dis tillation                 Revolution Confidential




The data in the Data Layer isn’t yet ready for
predictive modeling. We need to:
   Extract features from unstructured text
      Topic modeling, sentiment analysis
   Combine disparate data sources
   Filter for populations of interest
   Select relevant features and outcomes for
    modeling
   Export to the data mart for predictive modeling

                                                               17
E xample: Data Dis tillation in Hadoop                                 Revolution Confidential




    Log Files



 Event Streams HDFS Load                          Map-Reduce      Structured
                                                                     Data
                                                      rmr

  Scraped Text



  Unstructured                                                    Analytics
     Data                                                         Data Mart

                 Revolution R Enterprise for Hadoop: bit.ly/r-hadoop
                                                                                        18
A utomating the E xtrac tion P roc es s           Revolution Confidential




 This process needs to be repeatable and
  maintainable
 Create a re-usable R script:
     ASCII: rxDataStep
     SQL: rxImport, ROracle, RMySQL
     NoSQL: Rcassandra, rmongodb
     Hadoop: rmr, rhbase, Rhive
     Appliances: nza, teradataR, PL/R
     Feeds: rjson, XML, RCurl, twitteR, python
 R script runs from Analytics Layer
   Processing: in Data Layer
   Output: structured file for Data Mart
                                                                   19
Data                 Revolution Confidential



Dis tillation
P roc es s




                XD
                F




                                      20
P has e 2: Model development              Revolution Confidential




 Goal: create a predictive model that is
     Powerful
     Robust
     Comprehensible
     Implementable
 Key requirements for Data Scientists:
     Flexibility
     Productivity
     Speed
     Reproducibility
                                                           21
T he Model Development C yc le                                                       Revolution Confidential




                                                                   Download the White Paper
                                                                          R is Hot
                                      Variable                            bit.ly/r-is-hot
                                   Transformation




              Feature
             Selection                                         Model             Predictive
Data Mart    Sampling                                        Estimation
            Aggregation                                                           Model




                       Model
                                                      Model
                    Comparison /
                                                    Refinement
                    Benchmarking




                                                                                                 22
Tec hnology c ons iderations                       Revolution Confidential




 Moving Big Data is slow
   So just move it once, to data mart near the development
    and production environments
 Static data needed for modeling cycle
   But have a plan to refresh and update
 Data layer not optimized for predictive analytics
   But good for descriptive (data distillation)
 Revolution R Enterprise optimizations for the
  analytics layer
   R; XDF file format; Parallel External Memory Algorithms

                                                                    23
P redic tive Modeling A lgorithms                             Revolution Confidential

     Algorithm                         Example Applications                          Big Data
                                ETL, data distillation, record/variable selection,
              Data Step
                                                          variable transformation
                                                                                       ✓✓

   Descriptive Statistics            Exploratory Data Analysis, Data Validation        ✓✓
        Tables & Cubes                         Reporting, contingency analysis         ✓✓
Correlation / Covariance                          Factor Analysis, Value at Risk       ✓✓
      Linear regression              Forecasting, Net present value estimation         ✓✓
    Logistic Regression                     Response modeling, offer selection         ✓✓
     Generalized Linear
                Models
                                  Capital reserve estimation, climate modeling         ✓✓

    K-means clustering                                 Customer Segmentation           ✓✓
         Decision Trees     Dynamic pricing, classification, variable importance       ✓✓
       Model Prediction           Real-time Scoring (decisions, offers, actions)       ✓✓
     R CRAN packages                                             Everything else
   Parallel & distributed   Simulations, By-Group analysis, ensemble models,
      computing with R                                    custom applications
                                                                                         ✓

                                                                                                       24
High performanc e with dis tributed c lus ters
                                             Revolution Confidential


   Data Scientists / Modelers   Grid computing cluster




     Revolution R Enterprise
                                  Platform LSF
                                  Microsoft HPC Server
                                  Microsoft Azure
                                  Ad-hoc grids (SNOW)
                                  Shared SMP server
                                  Hadoop / HDFS


                                                              25
P has e 3: Model Validation and Deployment   Revolution Confidential




 Scoring rules map parameters to outputs
   Parameters: information known at real time
      prospect ID, product, webpage, …
   Scores: values inferred in real time
      prices, recommendations, actions, …


 Validation: backtest with historical data
 Deployment: code running in real time


                                                              26
Validation for produc tion                                   Revolution Confidential




                                             Real Outcomes
             Historical
               Data




                                    Scores
                          Scoring
                           Rules



 Refresh data mart
 Rebuild model, withholding a validation set
   Random sampling
 Create accuracy measure
   ROC, positive rate, false positive rate
 Measure and monitor
                                                                              27
Deployment with R c ode                            Revolution Confidential



             Outcomes
Historical




                                                               Scores
                        R Model          R Model




                                  Data
  Data




                                  New
                         Object           Object



 Scoring rules captured as “R model objects”
 Move R code directly from development
  environment to production server
       No recoding: Lowest cost, greatest speed &
        reliability
       Most flexibility (not limited in model choice)
       Easy to validate with custom accuracy measures
                                                                        28
Managing and S c aling Deployed R C ode   Revolution Confidential




 Revolution R Enterprise includes RevoDeployR
     Code management & tracking
     Security & resource allocation
     Scalability for real-time demands
     Web Services API
 Scoring and dynamic analytics
     Data visualization
     custom reporting
     Ad-hoc analytics
     Interactive data apps

                                                           29
C ode C onvers ion for S c oring            Revolution Confidential




 Business rules (e.g. IBM ILOG)
   Create directly with R code from model object
 Recode in other languages
   SQL
   C++, etc.
 Low latency
 Slow and costly
  deployment
 Potential for errors

                                                             30
S c oring via P MML                               Revolution Confidential




 Some predictive models can be expressed in
  the PMML standard (www.dmg.org)
    Not all, but growing all the time




                                                                    Scores
R Model           PMML




                                    Data
                                           PMML
 Object   pmml


 Easily generate PMML with R “pmml” package
 PMML Scoring Engine: Zementis Adapa
 Databases and appliances may support PMML
    But supported standards vary

                                                                   31
P has e 4: R eal-Time S c oring                               Revolution Confidential




 Scoring triggered by Decision Layer, brokered by
  Integration Layer
   R code
      Revolution R Enterprise Server
      In-appliance (e.g IBM PureData System for Analytics)
   SQL
      In-database
   Rules
   Compiled code
      Bespoke engines
 May be using hardware from data layer
   But not the actual data
                                                                               32
R eal-Time               Revolution Confidential



S c oring:
review
             iLog




                                 SQL


             Near Real
               Time                       33
P has e 5: Model R efres h                  Revolution Confidential




 Deployed scoring rules no longer connected
  to Data Layer or Data Mart
   Enables real-time performance
   Need to refresh using recent data


 Model refresh process
   Use data extract script
   Re-run model script
   Statistical review OR automated validation
                                                             34
S c heduled B atc h Updates                       Revolution Confidential




                                 Periodic model refresh
                                   Weekly
                                   Daily
    Revolution R Enterprise        Hourly
    Production Server Cluster
                                 Automated
        Scheduler                 validate/deploy process
                                      Excel
   RevoDeployR Server

    Web Services API


                                                                   35
R e-developing the model                   Revolution Confidential




 Even then, the underlying structure of the
  data may change:
   Important variables become non-significant
   Non-significant variables become important!
   New data sources become available
 Model accuracy drift is a warning sign
 Return to Phase 2 or Phase 1



                                                            36
Revolution Confidential
    Kilobytes / second



                                     Megabytes / second
How big is
‘B ig data’?
                         Megabytes



Gigabytes  Terabytes




 Petabytes  Exabytes


                                                                     38
Revolution Confidential
    Seconds or less



                                             Milliseconds
How fas t is
‘R eal time’?
                      Daily  Hourly



    Hours  Minutes


                                   Daily  Hourly


     Weeks  Days


                                                                       39
R ec ommendations                                 Revolution Confidential




 Architect your Predictive Analytics Stack
   Best of breed vs single-vendor stack
   Diversify data sources and decision apps
 R = Predictive Analytics
  Revolution R Enterprise provides:
   Analytics Layer: big data, performance
   Integration Layer: scalability, reliability
 Get Help to Get Started
   Consider an SI partner for architecture, best
    practices and implementation advice
                                                                   40
R es ourc es                                 Revolution Confidential




 Revolution R Enterprise : R for Big Data
    www.revolutionanalytics.com/products
 Revolution Analytics Consulting Services
    www.revolutionanalytics.com/services
    Contact us: bit.ly/hey-revo
 Rhadoop : Connecting R and Hadoop
    bit.ly/r-hadoop

 Contact David Smith
    david@revolutionanalytics.com
       @revodavid
    blog.revolutionanalytics.com

                                                              41
T hank you.                                                                      Revolution Confidential




           The leading commercial provider of software and support for the popular
                            open source R statistics language.




 www.revolutionanalytics.com            650.646.9545                  Twitter: @RevolutionR




                                                                                                  42

More Related Content

What's hot

Software architecture design ppt
Software architecture design pptSoftware architecture design ppt
Software architecture design ppt
farazimlak
 
Cutover Plan V2
Cutover Plan V2Cutover Plan V2
Cutover Plan V2
Mahesh Vallampati
 
Business Value Measurements and the Solution Design Framework
Business Value Measurements and the Solution Design FrameworkBusiness Value Measurements and the Solution Design Framework
Business Value Measurements and the Solution Design Framework
Leo Barella
 
Introduction to system analysis and design
Introduction to system analysis and designIntroduction to system analysis and design
Introduction to system analysis and designTwene Peter
 
Business Process Modeling
Business Process ModelingBusiness Process Modeling
Business Process Modeling
Sandy Kemsley
 
A Roadmap to Data Migration Success
A Roadmap to Data Migration SuccessA Roadmap to Data Migration Success
A Roadmap to Data Migration Success
FindWhitePapers
 
Enterprise arhitecture blueprint objectives
Enterprise arhitecture blueprint objectivesEnterprise arhitecture blueprint objectives
Enterprise arhitecture blueprint objectives
Andy Parkins
 
Software Project Management
Software Project ManagementSoftware Project Management
Software Project Management
karthikeyanC40
 
Function Point Analysis
Function Point AnalysisFunction Point Analysis
Function Point Analysis
Araf Karsh Hamid
 
ITIL4 and ServiceNow
ITIL4 and ServiceNowITIL4 and ServiceNow
ITIL4 and ServiceNow
ITSM Academy, Inc.
 
EBS and PDH, a comparision
EBS and PDH, a comparisionEBS and PDH, a comparision
EBS and PDH, a comparision
Larry Sherrod
 
Lect5 improving software economics
Lect5 improving software economicsLect5 improving software economics
Lect5 improving software economics
meena466141
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - Introduction
Wil van der Aalst
 
Mastering SharePoint Migration Planning
Mastering SharePoint Migration PlanningMastering SharePoint Migration Planning
Mastering SharePoint Migration Planning
Christian Buckley
 
Sustainable Product Development Solution by HCL on Aras PLM Platform
Sustainable Product Development Solution by HCL on Aras PLM PlatformSustainable Product Development Solution by HCL on Aras PLM Platform
Sustainable Product Development Solution by HCL on Aras PLM Platform
Aras
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
QBurst
 
Aspect oriented architecture
Aspect oriented architecture Aspect oriented architecture
Aspect oriented architecture tigneb
 

What's hot (20)

Software architecture design ppt
Software architecture design pptSoftware architecture design ppt
Software architecture design ppt
 
Spm unit 5
Spm unit 5Spm unit 5
Spm unit 5
 
Cutover Plan V2
Cutover Plan V2Cutover Plan V2
Cutover Plan V2
 
Business Value Measurements and the Solution Design Framework
Business Value Measurements and the Solution Design FrameworkBusiness Value Measurements and the Solution Design Framework
Business Value Measurements and the Solution Design Framework
 
Telecom testing
Telecom testingTelecom testing
Telecom testing
 
Introduction to system analysis and design
Introduction to system analysis and designIntroduction to system analysis and design
Introduction to system analysis and design
 
Business Process Modeling
Business Process ModelingBusiness Process Modeling
Business Process Modeling
 
A Roadmap to Data Migration Success
A Roadmap to Data Migration SuccessA Roadmap to Data Migration Success
A Roadmap to Data Migration Success
 
Enterprise arhitecture blueprint objectives
Enterprise arhitecture blueprint objectivesEnterprise arhitecture blueprint objectives
Enterprise arhitecture blueprint objectives
 
Software Project Management
Software Project ManagementSoftware Project Management
Software Project Management
 
Function Point Analysis
Function Point AnalysisFunction Point Analysis
Function Point Analysis
 
ITIL4 and ServiceNow
ITIL4 and ServiceNowITIL4 and ServiceNow
ITIL4 and ServiceNow
 
EBS and PDH, a comparision
EBS and PDH, a comparisionEBS and PDH, a comparision
EBS and PDH, a comparision
 
Lect5 improving software economics
Lect5 improving software economicsLect5 improving software economics
Lect5 improving software economics
 
Process Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - IntroductionProcess Mining - Chapter 1 - Introduction
Process Mining - Chapter 1 - Introduction
 
Oracle hyperion financial management
Oracle hyperion financial managementOracle hyperion financial management
Oracle hyperion financial management
 
Mastering SharePoint Migration Planning
Mastering SharePoint Migration PlanningMastering SharePoint Migration Planning
Mastering SharePoint Migration Planning
 
Sustainable Product Development Solution by HCL on Aras PLM Platform
Sustainable Product Development Solution by HCL on Aras PLM PlatformSustainable Product Development Solution by HCL on Aras PLM Platform
Sustainable Product Development Solution by HCL on Aras PLM Platform
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
 
Aspect oriented architecture
Aspect oriented architecture Aspect oriented architecture
Aspect oriented architecture
 

Viewers also liked

Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For Scale
Helena Edelson
 
Predictive analytics for ROI driven decision making
Predictive analytics for ROI driven decision makingPredictive analytics for ROI driven decision making
Predictive analytics for ROI driven decision makingSai Kumar Devulapalli
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
Arvind Sathi
 
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne MarsanMagazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
Alain Fortier
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
Rukshan Batuwita
 
Unlock the Value of Usage Data
Unlock the Value of Usage DataUnlock the Value of Usage Data
Unlock the Value of Usage Data
Kissmetrics on SlideShare
 
081221736543 Cara membeli oris
081221736543 Cara membeli oris081221736543 Cara membeli oris
081221736543 Cara membeli oris
Toko Pasutri Berhasil
 
Tendencias de Facility Management 2017
Tendencias de Facility Management 2017Tendencias de Facility Management 2017
Tendencias de Facility Management 2017
IngesanServiciosMexico
 
Insights into Action – recent examples leveraging big data in the automotive ...
Insights into Action – recent examples leveraging big data in the automotive ...Insights into Action – recent examples leveraging big data in the automotive ...
Insights into Action – recent examples leveraging big data in the automotive ...
Capgemini
 
Big Data and Analytic Strategy for Clinical Research
Big Data and Analytic Strategy for Clinical ResearchBig Data and Analytic Strategy for Clinical Research
Big Data and Analytic Strategy for Clinical Research
BBCR Consulting
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionRevolution Analytics
 
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
Mia Yuan Cao
 
Overview of Cascading 3.0 on Apache Flink
Overview of Cascading 3.0 on Apache Flink Overview of Cascading 3.0 on Apache Flink
Overview of Cascading 3.0 on Apache Flink
Cascading
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityEnterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise Agility
NUS-ISS
 
BFBM(12-2016) Affordable and clean energy
BFBM(12-2016) Affordable and clean energyBFBM(12-2016) Affordable and clean energy
BFBM(12-2016) Affordable and clean energy
Hub Myanmar Company Limited
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
Richard McDougall
 
Online hotel booking application - Design Process
Online hotel booking application - Design ProcessOnline hotel booking application - Design Process
Online hotel booking application - Design Process
chayapathi sarath
 
BigDataEurope - Big Data & Energy
BigDataEurope - Big Data & EnergyBigDataEurope - Big Data & Energy
BigDataEurope - Big Data & Energy
BigData_Europe
 
"Big Data" in the Energy Industry
"Big Data" in the Energy Industry"Big Data" in the Energy Industry
"Big Data" in the Energy Industry
Paige Bailey
 

Viewers also liked (20)

Streaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For ScaleStreaming Big Data & Analytics For Scale
Streaming Big Data & Analytics For Scale
 
Predictive analytics for ROI driven decision making
Predictive analytics for ROI driven decision makingPredictive analytics for ROI driven decision making
Predictive analytics for ROI driven decision making
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne MarsanMagazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
Magazine Surface - Le uber de la rénovation - Alain Fortier, Josianne Marsan
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Unlock the Value of Usage Data
Unlock the Value of Usage DataUnlock the Value of Usage Data
Unlock the Value of Usage Data
 
081221736543 Cara membeli oris
081221736543 Cara membeli oris081221736543 Cara membeli oris
081221736543 Cara membeli oris
 
Tendencias de Facility Management 2017
Tendencias de Facility Management 2017Tendencias de Facility Management 2017
Tendencias de Facility Management 2017
 
Insights into Action – recent examples leveraging big data in the automotive ...
Insights into Action – recent examples leveraging big data in the automotive ...Insights into Action – recent examples leveraging big data in the automotive ...
Insights into Action – recent examples leveraging big data in the automotive ...
 
Big Data and Analytic Strategy for Clinical Research
Big Data and Analytic Strategy for Clinical ResearchBig Data and Analytic Strategy for Clinical Research
Big Data and Analytic Strategy for Clinical Research
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
 
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
Real-Time Analytics and Visualization of Streaming Big Data with JReport & Sc...
 
Overview of Cascading 3.0 on Apache Flink
Overview of Cascading 3.0 on Apache Flink Overview of Cascading 3.0 on Apache Flink
Overview of Cascading 3.0 on Apache Flink
 
Enterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise AgilityEnterprise Approach towards Cost Savings and Enterprise Agility
Enterprise Approach towards Cost Savings and Enterprise Agility
 
BFBM(12-2016) Affordable and clean energy
BFBM(12-2016) Affordable and clean energyBFBM(12-2016) Affordable and clean energy
BFBM(12-2016) Affordable and clean energy
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Online hotel booking application - Design Process
Online hotel booking application - Design ProcessOnline hotel booking application - Design Process
Online hotel booking application - Design Process
 
BigDataEurope - Big Data & Energy
BigDataEurope - Big Data & EnergyBigDataEurope - Big Data & Energy
BigDataEurope - Big Data & Energy
 
"Big Data" in the Energy Industry
"Big Data" in the Energy Industry"Big Data" in the Energy Industry
"Big Data" in the Energy Industry
 

Similar to Real-time Big Data Analytics: From Deployment to Production

Risk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services IndustryRisk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services Industry
Revolution Analytics
 
100% R and More: Plus What's New in Revolution R Enterprise 6.0
100% R and More: Plus What's New in Revolution R Enterprise 6.0100% R and More: Plus What's New in Revolution R Enterprise 6.0
100% R and More: Plus What's New in Revolution R Enterprise 6.0
Revolution Analytics
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
BigDataCloud
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Sybase Türkiye
 
R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with Hadoop
Revolution Analytics
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Finding fraud in large, diverse data sets
Finding fraud in large, diverse data setsFinding fraud in large, diverse data sets
Finding fraud in large, diverse data sets
Chris Selland
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013
nkabra
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
MongoDB
 
APAC Big Data Strategy_RK
APAC Big Data Strategy_RKAPAC Big Data Strategy_RK
APAC Big Data Strategy_RKIntelAPAC
 
Rob's kt deck
Rob's kt deck Rob's kt deck
Rob's kt deck rvanwart
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna HiremaneIntelAPAC
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
Dataconomy Media
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
Bill Jacobs
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
SingleStore
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
DataWorks Summit
 
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Revolution Analytics
 
Mesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen FinalMesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen Final
Tripp Payne
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
Kun Le
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
W. Daniel Cox, III CMA, CFM
 

Similar to Real-time Big Data Analytics: From Deployment to Production (20)

Risk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services IndustryRisk Analysis in the Financial Services Industry
Risk Analysis in the Financial Services Industry
 
100% R and More: Plus What's New in Revolution R Enterprise 6.0
100% R and More: Plus What's New in Revolution R Enterprise 6.0100% R and More: Plus What's New in Revolution R Enterprise 6.0
100% R and More: Plus What's New in Revolution R Enterprise 6.0
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
 
R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with Hadoop
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
Finding fraud in large, diverse data sets
Finding fraud in large, diverse data setsFinding fraud in large, diverse data sets
Finding fraud in large, diverse data sets
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
APAC Big Data Strategy_RK
APAC Big Data Strategy_RKAPAC Big Data Strategy_RK
APAC Big Data Strategy_RK
 
Rob's kt deck
Rob's kt deck Rob's kt deck
Rob's kt deck
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna Hiremane
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
 
Big data analytics on teradata with revolution r enterprise bill jacobs
Big data analytics on teradata with revolution r enterprise   bill jacobsBig data analytics on teradata with revolution r enterprise   bill jacobs
Big data analytics on teradata with revolution r enterprise bill jacobs
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
Integrate Your Advanced Analytics into BI Apps and MS Office and Multiply The...
 
Mesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen FinalMesa Big Data 2nd Screen Final
Mesa Big Data 2nd Screen Final
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 

More from Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
Revolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
Revolution Analytics
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
Revolution Analytics
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
Revolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
Revolution Analytics
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
Revolution Analytics
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
Revolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
Revolution Analytics
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
Revolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
Revolution Analytics
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
Revolution Analytics
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
Revolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
Revolution Analytics
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
Revolution Analytics
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
Revolution Analytics
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
Revolution Analytics
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
Revolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
Revolution Analytics
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
Revolution Analytics
 

More from Revolution Analytics (20)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source Communities
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)R at Microsoft (useR! 2016)
R at Microsoft (useR! 2016)
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data Science
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductor
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint package
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 

Real-time Big Data Analytics: From Deployment to Production

  • 1. Revolution Confidential R eal-Time P redic tive A nalytic s with B ig Data from deployment to produc tion David M S mith @ revodavid V P Marketing and C ommunity R evolution Analytic s
  • 2. Today we’ll dis c us s : Revolution Confidential  What is “real-time predictive analytics with big data?”  Case study: UpStream Software  Real-time big-data stack  Five phases of deployment to production  Just what do we mean by “Big Data” and “Real-Time”, anyway?  Recommendations  Q&A 2
  • 3. R eal-Time P redic tive A nalytic s with B ig Data Revolution Confidential  Real Time  Milliseconds? Seconds? Hours? Days?  In production, continuously updated  Big Data  Size, flow rate, diversity  Conflict with ‘real time’  Predictive Analytics  Rear view: description, aggregation, tabulation  Prediction, inference, statistical modeling 3
  • 4. Revolution Confidential C as e S tudy www.upstreamsoftware.com “Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and fast performance — we saw about a 4x From: “How Big Data is Changing Retail performance improvement on 50 Marketing Analytics” million records. It works brilliantly.” http://bit.ly/upstream-webinar John Wallace, CEO Upstream Software
  • 5. UpS tream S oftware Revolution Confidential UpStream’s Big Data Analytics engine analyzes and optimizes marketing mix for UpStream’s retailer clients  Customer analytics and segmentation  Revenue/ action attribution among channels and marketing programs  Customized Next Best Action per individual prospect or customer  Event-Triggered Marketing: responses to consumer actions 5
  • 6. B ig Data Revolution Confidential  Demographics: consumer, product, market  Actions: web clicks, email clicks, mobile app usage, call center logs, social, search …  Outcomes: impressions, touches, orders (retail, online, mobile) 6
  • 7. P redic tive A nalytic s Revolution Confidential • Time-to-event models UPSTREAM DATA • Exploratory data analysis CUSTOM VARIABLES FORMAT (PMML) • GAM survival models • ETL • Scoring for inference • N marketing channels • Scoring for prediction • Behavioral variables • 5 billion scores per day • Promotional data per customer • Overlay data
  • 8. R eal Time Revolution Confidential http://www.infoworld.com/d/big-data/williams-sonoma-uses-big-data-zero-in-customers-206641 November 8, 2012 8
  • 9. P redic tive vs Des c riptive A nalytic s Revolution Confidential  Retail sales are influenced by stores near the home  Newer customers are more sensitive to marketing  Google keywords perform worse than you think  Display advertising performs better than you think  Online sales benefit more from seasonal events than retail stores 9
  • 10. R eal-time Revolution Confidential B ig Data P redic tive A nalytic s S tac k 10
  • 11. Data L ayer Revolution Confidential  Structured data  RDBMS, NoSQL, Hbase, Impala  Unstructured data  Hadoop MapReduce  Streaming data  Web, social, sensors, operational systems  Some descriptive analytics done here 11
  • 12. A nalytic s layer Revolution Confidential  Predictive analytics technology  Development environment: build models  Production environment:  Deploy real-time scoring  Engine for dynamic analytics  Local data mart  Static, periodically updated from data layer  Improves performance 12
  • 13. Integration L ayer Revolution Confidential  Connective tissue between end-user applications and analytics engine  Engines for flow control, real-time scoring  Rules engine / CEP engine  API for Dynamic Analytics  Brokers communication between app developers and data scientists 13
  • 14. Dec is ion L ayer Revolution Confidential  End-user interface to analytics system  Customers  Operations  Business analysts  C-suite  Familiar & simple interfaces  Supports a range of end-user technologies  Level of UI complexity dictated by need 14
  • 15. R eal-Time Deployment Revolution Confidential Five phases of deploying real-time predictive analytics with big data to production: 1. Data Distillation 2. Model Development 3. Validation and Deployment 4. Real-time Scoring 5. Model refresh 16
  • 16. P has e 1: Data Dis tillation Revolution Confidential The data in the Data Layer isn’t yet ready for predictive modeling. We need to:  Extract features from unstructured text  Topic modeling, sentiment analysis  Combine disparate data sources  Filter for populations of interest  Select relevant features and outcomes for modeling  Export to the data mart for predictive modeling 17
  • 17. E xample: Data Dis tillation in Hadoop Revolution Confidential Log Files Event Streams HDFS Load Map-Reduce Structured Data rmr Scraped Text Unstructured Analytics Data Data Mart Revolution R Enterprise for Hadoop: bit.ly/r-hadoop 18
  • 18. A utomating the E xtrac tion P roc es s Revolution Confidential  This process needs to be repeatable and maintainable  Create a re-usable R script:  ASCII: rxDataStep  SQL: rxImport, ROracle, RMySQL  NoSQL: Rcassandra, rmongodb  Hadoop: rmr, rhbase, Rhive  Appliances: nza, teradataR, PL/R  Feeds: rjson, XML, RCurl, twitteR, python  R script runs from Analytics Layer  Processing: in Data Layer  Output: structured file for Data Mart 19
  • 19. Data Revolution Confidential Dis tillation P roc es s XD F 20
  • 20. P has e 2: Model development Revolution Confidential  Goal: create a predictive model that is  Powerful  Robust  Comprehensible  Implementable  Key requirements for Data Scientists:  Flexibility  Productivity  Speed  Reproducibility 21
  • 21. T he Model Development C yc le Revolution Confidential Download the White Paper R is Hot Variable bit.ly/r-is-hot Transformation Feature Selection Model Predictive Data Mart Sampling Estimation Aggregation Model Model Model Comparison / Refinement Benchmarking 22
  • 22. Tec hnology c ons iderations Revolution Confidential  Moving Big Data is slow  So just move it once, to data mart near the development and production environments  Static data needed for modeling cycle  But have a plan to refresh and update  Data layer not optimized for predictive analytics  But good for descriptive (data distillation)  Revolution R Enterprise optimizations for the analytics layer  R; XDF file format; Parallel External Memory Algorithms 23
  • 23. P redic tive Modeling A lgorithms Revolution Confidential Algorithm Example Applications Big Data ETL, data distillation, record/variable selection, Data Step variable transformation ✓✓ Descriptive Statistics Exploratory Data Analysis, Data Validation ✓✓ Tables & Cubes Reporting, contingency analysis ✓✓ Correlation / Covariance Factor Analysis, Value at Risk ✓✓ Linear regression Forecasting, Net present value estimation ✓✓ Logistic Regression Response modeling, offer selection ✓✓ Generalized Linear Models Capital reserve estimation, climate modeling ✓✓ K-means clustering Customer Segmentation ✓✓ Decision Trees Dynamic pricing, classification, variable importance ✓✓ Model Prediction Real-time Scoring (decisions, offers, actions) ✓✓ R CRAN packages Everything else Parallel & distributed Simulations, By-Group analysis, ensemble models, computing with R custom applications ✓ 24
  • 24. High performanc e with dis tributed c lus ters Revolution Confidential Data Scientists / Modelers Grid computing cluster Revolution R Enterprise Platform LSF Microsoft HPC Server Microsoft Azure Ad-hoc grids (SNOW) Shared SMP server Hadoop / HDFS 25
  • 25. P has e 3: Model Validation and Deployment Revolution Confidential  Scoring rules map parameters to outputs  Parameters: information known at real time  prospect ID, product, webpage, …  Scores: values inferred in real time  prices, recommendations, actions, …  Validation: backtest with historical data  Deployment: code running in real time 26
  • 26. Validation for produc tion Revolution Confidential Real Outcomes Historical Data Scores Scoring Rules  Refresh data mart  Rebuild model, withholding a validation set  Random sampling  Create accuracy measure  ROC, positive rate, false positive rate  Measure and monitor 27
  • 27. Deployment with R c ode Revolution Confidential Outcomes Historical Scores R Model R Model Data Data New Object Object  Scoring rules captured as “R model objects”  Move R code directly from development environment to production server  No recoding: Lowest cost, greatest speed & reliability  Most flexibility (not limited in model choice)  Easy to validate with custom accuracy measures 28
  • 28. Managing and S c aling Deployed R C ode Revolution Confidential  Revolution R Enterprise includes RevoDeployR  Code management & tracking  Security & resource allocation  Scalability for real-time demands  Web Services API  Scoring and dynamic analytics  Data visualization  custom reporting  Ad-hoc analytics  Interactive data apps 29
  • 29. C ode C onvers ion for S c oring Revolution Confidential  Business rules (e.g. IBM ILOG)  Create directly with R code from model object  Recode in other languages  SQL  C++, etc.  Low latency  Slow and costly deployment  Potential for errors 30
  • 30. S c oring via P MML Revolution Confidential  Some predictive models can be expressed in the PMML standard (www.dmg.org)  Not all, but growing all the time Scores R Model PMML Data PMML Object pmml  Easily generate PMML with R “pmml” package  PMML Scoring Engine: Zementis Adapa  Databases and appliances may support PMML  But supported standards vary 31
  • 31. P has e 4: R eal-Time S c oring Revolution Confidential  Scoring triggered by Decision Layer, brokered by Integration Layer  R code  Revolution R Enterprise Server  In-appliance (e.g IBM PureData System for Analytics)  SQL  In-database  Rules  Compiled code  Bespoke engines  May be using hardware from data layer  But not the actual data 32
  • 32. R eal-Time Revolution Confidential S c oring: review iLog SQL Near Real Time 33
  • 33. P has e 5: Model R efres h Revolution Confidential  Deployed scoring rules no longer connected to Data Layer or Data Mart  Enables real-time performance  Need to refresh using recent data  Model refresh process  Use data extract script  Re-run model script  Statistical review OR automated validation 34
  • 34. S c heduled B atc h Updates Revolution Confidential  Periodic model refresh  Weekly  Daily Revolution R Enterprise  Hourly Production Server Cluster  Automated Scheduler validate/deploy process Excel RevoDeployR Server Web Services API 35
  • 35. R e-developing the model Revolution Confidential  Even then, the underlying structure of the data may change:  Important variables become non-significant  Non-significant variables become important!  New data sources become available  Model accuracy drift is a warning sign  Return to Phase 2 or Phase 1 36
  • 36. Revolution Confidential Kilobytes / second Megabytes / second How big is ‘B ig data’? Megabytes Gigabytes  Terabytes Petabytes  Exabytes 38
  • 37. Revolution Confidential Seconds or less Milliseconds How fas t is ‘R eal time’? Daily  Hourly Hours  Minutes Daily  Hourly Weeks  Days 39
  • 38. R ec ommendations Revolution Confidential  Architect your Predictive Analytics Stack  Best of breed vs single-vendor stack  Diversify data sources and decision apps  R = Predictive Analytics Revolution R Enterprise provides:  Analytics Layer: big data, performance  Integration Layer: scalability, reliability  Get Help to Get Started  Consider an SI partner for architecture, best practices and implementation advice 40
  • 39. R es ourc es Revolution Confidential  Revolution R Enterprise : R for Big Data  www.revolutionanalytics.com/products  Revolution Analytics Consulting Services  www.revolutionanalytics.com/services  Contact us: bit.ly/hey-revo  Rhadoop : Connecting R and Hadoop  bit.ly/r-hadoop  Contact David Smith  david@revolutionanalytics.com  @revodavid  blog.revolutionanalytics.com 41
  • 40. T hank you. Revolution Confidential The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR 42