This document summarizes a presentation given by Revolution Analytics on using R for marketing analytics. It discusses challenges such as the need to make decisions faster, using more data and predictive models. It gives examples of companies using Revolution's R software to improve results, such as generating 14% lift for one client and saving another $270k on a single campaign. The presentation promotes Revolution's R software for handling big data and running analytics faster through techniques like parallel processing and distributed computing. It argues that Revolution Analytics is the leading commercial provider of high-performance R software.
5. Revolution Confidential
Today’s Challenge:
Accelerating Business Cadence
Changing Business Environment
• Fact Based Decisions Require More Data
• Need to Understand Tradeoffs and Best Course of Action
• Predictive Models Need to Continually Deliver Lift
• Reduced Shelf Life for Predictive Models
Faster Time to Value
• Reduce Analytic Cycle Time
• Build & Deploy Models Faster
• Eliminate Time Consuming Data Movements
Rapid Customer Facing Decisions
• Score More Frequently
• Need to Make Best Decision in Real Time
10. Revolution Confidential
Can we be more innovative in marketing analytics… and precise in our targeting… using new and “old” data… in less time?
11. Revolution Confidential
How fast can the marketing data scientist innovate
to drive better precision in model output? …
…and can you get it (scale of data / scale of model scoring) into production?
…at an acceptable price point?
13. Revolution Confidential
ScaleR: High Performance Scalable Parallel External Memory Algorithms
Data Prep, Distillation & Descriptive Analytics
R Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
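To make the cross-tabulation and risk ratio / odds ratio items concrete, here is a minimal sketch in plain Python. It is not ScaleR code: the function names and the campaign counts are hypothetical, and the data is assumed to arrive as chunks of (exposed, responded) pairs so that the 2x2 table can be accumulated without holding all rows in memory.

```python
from collections import Counter


def crosstab_in_chunks(chunks):
    """Accumulate a 2x2 table of (exposed, responded) counts one chunk at a time."""
    counts = Counter()
    for chunk in chunks:          # each chunk is a list of (exposed, responded) pairs
        counts.update(chunk)      # only the four cell counts are kept between chunks
    return counts


def risk_and_odds_ratio(counts):
    """Compute the risk ratio and odds ratio from the four cell counts."""
    a = counts[(True, True)]      # exposed, responded
    b = counts[(True, False)]     # exposed, did not respond
    c = counts[(False, True)]     # not exposed, responded
    d = counts[(False, False)]    # not exposed, did not respond
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio


# Hypothetical campaign data, delivered in two chunks.
chunks = [
    [(True, True)] * 30 + [(True, False)] * 70,
    [(False, True)] * 10 + [(False, False)] * 90,
]
table = crosstab_in_chunks(chunks)
risk_ratio, odds_ratio = risk_and_odds_ratio(table)
print(round(risk_ratio, 2), round(odds_ratio, 2))
```

In this made-up example, 30% of exposed users respond against 10% of unexposed users, so the risk ratio is about 3 and the odds ratio is 2700 / 700.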
14. Revolution Confidential
ScaleR: High Performance Scalable Parallel External Memory Algorithms
Statistical Modeling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions including cauchit, identity, log, logit, probit; user defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Data Visualization
• Histogram
• Line Plot
• Scatter Plot
• Lorenz Curve
• ROC Curves (actual data and predicted values)
Variable Selection
• Stepwise Regression
Machine Learning
Cluster Analysis
• K-Means
Classification
• Decision Trees
Simulation
• Monte Carlo
• Parallel Random Number Generation
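The link between the "Sum of Squares (cross product matrix)" and "Multiple Linear Regression" items can be illustrated with a small sketch: a linear fit needs only the accumulated cross products, so the data can be read chunk by chunk. This is plain Python for a single predictor, offered as an assumption about the general technique rather than as ScaleR's implementation; `fit_line_in_chunks` and the sample data are hypothetical.

```python
def fit_line_in_chunks(chunks):
    """Fit y = intercept + slope * x by accumulating cross products per chunk."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:              # each chunk is a list of (x, y) pairs
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x              # running sum of squares
            sxy += x * y              # running cross product
    # Solve the 2x2 normal equations from the accumulated sums.
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x, split across two chunks.
chunks = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
intercept, slope = fit_line_in_chunks(chunks)
print(intercept, slope)
```

Only five running totals survive between chunks, which is why this style of computation scales with the number of variables rather than the number of rows.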
15. Revolution Confidential
Revolution example: multi-use predictive analytics
• User Churn: predict the likelihood of a user leaving a particular game
• User Community Impact: understand the impact players have on communities
• Promotional Pricing: understand user purchase behavior better
• Game Content Optimization: understand user behavior to develop new games
16. Revolution Confidential
Example of what we do: DataSong, marketing attribution and optimisation
Company: Data Song Software, San Francisco
www.datasong.com
Industry: software / services for marketing
attribution and campaign optimization
Challenge: economically develop a scalable,
high-performing R-powered Big Data Analytics
platform on which to provide services to clients
Solution:
• Revolution R Enterprise for Big Data
Analytics and Hadoop for data management
• Customized exploratory data analysis and GAM (generalized additive model) survival models to drive next best action (NBA) and targeting
• Saved one client $270,000 on one campaign
• Generated 14% lift for another client
“We saw about a 4x performance improvement on 50 million records. It works brilliantly.”
- John Wallace, CEO, DataSong
17. Revolution Confidential
Example of what we do: [X+1], digital marketing analytics
Company: [X+1] New York, www.xplusone.com
Industry: software and services for optimized
digital marketing through multi-channel visitor
experiences on personalized websites and
real-time digital audience targeting
Challenge: needed real-time analytics,
automated model updates, support for new data
types, and a way to manage quickly growing data volumes
Solution:
• Revolution R Enterprise, for Big Data
Analytics, and a distributed computing
platform for data management
• Higher lift of real time multi-channel ad
targeting analytics derived from use of more
data and attributes
• Higher lift through higher precision audience
targeting and tailored messaging
2X data, 2X attributes, no impact on performance
19. Revolution Confidential
PEMAs Beat In-Memory Algorithms
Parallel external memory algorithms (PEMAs)
• Exploit distributed and streaming data
• Deliver scalability and performance
• Split computations so that not all data has to be in memory at one time
• “Automatically” parallelize and distribute algorithms
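The idea of splitting a computation so that no step needs all the data can be sketched as follows. This is an illustrative Python example of the general external-memory pattern, not Revolution's code: each chunk is reduced to a small summary (count, mean, sum of squared deviations), and summaries are merged with the standard pairwise update, so chunks could be processed on different cores or nodes. The names `summarize` and `merge` and the sample values are assumptions made for illustration.

```python
from functools import reduce


def summarize(chunk):
    """Reduce one chunk to (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two partial summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


# Hypothetical data, split into three chunks that never need to coexist in memory.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
n, mean, m2 = reduce(merge, map(summarize, chunks))
variance = m2 / (n - 1)           # sample variance over all chunks
print(n, mean, variance)
```

Because `merge` gives the same result up to floating-point rounding however the chunks are grouped, the partial summaries can be computed in parallel and merged in any order.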
20. Revolution Confidential
Revolution R Enterprise
High Performance, Multi-Platform Analytics Platform
DeployR
Web Services Software Development Kit
DevelopR
Integrated Development Environment
ConnectR
High Speed & Direct Connectors
Teradata, HDFS (both), HBase, Netezza, SAS, SPSS, CSV, ODBC
ScaleR
High Performance Big Data Analytics
DistributedR
Streaming, In-Memory Distributed Computing Framework
IBM PureData, IBM Platform LSF, HPC Server, MS Azure Burst, Windows & Red Hat Servers
RevoR
Performance Enhanced Open Source R + Open Source R packages