This document summarizes an event where Revolution Analytics discussed applications of R and lessons learned from its use in the marketplace. The presentation covered how companies like Facebook, Google, Twitter, The New York Times, gaming companies and others use R for exploratory data analysis, data visualization, statistical modeling and more. It also provided examples of how pharmaceutical, finance, retail and other industries apply R at scale. The document concludes with a discussion of Revolution Analytics' training and support services to help organizations build out their use of R.
Applications of R in Business Analytics and Big Data
1. Applications in R
Success and Lessons Learned from the Marketplace
David Smith
Chief Community Officer
Revolution Analytics
July 29, 2014
Neera Talbert
VP Professional Services
Revolution Analytics
2. Agenda
Introduction to R
Growth of R
Applications of R
Q&A
David Smith
Chief Community Officer
@revodavid
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
3.
OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR SOFTWARE
The only Big Data, Big Analytics software platform based on the data science language R
SOME KUDOS
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
4. What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
• Leading edge of analytics research
Fills the talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-r
6. R’s popularity is growing rapidly
R Usage Growth: Rexer Data Miner Survey, 2007-2013
Language Popularity: IEEE Spectrum Top Programming Languages, July 2014 (R ranked #9)
7. R is among the highest-paid IT skills in the US
Sources: Dice Tech Salary Survey, January 2014; O’Reilly Strata 2013 Data Science Salary Survey
8. Technical Support for Open Source R
AdviseR™ from Revolution Analytics
Technical support for open source R, from the R experts.
Email and phone support 8AM-6PM, Mon-Fri
Support for R, validated packages, and third-party software
connections
On-line case management and knowledgebase
Access to technical resources, documentation and user forums
Exclusive on-line webinars from community experts
Guaranteed response times
Also available: expert hands-on and on-line training for R, from
Revolution Analytics AcademyR.
www.revolutionanalytics.com/AdviseR
R SUPPORT: 12 MONTHS, $795 PER USER
10. Facebook
• Exploratory Data Analysis
• Experimental Analysis
“Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” - Solomon Messing, data scientist at Facebook
11. Facebook
• Big-Data Visualization
“It resonated with many people. It's not just a pretty picture, it's a reaffirmation of the impact we have in connecting people, even across oceans and borders.” - Paul Butler, data scientist, Facebook
12. Google
• Advertising Effectiveness
• Economic forecasting
“The great beauty of R is that you can modify it to do all sorts of things.” - Hal Varian, Chief Economist, Google
“R is really important to the point that it's hard to overvalue it.” - Daryl Pregibon, Head of Statistics, Google
13. Twitter
• Data Visualization
• Semantic clustering
“A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python or R for further analysis, pull from a database to grab some extra fields, and so on, often integrating what I find into some machine learning models in the end.” - Ed Chen, Data Scientist, Twitter
15. The New York Times
Interactive Features
• Election Forecast
• Dialect Quiz
Data Journalism
• NFL Draft Picks
• Wealth distribution in USA
16. The New York Times
Data Visualization
• Facebook IPO
• Baseball legends
17. Video Gaming
• Multiplayer Matchmaking
• Player Churn
• Game design
• Difficulty curve
• Level trouble-spots
• In-game purchase optimization
• Fraud detection
• Player communities
• Game Analysis
Video Games
18. Housing
• Crime mapping
• choroplethr package
• Statistical forecasting
“The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”
Real Estate
20. Pharmaceuticals
“R use at the FDA is completely acceptable and has not caused any problems.” - Dr Jae Brodsky, Office of Biostatistics, Food and Drug Administration
Regulatory Drug Approvals
• Reproducible research
• Accurate, reliable and consistent statistical analysis
• Internal reporting (Section 508 compliance)
21. Power
“We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.”
- CEO, John Wallace, DataSong
4X performance
50M records scored daily
Scalability
“We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.”
- SVP Analytics, Kevin Lyons, eXelate
TBs of data from 200+ data sources
10s of thousands of attributes
100s of millions of scores daily
2X data, 2X attributes, no impact on performance
Performance
“We need a high-performance analytics infrastructure because marketing optimization is a lot like financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.”
- Chief Analytics Officer, Leon Zemel, [x+1]
Marketing Analytics
22. All of Open Source R plus:
Big Data scalability
High-performance analytics
Development and deployment tools
Data source connectivity
Application integration framework
Multi-platform architecture
Technical Support
Available training and services
Revolution R Enterprise is the Big Data Big Analytics Platform
24. Neera Talbert, VP Big Data & Advanced Analytic Services
Leads Services at Revolution Analytics
Fifteen years of experience in the business analytics software industry
Works with Fortune 500 companies to define analytics strategy, implement analytics-based decision making, reduce decision latency, and increase speed of decision making
– Analytics, business intelligence, big data analytics, risk
– Customer intelligence, supply chain, manufacturing, retail, oil & gas, public sector
25. Organizational Readiness
“There will be almost half a million jobs in five years, and a
shortage of up to 190,000 qualified data scientists, plus a
need for 1.5 million executives and support staff who have
an understanding of data”
McKinsey Global Institute
April 2013
26. Opportunity to develop talent
Data Science: “the sexiest job in the 21st century” - Harvard Business Review
A cross between computer engineers, statisticians and business analysts – people who ask good questions and are open to working with unstructured information
Universities can’t produce them fast enough – need 60% more resources – McKinsey Global Institute
27. Our Philosophy
“The Hands-on exercises were the best part of Revolution Analytics
training”
- A participant from a global telecom company
29. RRE Certification Testing
Demonstrate your R and RRE programming knowledge
– Fundamentals in R Language
– Data Management in Revolution R Enterprise
– Modeling in Revolution R Enterprise
Independently proctored exam – online and onsite
30. Training Data Science team for Big Data Analytics
Profile: Multi-channel marketing attribution and analytics software developer and service provider. Growing, innovative, cost-conscious.
Key Technology: Revolution R Enterprise and Hadoop, replacing SAS and Open Source R
Outcomes: Massively scalable infrastructure to support attribution and optimization at an individual customer level (segments of one) for clients such as Williams-Sonoma. Client saved $250K in one campaign. Rapid development and deployment of customer-specific models, using innovative analytic techniques such as big data GAM Survival models
Bottom Line: Driving revenue lift and cost savings through marketing optimization
“Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power. We saw about a 4x performance improvement on 50 million records. It works brilliantly.”
CEO, John Wallace (DataSong formerly named UpStream)
4X performance
50M+ records scored daily
31. Model Development for Supply Chain Analytics with Hadoop
Profile: The Application Development team worked with Revolution Analytics Consultants to build a cloud-based supply chain analytics platform
Key Technology and Services: R for Big Data
Analytics, Consulting, Training
Analytic Approach: Aggregate data from 15 data sources, including ERP data, store sales data and sales forecast data, for 25,000 store locations and 50 SKUs nightly across 6 forecast models and order planning models, running back tests and validation. Worked with client to establish big data environment and models that will generate 6.5 billion computations daily by end of the year (in a 4-hour window for processing). Scale and performance will allow new capabilities such as seasonality, promotions and incentives.
>Sales and Demand Data Analysis
>R/RRE Model Development
Bottom line: Work with client to develop predictive models,
starting with rigorous forecasts across various models,
generating forecast statistics and scoring each model
against historical data to come up with the best fit. The
forecast is input into an order-planning model that generates
recommendations to optimize product distribution and
ensure in-stock rate targets are achieved so that the right
amount of product is in the right location at the right time.
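The model-selection step described above (score each forecast model against historical data and keep the best fit) can be sketched roughly as follows. This is an illustrative Python sketch, not the client's implementation: the two candidate models, the mean-absolute-error metric, the function names and the sales figures are all assumptions made for the example.

```python
from statistics import mean

def naive_forecast(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    return mean(history[-window:])

def backtest_mae(forecast_fn, series, min_history=3):
    """Walk forward through the series and return the mean absolute
    error of the one-step-ahead forecasts."""
    errors = []
    for t in range(min_history, len(series)):
        predicted = forecast_fn(series[:t])  # only data before time t
        errors.append(abs(series[t] - predicted))
    return mean(errors)

def best_model(models, series):
    """Score every candidate on the history; return (best name, all scores)."""
    scores = {name: backtest_mae(fn, series) for name, fn in models.items()}
    return min(scores, key=scores.get), scores

# Hypothetical nightly unit sales for one SKU at one store.
sales = [10, 12, 11, 13, 12, 14, 13, 15]
models = {"naive": naive_forecast, "moving_average": moving_average_forecast}
winner, scores = best_model(models, sales)
print(winner, scores)
```

In a setup like the one on the slide, the same loop would run per store and SKU across all six forecast models, and the winning forecast would feed the order-planning model.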
“The amount of analytic horsepower required for this application cannot be supported by traditional means; it would require millions of dollars of hardware. R + Hadoop is allowing us to have the compute capacity to run 6.5 billion computations on a nightly basis to generate order plans for our clients.” - VP Application Development
Confidential – Do Not Distribute
32. Model Development for Vehicle Data Analysis
Profile: The Analytics R&D team of the multinational
automobile manufacturer worked with Revolution Analytics
Consultants to perform Survival Analysis, and to build and
deploy Decision Trees and Time Series models
Key Technology and Services: Revolution R Enterprise
for Big Data Analytics, Consulting, Training
Analytic Approach – Warranty Data Analysis: Estimating the life of an automobile component using Survival Analysis with Cox proportional hazards. Models are trained using historical data consisting of warranty claims and region- and weather-related variables such as snow, rain and temperature.
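To give a feel for survival analysis on warranty data, here is a minimal Python sketch. Note that it does not implement the Cox proportional hazards model named above, which needs covariates and an optimizer. It swaps in the simpler Kaplan-Meier estimator, which estimates the probability that a component survives past each failure time and handles censored records (components with no claim yet). The function name and the data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: time until failure, or until the record was censored
    observed:  True if the component failed, False if censored
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)   # components still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        failures = 0
        leaving = 0
        # Group all records that share the same time.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                failures += 1
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # failed and censored both leave the risk set
    return curve

# Hypothetical months-to-claim for six components; False = no claim yet.
months = [6, 12, 12, 18, 24, 30]
claimed = [True, True, False, True, False, True]
curve = kaplan_meier(months, claimed)
for t, s in curve:
    print(f"month {t}: survival {s:.3f}")
```

A Cox model would extend this idea by relating the failure rate to variables such as region and weather, which this sketch leaves out.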
Outcome: New analytics paradigm for existing
processes introduced, with potential for millions of dollars
in cost savings through improved warranty contracts, and
re-designed automobile components.
>Warranty & Sensor Data Analysis
>R/Revolution R Enterprise Training
Analytic Approach – Sensor Data Analysis: Use sensor
data from vehicle components to build Decision Trees for
classification, and to establish range of predicted values for
sensor readings so that actual readings can be analyzed for
outliers.
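The idea of checking actual sensor readings against an expected range can be sketched as below. This Python example is a simplification under stated assumptions: the slide does not say how the range is derived, so the sketch uses the mean plus or minus k sample standard deviations of baseline readings rather than model predictions. The function names, the value of k and the readings are hypothetical.

```python
from statistics import mean, stdev

def expected_range(baseline, k=3.0):
    """Return (low, high) as mean +/- k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread

def find_outliers(readings, low, high):
    """Return (index, value) pairs for readings outside [low, high]."""
    return [(i, x) for i, x in enumerate(readings) if not low <= x <= high]

# Hypothetical coolant temperatures (degrees C) from normal operation.
baseline = [88.0, 90.0, 91.0, 89.0, 92.0, 90.0]
low, high = expected_range(baseline, k=3.0)

new_readings = [90.5, 89.2, 104.0, 91.1, 70.0]
outliers = find_outliers(new_readings, low, high)
print(outliers)
```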
Bottom line: New analytics initiative for building an
intelligent automobile system that’s capable of guiding the
driver upon detection of anomalies in driving patterns.
“The consultants and training instructors from
Revolution Analytics were very knowledgeable and
supported me very well. I am looking forward to taking
my learnings to the larger analytics team at my
company.” Senior Researcher, Analytics R&D
33. R Package Validation
Profile: The Clinical Trials Analytics team at the
multinational biopharmaceutical company moved from
SAS to R to develop big data analytics for Clinical Trials
Key Technology and Services: RUnit testing framework,
Revolution R Enterprise (RRE) and open source R
Approach: Validate third-party (user-contributed) R packages from CRAN by executing unit and regression tests for functions both in the stated base package and in its dependent packages.
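The regression-testing idea behind this kind of validation (re-run each function on stored inputs and compare against previously accepted results) can be illustrated with a short Python sketch. The client's work used the RUnit framework on R packages. Here Python's standard `statistics` module merely stands in for a package under validation, and the reference table and the `validate` helper are hypothetical.

```python
import math
import statistics

# Hypothetical reference results recorded from a previously validated run:
# function name -> (input data, expected output).
REFERENCE = {
    "mean": ([2.0, 4.0, 6.0], 4.0),
    "median": ([1.0, 3.0, 2.0], 2.0),
    "pstdev": ([1.0, 1.0, 1.0], 0.0),
}

def validate(module, reference, tol=1e-9):
    """Call each named function on its stored input and compare the result
    with the stored expected value. Returns {function_name: passed}."""
    results = {}
    for name, (data, expected) in reference.items():
        actual = getattr(module, name)(data)
        results[name] = math.isclose(actual, expected, abs_tol=tol)
    return results

report = validate(statistics, REFERENCE)
print(report)
```

Running the same table against a new package version would flag any function whose output has drifted from the accepted values.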
Outcome: Client is moving from SAS to RRE for new analytics initiatives for improved performance and cost savings, and requires validation of user-contributed packages for reliability and compliance.
Challenge: The Clinical Trials Analytics team had “big data”
and “big computation” challenges, and needed a
centralized, scalable, and high-performance platform to
concurrently run the analytic models for faster analysis.
Bottom Line: Revolution R Enterprise acts as their statistical analytics platform, providing a centralized and scalable platform for tens of data scientists and analysts.
User-contributed, Open Source R package
validation for Clinical Trial compliance to
support move from SAS to R & RRE
34. Model Optimization for Customer Analytics
Profile: The advanced analytics & IT Infrastructure teams at the
Las Vegas-based gaming corporation build and deploy
analytical models for internal customers such as Marketing &
Sales.
Key Technology and Services: Hadoop, Open Source R,
Consulting and Training
Analytic Approach: Assess the end-to-end flow of the current Guest scoring model, and re-write the existing rmr/R code using optimization techniques.
Outcome: 84% reduction in run time of the Guest Scoring model, which helps the gaming company target their customers with a customized marketing campaign within minutes of a new activity such as checking into the hotel or buying tickets to a show.
Challenge: The IT Infrastructure team at the company was
challenged to support innovative, R-powered big data analytics
initiatives and needed to optimize their Analytics and Visualization
architecture.
Bottom line: Revolution Analytics consultants helped re-write R analytics running inside Hadoop to achieve superior performance and, as a second project, designed a big data architecture incorporating Cloudera, Teradata, Alteryx and Tableau.
“Excellent work, Revolution!! We’re very glad that you came
on board to help us. Revolution Consultants get an A+.”
Technical Program Manager, Big Data Initiatives
> 84% improvement in performance &
reliability of Guest Scoring model
> Multi-layer big data infrastructure
architecture design
35. Revolution Analytics Services Overview
Training
• On-Site or Remote Classes
• Classroom or Self Paced
• Standard or Tailored
Project Services
• Analytics Strategy
• Analytics Architecture
• Full Life Cycle Projects
• Application Migration
• Proof of concept
• Staff Augmentation
• Package Certification
Quick Start Services
• Pre-production
• Jumpstart value
• Combines software, training, and services
Post Go-Live Support
• Technical Account Management
• On-going Training
37. Why are so many companies using R?
Big Data
Data Science
Competition and Innovation
Open Source
Ecosystem
38. Q&A / Resources
What is R?
revolutionanalytics.com/what-is-r
Companies using R
revolutionanalytics.com/companies-using-r
AcademyR training
revolutionanalytics.com/AcademyR
AcademyR Certification
revolutionanalytics.com/AcademyR-certification
Contact Revolution Analytics
revolutionanalytics.com/contact-us
39. Thank you
Join us August 7th at 10:00 AM, Pacific, for our
Moving from SAS to R webinar. Please visit
our website to register.
www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR