This document summarizes a webinar introducing Revolution R Open, an enhanced open source R distribution from Revolution Analytics. The webinar discusses Revolution R Open's focus on reproducibility, multi-threaded performance improvements, and compatibility. It also briefly describes other open source projects from Revolution Analytics, including DeployR Open for deploying R applications, RHadoop for integrating R with Hadoop, and ParallelR for parallel programming with R. The webinar concludes with two polls asking participants about their current software use and which Revolution Analytics projects they plan to use.
2. In today’s webinar:
R Update
Revolution R Open
The Reproducible R Toolkit
MRAN
Other open-source projects
• DeployR Open
• ParallelR
• RHadoop
Revolution R Plus
Q&A
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
3.
OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR PRODUCT
REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language
SOME KUDOS
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
4. What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in The New York Times, on Twitter and on FlowingData
Thriving open-source community
• Leading edge of analytics research
Fills the Data Science talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-is-r
5.
Poll #1
What software do you use for statistical analysis? (Select all that apply.)
R
SAS
SPSS
Python
Other
6.
R’s popularity is growing rapidly
More at blog.revolutionanalytics.com/popularity
Chart: R usage growth (Rexer Data Miner Survey, 2007-2013)
Chart: Language popularity (IEEE Spectrum Top Programming Languages, July 2014): R ranked #9
7.
Revolution R Open is:
Enhanced Open Source R distribution
Compatible with all R-related software
Multi-threaded for performance
Focus on reproducibility
Open source (GPLv2 license)
Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE
Download from mran.revolutionanalytics.com
8.
Multi-threaded performance
Intel MKL replaces the standard BLAS/LAPACK algorithms
– Pipelined operations, optimized for Intel, working on all architectures
– High-performance algorithms (diagram: sequential vs. parallel execution)
– Uses as many threads as there are available cores
– Control with: setMKLthreads(<value>)
No need to change any R code
Included in the RRO binary distribution
More at the Revolutions blog
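A minimal sketch for checking the effect of the thread setting, assuming an RRO build in which setMKLthreads() is available (the slide names the function; on other R builds the code falls back to timing the default BLAS). Dense matrix multiplication is used because %*% is handed to the BLAS library, which is the part MKL replaces; the matrix size n = 500 is an arbitrary choice.

```r
# Time one dense matrix product; %*% is delegated to the BLAS library,
# so with MKL it can run on several threads without changing this code.
library(parallel)

time_matmul <- function(n = 500) {
  set.seed(42)
  m <- matrix(rnorm(n * n), nrow = n)
  unname(system.time(m %*% m)["elapsed"])  # wall-clock seconds
}

# detectCores() may return NA on some platforms; fall back to 1
n_cores <- max(1L, detectCores(), na.rm = TRUE)

# setMKLthreads() is assumed to exist only in MKL-enabled RRO builds
if (exists("setMKLthreads")) {
  setMKLthreads(1)
  t_single <- time_matmul()
  setMKLthreads(n_cores)
  t_multi <- time_matmul()
  cat(sprintf("1 thread: %.2fs, %d threads: %.2fs\n",
              t_single, n_cores, t_multi))
} else {
  cat(sprintf("setMKLthreads() not found; default BLAS took %.2fs\n",
              time_matmul()))
}
```

The same script runs unchanged on standard R and on RRO, which matches the slide's point that no R code needs to change; only the timings differ.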
9.
100% Compatibility
Built on the latest R engine
– Currently R 3.1.1, with R 3.1.2 in testing
100% compatible with
– R scripts
– R packages
– Applications with R connections
Designed to work with RStudio
– No configuration required
Replaces the existing R application
– Side-by-side installations also supported
10. Reproducibility – why do we care?
Academic / Research
– Verify results
– Advance research
Business
– Production code
– Reliability
– Reusability
– Collaboration
– Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
12.
Reproducible R Toolkit in RRO
Static CRAN mirror
– CRAN packages fixed with each Revolution R Open update
Daily CRAN snapshots
– Storing every package version since September 2014
– Binaries and sources
– At mran.revolutionanalytics.com/snapshot
Easily write and share scripts synced to a specific snapshot
– “checkpoint” package installed with RRO
Diagram: the checkpoint server snapshots CRAN daily at midnight UTC into the daily snapshots at http://mran.revolutionanalytics.com/snapshot/, which the checkpoint package uses (library(checkpoint); checkpoint("2014-09-17")); the static CRAN mirror is at http://cran.revolutionanalytics.com/
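To make the snapshot idea concrete, the sketch below points install.packages() at a single daily snapshot by hand, which is the effect checkpoint() automates. The date-suffixed URL layout is an assumption inferred from the snapshot base URL above, and snapshot_repo() is a hypothetical helper; the MRAN service itself may since have moved or been retired.

```r
# Hypothetical helper: URL of the daily snapshot for a given date.
# Assumes snapshots live at <base>/YYYY-MM-DD.
snapshot_repo <- function(date,
                          base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date)  # stops with an error on strings that are not dates
  paste0(base, "/", format(d, "%Y-%m-%d"))
}

# Temporarily use the snapshot as the CRAN repository
old <- options(repos = c(CRAN = snapshot_repo("2014-09-17")))
print(getOption("repos"))
# install.packages("ggplot2") would now resolve against the snapshot
options(old)  # restore the previous repository setting
```

Because every package on a snapshot date is frozen, two people using the same date get the same package versions, which is the basis of the reproducibility guarantee.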
13.
Using checkpoint
Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
For the script author:
– Use package versions available on the chosen date
– Installs packages local to this project
• Allows different package versions to be used simultaneously
For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to manually install!)
– Uses same package versions as script author to ensure reproducibility
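The "detects required packages" step can be approximated by scanning a script's source for library()/require() calls and pkg:: references. The sketch below is a rough regular-expression version written for illustration only; find_required_packages() is a hypothetical name, not checkpoint's own implementation, and it ignores edge cases such as a # character inside a string.

```r
# Hypothetical helper (not checkpoint's code): list the packages a script
# loads via library()/require() or references via pkg::fun / pkg:::fun.
find_required_packages <- function(lines) {
  code <- sub("#.*$", "", lines)  # drop comments (naive about strings)
  calls <- regmatches(code,
    gregexpr("(library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*", code))
  from_calls <- sub("^(library|require)\\(\\s*[\"']?", "", unlist(calls))
  ns <- regmatches(code, gregexpr("[A-Za-z][A-Za-z0-9.]*:::?", code))
  from_ns <- sub(":::?$", "", unlist(ns))
  sort(unique(c(from_calls, from_ns)))
}

script <- c(
  "library(checkpoint)",
  "checkpoint(\"2014-09-17\")",
  "library(ggplot2)       # plotting",
  "require('data.table')",
  "x <- plyr::count(mtcars, 'cyl')"
)
find_required_packages(script)
```

For the example script the helper returns checkpoint, data.table, ggplot2 and plyr: exactly the packages a collaborator would otherwise have to install manually.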
14.
MRAN: The Managed R Archive Network
Download Revolution R Open
Learn about R and RRO
Daily CRAN snapshots
Explore Packages
– and dependencies
Explore Task Views
16.
DeployR Open
Goal: embed results from R scripts into existing applications, in real time
Problem:
– Exposing arbitrary R functions is unwise
– Need to handle concurrent R sessions
Solution: DeployR Open
– R, on a server, behind a firewall
– Repository Manager defines entry points
• Expose only authorized R functions
– Automatically creates Web Services APIs
– Manages and monitors pool of R sessions
– Separates roles for R and app developer
DeployR Open: for prototyping integrations
– Revolution R Enterprise adds grid-scaling and enterprise authentication
More at deployr.revolutionanalytics.com
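DeployR enforces the "only authorized functions" rule through its Repository Manager and generated web-service APIs, which are not reproduced here. The plain-R sketch below only illustrates the underlying allow-list idea; authorized and run_entry_point() are hypothetical names, and the mtcars regression stands in for a real scoring script.

```r
# Hypothetical allow-list dispatcher illustrating "expose only authorized
# R functions"; this is not DeployR's API, just the idea in plain R.
authorized <- list(
  # entry point name -> function a client is allowed to call
  score_mpg = function(wt, hp) {
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    unname(predict(fit, newdata = data.frame(wt = wt, hp = hp)))
  },
  summary_stats = function(x) c(mean = mean(x), sd = sd(x))
)

run_entry_point <- function(name, args = list()) {
  # Reject anything not on the allow-list instead of evaluating
  # arbitrary R code such as system() or file.remove().
  if (!is.character(name) || length(name) != 1 ||
      !name %in% names(authorized)) {
    stop("entry point not authorized: ", name)
  }
  do.call(authorized[[name]], args)
}

run_entry_point("summary_stats", list(x = c(1, 2, 3)))
try(run_entry_point("system", list("ls")))  # fails: not on the allow-list
```

The application developer only ever sees entry-point names and arguments, while the R developer controls what those names map to, mirroring the role separation described on the slide.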
17.
DeployR: Integration
DeployR does not provide any application UI.
Three integration modes embed real-time R results into existing interfaces:
Web app, mobile app, desktop app, BI tool, Excel, …
RBroker Framework (tutorial):
– Simple, high-performance API for Java, .NET and JavaScript apps
– Supports transactional, on-demand analytics on a stateless R session
Client Libraries (tutorial):
– Flexible control of R services from Java, .NET and JavaScript apps
– Also supports stateful R integrations (e.g. complex GUIs)
DeployR Web Services API:
– Integrate R using almost any client language
18.
DeployR: Security / Scalability Layers
1. Anonymous execution
– Only authorized, user-defined R functions accessible
– No state preserved
2. Basic username / password authentication
– Managed in DeployR Administration Console
3. Enterprise authentication (only available in Revolution R Enterprise DeployR)
– Verifies identity with SSO / LDAP / Active Directory / PAM
4. Adaptive load-balancing grid (only available in Revolution R Enterprise DeployR)
– Ensures service availability
20.
RHadoop and ParallelR
Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms
ParallelR: parallel programming for multi-CPU servers and grids
RHadoop: map-reduce programming in the R language
Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data
Most big-data predictive analytics problems are not embarrassingly parallel
– 80+ pre-built “parallel external memory algorithms” are included with Revolution R Enterprise
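A typical embarrassingly parallel job is a bootstrap: every replicate resamples the data independently and returns one small number. The sketch uses base R's parallel package rather than the ParallelR packages themselves, to stay self-contained; the two-worker cluster and 1,000 replicates are arbitrary choices.

```r
# Embarrassingly parallel bootstrap with base R's "parallel" package:
# each replicate is independent and returns a single number.
library(parallel)

boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

x <- mtcars$mpg
cl <- makeCluster(2)                  # two local worker processes
clusterSetRNGStream(cl, iseed = 123)  # reproducible, independent RNG streams
boot <- parLapply(cl, 1:1000, boot_mean, x = x)
stopCluster(cl)

boot <- unlist(boot)
quantile(boot, c(0.025, 0.975))       # bootstrap 95% interval for mean mpg
```

Assuming the foreach and doParallel packages are installed, the same loop is commonly written as `foreach(i = 1:1000, .combine = c) %dopar% mean(sample(x, replace = TRUE))` after `registerDoParallel(cl)`. Each worker only receives the small mpg vector and sends back one number, which is why this pattern parallelizes well and why data-heavy model fitting generally does not.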
21.
RHadoop
Collection of packages for interfacing R and Hadoop
Client (desktop) R interface to Hadoop:
– rhdfs: Browse, read, write and modify files stored in HDFS
– rhbase: Browse, read, write and modify tables stored in HBase
– ravro: Read, write and run map-reduce on Apache Avro files in HDFS
R computations in Hadoop:
– rmr2: write map-reduce tasks in R to run in Hadoop
– plyrmr: R-based data manipulation computations on data in Hadoop
RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki
22.
Word count in RHadoop
Map:
– Input: lines of text
– Output: key-value pairs, with each word as the key and 1 as the value
Reduce:
– Input: each word (key) with all of its values
– Output: each word with its count
Map-Reduce:
– Apply map to the lines of text
– Gather like words together and count them
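The map, gather and reduce steps above can be simulated in a single R session, which is a convenient way to check the logic before running it on a cluster. This is not RHadoop code: no Hadoop is involved, and map_words() and reduce_counts() are hypothetical names that only mimic the (key, value) shape described on the slide. With rmr2, comparable map and reduce functions would instead be passed to mapreduce().

```r
# Single-process simulation of the word-count map-reduce steps.
# map_words() and reduce_counts() are illustrative names only.

# Map: one line of text -> (word, 1) pairs
map_words <- function(line) {
  words <- tolower(unlist(strsplit(line, "[^A-Za-z']+")))
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)))
}

# Reduce: one word and all of its values -> (word, count)
reduce_counts <- function(key, values) {
  data.frame(key = key, value = sum(values))
}

lines <- c("the quick brown fox", "the lazy dog", "The fox")

# Apply map to every line and stack the key-value pairs
pairs <- do.call(rbind, lapply(lines, map_words))

# Gather like words together: one group of values per key
groups <- split(pairs$value, pairs$key)

# Reduce each group and combine the results
counts <- do.call(rbind, Map(reduce_counts, names(groups), groups))
rownames(counts) <- NULL
counts
```

On a Hadoop cluster the gather step is performed by the framework's shuffle, so only the map and reduce functions need to be written by the user.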
26.
Revolution R Plus includes:
AdviseR™ Technical Support for:
– Revolution R Open
• Including R, base and recommended packages
– Reproducible R Toolkit
– ParallelR: Parallel programming with R
– RHadoop: R integration with Hadoop
– DeployR Open: Secure deployment of R to applications
Open Source Assurance for all supported components
– Provides legal indemnity for subscribers
Workstation subscriptions: $1,800 per year
– Server and Hadoop subscriptions also available
27.
AdviseR™ Technical Support
Technical support for R, from the R experts.
10x5 email and phone support (in your local time zone)
Full support for R, validated packages, and third-party software connections
Notifications of updates and bug fixes
Online case management and knowledge base
Access to technical resources, documentation and user forums
Defined service-level agreements for rapid responses
Included with Revolution R Plus and Revolution R Enterprise.
28.
Open Source Assurance
Revolution Analytics will defend Revolution R Plus subscribers should a third party make an intellectual property claim against covered open source software with respect to:
– copyrights, patents, trademarks, trade secrets
Covered software includes:
– Revolution R Open (incl. R base and recommended packages), Reproducible R Toolkit, DeployR Open, ParallelR, RHadoop
Revolution Analytics will defend open source software in court
– If necessary, Revolution Analytics will obtain rights, modify, or replace software found to be infringing
– If a resolution can’t be found, fees paid in the past 12 months will be refunded.
29.
The Revolution R Product Suite
Revolution R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Plus
• Open-source distribution of R, packages, and other components
• Enhanced, supported and indemnified by Revolution Analytics
Revolution R Enterprise
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
30. Revolution R Enterprise (RRE)
The All-Inclusive Big Data Big Analytics Platform
Components: DistributedR, DeployR, DevelopR, ScaleR, ConnectR
High-performance open source R plus:
Data source connectivity to big-data objects
Big-data advanced analytics
Multi-platform environment support
In-Hadoop and in-Teradata predictive modeling
Visual Studio IDE option
Secure, Scalable R Deployment
Technical support, training and services
– 24x7 support option
Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us
31.
Poll #2
Which Revolution Analytics projects do you plan to use (or already use)?
Select all that apply:
1. Revolution R Open (free distribution)
2. Revolution R Plus (paid subscription for support and indemnification)
3. Reproducible R Toolkit (checkpoint package)
4. DeployR Open
5. RHadoop / ParallelR
32.
Wrapping up…
Revolution R Open is available now from mran.revolutionanalytics.com/download
Explore Revolution Analytics open-source projects at projects.revolutionanalytics.com
Technical support and open-source assurance with Revolution R Plus: www.revolutionanalytics.com/plus
David Smith
Chief Community Officer
Revolution Analytics
@revodavid
david@revolutionanalytics.com
33. Thank you.
Next up:
Batter Up! Advanced Sports Analytics with R and Storm
December 11, 2014
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR