Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit 2014)


Presented by David Smith, Chief Community Officer, Revolution Analytics, at the Gartner Business Intelligence and Analytics Summit, April 2014.

In this presentation, I'll introduce the open source R language — the modern standard for Data Science — and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, database warehousing systems and Cloud platforms, to implement data-driven end-user applications.

Published in: Software, Technology

  1. Big Data Predictive Analytics with Revolution R Enterprise
     David Smith, Chief Community Officer, @revodavid
     Gartner BI Conference, April 2014
  2. OUR COMPANY: The leading provider of advanced analytics software and services based on open source R, since 2007
     OUR SOFTWARE: The only Big Data, Big Analytics software platform based on the data science language R
     KUDOS: Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
  3. What is R?
     - Most widely used data analysis software
       - Used by 2M+ data scientists, statisticians and analysts
     - Most powerful statistical programming language
       - Flexible, extensible and comprehensive for productivity
     - Create beautiful and unique data visualizations
       - As seen in New York Times, Twitter and Flowing Data
     - Thriving open-source community
       - Leading edge of analytics research
     - Fills the talent gap
       - New graduates prefer R
     White paper: R is Hot
  4. Exploding growth and demand for R
     - R is the highest paid IT skill
     - R is the most-used data science language after SQL
     - R is used by 70% of data miners
     - R is #15 of all programming languages
     - R is growing faster than any other data science language
     - R is the #1 Google Search for Advanced Analytics software
     - R has more than 2 million users worldwide
     Chart: R Usage Growth, Rexer Data Miner Survey, 2007-2013. 70% of data miners report using R. R is the first choice of more data miners than any other software. Source:
  5. Technical Support for Open Source R
     AdviseR™ from Revolution Analytics: technical support for open source R, from the R experts.
     - 24x7 email and phone support
     - On-line case management and knowledgebase
     - Access to technical resources, documentation and user forums
     - Exclusive on-line webinars from community experts
     - Guaranteed response times
     Also available: expert hands-on and on-line training for R, from Revolution Analytics AcademyR.
  6. Revolution R Enterprise is the only big data, big analytics platform based on open source R
     - High Performance, Scalable Analytics
     - Portable Across Enterprise Platforms
     - Easier to Build & Deploy Analytics
  7. Enhancing Open Source R for the Enterprise
     - Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
     - Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
     - Enterprise Readiness | Community support | Commercial support | Delivers full service production support
     - Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
     - Commercial Viability | Risk of deployment of open source | GPL-compatible licensing | Eliminate risk with open source
  8. Powering Next Generation Analytics: Parallel External Memory Algorithms
     Diagram label: COMBINE INTERMEDIATE RESULTS
  9. Eliminates Performance and Capacity Limits of Open Source R and Legacy SAS
     - Unique PEMAs: Parallel, external-memory algorithms
     - High-performance, scalable replacements for R/SAS analytic functions
     - Parallel/distributed processing eliminates CPU bottleneck
     - Data streaming eliminates memory size limitations
     - Works with in-memory and disk-based architectures
  10. Revolution R Enterprise is the Big Data Big Analytics Platform
      All of Open Source R plus:
      - Big Data scalability
      - High-performance analytics
      - Development and deployment tools
      - Data source connectivity
      - Application integration framework
      - Multi-platform architecture
      - Support, Training and Services
  11. Write Once. Deploy Anywhere.
      DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
      Components: DistributedR, ScaleR, ConnectR, DeployR
      - In the Cloud: Amazon AWS
      - Workstations & Servers: Windows, Red Hat and SUSE Linux
      - Clustered Systems: IBM Platform LSF, Microsoft HPC
      - EDW: IBM Netezza, Teradata
      - Hadoop: Hortonworks, Cloudera
  12. Write Once, Deploy Anywhere
      Set the desired compute context for code execution:
          rxSetComputeContext("local")  # DEFAULT: local system
          rxSetComputeContext(RxHadoopMR(<data, server environment arguments>))
          rxSetComputeContext(RxTeradata(<data, server environment arguments>))
          rxSetComputeContext(RxHpcServer(<data, server environment arguments>))
          rxSetComputeContext(RxLsfCluster(<data, server environment arguments>))
      Same code to be run anywhere:
          # Summarize and calculate descriptive statistics from the airDS data set
          adsSummary = rxSummary(~ArrDelay+CRSDepTime+DayOfWeek, data = airDS)
          # Fit linear regression model
          arrDelayLm1 = rxLinMod(ArrDelay ~ DayOfWeek, data = airDS)
          summary(arrDelayLm1)
  13. In-Hadoop Big Data Big Analytics
      - Eliminate data movement latency
      - Speed model development
      - Use commodity Hadoop nodes as analytics engine
      Diagram labels: Name Node; Job Tracker; five Data Nodes; five Task Trackers; MapReduce; HDFS
  14. Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics with the R language.
      In-Database Analytics:
      - Parallel R in-database for big data analytics on Teradata
      - Build parallel R models completely in R
      - Use Teradata appliance as analytics engine
      - No need to move data
      Teradata 14.10 + Revolution R Enterprise V7
  15. RRE7 in the Cloud
      - Revolution R Enterprise 7, on the industry-leading cloud platform
      - Pay as you go, priced by cores x hours
        - No long-term commitment required
      - Launch Windows and Linux servers on demand
        - Windows 2008 R2 with DevelopR
        - RHEL 6 with RStudio Server Professional
        - Server instances from 2 to 32 cores
        - Analyze data sets up to 2 TB
      - Convenient, consistent and reliable
        - Available globally, accessible anywhere
        - Forum-based support with registration
      - Free 14-day trial available
      CLOUD SERVERS: $0.70 per core/hour, plus AWS infrastructure costs
  16. Revolution R Enterprise Ecosystem: Integration with the Big Data Analytics Stack
      Partner categories: Deployment / Consumption; Data / Infrastructure; Advanced Analytics; ETL; SI / Service; MSP / DSP
  17. How Customers Revolutionize their Business
      - Power: “We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.” (John Wallace, CEO, DataSong)
        Highlights: 4X performance; 50M records scored daily
      - Scalability: “We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.” (Kevin Lyons, SVP Analytics, eXelate)
        Highlights: TBs of data from 200+ data sources; tens of thousands of attributes; hundreds of millions of scores daily; 2X data and 2X attributes with no impact on performance
      - Performance: “We need a high-performance analytics infrastructure because marketing optimization is a lot like a financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.” (Leon Zemel, Chief Analytics Officer, [x+1])
  18. Why Revolution R Enterprise?
      - Platform Independence
      - Take Big Cost Out of Big Data
      - Supercharge R for Massive Data
      - Power R for the Enterprise
  19. Thank You
      David Smith, Chief Community Officer, @revodavid
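
Slides 8 and 9 describe parallel external-memory algorithms: data is processed in chunks, each chunk is reduced to a small intermediate result, and the intermediate results are combined. The base R sketch below illustrates that general idea for linear regression by accumulating the cross-products X'X and X'y chunk by chunk. It is a minimal illustration under simplifying assumptions, not Revolution R Enterprise's implementation: the simulated data, the chunk size and the helper names `chunk_crossprods` and `combine_crossprods` are all hypothetical.

```r
# Sketch of a chunked (external-memory style) linear regression in base R.
# All names and data here are hypothetical; this is not RevoScaleR code.

set.seed(42)
n <- 10000
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- 1 + 2 * sim$x1 - 3 * sim$x2 + rnorm(n, sd = 0.1)

# Split the rows into chunks of 1000, standing in for blocks streamed from disk.
chunk_id <- ceiling(seq_len(n) / 1000)
chunks <- split(sim, chunk_id)

# Per-chunk step: reduce a chunk to small cross-product matrices.
chunk_crossprods <- function(chunk) {
  X <- model.matrix(~ x1 + x2, data = chunk)
  list(xtx = crossprod(X), xty = crossprod(X, chunk$y))
}

# Combine step: cross-products from different chunks simply add.
combine_crossprods <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty)
}

# Chunks are independent, so this lapply could be run in parallel.
partials <- lapply(chunks, chunk_crossprods)
total <- Reduce(combine_crossprods, partials)

# Solve the normal equations (X'X) beta = X'y from the combined results.
beta_chunked <- drop(solve(total$xtx, total$xty))

# Compare with an ordinary in-memory fit on the full data.
beta_full <- coef(lm(y ~ x1 + x2, data = sim))
print(all.equal(unname(beta_chunked), unname(beta_full)))
```

Only the 3 x 3 and 3 x 1 cross-product matrices are kept between chunks, so memory use does not grow with the number of rows. Solving the normal equations directly is the simplest way to show the combine step; it is less numerically robust than the QR decomposition that `lm` uses.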
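
Slide 12 shows analysis code that stays the same while the compute context changes. The toy analogue below mimics that pattern using only base R and the bundled `parallel` package: the analysis is written once against a helper that dispatches chunks to whichever backend is currently selected. The names `set_context`, `run_chunks` and `chunked_mean` are hypothetical and are not RevoScaleR functions; the sketch says nothing about how `rxSetComputeContext` works internally.

```r
# Toy analogue of a switchable compute context (hypothetical names throughout).

.ctx <- new.env()

set_context <- function(name) {
  stopifnot(name %in% c("local", "multicore"))
  .ctx$name <- name
  invisible(name)
}
set_context("local")  # default backend

# Apply f to every chunk using the currently selected backend.
run_chunks <- function(chunks, f) {
  if (.ctx$name == "multicore") {
    # Forking is unavailable on Windows, so fall back to one core there.
    cores <- if (.Platform$OS.type == "windows") 1L else 2L
    parallel::mclapply(chunks, f, mc.cores = cores)
  } else {
    lapply(chunks, f)
  }
}

# Analysis code written once: a mean built from per-chunk sums and counts.
chunked_mean <- function(values, n_chunks = 4) {
  chunks <- split(values, cut(seq_along(values), n_chunks, labels = FALSE))
  parts <- run_chunks(chunks, function(v) c(sum = sum(v), n = length(v)))
  totals <- Reduce(`+`, parts)
  unname(totals["sum"] / totals["n"])
}

set_context("local")
m_local <- chunked_mean(mtcars$mpg)

set_context("multicore")
m_multi <- chunked_mean(mtcars$mpg)

print(all.equal(m_local, m_multi))
```

Keeping the backend choice in one place means `chunked_mean` never has to change when the execution environment does, which is the property the slide advertises.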
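
Slide 15 prices the cloud offering by cores x hours at $0.70 per core/hour, with AWS infrastructure billed on top. A small sketch of that arithmetic follows; the helper `rre_software_cost` and the 8-core, 24-hour example are hypothetical, and the result covers the software charge only.

```r
# Software charge quoted on slide 15: $0.70 per core per hour.
rate_per_core_hour <- 0.70

# Hypothetical helper. AWS infrastructure costs are separate and not included.
rre_software_cost <- function(cores, hours, rate = rate_per_core_hour) {
  stopifnot(cores >= 2, cores <= 32, hours >= 0)  # slide lists 2 to 32 cores
  cores * hours * rate
}

# Example: an 8-core server running for 24 hours.
# 8 * 24 * 0.70 = 134.4 dollars
print(rre_software_cost(cores = 8, hours = 24))
```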