R for SAS Users: Complement or Replace? Two Strategies

Are you working in a SAS shop but want to add R-based analytics to your portfolio? Learn why that is a great idea and how to do it.

Transcript

  • 1. Revolution Confidential. SAS: Complement or Replace. June 2013. Nick Barber, Sales Director; Andrie de Vries, Business Services Director. Revolution Analytics
  • 2. Introductions and welcome: Andrie de Vries, Business Services Director, Europe; Nick Barber, Sales Director, Europe
  • 3. Straw poll: experiences with R and SAS?
  • 4. Agenda
      • Quick introduction to Revolution Analytics
      • Where do SAS and R fit in the analytical landscape?
      • Introduction to R
      • Typical challenges facing analytical organisations
      • Differences between SAS and Revolution R: Big Data; Complex Computation; Enterprise Readiness; Production Efficiency; Access to Talent
      • Conclusions…
  • 5. Corporate Overview & Quick Facts
      • Founded: 2008 (as REvolution Computing)
      • Office locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
      • CEO: David Rich
      • Number of customers: 200+
      • Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
      • Web site: www.revolutionanalytics.com
      • Revolution rated a “Contender” in The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
      • In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both
      • Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language
  • 6. 200+ corporate customers and growing, across Consumer & Info Svcs, Finance & Insurance, Healthcare & Life Sciences, Manuf & Tech, and Academic & Gov’t
  • 7. Where does R fit in the analytical lifecycle?
      • Open source R competencies: analytical data preparation, analytical data exploration, model development, model deployment
      • Other lifecycle stages shown: ETL, BI / operations
      • Open source R is not: ETL; a business reporting tool; an end-to-end solution such as SAS Marketing Automation or SAS Fraud Framework
  • 8. R is:
      • The way to do statistical computing
      • A full-blown programming language
      • The home of every data mining algorithm known to data science
      • A vibrant worldwide community
      • R was written in the early 1990s by Robert Gentleman and Ross Ihaka
      • Since 1997 a core group of ~20 developers has guided the evolution of the language
  • 9. Top companies are using R around the world
      • The NHS uses R to advance patient care and diagnosis.
      • The New York Times routinely uses R for interactive and print data visualization.
      • Ogilvy Europe uses R to analyse digital media campaigns for major brands.
      • Google has more than 500 R users.
      • The FDA supports the use of R for clinical trials of new drugs.
      • The National Weather Service uses R to predict the extent of events.
      • Facebook uses R to model user behaviour.
      • The Consumer Financial Protection Bureau uses R and other open source tools.
      • Twitter uses R for data science applications on the Twitter database.
      • John Deere uses R to forecast crop yields and optimize tractor manufacturing.
      • Companies are recognising the additional benefits of R
  • 10. Incredible graphics and data visualization lead the way vs SAS
      • Functions for standard graphs: scatterplot, time series, histogram, smoothing, …; bar plot, pie chart, dot chart, …; image plot, 3-D surface, map, …
      • Customize without limits: combine graph types; create entirely new graphics
  • 11. R is open source and drives analytic innovation but has some limitations for enterprises (each row lists the open source R position first, then the commercial counterpart)
      • Bigger data sizes: memory bound; big data
      • Speed of analysis: single threaded; scale out, parallel processing, high speed
      • Production support: community support; commercial production support
      • Innovation and scale: innovative, 4500+ packages, exponential growth; combines with open source R packages where needed
  • 12. Typical Challenges Facing Analytical Organisations
      • Big Data: New Data Sources; Data Variety & Velocity; Fine Grain Control; Data Movement, Memory Limits
      • Complex Computation: Innovative Models; Experimentation; Many Small Models; Ensemble Models; Simulation
      • Enterprise Readiness: Heterogeneous Landscape; Write Once, Deploy Anywhere; Production Support; How to put analytics in the hands of business users
      • Speed & Production Efficiency: Shorter Model Shelf Life; Volume of Models; Long End-to-End Cycle Time; Pace of Decision Accelerated; Hardware Required
      • Talent: Finding data scientists; Ensuring workforce is continually trained; Creating an analytical culture
  • 13. Let's talk Big Data
  • 14. How do SAS and Revolution R stack up for Big Data?
      • Both handle large data sets well (but with big speed differences)
      • Both have high-speed database connectors to handle variety / velocity
      • The object-oriented nature of R handles data manipulation and visualisation in a superior way
      • Parallel Data Step-style functions (such as merge/sort/cleansing) are available in Revolution R; in SAS they are available only in SAS HPA environments
      • The RHadoop project (rhbase, rhdfs, rmr) runs inside Hadoop
  • 15. Let's talk Complex Computation
  • 16. How do SAS and Revolution R stack up for Complex Computation?
      • Innovative models: more functions available in R
      • Chart: R 2.15.2 packages (4,500) vs SAS 9.3 statements, procedures, functions and call routines (1,192)
      • Source: http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/
  • 17. How do SAS and Revolution R stack up for Complex Computation?
      • Revolution R runs in parallel across multiple nodes and cores
      • SAS Grid runs multiple jobs in parallel, but each job is still single threaded
      • SAS can run in parallel in SAS HPA
      • Complex Computation at Speed: Innovative Models; Experimentation; Precision; Many Small Models; Ensemble Models; Simulation
  • 18. Let's talk Enterprise Readiness
  • 19. How do SAS and Revolution R stack up for Enterprise Readiness?
      • Both handle heterogeneous landscapes
      • SAS runs on almost any platform but is mostly single threaded, apart from on Teradata and Greenplum (no cloud except through its own managed services)
      • Revolution runs across Windows/Linux clusters, cores, Hadoop, Amazon Web Services, Microsoft Azure, Netezza and Teradata
      • SAS programmers must write code for the required environment, whilst Revolution R code is device independent
      • Both offer good production support
      • Both SAS and Revolution integrate with nearly all common BI reporting tools
  • 20. Let's talk Production Efficiency
  • 21. How do SAS and Revolution R stack up for Speed & Production Efficiency?
      • *As published by SAS in HPC Wire, April 21, 2011: http://www.hpcwire.com/hpcwire/2011-04-19/sas_brings_high_performance_analytics_to_database_appliances.html
  • 22. Options for handling Speed
      • SAS: Normal SAS; single threaded
      • SAS Grid: Platform LSF; single threaded
      • SAS In-Database Scoring: Teradata Accelerator; Greenplum Accelerator
      • SAS High Performance Computing: Visual Analytics; HPA on Teradata / Greenplum
      • Revolution R: DistributedR parallel compute contexts on Windows, Linux, Amazon, Azure, Hadoop, Netezza
      • Callouts: …but multi-threaded; …all databases that support PMML; …commodity hardware, Hadoop, Netezza (Teradata October)
  • 23. Let's see some R in action… Andrie de Vries, Business Services Director, Europe
  • 24. Let's talk Talent
  • 25. Talent gap emerging
      • Will finding SAS talent become more difficult?
      • The programming community wants to keep up to date and work on modern object-oriented languages
      • Many universities have adopted R as the de facto analytics standard for statistics
      • Since 2012, USA job descriptions that included “SAS” declined by 7.3% whilst jobs for “R” increased by 42% (number of jobs on indeed.com)
      • Book search for “Statistics Programming”, sorted by popularity (May 29, 2013): 7 out of 10 books based on R; 0 out of 10 books based on SAS or SPSS
  • 26. www.revolutionanalytics.com page views, 01/04/2013 – 25/05/2013
      • Chart: Page Views, Top 10 Countries
      • Chart: Page Views by Geo (Europe, North America, APJ, South America, Africa, Middle East, NA, Caribbean, Central America)
      • Chart: EMEA Page Views by Organisation Type (Academic, Commercial)
  • 27. How do the modules break down? Functionality by SAS Software module, compared with Revolution R
      • Foundation: Base SAS
      • Statistics: SAS/STAT
      • Graphics: SAS/Graph
      • Matrix Operations: SAS IML
      • Optimization: SAS/OR
      • Time Series: SAS ETS
      • Quality Control: SAS QC
      • Database Access: SAS/ACCESS
      • Deploy in Excel, Deploy in BI: SAS Business Intelligence
      • Distributed Algorithms: SAS HPA Server
      • Parallel small compute: SAS Grid
      • In Database Scoring: SAS DB Accelerators
  • 28. Confidential to Revolution Analytics: training courses helping companies train SAS users
  • 29. Conclusions: Complement SAS when…
      • End-to-end industry-based solutions from SAS are a good fit for a particular business problem (e.g. SAS Fraud Framework for Insurance, Marketing Automation for Retail)
      • Innovative models, visualisation or big data/complex model support are required
      • Users are not coders and need a point-and-click interface (SAS Enterprise Guide, SAS Enterprise Miner); choose SAS for them
      • Existing SAS landscape requires significant retraining
  • 30. Conclusions: Replace SAS when…
      • The goals are cost savings, doing things faster and dealing with bigger data
      • Big data and complex processing are required
      • Innovative models would give a competitive advantage
      • Access to talent matters, today and in the future
      • Flexible compute environments are required
  • 31. Thank you. www.revolutionanalytics.com, Twitter: @RevolutionR. The leading commercial provider of software and support for the popular open source R statistics language.