Applications in R - Success and Lessons Learned from the Marketplace


Adoption of the R language has grown rapidly in the last few years, and several surveys rank R as the number-one data science language. This accelerating adoption has been driven by the Big Data revolution and by the many data scientists who, having learned R at university, are actively unlocking the secrets hidden in these new, vast data troves.

In this webinar, David Smith, Chief Community Officer, will look at the growth of R and the innovative uses of R in the business, government and non-profit sectors. Then Neera Talbert, Vice President, Professional Services, will take you into the trenches of recent customer deployments and share best practices, and pitfalls to avoid, in deploying or expanding your own R applications.

Published in: Data & Analytics

  1. 1. Applications in R - Success and Lessons Learned from the Marketplace. David Smith, Chief Community Officer, Revolution Analytics. July 29, 2014. Neera Talbert, VP Professional Services, Revolution Analytics.
  2. 2. Agenda: Introduction to R • Growth of R • Applications of R • Q&A. David Smith, Chief Community Officer, @revodavid. Editor, Co-author, “Introduction to R”
  3. 3. OUR COMPANY: The leading provider of advanced analytics software and services based on open source R, since 2007. OUR SOFTWARE: The only Big Data, Big Analytics software platform based on the data science language R. SOME KUDOS: Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014.
  4. 4. What is R? • Most widely used data analysis software: used by 2M+ data scientists, statisticians and analysts • Most powerful statistical programming language: flexible, extensible and comprehensive for productivity • Create beautiful and unique data visualizations: as seen in New York Times, Twitter and Flowing Data • Thriving open-source community: leading edge of analytics research • Fills the talent gap: new graduates prefer R
  5. 5. Poll #1: What data analysis software is used where you work?
  6. 6. R’s popularity is growing rapidly. Charts: R Usage Growth (Rexer Data Miner Survey, 2007-2013); R Language Popularity, #9 (IEEE Spectrum Top Programming Languages, July 2014).
  7. 7. R is among the highest-paid IT skills in the US. Sources: Dice Tech Salary Survey, January 2014; O’Reilly Strata 2013 Data Science Salary Survey.
  8. 8. Technical Support for Open Source R: AdviseR™ from Revolution Analytics. Technical support for open source R, from the R experts. • Email and phone support 8AM-6PM, Mon-Fri • Support for R, validated packages, and third-party software connections • On-line case management and knowledgebase • Access to technical resources, documentation and user forums • Exclusive on-line webinars from community experts • Guaranteed response times. Also available: expert hands-on and on-line training for R, from Revolution Analytics AcademyR. R SUPPORT: 12 MONTHS, $795 PER USER.
  9. 9. Applications of R
  10. 10. Facebook • Exploratory Data Analysis • Experimental Analysis “Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” — Solomon Messing, data scientist at Facebook
  11. 11. Facebook • Big-Data Visualization “It resonated with many people. It's not just a pretty picture, it's a reaffirmation of the impact we have in connecting people, even across oceans and borders.” — Paul Butler, data scientist, Facebook
  12. 12. Google • Advertising Effectiveness • Economic forecasting “The great beauty of R is that you can modify it to do all sorts of things.” - Hal Varian, Chief Economist, Google. “R is really important to the point that it's hard to overvalue it.” - Daryl Pregibon, Head of Statistics, Google
  13. 13. Twitter • Data Visualization • Semantic clustering “A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python or R for further analysis, pull from a database to grab some extra fields, and so on, often integrating what I find into some machine learning models in the end” - Ed Chen, Data Scientist, Twitter
  14. 14. Public Health: City of Chicago • Food poisoning monitor
  15. 15. The New York Times. Interactive Features: • Election Forecast • Dialect Quiz. Data Journalism: • NFL Draft Picks • Wealth distribution in USA
  16. 16. The New York Times. Data Visualization: • Facebook IPO • Baseball legends
  17. 17. Video Gaming • Multiplayer Matchmaking • Player Churn • Game design • Difficulty curve • Level trouble-spots • In-game purchase optimization • Fraud detection • Player communities • Game Analysis
  18. 18. Real Estate: Housing • Crime mapping • choroplethr package • Statistical forecasting “The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”
  19. 19. Finance and Banking • Credit Risk Analysis • Financial Networks
  20. 20. Pharmaceuticals: Regulatory Drug Approvals • Reproducible research • Accurate, reliable and consistent statistical analysis • Internal reporting (Section 508 compliance) “R use at the FDA is completely acceptable and has not caused any problems.” - Dr Jae Brodsky, Office of Biostatistics, Food and Drug Administration
  21. 21. Marketing Analytics. Power: “We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.” - CEO, John Wallace, DataSong. 4X performance; 50M records scored daily. Scalability: “We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.” - SVP Analytics, Kevin Lyons, eXelate. TBs of data from 200+ data sources; tens of thousands of attributes; hundreds of millions of scores daily; 2X data, 2X attributes, no impact on performance. Performance: “We need a high-performance analytics infrastructure because marketing optimization is a lot like a financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.” - Chief Analytics Officer, Leon Zemel, [x+1]
  22. 22. Revolution R Enterprise is the Big Data Big Analytics Platform. All of Open Source R plus: • Big Data scalability • High-performance analytics • Development and deployment tools • Data source connectivity • Application integration framework • Multi-platform architecture • Technical Support • Available training and services
  23. 23. Poll #2: What kinds of R projects are underway where you work?
  24. 24. Neera Talbert, VP Big Data & Advanced Analytic Services • Leads Services at Revolution Analytics • Fifteen years of experience in the business analytics software industry • Works with Fortune 500 companies to define analytics strategy, implement analytics-based decision making, reduce decision latency, and increase speed of decision making – Analytics, business intelligence, big data analytics, risk – Customer intelligence, supply chain, manufacturing, retail, oil & gas, public sector.
  25. 25. Organizational Readiness: “There will be almost half a million jobs in five years, and a shortage of up to 190,000 qualified data scientists, plus a need for 1.5 million executives and support staff who have an understanding of data” - McKinsey Global Institute, April 2013
  26. 26. Opportunity to develop talent. Data Science: “the sexiest job in the 21st century” - Harvard Business Review • A cross between computer engineers, statisticians and business analysts: people who ask good questions and are open to working with unstructured information • Universities can’t produce them fast enough: need 60% more resources - McKinsey Global Institute
  27. 27. Our Philosophy “The Hands-on exercises were the best part of Revolution Analytics training” - A participant from a global telecom company
  28. 28. Course Catalog
  29. 29. RRE Certification Testing • Demonstrate your R and RRE programming knowledge – Fundamentals in R Language – Data Management in Revolution R Enterprise – Modeling in Revolution R Enterprise • Independently proctored exam – online and onsite
  30. 30. Training Data Science team for Big Data Analytics. Profile: Multi-channel marketing attribution and analytics software developer and service provider. Growing, innovative, cost-conscious. Key Technology: Revolution R Enterprise and Hadoop, replacing SAS and Open Source R. Outcomes: Massively scalable infrastructure to support attribution and optimization at an individual customer level (segments of one) for clients such as Williams-Sonoma. Client saved $250K in one campaign. Rapid development and deployment of customer-specific models, using innovative analytic techniques such as big data GAM Survival models. Bottom Line: Driving revenue lift and cost savings through marketing optimization. “Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power. We saw about a 4x performance improvement on 50 million records. It works brilliantly.” - CEO, John Wallace (DataSong, formerly named UpStream). 4X performance; 50M+ records scored daily.
  31. 31. Model Development for Supply Chain Analytics with Hadoop (Sales and Demand Data Analysis; R/RRE Model Development). Profile: The Application Development team worked with Revolution Analytics consultants to build a cloud-based supply chain analytics platform. Key Technology and Services: R for Big Data Analytics, Consulting, Training. Analytic Approach: Aggregate data from 15 data sources including ERP data, store sales data and sales forecast data to 25,000 store locations, 50 SKUs nightly across 6 forecast models, order planning models, running back tests and validation. Worked with client to establish a big data environment and models that will generate 6.5 billion computations daily by the end of the year (in a 4-hour window for processing). Scale and performance will allow new capabilities such as seasonality, promotions and incentives. Bottom line: Work with client to develop predictive models, starting with rigorous forecasts across various models, generating forecast statistics and scoring each model against historical data to come up with the best fit. The forecast is input into an order-planning model that generates recommendations to optimize product distribution and ensure in-stock rate targets are achieved so that the right amount of product is in the right location at the right time. “The amount of analytic horsepower required for this application cannot be supported in traditional means; it would require millions of dollars of hardware. R + Hadoop is allowing us to have the compute capacity to run 6.5 billion computations on nightly basis to generate order plans for our clients.” - VP Application Development. Confidential – Do Not Distribute
  32. 32. Model Development for Vehicle Data Analysis (Warranty & Sensor Data Analysis; R/Revolution R Enterprise Training). Profile: The Analytics R&D team of the multinational automobile manufacturer worked with Revolution Analytics consultants to perform Survival Analysis, and to build and deploy Decision Trees and Time Series models. Key Technology and Services: Revolution R Enterprise for Big Data Analytics, Consulting, Training. Analytic Approach, Warranty Data Analysis: Estimating the life of an automobile component using Survival Analysis with Cox proportional hazards. Models are trained using historical data, consisting of warranty claims and region- and weather-related variables such as snow, rain, temperature etc. Analytic Approach, Sensor Data Analysis: Use sensor data from vehicle components to build Decision Trees for classification, and to establish a range of predicted values for sensor readings so that actual readings can be analyzed for outliers. Outcome: New analytics paradigm for existing processes introduced, with potential for millions of dollars in cost savings through improved warranty contracts and re-designed automobile components. Bottom line: New analytics initiative for building an intelligent automobile system that’s capable of guiding the driver upon detection of anomalies in driving patterns. “The consultants and training instructors from Revolution Analytics were very knowledgeable and supported me very well. I am looking forward to taking my learnings to the larger analytics team at my company.” - Senior Researcher, Analytics R&D
  33. 33. R Package Validation: User-contributed, Open Source R package validation for Clinical Trial compliance to support move from SAS to R & RRE. Profile: The Clinical Trials Analytics team at the multinational biopharmaceutical company moved from SAS to R to develop big data analytics for Clinical Trials. Challenge: The Clinical Trials Analytics team had “big data” and “big computation” challenges, and needed a centralized, scalable, and high-performance platform to concurrently run the analytic models for faster analysis. Key Technology and Services: RUnit testing framework, Revolution R Enterprise (RRE) and open source R. Approach: Validate third-party (user-contributed) R packages from CRAN by executing unit and regression tests for functions both in the stated base package and its dependent packages. Outcome: Client is moving from SAS to RRE for new analytics initiatives for improved performance and cost savings, and requires validation of user-contributed packages for reliability and compliance. Bottom Line: Revolution R Enterprise acts as their statistical analytics platform, providing a centralized and scalable platform for tens of data scientists and analysts.
  34. 34. Model Optimization for Customer Analytics (84% improvement in performance & reliability of Guest Scoring model; multi-layer big data infrastructure architecture design). Profile: The advanced analytics & IT Infrastructure teams at the Las Vegas-based gaming corporation build and deploy analytical models for internal customers such as Marketing & Sales. Challenge: The IT Infrastructure team at the company was challenged to support innovative, R-powered big data analytics initiatives and needed to optimize their Analytics and Visualization architecture. Key Technology and Services: Hadoop, Open Source R, Consulting and Training. Analytic Approach: Assess the end-to-end flow of the current Guest Scoring model, and re-write the existing rmr/R code using optimization techniques. Outcome: 84% reduction in run time of the Guest Scoring model, which helps the gaming company target their customers with a customized marketing campaign within minutes of performing a new activity such as checking into the hotel or buying tickets to a show. Bottom line: Revolution Analytics consultants helped re-write R analytics running inside Hadoop to achieve superior performance and, as a second project, designed a big data architecture incorporating Cloudera, Teradata, Alteryx and Tableau. “Excellent work, Revolution!! We’re very glad that you came on board to help us. Revolution Consultants get an A+.” - Technical Program Manager, Big Data Initiatives
  35. 35. Revolution Analytics Services Overview. Training: • On-Site or Remote Classes • Classroom or Self Paced • Standard or Tailored. Project Services: • Analytics Strategy • Analytics Architecture • Full Life Cycle Projects • Application Migration • Proof of concept • Staff Augmentation • Package Certification. Quick Start Services: • Pre-production • Jumpstart value • Combines software, training, and services. Post Go-Live Support: • Technical Account Management • On-going Training
  36. 36. Poll #3: What's the biggest R need at your company?
  37. 37. Why are so many companies using R? • Big Data • Data Science • Competition and Innovation • Open Source Ecosystem
  38. 38. Q&A / Resources: • What is R? • Companies using R • AcademyR training • AcademyR Certification • Contact Revolution Analytics
  39. 39. Thank you. Join us August 7th at 10:00 AM, Pacific, for our Moving from SAS to R webinar. Please visit our website to register. 1.855.GET.REVO, Twitter: @RevolutionR
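
Slide 31 describes generating forecasts across several models and scoring each model against historical data to find the best fit. The sketch below illustrates that selection step only. It is written in Python with the standard library so that it runs on its own, whereas the project itself used R and Hadoop. The three toy models, every function name, and the sales figures are hypothetical stand-ins; the slide does not name the six forecast models or the error statistic the client used, so mean absolute error is an assumption here.

```python
from statistics import mean


def naive_forecast(history):
    """Predict that the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the mean of the last `window` observations."""
    return mean(history[-window:])


def seasonal_naive_forecast(history, period=7):
    """Predict the value observed one full period ago (weekly by default)."""
    return history[-period] if len(history) >= period else history[-1]


def backtest_mae(model, series, min_train=7):
    """Walk forward through the series, forecast one step ahead at each
    point using only earlier data, and return the mean absolute error."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = model(series[:t])
        errors.append(abs(series[t] - prediction))
    return mean(errors)


def select_best_model(models, series, min_train=7):
    """Score every candidate by back-test error and return the name of the
    lowest-error model together with all scores."""
    scores = {name: backtest_mae(fn, series, min_train)
              for name, fn in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidates; the real project used six unnamed models.
CANDIDATE_MODELS = {
    "naive": naive_forecast,
    "moving_average": moving_average_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}

# Made-up daily unit sales for one store and SKU with a weekly pattern.
daily_sales = [10, 12, 11, 13, 20, 30, 25] * 4

best_name, mae_by_model = select_best_model(CANDIDATE_MODELS, daily_sales)
print("best model:", best_name)
for name, mae in mae_by_model.items():
    print(f"  {name}: MAE = {mae:.2f}")
```

Because the made-up series repeats exactly every seven days, the seasonal model wins here with zero error. In a setting like the one on the slide, the same loop would run per store and SKU, and the winning forecast would feed the order-planning model.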