How the growth of R helps data-driven organizations succeed
David Smith
7th China R User Conference, May 2014
Chief Commun...


[Presented to the 7th China R Users Conference, Beijing, May 2014.]

Adoption of the R language has grown rapidly in the last few years, and R is ranked as the number-one data science language in several surveys. This accelerating adoption has been driven by the Big Data revolution and by the fact that so many data scientists, having learned R at university, are actively unlocking the secrets hidden in these new, vast data troves.

In more than 6 years of writing for the Revolutions blog, I’ve discovered hundreds of applications of R in business, in government, and in the non-profit sector. Sometimes the use of R is obvious, and sometimes it takes a little bit of detective work to learn how R is operating behind the scenes. In this talk, I’ll begin by presenting some recent statistics on the growth of R. Then I’ll recount some of my favourite applications of R, and show how R is behind some amazing innovations in today’s world.

Published in: Software, Technology, Education
  • Companies using R in 2013
    http://blog.revolutionanalytics.com/2013/05/companies-using-open-source-r-in-2013.html

  • Image reference:
    http://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  • http://blog.revolutionanalytics.com/2014/02/r-salary-surveys.html
    http://blog.revolutionanalytics.com/2014/01/in-data-scientist-survey-r-is-the-most-used-tool-other-than-databases.html
    http://blog.revolutionanalytics.com/2013/10/r-usage-skyrocketing-rexer-poll.html
    http://blog.revolutionanalytics.com/2014/02/r-is-15th-of-top-programming-languages-in-latest-redmonk-ranking.html
    http://blog.revolutionanalytics.com/2013/09/top-languages-for-data-science.html
  • Dice Tech Salary Survey, January 2014
    O’Reilly Strata 2013 Data Science Salary Survey

  • http://blog.revolutionanalytics.com/2013/05/the-arteries-of-the-world-in-tweets.html
    http://blog.revolutionanalytics.com/2012/03/r-twitter-and-mcdonalds.html
  • Fantasy Football: http://blog.revolutionanalytics.com/2013/10/fantasy-football-modeling-with-r.html
  • Xbox: http://blog.revolutionanalytics.com/2014/05/microsoft-uses-r-for-xbox-matchmaking.html
    Other gaming: http://blog.revolutionanalytics.com/2013/06/how-big-data-and-statistical-modeling-are-changing-video-games.html
  • Zillow: http://strataconf.com/stratany2012/public/schedule/detail/26345
    Trulia: http://blog.revolutionanalytics.com/2011/06/the-residuals-of-crime.html

  • http://www.revolutionanalytics.com/free-webinars/order-fulfillment-forecasting-john-deere-how-r-facilitates-creativity-and-flexibility
    http://blog.revolutionanalytics.com/2012/11/video-how-john-deere-uses-r.html
  • Credit Suisse: http://blog.revolutionanalytics.com/2013/05/sheftel-on-r-on-the-trading-desk.html
  • EHarmony: http://blog.revolutionanalytics.com/2013/11/strata-hadoop-world-2013-recap.html
    Ford: http://www.zdnet.com/fords-big-data-chief-sees-massive-possibilities-but-the-tools-need-work-7000000322/
    CFPB: http://blog.revolutionanalytics.com/2012/04/r-at-the-consumer-financial-protection-bureau.html
  • http://blog.revolutionanalytics.com/2013/11/strata-hadoop-world-2013-recap.html
  • Deloitte: http://www.revolutionanalytics.com/free-webinars/actuarial-analytics-r
  • http://blog.revolutionanalytics.com/2012/05/r-and-foursquares-recommendation-engine.html
  • How the growth of R helps data-driven organizations succeed

    1. How the growth of R helps data-driven organizations succeed
        David Smith, Chief Community Officer, Revolution Analytics
        7th China R User Conference, May 2014
    2. Agenda
        • Introduction
        • History of R
        • Growth of R
        • Applications of R
        • The R Ecosystem
        • Conclusion
        Slides will be posted to: blog.revolutionanalytics.com
    3. What is R?
        • Most widely used data analysis software
          – Used by 2M+ data scientists, statisticians and analysts
        • Most powerful statistical programming language
          – Flexible, extensible and comprehensive for productivity
        • Create beautiful and unique data visualizations
          – As seen in New York Times, Twitter and Flowing Data
        • Thriving open-source community
          – Leading edge of analytics research
        • Fills the talent gap
          – New graduates prefer R
        White paper: R is Hot, bit.ly/r-is-hot
    4. A brief history of R
        • 1993: Research project in Auckland, NZ
          – Ross Ihaka and Robert Gentleman
        • 1995: Released as open-source software
          – Generally compatible with the “S” language
        • 1997: R core group formed
        • 2000: R 1.0.0 released
        • 2004: R 2.0.0 released, first international user conference in Vienna
        • 2009: New York Times article on R
        • 2013: R 3.0.0 released
        Photo credit: Robert Gentleman
    5. R in the News
        New York Times: Data Analysts Captivated by R’s Power, 6 Jan 2009
    6. R’s popularity is growing rapidly
        • R Usage Growth: Rexer Data Miner Survey, 2007-2013
        • #15: R
        • Sources: Rexer Data Miner Survey, 2013; RedMonk Programming Language Rankings, 2013
    7. R is used more than other data science tools
        • O’Reilly Strata 2013 Data Science Salary Survey
        • KDNuggets Poll: Top Languages for analytics, data mining, data science
    8. R is among the highest-paid IT skills in the US
        • Dice Tech Salary Survey, January 2014
        • O’Reilly Strata 2013 Data Science Salary Survey
    9. Applications of R
    10. Facebook
        • Exploratory Data Analysis
        • Experimental Analysis
        “Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” - Solomon Messing, data scientist at Facebook
    11. Facebook
        • Big-Data Visualization
        “It resonated with many people. It's not just a pretty picture, it's a reaffirmation of the impact we have in connecting people, even across oceans and borders.” - Paul Butler, data scientist, Facebook
    12. Google
        • Advertising Effectiveness
        • Economic forecasting
        “The great beauty of R is that you can modify it to do all sorts of things.” - Hal Varian, Chief Economist, Google
        “R is really important to the point that it's hard to overvalue it.” - Daryl Pregibon, Head of Statistics, Google
    13. Google
        • Big-data statistical modeling
        “We demonstrate the utility of massively parallel computational infrastructure for statistical computing using the MapReduce paradigm for R. This framework allows users to write computations in a high-level language that are then broken up and distributed to worker tasks in Google datacenters.” (Large-Scale Parallel Statistical Forecasting Computations in R, JSM 2011)
    14. Twitter
        • Data Visualization
        • Semantic clustering
        “A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python or R for further analysis, pull from a database to grab some extra fields, and so on, often integrating what I find into some machine learning models in the end” - Ed Chen, Data Scientist, Twitter
    15. Public Health
        • Food poisoning monitor
    16. The New York Times
        Interactive Features
        • Election Forecast
        • Dialect Quiz
        Data Journalism
        • NFL Draft Picks
        • Wealth distribution in USA
    17. The New York Times
        Data Visualization
        • Facebook IPO
        • Baseball legends
    18. Video Gaming
        • Multiplayer Matchmaking
        • Player Churn
        • Game design
        • Difficulty curve
        • Level trouble-spots
        • In-game purchase optimization
        • Fraud detection
        • Player communities
        • Game Analysis
    19. Housing (Real Estate)
        • Crime mapping
        • Statistical forecasting
        “The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”
    20. John Deere
        Statistical Analysis:
        • Short Term Demand Forecasting
        • Crop Forecasting
        • Long Term Demand Forecasting
        • Maintenance and Reliability
        • Production Scheduling
        • Data Coordination
    21. Public Affairs
        • Casualty estimation in Warzones
        • Political Analysis
    22. Finance and Banking
        • Credit Risk Analysis
        • Financial Networks
    23. Pharmaceuticals
        “R use at the FDA is completely acceptable and has not caused any problems.” - Dr Jae Brodsky, Office of Biostatistics, Food and Drug Administration
        Regulatory Drug Approvals
        • Reproducible research
        • Accurate, reliable and consistent statistical analysis
        • Internal reporting (Section 508 compliance)
    24. Software Vendors and Service Providers
    25. Revolution Analytics
        • Open-Source R Technical Support
        • Open Source development
          – RHadoop, ParallelR
        • Community Support
          – User Group Sponsorship
          – Meetups, Events
          – Revolutions Blog
        • Revolution R Enterprise
          – Big Data (ScaleR package)
          – Integration (Web Services API)
          – Enterprise Platforms
          – Cloud (Amazon AWS)
    26. Companies Using R
        • Social media: Google, Facebook, Twitter, Foursquare, Kickstarter, eHarmony
        • Finance: American Century, ANZ, Credit Suisse, Nationwide, Lloyds
        • Media: New York Times, Economist, New Scientist, Xbox
        • Software Vendors: Revolution Analytics, RStudio, Zementis, Alteryx, SAP, IBM, SAS, Teradata, TIBCO, Oracle, OneTick, DataCamp
        • Services: Mango, Accenture, Deloitte, Scientific Revenue, OpenBI, Coursera
        • Analytics: Zillow, Trulia, DataSong, Exelate, X+1, PredictWise
        • Government: FDA, CFPB, City of Chicago, NOAA, NIST
        • Public Affairs: HRDAG, Sunlight Foundation, Benetech, RealClimate
        • Manufacturing: Ford, John Deere, Monsanto, SZMF
    27. Why are so many companies using R?
        • Big Data
        • Data Science
        • Competition and Innovation
        • Open Source
        • Ecosystem
        • People
    28. Thank You
        David Smith
        blog.revolutionanalytics.com
        david@revolutionanalytics.com, @revodavid
    29. Bonus Applications
    30. Facebook
        • Human Resources
        • Status Update Trends
        • User Profile Images
    31. Monsanto
        Statistical Analysis:
        • Plant Breeding
        • Fertility mapping
        • Precision Seeding
        • Disease Management
        • Yield forecasting
    32. Insurance
        • Marketing Analytics
        • Risk Analysis
        • Catastrophe Modeling
    33. Weather and Climate
        • Flood Warnings
        • Climate change forecasts
    34. Foursquare
        • Recommendation Engine
    35. Marketing Analytics
        Power
        “We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.” - CEO, John Wallace, DataSong
        • 4X performance
        • 50M records scored daily
        Scalability
        “We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.” - SVP Analytics, Kevin Lyons, eXelate
        • TB’s data from 200+ data sources
        • 10’s thousands attributes
        • 100’s millions of scores daily
        • 2X data, 2X attributes, no impact on performance
        Performance
        “We need a high-performance analytics infrastructure because marketing optimization is a lot like a financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.” - Chief Analytics Officer, Leon Zemel, [x+1]