Big Data Analytics with R
Derek McCrae Norton, Senior Sales Engineer
April 2, 2014
Great Wide Open - Day 1
Derek Norton - Revolution Analytics
11:15 AM - Operations 2 (Big Data)

Transcript of "Big Data Analytics with R"

  1. Big Data Analytics with R
     Derek McCrae Norton, Senior Sales Engineer
     April 2, 2014
  2. Agenda
     • Introduction
     • Big Data
     • Analytics
     • R
     • Revolution R Enterprise
     • Synergy
     • Conclusion
     © 2013 Revolution Analytics
  3. Who are you anyway?
     • Statistician
       – My degrees are all in statistics.
     • Consultant
       – My experience has been mostly in Marketing Analytics, focusing on Predictive Analytics.
     • Sales Engineer
       – Still consulting, just with a much heavier emphasis on client interaction.
     • Founder/Director, Atlanta R Users Group
       – Shameless plug. Please join if interested.
       – http://www.meetup.com/R-Users-Atlanta/
     • Husband, Father, Outdoorsman, Serial Hobbyist, …
  4. Big Data
  5. Big Data and Big Opportunities
     “Big data is data that exceeds the processing capacity of conventional database systems.”
     – Edd Dumbill, O’Reilly Radar*, Jan 2012
     [Chart: Worldwide data created and replicated, in zettabytes]
     * radar.oreilly.com/2012/01/what-is-big-data.html
  6. What is Big Data?
     Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software.
     – Snijders, Matzat, & Reips (2012)
  7. Does Big Data Mean Hadoop?
     • The short answer is no.
     • The longer answer is maybe.
     • Hadoop adoption is turning that maybe into a probably.
  8. Analytics
  9. What is Analytics?
     Analytics is the combination of mathematical, statistical, and heuristic techniques to glean useful insights from data and to implement actions derived from those insights.
     – Derek McCrae Norton
  10. Analytics
      • The current buzzword is “Data Science,” but I don’t really agree with that nomenclature.
        – What statistician, analyst, or “data scientist” actually follows the scientific method?
      • That being said, the current definition of “Data Science” is a pretty good surrogate for what we are discussing.
      • Whatever descriptors you use, one thing is clear: you must use something to help you carry out the actual work.
        – R, Python, SAS, etc.
        – RDBMS, Hadoop, etc.
  12. What is the R language?
      • A Platform…
        – A Procedural Language for Stats, Math and Data Science
        – A Complete Data Visualization Framework
        – Provided as Open Source
      • A Community…
        – 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
        – Active User Groups Across the World
      • An Ecosystem
        – CRAN: 5000+ Freely Available Packages
        – Applicable to Big Data if scaled
  13. THE R USER COMMUNITY
  14. A brief history of R
      • 1993: Research project in Auckland, NZ
        – Ross Ihaka and Robert Gentleman
      • 1995: Released as open-source software
        – Generally compatible with the “S” language
      • 1997: R core group formed
      • 2000: R 1.0.0 released
      • 2004: First international user conference in Vienna
      • 2013: R 3.0.0 released
  15. R is Free
      • Open Source, licensed under GPL (like Linux!)
        – Free as in beer
        – Free as in freedom
      • Flexible
      • Open for integration
        – Data (SAS, SPSS, Excel, SQL Server, Oracle, …)
        – Systems (applications, webservers, …)
      • Broad user-base
        – De facto standard for data analysis teaching
  16. R is exploding in popularity & function
      • Web Site Popularity: number of links to main web site (R, SAS, SPSS, S-Plus, Stata compared)
      • Scholarly Activity: Google Scholar hits (’05-’09 CAGR)
        – R 46%, SAS -11%, SPSS -27%, S-Plus 0%, Stata 10%
      • Internet Discussion: mean monthly traffic on email discussion list (R, SAS, Stata, SPSS, S-Plus compared)
      • Package Growth: number of R packages listed on CRAN, 4,332 as of Feb 2013
  17. So why isn’t everyone using R?
      “The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians.”
      – Bo Cowgill, Google (at SF R Meetup)
  18. Otherwise R is Great! Right?
      • Who here has used R?
        – Thoughts?
      • Who has never seen this?
      • Who here has more than 1 core/processor?
      • Who has ever used r-help?
        – ‘They’ did write documentation that told you that Perl was needed, but ‘they’ can’t read it for you. - Brian D. Ripley, R-help (February 2001)
        – This is all documented in TFM. Those who WTFM don’t want to have to WTFM again on the mailing list. RTFM. - Barry Rowlingson, R-help (October 2003)
  19. What is Revolution R Enterprise?
  20. Motivators
      • Big Data: In-memory bound → Hybrid memory & disk scalability → Operates on bigger volumes & factors
      • Speed of Analysis: Single threaded → Parallel threading → Shrinks analysis time
      • Enterprise Readiness: Community support → Commercial support → Delivers full service production support
      • Analytic Breadth & Depth: 5000+ innovative analytic packages → Leverage open source packages plus Big Data ready packages → Supercharges R
      • Commercial Viability: Risk of deployment of open source → Commercial license → Eliminate risk with open source
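The Big Data motivator (in-memory bound versus hybrid memory & disk scalability) comes down to processing data one block at a time instead of loading it all at once. A minimal Python sketch of that idea, assuming the data arrives as a stream of numbers (simulated here by a generator rather than a file on disk): each block is summarized on its own and the partial summaries are merged with the parallel variance update of Chan et al., so buffering depends on the block size rather than the data size. The function names are hypothetical illustrations, not part of RevoScaleR.

```python
from typing import Iterable, Iterator, List, Tuple

# (count, mean, M2), where M2 is the sum of squared deviations from the mean
Summary = Tuple[int, float, float]


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield blocks of at most chunk_size values; stands in for reading blocks from disk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh block so only one block is buffered at a time
    if chunk:
        yield chunk


def summarize(chunk: List[float]) -> Summary:
    """Summarize one block independently of all the others."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two block summaries (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streaming_mean_var(values: Iterable[float], chunk_size: int = 1000) -> Tuple[float, float]:
    """Return (mean, sample variance), buffering at most one block of input."""
    total: Summary = (0, 0.0, 0.0)  # empty summary; merging with it yields the other side
    for chunk in read_in_chunks(values, chunk_size):
        total = merge(total, summarize(chunk))
    n, mean, m2 = total
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    return mean, m2 / (n - 1)
```

For example, `streaming_mean_var((float(x) for x in range(1, 10_001)), chunk_size=333)` should give a mean of 5000.5 and a sample variance of 10000 × 10001 / 12 ≈ 8,334,166.67 (up to floating-point rounding), the same as computing over the whole list at once.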
  21. Introducing Revolution R Enterprise (RRE)
      The Big Data Big Analytics Platform
      Components: DistributedR, DevelopR, DeployR, ScaleR, ConnectR
      • Big Data Big Analytics Ready
        – Enterprise readiness
        – High performance analytics
        – Multi-platform architecture
        – Data source integration
        – Development tools
        – Deployment tools
  22. The Platform Step by Step: R Capabilities
      R+CRAN
      • Open source R interpreter
      • UPDATED R 3.0.2
      • Freely-available R algorithms
      • Algorithms callable by RevoR
      • Embeddable in R scripts
      • 100% Compatible with existing R scripts, functions and packages
      RevoR
      • Performance enhanced R interpreter
      • Based on open source R
      • Adds high-performance math
      Available On:
      • Platform™ LSF™ Linux®
      • Microsoft® HPC Clusters
      • Windows® & Linux Servers
      • Windows & Linux Workstations
      • IBM® Netezza®
      • NEW Cloudera Hadoop®
      • NEW Hortonworks Hadoop
      • NEW Teradata® Database
      • Intel® Hadoop
      • IBM BigInsights™
  23. The Platform Step by Step: Parallelization & Data Sourcing
      ConnectR
      • High-speed & direct connectors
        Available for:
        – High-performance XDF
        – SAS, SPSS, delimited & fixed format text data files
        – Hadoop HDFS (text & XDF)
        – Teradata Database & Aster
        – EDWs and ADWs
        – ODBC
      ScaleR
      • Ready-to-Use high-performance big data big analytics
      • Fully-parallelized analytics
      • Data prep & data distillation
      • Descriptive statistics & statistical tests
      • Correlation & covariance matrices
      • Predictive Models – linear, logistic, GLM
      • Machine learning
      • Monte Carlo simulation
      • NEW Tools for distributing customized algorithms across nodes
      DistributedR
      • Distributed computing framework
      • Delivers portability across platforms
        Available on:
        – Windows Servers
        – Red Hat and NEW SuSE Linux Servers
        – IBM Platform LSF Linux
        – Microsoft HPC Clusters
        – NEW Teradata Database
        – NEW Cloudera Hadoop
        – NEW Hortonworks Hadoop
      A single package (RevoScaleR)
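Parallelized analytics like the ScaleR bullets above (correlation & covariance matrices, linear predictive models) are commonly built on a map/reduce split: compute small summaries on each block independently, then add them up. A minimal Python sketch of that general pattern for a one-predictor least-squares line, assuming the data is already split into blocks of (x, y) pairs; the thread pool stands in for cores or nodes, and every name here is a hypothetical illustration rather than RevoScaleR's implementation or API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # one (x, y) observation
# Sufficient statistics for y = a + b*x: (n, sum x, sum y, sum x^2, sum x*y)
Stats = Tuple[float, float, float, float, float]


def block_stats(block: Sequence[Row]) -> Stats:
    """Map step: summarize one block; needs nothing from any other block."""
    n = float(len(block))
    sx = sum(x for x, _ in block)
    sy = sum(y for _, y in block)
    sxx = sum(x * x for x, _ in block)
    sxy = sum(x * y for x, y in block)
    return n, sx, sy, sxx, sxy


def add_stats(a: Stats, b: Stats) -> Stats:
    """Reduce step: sums are associative, so blocks can be combined in any order."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_line(blocks: List[Sequence[Row]], workers: int = 4) -> Tuple[float, float]:
    """Least-squares (intercept, slope) from per-block statistics computed concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_stats, blocks))
    n, sx, sy, sxx, sxy = reduce(add_stats, partials)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x is constant; the slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope
```

In CPython the global interpreter lock keeps CPU-bound threads from running truly in parallel, so the threads here only illustrate the split; the same block statistics could be computed in separate processes or on separate machines. With several predictors the per-block summaries become the matrices XᵀX and Xᵀy, which are likewise summed across blocks before solving.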
  24. The Platform Step by Step: Tools & Deployment
      DeployR
      • Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
      • Integrates R into application infrastructures
        Capabilities:
        – Invokes R scripts from web services calls
        – RESTful interface for easy integration
        – Works with web & mobile apps, leading BI & visualization tools, and business rules engines
      DevelopR
      • Integrated development environment for R
      • Visual ‘step-into’ debugger
        Available on:
        – Windows
  25. Write Once. Deploy Anywhere.
      DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
      Components: DistributedR, ScaleR, ConnectR, DeployR
      • In the Cloud: Amazon AWS
      • Workstations & Servers: Desktop, Server
      • Clustered Systems: IBM Platform LSF, Microsoft HPC
      • EDW: Teradata
      • Hadoop: Hortonworks, Cloudera
  26. Synergy
  27. Put it all together
      • Talent fresh out of school knows R.
      • RRE is R plus more.
      • RRE provides a unified way of carrying out analytics (small or big).
      • RRE code is portable…
  28. Scale and Portability
      • Set “compute context” to define hardware (one line of code)
        – Native job-scheduler handles distribution, monitoring, failover, etc.
      • Same code runs on other supported architectures
        – Just change compute context
      (Example run: 42 seconds instead of 6 minutes on the local machine)
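The compute-context idea can be illustrated outside R. A minimal Python sketch, assuming a hypothetical `set_compute_context` function and two hypothetical context classes (they mimic the concept on the slide, not RevoScaleR's actual API): the analysis function `estimate_pi`, a small Monte Carlo simulation, never changes; only the active context that decides where its independent tasks run does. Each task seeds its own generator, so the answer does not depend on which context executes it.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class SequentialContext:
    """Hypothetical context: run every task in order in the current process."""

    def map(self, fn: Callable[[int], int], tasks: Sequence[int]) -> List[int]:
        return [fn(t) for t in tasks]


class ThreadPoolContext:
    """Hypothetical context: run tasks on a local thread pool (stand-in for a cluster)."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map(self, fn: Callable[[int], int], tasks: Sequence[int]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, tasks))


_context = SequentialContext()  # default: local and sequential


def set_compute_context(ctx) -> None:
    """Hypothetical one-line switch deciding where subsequent analyses run."""
    global _context
    _context = ctx


def count_hits(seed: int, draws: int) -> int:
    """One independent Monte Carlo task: random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # private, seeded generator: result ignores scheduling order
    hits = 0
    for _ in range(draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks: int = 8, draws: int = 20_000) -> float:
    """Analysis code: identical no matter which compute context is active."""
    hits = _context.map(lambda seed: count_hits(seed, draws), range(n_tasks))
    return 4.0 * sum(hits) / (n_tasks * draws)
```

Calling `estimate_pi()` under the default context and again after `set_compute_context(ThreadPoolContext(workers=4))` should return exactly the same estimate, roughly 3.14. In this sketch, a context backed by a real scheduler would only need to provide the same `map` method for the analysis code to run there unchanged.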
  29. References
      1. Snijders, C., Matzat, U., & Reips, U.-D. (2012). ‘Big Data’: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7(1), 1-5. http://www.ijis.net/ijis7_1/ijis7_1_editorial.html
      2. Conway, D. The Data Science Venn Diagram.