Great Wide Open - Day 1

Derek Norton - Revolution Analytics

11:15 AM - Operations 2 (Big Data)

Published in:
Technology

- 1. Big Data Analytics with R
  - Derek McCrae Norton, Senior Sales Engineer
  - April 2, 2014
- 2. Agenda
  - Introduction
  - Big Data
  - Analytics
  - R
  - Revolution R Enterprise
  - Synergy
  - Conclusion
  - © 2013 Revolution Analytics
- 3. Who are you anyway?
  - Statistician: My degrees are all in statistics.
  - Consultant: My experience has been mostly in Marketing Analytics, focusing on Predictive Analytics.
  - Sales Engineer: Still consulting, just with a much heavier emphasis on client interaction.
  - Founder/Director, Atlanta R Users Group
    - Shameless plug. Please join if interested.
    - http://www.meetup.com/R-Users-Atlanta/
  - Husband, Father, Outdoorsman, Serial Hobbyist, …
- 4. Big Data
- 5. Big Data and Big Opportunities
  - “Big data is data that exceeds the processing capability of conventional database systems” (Edd Dumbill, O’Reilly Radar, Jan 2012; radar.oreilly.com/2012/01/what-is-big-data.html)
  - [Chart: Worldwide data created and replicated, in zettabytes (values shown: 1, 2, 35)]
- 6. What is Big Data?
  - Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software. (Snijders, Matzat, & Reips, 2012)
- 7. Does Big Data Mean Hadoop?
  - The short answer is no.
  - The longer answer is maybe.
  - Hadoop adoption is turning that maybe into a probably.
- 8. Analytics
- 9. What is Analytics?
  - Analytics is the combination of mathematical, statistical, and heuristic techniques to glean useful insights from data and to implement actions derived from those insights. (Derek McCrae Norton)
- 10. Analytics
  - The current buzzword is “Data Science,” but I don’t really agree with that nomenclature.
    - What statistician or analyst (data scientist) actually follows the scientific method?
  - That being said, the current definition of “Data Science” is a pretty good surrogate for what we are discussing.
  - Whatever descriptors you use, one thing is clear: you must use something to help you carry out the actual work.
    - R, Python, SAS, etc.
    - RDBMS, Hadoop, etc.
- 11. [Image slide; no text]
- 12. What is the R language?
  - A Platform…
    - A Procedural Language for Stats, Math and Data Science
    - A Complete Data Visualization Framework
    - Provided as Open Source
  - A Community…
    - 2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
    - Active User Groups Across the World
  - An Ecosystem
    - CRAN: 5000+ Freely Available Packages
    - Applicable to Big Data if scaled
- 13. THE R USER COMMUNITY
- 14. A brief history of R
  - 1993: Research project in Auckland, NZ (Ross Ihaka and Robert Gentleman)
  - 1995: Released as open-source software (generally compatible with the “S” language)
  - 1997: R core group formed
  - 2000: R 1.0.0 released
  - 2004: First international user conference in Vienna
  - 2013: R 3.0.0 released
- 15. R is
  - Free Open Source, licensed under GPL (like Linux!)
    - Free as in beer
    - Free as in freedom
  - Flexible
  - Open for integration
    - Data (SAS, SPSS, Excel, SQL Server, Oracle, …)
    - Systems (applications, webservers, …)
  - Broad user-base
    - De-facto standard for data analysis teaching
- 16. R is exploding in popularity & function
  - Web Site Popularity: [Chart: number of links to main web site; series R, SAS, SPSS, S-Plus, Stata]
  - Scholarly Activity: Google Scholar hits (’05-’09 CAGR): R 46%, SAS -11%, SPSS -27%, S-Plus 0%, Stata 10%
  - Internet Discussion: [Chart: mean monthly traffic on email discussion list; series R, SAS, Stata, SPSS, S-Plus]
  - Package Growth: Number of R packages listed on CRAN: 4,332 as of Feb 2013
- 17. So why isn’t everyone using R?
  - “The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians.” (Bo Cowgill, Google, at SF R Meetup)
- 18. Otherwise R is Great! Right?
  - Who here has used R?
    - Thoughts?
  - Who has never seen this?
  - Who here has more than 1 core/processor?
  - Who has ever used r-help?
    - “‘They’ did write documentation that told you that Perl was needed, but ‘they’ can’t read it for you.” (Brian D. Ripley, R-help, February 2001)
    - “This is all documented in TFM. Those who WTFM don’t want to have to WTFM again on the mailing list. RTFM.” (Barry Rowlingson, R-help, October 2003)
- 19. What is Revolution R Enterprise?
- 20. Motivators
  - Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
  - Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
  - Enterprise Readiness | Community support | Commercial support | Delivers full service production support
  - Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
  - Commercial Viability | Risk of deployment of open source | Commercial license | Eliminate risk with open source
- 21. Introducing Revolution R Enterprise (RRE): The Big Data Big Analytics Platform
  - Components: DistributedR, DevelopR, DeployR, ScaleR, ConnectR
  - Big Data Big Analytics Ready
    - Enterprise readiness
    - High performance analytics
    - Multi-platform architecture
    - Data source integration
    - Development tools
    - Deployment tools
- 22. The Platform Step by Step: R Capabilities
  - R+CRAN
    - Open source R interpreter
    - UPDATED R 3.0.2
    - Freely-available R algorithms
    - Algorithms callable by RevoR
    - Embeddable in R scripts
    - 100% Compatible with existing R scripts, functions and packages
  - RevoR
    - Performance enhanced R interpreter
    - Based on open source R
    - Adds high-performance math
  - Available On:
    - Platform™ LSF™ Linux®
    - Microsoft® HPC Clusters
    - Windows® & Linux Servers
    - Windows & Linux Workstations
    - IBM® Netezza®
    - NEW Cloudera Hadoop®
    - NEW Hortonworks Hadoop
    - NEW Teradata® Database
    - Intel® Hadoop
    - IBM BigInsights™
- 23. The Platform Step by Step: Parallelization & Data Sourcing
  - ConnectR
    - High-speed & direct connectors
    - Available for:
      - High-performance XDF
      - SAS, SPSS, delimited & fixed format text data files
      - Hadoop HDFS (text & XDF)
      - Teradata Database & Aster
      - EDWs and ADWs
      - ODBC
  - ScaleR
    - Ready-to-Use high-performance big data big analytics
    - Fully-parallelized analytics
    - Data prep & data distillation
    - Descriptive statistics & statistical tests
    - Correlation & covariance matrices
    - Predictive Models: linear, logistic, GLM
    - Machine learning
    - Monte Carlo simulation
    - NEW Tools for distributing customized algorithms across nodes
  - DistributedR
    - Distributed computing framework
    - Delivers portability across platforms
    - Available on:
      - Windows Servers
      - Red Hat and NEW SuSE Linux Servers
      - IBM Platform LSF Linux
      - Microsoft HPC Clusters
      - NEW Teradata Database
      - NEW Cloudera Hadoop
      - NEW Hortonworks Hadoop
  - A single package (RevoScaleR)
- 24. The Platform Step by Step: Tools & Deployment
  - DeployR
    - Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs
    - Integrates R into application infrastructures
    - Capabilities:
      - Invokes R scripts from web services calls
      - RESTful interface for easy integration
      - Works with web & mobile apps, leading BI & Visualization tools and business rules engines
  - DevelopR
    - Integrated development environment for R
    - Visual ‘step-into’ debugger
    - Available on: Windows
- 25. Write Once. Deploy Anywhere.
  - Components: DistributedR, ScaleR, ConnectR, DeployR
  - Designed for Scale, Portability & Performance
  - In the Cloud: Amazon AWS
  - Workstations & Servers: Desktop, Server
  - Clustered Systems: IBM Platform LSF, Microsoft HPC
  - EDW: Teradata
  - Hadoop: Hortonworks, Cloudera
- 26. Synergy
- 27. Put it all together
  - Talent fresh out of school knows R.
  - RRE is R plus more.
  - RRE provides a unified way of carrying out analytics (small or big).
  - RRE code is portable…
- 28. Scale and Portability
  - Set “compute context” to define hardware (one line of code)
    - Native job-scheduler handles distribution, monitoring, failover etc.
  - Same code runs on other supported architectures
    - Just change compute context
  - 42 seconds instead of 6 minutes on the local machine
- 29. References
  - 1. Snijders, C., Matzat, U., & Reips, U.-D. (2012). ‘Big Data’: Big gaps of knowledge in the field of Internet science. International Journal of Internet Science, 7, 1-5. http://www.ijis.net/ijis7_1/ijis7_1_editorial.html
  - 2. Conway, D. The Data Science Venn Diagram.
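
Slide 28's idea, that the analysis code stays fixed while a one-line "compute context" setting decides where it runs, can be illustrated with a small Python sketch. This is not RevoScaleR code and does not reproduce its API: `set_compute_context`, `chunked_mean`, and the two context names are hypothetical, and a local thread pool merely stands in for a cluster job scheduler. The sketch also follows the chunk-at-a-time style suggested by slide 23, where each chunk is summarised independently and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Partial = Tuple[float, int]  # (sum of a chunk, number of values in it)
Runner = Callable[[Callable[[Sequence[float]], Partial], Iterable[Sequence[float]]], List[Partial]]


def _run_sequential(fn, chunks) -> List[Partial]:
    """Hypothetical context: process chunks one after another."""
    return [fn(chunk) for chunk in chunks]


def _run_threaded(fn, chunks) -> List[Partial]:
    """Hypothetical context: a thread pool standing in for a scheduler."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, chunks))


# Illustrative context names only; they are assumptions of this sketch.
_CONTEXTS: Dict[str, Runner] = {
    "local_sequential": _run_sequential,
    "local_parallel": _run_threaded,
}
_active_context = "local_sequential"


def set_compute_context(name: str) -> None:
    """Select where subsequent analyses run (the 'one line of code')."""
    global _active_context
    if name not in _CONTEXTS:
        raise ValueError(f"unknown compute context: {name!r}")
    _active_context = name


def make_chunks(values: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield fixed-size slices so that only one chunk is handled at a time."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def _partial_sum(chunk: Sequence[float]) -> Partial:
    return (sum(chunk), len(chunk))


def chunked_mean(values: Sequence[float], chunk_size: int = 1000) -> float:
    """Mean computed from per-chunk partial sums under the active context."""
    partials = _CONTEXTS[_active_context](_partial_sum, make_chunks(values, chunk_size))
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no data")
    return sum(s for s, _ in partials) / count


data = [float(i) for i in range(10_000)]

set_compute_context("local_sequential")
mean_seq = chunked_mean(data)

set_compute_context("local_parallel")  # the only line that changes
mean_par = chunked_mean(data)

print(mean_seq, mean_par)
```

The analysis function `chunked_mean` never mentions threads or machines; only the context selection differs between the two runs. The timings quoted on the slide come from the talk and are not something this toy example reproduces.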
