Big Data Analysis Starts with R
 

Big Data Analysis Starts with R Presentation Transcript

  • 1. Revolution Analytics. The Big Data Analytics Revolution Starts with R. December 20, 2011
  • 2. In Today’s Webinar: About Revolution Analytics; Getting Value with Advanced Analytics; Implementing the Advanced Analytics Stack; Resources and Further Reading
  • 3. Most advanced statistical analysis software available. Half the cost of commercial alternatives. "The professor who invented analytic software for the experts now wants to take it to the masses." 2M+ Users. 4,000+ Applications. Power, Productivity, Enterprise Readiness. Applications: Statistics, Predictive Analytics, Data Mining, Visualization. Industries: Finance, Life Sciences, Manufacturing, Retail, Telecom, Social Media, Government
  • 4. What is R? Data analysis software; an open-source software project; a programming language; a community; an environment
  • 5. What’s the Difference Between R and Revolution R Enterprise? Revolution R is 100% R and More®. Components: Multi-Threaded Math Libraries; Web-Based GUI; Web Services API; Big Data Analysis; Parallel Tools; Technical Support; IDE / Developer GUI; Build Assurance; 4,000+ Community Packages; R Engine Language Libraries
  • 6. Let’s Talk about Big Data
  • 7. Extracting Value with Advanced Analytics. Missing the potential value of the data that is being collected; need more than counts and averages. Advanced Analytics with Big Data: Predict the Future; Understand Risk and Uncertainty; Embrace Complexity; Identify the Unusual; Think Big
  • 8. R: A Unique Platform for Extracting Value from Data. Data Exploration and Visualization: R is superior at exploring data to find unexpected trends and relationships, finding the best predictive models, and identifying critical “outliers”, such as clusters of customers who are particularly profitable (or unprofitable!). Data Science: Google, LinkedIn and Facebook rely on R and the skills of data scientists who are accustomed to hacking together large data sets from disparate sources, visualizing and exploring data to identify novel modeling techniques, and combining the results of several modeling strategies to optimize predictive power. Modeling Innovation: Other commercial programs push users through a pre-programmed procedure and discourage modeling innovation. R was created as a 4GL with the needs of modern data scientists in mind, with an interactive language that promotes data exploration, data visualization, and flexible data modeling. Talent: R is creating a massive amount of talent because it is now the dominant tool of choice at universities.
  • 9. Making It Work. Use Cases for Big Data Analytics Deployment
  • 10. The Advanced Analytics Stack. Layers: Deployment / Consumption; Advanced Analytics; ETL; Data / Infrastructure. “Open Analytics Stack” White Paper: bit.ly/lC43Kw
  • 11. Best Practices for Implementing an Advanced Analytics Stack for Big Data: Limit sampling; Reduce data movement and replication; Bring the analytics as close as possible to the data; Optimize computation speed with parallel algorithms
  • 12. Big Data Computations. Computations are data intensive. To be effective, they must rely on data parallelism: data is distributed across compute nodes, and the same task is run in parallel on each of the data partitions. Examples of distributed computing frameworks that support data parallelism: traditional file-based analytics using on-premise clusters; Hadoop and MapReduce; in-database analytics using parallel hardware architectures
  • 13. Revolution R Enterprise: Big Data Statistics in R. www.revolutionanalytics.com/bigdata. Every US airline departure and arrival, 1987-2008. File: AirlineData87to08.xdf. Rows: 123.5 million. Variables: 29. Size on disk: 13.2Gb. Code: arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime),cube=TRUE)
  • 14. RevoScaleR – Distributed Computing. Portions of the data source are made available to each compute node. RevoScaleR on the master node assigns a task to each compute node. Each compute node independently processes its data and returns its intermediate results to the master node. The master node aggregates all of the intermediate results from the compute nodes and produces the final result. Diagram: a master node (RevoScaleR) connected to compute nodes (RevoScaleR), each with its own data partition
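
The partition, compute, and aggregate steps on this slide can be illustrated with a small base-R sketch. It is not RevoScaleR code and makes no claim about how RevoScaleR is implemented: the simulated data, the four-way split, and the names `partitions`, `node_task`, and `beta_chunked` are all assumptions made for illustration. It shows one case where the pattern is exact, a linear regression whose intermediate results (the cross-products X'X and X'y) can be summed across partitions.

```r
# Minimal base-R sketch of the partition / compute / aggregate pattern.
# NOT RevoScaleR code; the data and all names here are hypothetical.

set.seed(1)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n, sd = 0.5)

# "Data partitions": split the rows into four chunks.
partitions <- split(dat, rep(1:4, length.out = n))

# "Compute node" task: return the intermediate results X'X and X'y for one chunk.
node_task <- function(chunk) {
  X <- cbind(1, chunk$x)  # design matrix with an intercept column
  list(xtx = crossprod(X), xty = crossprod(X, chunk$y))
}

# lapply stands in for running the same task on each node in parallel.
intermediate <- lapply(partitions, node_task)

# "Master node": sum the intermediate results and solve the normal equations.
xtx <- Reduce(`+`, lapply(intermediate, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(intermediate, `[[`, "xty"))
beta_chunked <- drop(solve(xtx, xty))

# Compare with a fit on the full data set in one pass.
beta_full <- unname(coef(lm(y ~ x, data = dat)))
print(all.equal(beta_chunked, beta_full))
```

Because each chunk only contributes a small matrix and vector, the full data set never has to sit in one place, which is the point of the best practices listed on slide 11.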
  • 15. R and Hadoop. Capabilities delivered as individual R packages: rhdfs - R and HDFS; rhbase - R and HBASE; rmr - R and MapReduce. Downloads available from GitHub. Diagram labels: R Client, Job Tracker, Task Node, Map or Reduce, HDFS, HBASE, Thrift, rmr, rhdfs, rhbase
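
For readers unfamiliar with the MapReduce model that rmr targets, the following is a rough base-R sketch of the map, shuffle, and reduce phases. It deliberately does not call rmr, rhdfs, or rhbase, and it should not be read as their API: the toy `records` data and the functions `map_task` and `reduce_task` are hypothetical, and everything runs in a single R session rather than on a Hadoop cluster.

```r
# Base-R sketch of the MapReduce programming model; it does NOT use the rmr package.
# The records and function names below are hypothetical.

records <- data.frame(
  day   = c("Mon", "Tue", "Mon", "Wed", "Tue", "Mon"),
  delay = c(10, 5, 20, 0, 15, 30),
  stringsAsFactors = FALSE
)

# Input splits, as a cluster would hand to separate map tasks.
splits <- split(records, c(1, 1, 2, 2, 3, 3))

# Map: emit one (key, value) pair per record; the value is a partial sum and count.
map_task <- function(chunk) {
  Map(function(k, v) list(key = k, value = c(sum = v, n = 1)),
      chunk$day, chunk$delay)
}
pairs <- unlist(lapply(splits, map_task), recursive = FALSE)

# Shuffle: group the emitted values by key.
keys <- vapply(pairs, function(p) p$key, character(1))
grouped <- split(lapply(pairs, function(p) p$value), keys)

# Reduce: combine all values for a key, then turn sum and count into a mean.
reduce_task <- function(values) {
  total <- Reduce(`+`, values)
  unname(total["sum"] / total["n"])
}
mean_delay <- vapply(grouped, reduce_task, numeric(1))
print(mean_delay)
```

Emitting a sum and a count, rather than a mean, keeps the reduce step associative, so partial results from different map tasks can be combined in any order.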
  • 16. Revolution Analytics with Netezza Appliance
  • 17. Deployment with Revolution R Enterprise. End User: Desktop Applications (e.g. Excel); Business Intelligence (e.g. QlikView); Interactive Web Applications. Application Developer: Client libraries (JavaScript, Java, .NET); HTTP/HTTPS – JSON/XML; RevoDeployR Web Services. Admin: Session Management; Authentication; Data/Script Management; Administration. R Programmer: R
  • 18. Three final thoughts: Now enterprise-ready, R offers the innovation and flexibility needed to meet analytics challenges in a changing world. R-enabled advanced analytics are key to unlocking value in big data. Revolution Analytics optimizes R to take advantage of multiple data management paradigms and emerging best practices.
  • 19. Resources. Slides / Replay: bit.ly/r-big-data; “Open Analytics Stack” White Paper: bit.ly/lC43Kw; McKinsey Report on Big Data: bit.ly/jWyrFM; Conway, Data Science Intelligence: bit.ly/myMwak; “Big Analytics” White Paper by Norman H. Nie: bit.ly/biganalytics; Revolution R Enterprise: bit.ly/Enterprise-R; Questions: david.champagne@revolutionanalytics.com
  • 20. Thank you. The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.330.0553 Twitter: @RevolutionR