Revolution Analytics

Published in: Technology

Transcript

  • 1. Solution Spotlight Presents
  • 2. Integrating R and Hadoop
    Part of Revolution Analytics’
    Big Analytics Strategy
    Contact us at info@revolutionanalytics.com
  • 3. Outline
    Introduction to Revolution Analytics
    Opportunity and Challenges of Big Analytics
    Revolution Analytics’ Support of Integration between R and Hadoop
    Contact Info
  • 4. Open Source Analytics for the Enterprise
    • Most advanced statistical analysis software available
    The professor who invented analytic software for the experts now wants to take it to the masses
    • Half the cost of commercial alternatives
    • 5. 2M+ Users
    • 6. 2,500+ Applications
    Finance
    Statistics
    Life Sciences
    Predictive Analytics
    Manufacturing
    Retail
    Data Mining
    Telecom
    Social Media
    Visualization
    Government
  • 7. Revolution has garnered tremendous attention from media and analysts
  • 8. Big Analytics, Big Advantages
    Big Analytics could be
    Simple algorithms running on “Big Data”
    Compute-intensive algorithms running on either “Big Data” or small data sets
    Advanced Analytic routines for data visualization or statistical analysis
  • 9. Extracting Value with Big Analytics
    Big Analytics’ Advantages
    Predict the Future
    Understand Risk and Uncertainty
    Embrace Complexity
    Identify the Unusual
    Think Big
  • 10. Big Analytics Challenges
    Computations are data intensive (i.e. require large amounts of data)
    To be effective, must rely on data parallelism
    Data is distributed across compute nodes
    Same task is run in parallel on each of the data partitions
    Examples of distributed computing frameworks that support data parallelism
    Traditional file based analytics using on-premise clusters
    Hadoop and MapReduce
    In-Database Analytics using parallel hardware architectures
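The data-parallel pattern on this slide (partition the data, run the same task on each partition, combine the results) can be sketched locally in base R. This is a minimal illustration only, not Revolution Analytics code: the partition count and the `partial_stats` helper are assumptions made for the example, and no cluster is involved.

```r
# Local sketch of data parallelism; everything runs in one R session.
set.seed(42)
x <- rnorm(1000)

# Distribute the data: assign each observation to one of 4 partitions
# (on a real cluster each partition would live on a different node).
partitions <- split(x, rep(1:4, length.out = length(x)))

# Hypothetical helper: the same task is run on every partition.
partial_stats <- function(chunk) c(sum = sum(chunk), n = length(chunk))
partials <- lapply(partitions, partial_stats)

# Combine the small per-partition results into the global answer.
totals <- Reduce(`+`, partials)
parallel_mean <- unname(totals["sum"] / totals["n"])
```

Only the two-number summaries need to travel between partitions, which is why this style of computation keeps data movement low.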
  • 11. Key Objectives for Big Analytics Deployments
    Best performance is achieved when a Big Analytics deployment meets these objectives:
    Avoid sampling / aggregation;
    Reduce data movement and replication;
    Bring the analytics as close as possible to the data; and
    Optimize computation speed.
    Revolution Analytics’ support for R and Hadoop helps overcome these challenges
  • 12. Revolution Analytics’ RevoConnectRs for Hadoop
    RevoHDFS provides connectivity from R to HDFS, and RevoHBase provides connectivity from R to HBASE
    Allows an R programmer to manipulate Hadoop data stores in HDFS and HBASE directly from R
    RevoHStream allows MapReduce jobs to be developed in R and executed as Hadoop Streaming jobs
    Gives R programmers the ability to write MapReduce jobs in R using Hadoop Streaming
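To make the shape of a MapReduce job written in R concrete, here is a minimal base-R emulation that runs locally. It does not use RevoHStream or Hadoop Streaming: `local_mapreduce` and `kv` are hypothetical helpers invented for this sketch, and the real package's function names and signatures may differ.

```r
# Hypothetical key-value constructor for this sketch.
kv <- function(key, val) list(key = key, val = val)

# Hypothetical local stand-in for a MapReduce run:
# map each record, group the emitted values by key, then reduce each group.
local_mapreduce <- function(records, map, reduce) {
  mapped <- lapply(records, map)
  keys   <- vapply(mapped, function(p) as.character(p$key), character(1))
  vals   <- lapply(mapped, function(p) p$val)
  groups <- split(vals, keys)   # plays the role of the shuffle step
  lapply(names(groups), function(k) reduce(k, groups[[k]]))
}

# Example job: total sales per region (made-up records).
records <- list(
  list(region = "east", sales = 10),
  list(region = "west", sales = 5),
  list(region = "east", sales = 7)
)
result <- local_mapreduce(
  records,
  map    = function(r) kv(r$region, r$sales),
  reduce = function(k, vv) kv(k, sum(unlist(vv)))
)
```

The map and reduce functions are ordinary R closures; in a streaming setup the framework, rather than `lapply` and `split`, would be responsible for running them across nodes.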
  • 13. R/Hadoop – Revolution Analytics
    HDFS
    HBASE
    • Connectors to HDFS and HBASE for interacting with data stores directly in R
    • 14. Hadoop Streaming package for executing MapReduce jobs from R.
    [Diagram labels: R Client, Job Tracker, Task Tracker, Task Node, R, Map Reduce]
  • 15. RevoHDFS
    R package for working with HDFS
    Connect and Browse HDFS
    Read/Write/Delete/Copy/Rename files
    Examples:
    Read an HDFS text file into a data frame
    Serialize a data frame to HDFS
    Stream lines from HDFS text file that can be used with biglm or bigglm
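The idea of streaming lines into an incremental model fit can be illustrated without HDFS. The sketch below reads a text source in chunks and accumulates the cross-products needed for least squares. Packages such as biglm also update a fit chunk by chunk (using an incremental QR decomposition rather than the plain cross-products used here for brevity). A `textConnection` stands in for the HDFS stream; the chunk size, column names, and simulated data are assumptions, and no RevoHDFS functions are used.

```r
# Build a small CSV in memory; a textConnection stands in for the HDFS stream.
set.seed(1)
n <- 200
d <- data.frame(x = runif(n))
d$y <- 3 + 2 * d$x + rnorm(n, sd = 0.1)
csv_lines <- capture.output(write.csv(d, row.names = FALSE))

con <- textConnection(csv_lines)
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

xtx <- matrix(0, 2, 2)   # running X'X
xty <- matrix(0, 2, 1)   # running X'y
repeat {
  chunk <- readLines(con, n = 50)   # stream 50 lines at a time
  if (length(chunk) == 0) break
  df <- read.csv(text = chunk, header = FALSE, col.names = header)
  X <- cbind(1, df$x)               # intercept column plus predictor
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, df$y)
}
close(con)

beta <- drop(solve(xtx, xty))       # intercept and slope
```

At no point does the loop hold more than one chunk of rows in memory, which is the property that matters when the file is larger than RAM.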
  • 16. RevoHBase
    R Package for working with HBASE
    Connect and Browse HBASE
    Get Rows/Columns of an HBASE table
    Write data to HBASE table
    Create/Delete HBASE table
    Examples
    Create a data frame in R from a collection of Rows/Columns from HBASE
    Update an HBASE table with values from a data frame
  • 17. RevoHStream
    RevoHStream – R package capable of performing the following types of Analysis using Hadoop Streaming
    Simulations - Monte Carlo and other Stochastic analysis
    R ‘apply’ family of operations (tapply, lapply…)
    Binning, quantiles, summaries and crosstabs for input to displays (ggplot, lattice).
    Data transformations
    Data Mining
  • 18. Example MapReduce Algorithm: Logistic Regression
    ## create test set as follows
    ## rhwrite(lapply(1:100, function(i) {eps = rnorm(1, sd = 10); keyval(i, list(x = c(i, i + eps), y = 2 * (eps > 0) - 1))}), "/tmp/logreg")
    ## run as:
    ## rhLogisticRegression("/tmp/logreg", 10, 2, 0.05)
    ## max likelihood solution diverges for separable dataset, (-inf, inf) such as the above
    rhLogisticRegression = function(input, iterations, dims, alpha) {
      plane = rep(0, dims)
      g = function(z) 1 / (1 + exp(-z))
      for (i in 1:iterations) {
        gradient = rhread(revoMapReduce(input,
          map = function(k, v) keyval(1, v$y * v$x * g(-v$y * (plane %*% v$x))),
          reduce = function(k, vv) keyval(k, apply(do.call(rbind, vv), 2, sum)),
          combine = TRUE))
        plane = plane + alpha * gradient[[1]]$val
      }
      plane
    }
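For readers without a Hadoop cluster, the same gradient-ascent update can be checked locally in base R. In this sketch `lapply` plays the role of the map step and `Reduce` the role of the reduce step. `local_logistic_regression`, the step size, and the simulated records are assumptions made for illustration and are not part of RevoHStream. Labels are drawn at random from a logistic model so that the data are not separable and the iteration stays finite.

```r
g <- function(z) 1 / (1 + exp(-z))   # logistic function, as on the slide

# Hypothetical local counterpart of rhLogisticRegression.
# records: list of items, each with a numeric vector x and a label y in {-1, 1}.
local_logistic_regression <- function(records, iterations, dims, alpha) {
  plane <- rep(0, dims)
  for (i in seq_len(iterations)) {
    # "map": gradient contribution of each record
    contribs <- lapply(records, function(v) {
      v$y * v$x * g(-v$y * sum(plane * v$x))
    })
    # "reduce": sum the contributions into one gradient vector
    gradient <- Reduce(`+`, contribs)
    plane <- plane + alpha * gradient
  }
  plane
}

set.seed(7)
records <- lapply(1:100, function(i) {
  x <- c(1, rnorm(1))                # intercept term plus one feature
  p <- g(2 * x[2])                   # assumed true probability that y = 1
  list(x = x, y = if (runif(1) < p) 1 else -1)
})
plane <- local_logistic_regression(records, iterations = 200, dims = 2, alpha = 0.01)
```

Because the per-record contributions are simply summed, the reduce step is associative, which is what allows a combiner to pre-aggregate on each node in the distributed version.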
  • 19. Get more information about Revolution Analytics’ Big Analytics Solutions, including R connectors for Hadoop
    1 855-GET-REVO
    http://www.revolutionanalytics.com/big-analytics
  • 20. http://www.cloudera.com/partners
