×
  • Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

High Performance Predictive Analytics in R and Hadoop

by on Jul 10, 2013

  • 4,384 views

Hadoop is rapidly being adopted as a major platform for storing and managing massive amounts of data, and for computing descriptive and query types of analytics on that data. However, it has a ...

Hadoop is rapidly being adopted as a major platform for storing and managing massive amounts of data, and for computing descriptive and query types of analytics on that data. However, it has a reputation for not being a suitable environment for high performance complex iterative algorithms such as logistic regression, generalized linear models, and decision trees. At Revolution Analytics we think that reputation is unjustified, and in this talk I discuss the approach we have taken to porting our suite of High Performance Analytics algorithms to run natively and efficiently in Hadoop. Our algorithms are written in C++ and R, and are based on a platform that automatically and efficiently parallelizes a broad class of algorithms called Parallel External Memory Algorithms (PEMA’s). This platform abstracts both the inter-process communication layer and the data source layer, so that the algorithms can work in almost any environment in which messages can be passed among processes and with almost any data source. MPI and RPC are two traditional ways to send messages, but messages can also be passed using files, as in Hadoop. I describe how we use the file-based communication choreographed by MapReduce and how we efficiently access data stored in HDFS.

Statistics

Views

Total Views
4,384
Views on SlideShare
4,174
Embed Views
210

Actions

Likes
6
Downloads
108
Comments
0

2 Embeds 210

http://www.scoop.it 199
https://twitter.com 11

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
Post Comment
Edit your comment

High Performance Predictive Analytics in R and Hadoop High Performance Predictive Analytics in R and Hadoop Presentation Transcript