The Need for Other Compute Models
“… in addition, these data stores often expose a proprietary interface for application programming (e.g. PL/SQL or TSQL), but not the full power of procedural programming. More programmer-friendly parallel dataflow languages await discovery, I think. MapReduce is one (small) step in that direction.”
Jeff Hammerbacher, Engineer-to-Engineer Lectures, June 2010
Support for Other Programming Paradigms
“MapReduce NextGen provides a completely generic computation framework to support MapReduce and other paradigms.”
Arun C. Murthy, The Next Generation of Apache Hadoop MapReduce, February 2011
What is dataflow?
Operators (nodes), each providing a specific function
Data queues (edges) connecting the operators
High productivity
Message-passing architecture
Natural fit for big data
Example: a Unix pipeline such as find | grep | awk | sort, where each command is an operator and each pipe is a queue
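The operator-and-queue model can be sketched in a few lines of Python. This is a minimal illustration, assuming standard-library threads stand in for operators and bounded queue.Queue objects stand in for the edges; the operator names (source, grep, first_field, sort_sink) are hypothetical and mirror the Unix pipeline above rather than any particular dataflow product's API.

```python
import queue
import threading

_DONE = object()  # sentinel that marks end-of-stream on a queue

def source(lines, out_q):
    """Operator: emit each input line onto the outgoing edge."""
    for line in lines:
        out_q.put(line)
    out_q.put(_DONE)

def grep(pattern, in_q, out_q):
    """Operator: forward only lines containing `pattern` (like grep)."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        if pattern in item:
            out_q.put(item)

def first_field(in_q, out_q):
    """Operator: forward the first whitespace-separated field (like awk '{print $1}')."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        fields = item.split()
        if fields:
            out_q.put(fields[0])

def sort_sink(in_q, result):
    """Operator: collect everything, then sort (sort must see all input first)."""
    items = []
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        items.append(item)
    result.extend(sorted(items))

def run_pipeline(lines, pattern):
    # Bounded queues give backpressure: a fast producer blocks when its edge is full.
    q1, q2, q3 = (queue.Queue(maxsize=16) for _ in range(3))
    result = []
    operators = [
        threading.Thread(target=source, args=(lines, q1)),
        threading.Thread(target=grep, args=(pattern, q1, q2)),
        threading.Thread(target=first_field, args=(q2, q3)),
        threading.Thread(target=sort_sink, args=(q3, result)),
    ]
    for t in operators:  # all operators run concurrently
        t.start()
    for t in operators:
        t.join()
    return result
```

For example, run_pipeline(["carol ERROR disk", "alice INFO ok", "bob ERROR net"], "ERROR") returns ["bob", "carol"]. Because every operator runs as its own thread and communicates only through queues, the stages overlap in time, which is the pipeline parallelism the dataflow model exploits.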
Where it’s been applied
Bioinformatics
  Next Generation Sequencing
  Nearly 1 TCUP throughput using Smith-Waterman
  Scalable BFAST implementation
Telecom
  Analyzing Call Data Records (network logs)
  Operational intelligence
  Fraud and waste detection
Public Sector
  State income tax revenue recovery
  Cyber security
Financial Services
  Mortgage analysis
Healthcare
  Claims processing and analysis
  Fraud detection
Network
  Analyzing network log data
  Cyber security