Big Data and Dataflow: Made for each other

Lightning talk given at the Big Data Camp in Santa Clara on June 28th, 2011.

Published in: Technology
  • Ray Newmark: I’d like to thank everyone for joining us today to learn a little more about how they can achieve the performance they require. My name is Ray Newmark and I’m the Vice President of Sales and Marketing for Pervasive Software’s DataRush business, and I’ll be your host. But don’t worry, I’m going to get out of the way very shortly and let our Chief Technologist Jim Falgout have the floor. A few housekeeping notes: This webinar is being recorded, and will be available on our website for viewing. At any time during the webinar, you may enter questions in the Q&A window. We will address them at the end of the presentation. If we can’t get to your question during the time allotted, we will respond to you by email. We will have a few survey questions for you as we go through the presentation. You’ll have the opportunity to enter your answers and see the polling results.

    1. Big Data and Dataflow: Made for each other
    2. The Need for Other Compute Models
       “… in addition, these data stores often expose a proprietary interface for application programming (e.g. PL/SQL or TSQL), but not the full power of procedural programming. More programmer-friendly parallel dataflow languages await discovery, I think. MapReduce is one (small) step in that direction.”
       Jeff Hammerbacher, Engineer-to-Engineer Lectures, June 2010
    3. Support for Other Programming Paradigms
       “MapReduceNextGen provides a completely generic computation framework to support MapReduce and other paradigms.”
       Arun C Murthy, The Next Generation of Apache Hadoop MapReduce, February 2011
    4. What is dataflow
       • Based on operators that provide a specific function (nodes)
       • Data queues (edges) connecting operators
       • High productivity
       • Message-passing architecture
       • Natural fit for Big Data
       Diagram labels: find, grep, awk, sort
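The operators-and-queues model on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not DataRush code: the names `run_operator`, `run_pipeline`, `grep`, `first_field`, and `sort` are hypothetical, and the operators loosely mimic the Unix tools named in the diagram. Each operator (node) runs in its own thread and communicates with its neighbors only through bounded queues (edges).

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of a stream


def run_operator(func, inbox, outbox):
    """Start one operator (node): read from inbox, write results to outbox."""
    def records():
        # Yield records from the input queue until the sentinel arrives.
        while True:
            item = inbox.get()
            if item is _DONE:
                return
            yield item

    def work():
        for result in func(records()):
            outbox.put(result)
        outbox.put(_DONE)  # tell the next operator the stream has ended

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def grep(pattern):
    """Operator factory: keep only records that contain the pattern."""
    return lambda records: (r for r in records if pattern in r)


def first_field(records):
    """Operator: emit the first whitespace-separated field of each record."""
    return (r.split()[0] for r in records)


def sort(records):
    """Operator: consume the whole stream, then emit it in sorted order."""
    return iter(sorted(records))


def run_pipeline(source, operators, queue_size=4):
    """Connect operators with bounded queues (edges) and collect the output."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(operators) + 1)]
    threads = [
        run_operator(op, queues[i], queues[i + 1])
        for i, op in enumerate(operators)
    ]

    def feed():
        for record in source:
            queues[0].put(record)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()

    output = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        output.append(item)

    feeder.join()
    for thread in threads:
        thread.join()
    return output


if __name__ == "__main__":
    log_lines = ["b.log error disk", "a.log ok", "c.log error net"]
    print(run_pipeline(log_lines, [grep("error"), first_field, sort]))
```

Because the queues are bounded, a fast upstream operator blocks until its downstream neighbor catches up, so back-pressure comes from the message-passing structure itself rather than from explicit locking in the operators.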
    5. Where it’s been applied
       • Bioinformatics
         • Next Generation Sequencing
         • Nearly 1 TCUP throughput using Smith-Waterman
         • Scalable BFAST implementation
       • Telecom
         • Analyzing Call Data Records (network logs)
         • Operational intelligence
         • Fraud and waste detection
       • Public Sector
         • State income tax revenue recovery
         • Cyber security
       • Financial Services
         • Mortgage analysis
       • Healthcare
         • Claims processing and analysis
         • Fraud detection
       • Network
         • Analyzing network log data
         • Cyber security
    6. Lends itself to graphical programming
    7. Coming Soon … Community Edition
       • Write mapper/reducer using dataflow constructs
       • Simple and efficient
       • Handles details of formats, data types, record parsing, serialize/deserialize, partition/sort
       • FREE!
       Diagram labels: Hadoop Distributed File System; Mapper (x4), each with DataRush; Reducer (x2), each with DataRush
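To make concrete what "handles details of ... record parsing, ... partition/sort" spares the author of a mapper or reducer, here is a minimal single-process sketch of the bookkeeping around a map/reduce job. It is generic Python written for illustration, not the DataRush or Hadoop API; `mapper`, `reducer`, `partition_for`, and `run_job` are hypothetical names, and the byte-sum partitioner is an assumption chosen only for determinism.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """User logic: parse one record and emit (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key, values):
    """User logic: combine all values that share a key."""
    return key, sum(values)


def partition_for(key, num_partitions):
    """Deterministic stand-in partitioner (an assumption for this sketch)."""
    return sum(key.encode("utf-8")) % num_partitions


def run_job(lines, num_partitions=2):
    """Framework-side plumbing: partition, sort, group, then reduce."""
    partitions = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            partitions[partition_for(key, num_partitions)].append((key, value))

    results = []
    for index in sorted(partitions):
        # Sort within each partition so equal keys are adjacent for groupby.
        pairs = sorted(partitions[index], key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            results.append(reducer(key, (value for _, value in group)))
    return sorted(results)


if __name__ == "__main__":
    print(run_job(["big data", "Big dataflow data"]))
```

Only `mapper` and `reducer` carry application logic here; everything in `run_job` is the kind of plumbing the slide says the framework takes over.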
    8. Integration with Hive
       • Integrates with Hive
       • Distributed DataRush for query execution
       • Increases execution efficiency, lowers latency
       • Looking for early adopters
    9. Come to Austin!
       We are hiring!