Big Data and Dataflow: Made for each other

Lightning talk given at the Big Data Camp in Santa Clara on June 28th, 2011.

Published in: Technology
  • Ray Newmark: I’d like to thank everyone for joining us today to learn a little more about how they can achieve the performance they require. My name is Ray Newmark and I’m the Vice President of Sales and Marketing for Pervasive Software’s DataRush business, and I’ll be your host. But don’t worry, I’m going to get out of the way very shortly and let our Chief Technologist Jim Falgout have the floor. A few housekeeping notes: This webinar is being recorded, and will be available on our website for viewing. At any time during the webinar, you may enter questions in the Q&A window. We will address them at the end of the presentation. If we can’t get to your question during the time allotted, we will respond to you by email. We will have a few survey questions for you as we go through the presentation. You’ll have the opportunity to enter your answers, and see the polling results.
  • Transcript

    • 1. Big Data and Dataflow: Made for each other
    • 2. The Need for Other Compute Models
      “… in addition, these data stores often expose a proprietary interface for application programming (e.g. PL/SQL or TSQL), but not the full power of procedural programming.  More programmer-friendly parallel dataflow languages await discovery, I think.  MapReduce is one (small) step in that direction.”
      Engineer-to-Engineer Lectures
      Jeff Hammerbacher
      June 2010
    • 3. Support for Other Programming Paradigms
      “MapReduceNextGen provides a completely generic computation framework to support MapReduce and other paradigms.”
      The Next Generation of Apache Hadoop MapReduce
      Arun C Murthy
      February 2011
    • 4. What is dataflow
      Based on operators that provide a specific function (nodes)
      Data queues (edges) connecting operators
      High Productivity
      Message Passing Architecture
      Natural Fit for Big Data
      [Diagram: the operators find, grep, awk, and sort connected as a pipeline]
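
The operator-and-queue model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataRush code: `run_pipeline`, `operator`, and the three stage functions are hypothetical names, each operator runs in its own thread, and bounded `queue.Queue` objects play the role of the edges. The stages loosely mimic the `grep`, `awk`, and `sort` steps from the slide diagram. Error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream on an edge


def source(items, out_q):
    """Source node: push every input item onto the first edge."""
    for item in items:
        out_q.put(item)
    out_q.put(_DONE)


def operator(fn, in_q, out_q):
    """Generic node: feed the incoming stream to fn and emit its results."""
    def stream():
        while True:
            item = in_q.get()
            if item is _DONE:
                return
            yield item

    for result in fn(stream()):
        out_q.put(result)
    out_q.put(_DONE)


def run_pipeline(items, stages, queue_size=4):
    """Wire source -> stage 1 -> ... -> stage n with bounded queues as edges."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=source, args=(items, queues[0]))]
    for i, fn in enumerate(stages):
        threads.append(
            threading.Thread(target=operator, args=(fn, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    # The calling thread acts as the sink and drains the last edge.
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stages, loosely modeled on grep, awk, and sort.
def grep_error(lines):
    return (line for line in lines if "ERROR" in line)


def first_field(lines):
    return (line.split()[0] for line in lines)


def sort_all(lines):
    return iter(sorted(lines))


logs = ["b.host ERROR disk", "a.host INFO ok", "a.host ERROR net", "c.host ERROR cpu"]
hosts = run_pipeline(logs, [grep_error, first_field, sort_all])
```

Because the queues are bounded, a fast producer blocks until its consumer catches up, so the filter and field-extraction stages run concurrently without buffering the whole input. The sort stage is the exception: it has to read its entire input before emitting anything.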
    • 5. Where it’s been applied
      Bioinformatics
        Next Generation Sequencing
        Nearly 1 TCUP throughput using Smith-Waterman
        Scalable BFAST implementation
      Telecom
        Analyzing Call Data Records (network logs)
        Operational intelligence
        Fraud and waste detection
      Public Sector
        State income tax revenue recovery
        Cyber security
      Financial Services
        Mortgage analysis
      Healthcare
        Claims processing and analysis
        Fraud detection
      Network
        Analyzing network log data
        Cyber security
    • 6. Lends itself to graphical programming
    • 7. Coming Soon … Community Edition
      Write mapper/reducer using dataflow constructs
      Simple and efficient
      Handles details of formats, data types, record parsing, serialize/deserialize, partition/sort
      FREE!
      [Diagram: the Hadoop Distributed File System feeding four Mappers and two Reducers, each running DataRush]
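
To make the mapper, partition/sort, and reducer steps named on this slide concrete, here is a minimal word-count sketch in plain Python. It does not use the DataRush or Hadoop APIs. Every function name below is hypothetical, and the CRC32-based partitioner is an assumption chosen only to keep the example deterministic.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input split."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def partition_and_sort(pairs, num_reducers):
    """Route each key to one reducer, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        # Assumed partitioner: CRC32 of the key modulo the reducer count.
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index].append((key, value))
    return [sorted(partitions[i], key=itemgetter(0)) for i in range(num_reducers)]


def reducer(sorted_pairs):
    """Reduce step: sum the values of each run of identical keys."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))


def word_count(splits, num_reducers=2):
    """Chain mapper -> partition/sort -> reducer over all input splits."""
    mapped = (pair for split in splits for pair in mapper(split))
    counts = {}
    for partition in partition_and_sort(mapped, num_reducers):
        counts.update(reducer(partition))
    return counts


splits = [["big data", "data flow"], ["Big Data"]]
counts = word_count(splits)
```

Each step is a function from one stream of records to another, which is why the same logic can be drawn as a dataflow graph. In a real cluster the splits would be read from HDFS and each mapper and reducer would run as a separate task.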
    • 8. Integration with Hive
      Integrates with Hive
      Distributed DataRush for query execution
      Increases execution efficiency, lowers latency
      Looking for early adopters
    • 9. Come to Austin!
      We are hiring!