Aha - Hadoop - Not just for counting words

A lightning talk highlighting the difference in power between the Hadoop Streaming and Java APIs. In short, the Streaming API can handle simpler operations in a single pass, but many more complex operations require you to combine several map-reduce jobs into a data flow, which the Java API lets you orchestrate. This multi-job paradigm lets you tackle large problems, such as traversing large graphs.

Originally delivered at the 2009 GoRuCo conference in New York.
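
The multi-job idea in the description can be sketched without Hadoop at all. The following is a minimal in-memory illustration, not the talk's code: `run_job`, `bfs_map`, `bfs_reduce`, and `bfs_distances` are hypothetical names, and the "cluster" is a plain Python dictionary. It shows why graph traversal needs an orchestrator: each pass only pushes distances one hop further, so the driver has to rerun the same job until the output stops changing.

```python
from collections import defaultdict

INF = float("inf")


def run_job(records, mapper, reducer):
    """Simulate one map-reduce pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return [reducer(key, values) for key, values in groups.items()]


def bfs_map(node, state):
    """Re-emit the node itself, and offer distance + 1 to each neighbour."""
    distance, neighbours = state
    yield node, (distance, neighbours)
    if distance != INF:
        for neighbour in neighbours:
            # None marks "distance offer only, no adjacency list attached".
            yield neighbour, (distance + 1, None)


def bfs_reduce(node, values):
    """Keep the shortest distance seen and the node's adjacency list."""
    best = min(distance for distance, _ in values)
    neighbours = next((adj for _, adj in values if adj is not None), [])
    return node, (best, neighbours)


def bfs_distances(graph, source):
    """Hypothetical driver: rerun the same job until a pass changes nothing."""
    records = [(n, (0 if n == source else INF, adj)) for n, adj in graph.items()]
    passes = 0
    while True:
        updated = run_job(records, bfs_map, bfs_reduce)
        passes += 1
        if dict(updated) == dict(records):
            return {node: dist for node, (dist, _) in updated}, passes
        records = updated
```

For example, `bfs_distances({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, "a")` needs three passes: two to propagate distances and one to confirm nothing changed. On a real cluster each pass would be a separate job submitted by the driver program.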

Published in: Technology, Education

  1. AHA - DOOP (not just for counting words). Ben.Woosley@gmail.com
  2. STREAMING API: any language, including Ruby. Only one pass.
  3. JAVA API: Java, or a DSL (Pig, Cascading). Gives you “Main,” the orchestrator: chaining, redirecting, recursing.
  4. Pig Latin operators: COGROUP, CROSS, DISTINCT, DUMP, FILTER, FOREACH, GROUP, JOIN, LIMIT, LOAD, ORDER, SPLIT, STORE, STREAM, UNION
  5. Sort: Terabyte in 62 seconds; Petabyte in 16.25 hours
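
To make the "only one pass" point from slide 2 concrete, here is a rough sketch of what a streaming-style job looks like. In Hadoop Streaming the mapper and reducer are separate programs that read lines on standard input and print tab-separated key/value lines, with the framework sorting by key in between. The sketch below imitates that contract with plain functions over lists of lines; `mapper`, `reducer`, and `single_pass` are illustrative names, and `sorted` stands in for the framework's shuffle.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; relies on input arriving sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def single_pass(lines):
    """Stand-in for the framework: map, sort by key, reduce, exactly once."""
    return list(reducer(sorted(mapper(lines))))
```

There is no driver loop here, which is the limitation the talk points at: counting words fits in one map and one reduce, whereas anything that must feed its own output back in needs an orchestrator around the job.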