Hadoop summit 2010 frameworks panel elephant bird

Published in Technology

Transcript

  • 1. Hadoop Frameworks
    • Kevin Weil @kevinweil
    Twitter
  • 2.
    • A framework for working with structured data within the Hadoop ecosystem
    Elephant Bird
  • 3.
    • A framework for working with structured data within the Hadoop ecosystem
      • Protocol Buffers
      • Thrift
      • JSON
      • W3C Logs
    Elephant Bird
  • 4.
    • A framework for working with structured data within the Hadoop ecosystem
      • InputFormats
      • OutputFormats
      • Hadoop Writables
      • Pig LoadFuncs
      • Pig StoreFuncs
      • HBase LoadFuncs
    Elephant Bird
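
To make the InputFormat/OutputFormat pairing on this slide concrete, here is a minimal Python sketch of a writer and a matching record reader. The framing (a 4-byte big-endian length before each payload) and the names `write_records` and `read_records` are assumptions made for illustration; they are not Elephant Bird's on-disk format or API. The reader yields (byte offset, payload) pairs, loosely mirroring the key/value pairs a Hadoop record reader hands to a mapper.

```python
import io
import struct

# Hypothetical framing, chosen for illustration only: each record is a
# 4-byte big-endian length followed by that many payload bytes.
_LENGTH = struct.Struct(">I")


def write_records(stream, records):
    """OutputFormat-like side: write each payload after its length prefix."""
    for payload in records:
        stream.write(_LENGTH.pack(len(payload)))
        stream.write(payload)


def read_records(stream):
    """InputFormat-like side: yield (byte_offset, payload) pairs."""
    while True:
        offset = stream.tell()
        header = stream.read(_LENGTH.size)
        if not header:
            return  # clean end of stream
        if len(header) < _LENGTH.size:
            raise ValueError(f"truncated length prefix at offset {offset}")
        (length,) = _LENGTH.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError(f"truncated record at offset {offset}")
        yield offset, payload


# Round trip through an in-memory buffer.
buffer = io.BytesIO()
write_records(buffer, [b"alpha", b"", b"gamma-ray"])
buffer.seek(0)
pairs = list(read_records(buffer))
```

In a real job the payload would be a serialized Protocol Buffer or Thrift message rather than raw bytes, and the reader would also have to cope with split boundaries, which this sketch ignores.
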
  • 5.
    • A framework for working with structured data within the Hadoop ecosystem… plus:
      • LZO Compression
      • Code Generation
      • Hadoop Counter Utilities
      • Misc Pig UDFs
    Elephant Bird
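
The counter utilities mentioned above can be pictured with a small Python sketch: a JSON-lines parser that tallies what it reads and what it skips. Hadoop identifies a counter by a group name and a counter name, so the sketch keys a `collections.Counter` by that pair. The function `parse_json_lines`, the group name `JsonLoader`, and the counter names are hypothetical and are not taken from Elephant Bird.

```python
import json
from collections import Counter


def parse_json_lines(lines, counters, group="JsonLoader"):
    """Yield one dict per well-formed JSON object line, counting outcomes."""
    for line in lines:
        line = line.strip()
        if not line:
            counters[(group, "EmptyLines")] += 1
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip bad input instead of failing the whole task.
            counters[(group, "MalformedLines")] += 1
            continue
        if not isinstance(record, dict):
            counters[(group, "NonObjectLines")] += 1
            continue
        counters[(group, "RecordsRead")] += 1
        yield record


counters = Counter()
lines = ['{"user": "a", "n": 1}', "", "{not json", "[1, 2]", '{"user": "b"}']
records = list(parse_json_lines(lines, counters))
```

Counting skipped lines rather than raising keeps one bad record from killing a long job, while still leaving a visible trace of how much input was dropped.
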
  • 6.
    • You should only need to specify the data schema
    Why?
  • 7.
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    Why?
  • 8.
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    • Everything else can be codegen’d.
    Why?
  • 9.
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    • Everything else can be codegen’d.
    • Less Code. Efficient Storage. Focus on the Data.
    Why?
  • 10.
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    • Everything else can be codegen’d.
    • Less Code. Efficient Storage. Focus on the Data.
    • Underlies 20,000 Hadoop jobs at Twitter every day.
    Why?
  • 11.
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    • Everything else can be codegen’d.
    • Less Code. Efficient Storage. Focus on the Data.
    • Underlies 20,000 Hadoop jobs at Twitter every day.
    • http://github.com/kevinweil/elephant-bird : contributors welcome!
    Why?
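
The "specify the schema, codegen everything else" idea can be sketched in a few lines of Python. The schema notation below (a list of field name and type name pairs), the `Status` record, and `generate_record_class` are all hypothetical; Elephant Bird itself generates Java classes from Protocol Buffer and Thrift definitions, which this toy does not attempt to reproduce. It only shows the shape of the approach: the schema is the single hand-written input, and the conversion boilerplate is emitted from it.

```python
from keyword import iskeyword

# Hypothetical schema notation for illustration: (field name, Python type name).
STATUS_SCHEMA = [("user_id", "int"), ("text", "str"), ("retweets", "int")]


def generate_record_class(class_name, schema):
    """Return Python source for a record class with from_dict and to_dict."""
    if not schema:
        raise ValueError("schema must contain at least one field")
    for name, _ in schema:
        if not name.isidentifier() or iskeyword(name):
            raise ValueError(f"invalid field name: {name!r}")
    args = ", ".join(f"{name}: {type_name}" for name, type_name in schema)
    lines = [f"class {class_name}:", f"    def __init__(self, {args}):"]
    lines += [f"        self.{name} = {name}" for name, _ in schema]
    lines += [
        "",
        "    @classmethod",
        "    def from_dict(cls, data):",
        "        return cls(" + ", ".join(f"{t}(data[{n!r}])" for n, t in schema) + ")",
        "",
        "    def to_dict(self):",
        "        return {" + ", ".join(f"{n!r}: self.{n}" for n, _ in schema) + "}",
    ]
    return "\n".join(lines) + "\n"


source = generate_record_class("Status", STATUS_SCHEMA)
namespace = {}
exec(source, namespace)  # compile the generated source into a usable class
Status = namespace["Status"]
status = Status.from_dict({"user_id": "42", "text": "hello", "retweets": 3})
```

Adding a field means editing one line of the schema and regenerating; the constructor and both conversion methods follow automatically, which is the "less code, focus on the data" point the slide makes.
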