Hadoop Summit 2010 Frameworks Panel: Elephant Bird

Transcript

  • 1. Hadoop Frameworks
    • Kevin Weil @kevinweil
    Twitter
  • 2. Elephant Bird
    • A framework for working with structured data within the Hadoop ecosystem
  • 3. Elephant Bird
    • A framework for working with structured data within the Hadoop ecosystem
      • Protocol Buffers
      • Thrift
      • JSON
      • W3C Logs
  • 4. Elephant Bird
    • A framework for working with structured data within the Hadoop ecosystem
      • InputFormats
      • OutputFormats
      • Hadoop Writables
      • Pig LoadFuncs
      • Pig StoreFuncs
      • HBase LoadFuncs
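The InputFormats and LoadFuncs listed on this slide share one job: turning raw bytes into records a job can consume. As a rough illustration only, the Python sketch below mimics what a line-oriented reader for JSON data might do. The function name `read_json_records` and the sample data are hypothetical; Elephant Bird's real classes are Java and are not reproduced here.

```python
import io
import json
from typing import BinaryIO, Dict, Iterator, Tuple


def read_json_records(stream: BinaryIO) -> Iterator[Tuple[int, Dict]]:
    """Yield (byte_offset, record) pairs, one per line of JSON.

    The key/value shape loosely follows a line-oriented Hadoop record
    reader: the key is the byte offset where the line starts, the value
    is the parsed record. Blank lines, lines that fail to parse, and
    lines that are not JSON objects are skipped.
    """
    offset = 0
    for raw in stream:
        line_offset = offset
        offset += len(raw)  # count the newline too, so offsets stay exact
        text = raw.strip()
        if not text:
            continue
        try:
            record = json.loads(text)
        except ValueError:
            continue  # malformed line: skip it instead of failing the job
        if isinstance(record, dict):
            yield line_offset, record


# Hypothetical sample input: two valid records around one malformed line.
data = b'{"user": "kevinweil", "retweets": 3}\nnot json\n{"user": "hadoop"}\n'
records = list(read_json_records(io.BytesIO(data)))
for key, value in records:
    print(key, value)
```

Skipping bad lines rather than raising is a design assumption made for this sketch; a production reader would more likely also count them so data-quality problems stay visible.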
  • 5. Elephant Bird
    • A framework for working with structured data within the Hadoop ecosystem… plus:
      • LZO Compression
      • Code Generation
      • Hadoop Counter Utilities
      • Misc Pig UDFs
  • 11. Why?
    • You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
    • Everything else can be codegen’d.
    • Less Code. Efficient Storage. Focus on the Data.
    • Underlies 20,000 Hadoop jobs at Twitter every day.
    • http://github.com/kevinweil/elephant-bird : contributors welcome!
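To make the "specify only the schema, generate the rest" idea concrete, here is a minimal Python sketch. It assumes a hypothetical schema format (`STATUS_SCHEMA`) and a hypothetical helper (`generate_record_class`); it is not Elephant Bird's generator. Protocol Buffers and Thrift achieve compatibility through numbered field tags, whereas this toy version matches fields by name, filling defaults for missing fields and ignoring unknown ones.

```python
from dataclasses import asdict, field, fields, make_dataclass

# Hypothetical schema: field name -> (type, default value).
STATUS_SCHEMA = {
    "id": (int, 0),
    "user": (str, ""),
    "text": (str, ""),
}


def generate_record_class(name, schema):
    """Build a record class, including its parser, from the schema alone."""
    specs = [
        (fname, ftype, field(default=default))
        for fname, (ftype, default) in schema.items()
    ]
    cls = make_dataclass(name, specs)

    def from_dict(raw):
        # Unknown fields (written by a newer schema) are dropped;
        # missing fields (written by an older schema) take their defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    cls.from_dict = staticmethod(from_dict)
    return cls


Status = generate_record_class("Status", STATUS_SCHEMA)

# A record written before "text" existed still loads.
old = Status.from_dict({"id": 1, "user": "kevinweil"})
# A record carrying a field this schema does not know still loads.
new = Status.from_dict({"id": 2, "user": "hadoop", "text": "hi", "lang": "en"})
print(old)
print(asdict(new))
```

The point of the sketch is that nothing beyond `STATUS_SCHEMA` is written by hand; the class and its parser are derived from it, which is the property the slide attributes to generated InputFormats, Writables, and Pig functions.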