Hadoop Summit 2010 Frameworks Panel: Elephant Bird

    1. Hadoop Frameworks
       • Kevin Weil (@kevinweil), Twitter
    2. Elephant Bird
       • A framework for working with structured data within the Hadoop ecosystem
    3. Elephant Bird
       • A framework for working with structured data within the Hadoop ecosystem
          • Protocol Buffers
          • Thrift
          • JSON
          • W3C Logs
    4. Elephant Bird
       • A framework for working with structured data within the Hadoop ecosystem
          • InputFormats
          • OutputFormats
          • Hadoop Writables
          • Pig LoadFuncs
          • Pig StoreFuncs
          • HBase LoadFuncs
    5. Elephant Bird
       • A framework for working with structured data within the Hadoop ecosystem… plus:
          • LZO Compression
          • Code Generation
          • Hadoop Counter Utilities
          • Misc. Pig UDFs
    6. Why?
       • You should only need to specify the (flexible, forward- and backward-compatible, self-documenting) data schema.
       • Everything else can be codegen’d.
       • Less Code. Efficient Storage. Focus on the Data.
       • Underlies 20,000 Hadoop jobs at Twitter every day.
       • Contributors welcome: http://github.com/kevinweil/elephant-bird
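Slide 3 lists W3C Logs among the supported formats. The sketch below shows, in plain Java with no Hadoop or Elephant Bird dependency, how a line in the W3C Extended Log File Format can become a field-to-value record: the `#Fields:` directive names the columns, and each data line supplies values in that order. `W3cLogParser` is a hypothetical name, not Elephant Bird's API, and the sketch assumes whitespace-separated values with no quoted strings containing spaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal W3C extended log parser sketch (hypothetical, not Elephant Bird's API).
 * Assumes whitespace-separated values; quoted values with spaces are NOT handled.
 * "-" is treated as a missing value.
 */
final class W3cLogParser {
    // Column names from the most recent #Fields: directive.
    private List<String> fields = Collections.emptyList();

    /** Returns a field-name -> value map for a data line, or null for directives and blank lines. */
    Map<String, String> parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        if (trimmed.startsWith("#")) {
            // Directive line; only #Fields: changes how later data lines are read.
            if (trimmed.startsWith("#Fields:")) {
                String spec = trimmed.substring("#Fields:".length()).trim();
                fields = List.of(spec.split("\\s+"));
            }
            return null;
        }
        String[] values = trimmed.split("\\s+");
        if (values.length != fields.size()) {
            throw new IllegalArgumentException(
                "expected " + fields.size() + " values but got " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            // "-" marks a field with no value in the W3C format.
            record.put(fields.get(i), "-".equals(values[i]) ? null : values[i]);
        }
        return record;
    }

    /** Parses a whole log, skipping directive and blank lines. */
    List<Map<String, String>> parseAll(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : lines) {
            Map<String, String> r = parseLine(line);
            if (r != null) {
                records.add(r);
            }
        }
        return records;
    }
}
```

Feeding it `#Fields: date time cs-method cs-uri` followed by `2010-06-29 10:00:00 GET /index.html` yields one record mapping `cs-method` to `GET`; directive lines such as `#Version: 1.0` produce no record.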
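Slide 4 lists Hadoop Writables. Hadoop's `org.apache.hadoop.io.Writable` interface has two methods, `write(DataOutput)` and `readFields(DataInput)`. Here is a minimal sketch of how one generic wrapper could serve any schema-defined message type, assuming a per-type codec supplies the bytes. `WritableLike` is a local stand-in with Writable's two methods so the example compiles without Hadoop; `Codec`, `MessageWritable` and `Utf8Codec` are hypothetical names, not Elephant Bird classes. Each record is written with a length prefix so the reader knows where it ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Local stand-in with the same two methods as org.apache.hadoop.io.Writable. */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical codec: how a schema-defined message turns into bytes and back. */
interface Codec<T> {
    byte[] toBytes(T message);
    T fromBytes(byte[] bytes);
}

/** Generic wrapper: one class serves every message type that has a codec. */
final class MessageWritable<T> implements WritableLike {
    private final Codec<T> codec;
    private T message;

    MessageWritable(Codec<T> codec) { this.codec = codec; }

    void set(T message) { this.message = message; }
    T get() { return message; }

    @Override
    public void write(DataOutput out) throws IOException {
        byte[] bytes = codec.toBytes(message);
        out.writeInt(bytes.length); // length prefix marks where the record ends
        out.write(bytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        message = codec.fromBytes(bytes);
    }

    /** Writes a message and reads it back into a fresh wrapper. */
    static <M> M roundTrip(Codec<M> codec, M message) {
        try {
            MessageWritable<M> src = new MessageWritable<>(codec);
            src.set(message);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(buf);
            src.write(dos);
            dos.flush();
            MessageWritable<M> dst = new MessageWritable<>(codec);
            dst.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return dst.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Toy codec for plain strings, standing in for a Protocol Buffers or Thrift codec. */
final class Utf8Codec implements Codec<String> {
    public byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
}
```

`MessageWritable.roundTrip(new Utf8Codec(), "hello")` writes the record and reads it back, returning `"hello"`. Supporting a new message type only requires a new codec, not a new wrapper class.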
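Slide 5 lists LZO compression. Plain LZO files are not splittable, so Hadoop's LZO support typically pairs each file with an index of the byte offsets where compressed blocks begin, letting each map task start at a block boundary. The sketch below illustrates only that block-plus-index idea. It uses `java.util.zip.Deflater` as a stand-in because LZO is not part of the JDK, and the block layout (two big-endian ints, then the compressed bytes) and the class name `BlockCompressor` are assumptions of this example, not the real LZO or Elephant Bird on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Splittable block compression sketch. Deflater stands in for LZO.
 * Assumed layout per block: [int rawLen][int compressedLen][compressed bytes].
 */
final class BlockCompressor {
    final byte[] data;      // the compressed "file"
    final List<Long> index; // byte offset in data where each block starts

    private BlockCompressor(byte[] data, List<Long> index) {
        this.data = data;
        this.index = Collections.unmodifiableList(index);
    }

    static BlockCompressor compress(byte[] input, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Long> index = new ArrayList<>();
        byte[] chunk = new byte[4096];
        for (int start = 0; start < input.length; start += blockSize) {
            int len = Math.min(blockSize, input.length - start);
            // Each block is compressed independently so it can be read on its own.
            Deflater deflater = new Deflater();
            deflater.setInput(input, start, len);
            deflater.finish();
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                block.write(chunk, 0, n);
            }
            deflater.end();
            byte[] compressed = block.toByteArray();
            index.add((long) out.size()); // record where this block begins
            byte[] header = ByteBuffer.allocate(8).putInt(len).putInt(compressed.length).array();
            out.write(header, 0, header.length);
            out.write(compressed, 0, compressed.length);
        }
        return new BlockCompressor(out.toByteArray(), index);
    }

    int blockCount() { return index.size(); }

    /** Decompresses only block i, seeking straight to its indexed offset. */
    byte[] readBlock(int i) {
        int offset = Math.toIntExact(index.get(i));
        ByteBuffer header = ByteBuffer.wrap(data, offset, 8);
        int rawLen = header.getInt();
        int compLen = header.getInt();
        Inflater inflater = new Inflater();
        inflater.setInput(data, offset + 8, compLen);
        byte[] raw = new byte[rawLen];
        try {
            int got = 0;
            while (got < rawLen) {
                int n = inflater.inflate(raw, got, rawLen - got);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    break;
                }
                got += n;
            }
            if (got != rawLen) {
                throw new IllegalStateException("truncated block " + i);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block " + i, e);
        } finally {
            inflater.end();
        }
        return raw;
    }
}
```

Compressing 10,000 bytes with a 4,096-byte block size produces three indexed blocks, and `readBlock(1)` returns bytes 4,096 to 8,191 without decompressing block 0.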
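The "Why?" slide argues that only the data schema should be written by hand and everything else generated. As a toy illustration of that idea, the generator below takes an ordered schema (field name to Java type) and emits the source of an immutable value class with a constructor and getters. `ValueClassGenerator` and the `Tweet` schema in the usage note are hypothetical; real schema compilers such as `protoc` and the Thrift compiler generate far more (serialization, versioning), and this is not Elephant Bird's code generator.

```java
import java.util.Map;

/**
 * Toy code generator (hypothetical): schema (field name -> Java type) in,
 * source text of an immutable value class out. Pass an insertion-ordered
 * map so fields and constructor parameters follow the schema order.
 */
final class ValueClassGenerator {
    static String generate(String className, Map<String, String> schema) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        // One private final field per schema entry.
        for (Map.Entry<String, String> f : schema.entrySet()) {
            src.append("    private final ").append(f.getValue()).append(' ')
               .append(f.getKey()).append(";\n");
        }
        // Constructor taking every field in schema order.
        src.append("\n    public ").append(className).append('(');
        String sep = "";
        for (Map.Entry<String, String> f : schema.entrySet()) {
            src.append(sep).append(f.getValue()).append(' ').append(f.getKey());
            sep = ", ";
        }
        src.append(") {\n");
        for (String name : schema.keySet()) {
            src.append("        this.").append(name).append(" = ").append(name).append(";\n");
        }
        src.append("    }\n");
        // One getter per field.
        for (Map.Entry<String, String> f : schema.entrySet()) {
            src.append("\n    public ").append(f.getValue()).append(' ')
               .append(getterName(f.getKey())).append("() { return ")
               .append(f.getKey()).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    private static String getterName(String field) {
        return "get" + Character.toUpperCase(field.charAt(0)) + field.substring(1);
    }
}
```

For example, a `LinkedHashMap` holding `userId -> long` and `text -> String`, passed as `generate("Tweet", schema)`, yields a `Tweet` class with two fields, a two-argument constructor, and `getUserId()`/`getText()`. Adding a field to the schema regenerates every piece at once, which is the point the slide makes.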