Next Steps for Hadoop


      Doug Cutting
       Cloudera
Proviso
●   Linus Torvalds:
    ●   “Whatever they contribute.”
    ●   diverse set of contributors
    ●   central planni...
The Dream
●   faster, more reliable, available
    ●   of course
●   spreadsheet-like interfaces
    ●   provide non-progr...
Requirements
●   security
    ●   facilitate sharing of resources
●   stable cross-language APIs
    ●   facilitate divers...
Data Formats
●   today in Hadoop:
    ●   text
        –   pro: inter-operable
        –   con: not expressive, inefficien...
Protocol Buffers & Thrift
●   expressive
●   efficient (small & fast)
●   but not very dynamic
    ●   cannot browse arbit...
Avro Data
●   as expressive
●   smaller and faster
●   dynamic
    ●   schema stored with data
        –   but factored ou...
Avro Data
●   includes a file format
●   includes a textual encoding
●   handles versioning
    ●   if schema changes
    ...
Avro RPC
●   leverage versioning support
    ●   to permit different versions of services to
        interoperate
●   for ...
Avro Status
●   1.1 release out
    ●   added JSON and comparators
●   1.2 soon
    ●   adds HTTP & UDP-based RPC
●   will...
Avro Near Future
●   full mapreduce support
●   used for RPC in Hadoop 0.22 (1.0)?
Thanks!




What are your next steps?
Upcoming SlideShare
Loading in...5
×

Hw09 Next Steps For Hadoop

1,955

Published on

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,955
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
88
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Hw09 Next Steps For Hadoop

  1. 1. Next Steps for Hadoop Doug Cutting Cloudera
  2. 2. Proviso ● Linus Torvalds: ● “Whatever they contribute.” ● diverse set of contributors ● central planning impossible
  3. 3. The Dream ● faster, more reliable, available ● of course ● spreadsheet-like interfaces ● provide non-programmers ● with powerful, interactive tools ● easier sharing ● of data & hardware resources
  4. 4. Requirements ● security ● facilitate sharing of resources ● stable cross-language APIs ● facilitate diverse tools & apps ● expressive, inter-operable data ● facilitates sharing of datasets ● facilitates dynamic analyses
  5. 5. Data Formats ● today in Hadoop: ● text – pro: inter-operable – con: not expressive, inefficient ● - Java Writable – pro: expressive, efficient – con: platform-specific, fragile
  6. 6. Protocol Buffers & Thrift ● expressive ● efficient (small & fast) ● but not very dynamic ● cannot browse arbitrary data ● no DESCRIBE or SHOW ● viewing a new dataset – requires code generation & load ● writing a new dataset – requires generating schema text – plus code generation & load
  7. 7. Avro Data ● as expressive ● smaller and faster ● dynamic ● schema stored with data – but factored out of instances ● API permits reading & creating – arbitrary datatypes – without generating & loading code
  8. 8. Avro Data ● includes a file format ● includes a textual encoding ● handles versioning ● if schema changes ● can still process data ● Hadoop apps can ● upgrade from text ● and standardize on Avro for data
  9. 9. Avro RPC ● leverage versioning support ● to permit different versions of services to interoperate ● for Hadoop services, will ● provide cross-language access ● let apps talk to clusters running different versions
  10. 10. Avro Status ● 1.1 release out ● added JSON and comparators ● 1.2 soon ● adds HTTP & UDP-based RPC ● will first appear in Hadoop 0.21 ● as format for job history ● in sequence files
  11. 11. Avro Near Future ● full mapreduce support ● used for RPC in Hadoop 0.22 (1.0)?
  12. 12. Thanks! What are your next steps?
  1. ¿Le ha llamado la atención una diapositiva en particular?

    Recortar diapositivas es una manera útil de recopilar información importante para consultarla más tarde.

×