Hw09 Next Steps For Hadoop

Published in: Technology

  1. Next Steps for Hadoop
     Doug Cutting, Cloudera
  2. Proviso
     ● Linus Torvalds:
     ● “Whatever they contribute.”
     ● diverse set of contributors
     ● central planning impossible
  3. The Dream
     ● faster, more reliable, available
     ● of course
     ● spreadsheet-like interfaces
     ● provide non-programmers
     ● with powerful, interactive tools
     ● easier sharing
     ● of data & hardware resources
  4. Requirements
     ● security
     ● facilitate sharing of resources
     ● stable cross-language APIs
     ● facilitate diverse tools & apps
     ● expressive, inter-operable data
     ● facilitates sharing of datasets
     ● facilitates dynamic analyses
  5. Data Formats
     ● today in Hadoop:
     ● text
       – pro: inter-operable
       – con: not expressive, inefficient
     ● Java Writable
       – pro: expressive, efficient
       – con: platform-specific, fragile
  6. Protocol Buffers & Thrift
     ● expressive
     ● efficient (small & fast)
     ● but not very dynamic
     ● cannot browse arbitrary data
     ● no DESCRIBE or SHOW
     ● viewing a new dataset
       – requires code generation & load
     ● writing a new dataset
       – requires generating schema text
       – plus code generation & load
  7. Avro Data
     ● as expressive
     ● smaller and faster
     ● dynamic
     ● schema stored with data
       – but factored out of instances
     ● API permits reading & creating
       – arbitrary datatypes
       – without generating & loading code
  8. Avro Data
     ● includes a file format
     ● includes a textual encoding
     ● handles versioning
     ● if schema changes
     ● can still process data
     ● Hadoop apps can
     ● upgrade from text
     ● and standardize on Avro for data
  9. Avro RPC
     ● leverage versioning support
     ● to permit different versions of services to interoperate
     ● for Hadoop services, will
     ● provide cross-language access
     ● let apps talk to clusters running different versions
  10. Avro Status
      ● 1.1 release out
      ● added JSON and comparators
      ● 1.2 soon
      ● adds HTTP & UDP-based RPC
      ● will first appear in Hadoop 0.21
      ● as format for job history
      ● in sequence files
  11. Avro Near Future
      ● full mapreduce support
      ● used for RPC in Hadoop 0.22 (1.0)?
  12. Thanks! What are your next steps?
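
Slide 7 says Avro stores the schema with the data but factors it out of individual instances, so a reader can decode a dataset without generated code. The sketch below is one way to picture that idea using only the Python standard library. It is not the Avro API or Avro's binary encoding: the `PageView` record, the length-prefixed layout, and the helper names `write_container` and `read_container` are all assumptions made for illustration. Only the JSON shape of the schema follows Avro's record declaration style.

```python
import io
import json
import struct

# Schema written in Avro's JSON record style. The PageView record and its
# fields are hypothetical; they do not come from the slides.
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "hits", "type": "long"},
    ],
}


def _write_block(out, payload):
    """Write a 4-byte big-endian length followed by the payload bytes."""
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


def _read_block(inp):
    """Read one length-prefixed block written by _write_block."""
    (length,) = struct.unpack(">I", inp.read(4))
    return inp.read(length)


def write_container(out, schema, records):
    """Write the schema once, then each record as bare field values."""
    _write_block(out, json.dumps(schema).encode("utf-8"))
    out.write(struct.pack(">I", len(records)))
    for record in records:
        # Field names are never repeated per record: order comes from the schema.
        for field in schema["fields"]:
            value = record[field["name"]]
            if field["type"] == "string":
                _write_block(out, value.encode("utf-8"))
            elif field["type"] == "long":
                out.write(struct.pack(">q", value))
            else:
                raise ValueError("unsupported type: " + field["type"])


def read_container(inp):
    """Decode records using only the schema found in the stream itself."""
    schema = json.loads(_read_block(inp).decode("utf-8"))
    (count,) = struct.unpack(">I", inp.read(4))
    records = []
    for _ in range(count):
        record = {}
        for field in schema["fields"]:
            if field["type"] == "string":
                record[field["name"]] = _read_block(inp).decode("utf-8")
            elif field["type"] == "long":
                record[field["name"]] = struct.unpack(">q", inp.read(8))[0]
            else:
                raise ValueError("unsupported type: " + field["type"])
        records.append(record)
    return schema, records


rows = [
    {"url": "http://example.com/a", "hits": 3},
    {"url": "http://example.com/b", "hits": 7},
]
buffer = io.BytesIO()
write_container(buffer, SCHEMA, rows)
buffer.seek(0)
stored_schema, decoded = read_container(buffer)
```

The reader here never sees `SCHEMA` directly. It recovers the field names and types from the stream header, which is the property the slide contrasts with the code-generation step of Protocol Buffers and Thrift.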

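Slide 8 claims that data can still be processed after the schema changes. A minimal sketch of how that can work, assuming records are resolved field by name between the schema they were written with and the schema the reader expects: fields the reader no longer declares are dropped, and fields the writer never had are filled from the reader's declared default. The `resolve` function and both `PageView` schema versions are hypothetical, and the records are plain dictionaries rather than Avro-encoded data.

```python
# Two versions of a hypothetical PageView schema in Avro's JSON record style.
WRITER_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "hits", "type": "long"},
        {"name": "referrer", "type": "string"},
    ],
}

READER_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "hits", "type": "long"},
        # Added in the newer version; old data has no value for it.
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    written = {field["name"] for field in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            # Present on both sides: carry the written value across.
            resolved[name] = record[name]
        elif "default" in field:
            # Unknown to the writer: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError("no value or default for field " + repr(name))
    # Writer-only fields (here: referrer) are silently ignored.
    return resolved


old_record = {"url": "http://example.com/a", "hits": 3, "referrer": "search"}
new_view = resolve(old_record, WRITER_SCHEMA, READER_SCHEMA)
```

With these inputs `new_view` keeps `url` and `hits`, drops `referrer`, and gains `country` set to `"unknown"`. The same matching idea is what slide 9 relies on when it says different versions of a service can interoperate.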