Hw09: Next Steps for Hadoop

Presentation Transcript

  • Next Steps for Hadoop (Doug Cutting, Cloudera)
  • Proviso
    ● Linus Torvalds: “Whatever they contribute.”
    ● diverse set of contributors
    ● central planning is impossible
  • The Dream
    ● faster, more reliable, more available (of course)
    ● spreadsheet-like interfaces: provide non-programmers with powerful, interactive tools
    ● easier sharing of data & hardware resources
  • Requirements
    ● security: facilitates sharing of resources
    ● stable cross-language APIs: facilitate diverse tools & apps
    ● expressive, interoperable data: facilitates sharing of datasets and dynamic analyses
  • Data Formats
    ● today in Hadoop:
      – text (pro: interoperable; con: not expressive, inefficient)
      – Java Writable (pro: expressive, efficient; con: platform-specific, fragile); see the Writable sketch after the transcript
  • Protocol Buffers & Thrift
    ● expressive
    ● efficient (small & fast)
    ● but not very dynamic: cannot browse arbitrary data; no DESCRIBE or SHOW
    ● viewing a new dataset requires code generation & load
    ● writing a new dataset requires generating schema text, plus code generation & load
  • Avro Data
    ● as expressive
    ● smaller and faster
    ● dynamic
      – schema stored with data, but factored out of instances
      – API permits reading & creating arbitrary datatypes without generating & loading code (see the generic-API sketch after the transcript)
  • Avro Data
    ● includes a file format
    ● includes a textual encoding
    ● handles versioning: if the schema changes, existing data can still be processed (see the schema-evolution sketch after the transcript)
    ● Hadoop apps can upgrade from text and standardize on Avro for data
  • Avro RPC
    ● leverages versioning support to permit different versions of services to interoperate (see the protocol sketch after the transcript)
    ● for Hadoop services, will:
      – provide cross-language access
      – let apps talk to clusters running different versions
  • Avro Status
    ● 1.1 release is out: added JSON and comparators
    ● 1.2 coming soon: adds HTTP- & UDP-based RPC
    ● will first appear in Hadoop 0.21, as the format for job history and in sequence files
  • Avro Near Future
    ● full MapReduce support
    ● used for RPC in Hadoop 0.22 (1.0)?
  • Thanks! What are your next steps?
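
Illustrative Code Sketches

The "Data Formats" slide contrasts text with Java Writables. As a hedged illustration of why a Writable is expressive and efficient yet platform-specific and fragile, here is a minimal hand-written Writable; the PageView class and its fields are invented for this sketch, while org.apache.hadoop.io.Writable is the real Hadoop interface.

    // A minimal hand-written Writable. The field layout lives only in this
    // Java code: non-Java clients cannot read it, and reordering the fields
    // silently breaks compatibility with previously written data.
    // (PageView and its fields are hypothetical.)
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class PageView implements Writable {
      private String url;
      private long timestamp;

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeUTF(url);        // field order is implicit in the code
        out.writeLong(timestamp);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        timestamp = in.readLong();
      }
    }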
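
The first "Avro Data" slide says the API permits reading and creating arbitrary datatypes without generating and loading code. A hedged sketch using Avro's Java generic API (as it exists in releases later than the 1.1/1.2 versions discussed above); the PageView schema, fields, and file name are invented for illustration.

    // Write and read an Avro data file with only the generic API and a
    // runtime schema -- no generated classes to compile or load.
    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class GenericAvroDemo {
      public static void main(String[] args) throws Exception {
        // The schema is plain JSON, parsed at runtime (hypothetical record).
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
            + "{\"name\":\"url\",\"type\":\"string\"},"
            + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

        File file = new File("pageviews.avro");

        // Write: the schema is stored once in the file header,
        // not repeated in each record instance.
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("url", "http://example.com");
        rec.put("timestamp", 1256000000000L);
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, file);
        writer.append(rec);
        writer.close();

        // Read: the reader discovers the schema from the file itself,
        // which is what makes browsing an arbitrary dataset possible.
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        System.out.println(reader.getSchema());
        for (GenericRecord r : reader) {
          System.out.println(r.get("url") + " @ " + r.get("timestamp"));
        }
        reader.close();
      }
    }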
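
The second "Avro Data" slide notes that versioning is handled: if the schema changes, existing data can still be processed. A hedged sketch of that schema resolution with the generic API; both schemas and the added referrer field (with a default) are invented for illustration.

    // Data written with an old schema is read with a newer one; the added
    // field is filled in from its default value. (Schemas are hypothetical.)
    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class EvolutionDemo {
      public static void main(String[] args) throws Exception {
        Schema oldSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
            + "{\"name\":\"url\",\"type\":\"string\"}]}");
        Schema newSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
            + "{\"name\":\"url\",\"type\":\"string\"},"
            + "{\"name\":\"referrer\",\"type\":\"string\",\"default\":\"unknown\"}]}");

        // Encode a record with the old schema.
        GenericRecord rec = new GenericData.Record(oldSchema);
        rec.put("url", "http://example.com");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(oldSchema).write(rec, enc);
        enc.flush();

        // Decode it with the new schema: the old data is still readable,
        // and the new "referrer" field comes back as its default.
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord upgraded =
            new GenericDatumReader<GenericRecord>(oldSchema, newSchema).read(null, dec);
        System.out.println(upgraded.get("referrer"));  // prints "unknown"
      }
    }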
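
The "Avro RPC" slide builds on the same versioning support: a protocol definition is itself JSON, so a client and server can exchange their (possibly different) protocol versions and still interoperate. A hedged sketch that only parses and inspects a protocol; the ClusterStatus protocol and its single ping message are invented for illustration.

    // An Avro protocol is just JSON, parsed at runtime like a data schema.
    // (The protocol definition below is hypothetical.)
    import org.apache.avro.Protocol;

    public class ProtocolDemo {
      public static void main(String[] args) {
        Protocol p = Protocol.parse(
            "{\"protocol\":\"ClusterStatus\",\"namespace\":\"example\","
            + "\"types\":[],"
            + "\"messages\":{\"ping\":{\"request\":[],\"response\":\"string\"}}}");
        System.out.println(p.getName());               // ClusterStatus
        System.out.println(p.getMessages().keySet());  // [ping]
      }
    }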