Apache Avro and You

Overview of Apache Avro just before 1.4 release

Apache Avro and You: Presentation Transcript

  • Apache Avro and You Boulder/Denver Hadoop Users Group 09.01.2010 © 2010 Eric Wendelin
  • Eric Wendelin Hadooper @returnpath Blog: eriwen.com Twitter: @eriwen
  • Your data and Hadoop • Traditional Hadoop consumes 2 basic types of input: • Plain text - Cross-platform but inefficient • Java Writables - Efficient but tied to Java platform
  • Better: Cross-platform data serialization
  • What about Thrift? • Thrift is expressive and efficient, but not dynamic: • Reading/writing datasets requires generating and loading code • Can’t browse arbitrary data
  • Better than Thrift (in most cases) • At least as efficient as Thrift and Protocol Buffers • Dynamic! • Schema stored with data • Arbitrary data types • All without generating and loading code
  • Avro is... • A data serialization system • Focused on dynamic access, platform independence and simple schema evolution
  • Major parts of Avro • Schema language (JSON) • RPC framework • APIs for Python (dynamic), Java (generic and specific), Ruby, C, C++ and more!
  • Example Schema (a generic read/write sketch using this schema follows the transcript):
    {
      "type": "record",
      "name": "Widget",
      "fields": [
        {"name": "size", "type": "long", "default": 0},
        {"name": "density", "type": "float"}
      ]
    }
  • Avro RPC • Cross-language access • Permits different versions of services to interoperate
  • Avro Data • Compressible • Splittable (but that doesn’t mean files are easy to combine) • Arbitrary metadata • Can include data with different schemas in the same file and detect them dynamically! (A compression and metadata sketch follows the transcript.)
  • Why do you care about Avro?
  • Future: Hadoop’s RPC mechanism to be replaced by Avro
  • Using Avro in Hadoop (a word-count-style sketch follows the transcript):
    Traditional class        ➡ Avro class
    Job                      ➡ AvroJob
    TextInputFormat          ➡ AvroInputFormat
    Mapper<K1,V1,K2,V2>      ➡ AvroMapper<IN,OUT>
    Reducer<K2,V2,K3,V3>     ➡ AvroReducer<K,V,OUT>
  • Future of Avro • Version 1.4 (coming soon) will include: • Classes for Avro input/output • MapReduce connector protocol • PHP API
  • ?
  • avro.apache.org Blog: eriwen.com Twitter: @eriwen Email: eric.wendelin@returnpath.net © 2010 Eric Wendelin
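
A few illustrative Java sketches for the slides above follow. They are not from the deck itself; class names, file names, and values are invented for illustration, and the classes shown are from recent Avro Java releases (which parse schemas with Schema.Parser, where the 1.4-era API used the static Schema.parse).

First, a minimal sketch of the "Example Schema" slide in action: writing and reading a Widget record with the generic (dynamic) API, so no code is generated or loaded.

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class WidgetExample {
      public static void main(String[] args) throws Exception {
        // Parse the Widget schema from the "Example Schema" slide.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Widget\",\"fields\":["
            + "{\"name\":\"size\",\"type\":\"long\",\"default\":0},"
            + "{\"name\":\"density\",\"type\":\"float\"}]}");

        // Build a record dynamically; no generated Widget class is needed.
        GenericRecord widget = new GenericData.Record(schema);
        widget.put("size", 42L);
        widget.put("density", 0.8f);

        // Write a data file; the schema travels with the data in the file header.
        File file = new File("widgets.avro");
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, file);
        writer.append(widget);
        writer.close();

        // Read it back without knowing the schema up front.
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        for (GenericRecord record : reader) {
          System.out.println(record.get("size") + " / " + record.get("density"));
        }
        reader.close();
      }
    }

Because the writer's schema is embedded in the file, any Avro reader can browse the data without being given the schema separately.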
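
Next, a sketch of the compression and metadata points on the "Avro Data" slide: DataFileWriter accepts a codec and arbitrary key/value metadata before the file is created, and both can be read back from the file header. The "source" key and file name are made up for the example.

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.CodecFactory;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class CompressedWidgets {
      public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Widget\",\"fields\":["
            + "{\"name\":\"size\",\"type\":\"long\",\"default\":0},"
            + "{\"name\":\"density\",\"type\":\"float\"}]}");

        File file = new File("widgets-compressed.avro");
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.setCodec(CodecFactory.deflateCodec(6)); // compress data blocks (set before create)
        writer.setMeta("source", "widget-factory");    // arbitrary file-level metadata
        writer.create(schema, file);
        writer.close();

        // The codec, metadata, and schema all live in the file header.
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        System.out.println(reader.getMetaString("source"));
        System.out.println(reader.getSchema());
        reader.close();
      }
    }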
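
Finally, a word-count-style sketch of the class mapping on the "Using Avro in Hadoop" slide, patterned after the example in Avro's mapred package (which targets the old org.apache.hadoop.mapred API). It assumes the input is an Avro data file of strings; the "in" and "out" paths are placeholders.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.Pair;
    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;

    public class AvroWordCount {

      // AvroMapper<IN,OUT> replaces Mapper<K1,V1,K2,V2>: one input datum, Avro output data.
      public static class CountMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
        @Override
        public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
            throws IOException {
          StringTokenizer tokens = new StringTokenizer(line.toString());
          while (tokens.hasMoreTokens()) {
            collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
          }
        }
      }

      // AvroReducer<K,V,OUT> replaces Reducer<K2,V2,K3,V3>.
      public static class CountReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
        @Override
        public void reduce(Utf8 word, Iterable<Long> counts,
            AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter) throws IOException {
          long sum = 0;
          for (long count : counts) {
            sum += count;
          }
          collector.collect(new Pair<Utf8, Long>(word, sum));
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(AvroWordCount.class);
        job.setJobName("avro-word-count");

        // AvroJob wires up the Avro input/output formats and serialization for the job.
        AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
        AvroJob.setOutputSchema(job,
            Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
        AvroJob.setMapperClass(job, CountMapper.class);
        AvroJob.setReducerClass(job, CountReducer.class);

        FileInputFormat.setInputPaths(job, new Path("in"));
        FileOutputFormat.setOutputPath(job, new Path("out"));
        JobClient.runJob(job);
      }
    }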