Apache Avro and You

Overview of Apache Avro just before the 1.4 release

Transcript

  • 1. Apache Avro and You Boulder/Denver Hadoop Users Group 09.01.2010 © 2010 Eric Wendelin
  • 2. Eric Wendelin Hadooper @returnpath Blog: eriwen.com Twitter: @eriwen
  • 3. Your data and Hadoop • Traditional Hadoop consumes 2 basic types of input: • Plain text - Cross-platform but inefficient • Java Writables - Efficient but tied to Java platform
  • 4. Better: Cross-platform data serialization
  • 5. What about Thrift? • Thrift is expressive and efficient but not dynamic: • Reading/writing datasets requires generating and loading code • Can’t browse arbitrary data
  • 6. Better than Thrift (in most cases) • At least as efficient as Thrift and Protocol Buffers • Dynamic! • Schema stored with data • Arbitrary data types • All without generating and loading code
  • 7. Avro is... • A data serialization system • Focused on dynamic access, platform independence and simple schema evolution
  • 8. Major parts of Avro • Schema language (JSON) • RPC framework • APIs for Python (dynamic), Java (generic and specific), Ruby, C, C++ and more!
  • 9. Example Schema { "type": "record", "name": "Widget", "fields": [ {"name": "size", "type": "long", "default": 0}, {"name": "density", "type": "float"} ] }
  • 10. Avro RPC • Cross-language access • Permits different versions of services to interoperate
  • 11. Avro Data • Compressible • Splittable (but that doesn’t mean files are easy to combine) • Arbitrary metadata • Can include data with different schemas in the same file and detect dynamically!
  • 12. Why do you care about Avro?
  • 13. Future: Hadoop’s RPC mechanism to be replaced by Avro
  • 14. Using Avro in Hadoop (Traditional Class ➡ Avro Class) • Job ➡ AvroJob • TextInputFormat ➡ AvroInputFormat • Mapper<K1,V1,K2,V2> ➡ AvroMapper<IN,OUT> • Reducer<K2,V2,K3,V3> ➡ AvroReducer<K,V,OUT>
  • 15. Future of Avro • Version 1.4 (coming soon) will include: • Classes for Avro input/output • MapReduce connector protocol • PHP API
  • 16. ?
  • 17. avro.apache.org Blog: eriwen.com Twitter: @eriwen Email: eric.wendelin@returnpath.net © 2010 Eric Wendelin
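
To make the Widget schema from slide 9 more concrete, here is a minimal sketch in plain Python (standard library only, not the Avro Python API). It assumes the binary encoding described in the Avro specification: a `long` is written as a zigzag variable-length integer, a `float` as 4 little-endian bytes, and a record as its fields concatenated in schema order with no per-field tags. The helper names `resolve_with_defaults`, `encode_long`, and `encode_record` are hypothetical and exist only for this illustration; real schema resolution covers far more cases than the single default-filling step shown here.

```python
import json
import struct

# Schema from slide 9, written as strict JSON (double quotes).
WIDGET_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Widget",
  "fields": [
    {"name": "size", "type": "long", "default": 0},
    {"name": "density", "type": "float"}
  ]
}
""")


def resolve_with_defaults(reader_schema, decoded):
    """Hypothetical helper: fill fields the writer did not provide
    from the reader schema's defaults (a simplified view of schema evolution)."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in decoded:
            result[name] = decoded[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"missing field with no default: {name}")
    return result


def encode_long(n):
    """Zigzag plus variable-length encoding for a 64-bit signed integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit 7 bits and set the continuation bit
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_record(schema, record):
    """Hypothetical helper: concatenate field encodings in schema order.
    Only the 'long' and 'float' types used by Widget are handled."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += encode_long(value)
        elif field["type"] == "float":
            out += struct.pack("<f", value)  # 4 bytes, little-endian IEEE 754
        else:
            raise NotImplementedError(field["type"])
    return bytes(out)


if __name__ == "__main__":
    widget = resolve_with_defaults(WIDGET_SCHEMA, {"density": 1.5})
    print(widget)  # size is filled in from the schema default
    payload = encode_record(WIDGET_SCHEMA, {"size": 3, "density": 1.5})
    print(payload.hex())  # 060000c03f
```

The encoded Widget is 5 bytes because the payload carries no field names or tags. A reader needs the schema to interpret it, which is why Avro stores the schema with the data.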