Indic threads pune12-apache-crunch

918 views
753 views

Published on

The 7th Annual IndicThreads Pune Conference was held on 14-15 December 2012. http://pune12.indicthreads.com/

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
918
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Indic threads pune12-apache-crunch

  1. 1. Apache CrunchRahul SharmaApache
  2. 2. Agenda : Issues with MapReduce pipelines Solving with Apache Crunch Data Model & Operations System Workflow Examples Question & Answers 2
  3. 3. Issues with MapReduce Pipelines Unit Testing pipeline ?? You must be joking !! Can someone tell me where is the business logic ?? Chain performance?? Learn Latin(pig) first!! 3
  4. 4. Apache Crunch Is a Java library Contains Collections which can excute Parallel operations Lazy evaluation of Collections at runtime Operations merged at runtime to have efficient chains. Available @ http://incubator.apache.org/crunch/ Based on Google FlumeJava paper 4
  5. 5. Apache Crunch Supports Hadoop version 1 and 2-alpha Supports HBase, jdbc etc Works with Writables, Avro, Thrift and proto-buffers Scala varient also exists Integration with R and Clojure in process Archetype exists for creating sample maven project 5
  6. 6. Apache Crunch : Data Model  Pipeline  MRPipeline  MemPipeline  PCollection<T>  PTable<K,V>  PGroupTable<K,V>  Source<T>  Target<T>  Emitter<T> 6  PType<K,V>
  7. 7. Apache Crunch : Operations  DoFn<S,T>  CombineFn<S,T>  FilterFn<T>  Joins  Cartesian  Sort  SecondarySort  PObject<T>  BloomFilters 7
  8. 8. Apache Crunch : System Workflow Construct a pipeline Pipeline.done() Map Map Map GBK GBK Reduce Reduce 8 Output
  9. 9. Apache Crunch : Examples  WordCount example  Avro example  Sorting example  SecondarySort  Join Example  BloomFilters 9
  10. 10. Write to me : rsharma@apache.orgExample src : http://github.com/rahul0208 10Blog : devlearnings.wordpress.com

×