Transcript of "Indic threads pune12-apache-crunch"
Apache Crunch
Rahul Sharma
Agenda
- Issues with MapReduce pipelines
- Solving them with Apache Crunch
- Data Model & Operations
- System Workflow
- Examples
- Questions & Answers
Issues with MapReduce Pipelines
- Unit testing a pipeline?? You must be joking!!
- Can someone tell me where the business logic is??
- Chain performance??
- Learn Latin (Pig) first!!
Apache Crunch
- Is a Java library
- Contains collections that can execute parallel operations
- Collections are evaluated lazily at runtime
- Operations are merged at runtime to form efficient chains
- Available at http://incubator.apache.org/crunch/
- Based on the Google FlumeJava paper
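To make "lazy evaluation" and "operations merged at runtime" concrete, here is a minimal plain-Java sketch of the idea. It does not use the Crunch API: `LazyCollection`, `map`, and `materialize` are hypothetical names invented for this illustration, and the fusion shown is simple function composition rather than Crunch's actual MapReduce planner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a lazily evaluated collection; NOT the Crunch API. */
final class LazyCollection<S, T> {
    private final List<S> source;        // untouched input elements
    private final Function<S, T> fused;  // every recorded operation, composed into one function

    private LazyCollection(List<S> source, Function<S, T> fused) {
        this.source = source;
        this.fused = fused;
    }

    /** Wraps an input list; no work is done here. */
    static <S> LazyCollection<S, S> of(List<S> source) {
        return new LazyCollection<S, S>(source, Function.<S>identity());
    }

    /** Records an operation without running it, merging it with the earlier ones. */
    <R> LazyCollection<S, R> map(Function<T, R> fn) {
        Function<S, R> next = fused.andThen(fn);
        return new LazyCollection<S, R>(source, next);
    }

    /** Runs the merged chain in a single pass over the input. */
    List<T> materialize() {
        List<T> out = new ArrayList<>();
        for (S item : source) {
            out.add(fused.apply(item));
        }
        return out;
    }
}

class LazyDemo {
    public static void main(String[] args) {
        int[] calls = {0}; // counts how often the first operation actually runs

        LazyCollection<String, Integer> lengths =
            LazyCollection.of(Arrays.asList(" apache ", "crunch", "pig"))
                .map(s -> { calls[0]++; return s.trim(); })
                .map(String::length);

        System.out.println(calls[0]);              // prints 0: nothing has run yet
        System.out.println(lengths.materialize()); // prints [6, 6, 3]
        System.out.println(calls[0]);              // prints 3: one pass over three elements
    }
}
```

The two `map` calls only build up a plan; the input is traversed once, when `materialize()` is called. In Crunch the same principle lets several logical operations be packed into fewer MapReduce jobs.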
Apache Crunch
- Supports Hadoop versions 1 and 2-alpha
- Supports HBase, JDBC, etc.
- Works with Writables, Avro, Thrift, and Protocol Buffers
- A Scala variant also exists
- Integration with R and Clojure is in progress
- A Maven archetype exists for creating a sample project