This document discusses Hadoop Record Compiler (Jute) and its extension JuteRC. JuteRC automatically generates serialization/deserialization code for data types to support the RC storage format. It extends Jute by adding RC-specific code generation. The execution path involves running rcc against a DDL file, which uses the JRcCodeGenerator to set up members, functions, and get RC read/write methods. These methods write to/read from the RC format using utilities like RcUtil.writeBuffer. JuteRC is packaged with Maven and its usage involves generating the JAR then running rcc against a DDL file.
2. Motivation
Automatically generate serialization/deserialization code for any given primitive or composite data type.
Directly plug the serialization/deserialization code into MapReduce to generate output files that support the RC storage format.
An extension to the existing Hadoop Record Compiler (Jute) package, hence the name JuteRC.
9. How to use JuteRC
Maven packaged
README
1) generate JuteRC.jar
% mvn install
2) run rcc against a DDL file
% ./rcc --language javarc something.jr
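For illustration, a DDL file such as something.jr might look like the sketch below. It assumes JuteRC accepts the same record DDL syntax as the stock Hadoop Record Compiler, which the slides imply but do not show; the module, class, and field names are hypothetical.

```
// something.jr: hypothetical example, not taken from the JuteRC sources
module org.example.records {
    // A composite type built from primitive fields
    class PageView {
        ustring url;
        long timestamp;
        int statusCode;
        vector<ustring> referrers;
    }
}
```

Under that assumption, running rcc with --language javarc against this file would be expected to produce a Java class for PageView that includes the RC read/write methods in addition to the usual Jute serialization code.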