This document discusses building a Hadoop Record Reader in Python. It summarizes that Hadoop has a native data storage format called Hadoop Record or "Jute" that has a Data Definition Language and compiler. It then discusses how the author built a Python parsing library to parse this format into generic Python data types by using Ply (lex and yacc for Python). Future work mentioned includes building a DDL translator and integrating feedback.