ContextWeb is an online advertising company that uses Hadoop to process large volumes of log data: up to 120 GB of raw log files per day. Its 40-node Hadoop cluster runs around 2000 MapReduce jobs per day. The team developed techniques for partitioning data by date and time and for tracking file revisions, which allow incremental processing while keeping data consistent and reports fresh.
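
The summary does not describe how partitions and revisions are actually laid out, so the following is only a minimal sketch of the general idea, not ContextWeb's implementation. It assumes a hypothetical path convention of `<dataset>/<YYYY-MM-DD>/<HH>/part.r<revision>` and a hypothetical `processed` mapping that records the last revision each downstream job consumed. A partition is selected for incremental reprocessing only when a newer revision of its input exists.

```python
import re

# Hypothetical layout (an assumption, not from the source):
#   <dataset>/<YYYY-MM-DD>/<HH>/part.r<revision>
PATH_RE = re.compile(
    r"^(?P<dataset>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<hour>\d{2})/part\.r(?P<rev>\d+)$"
)


def latest_revisions(paths):
    """Map each (dataset, date, hour) partition to its highest revision number."""
    latest = {}
    for path in paths:
        m = PATH_RE.match(path)
        if m is None:
            continue  # ignore files that do not follow the assumed layout
        key = (m["dataset"], m["date"], m["hour"])
        rev = int(m["rev"])
        if rev > latest.get(key, -1):
            latest[key] = rev
    return latest


def partitions_to_process(input_paths, processed):
    """Return partitions whose newest input revision has not been processed yet.

    `processed` maps a partition key to the last revision already consumed;
    partitions missing from it are treated as never processed.
    """
    latest = latest_revisions(input_paths)
    return sorted(
        key for key, rev in latest.items() if rev > processed.get(key, -1)
    )


# Illustrative data only; the dataset name and dates are made up.
example_paths = [
    "rawlogs/2024-01-01/00/part.r0",
    "rawlogs/2024-01-01/00/part.r1",  # hour 00 was re-delivered as revision 1
    "rawlogs/2024-01-01/01/part.r0",
]
example_processed = {
    ("rawlogs", "2024-01-01", "00"): 0,
    ("rawlogs", "2024-01-01", "01"): 0,
}
stale = partitions_to_process(example_paths, example_processed)
```

With this data, only the hour-00 partition is selected, because its input moved from revision 0 to revision 1 while hour 01 is unchanged. Reading only the highest revision of each partition is one way to keep a report consistent while new data is still arriving; the source does not say whether ContextWeb resolves revisions this way.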