This document provides instructions for IR Homework #1, which involves building an inverted index for a text collection. The input is the ClueWeb09 dataset, which contains over 1 billion web pages. The output should be a set of inverted index files: a dictionary file listing the vocabulary (the distinct terms) and postings lists recording the documents in which each term occurs. Optional functionality may include efficiency techniques, configurable tokenization settings, and support for multiple input formats. The program and its documentation are due in two weeks and will be evaluated on correctness and on any optional features implemented. Students submit their work electronically and may be asked to give a demo if the submission does not run properly.
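To make the required output concrete, here is a minimal in-memory sketch of a dictionary file plus postings file. Everything in it is an assumption for illustration, not the assignment's required format. The tokenizer (lowercase alphanumeric runs), the function names `build_index` and `write_index`, the tab-separated dictionary layout (term, document frequency, collection frequency, byte offset), and the `doc:pos,pos` postings layout are all hypothetical. At ClueWeb09 scale the index will not fit in memory, so a real submission would need to build partial indexes in blocks and merge them on disk. This sketch only shows the shape of the data.

```python
import re
from collections import defaultdict

# Hypothetical tokenization setting: lowercase, keep runs of letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Build an in-memory inverted index from a mapping doc_id -> text.

    Returns (dictionary, postings):
      dictionary: term -> (document frequency, collection frequency)
      postings:   term -> list of (doc_id, [positions]) sorted by doc_id
    """
    raw = defaultdict(dict)  # term -> {doc_id: [positions]}
    for doc_id in sorted(docs):
        for pos, term in enumerate(tokenize(docs[doc_id])):
            raw[term].setdefault(doc_id, []).append(pos)

    dictionary, postings = {}, {}
    for term in sorted(raw):
        plist = sorted(raw[term].items())
        postings[term] = plist
        # df = number of documents containing the term; cf = total occurrences.
        dictionary[term] = (len(plist), sum(len(p) for _, p in plist))
    return dictionary, postings


def write_index(dictionary, postings, dict_path, postings_path):
    """Write one postings line per term and store its byte offset in the dictionary file."""
    with open(postings_path, "wb") as post_file, \
            open(dict_path, "w", encoding="utf-8") as dict_file:
        for term in sorted(dictionary):
            offset = post_file.tell()  # where this term's postings line starts
            line = " ".join(
                f"{doc_id}:{','.join(map(str, positions))}"
                for doc_id, positions in postings[term]
            )
            post_file.write((line + "\n").encode("utf-8"))
            doc_freq, coll_freq = dictionary[term]
            dict_file.write(f"{term}\t{doc_freq}\t{coll_freq}\t{offset}\n")
```

For example, with `docs = {1: "The cat sat.", 2: "The cat and the hat."}`, `build_index(docs)` gives `dictionary["the"] == (2, 3)` and `postings["the"] == [(1, [0]), (2, [0, 3])]`. Storing a byte offset in the dictionary lets a query seek directly to one term's postings instead of scanning the whole postings file.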