This document discusses how to design, acquire, and process a collection of linguistic data to form the raw material for a dictionary. A reliable dictionary requires evidence from language used in real communicative acts. Introspection and informant testing provide some evidence, but both are limited by their subjectivity, so observing language in use through large text corpora is indispensable. Key considerations in corpus design include size, the inclusion of a range of text types and styles to avoid bias, and representativeness.
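
The design considerations named above (size and coverage of text types) can be made concrete with a small check of corpus composition. The following is a minimal sketch, not a method taken from the document: the function names `composition` and `imbalance`, the text-type labels, the target proportions, and the whitespace tokenization are all illustrative assumptions.

```python
from collections import Counter


def composition(docs):
    """Return each text type's share of the corpus, measured in tokens.

    `docs` is an iterable of (text_type, text) pairs. Tokens are counted by
    splitting on whitespace, which is a crude stand-in for real tokenization.
    """
    tokens_by_type = Counter()
    for text_type, text in docs:
        tokens_by_type[text_type] += len(text.split())
    total = sum(tokens_by_type.values())
    if total == 0:
        return {}
    return {t: n / total for t, n in tokens_by_type.items()}


def imbalance(actual, target):
    """Actual share minus target share per text type.

    Positive values mean the type is over-represented relative to the
    (hypothetical) design target; negative values mean under-represented.
    """
    types = set(actual) | set(target)
    return {t: actual.get(t, 0.0) - target.get(t, 0.0) for t in types}


# Toy corpus: labels and texts are invented for illustration only.
docs = [
    ("newspaper", "the minister said the bill would pass"),
    ("fiction", "she ran"),
    ("conversation", "well"),
]

# Hypothetical design target; a real project would set its own proportions.
target = {"newspaper": 0.4, "fiction": 0.3, "conversation": 0.3}

if __name__ == "__main__":
    shares = composition(docs)
    for text_type, diff in sorted(imbalance(shares, target).items()):
        print(f"{text_type}: share={shares.get(text_type, 0.0):.2f} diff={diff:+.2f}")
```

A report like this only flags skew toward one text type; it does not by itself establish that a corpus is representative, which depends on how the target proportions were chosen.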