This document discusses approaches to multimodal language processing. Multimodal interfaces support both input and output through multiple channels, such as speech and gestures, but integrating the semantic content contributed by different modes poses significant challenges. The document describes an agent-based architecture in which separate agents process speech, gestures, and their interpretations. A multidimensional chart parser treats input elements from each mode as terminal edges that are combined according to a unification-based multimodal grammar, enabling the parsing of more complex multimodal utterances involving multiple gestures. Constructional meanings are also described, allowing specific multimodal combinations to carry interpretations beyond those of their individual parts.
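
To make the unification step concrete, here is a minimal sketch in Python. It is not the described system itself: the feature-structure layout (`cat`, `content`, `location`, `time`) and the single combination rule are illustrative assumptions. The sketch shows how a spoken command with an underspecified location slot can unify with a temporally overlapping gesture that supplies coordinates, which is the kind of combination a multidimensional chart parser would attempt for each compatible pair of terminal edges.

```python
# Minimal sketch (not the paper's implementation) of unification-based
# multimodal integration: feature structures for a spoken command and a
# gesture are combined when their features and time spans are compatible.
# Feature names (cat, content, location, time) are illustrative assumptions.

def unify(a, b):
    """Recursively unify two feature structures (dicts); None on clash."""
    if a is None or b is None:
        return None
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None          # atoms must match exactly
    result = dict(a)
    for key, value in b.items():
        if key in result:
            merged = unify(result[key], value)
            if merged is None:
                return None                   # feature clash: unification fails
            result[key] = merged
        else:
            result[key] = value
    return result

def overlaps(span_a, span_b):
    """Temporal constraint: the two input spans must overlap."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]

# Speech terminal: "zoom in here" -- the location is left underspecified.
speech = {
    "cat": "command",
    "content": {"action": "zoom", "location": {}},   # empty slot to be filled
    "time": (0.2, 1.1),
}

# Gesture terminal: a point on a map, carrying coordinates.
gesture = {
    "cat": "gesture",
    "content": {"location": {"point": (42.36, -71.06)}},
    "time": (0.8, 1.0),
}

# One hypothetical multimodal rule: command + gesture -> command, licensed
# only if the time spans overlap and the gesture's location unifies with
# the command's location slot.
if overlaps(speech["time"], gesture["time"]):
    combined = unify(speech["content"], gesture["content"])
    if combined is not None:
        print(combined)
        # {'action': 'zoom', 'location': {'point': (42.36, -71.06)}}
```

In a full chart parser, each successful unification like this one would add a new edge to the chart, so utterances with several gestures are handled by repeatedly combining edges under the grammar's rules rather than by a single fixed speech-plus-gesture pattern.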