This document summarizes text mining techniques for the automatic annotation of unstructured content such as images, video, and audio. It discusses how multimedia mining extracts structured information from such data through techniques such as automatic image annotation and video annotation. Automatic image annotation uses computer vision and machine learning to assign keywords or tags to images. Video annotation produces text transcripts of a video's spoken narration and segments the video into clips, each linked to its corresponding text. The document also explores models for content-based multimedia retrieval that draw on both visual features and annotations.
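
As a rough illustration of how automatic image annotation can assign tags from visual features, the sketch below transfers tags from the most visually similar labeled images to an unlabeled one. It is a minimal nearest-neighbor example and not necessarily the model the document covers: the function names (`cosine`, `annotate`), the three-dimensional feature vectors, and the tags are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(query, labeled, k=2, n_tags=2):
    """Suggest tags for `query` by voting over its k most similar labeled images."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter()
    for _features, tags in ranked[:k]:
        votes.update(tags)
    return [tag for tag, _count in votes.most_common(n_tags)]


# Hypothetical labeled collection: (visual feature vector, tags).
# Real systems would use color, texture, or learned features instead.
LABELED = [
    ([0.9, 0.1, 0.0], ["beach", "sea"]),
    ([0.8, 0.2, 0.1], ["sea", "sky"]),
    ([0.0, 0.1, 0.9], ["forest", "tree"]),
]

suggested = annotate([0.85, 0.15, 0.05], LABELED)
print(suggested)
```

The same similarity function could also rank images for content-based retrieval, with the transferred tags serving as the text side of the index.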