This document provides a specification of the web mining process for hypervideo concept identification in the LinkedTV project. It presents a state-of-the-art analysis of named entity recognition and disambiguation techniques, including statistical approaches, knowledge-based approaches, and existing NER web APIs. It also reviews approaches for retrieving additional content from the web, including crawling web pages, retrieving content from structured sources, and APIs. The document describes the first NER tools developed for the project and evaluates their performance. It identifies requirements for the web mining process based on the LinkedTV scenarios.