This document reviews systems for automatically extracting educational digital objects (EDOs) and metadata from institutional websites. It discusses existing systems like Agathe, Crossmarc, and CiteSeerX that extract information from documents but not linked web pages. The proposed system would crawl a website, extract and classify EDOs into audio, video and text, and store them in a database along with automatically extracted metadata like title, category, author. This aims to assist repositories in identifying documents that could be uploaded along with their metadata.