This document outlines a thesis, submitted for the degree of Doctor of Philosophy, on information extraction from semi-structured web pages. The thesis has two main parts: 1) a survey and comparative analysis of existing information extraction systems that compares them by task domain, techniques used, and degree of automation; and 2) a new approach called FiVaTech, which performs unsupervised page-level extraction of web data from template pages by automatically detecting their schemas and templates.