This document summarizes research on detecting duplicated code in web applications. It reports estimates that roughly 20-50% of the code in large software systems is duplicated. Detecting duplicated code, or "clones", supports program understanding, maintenance, and refactoring. The document reviews approaches to clone detection, covering both small code fragments and larger structural clones that span groups of files. It also discusses challenges such as the large number of clones that detectors report and the need to analyze relationships between clones. Several clone detection tools are evaluated. Finally, the document discusses information extraction from web pages, including approaches that detect page templates in order to extract structured data without requiring examples.
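
To make the idea of fragment-level clone detection concrete, here is a minimal sketch. It is not the method of any tool evaluated in the document. It assumes a simple line-based technique: whitespace is normalized, blank lines are skipped, and every run of `window` consecutive lines is indexed so that runs occurring more than once are reported as clones. The function names `normalize` and `find_clones`, the `window` parameter, and the sample file names and contents are all hypothetical.

```python
from collections import defaultdict


def normalize(line):
    """Collapse runs of whitespace so that formatting differences are ignored."""
    return " ".join(line.split())


def find_clones(files, window=3):
    """Return {fragment: [(file name, start line), ...]} for repeated fragments.

    `files` maps a file name to its source text. A fragment is a tuple of
    `window` consecutive non-blank, whitespace-normalized lines.
    """
    index = defaultdict(list)
    for name, text in files.items():
        # Keep the original 1-based line number next to each normalized line.
        numbered = [(i, normalize(raw))
                    for i, raw in enumerate(text.splitlines(), start=1)]
        numbered = [(i, line) for i, line in numbered if line]
        for k in range(len(numbered) - window + 1):
            fragment = tuple(line for _, line in numbered[k:k + window])
            index[fragment].append((name, numbered[k][0]))
    # Only fragments seen at more than one location count as clones.
    return {frag: locs for frag, locs in index.items() if len(locs) > 1}


# Hypothetical sample input: two small files sharing a three-line fragment.
sample_files = {
    "a.php": "<?php\n$x = 1;\n$y  = 2;\necho $x + $y;\n",
    "b.php": "// header\n\n$x = 1;\n$y = 2;\necho $x + $y;\nexit;\n",
}

clones = find_clones(sample_files)
for fragment, locations in clones.items():
    print(locations)  # [('a.php', 2), ('b.php', 3)]
```

A sketch like this finds only exact copies up to whitespace, and on real systems it would report many overlapping fragments. That is one face of the large clone counts mentioned above, and it illustrates why grouping related clones into larger structures is useful.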