FivaTech: The problem of peer
Reporter: Che-Min Liao
• Related Work
• Problem Formulation
• System Architecture
• The Approach
• Web data extraction is an important part of many
web data analysis applications.
• Many web sites contain large sets of pages generated using
a common template or layout.
– Examples: Amazon, eBay, Google, etc.
• The key to automatic extraction from these template-generated web pages
is whether we can deduce the template automatically.
– If we can, there is no need to annotate the web pages with extraction targets.
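To make the idea of template deduction concrete, here is a minimal sketch in Python. It is not FivaTech's algorithm: it simply aligns the token sequences of two pages with the standard-library `difflib.SequenceMatcher`, treats tokens shared by both pages as template, and treats the differing spans as data slots. The function names, the `#DATA` placeholder, and the two sample pages are assumptions made for illustration only.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split a page into tag tokens and text tokens, dropping empty runs."""
    parts = re.split(r"(<[^>]+>)", html)
    return [p.strip() for p in parts if p.strip()]


def deduce_template(page_a, page_b):
    """Naive two-page template deduction (illustrative assumption only).

    Tokens that align in both pages are kept as template; every span that
    differs is replaced by a '#DATA' slot and its content is collected.
    """
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    template, data_a, data_b = [], [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            template.extend(a[i1:i2])
        else:
            template.append("#DATA")
            data_a.append(a[i1:i2])
            data_b.append(b[j1:j2])
    return template, data_a, data_b


# Two hypothetical pages generated from the same layout.
page_1 = "<html><body><h1>Book</h1><b>Title:</b> Dune <b>Price:</b> $9</body></html>"
page_2 = "<html><body><h1>Book</h1><b>Title:</b> Emma <b>Price:</b> $7</body></html>"

template, data_1, data_2 = deduce_template(page_1, page_2)
print(template)
print(data_1, data_2)
```

No page is annotated by hand here: the data slots fall out of the comparison alone, which is the sense in which such extraction is unsupervised. A real system must also handle repeated records, optional fields, and more than two input pages, none of which this sketch attempts.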
• According to the kind of extraction target, web data
extraction tasks can be classified into three categories:
– Record-level: the target is usually constrained to record-wide information.
– Page-level: the target is page-wide information.
– Site-level: the target is to populate a database from the pages of a Web site.
• We take the FivaTech system as our research subject and study its
problems in order to improve its performance.
– It is unsupervised.
– It is both page-level and record-level.
– It has much higher precision than EXALG.
– It is comparable to other record-level extraction systems
such as ViPER and MSE.