20090813MEETING

FivaTech: The problem of peer node recognition
Reporter: Che-Min Liao

Outline
• Introduction
• Related Work
• Problem Formulation
• System Architecture
• The Approach
• Experiment
• Conclusion

Introduction
• Web data extraction is an important part of many web data analysis applications.
• Many web sites contain large sets of pages generated using
a common template or layout.
– E.g., Amazon, eBay, Google.
• The key to automatic extraction from such template-generated pages is whether the template can be deduced automatically.
– If it can, there is no need to annotate the web pages with extraction targets.

Introduction (Cont.)
• Based on the kind of extraction target, web data extraction tasks can be classified into three categories:
– Record-level: the target is usually restricted to record-wide information.
• DEPTA
• IEPAD
– Page-level: the target is page-wide information.
• RoadRunner
• EXALG
• FivaTech
– Site-level: the target is to populate a database from the pages of a web site.

Introduction (Cont.)
• We take the FivaTech system as the subject of our research and study its problems in order to improve its performance.
– It is unsupervised.
– It performs both page-level and record-level extraction.
– It has much higher precision than EXALG.
– It is comparable to other record-level extraction systems such as ViPER and MSE.

• Assume the similarity between b1 and b2 is 1.0, and the similarity between each of tr1~tr4 and each of tr5~tr6 is 0.6.
• The FivaMatchingScore is (1.0+0.6+0.6+0.6+0.6)/5 = 0.68.
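The arithmetic above can be reproduced with a small Python sketch. It is reconstructed only from the numbers on these slides, not from the FivaTech paper: each child tree of A takes its best similarity to any child tree of B, similarities below a cut-off are ignored, and the sum is divided by the number of children of A. The function name `fiva_matching_score`, the `sim` dictionary, and the cut-off `THRESHOLD = 0.5` are hypothetical; any cut-off above 0.3 and at most 0.6 reproduces the slides' numbers.

```python
# Hypothetical reconstruction of FivaMatchingScore from the slide arithmetic.
THRESHOLD = 0.5  # assumed cut-off; not stated on the slides

def fiva_matching_score(a_children, b_children, sim):
    """Best above-threshold similarity per child of A, divided by len(a_children).

    sim maps (a_child, b_child) -> similarity; missing pairs count as 0.0.
    """
    total = 0.0
    for a in a_children:
        # Best match for this child among B's children (many-to-one allowed).
        best = max(sim.get((a, b), 0.0) for b in b_children)
        if best >= THRESHOLD:
            total += best
    return total / len(a_children)

A = ["b1", "tr1", "tr2", "tr3", "tr4"]
B = ["b2", "tr5", "tr6"]
sim = {("b1", "b2"): 1.0}
sim.update({(a, b): 0.6 for a in A[1:] for b in B[1:]})

print(round(fiva_matching_score(A, B, sim), 2))  # 0.68
```

Note that tr1~tr4 may all match the same child of B; the sketch does not enforce a one-to-one matching, since the slide's sum counts all four of them.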

Problems with FivaMatchingScore
• Case 1. Table structure.
• Case 2. Child trees containing set type data.
• Case 3. Asymmetry.

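Case 2. Child trees containing set-type data
• Assume tr5 and tr6 contain set-type data, and the similarity between each of tr1~tr4 and each of tr5~tr6 is 0.3.
• Only the match between b1 and b2 contributes, since the 0.3 similarities are not counted, so the FivaMatchingScore is 1.0/5 = 0.2.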
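Under a hypothetical reconstruction inferred from the slides' arithmetic (best match per child of A, an assumed cut-off `THRESHOLD = 0.5`, division by the number of children of A; the names are illustrative, not FivaTech's), the Case 2 score follows because every 0.3 similarity falls below the cut-off:

```python
THRESHOLD = 0.5  # hypothetical cut-off; not stated on the slides

def fiva_matching_score(a_children, b_children, sim):
    """Best above-threshold similarity per child of A, divided by len(a_children)."""
    total = 0.0
    for a in a_children:
        best = max(sim.get((a, b), 0.0) for b in b_children)
        if best >= THRESHOLD:  # 0.3 never passes, so tr1~tr4 add nothing
            total += best
    return total / len(a_children)

A = ["b1", "tr1", "tr2", "tr3", "tr4"]
B = ["b2", "tr5", "tr6"]
sim = {("b1", "b2"): 1.0}
sim.update({(a, b): 0.3 for a in A[1:] for b in B[1:]})

print(round(fiva_matching_score(A, B, sim), 2))  # 0.2
```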

Case 3. Asymmetry
• Assume S(b1,b2) = 1.0, S(tr1,tr5) = 0.6, S(tr4,tr6) = 0.6, S(tr2~tr4,tr5) = 0.3, S(tr1~tr3,tr6) = 0.3, where S denotes similarity.
• FivaMatchingScore(A,B) = (1.0+0.6+0.6)/5 = 0.44
≠ FivaMatchingScore(B,A) = (1.0+0.6+0.6)/3 ≈ 0.73
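A sketch of the asymmetry, using a hypothetical reconstruction inferred from the slides' arithmetic (the function names and the 0.5 cut-off are assumptions). Because the denominator is the child count of whichever tree comes first, swapping the arguments changes the score even though the pairwise similarities themselves are symmetric:

```python
THRESHOLD = 0.5  # hypothetical cut-off; not stated on the slides

def lookup(sim, x, y):
    """Similarity is symmetric, so accept the pair in either order (missing = 0)."""
    return sim.get((x, y), sim.get((y, x), 0.0))

def fiva_matching_score(first, second, sim):
    """Sum each child's best above-threshold similarity, divided by len(first)."""
    total = 0.0
    for x in first:
        best = max(lookup(sim, x, y) for y in second)
        if best >= THRESHOLD:
            total += best
    return total / len(first)

A = ["b1", "tr1", "tr2", "tr3", "tr4"]
B = ["b2", "tr5", "tr6"]
sim = {("b1", "b2"): 1.0, ("tr1", "tr5"): 0.6, ("tr4", "tr6"): 0.6}
sim.update({(t, "tr5"): 0.3 for t in ["tr2", "tr3", "tr4"]})
sim.update({(t, "tr6"): 0.3 for t in ["tr1", "tr2", "tr3"]})

score_ab = fiva_matching_score(A, B, sim)  # 2.2 / 5
score_ba = fiva_matching_score(B, A, sim)  # 2.2 / 3
print(round(score_ab, 2), round(score_ba, 2))  # 0.44 0.73
```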
