20090813MEETING


  1. FivaTech: The problem of peer node recognition. Reporter: Che-Min Liao
  2. Outline
       • Introduction
       • Related Work
       • Problem Formulation
       • System Architecture
       • The Approach
       • Experiment
       • Conclusion
  3. Introduction
       • Web data extraction is an important part of many web data analysis applications.
       • Many web sites contain large sets of pages generated from a common template or layout.
           – e.g., Amazon, eBay, Google.
       • The key to automatic extraction from these template-based pages is whether the template can be deduced automatically.
           – If it can, there is no need to annotate the web pages with extraction targets.
  4. Introduction (Cont.)
       • According to the kind of extraction target, web data extraction tasks can be classified into three categories:
           – Record-level: the target is usually constrained to record-wide information.
               • DEPTA
               • IEPAD
           – Page-level: the target is page-wide information.
               • RoadRunner
               • EXALG
               • FivaTech
           – Site-level: the target is to populate a database from the pages of a Web site.
  5. Introduction (Cont.)
       • We take the FivaTech system as our research subject and study its problems in order to improve its performance.
           – It is unsupervised.
           – It handles both page-level and record-level extraction.
           – It has much higher precision than EXALG.
           – It is comparable with record-level extraction systems such as ViPER and MSE.
  6. FivaMatchingScore
  7. FivaMatchingScore (example)
       • Assume the similarity between b1 and b2 is 1.0, and the similarity between tr1~tr4 and tr5~tr6 is 0.6.
       • The FivaMatchingScore is (1.0 + 0.6 + 0.6 + 0.6 + 0.6)/5 = 0.68.
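The arithmetic on this slide can be reproduced with a short Python sketch. It assumes, based only on the numbers shown, that the score is the average of one best-match similarity per child of the first tree. The function name `fiva_matching_score` and the list layout are illustrative and not taken from the FivaTech implementation.

```python
def fiva_matching_score(best_similarities):
    """Average the per-child best-match similarities (assumed form of the score)."""
    if not best_similarities:
        return 0.0
    return sum(best_similarities) / len(best_similarities)

# Children of the first tree: b1, tr1, tr2, tr3, tr4 (values from the slide).
slide7 = [1.0, 0.6, 0.6, 0.6, 0.6]
score = fiva_matching_score(slide7)
print(round(score, 2))  # 0.68
```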
  8. The problems of FivaMatchingScore
       • Case 1. Table structure.
       • Case 2. Child trees containing set-type data.
       • Case 3. Asymmetry.
  9. Case 1. Table Structure
  10. Case 1. Table Structure
  11. Case 2. Child trees containing set-type data
       • Assume tr5 and tr6 contain set-type data, and the similarity between tr1~tr4 and tr5~tr6 is 0.3.
       • The low-similarity rows then contribute nothing, so the FivaMatchingScore is 1.0/5 = 0.2.
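A minimal sketch of why the score collapses in Case 2. It assumes that a best-match similarity below some cutoff is counted as 0, which is what the slide's numerator of 1.0 implies. The cutoff value of 0.5 and the names `ASSUMED_THRESHOLD` and `thresholded_score` are hypothetical; the slides do not state the actual threshold.

```python
ASSUMED_THRESHOLD = 0.5  # hypothetical cutoff; the slides do not state the real value

def thresholded_score(best_similarities, threshold=ASSUMED_THRESHOLD):
    """Average the similarities, counting any value below the threshold as 0."""
    if not best_similarities:
        return 0.0
    kept = [s if s >= threshold else 0.0 for s in best_similarities]
    return sum(kept) / len(best_similarities)

# b1 matches b2 perfectly; tr1..tr4 only reach 0.3 against tr5/tr6.
case2 = [1.0, 0.3, 0.3, 0.3, 0.3]
case2_score = thresholded_score(case2)
print(round(case2_score, 2))  # 0.2
```

Under this assumption, four of the five children are treated as unmatched even though the two trees share the same row structure, which is the problem this case points out.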
  12. Case 3. Asymmetry
       • Assume S(b1,b2) = 1.0, S(tr1,tr5) = 0.6, S(tr4,tr6) = 0.6, S(tr2~tr4,tr5) = 0.3, S(tr1~tr3,tr6) = 0.3, where S = Similarity.
       • FivaMatchingScore(A,B) = (1.0 + 0.6 + 0.6)/5 = 0.44 ≠ FivaMatchingScore(B,A) = (1.0 + 0.6 + 0.6)/3 ≈ 0.73
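The asymmetry can be checked with a small Python sketch. It assumes the score averages, over the children of the first tree, the best similarity to any child of the second tree, and drops best matches below a cutoff. The cutoff of 0.5 and the names `SIM`, `similarity`, and `directional_score` are hypothetical and chosen for illustration. Note that the expression (1.0 + 0.6 + 0.6)/3 evaluates to about 0.73.

```python
ASSUMED_THRESHOLD = 0.5  # hypothetical cutoff; not given on the slides

# Pairwise similarities from the slide, stored once per unordered pair.
SIM = {
    ("b1", "b2"): 1.0,
    ("tr1", "tr5"): 0.6, ("tr4", "tr6"): 0.6,
    ("tr2", "tr5"): 0.3, ("tr3", "tr5"): 0.3, ("tr4", "tr5"): 0.3,
    ("tr1", "tr6"): 0.3, ("tr2", "tr6"): 0.3, ("tr3", "tr6"): 0.3,
}

def similarity(x, y):
    """Look up S(x, y) in either order; unlisted pairs (e.g. b1 vs tr5) count as 0."""
    return SIM.get((x, y), SIM.get((y, x), 0.0))

def directional_score(children_a, children_b, threshold=ASSUMED_THRESHOLD):
    """Average, over the children of A, of the best similarity to any child of B."""
    total = 0.0
    for a in children_a:
        best = max((similarity(a, b) for b in children_b), default=0.0)
        if best >= threshold:
            total += best
    return total / len(children_a) if children_a else 0.0

A = ["b1", "tr1", "tr2", "tr3", "tr4"]
B = ["b2", "tr5", "tr6"]
score_ab = directional_score(A, B)
score_ba = directional_score(B, A)
print(round(score_ab, 2), round(score_ba, 2))  # 0.44 0.73
```

The numerator is the same in both directions; only the divisor (5 children of A versus 3 children of B) changes, which is why the two scores differ.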
