# The Problem of Peer Node Recognition

Published in: Technology, News & Politics

### The Problem of Peer Node Recognition

1. FivaTech: Schema & Template Discovery. Reporter: Che-Min Liao
2. Introduction
   - FivaTech is a page-level data extraction system which deduces the data schema and templates for the input pages generated from a CGI program.
     - Tree Merging
     - Schema Detection
3. Problem Formulation
4. Problem Formulation
5. The FivaTech Approach
   - The proposed approach FivaTech contains two modules:
     - Tree merging
     - Schema detection
6. Peer Node Recognition
   - As each tag/node is actually denoted by a tree, we can use a two-tree matching algorithm to decide whether two nodes with the same tag are similar.
     - We adopt Yang's algorithm.
   - A more serious problem is score normalization.
     - A typical way to compute a normalized score is the ratio of the number of pairs in the mapping to the maximum size of the two trees.
7. Tree Merging Score Algorithm
8. Example
9. Peer Matrix Alignment
10. Pattern Mining
11. Optional Node Merging
    - After the mining step, we are able to detect optional nodes based on the occurrence vectors.
12. The Example of Pattern Tree
13. Identifying the Schema
    - Recognize tuple type
    - Recognize order of the set type and optional data.
14. Defining the Template
    - Templates can be obtained by segmenting the pattern tree at reference nodes defined below:
15. The Example of Schema
16. The Example of Template
    - $T(\tau_1) = (T_1, (T_2, \Phi), 0)$
    - $T(\tau_2) = (\Phi, (T_3, \Phi), 0)$
    - $T(\tau_3) = (\Phi, (T_4, T_5, T_{21}), (0,0))$
    - $T(\tau_4) = (\Phi, (T_6, T_7, \Phi), (0,0))$
    - …
    - $T(\tau_{13}) = (\Phi, (T_{20}, \Phi), 2)$
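
The "typical" normalized score mentioned under Peer Node Recognition (slide 6) can be illustrated with a small sketch. It assumes that "Yang's algorithm" refers to the simple tree matching dynamic program, and it implements only the typical normalization (matched pairs over the larger tree size), not the Tree Merging Score Algorithm of slide 7, whose details are not in this transcript. The `Node` class and the function names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the largest order-preserving mapping between trees a and b.

    Roots must carry the same tag; children are aligned with a dynamic
    program similar to longest common subsequence.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            best[i][j] = max(best[i - 1][j],
                             best[i][j - 1],
                             best[i - 1][j - 1] + pair)
    return best[m][n] + 1  # +1 counts the matched pair of roots


def normalized_score(a: Node, b: Node) -> float:
    """Typical normalization: matched pairs over the size of the larger tree."""
    return simple_tree_matching(a, b) / max(size(a), size(b))


# Two <tr> rows that differ by one extra <td> in the second row.
a = Node("tr", [Node("td"), Node("td", [Node("b")])])
b = Node("tr", [Node("td"), Node("td", [Node("b")]), Node("td")])
print(simple_tree_matching(a, b))  # 4 matched pairs
print(normalized_score(a, b))      # 4 / 5 = 0.8
```

In this sketch the extra `td` in the second row lowers the score from 1.0 to 0.8, even though the two rows are plausibly peers. That dependence on tree size is the kind of normalization issue the slide calls out.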