Home Page Live(Www2007)


  1. Homepage Live: Automatic Block Tracing for Web Personalization
     J. Han, D. Han, C. Lin, H.J. Zeng, Z. Chen, Y. Yu
     Proceedings of the 16th International Conference on World Wide Web, 2007
     Reporter: Shih-Feng Yang, 2007/8/9
  2. Outline
     • Introduction
     • Homepage Live
     • Tree mapping algorithm for block tracking
     • Experiments and Analysis
     • Conclusion
  3. Introduction
     • Personalized homepage services let web users select web content and aggregate it in a single web page.
     • However, defining the content blocks and keeping the information up to date requires manual effort.
  4. Homepage Live
     • An application that offers “one-stop browsing” for users.
     • Lets users collect blocks from different web pages and organize them in a single page.
     • Automatically traces the extracted blocks and presents their real-time content to the user.
  5. Homepage Live
  6. Homepage Live
     • Two steps of Homepage Live:
         • Collecting the Blocks
             • Users select the blocks they want by dragging and dropping them with the mouse.
  7. Homepage Live
     • Two steps of Homepage Live:
         • Tracing Web Page Blocks
             • A tracing algorithm analyzes the original page and the updated page.
             • It detects the new position of each block in the updated page.
  8. Homepage Live
     • Two steps of Homepage Live:
         • Tracing Web Page Blocks
  9. Tree mapping algorithm for block tracking
     • Simple methods
         • Direct Path Finding
             • Record the tags on the path from the root node to the target block, and use that path to trace the evolved block.
             • Cannot handle a change in the block's position.
         • Tag String Matching
             • To find the evolved block, compare the tag sequence of the original block in the old page with the tag sequence of every sub-tree in the new page.
             • Use the longest common subsequence (LCS) as the similarity measure.
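
Tag String Matching as described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(tag, children)` tuple is a hypothetical stand-in for a DOM node, and normalizing the LCS length by the two sequence lengths is an assumption, since the slide only names LCS as the similarity measure.

```python
from typing import Iterator, List, Tuple

Node = Tuple[str, list]  # (tag, children): hypothetical minimal DOM stand-in


def tag_sequence(node: Node) -> List[str]:
    """Pre-order list of the tags in the sub-tree rooted at node."""
    tag, children = node
    seq = [tag]
    for child in children:
        seq.extend(tag_sequence(child))
    return seq


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def subtrees(node: Node) -> Iterator[Node]:
    """Yield every sub-tree of node, including node itself."""
    yield node
    for child in node[1]:
        yield from subtrees(child)


def trace_by_tag_string(block: Node, new_page: Node) -> Node:
    """Return the sub-tree of new_page whose tag sequence best matches block."""
    target = tag_sequence(block)

    def score(candidate: Node) -> float:
        seq = tag_sequence(candidate)
        # Assumed normalization: 2 * LCS / (len(target) + len(candidate)).
        return 2 * lcs_length(target, seq) / (len(target) + len(seq))

    return max(subtrees(new_page), key=score)
```

Because every sub-tree of the new page is scored, the block can still be found after it moves, which is the case Direct Path Finding cannot handle.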
  10. Tree mapping algorithm for block tracking
     • Tree Edit Distance
  11. Tree mapping algorithm for block tracking
     • Tree Edit Distance
         • Case 1
             • If no node in T is mapped to a node in T’, then
             • Dis(T,T’) = n(T) + n(T’)
             • T: the original tree.
             • T’: the evolved tree.
             • n(T): the number of nodes in T.
  12. Tree mapping algorithm for block tracking
     • Tree Edit Distance
         • Case 2
             • If r is mapped to r’
             • r: the root node of T.
             • r’: the root node of T’.
             • S: a sub-tree.
             • m: the number of mapped sub-tree pairs (S_{p_i}, S’_{p’_i}), i = 1..m.
             • p_i, p’_i: monotonically increasing index sequences.
  13. Tree mapping algorithm for block tracking
     • Tree Edit Distance
         • Case 2
             • A standard dynamic programming algorithm can be used to compute the mapping with the minimum edit distance.
             • For example: [figure]
  14. Tree mapping algorithm for block tracking
     • Tree Edit Distance
         • Case 3
             • If r is mapped to the root node s’ of a sub-tree S’ in T’, then
             • Dis(T,T’) = n(T’) - n(S’) + Dis(T,S’)
  15. Tree mapping algorithm for block tracking
     • Tree Edit Distance
     [Figure: labels i, j]
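
The three cases above can be combined into one recursive distance. The sketch below is illustrative only and rests on several assumptions not stated on the slides: trees are hashable `(tag, children)` tuples, two roots may be mapped only when their tags are equal, and Case 2 is realized as a sequence alignment over the root's child sub-trees in which an unmapped sub-tree costs its node count. The names `size`, `dis`, and `align_children` are hypothetical.

```python
from functools import lru_cache

# A tree is (tag, children), with children a tuple of trees (hashable for caching).


@lru_cache(maxsize=None)
def size(tree):
    """n(T): the number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def dis(t, t2):
    """Minimum distance between original tree t and evolved tree t2."""
    # Case 1: nothing is mapped, so all nodes of t are deleted and all of t2 inserted.
    best = size(t) + size(t2)
    # Case 2: root mapped to root (assumed to require equal tags); align the children.
    if t[0] == t2[0]:
        best = min(best, align_children(t[1], t2[1]))
    # Case 3: the root of t is mapped inside a sub-tree S' of t2.
    # Recursing through the children of t2 reaches every deeper sub-tree.
    for child in t2[1]:
        best = min(best, size(t2) - size(child) + dis(t, child))
    return best


def align_children(kids, kids2):
    """Order-preserving alignment of two child sequences by dynamic programming."""
    rows, cols = len(kids), len(kids2)
    # cost[i][j]: minimum cost of aligning the first i of kids with the first j of kids2
    cost = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        cost[i][0] = cost[i - 1][0] + size(kids[i - 1])
    for j in range(1, cols + 1):
        cost[0][j] = cost[0][j - 1] + size(kids2[j - 1])
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost[i][j] = min(
                cost[i - 1][j] + size(kids[i - 1]),              # sub-tree left unmapped in t
                cost[i][j - 1] + size(kids2[j - 1]),             # sub-tree left unmapped in t2
                cost[i - 1][j - 1] + dis(kids[i - 1], kids2[j - 1]),  # mapped pair
            )
    return cost[rows][cols]
```

For a block that reappears unchanged one level deeper in the new page, only Case 3 contributes: the distance equals the number of nodes in the new page that lie outside the matched sub-tree.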
  16. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Finding Fix Nodes
             • Fix Node: a node whose tag and attributes are unchanged between the two trees.
             • The tags and contents of all nodes in the original tree are indexed.
             • Duplicated nodes in the original tree are removed from the index.
             • All nodes in the new tree are checked sequentially against the index to find the Fix Nodes.
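
The Fix Node search above can be sketched as an index lookup. This is a rough illustration under stated assumptions: the `Node` class is a hypothetical DOM stand-in, a node's signature is taken to be its tag plus its sorted attributes, and a signature that is duplicated in the new tree is also skipped (the slide only mentions removing duplicates from the original tree).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DOM node: a tag, string attributes, and child nodes."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(node):
    """Yield the nodes of the tree in document (pre-order) sequence."""
    yield node
    for child in node.children:
        yield from walk(child)


def signature(node):
    """Assumed identity of a node: its tag plus its sorted attribute pairs."""
    return (node.tag, tuple(sorted(node.attrs.items())))


def find_fix_nodes(old_root, new_root):
    """Return (old_node, new_node) pairs whose signature is unambiguous in both trees."""
    old_counts = Counter(signature(n) for n in walk(old_root))
    # Index the original tree, dropping signatures that occur more than once.
    index = {signature(n): n for n in walk(old_root) if old_counts[signature(n)] == 1}
    new_counts = Counter(signature(n) for n in walk(new_root))
    pairs = []
    for node in walk(new_root):  # sequential check of the new tree
        sig = signature(node)
        if sig in index and new_counts[sig] == 1:
            pairs.append((index[sig], node))
    return pairs
```

Nodes that carry a stable `id` attribute tend to make good Fix Nodes under this signature, while repeated, attribute-free tags such as list items are filtered out as duplicates.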
  17. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Generating the Reduced Trees
             • Common Sub-Tree Pair
                 • The roots of the two sub-trees are the same Fix Node.
                 • The two sub-trees contain the same set of Fix Nodes, and none of their own sub-trees contains all of those Fix Nodes.
             • Minimal Common Sub-Tree
                 • The common sub-tree with the minimum size.
  18. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Generating the Reduced Trees
             • Finding the minimum common tree [figure]
  19. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Generating the Reduced Trees
             • First, find the minimum common tree pair that contains the traced block.
             • Second, prune away sub-trees that are intuitively unnecessary (in a rule-based fashion).
             • For each Fix Node, all of its ancestor nodes are cut off, except those that lie on the path from the root to the traced block.
  20. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Generating the Reduced Trees
  21. Tree mapping algorithm for block tracking
     • Fixed Sub-tree Based Tracing
         • Mapping on the Reduced Trees
             • After steps 1 and 2, the minimum edit distance algorithm considers only the nodes that remain in the minimum common sub-tree.
  22. Experiments and Analysis
     • Data Set
         • 25 URLs, with 101 versions of each page (one version every 30 minutes).
         • Five users select the blocks that interest them in the first version of each of the 25 URLs.
         • The users then also mark the evolved blocks in the 100 later versions.
         • In total, 12,625 blocks are marked.
  23. Experiments and Analysis
     • Metrics
         • Correct Tracing Rate (CTR)
             • Correct tracing count / Total tracing count
             • Total tracing count = 12,500
         • Correct Case Rate (CCR)
             • Correct case count / Total case count
             • Total case count = 125
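
The totals on the two slides above are consistent with one block per user per URL, and the two metrics can be computed as in the sketch below. The function name `tracing_metrics` and the input layout are hypothetical, and counting a case as correct only when every one of its tracings is correct is an assumption: the slides do not define a correct case.

```python
USERS, URLS, VERSIONS = 5, 25, 101

cases = USERS * URLS                  # one case per (user, URL) pair
marked_blocks = cases * VERSIONS      # blocks marked across all versions
tracings = cases * (VERSIONS - 1)     # each later version is traced from the first


def tracing_metrics(results):
    """Return (CTR, CCR) for results: case id -> list of booleans, one per tracing."""
    total = sum(len(outcomes) for outcomes in results.values())
    correct = sum(sum(outcomes) for outcomes in results.values())
    # Assumption: a case is correct only if all of its tracings are correct.
    correct_cases = sum(all(outcomes) for outcomes in results.values())
    return correct / total, correct_cases / len(results)
```

Under that assumption CCR is the stricter of the two metrics, because a single failed tracing makes the whole case incorrect.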
  24. Experiments and Analysis
     • CTR and CCR
         • DPF (Direct Path Finding)
         • TSM (Tag String Matching)
         • TED (Tree Edit Distance)
         • FSBT (Fixed Sub-tree Based Tracing)
  25. Experiments and Analysis
     • The size (number of nodes) and change rate (block position) of the web page have little impact on the algorithm, which supports its scalability.
  26. Experiments and Analysis
     • Computational Cost
     [Figure: computational cost; axis units are milliseconds and kilobytes]
  27. Conclusion
     • Homepage Live, a novel application for tracing blocks of interest across different web pages, has been proposed.
     • Tree edit distance is used to trace a block when its page is updated.
     • With the ability to recognize and trace web blocks automatically, sections or gadgets can be developed for personalized homepages.
