Progress Report 2010/7/29 Shu-Ying Li
Associated Information extraction

Associated information
- There are two possible cases:
  - Two different layouts in a web page.
  - No obvious boundary among extracted addresses.
- We run a parser and Tidy on each web page and consider the following two fields:
  - Path based on number
  - Terminal value
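The two fields above could be collected with Python's standard `html.parser` module. This is a minimal sketch, not the author's pipeline. It assumes the page has already been cleaned by Tidy into well-formed HTML, and it assumes that a "path based on number" is the sequence of 1-based child-element indices from the document root down to the element that holds the text. The class name `PathExtractor` and the `VOID` set are illustrative.

```python
from html.parser import HTMLParser

# Void elements never receive an end tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PathExtractor(HTMLParser):
    """Collect (numeric path, terminal value) pairs from well-formed HTML.

    Assumption: a path is the tuple of 1-based child-element indices from
    the root to the element that directly contains the text.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # index of each currently open element
        self.counts = [0]  # element children seen so far at each depth
        self.records = []  # (path tuple, terminal value) in document order

    def handle_starttag(self, tag, attrs):
        self.counts[-1] += 1           # this element is the next child
        if tag in VOID:
            return                     # counted, but never opened
        self.stack.append(self.counts[-1])
        self.counts.append(0)          # start counting its own children

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        self.stack.pop()
        self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:                       # skip whitespace between tags
            self.records.append((tuple(self.stack), text))


extractor = PathExtractor()
extractor.feed("<html><body><p>Information1</p>"
               "<div><p>x</p><p>1410 Pines Road</p></div></body></html>")
extractor.close()
for path, value in extractor.records:
    print(path, value)
```

Storing paths as tuples rather than concatenated digit strings keeps indices of 10 or more unambiguous.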

Method
- Find the position of address a1 and record its path p_a.
- For each path_i:
  - If path_i matches p_a, extract the corresponding information.
  - If path_i matches p_a and length(path_i) > length(p_a), extract the corresponding information.
- Example (a1 is the address in the last row; p_a is its path):

  Path based on number    Terminal value
  12811111432             Information1
  128111114324            information2
  12811111432             1410 Pines Road, Oregon, IL, 61061, USA
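The matching rule above can be sketched in Python. This is a hedged reading, not the author's code. Paths are tuples of child indices; the hypothetical helper `digits_to_path` converts strings such as `12811111432` by treating every digit as one index, which only holds if no index exceeds 9. The second rule, "path_i matches p_a and length(path_i) > length(p_a)", is read here as "p_a is a proper prefix of path_i", which agrees with the example row `128111114324`. The function name `extract_associated` is also hypothetical.

```python
def digits_to_path(digits):
    """Hypothetical helper: '128' -> (1, 2, 8).

    Assumes every child index on the slide is a single digit.
    """
    return tuple(int(d) for d in digits)


def extract_associated(records, address):
    """Return the terminal values whose paths match the path of `address`.

    records: (path tuple, terminal value) pairs in document order.
    Rule 1: path_i equals p_a.
    Rule 2 (assumed reading): p_a is a proper prefix of path_i.
    """
    # Find the position of address a1 and record its path p_a.
    p_a = next((path for path, value in records if value == address), None)
    if p_a is None:
        return []

    extracted = []
    for path_i, value in records:
        same_path = path_i == p_a
        deeper_path = len(path_i) > len(p_a) and path_i[:len(p_a)] == p_a
        if same_path or deeper_path:
            extracted.append(value)
    return extracted


records = [
    (digits_to_path("12811111432"), "Information1"),
    (digits_to_path("128111114324"), "information2"),
    (digits_to_path("12811111432"), "1410 Pines Road, Oregon, IL, 61061, USA"),
]
print(extract_associated(records, "1410 Pines Road, Oregon, IL, 61061, USA"))
# → ['Information1', 'information2', '1410 Pines Road, Oregon, IL, 61061, USA']
```

Paths that are shorter than p_a, or of equal length but different, are not extracted.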

Case 1: No obvious boundary among extracted addresses (1/2)

Case 1: No obvious boundary among extracted addresses (2/2)

Case 2: Two different layouts in a web page (1/2)

Case 2: Two different layouts in a web page (2/2)
(Figure: two items marked "miss")

Case 3

Case 3
(Figure: one item marked "miss")

Discussion
- Using ANNIE annotations:
  - Organization
  - Person
  - Date
  - Address
  - Location

Discussion: Using ANNIE annotation
(Figure labels: Organization, Address, Location, Address, Organization, Location)

Discussion: Using ANNIE annotation
(Figure labels: Date, Organization, Organization, Organization, Organization, Person, Person, Person, Location)

Discussion: Using ANNIE annotation
(Figure labels: Organization, Date, Location, Organization, Location, Date, Person, Organization, Location, Date)
