Introduction to Information Retrieval Chapter 10


  1. 1. Chapter 10: XML retrieval
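  2. 2. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval • http://nlp.stanford.edu/IR-book/ppt/10xml.pptx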
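  3. 3. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval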
  4. 4. RDB • RDB • IR • RDB
  5. 5. (Structured Retrieval) • or DB (named entity tagging) • 4,405,829 RSA
  6. 6. RDB • 3 – (DB) – • tours AND (COUNTRY: Vatican OR LANDMARK: Coliseum)? • tours AND (STATE: Vatican OR BUILDING: Coliseum)?
  7. 7. • XML – → XML – (HTML, SGML, …)
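  8. 8. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval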
  9. 9. XML
      <play>
        <author>Shakespeare</author>
        <title>Macbeth</title>
        <act number="I">
          <scene number="vii">
            <title>Macbeth’s castle</title>
            <verse>Will I with wine …</verse>
          </scene>
        </act>
      </play>
      • XML tags (e.g. <title…>, </title…>)
      • XML attributes (e.g. number)
      • attribute values (e.g. vii)
      • elements (e.g. title, verse)
  10. 10. XML
      root element play
        element author
          text Shakespeare
        element act
          attribute number="I"
          element scene
            attribute number="vii"
            element verse
              text Will I with wine …
            element title
              text Macbeth’s castle
        element title
          text Macbeth
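
The tree on this slide can be reproduced from the XML on slide 9. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `describe` and the printed labels are assumptions chosen to mirror the slide's "element / attribute / text" node types, not part of the original deck.

```python
import xml.etree.ElementTree as ET

# The Macbeth example from slide 9 (verse text abbreviated as on the slide).
MACBETH_XML = """
<play>
  <author>Shakespeare</author>
  <title>Macbeth</title>
  <act number="I">
    <scene number="vii">
      <title>Macbeth's castle</title>
      <verse>Will I with wine ...</verse>
    </scene>
  </act>
</play>
"""


def describe(element, depth=0):
    """Return indented lines listing element, attribute and text nodes."""
    pad = "  " * depth
    lines = [f"{pad}element {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f'{pad}  attribute {name}="{value}"')
    text = (element.text or "").strip()
    if text:  # skip whitespace-only text between child elements
        lines.append(f"{pad}  text {text}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(MACBETH_XML.strip())
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Children are listed in document order here, so `title` (Macbeth) appears before `act`, unlike the left-to-right layout of the figure.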
  13. 13. XML • XML Document Object Model (XML DOM): DOM; DOM API, XML • XPath: XML, e.g. NEXI • Schema: XML, e.g. (scene), (act) • XML: XML DTD (document type definition), XML Schema
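
As an illustration of path expressions over the Macbeth document, here is a small sketch using the limited XPath subset that Python's `xml.etree.ElementTree` supports. It is not NEXI and not a full XPath engine; the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<play><author>Shakespeare</author><title>Macbeth</title>"
    '<act number="I"><scene number="vii">'
    "<title>Macbeth's castle</title><verse>Will I with wine ...</verse>"
    "</scene></act></play>"
)

# title elements at any depth below the root, in document order.
all_titles = [e.text for e in doc.findall(".//title")]

# Only titles that are direct children of a scene element.
scene_titles = [e.text for e in doc.findall(".//scene/title")]

# Verses of the scene whose number attribute equals "vii".
verses = [e.text for e in doc.findall(".//scene[@number='vii']/verse")]

print(all_titles)
print(scene_titles)
print(verses)
```

The first query matches both the play title and the scene title, which is the ambiguity a structural constraint such as `scene/title` resolves.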
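  14. 14. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval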
  15. 15. XML1.  !2.  !3.  !
  16. 16. • /XML (i.e., XML) • Macbeth’s castle, scene, act – scene – Macbeth, play
  17. 17. • E.g. query: title:Macbeth • Macbeth tragedy, Macbeth title, Act I, Scene vii, Macbeth’s castle title, 10.2 – tragedy title
  18. 18. !•  ! –  ! –  ! 1.  ! 2.  ! 3.  ! 4. 
  19. 19. •  !•  !•  !
  20. 20. •  ! 1.  E.g.! book! ! 2.  !•  book
  21. 21. •  ! –  ! –  !
  22. 22. • – XML, e.g. ISBN – Macbeth’s castle • Macbeth’s castle: play, act, scene, title
  23. 23. ! !•  ! –  ! –  XML ! –  ! –  !• 
  24. 24. ! ! ! •  ! •  !!  1:! !!  2:! !
  25. 25. • (idf) • author Gates, gate Gates • XML-context/term ( / ) • XML-context df • x x
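  26. 26. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval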
  27. 27. • XML [Figure: trees with nodes Book, Title, Author and terms Microsoft, Bill, Gates]
  28. 28. 1. Bill Gates: Bill, Gates 2. [Figure: trees with nodes Book, Title, Author and terms Microsoft, Bill, Gates]
  29. 29. • E.g. vs. XML
  30. 30. • – e.g. author, title, Gates – • XML-context/term: an XML-context/term pair is called a structural term, written <c, t>: XML context c, term t
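
To make the structural term <c, t> concrete, the sketch below extracts context/term pairs from the Book example of the preceding slides. Representing a context as a slash-joined element path, emitting one pair per path suffix, and lowercasing terms are all assumptions made for illustration; the function name `structural_terms` is hypothetical.

```python
import xml.etree.ElementTree as ET


def structural_terms(element, path=()):
    """Yield (context, term) pairs for every word in every text node."""
    path = path + (element.tag,)
    for word in (element.text or "").split():
        # Assumption: emit one pair per suffix of the root-to-node path,
        # e.g. both "book/author" and "author" for a word under author.
        for start in range(len(path)):
            yield "/".join(path[start:]), word.lower()
    for child in element:
        yield from structural_terms(child, path)


book = ET.fromstring(
    "<book><title>Microsoft</title><author>Bill Gates</author></book>"
)
pairs = sorted(set(structural_terms(book)))
for context, term in pairs:
    print(f"<{context}, {term}>")
```

Splitting "Bill Gates" into two words before pairing corresponds to step 1 on slide 28.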
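  31. 31. Context resemblance (CR) • similarity of a query path cq and a document path cd: $\mathrm{CR}(c_q, c_d) = \frac{1 + |c_q|}{1 + |c_d|}$ if $c_q$ matches $c_d$, and $0$ otherwise • |cq| and |cd|: number of nodes in the query path and in the document path • cq matches cd if and only if cq can be transformed into cd by inserting additional nodes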
  32. 32. CR(cq4, cd2) = 3/4 = 0.75. If q and d are identical, CR(cq, cd) = 1.0.
  33. 33. CR(cq4, cd3) = 3/5 = 0.6.
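
The two values above can be checked with a short sketch. Paths are modelled as lists of node labels, and "matches" is implemented as a subsequence test (cq can be turned into cd by inserting nodes). The concrete paths below are assumptions chosen only to have 2, 3 and 4 nodes, the lengths implied by 3/4 and 3/5; the figures that defined cq4, cd2 and cd3 did not survive extraction.

```python
def matches(cq, cd):
    """True if cq can be transformed into cd by inserting additional nodes,
    i.e. cq is a (not necessarily contiguous) subsequence of cd."""
    remaining = iter(cd)
    return all(node in remaining for node in cq)


def context_resemblance(cq, cd):
    """CR(cq, cd) = (1 + |cq|) / (1 + |cd|) if cq matches cd, else 0."""
    if not matches(cq, cd):
        return 0.0
    return (1 + len(cq)) / (1 + len(cd))


# Hypothetical paths with 2, 3 and 4 nodes respectively.
cq4 = ["book", "Gates"]
cd2 = ["book", "creator", "Gates"]
cd3 = ["book", "author", "lastname", "Gates"]

print(context_resemblance(cq4, cd2))
print(context_resemblance(cq4, cd3))
print(context_resemblance(cq4, cq4))
print(context_resemblance(["title", "Gates"], cd2))
```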
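  34. 34. • SIMNOMERGE: $\mathrm{SimNoMerge}(q, d) = \sum_{c_k \in B} \sum_{c_l \in B} \mathrm{CR}(c_k, c_l) \sum_{t \in V} \mathrm{weight}(q, t, c_k) \, \frac{\mathrm{weight}(d, t, c_l)}{\sqrt{\sum_{c \in B,\, t \in V} \mathrm{weight}^2(d, t, c)}}$ • V: vocabulary of non-structural terms • B: set of all XML contexts • weight(q, t, c), weight(d, t, c): weights of term t in XML context c in query q and document d (e.g. idf_t · wf_{t,d}; idf_t depends on which elements are used to compute df_t) • SIMNOMERGE(q, d) is not a true cosine measure: its value can be larger than 1.0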
  35. 35. SimNoMerge: SCOREDOCUMENTSWITHSIMNOMERGE(q, B, V, N, normalizer)
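
The body of the algorithm on this slide was an image and is not in the transcript. The following is a sketch of SimNoMerge-style scoring, assuming a postings index keyed by structural term; the function signature, the dictionary layouts and the toy data are assumptions for illustration and differ from the slide's (q, B, V, N, normalizer) parameter list.

```python
from collections import defaultdict


def context_resemblance(cq, cd):
    """(1 + |cq|) / (1 + |cd|) if cq is a subsequence of cd, else 0."""
    remaining = iter(cd)
    if not all(node in remaining for node in cq):
        return 0.0
    return (1 + len(cq)) / (1 + len(cd))


def score_documents(query, index, normalizer):
    """Score documents against a query of weighted structural terms.

    query:      {(context, term): weight}, context is a tuple of labels
    index:      {(context, term): [(doc_id, weight), ...]} postings lists
    normalizer: {doc_id: length normalizer for that document}
    """
    contexts = {c for (c, _term) in index}  # plays the role of B
    scores = defaultdict(float)
    for (cq, term), wq in query.items():
        for c in contexts:
            cr = context_resemblance(cq, c)
            if cr > 0:
                # Accumulate over postings of the document-side pair <c, term>.
                for doc_id, wd in index.get((c, term), []):
                    scores[doc_id] += cr * wq * wd
    return {doc_id: s / normalizer[doc_id] for doc_id, s in scores.items()}


# Toy data (hypothetical): doc 1 has "gates" under book/author,
# doc 2 has "gates" under book/title.
index = {
    (("book", "author"), "gates"): [(1, 2.0)],
    (("book", "title"), "gates"): [(2, 1.0)],
}
query = {(("author",), "gates"): 1.0}
normalizer = {1: 2.0, 2: 1.0}

scores = score_documents(query, index, normalizer)
print(scores)
```

Document 2 receives no score because the query context `author` does not match `book/title`, so its context resemblance is 0.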
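  36. 36. 10.0 Introduction • 10.1 Basic XML concepts • 10.2 Challenges in XML retrieval • 10.3 A vector space model for XML retrieval • 10.4 Evaluation of XML retrieval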
  37. 37. XML (INEX) • INEX: XML • INEX 2002: IEEE, 12,000 • 2006: Wikipedia
      INEX 2002 collection statistics
        12,107       number of documents
        494 MB       size
        1995-2002    time of publication of articles
        1,532        average number of XML nodes per document
        6.9          average depth of a node
        30           number of CAS topics
        30           number of CO topics
  38. 38. INEX • 2 / 1. (CO) 2. (CAS) • CAS
  39. 39. INEX • 1. Content-only or CO 2. Content-and-structure or CAS • CAS
  40. 40. INEX • INEX 2002 • component coverage: 1. E: exact coverage 2. S: too small 3. L: too large 4. N: no coverage
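  41. 41. INEX • topical relevance: highly relevant (3), fairly relevant (2), marginally relevant (1), nonrelevant (0) • digit-letter codes (e.g. 3E → highly relevant (3), exact coverage (E)): 2S, 3E, 3N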
  42. 42. INEX•  !•  XML Q / A
  43. 43. INEX•  2 ! $•  INEX ! !•  INEX !
  44. 44. XML, XML • XML • XML • XML • XML: XQuery (W3C)
  45. 45. •  ( )XML IR !•  ex.! !•  10 !
