Enriching the semantic web tutorial session 1

Tutorial at ESWC 2011 with John McCrae and Elena Montiel-Ponsoda

Published in: Education
  1. Session 1: NLP and the Multilingual Semantic Web: Challenges and Opportunities
     Tobias Wunner
     Digital Enterprise Research Institute (DERI)
     National University of Ireland, Galway (NUIG)
  2. What’s on the Web?
     • Wikipedia
       • 250 languages
       • less than 25% in English
     [Chart: 2001 to 2011, with marked values 1M and 3.5M]
     From: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
  3. What’s on the Web?
     • HudongBaike Chinese Encyclopedia
       • 3.9m Chinese articles
     Google Translate
  4. Language use on the Web
     • Term variations
       • Rarely used term variation: 300k results
       • Widely accepted term: 7m, i.e. more results
  5. Language use on the Web
     • Linguistic variations “as Gaeilge”
       Irish cases for the word “medicine”: singular, genitive plural
  6. Language use on the Web
     • Linguistic variations: syntactic
       TermLeasePaym + NOUN
       TermLeasePaym + ADJECTIVE
       thirty times more results
  7. Language use on the Web
     • Linguistic variations: morphological (German compounding)
       ADJECTIVE + NOUN (delayed Paym): verspätet (de) / delayed (en) + Zahlung (de) / payment (en)
       COMPOUND (PaymDelay): Zahlung (de) / payment (en) + Verspätung (de) / delay (en)
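
The compounding case above can be sketched as a toy decompounder. This is only an illustration under stated assumptions: the lexicon holds just the two nouns from the slide, the list of linking elements is a simplification, and the compound "Zahlungsverspätung" is a hypothetical form built from those two nouns, not a term taken from the tutorial.

```python
# Toy lexicon: only the two German nouns shown on the slide (assumption).
LEXICON = {"zahlung": "payment", "verspätung": "delay"}

# A few common German linking elements; the empty string means "no linker".
LINKING_ELEMENTS = ("", "s", "es", "n", "en")


def split_compound(word):
    """Return (modifier, head) if word splits into two lexicon nouns, else None."""
    lower = word.lower()
    for i in range(1, len(lower)):
        left, right = lower[:i], lower[i:]
        if right not in LEXICON:
            continue
        for link in LINKING_ELEMENTS:
            # Strip the linking element from the end of the left part.
            stem = left[: len(left) - len(link)] if link else left
            if left.endswith(link) and stem in LEXICON:
                return stem, right
    return None


def gloss(word):
    """Gloss a two-part compound word by word using the toy lexicon."""
    parts = split_compound(word)
    if parts is None:
        return None
    modifier, head = parts
    return f"{LEXICON[modifier]} {LEXICON[head]}"


if __name__ == "__main__":
    print(split_compound("Zahlungsverspätung"))
    print(gloss("Zahlungsverspätung"))
```

A real system would need a full lexicon and a treatment of ambiguous splits; the point here is only that a compound and an adjective + noun phrase can map to the same concept.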
  8. The Semantic Web
     • Structured data in triples: <Subject> <Predicate> <Object>
     • Resources identified by URI (Uniform Resource Identifier)
     URI = http://dbpedia.org/resource/TCM
     dbpedia:TCM (Turtle)
     Linguistic and semantic information on the Semantic Web!
     DBpedia RDFS label and OWL sameAs relationship
     dbpedia:TCM rdfs:label "Traditional Chinese Medicine"@en
     dbpedia:TCM rdfs:label "Medicina Tradicional China"@es
     dbpedia:TCM owl:sameAs dbpedia:TraditionalChineseMedicine
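
A minimal sketch of the triple model, using plain Python tuples rather than an RDF library. The three triples mirror the ones on the slide (with the Spanish label written out as "Medicina Tradicional China"); the helper names `labels` and `same_as` are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# A language-tagged literal such as "..."@en.
Literal = namedtuple("Literal", ["text", "lang"])

# (subject, predicate, object) triples in prefixed-name form.
TRIPLES = [
    ("dbpedia:TCM", "rdfs:label", Literal("Traditional Chinese Medicine", "en")),
    ("dbpedia:TCM", "rdfs:label", Literal("Medicina Tradicional China", "es")),
    ("dbpedia:TCM", "owl:sameAs", "dbpedia:TraditionalChineseMedicine"),
]


def labels(subject, lang=None, triples=TRIPLES):
    """Return rdfs:label texts for subject, optionally filtered by language tag."""
    return [
        obj.text
        for s, p, obj in triples
        if s == subject
        and p == "rdfs:label"
        and isinstance(obj, Literal)
        and (lang is None or obj.lang == lang)
    ]


def same_as(subject, triples=TRIPLES):
    """Return the resources linked to subject via owl:sameAs."""
    return [o for s, p, o in triples if s == subject and p == "owl:sameAs"]


if __name__ == "__main__":
    print(labels("dbpedia:TCM", lang="es"))
    print(same_as("dbpedia:TCM"))
```

The language tag lives on the literal, not on the resource, which is why one URI can carry labels in several languages.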
  9. The Semantic Web
     • …is multilingual
     Multilingual literals (STW, German thesaurus for economics)
     Multilingual vocabularies (Rechtspraak.nl, Dutch case-law dataset)
  10. Language use on the Web
      • Different resources use different labeling mechanisms!
      • To some extent there is no linguistic right or wrong
      --> Standards (formal agreements)
      MeSH (Medical Subject Headings)
      From http://www.nlm.nih.gov/mesh/MBrowser.html
  11. What’s on the Semantic Web?
      • How to search?
        • Semantic Web query language (SPARQL)
        • Semantic Web search engines
  12. What’s on the Semantic Web?
      • How to search with SPARQL?
        • Match a pattern on the graph of triples
        • Choose a labeling mechanism, e.g.
          • …from the RDFS vocabulary (label)
          • …from the SKOS vocabulary (preferred label)
          • …other
  13. What’s on the Semantic Web?
      • How to search with SPARQL?
        • Match a pattern on the graph of triples
        • Choose the predicate according to the labeling mechanism
        • Query on the literal value
      <Subject> <Predicate> <Object>
      Resource rdfs:label "Traditionelle chinesische Medizin"@de
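
The three steps above (match a triple pattern, pick the labeling predicate, query on the literal) can be sketched as a small query builder. The function name `build_label_query` and the variable `?resource` are assumptions made for this sketch; the query text itself is standard SPARQL and would still need to be sent to an endpoint of your choice.

```python
def build_label_query(label, lang, predicate="rdfs:label"):
    """Build a SPARQL SELECT for resources whose label equals label@lang.

    predicate selects the labeling mechanism, e.g. rdfs:label or skos:prefLabel.
    """
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?resource WHERE {\n"
        f'  ?resource {predicate} "{escaped}"@{lang} .\n'
        "}"
    )


if __name__ == "__main__":
    print(build_label_query("Traditionelle chinesische Medizin", "de"))
    print(build_label_query("Traditional Chinese Medicine", "en", "skos:prefLabel"))
```

Switching `predicate` is the whole point: the same literal is invisible to a query that assumes the wrong labeling vocabulary.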
  14. What’s on the Semantic Web?
      • How to search with Sindice?
        • Query all literals for the Greek-encoded string “Χερσόνησος”
  15. What’s on the Semantic Web?
      • How to search with Sindice?
        • Query all literals for the Chinese-encoded string “中医”
      Results
      <http://raynix.cn/…> dc:title "极客路线中医"
      ...
  16. What’s on the Semantic Web?
      • How to search for terms embedded in URIs?
        • Example: “all resources with the word traditional”
      dbpedia:TraditionalChineseMedicine
      dbpedia:TraditionalIrishMusic
      dbpedia:IrishTraditionalMusic
      ...
      with a SPARQL filter
      select ?subject where {
        ?subject ?predicate ?object
        filter regex(str(?subject), "traditional", "i")
      }
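
The same idea can be tried offline with Python's `re` module. This is a sketch that only mimics the filter on a hand-written list: the three matching URIs are the slide's examples expanded with the http://dbpedia.org/resource/ prefix, and the helper name `filter_uris` is hypothetical. Matching is case-insensitive by default because the URIs capitalise "Traditional".

```python
import re

# Slide examples expanded to full URIs, plus one non-matching resource.
URIS = [
    "http://dbpedia.org/resource/TraditionalChineseMedicine",
    "http://dbpedia.org/resource/TraditionalIrishMusic",
    "http://dbpedia.org/resource/IrishTraditionalMusic",
    "http://dbpedia.org/resource/TCM",
]


def filter_uris(uris, pattern, ignore_case=True):
    """Keep the URIs in which pattern matches anywhere in the string."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    return [u for u in uris if rx.search(u)]


if __name__ == "__main__":
    for uri in filter_uris(URIS, "traditional"):
        print(uri)
```

Note that dbpedia:TCM is never found this way: a term search over URIs only works when the term is spelled out in the identifier.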
  17. What’s on the Semantic Web?
      • How to search for embedded terms?
        • Example: “all resources with the word traditional”
      dbpedia:TraditionalChineseMedicine
      dbpedia:TraditionalIrishMusic
      dbpedia:IrishTraditionalMusic
      ...
      with Sindice star-shaped queries (SIREn)
      Results
  18. NLP for the Semantic Web
      Multilingual/Ontology-based Information Extraction (BioCaster, OpenCalais)
      Ontology Localisation (LabelTranslator)
      Ontology-based Natural Language Generation (CLANN)
  19. Multilingual/Ontology-based Information Extraction (BioCaster)
      http://born.nii.ac.jp
      concept = measles
      • Aggregates and processes health news
      • Annotates news based on a multilingual ontology
      • Uses a proprietary format and SKOS-XL to maintain terminology
      …
  20. Multilingual/Ontology-based Information Extraction (BioCaster)
      • Example: “Risk of measles outbreak in Malta unlikely…”
        measles: [DISEASE]
        Malta: [COUNTRY]
      http://born.nii.ac.jp
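
A toy version of this kind of annotation can be written as a gazetteer lookup. This is not BioCaster's method, only a sketch under stated assumptions: the two-entry gazetteer and the function name `annotate` are invented for the illustration, and matching is a plain case-insensitive word lookup.

```python
import re

# Hypothetical two-entry gazetteer covering only the slide's example.
GAZETTEER = {"measles": "DISEASE", "malta": "COUNTRY"}


def annotate(text, gazetteer=GAZETTEER):
    """Return (surface form, label, start offset) for each gazetteer hit, in text order."""
    hits = []
    for match in re.finditer(r"\w+", text):
        label = gazetteer.get(match.group().lower())
        if label:
            hits.append((match.group(), label, match.start()))
    return hits


if __name__ == "__main__":
    sentence = "Risk of measles outbreak in Malta unlikely"
    for surface, label, start in annotate(sentence):
        print(f"{surface} [{label}] at offset {start}")
```

Moving to another language or domain means replacing the gazetteer and any extraction rules, which is the adaptation cost the tutorial highlights.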
  21. Multilingual/Ontology-based Information Extraction (BioCaster)
      • Challenges
        • Multilingual adaptation
        • Adaptation of information extraction rules to other domains
        • Use of a proprietary format is undesirable
  22. Multilingual Information Extraction (OpenCalais)
      • Semantic markup of unstructured text
      • Multilingual (English, French, Spanish)
      • English
        • 39 entities
        • 75 relations
  23. Multilingual Information Extraction (OpenCalais)
      • Domain tuned (Finance, Biomedical)
      • Only 15 base entities for non-English, no relations
      • Demo
      http://viewer.opencalais.com
  24. Multilingual Information Extraction (OpenCalais)
      • Challenges
        • Multilingual adaptation of lexicon and extraction rules
        • Domain adaptation of lexicon and extraction rules
  25. Ontology Localisation (LabelTranslator)
      • Multilingual ontology editor
      • Linguistic annotations (number, part of speech, gender)
      • … for a better translation
      Number + Gender
      part of speech
  26. Ontology Localisation (LabelTranslator)
      “river”@en
      “rivière”@fr
      “fleuve”@fr
      Ambiguous!
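
One way to resolve this ambiguity is to consult a semantic feature of the concept being labelled. The sketch below is not LabelTranslator's algorithm: the property name `flowsInto`, the `CANDIDATES` table and the function `localize` are hypothetical. It relies on the usual French distinction that a "fleuve" flows into the sea while a "rivière" flows into another river.

```python
# Candidate translations with the semantic constraint each one requires.
CANDIDATES = {
    ("river", "fr"): [
        {"label": "fleuve", "requires": {"flowsInto": "sea"}},
        {"label": "rivière", "requires": {"flowsInto": "river"}},
    ]
}


def localize(label, target_lang, features):
    """Return the candidate labels compatible with the concept's features.

    If no candidate's constraints are satisfied, every candidate is returned,
    i.e. the label stays ambiguous.
    """
    options = CANDIDATES.get((label, target_lang), [])
    matching = [
        o["label"]
        for o in options
        if all(features.get(k) == v for k, v in o["requires"].items())
    ]
    return matching if matching else [o["label"] for o in options]


if __name__ == "__main__":
    print(localize("river", "fr", {"flowsInto": "sea"}))
    print(localize("river", "fr", {}))
```

Without the domain model the function can only hand back both candidates, which is exactly the "Ambiguous!" situation on the slide.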
  27. Ontology Localisation (LabelTranslator)
      • Challenges
        • Use linguistic features in the lexicon for better machine translation
        • Use semantic features from the domain model as well
  28. Natural Language Generation (CLANN)
      • Controlled Language ANNotations (CLANN)
      • To write domain-specific grammars (meeting minutes)
      • Intermediate representation
      Domain ontology (e.g. meeting minutes)
      MLink Grammar
      LinkedGrammar
  29. Natural Language Generation (CLANN)
      • Example
      “John will present lemon model.”
      parse tree (dependency labels: aux, nsubj, dobj)
      parse tree (abstract), in MLINK:
      :Sentence1 :hasRootNode [
        rdf:type :TextNode ;
        :hasText "present" ;
        :hasSubType :Verb ;
        :hasObject [ rdf:type :TextNode ;
          :hasText "model" ;
          :hasObjectModifier [ rdf:type :TextNode ;
            :hasText "lemon"
          ] ] ] .
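
To show how such an abstract tree could be turned back into words, here is a minimal linearisation sketch. It is not the CLANN or MLINK generation procedure: the nested dictionary only mirrors the nodes visible in the snippet (root verb, object, object modifier), the key names are hypothetical, and the word order rule (modifier before head, object after verb) is an assumption that suits English.

```python
# Nested dict mirroring the RDF snippet: root verb -> object -> modifier.
TREE = {
    "text": "present",
    "subtype": "Verb",
    "object": {"text": "model", "modifier": {"text": "lemon"}},
}


def linearize(node):
    """Flatten a node to words: its modifier, then its own text, then its object."""
    words = []
    if "modifier" in node:
        words += linearize(node["modifier"])
    words.append(node["text"])
    if "object" in node:
        words += linearize(node["object"])
    return words


if __name__ == "__main__":
    print(" ".join(linearize(TREE)))
```

The subject and auxiliary of the example sentence are not part of the snippet, so the sketch only covers the verb phrase.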
  30. Natural Language Generation (CLANN)
      • Challenges
        • From text to triples?
        • Domain adaptation (meeting minutes)
        • Multilingual adaptation
  31. Summary
      • The Web and the Semantic Web are
        • “Lingual” (variations within one language)
        • Multilingual (between languages and cultures)
      • NLP applications need domain and multilingual adaptation
        • Lexicon updates / extensions
        • Extraction rule updates / extensions
      • What do we need?
        • Efficient adaptation and sharing of linguistic resources between ontology-based NLP applications
  32. Links and resources
      • Tutorial website
        • http://tiny.cc/tvzlc
      • The Monnet Project
        • Multilingual Ontologies for Networked Knowledge
        • http://www.monnet-project.eu/
      • Lexinfo
        • http://lexinfo.net/
