Enriching the semantic web tutorial session 1

Tutorial at ESWC 2011 with John McCrae and Elena Montiel-Ponsoda


Published in: Education


Transcript

  • Session 1: NLP and the Multilingual Semantic Web: Challenges and Opportunities
    Tobias Wunner
    Digital Enterprise Research Institute (DERI)
    National University of Ireland, Galway (NUIG)
  • What’s on the Web?
    • Wikipedia
    • 250 languages
    • less than 25% in English
    [Chart: article counts, 2001 to 2011; labelled values 1M and 3.5M]
    From: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons
  • What’s on the Web?
    • HudongBaike Chinese Encyclopedia
    • 3.9M Chinese articles
    Google Translate
  • Language use on the Web
    • Term variations
    Rarely used term variation: 300k results
    Widely accepted term: 7m results (more results)
  • Language use on the Web
    • Linguistic variations “as Gaeilge” (in Irish)
    Irish case forms of the word “medicine”
    Singular
    Genitive plural
  • Language use on the Web
    • Linguistic variations - syntactic
    TermLeasePaym + NOUN
    TermLeasePaym + ADJECTIVE
    thirty times more results
  • Language use on the Web
    • Linguistic variations – morphological (German compounding)
    ADJECTIVE + NOUN (delayed Paym)
      ADJECTIVE: verspätet (de), delayed (en)
      NOUN: Zahlung (de), payment (en)
    COMPOUND (PaymDelay)
      NOUN: Zahlung (de), payment (en)
      NOUN: Verspätung (de), delay (en)
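
The compounding pattern above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the two-noun lexicon is taken from the slide, the compound form "Zahlungsverspätung" (with the linking element "s") is an assumption rather than something the slide spells out, and `split_compound` and `gloss` are hypothetical helper names.

```python
# Mini-lexicon with the two nouns from the slide; a real system would load a full lexicon.
NOUNS = {"Zahlung": "payment", "Verspätung": "delay"}

# Assumption: only the empty linking element and "s" are considered.
LINKING_ELEMENTS = ("", "s")


def split_compound(word):
    """Return (modifier, head) if word is modifier + linking element + head, else None."""
    lowered = word.lower()
    for modifier in NOUNS:
        for link in LINKING_ELEMENTS:
            prefix = (modifier + link).lower()
            if not lowered.startswith(prefix):
                continue
            rest = lowered[len(prefix):]
            for head in NOUNS:
                if head.lower() == rest:
                    return modifier, head
    return None


def gloss(word):
    """Gloss a two-noun compound in English, modifier first, or return None."""
    parts = split_compound(word)
    if parts is None:
        return None
    modifier, head = parts
    return f"{NOUNS[modifier]} {NOUNS[head]}"
```

A search component could use such a split to relate the compound to the ADJECTIVE + NOUN variant of the same term; a simple noun such as "Zahlung" is not split.
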
  • The Semantic Web
    • Structured data in triples: <Subject> <Predicate> <Object>
    • Resources identified by URI (Uniform Resource Identifier)
    URI = http://dbpedia.org/resource/TCM
    dbpedia:TCM (Turtle)
    Linguistic and semantic information on the Semantic Web!
    DBpedia: RDFS label and OWL sameAs relationship
    dbpedia:TCM rdfs:label "Traditional Chinese Medicine"@en .
    dbpedia:TCM rdfs:label "Medicina Tradicional China"@es .
    dbpedia:TCM owl:sameAs dbpedia:TraditionalChineseMedicine .
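
As a rough illustration of the triple model, the statements on this slide can be held as plain (subject, predicate, object) tuples and queried by language tag. This is a minimal sketch, not a real RDF library: `Literal`, `label` and `same_as` are hypothetical names, and the spelling of the Spanish label is an assumption.

```python
from typing import NamedTuple, Optional

DBPEDIA = "http://dbpedia.org/resource/"  # namespace behind the dbpedia: prefix


class Literal(NamedTuple):
    """A language-tagged literal, written "text"@lang in Turtle."""
    text: str
    lang: str


# Statements modelled on the slide (Spanish spelling is an assumption).
TRIPLES = [
    (DBPEDIA + "TCM", "rdfs:label", Literal("Traditional Chinese Medicine", "en")),
    (DBPEDIA + "TCM", "rdfs:label", Literal("Medicina Tradicional China", "es")),
    (DBPEDIA + "TCM", "owl:sameAs", DBPEDIA + "TraditionalChineseMedicine"),
]


def label(subject: str, lang: str) -> Optional[str]:
    """Return the rdfs:label of subject in the given language, if any."""
    for s, p, o in TRIPLES:
        if s == subject and p == "rdfs:label" and isinstance(o, Literal) and o.lang == lang:
            return o.text
    return None


def same_as(subject: str) -> list:
    """Return all resources declared identical to subject via owl:sameAs."""
    return [o for s, p, o in TRIPLES if s == subject and p == "owl:sameAs"]
```

Asking for a language that has no label returns `None`, which is the situation a multilingual application has to handle.
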
  • The Semantic Web
    • …is multilingual
    Multilingual literals (STW, German thesaurus for economics)
    Multilingual vocabularies (Rechtspraak.nl, Dutch case law dataset)
  • Language use on the Web
    • Different resources, different labeling mechanisms!
    • To some extent, there is no linguistic right or wrong
    --> Standards (formal agreements)
    MeSH (Medical Subject Headings)
    From http://www.nlm.nih.gov/mesh/MBrowser.html
  • What’s on the Semantic Web?
    • How to search?
    • Semantic Web Query Language (SPARQL)
    • Semantic Web Search Engines
  • What’s on the Semantic Web?
    • How to search with SPARQL?
    • Matching a pattern on the graph of triples
    • Choose a labeling mechanism, e.g.
    • …from the RDFS vocabulary (label)
    • …from the SKOS vocabulary (preferred label)
    • …other
  • What’s on the Semantic Web?
    • How to search with SPARQL?
    • Matching a pattern on the graph of triples
    • Choose the predicate according to the labeling mechanism
    • Query on the literal value
    <Subject> <Predicate> <Object>
    Resource rdfs:label "Traditionelle chinesische Medizin"@de
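
The three steps above (match a pattern, pick the predicate for the labeling mechanism, query on the literal) can be sketched as a small query builder. This is a hedged illustration: `label_query` and the `LABEL_PREDICATES` table are hypothetical names; only the RDFS and SKOS predicate URIs are standard.

```python
# Predicate URIs for the two labeling mechanisms named on the slide.
LABEL_PREDICATES = {
    "rdfs": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "skos": "<http://www.w3.org/2004/02/skos/core#prefLabel>",
}


def label_query(text, lang, mechanism="rdfs"):
    """Build a SPARQL query for resources carrying the given language-tagged label."""
    predicate = LABEL_PREDICATES[mechanism]
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?resource WHERE {\n"
        f'  ?resource {predicate} "{escaped}"@{lang} .\n'
        "}"
    )


if __name__ == "__main__":
    print(label_query("Traditionelle chinesische Medizin", "de"))
```

A dataset that labels its resources with SKOS would need `mechanism="skos"`. The query text changes with the labeling mechanism even though the question asked is the same.
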
  • What’s on the Semantic Web?
    • How to search with Sindice?
    • Query all literals with the Greek string “Χερσόνησος”
  • What’s on the Semantic Web?
    • How to search with Sindice?
    • Query all literals with the Chinese string “中医”
    Results
    <http://raynix.cn/…> dc:title "极客路线中医"
    ...
  • What’s on the Semantic Web?
    • How to search embedded terms in URI?
    • Example: “all resources with word traditional”
    dbpedia:TraditionalChineseMedicine
    dbpedia:TraditionalIrishMusic
    dbpedia:IrishTraditionalMusic
    ...
    with SPARQL filter
    # str() converts the URI to a string so that regex can be applied to it
    # the "i" flag makes the match case-insensitive
    select distinct ?subject where {
    ?subject ?predicate ?object
    filter regex(str(?subject), "traditional", "i")
    }
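
The idea behind the filter above, matching a word inside the URI string itself, can be tried out locally. This is a minimal sketch over a hand-written list of URIs: `uris_containing` is a hypothetical helper, and the fourth URI is added only as a non-matching case.

```python
import re

# The three resources from the slide, expanded from the dbpedia: prefix,
# plus one URI that does not contain the word.
URIS = [
    "http://dbpedia.org/resource/TraditionalChineseMedicine",
    "http://dbpedia.org/resource/TraditionalIrishMusic",
    "http://dbpedia.org/resource/IrishTraditionalMusic",
    "http://dbpedia.org/resource/TCM",
]


def uris_containing(word, uris):
    """Return the URIs whose string form contains word, ignoring case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return [uri for uri in uris if pattern.search(uri)]
```

Without case-insensitive matching, the lower-case pattern "traditional" would miss the camel-cased "Traditional" in every one of these URIs.
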
  • What’s on the Semantic Web?
    • How to search embedded terms?
    • Example: “all resources with word traditional”
    dbpedia:TraditionalChineseMedicine
    dbpedia:TraditionalIrishMusic
    dbpedia:IrishTraditionalMusic
    ...
    with Sindice star-shaped queries (SIREn)
    Results
  • NLP for the Semantic Web
    Multilingual/Ontology-based Information Extraction (BioCaster, OpenCalais)
    Ontology Localization (LabelTranslator)
    Ontology-based Natural Language Generation (CLANN)
  • Multilingual/Ontology-based Information Extraction (BioCaster)
    http://born.nii.ac.jp
    concept = measles
    • Aggregates and processes health news
    • Annotates news based on a multilingual ontology
    • Uses proprietary format and SKOS-XL to maintain terminology

  • Multilingual/Ontology-based Information Extraction (BioCaster)
    • Example: “Risk of measles outbreak in Malta unlikely…”
    measles: [DISEASE]
    Malta: [COUNTRY]
    http://born.nii.ac.jp
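
The annotation in this example can be imitated with a toy lexicon lookup. This is a sketch only and not how BioCaster is implemented: the two-entry `LEXICON` and the `annotate` function are hypothetical, and a real system would derive its terms from the multilingual ontology.

```python
import re

# Hypothetical lexicon mapping lower-cased terms to entity types.
LEXICON = {"measles": "DISEASE", "malta": "COUNTRY"}


def annotate(text):
    """Return (surface form, entity type, start offset) for each lexicon term in text."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        entity_type = LEXICON.get(match.group().lower())
        if entity_type:
            annotations.append((match.group(), entity_type, match.start()))
    return annotations


if __name__ == "__main__":
    for surface, entity_type, start in annotate("Risk of measles outbreak in Malta unlikely"):
        print(f"{surface} [{entity_type}] at offset {start}")
```

Moving such a lookup to another language or domain means replacing the lexicon, which is the adaptation problem the following slides raise.
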
  • Multilingual/Ontology-based Information Extraction (BioCaster)
    • Challenges
    • Multilingual adaptation
    • Adaptation of information extraction rules to other domains
    • Use of proprietary format is undesirable
  • Multilingual Information Extraction (OpenCalais)
    • Semantic markup of unstructured text
    • Multilingual (English, French, Spanish)
    • English: 39 entities, 75 relations
  • Multilingual Information Extraction (OpenCalais)
    • Domain tuned (Finance, Biomedical)
    • Only 15 base entities for non-English, no relations
    • Demo
    http://viewer.opencalais.com
  • Multilingual Information Extraction (OpenCalais)
    • Challenges
    • Multilingual adaptation of lexicon and extraction rules
    • Domain adaptation of lexicon and extraction rules
  • Ontology Localisation (LabelTranslator)
    • Multilingual ontology editor
    • Linguistic annotations (number, part of speech, gender)
    • … for a better translation
    [Screenshot callouts: number + gender, part of speech]
  • Ontology Localisation (LabelTranslator)
    “river”@en
    “rivière”@fr
    “fleuve”@fr
    Ambiguous!
  • Ontology Localisation (LabelTranslator)
    • Challenges
    • Use linguistic features in the lexicon for better machine translation
    • Use semantic features from the domain model as well
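
The second challenge can be illustrated with the “river” label, where a semantic feature from the domain model narrows the French candidates. This is a minimal sketch and not LabelTranslator's method: the `flows_into` feature, the `CANDIDATES` table and the `localise` function are hypothetical. The linguistic fact it relies on is that French uses "fleuve" for a river that reaches the sea and "rivière" for one that flows into another river.

```python
# Candidate French labels for "river", each paired with the semantic
# condition under which it applies (hypothetical encoding).
CANDIDATES = {
    "river": [
        ("fleuve", lambda features: features.get("flows_into") == "sea"),
        ("rivière", lambda features: features.get("flows_into") == "river"),
    ]
}


def localise(label, features):
    """Return the candidates whose condition holds, or all candidates if none does."""
    options = CANDIDATES.get(label, [])
    matching = [target for target, condition in options if condition(features)]
    # With no usable semantic feature the label stays ambiguous.
    return matching or [target for target, _ in options]
```

With an empty feature set the function returns both candidates, which corresponds to the “Ambiguous!” case on the earlier slide.
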
  • Natural Language Generation (CLANN)
    • Controlled Language ANNotations (CLANN)
    • To write domain-specific grammars (meeting minutes)
    • Intermediate representation
    Domain ontology (e.g. meeting minutes)
    MLink Grammar
    LinkedGrammar
  • Natural Language Generation (CLANN)
    • Example
    “John will present lemon model.”
    parse tree (abstract): nsubj, aux, dobj
    parse tree in MLINK:
    :Sentence1 :hasRootNode [
    rdf:type :TextNode ;
    :hasText "present" ;
    :hasSubType :Verb ;
    :hasObject [ rdf:type :TextNode ;
    :hasText "model" ;
    :hasObjectModifier [ rdf:type :TextNode ;
    :hasText "lemon"
    ] ] ] .
  • Natural Language Generation (CLANN)
    • Challenges
    • From text to triples?
    • Domain adaptation (meeting minutes)
    • Multilingual adaptation
  • Summary
    • The Web and the Semantic Web are
    • “Lingual” (variations within one language)
    • Multilingual (between languages and cultures)
    • NLP applications need domain and multilingual adaptation
    • Lexicon updates / extensions
    • Extraction rules updates / extensions
    • What do we need?
    • Efficient adaptation and sharing of linguistic resources between ontology-based NLP applications
  • Links and resources
    • Tutorial website
    • http://tiny.cc/tvzlc
    • The Monnet Project
    • Multilingual Ontologies for Networked Knowledge
    • http://www.monnet-project.eu/
    • LexInfo
    • http://lexinfo.net/