Lecture semantic augmentation

Published in: Technology
  • It is just not tagging
  • Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

    1. COMP3725 Knowledge Enriched Information Systems
       Lecture 13: Semantic Augmentation
       Dhavalkumar Thakker (Dhaval), School of Computing, University of Leeds
    2. Outline
       • Semantic Augmentation
         – What
         – Why
         – How
       • Existing systems & services for Semantic Augmentation
       • Challenges
    3. Semantic Augmentation
       • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
       • To: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
         – New York: http://dbpedia.org/Ontology/New_York_City
         – Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
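The From/To example above can be read as attaching identifiers to character spans of the text. The following is a minimal Python sketch of that idea, not the representation GATE uses internally; the `Annotation` class and the `annotate` and `render` helpers are hypothetical names introduced for illustration, and the URIs are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    uri: str    # identifier attached to the mention


TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")


def annotate(text, mention, uri):
    """Locate the first occurrence of `mention` and attach `uri` to it."""
    start = text.index(mention)
    return Annotation(start, start + len(mention), uri)


def render(text, annotations):
    """Return the text with each mention followed by its URI in angle brackets."""
    out, cursor = [], 0
    for ann in sorted(annotations, key=lambda a: a.start):
        out.append(text[cursor:ann.end])
        out.append(f" <{ann.uri}>")
        cursor = ann.end
    out.append(text[cursor:])
    return "".join(out)


annotations = [
    annotate(TEXT, "New York", "http://dbpedia.org/Ontology/New_York_City"),
    annotate(TEXT, "Apple Corps", "http://dbpedia.org/Ontology/Apple_Corps"),
]
print(render(TEXT, annotations))
```

Keeping the annotations separate from the text (offsets plus a URI) leaves the original sentence unchanged, which is why the "From" and "To" sentences on the slide read identically.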
    4. Semantic Augmentation
       • Semantic augmentation is the process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning the text conveys.
       • Also called semantic annotation or semantic tagging.
    5. It provides additional information about an existing piece of data.
    6. Why Semantic Augmentation?
       • Links to complementary information (“More about this”)
       • Shows related or similar information
       • Enables the reasoning and inferencing offered by semantics
       • Semantic annotation is the glue that ties ontologies into document spaces; remember that the existing web is a web of documents
       • The cost of producing metadata manually is too high
    7. GATE for Semantic Augmentation
       • GATE (General Architecture for Text Engineering): see gate.ac.uk
       • GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language.
       • See: http://gate.ac.uk/family/developer.html
    8. Overview of GATE Developer
       • GATE Developer
       • Resources Pane
         – applications: groups of processes to run on a document or corpus
         – language resources: corpus, ontologies, schemas
         – processing resources: tools that operate on unstructured text
         – datastores: saved documents and resources
       • Display Pane: whatever you’re currently working with
       • See next slide
    9. GATE: Interface (screenshot labelling the Resources Pane and the Display Pane)
    10. Processing Resources: ANNIE
        • A family of Processing Resources for language analysis included with GATE
        • Stands for “A Nearly-New Information Extraction system”
        • Uses finite state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on
    11. ANNIE IE Modules: http://gate.ac.uk/sale/tao/splitch6.html#chap:annie
    12. Some ANNIE Components
        • Tokenizer: splits text into word, number, symbol, punctuation, and SpaceToken tokens
        • Sentence Splitter: segments text into sentences
        • Part of Speech Tagger: produces a part-of-speech tag (noun, verb, etc.) as an annotation on each word or symbol
        • GATE Morphological Analyser: detects morphemes in a piece of text (e.g. car, caring)
        • OntoGazetteer: the semantic tagging component; uses an ontology
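To make the first two components concrete, here is a small Python sketch of a tokenizer and a sentence splitter. It is an illustration only and not ANNIE's implementation: the regular expressions and the names `tokenize` and `split_sentences` are assumptions, and the token kinds simply mirror the categories listed on the slide.

```python
import re

# One alternative per token kind; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<word>[A-Za-z]+)
  | (?P<number>\d+)
  | (?P<space>\s+)
  | (?P<punctuation>[.,;:!?()\-])
  | (?P<symbol>.)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, start, end, string) for every token in the text."""
    return [(m.lastgroup, m.start(), m.end(), m.group())
            for m in TOKEN_RE.finditer(text)]


def split_sentences(text):
    """Split after '.', '!' or '?' when whitespace and a capital letter follow."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


for token in tokenize("Apple Corps."):
    print(token)
print(split_sentences("They returned. Lennon went to New York."))
```

A splitter this naive breaks on abbreviations such as "Dr. Smith", which is one reason real systems use dedicated rules and abbreviation lists.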
    13. Demo
        • From: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
        • To: (…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
          – New York: http://dbpedia.org/Ontology/New_York_City
          – Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
    14. Step: Download & start the GATE application
        • Download GATE from: http://gate.ac.uk/download/
        • Note: the demonstration uses GATE 6.0
    15. Step: From Language Resources, select
        • GATE Document. Make sure that String content is selected in the last field (see the screenshot on the slide). Name the file “Test”.
    16. Paste the following text into the file
        • Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
    17. Step: From Processing Resources, select the following resources
        • ANNIE English Tokeniser
        • ANNIE Sentence Splitter
        • ANNIE POS Tagger
        • GATE Morphological Analyser
        • Note: for all of the above, leave the “Name” field empty
    18. Step: From Processing Resources, select the following resources
    19. Step: From Language Resources, select
        • OWLIM Ontology
          – Specify the location of the ontology you would like to use for semantic augmentation
          – For example, we are using the DBpedia ontology
    20. OWLIM Ontology window
    21. From Processing Resources, select
        • Onto Root Gazetteer, and specify its parameters as shown in the slide screenshot
    22. Final steps: Create Corpus
        • Go to Language Resources, click on GATE Corpus, and add the “Test” document created earlier
    23. Final steps: Create Corpus Pipeline
        • From Applications, create a Corpus Pipeline
        • Add the processing resources in the order shown on the slide and press “Run this application”
    24. Results: Go to the file and click on Annotation Sets, then Annotation List, then Lookup to see the semantic augmentation.
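The demo steps above amount to running an ordered list of processing resources over every document in a corpus, with the gazetteer producing the Lookup annotations seen in the results. The sketch below mimics that flow in plain Python. It does not use the GATE API: the `GAZETTEER` table is a hypothetical stand-in for the loaded ontology, and `tokeniser`, `gazetteer` and `run_pipeline` are illustrative names.

```python
import re

# Hypothetical label-to-URI table standing in for the loaded ontology.
GAZETTEER = {
    ("new", "york"): "http://dbpedia.org/Ontology/New_York_City",
    ("apple", "corps"): "http://dbpedia.org/Ontology/Apple_Corps",
}


def tokeniser(doc):
    """Add Token annotations as (start, end, string) tuples."""
    doc["Token"] = [(m.start(), m.end(), m.group())
                    for m in re.finditer(r"\w+", doc["text"])]


def gazetteer(doc):
    """Add Lookup annotations by matching token sequences against GAZETTEER."""
    tokens = doc["Token"]  # raises KeyError if the tokeniser has not run yet
    longest = max(len(key) for key in GAZETTEER)
    lookups, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tok[2].lower() for tok in tokens[i:i + n])
            if key in GAZETTEER:
                lookups.append((tokens[i][0], tokens[i + n - 1][1], GAZETTEER[key]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    doc["Lookup"] = lookups


def run_pipeline(corpus, stages):
    """Apply every stage, in order, to every document in the corpus."""
    for doc in corpus:
        for stage in stages:
            stage(doc)
    return corpus


corpus = [{
    "name": "Test",
    "text": ("Upon their return, Lennon and McCartney went to New York "
             "to announce the formation of Apple Corps."),
}]
run_pipeline(corpus, [tokeniser, gazetteer])
for start, end, uri in corpus[0]["Lookup"]:
    print(corpus[0]["text"][start:end], "->", uri)
```

The order of the stages matters for the same reason it does in the Corpus Pipeline: the lookup stage reads the annotations that the tokeniser wrote.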
    25. Other features
        • JAPE (Java Annotation Patterns Engine) provides regular-expression-based pattern/action rules over annotations.
          – Grammars to detect entities, validate detected entities, and perform pre- and post-processing
          – Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena”
          – See tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
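The venue example can be approximated with an ordinary regular expression. The Python sketch below is only an analogy: JAPE rules match over annotations (such as tokens and lookups) rather than raw characters, and the pattern and the name `find_venues` are assumptions made for illustration.

```python
import re

# "at the" + one or more capitalised words + a venue keyword.
VENUE_RE = re.compile(r"\bat the ((?:[A-Z][A-Za-z0-9]*\s)+(?:Stadium|Arena))\b")


def find_venues(text):
    """Return (start, end, name) for each venue mention introduced by 'at the'."""
    return [(m.start(1), m.end(1), m.group(1)) for m in VENUE_RE.finditer(text)]


sentence = "They played at the Emirates Stadium and later at the O2 Arena."
for start, end, name in find_venues(sentence):
    print(start, end, name)
```

The left-hand context ("at the") plays the role of the pattern, and recording the span of the captured name plays the role of the action that creates a new annotation.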
    26. Some Links
        • Home page: http://gate.ac.uk/
        • Some good short tutorial videos for getting started: http://gate.ac.uk/demos/developer-videos/ (only a few minutes each, so they are quick to watch)
        • User Guide: http://gate.ac.uk/sale/tao/index.html (this is apparently for version 7.1, which is a development build, but it seems to be fine)
        • Lots of documentation: http://gate.ac.uk/documentation.html
        • The wiki: http://gate.ac.uk/wiki/
        • JAPE grammar tutorial by Dhaval Thakker et al.: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
    27. Challenge: Term Ambiguity
        • ...this apple on the palm of my hand...
        • ...Apple tried to acquire Palm Inc....
        • ...eating an apple sitting by a palm tree...
        • What do “apple” and “palm” mean in each case?
        • The objective is to recognize entities and disambiguate their meaning.
        • Source: Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
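One simple way to approach the ambiguity above is to compare the words around a term with cue words for each candidate meaning. The Python sketch below does exactly that. It is not the method DBpedia Spotlight uses: the `SENSES` table, its sense labels and its cue words are all hypothetical and hand-picked for these three sentences.

```python
import re

# Hypothetical sense inventory: candidate meanings with hand-picked cue words.
SENSES = {
    "apple": {
        "Apple (fruit)": {"eating", "tree", "hand", "fruit", "juice"},
        "Apple (company)": {"acquire", "inc", "company", "shares"},
    },
    "palm": {
        "Palm (hand)": {"hand", "finger", "my"},
        "Palm (tree)": {"tree", "coconut", "beach"},
        "Palm (company)": {"acquire", "inc", "company"},
    },
}


def disambiguate(term, sentence):
    """Pick the sense whose cue words overlap most with the sentence context."""
    context = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    candidates = SENSES[term]
    return max(candidates, key=lambda sense: len(candidates[sense] & context))


examples = [
    "this apple on the palm of my hand",
    "Apple tried to acquire Palm Inc.",
    "eating an apple sitting by a palm tree",
]
for sentence in examples:
    print(sentence, "->", disambiguate("apple", sentence), "/", disambiguate("palm", sentence))
```

A hand-written table like this does not scale, which is why practical systems learn the context for each entity from large text collections.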
    28. Challenges
        • Disambiguation
        • Unknown entities
        • Ontology learning
        • Scale and speed
        • Co-referencing
    29. Existing Services for Semantic Augmentation
    30. Existing Services for Semantic Augmentation
    31. DBpedia Spotlight
        • DBpedia is a collection of entity descriptions extracted from Wikipedia and shared as linked data
        • DBpedia Spotlight uses data from DBpedia and text from the associated Wikipedia pages
        • It learns how to recognize that a DBpedia resource was mentioned
        • Given plain text as input, it generates annotated text
        • Demo: http://dbpedia-spotlight.github.com/demo/
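DBpedia Spotlight can also be called as a web service. The Python sketch below shows one way to build such a request and read the result. Several details are assumptions that are not in the slides and should be checked against the current service documentation: the endpoint URL, the `text` and `confidence` parameters, and the JSON field names (`Resources`, `@URI`, `@surfaceForm`, `@offset`). The `SAMPLE` payload is hand-written in that assumed shape and is not real service output.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; verify before relying on it.
ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build a GET request asking the service for a JSON response."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(f"{ENDPOINT}?{query}",
                                  headers={"Accept": "application/json"})


def parse_resources(payload):
    """Extract (surface form, offset, URI) triples from a JSON response body."""
    data = json.loads(payload)
    return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in data.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Send the text to the service and return the parsed annotations."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as response:
        return parse_resources(response.read().decode("utf-8"))


# Hand-written sample in the assumed response shape (not real service output).
SAMPLE = json.dumps({"Resources": [{
    "@URI": "http://dbpedia.org/resource/Apple_Corps",
    "@surfaceForm": "Apple Corps",
    "@offset": "86",
}]})
print(parse_resources(SAMPLE))
```

Calling `annotate("...")` needs network access; the parsing step is kept separate so it can be exercised offline.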
    32. DBpedia Spotlight
    33. DBpedia Spotlight
    34. References
        • Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
        • Dr. Paula Matuszek. Introduction to GATE.
        • Various resources from gate.ac.uk
