
Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts


Published on

http://2016.semantics.cc/michael-fuchs

Published in: Technology


  1. How to compute semantic relationships between entities and facts out of natural texts. Michael Fuchs, Technology Evangelist, ABBYY (fuchs@abbyy.com)
  2. Agenda: 1. How machines read pixels 2. Documents, words, layout & semantics 3. Syntactic & semantic text parsing 4. Live demo 5. Q&A
  3. How machines read pixels: pixel analysis, find text/image blocks, separate pixels into characters
  4. How machines read pixels: recognize individual characters, then build proper words as editable text. Requirements to make a machine read text: -> Linguistics: alphabets, morphology & dictionaries -> Math, AI, statistics, experience, and…
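The recognition pipeline on these slides (find text blocks, segment characters, rebuild editable words) is ABBYY's own engine. Purely as an illustration of the same input/output contract (pixels in, editable text out), here is a minimal sketch using the open-source Tesseract engine via pytesseract; the file name is a placeholder and the library choice is an assumption, not part of the talk.

```python
# Minimal OCR sketch (illustration only, not ABBYY's engine):
# pixels in -> editable text out. Requires Tesseract plus the
# pytesseract and Pillow packages to be installed.
from PIL import Image
import pytesseract

image = Image.open("scanned_page.png")                  # placeholder file name
text = pytesseract.image_to_string(image, lang="eng")   # character recognition + word assembly
print(text)
```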
  5. What is needed to make a machine understand the meaning of words, sentences and texts?
  6. Documents & Words. What is a document? a) A bag of words? Statistics can give basic insights -> no real semantic understanding. b) Words in order? Layouts generate visual patterns -> semantics can be derived from layout.
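A quick way to see why a pure bag-of-words view gives no real semantic understanding: two sentences built from the same words in a different order receive identical vectors. A small sketch with scikit-learn's CountVectorizer (the example sentences and the library are assumptions for illustration):

```python
# Bag-of-words sketch: word order is discarded, so these two sentences
# look identical to the model even though they mean different things.
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["the dog bit the man", "the man bit the dog"]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())  # shared vocabulary
print(vectors[0])                          # same counts ...
print(vectors[1])                          # ... for both sentences
```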
  7. Documents, Words and Layout. The spectrum: document with layout, text document with “simulated” layout, text with line breaks, text only. -> Rules can extract data out of (semi-)structured texts and documents -> Layout helps to identify the semantic meaning of data
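For (semi-)structured text, simple rules really do go a long way, because the layout tells you what each value means. A hypothetical sketch: a regular expression pulls an invoice number and a date out of a line-oriented document (the field names and values are invented for illustration):

```python
# Rule-based extraction from semi-structured text: works because the layout
# is predictable ("Invoice No:" and "Date:" always introduce the values).
import re

document = """\
Invoice No: 2016-0042
Date: 2016-09-13
Total: 1,250.00 EUR
"""

invoice_no = re.search(r"Invoice No:\s*(\S+)", document)
date = re.search(r"Date:\s*(\d{4}-\d{2}-\d{2})", document)

print(invoice_no.group(1))  # 2016-0042
print(date.group(1))        # 2016-09-13
```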
  8. Text and Structure. Is “plain” natural language text unstructured? -> Yes, at least for almost all IT systems -> Not for humans who can read and speak the language -> Facts and their relations can’t be reliably detected with “simple” rules
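A hypothetical example of why “simple” rules break down on free text: a pattern written for the active voice misses the same fact stated in the passive voice, even though both sentences express one relation (acquirer, acquired). The company names and the pattern are invented for illustration.

```python
# A naive "X acquired Y" pattern (capitalised names only) finds the fact in
# the first sentence but misses the identical fact in the passive sentence.
import re

pattern = re.compile(r"([A-Z]\w+) acquired ([A-Z]\w+)")

sentences = [
    "Globex acquired Initech in 2015.",
    "Initech was acquired by Globex in 2015.",
]

for s in sentences:
    match = pattern.search(s)
    print(s, "->", match.groups() if match else "no fact found")
```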
  9. Text, Structure & Translation. Is a word-by-word translation enough? -> Well, not really… -> Semantic understanding of the words and their relationships in sentences is needed! -> That is true for humans and machines
  10. Text & Structure. Why is natural language text understanding difficult for machines? -> Languages are not logical and are context dependent -> One word: different usages (e.g. as verb, noun, adjective), different meanings (e.g. run, plant, apple), different variants (e.g. go, went, gone) -> Different words, the same concept (e.g. to buy/sell something)
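One of the listed difficulties, many surface variants of one word (go, went, gone), is typically handled by morphological normalisation. A small sketch using NLTK's WordNet lemmatizer as an open-source stand-in (not the morphology component presented in the talk):

```python
# Map inflected variants back to one lemma. Requires the NLTK "wordnet"
# corpus (install via nltk.download("wordnet")).
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for variant in ["go", "went", "gone", "going"]:
    print(variant, "->", lemmatizer.lemmatize(variant, pos="v"))
# go -> go, went -> go, gone -> go, going -> go
```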
  11. Basic Language Structure. How do we get at the inner structure of a sentence? -> Morphology = rules for how to form and use words -> Syntax = rules used to build correct sentences -> Semantics = the meaning and usage of words -> Semantic relations = reflect/organise the meaning and relations of words and sentences
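To make those layers concrete, here is a hedged sketch with spaCy, a statistical open-source pipeline used here only as a stand-in: for each token it prints the lemma (morphology), the part of speech, and the dependency relation to its head word (syntax).

```python
# Morphology and syntax in one pass: lemma, part of speech, and the
# dependency edge to the head word.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Mary saw her students at the school.")

for token in doc:
    print(f"{token.text:10} lemma={token.lemma_:10} pos={token.pos_:6} "
          f"dep={token.dep_:10} head={token.head.text}")
```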
  12. Compreno System Architecture (diagram): Parser (morphological analyzer, syntactic and semantic analysis, anaphora resolution, disambiguation) -> semantic representation of text -> Information Extraction Module (identification rules, interpretation rules, extraction rules) -> RDF graph
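The last stage of the pipeline emits extracted facts as an RDF graph. As a hedged illustration of that output format only (the namespace, entities and predicates below are invented, not Compreno's schema), a few lines with rdflib:

```python
# Serialise one extracted fact ("Mary teaches students") as RDF triples.
# Namespace and resource names are illustrative placeholders.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/facts/")
g = Graph()
g.bind("ex", EX)

g.add((EX.Mary, RDF.type, EX.Person))
g.add((EX.Mary, EX.teaches, EX.Students))

print(g.serialize(format="turtle"))
```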
  13. Morphology Analysis
  14. Sentence Analysis with Semantic Info
  15. How to get the correct semantic meaning of words? ABBYY’s answer: the Universal Semantic Hierarchy = language-independent semantic concepts
  16. ABBYY’s Universal Semantic Hierarchy: a language-independent semantic meaning linked to language-specific “vocabulary” (EN, DE)
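The slide shows one language-independent concept linked to English and German vocabulary. A toy data-structure sketch of how such a mapping can look (the concept names, hierarchy and word lists are invented, not the actual Universal Semantic Hierarchy):

```python
# Toy model of a universal semantic hierarchy: language-independent concepts,
# each linked to language-specific lexemes. Contents are illustrative only.
semantic_hierarchy = {
    "TO_ACQUIRE": {
        "parent": "TO_GET",
        "lexemes": {"en": ["acquire", "buy", "purchase"],
                    "de": ["erwerben", "kaufen"]},
    },
    "TO_GET": {"parent": None,
               "lexemes": {"en": ["get"], "de": ["bekommen"]}},
}

def concept_for(word, lang):
    """Return the concept whose lexeme list contains the given word."""
    for concept, entry in semantic_hierarchy.items():
        if word in entry["lexemes"].get(lang, []):
            return concept
    return None

print(concept_for("kaufen", "de"))   # TO_ACQUIRE
print(concept_for("acquire", "en"))  # TO_ACQUIRE
```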
  17. Handling Lexical Ambiguity
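As a rough open-source analogue for lexical disambiguation (not the approach presented on the slide), NLTK's Lesk algorithm picks a WordNet sense for an ambiguous word from the surrounding words of its sentence:

```python
# Lesk-style word-sense disambiguation: choose a WordNet sense for "bank"
# from its sentence context. Requires the NLTK "wordnet" and "punkt" data.
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

for sentence in ["I deposited cash at the bank",
                 "We sat on the bank of the river"]:
    sense = lesk(word_tokenize(sentence), "bank", pos="n")
    print(sentence, "->", sense, "-", sense.definition())
```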
  18. Recovering Omitted Words and Links (Ellipsis): the parse marks the ellipsis and inserts a recovered node
  19. Identifying Pronoun Referents (Anaphora). Example: “Mary saw her students. They were wearing masks. She was surprised.” (Mary → her, Mary → she, students → they)
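As a deliberately naive sketch (nothing like the resolver presented), a rule that links each pronoun to the most recent preceding mention agreeing in number happens to reproduce the mapping on the slide for this particular example; real anaphora resolution needs far more than this.

```python
# Toy anaphora sketch: link each pronoun to the most recent preceding
# mention that agrees in grammatical number. Only reproduces the slide example.
mentions = [("Mary", "singular"), ("students", "plural")]
pronouns = [("her", "singular"), ("They", "plural"), ("She", "singular")]

for pronoun, number in pronouns:
    antecedent = next(m for m, n in reversed(mentions) if n == number)
    print(f"{pronoun} -> {antecedent}")
# her -> Mary, They -> students, She -> Mary
```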
  20. From Text to Semantics with Compreno
  21. DEMO
  22. Summary: What is ABBYY Compreno? ● … an NLP technology featuring a unique model-based approach that employs universal language models and identifies language structures ● … combines syntactic and semantic analysis as well as machine learning on untagged text corpora ● … allows creating a semantic representation of text ● … is able to resolve complex language phenomena: lexical ambiguity, recovering omitted words and links (ellipsis), identifying pronoun referents (anaphora), coreference, coordination, and more ● … supports English and Russian, with German in progress
  23. Questions? Thank you for your attention!
