Michael Fuchs | How to compute semantic relationships between entities and facts out of natural texts
1. How machines read pixels
2. Documents, words, layout & semantics
3. Syntactic & semantic text parsing
4. Live demo
How machines read pixels
Pixel analysis
Find text/image blocks
Separate pixels into characters
Recognize individual characters
Build proper words as editable text
Requirements to make a machine read text:
-> Linguistics: Alphabets & Morphology Dictionaries
-> Math, AI, Statistics, Experience, and…
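The step "separate pixels into characters" can be illustrated with a deliberately simple sketch. It assumes a tiny black-and-white bitmap (1 = ink, 0 = background) and splits it wherever a column contains no ink. The function name `split_columns` and the sample bitmap are invented for illustration; real OCR engines use far more robust segmentation than this.

```python
def split_columns(bitmap):
    """Return (start, end) column ranges that contain ink; end is exclusive."""
    width = len(bitmap[0]) if bitmap else 0
    # A column "has ink" if any row holds a 1 in that column.
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a glyph begins here
        elif not ink and start is not None:
            spans.append((start, x))     # a blank column ends the glyph
            start = None
    if start is not None:                # glyph touching the right edge
        spans.append((start, width))
    return spans


# Toy bitmap (assumption): two glyphs separated by one blank column.
BITMAP = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
]

spans = split_columns(BITMAP)
print(spans)
```

Each returned span is a candidate character image that a later step would classify and assemble into words.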
What is needed to make a machine understand the meaning of words, sentences, texts?
Documents & Words
What is a document?
a) Bag of words?
Statistics can give
-> No real semantic
b) Words in order?
-> Semantics can be derived from layout
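A small sketch of why a bag of words alone carries no real semantics: two sentences with opposite meanings produce exactly the same word counts. The example sentences and the helper name `bag_of_words` are assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count words while ignoring their order (lower-cased, split on whitespace)."""
    return Counter(text.lower().split())


a = "the dog bites the man"
b = "the man bites the dog"

same_bag = bag_of_words(a) == bag_of_words(b)   # identical word statistics
same_order = a.split() == b.split()             # but different word order

print(same_bag, same_order)
```

The statistics are identical, yet who bites whom is only recoverable once word order (and ultimately sentence structure) is taken into account.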
Documents, Words and Layout
Document with layout
Text document with “simulated” layout
Text with line breaks
-> Rules can extract data out of (semi-)structured texts and documents
-> Layout helps to identify the semantic meaning of data
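As a rough sketch of rule-based extraction from semi-structured text, the following assumes a "Label: value" line layout. The sample text, the field names and the helper `extract_fields` are invented for illustration; the point is only that the layout (label, colon, value, line break) tells the rule what each piece of data means.

```python
import re

# Invented sample of a text with a "simulated" layout (assumption).
TEXT = """Invoice No: 4711
Date: 2015-03-12
Total: 129.90 EUR"""

# One rule: everything before the first colon is the label, the rest is the value.
FIELD = re.compile(r"^(?P<label>[^:\n]+):\s*(?P<value>.+)$", re.MULTILINE)


def extract_fields(text):
    """Map each label to its value for every line matching the rule."""
    return {
        m.group("label").strip(): m.group("value").strip()
        for m in FIELD.finditer(text)
    }


fields = extract_fields(TEXT)
print(fields)
```

A rule like this works only as long as the layout is stable; it has nothing to hold on to in free-running prose.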
Text and Structure
Is “plain” natural language text unstructured?
-> Yes, at least for almost all IT systems
-> Not for humans who can read and speak the language
-> Facts and their relations can’t be reliably detected with “simple” rules
Text, Structure & Translation
Is a word-by-word translation enough?
-> … well – not really…
-> Semantic understanding of the words and their relationships in sentences is needed!
-> That is true for humans and machines
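A toy illustration of the limits of word-by-word translation, assuming a three-entry German-to-English dictionary made up for this example. Each word is translated correctly in isolation, yet the result is not what an English speaker would say.

```python
# Hypothetical toy dictionary (German -> English), for illustration only.
WORD_DICT = {"ich": "I", "habe": "have", "hunger": "hunger"}


def translate_word_by_word(sentence):
    """Replace each word independently; unknown words are kept unchanged."""
    return " ".join(WORD_DICT.get(w.lower(), w) for w in sentence.split())


literal = translate_word_by_word("Ich habe Hunger")
print(literal)   # the idiomatic English rendering would be "I am hungry"
```

Getting from the literal output to the idiomatic sentence requires knowing how the words relate to each other, not just what each one means.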
Text & Structure
Why is natural language text understanding difficult for machines?
-> Languages are not logical and are context dependent
-> One word – different variants, e.g. go, went, gone
– different meanings, e.g. run, plant, apple …
– different usage, e.g. as verb, noun, adjective
-> Different words – the same concept, e.g. to buy/sell something
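The phenomena listed above can be made concrete with a minimal lexicon sketch. All tables here (`LEMMAS`, `SENSES`, `CONCEPTS`) and the concept label are invented for illustration and are not taken from any real language resource.

```python
# Variants of one word map to a single base form (lemma).
LEMMAS = {"go": "go", "went": "go", "gone": "go"}

# One word, several meanings (assumed sense labels).
SENSES = {
    "plant": ["factory", "living organism"],
    "apple": ["fruit", "company"],
}

# Different words describing the same concept (hypothetical concept label).
CONCEPTS = {"buy": "TRANSFER_OF_GOODS", "sell": "TRANSFER_OF_GOODS"}


def lemma(word):
    """Return the base form if known, otherwise the lower-cased word itself."""
    return LEMMAS.get(word.lower(), word.lower())


def is_ambiguous(word):
    """A word is ambiguous here if the toy lexicon lists more than one sense."""
    return len(SENSES.get(lemma(word), [])) > 1


print({lemma(w) for w in ["go", "went", "gone"]})
print(is_ambiguous("plant"), is_ambiguous("went"))
print(CONCEPTS["buy"] == CONCEPTS["sell"])
```

Lookup tables like these only name the problem: deciding which sense applies in a given sentence still needs context.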
Basic Language Structure
-> Morphology = rules for how words are formed and used in their different forms
-> Semantics = the meaning and usage of words
-> Semantic relations = reflect/organise the meaning and relations of words and sentences
-> Syntax = rules used to build correct sentences
How to get to the insides of a sentence?
Compreno System Architecture
Summary: What is ABBYY Compreno?
● … NLP technology featuring a unique model-based approach that employs
universal language models and identifies language structures.
● … combines both syntactic and semantic analysis, as well as machine learning
on untagged text corpora.
● … allows creating a semantic representation of text
● … able to resolve complex language phenomena:
− lexical ambiguity
− recovering omitted words and links (ellipsis)
− identifying pronoun referents (anaphora)
− coordination and more
● … support of English, Russian, German in progress