How to compute semantic relationships between entities and facts out of natural texts
Michael Fuchs
Technology Evangelist
ABBYY
fuchs@abbyy.com
Agenda
1. How machines read pixels
2. Documents, words, layout & semantics
3. Syntactic & semantic text parsing
4. Live demo
5. Q&A
How machines read pixels
-> Pixel analysis
-> Find text/image blocks
-> Separate pixels into characters
How machines read pixels
-> Recognize individual characters
-> Build proper words as editable text
Requirements to make a machine read text:
-> Linguistics: Alphabets & Morphology Dictionaries
-> Math, AI, Statistics, Experience, and…
What is needed to make a machine understand the meaning of words, sentences, texts?
Documents & Words
What is a document?
a) Bag of words?
-> Statistics can give basic insights
-> No real semantic understanding
b) Words in order?
-> Layouts generate visual patterns
-> Semantics can be derived from layout
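The difference between a bag of words and words in order can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk; the two example sentences and the helper name `bag_of_words` are assumptions chosen for the demonstration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased, whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Illustrative sentences (assumed, not from the slides).
a = "dog bites man"
b = "man bites dog"

same_bag = bag_of_words(a) == bag_of_words(b)        # identical word statistics
same_order = a.lower().split() == b.lower().split()  # different word sequences
print(same_bag, same_order)
```

Both sentences produce the same counts, so purely statistical features cannot tell who bit whom; only the order (and ultimately the syntax) carries that information.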
Documents, Words and Layout
Document with layout
Text document with “simulated” layout
Text with line breaks
Text only
-> Rules can extract data out of (semi-)structured texts and documents
-> Layout helps to identify the semantic meaning of data
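A rough sketch of such a rule, assuming a hypothetical invoice-like text whose field labels sit at the start of a line. The sample text, the field names and the `RULES` table are all invented for illustration.

```python
import re

# Hypothetical semi-structured input; labels and values are assumptions.
TEXT = """Invoice No: 2017-0042
Date: 2017-09-12
Total: 199.90 EUR"""

# One rule per field: a label at the start of a line, then the value.
RULES = {
    "invoice_no": re.compile(r"^Invoice No:\s*(\S+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
    "total": re.compile(r"^Total:\s*([\d.]+)\s+EUR$", re.MULTILINE),
}


def extract(text: str) -> dict:
    """Apply every rule and keep the first match per field, if any."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            result[field] = match.group(1)
    return result


with_layout = extract(TEXT)                        # line breaks preserved
without_layout = extract(TEXT.replace("\n", " "))  # same words, layout removed
print(with_layout, without_layout)
```

With the line breaks in place the rules find all three fields; once the same words are flattened into one line, these particular rules find nothing, which is the sense in which layout carries part of the meaning.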
Text and Structure
Is “plain” natural language text unstructured?
-> Yes, at least for almost all IT systems
-> Not for humans who can read and speak the language
-> Facts and their relations can’t be reliably detected with “simple” rules
Text, Structure & Translation
Is a word-by-word translation enough?
-> … well, not really…
-> Semantic understanding of the words and their relationships in sentences is needed!
-> That is true for humans and machines
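A toy illustration of why word-by-word translation falls short, assuming a tiny hand-written German-to-English dictionary. The dictionary `DE_EN` and the example sentence are assumptions, not material from the talk.

```python
# Toy German-to-English dictionary; entries are illustrative assumptions.
DE_EN = {"ich": "I", "habe": "have", "den": "the", "ball": "ball", "gesehen": "seen"}


def translate_word_by_word(sentence: str) -> str:
    """Replace each token with its dictionary entry, keeping the source word order."""
    return " ".join(DE_EN.get(token.lower(), token) for token in sentence.split())


literal = translate_word_by_word("Ich habe den Ball gesehen")
print(literal)  # I have the ball seen
```

Every word is translated correctly, yet the result is not an English sentence: the German verb bracket has to be restructured into "I have seen the ball", which requires knowing how the words relate to each other.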
Text & Structure
Why is natural language text understanding difficult for machines?
-> Languages are not strictly logical, and they are context dependent
-> Different words, the same concept, e.g. to buy/sell something
-> One word, different variants, e.g. go, went, gone
-> One word, different meanings, e.g. run, plant, apple …
-> One word, different usage, e.g. as verb, noun, adjective
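These phenomena can be made concrete with a small lookup sketch. The lexicon below is hand-written for illustration only; the names `LEMMAS`, `READINGS`, `lemma` and `readings` are assumptions and do not describe any real dictionary format.

```python
# Toy lexicon: every entry is a hand-written assumption for illustration.
LEMMAS = {"go": "go", "goes": "go", "went": "go", "gone": "go", "going": "go"}

# Candidate readings per lemma as (part of speech, gloss) pairs.
READINGS = {
    "plant": [("noun", "living organism"), ("noun", "factory"), ("verb", "to put into soil")],
    "run": [("verb", "to move fast on foot"), ("verb", "to operate"), ("noun", "a jog")],
}


def lemma(word: str) -> str:
    """Map a surface form to its base form; unknown words map to themselves."""
    return LEMMAS.get(word.lower(), word.lower())


def readings(word: str) -> list:
    """Return all candidate readings; more than one means the word is ambiguous."""
    return READINGS.get(lemma(word), [])


variants = {lemma(w) for w in ["go", "went", "gone"]}  # three forms, one lemma
plant_pos = {pos for pos, _ in readings("plant")}      # one form, several usages
print(variants, plant_pos, len(readings("run")))
```

The lookup only enumerates the candidates. Choosing the right reading for a given sentence needs context, which is what the following slides address.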
Basic Language Structure
-> Morphology = rules for how words are formed and inflected
-> Semantics = the meaning and usage of words
-> Semantic relations = reflect/organise the meaning and relations of words and sentences
-> Syntax = rules used to build correct sentences
How to get to the inside of a sentence?
Compreno System Architecture
Diagram labels:
-> Parser
-> Morphological analyzer
-> Syntactic and semantic analysis
-> Anaphora resolution
-> Disambiguation
-> Semantic representation of text
-> Information Extraction Module
-> Identification rules, interpretation rules, extraction rules
-> RDF Graph
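To give an idea of what an RDF graph of extracted facts looks like, here is a minimal sketch that writes subject-predicate-object triples in N-Triples syntax. The namespace `http://example.org/`, the `Triple` type and the example facts are placeholders; this is not the output schema of Compreno.

```python
from typing import NamedTuple

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_ntriples(triples) -> str:
    """Serialise triples as N-Triples lines with every term written as an IRI."""
    lines = [
        f"<{BASE}{t.subject}> <{BASE}{t.predicate}> <{BASE}{t.obj}> ."
        for t in triples
    ]
    return "\n".join(lines)


# Hand-written facts (assumed); a real system would derive them from the parse.
facts = [Triple("Mary", "saw", "students"), Triple("students", "wore", "masks")]
graph = to_ntriples(facts)
print(graph)
```

Each line is one edge of the graph, so facts from many sentences can be merged simply by concatenating their triples.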
Morphology Analysis
Sentence Analysis with Semantic Info
How to get the correct semantic meaning of words?
ABBYY’s answer: Universal Semantic Hierarchy = language-independent semantic concepts
ABBYY’s Universal Semantic Hierarchy
Columns shown: Semantic Meaning, “Vocabulary” EN, “Vocabulary” DE
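The idea of language-independent concepts with per-language vocabularies can be sketched as a simple mapping. The concept identifiers and word lists below are invented for illustration and are not taken from ABBYY's actual hierarchy.

```python
# Hypothetical concept identifiers and word lists; not ABBYY's actual hierarchy.
CONCEPTS = {
    "PLANT_ORGANISM": {"en": ["plant"], "de": ["Pflanze"]},
    "PLANT_FACTORY": {"en": ["plant", "factory"], "de": ["Fabrik", "Werk"]},
}


def concepts_for(word: str, lang: str) -> set:
    """Return every concept whose vocabulary for `lang` contains `word`."""
    wanted = word.lower()
    return {
        concept
        for concept, vocab in CONCEPTS.items()
        if wanted in (w.lower() for w in vocab.get(lang, []))
    }


def shared_concepts(word_a: str, lang_a: str, word_b: str, lang_b: str) -> set:
    """Concepts that two words, possibly from different languages, have in common."""
    return concepts_for(word_a, lang_a) & concepts_for(word_b, lang_b)


en_plant = concepts_for("plant", "en")                    # ambiguous in English
link = shared_concepts("factory", "en", "Fabrik", "de")   # same concept, two languages
print(en_plant, link)
```

Because the concept level is shared, an analysis that has settled on one concept no longer depends on the language of the source text.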
Handling Lexical Ambiguity
Recovering Omitted Words and Links (Ellipsis)
Recovered Node
Ellipsis
Identifying Pronoun Referents (Anaphora)
Mary saw her students. They were wearing masks. She was surprised.
(Mary → her, Mary → she, students → they).
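The links in this example can be stored and applied as plain data. The sketch below takes the three links from the slide as given, so it does not compute them; the function names `chains` and `resolve` and the bracket notation are assumptions for illustration.

```python
import re
from collections import defaultdict

TEXT = "Mary saw her students. They were wearing masks. She was surprised."

# Links copied from the slide, written as (referent, pronoun).
LINKS = [("Mary", "her"), ("Mary", "she"), ("students", "they")]


def chains(links) -> dict:
    """Group pronouns under the entity they refer to."""
    grouped = defaultdict(list)
    for referent, pronoun in links:
        grouped[referent].append(pronoun)
    return dict(grouped)


def resolve(text: str, links) -> str:
    """Append the referent in brackets after each linked pronoun."""
    lookup = {pronoun: referent for referent, pronoun in links}

    def annotate(match):
        word = match.group(0)
        referent = lookup.get(word.lower())
        return f"{word}[{referent}]" if referent else word

    return re.sub(r"[A-Za-z]+", annotate, text)


annotated = resolve(TEXT, LINKS)
print(chains(LINKS))
print(annotated)
```

Finding the links in the first place is the hard part: deciding that "They" means the students rather than the masks needs the syntactic and semantic analysis described above.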
From Text to Semantic with Compreno
DEMO
Summary: What is ABBYY Compreno?
● … an NLP technology featuring a unique model-based approach that employs universal language models and identifies language structures.
● … combines syntactic and semantic analysis with machine learning on untagged text corpora.
● … makes it possible to create a semantic representation of text
● … able to resolve complex language phenomena:
− lexical ambiguity
− ellipsis (recovering omitted words and links)
− anaphora (identifying pronoun referents)
− coreference
− coordination and more
● … support for English, Russian, German in progress
QUESTIONS?
Thank you for your attention!
