A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia

Andre Freitas
Andre FreitasLecturer at University of Manchester
Digital Enterprise Research Institute                                          www.deri.ie




             A Semantic Best-Effort Approach for
               Extracting Structured Discourse
                   Graphs from Wikipedia
                  André Freitas, Danilo Carvalho, J. C. P. da Silva,
                           Sean O’Riain, Edward Curry




© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Outline
Digital Enterprise Research Institute                   www.deri.ie



           Motivation
           Representation
                 Requirements
                 Semantic Best-effort Representation
           Extraction
                 Graphia Extractor
                 Preliminary Evaluation
                 Extraction Examples
           Conclusion
Digital Enterprise Research Institute           www.deri.ie




                                        Motivation
Motivation
Digital Enterprise Research Institute   www.deri.ie
Motivation
Digital Enterprise Research Institute   www.deri.ie
Motivation
Digital Enterprise Research Institute   www.deri.ie
Motivation
Digital Enterprise Research Institute                                               www.deri.ie




Natural language texts                               Linked Data
     No terminological or structural
      regularity
                                                           Terminological and
                                                            structural regularity
     Highly contextualized
                                                           Shared semantic agreement
     Complex semantic dependency                           between data consumers
      relations
     Ambiguity
                            Information selection/normalization

                                 - vocabulary constraints
                                 + entity-centric
                                 + pay-as-you-go data semantics
                                  = semantic best-effort
Motivation
Digital Enterprise Research Institute                         www.deri.ie


         Vocabulary-independent (schema-free queries)
               How to abstract users from knowing the data
                representation?
               Semantic matching


         Schemaless databases in the limit demands
          vocabulary-independency

         How information extraction is reshaped in this
          scenario?
Motivational Scenario
Digital Enterprise Research Institute                                             www.deri.ie


     What is the relationship between Barack Obama and Indonesia?




                                         Semantic Best-
                                        effort Extraction


Sentence: From age six
to ten, Obama attended
local schools in Jakarta,                                   Entity-centric text
including Besuki Public                                      representation
School and St Francis
Assisi School.
Digital Enterprise Research Institute                www.deri.ie




                                        Representation
Computational Linguistics Perspective
Digital Enterprise Research Institute                   www.deri.ie



         What is already there to represent NL?
               Discourse Representation Theory (DRT)
               Semantic Role Labeling (SRL)
Discourse Representation Theory (DRT)
Digital Enterprise Research Institute                                 www.deri.ie

           “The key idea behind (...) Discourse Representation Theory is
            that each new sentence of a discourse is interpreted in the
            context provided by the sentences preceding it.”
                                           van Eijck and Kamp
        Models propositions in discourse (multiple sentences).
        Discourse representation structures (DRS).

        John enters a card. Every card is green.
Semantic Role Labeling (SRL)
Digital Enterprise Research Institute                                 www.deri.ie



        Shallow semantic parsing.
        Detection of arguments associated with a predicate.
        Associated semantic types to arguments.




      Bill cut his hair with a razor
     [Agent Bill] cut [Patient his hair] [Instrument with a razor.]
Semantic Best-Effort
Digital Enterprise Research Institute                                       www.deri.ie




           Objectives:
                 Entity-centric & Standardized: easier to integrate with
                  other resources
                 Remove the formal constraints and the ‘baggage’          from
                  existing approaches
                 Representation robust to extraction limitations/errors
Semantic Best-Effort Requirements
Digital Enterprise Research Institute                      www.deri.ie



           Text segmentation into (s,p,o)s
           Context representation
           Conceptual model independency
           Resolve co-references (pay-as-you-go)
           Represent recurrent discourse structures
           Standardized representation (RDF(S))
           Principled interpretation (compositionality)
Examples
Digital Enterprise Research Institute              www.deri.ie




         - Text segmentation into (s,p,o)s
         - Context representation
         - Resolve co-references (pay-as-you-go)
         - Conceptual model independency
Examples
Digital Enterprise Research Institute   www.deri.ie




         -    Context representation
Examples
Digital Enterprise Research Institute   www.deri.ie
Examples
Digital Enterprise Research Institute                                       www.deri.ie




                                        - Represent recurrent discourse structures
Examples
Digital Enterprise Research Institute                                             www.deri.ie




                                        - Represent recurrent discourse structures
                                        - Resolve co-references (pay-as-you-go)
Examples
Digital Enterprise Research Institute                                          www.deri.ie




                                        -   Represent recurrent discourse structures
SDG Elements
Digital Enterprise Research Institute                  www.deri.ie



           Named, non-named entities and properties
           Quantifiers & operators
           Triple Trees
           Context elements
           Co-Referential elements
           Resolved & normalized entities
Graph Patterns
Digital Enterprise Research Institute   www.deri.ie
[[Interpretation]]
Digital Enterprise Research Institute        www.deri.ie



         Graph traversal – deref sequence
Digital Enterprise Research Institute                www.deri.ie




                                        Extraction
SBE Graph Extraction Tool
Digital Enterprise Research Institute   www.deri.ie
Extraction Pipeline Architecture
Digital Enterprise Research Institute                                    www.deri.ie




                                           Subject
                                           Predicate
                                           Object
                                           Prepositional phrase & Noun complement
                                           Reification
                                           Time
Preliminary Evaluation
Digital Enterprise Research Institute                       www.deri.ie



          1033 relations (triples) from 150 sentences from 5
           randomly selected Wikipedia articles
          Manually classified the graphs: error categories
           and accuracy.
Preliminary Evaluation
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Other Extraction Examples
Digital Enterprise Research Institute   www.deri.ie
Conclusion
Digital Enterprise Research Institute                                 www.deri.ie


          Main direction for improvement is completeness
                Aligned with the pay-as-you-go scenario
          Still need to define clear criteria for what you can’t extract
          There is still a long way to go (e.g. complex subordination)
          Investigation using existing n-ary relations patterns
          Context (reification) should be a first-class citizen in the
           representation of natural language
          Focus on getting the semantic pivots (rigid designators) right
          Worth putting effort on enumerable patterns (timestamps,
           operators)
1 of 42

More Related Content

Viewers also liked(6)

Discourse analysis and grammar Discourse analysis and grammar
Discourse analysis and grammar
Septy Riani Pangindoman18.3K views
Discourse Discourse
Discourse
Eika Matari9.1K views
Discourse analysis and grammarDiscourse analysis and grammar
Discourse analysis and grammar
Amal Mustafa29.9K views
Discourse analysisDiscourse analysis
Discourse analysis
Melikarj101.9K views
Discourse AnalysisDiscourse Analysis
Discourse Analysis
Dr. Cupid Lucid159.8K views

Similar to A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia(20)

Hello Open World - Semtech 2009Hello Open World - Semtech 2009
Hello Open World - Semtech 2009
Alexandre Passant580 views
Knowledge management on the desktopKnowledge management on the desktop
Knowledge management on the desktop
Laura Dragan457 views
Semantic DesktopSemantic Desktop
Semantic Desktop
Laura Dragan547 views
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
Richard Cyganiak2.8K views
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
Richard Cyganiak2.5K views
Data Curation at the New York TimesData Curation at the New York Times
Data Curation at the New York Times
Edward Curry6.9K views
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)
Bianca Pereira352 views

More from Andre Freitas(20)

AI Systems @ ManchesterAI Systems @ Manchester
AI Systems @ Manchester
Andre Freitas450 views
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
Andre Freitas403 views
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
Andre Freitas2K views
WiSS Challenge - Day 2WiSS Challenge - Day 2
WiSS Challenge - Day 2
Andre Freitas626 views

A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia

Editor's Notes

  1. Change to horizontal
  2. Change to horizontal
  3. Change to horizontal
  4. Change to horizontal
  5. Emphasize entity
  6. Logic and linguistics have had lively connections from Antiquity right until today
  7. Limitations. Categories of requirements.
  8. Relate with req