A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia

Most information extraction approaches available today focus either on the extraction of simple relations or on scenarios where
data extracted from texts must be normalized into a database schema or ontology. Some relevant information in natural language texts,
however, is irregular, highly contextualized, poorly structured, and intrinsically ambiguous, with complex semantic dependency relations.
An information extraction approach should also support these characteristics. To cope with this scenario, this work introduces a
semantic best-effort information extraction approach, which extracts text information under a pay-as-you-go data quality perspective,
trading high accuracy, schema consistency and terminological normalization for domain independence, context capture, wider extraction
scope and maximization of the text semantics extraction and representation. A semantic information extraction framework (Graphia)
is implemented and evaluated over the Wikipedia corpus.


    1. A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia
       André Freitas, Danilo Carvalho, J. C. P. da Silva, Sean O’Riain, Edward Curry
       Digital Enterprise Research Institute, www.deri.ie
       © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
    2. Outline
       • Motivation
       • Representation
       • Requirements
       • Semantic Best-effort Representation
       • Extraction
       • Graphia Extractor
       • Preliminary Evaluation
       • Extraction Examples
       • Conclusion
    3. Motivation
    4-6. Motivation (no text transcribed for these slides)
    7. Motivation
       Natural language texts:
       • No terminological or structural regularity
       • Highly contextualized
       • Complex semantic dependency relations
       • Ambiguity
       Linked Data:
       • Terminological and structural regularity
       • Shared semantic agreement between data consumers
       Information selection/normalization - vocabulary constraints + entity-centric + pay-as-you-go data semantics = semantic best-effort
    8. Motivation
       • Vocabulary-independent (schema-free) queries
       • How to abstract users from knowing the data representation?
       • Semantic matching
       • Schemaless databases, in the limit, demand vocabulary independence
       • How is information extraction reshaped in this scenario?
    9. Motivational Scenario
       • Query: What is the relationship between Barack Obama and Indonesia?
       • Sentence: From age six to ten, Obama attended local schools in Jakarta, including Besuki Public School and St Francis Assisi School.
       • Semantic best-effort extraction
       • Entity-centric text representation
    10. Representation
    11. Computational Linguistics Perspective
       • What is already there to represent NL?
       • Discourse Representation Theory (DRT)
       • Semantic Role Labeling (SRL)
    12. Discourse Representation Theory (DRT)
       • “The key idea behind (...) Discourse Representation Theory is that each new sentence of a discourse is interpreted in the context provided by the sentences preceding it.” (van Eijck and Kamp)
       • Models propositions in discourse (multiple sentences).
       • Discourse representation structures (DRS).
       • Example: John enters a card. Every card is green.
    13. Semantic Role Labeling (SRL)
       • Shallow semantic parsing.
       • Detection of arguments associated with a predicate.
       • Associates semantic types with arguments.
       • Example: Bill cut his hair with a razor.
         [Agent Bill] cut [Patient his hair] [Instrument with a razor.]
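
The role-labeled sentence on the SRL slide can be pictured as a small data structure. The following is only a minimal sketch: the names `RoleSpan`, `Frame` and `bracketed` are hypothetical and not taken from the slides or from any SRL toolkit, and the rendering assumes that the first argument precedes the predicate, as it does in the slide's example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoleSpan:
    role: str  # semantic role label, e.g. "Agent"
    text: str  # text span filling that role


@dataclass(frozen=True)
class Frame:
    predicate: str
    arguments: Tuple[RoleSpan, ...]


def bracketed(frame: Frame) -> str:
    """Render a frame in the bracket notation used on the slide.

    Assumption: the first argument comes before the predicate and all
    remaining arguments follow it.
    """
    first, *rest = frame.arguments
    pieces = [f"[{first.role} {first.text}]", frame.predicate]
    pieces += [f"[{arg.role} {arg.text}]" for arg in rest]
    return " ".join(pieces)


frame = Frame(
    predicate="cut",
    arguments=(
        RoleSpan("Agent", "Bill"),
        RoleSpan("Patient", "his hair"),
        RoleSpan("Instrument", "with a razor"),
    ),
)
print(bracketed(frame))
```

Keeping the roles as labeled spans rather than typed slots mirrors the "shallow" nature of SRL: nothing here commits to a schema for what an Agent or an Instrument must be.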
    14. Semantic Best-Effort
       Objectives:
       • Entity-centric & standardized: easier to integrate with other resources
       • Remove the formal constraints and the ‘baggage’ from existing approaches
       • Representation robust to extraction limitations/errors
    15. Semantic Best-Effort Requirements
       • Text segmentation into (s,p,o)s
       • Context representation
       • Conceptual model independency
       • Resolve co-references (pay-as-you-go)
       • Represent recurrent discourse structures
       • Standardized representation (RDF(S))
       • Principled interpretation (compositionality)
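
Two of the requirements above, segmentation into (s,p,o)s and context representation, can be illustrated with the sentence from the motivational scenario. This is a hedged sketch and not Graphia's actual output format: the `Statement` class, the blank-node identifier `_:s1` and the context properties `ctx:time` and `ctx:place` are hypothetical names chosen for illustration, and attaching "in Jakarta" as a place context is one possible reading of the sentence. Only `rdf:subject`, `rdf:predicate` and `rdf:object` come from the standard RDF reification vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str
    # Context attached to the statement as a whole (hypothetical keys).
    context: Dict[str, str] = field(default_factory=dict)


def to_triples(stmt: Statement, stmt_id: str) -> List[Triple]:
    """Flatten a contextualized statement into plain triples via reification."""
    triples = [
        (stmt_id, "rdf:subject", stmt.subject),
        (stmt_id, "rdf:predicate", stmt.predicate),
        (stmt_id, "rdf:object", stmt.obj),
    ]
    # Each context element becomes a triple about the statement node itself.
    for prop, value in stmt.context.items():
        triples.append((stmt_id, prop, value))
    return triples


stmt = Statement(
    subject="Obama",
    predicate="attended",
    obj="local schools",
    context={"ctx:time": "from age six to ten", "ctx:place": "Jakarta"},
)
triples = to_triples(stmt, "_:s1")
for t in triples:
    print(t)
```

The terms are kept as they appear in the text rather than mapped to an ontology, which reflects the conceptual model independency requirement: normalization can be added later, pay-as-you-go.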
    16. Examples
       • Text segmentation into (s,p,o)s
       • Context representation
       • Resolve co-references (pay-as-you-go)
       • Conceptual model independency
    17. Examples
       • Context representation
    18. Examples
    19. Examples
       • Represent recurrent discourse structures
    20. Examples
       • Represent recurrent discourse structures
       • Resolve co-references (pay-as-you-go)
    21. Examples
       • Represent recurrent discourse structures
    22. SDG Elements
       • Named, non-named entities and properties
       • Quantifiers & operators
       • Triple Trees
       • Context elements
       • Co-Referential elements
       • Resolved & normalized entities
    23. Graph Patterns
    24. [[Interpretation]]
       • Graph traversal – deref sequence
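
The slide gives no detail on how the traversal works, so the following is only one possible reading of "graph traversal": a breadth-first search that starts at an entity and follows triples until it reaches a target node, returning the alternating sequence of nodes and predicates it passed through. The function name `find_path`, the decision to treat triples as undirected edges, and the three sample triples (a hand-made segmentation of the motivational sentence) are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def find_path(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over triples, treated as undirected edges.

    Returns the alternating node/predicate sequence from start to goal,
    or None when the two nodes are not connected.
    """
    neighbours: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((p, s))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [predicate, nxt]))
    return None


sample = [
    ("Obama", "attended", "local schools"),
    ("local schools", "in", "Jakarta"),
    ("local schools", "including", "Besuki Public School"),
]
print(find_path(sample, "Obama", "Jakarta"))
```

A query such as the one in the motivational scenario would additionally need semantic matching between "Indonesia" and "Jakarta", which this sketch does not attempt.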
    25. Extraction
    26. SBE Graph Extraction Tool
    27. Extraction Pipeline Architecture
       • Subject
       • Predicate
       • Object
       • Prepositional phrase & Noun complement
       • Reification
       • Time
    28. Preliminary Evaluation
       • 1033 relations (triples) from 150 sentences from 5 randomly selected Wikipedia articles
       • Manually classified the graphs: error categories and accuracy.
    29. Preliminary Evaluation
    30-41. Other Extraction Examples (no text transcribed for these slides)
    42. Conclusion
       • Main direction for improvement is completeness
       • Aligned with the pay-as-you-go scenario
       • Still need to define clear criteria for what you can’t extract
       • There is still a long way to go (e.g. complex subordination)
       • Investigation using existing n-ary relations patterns
       • Context (reification) should be a first-class citizen in the representation of natural language
       • Focus on getting the semantic pivots (rigid designators) right
       • Worth putting effort on enumerable patterns (timestamps, operators)