A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia

Most information extraction approaches available today have focused either on the extraction of simple relations or on scenarios where data extracted from texts should be normalized into a database schema or ontology. Some relevant information present in natural language texts, however, can be irregular, highly contextualized, poorly structured, and intrinsically ambiguous, with complex semantic dependency relations. An information extraction approach should support these characteristics as well. To cope with this scenario, this work introduces a semantic best-effort information extraction approach, which targets a scenario where text information is extracted under a pay-as-you-go data quality perspective, trading high accuracy, schema consistency, and terminological normalization for domain independence, context capture, wider extraction scope, and maximal extraction and representation of the text's semantics. A semantic information extraction framework (Graphia) is implemented and evaluated over the Wikipedia corpus.

  • Logic and linguistics have had lively connections from Antiquity right until today
  • Limitations. Categories of requirements.
  • Relate with req
  • Transcript

    • 1. A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia. André Freitas, Danilo Carvalho, J. C. P. da Silva, Sean O’Riain, Edward Curry. Digital Enterprise Research Institute, www.deri.ie. © Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
    • 2. Outline
      • Motivation
      • Representation
      • Requirements
      • Semantic Best-effort Representation
      • Extraction
      • Graphia Extractor
      • Preliminary Evaluation
      • Extraction Examples
      • Conclusion
    • 3. Motivation
    • 4-6. Motivation (slide content not captured in the transcript)
    • 7. Motivation
      • Natural language texts: no terminological or structural regularity; highly contextualized; complex semantic dependency relations; ambiguity
      • Linked Data: terminological and structural regularity; shared semantic agreement between data consumers
      • Information selection/normalization - vocabulary constraints + entity-centric + pay-as-you-go data semantics = semantic best-effort
    • 8. Motivation
      • Vocabulary-independent (schema-free) queries
      • How to abstract users from knowing the data representation?
      • Semantic matching
      • Schemaless databases, in the limit, demand vocabulary independence
      • How is information extraction reshaped in this scenario?
    • 9. Motivational Scenario
      • Question: What is the relationship between Barack Obama and Indonesia?
      • Sentence: “From age six to ten, Obama attended local schools in Jakarta, including Besuki Public School and St Francis Assisi School.”
      • Diagram labels: Semantic Best-effort Extraction; Entity-centric text representation
    • 10. Representation
    • 11. Computational Linguistics Perspective
      • What is already there to represent NL?
      • Discourse Representation Theory (DRT)
      • Semantic Role Labeling (SRL)
    • 12. Discourse Representation Theory (DRT)
      • “The key idea behind (...) Discourse Representation Theory is that each new sentence of a discourse is interpreted in the context provided by the sentences preceding it.” (van Eijck and Kamp)
      • Models propositions in discourse (multiple sentences).
      • Discourse representation structures (DRS).
      • Example: John enters a card. Every card is green.
    • 13. Semantic Role Labeling (SRL)
      • Shallow semantic parsing.
      • Detection of the arguments associated with a predicate.
      • Association of semantic types with the arguments.
      • Example: Bill cut his hair with a razor. [Agent Bill] cut [Patient his hair] [Instrument with a razor].
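The SRL example on slide 13 can be written down as a small data structure. The following is a minimal sketch, not the tooling used by the authors: the `Argument` and `Frame` classes and the `bracketed` helper are hypothetical names introduced only for illustration, and the rendering assumes a simple active clause where the first argument precedes the predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    role: str  # semantic type of the argument, e.g. "Agent"
    text: str  # surface text covered by the argument


@dataclass(frozen=True)
class Frame:
    predicate: str
    arguments: tuple  # Argument objects in sentence order

    def bracketed(self) -> str:
        """Render the frame in the bracket notation used on the slide.

        Assumption: the first argument comes before the predicate and
        all remaining arguments follow it.
        """
        parts = [f"[{a.role} {a.text}]" for a in self.arguments]
        return " ".join([parts[0], self.predicate, *parts[1:]])

    def role_of(self, text: str):
        """Return the role attached to a span, or None if it has none."""
        for a in self.arguments:
            if a.text == text:
                return a.role
        return None


# Labels taken from the slide's example sentence.
frame = Frame(
    predicate="cut",
    arguments=(
        Argument("Agent", "Bill"),
        Argument("Patient", "his hair"),
        Argument("Instrument", "with a razor"),
    ),
)

print(frame.bracketed())
```

Keeping the roles as plain strings, rather than a fixed enumeration, mirrors the shallow nature of SRL: the frame records which span plays which role without committing to a full logical form.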
    • 14. Semantic Best-Effort
      • Objectives:
      • Entity-centric & Standardized: easier to integrate with other resources
      • Remove the formal constraints and the ‘baggage’ from existing approaches
      • Representation robust to extraction limitations/errors
    • 15. Semantic Best-Effort Requirements
      • Text segmentation into (s,p,o)s
      • Context representation
      • Conceptual model independency
      • Resolve co-references (pay-as-you-go)
      • Represent recurrent discourse structures
      • Standardized representation (RDF(S))
      • Principled interpretation (compositionality)
    • 16. Examples
      • Text segmentation into (s,p,o)s
      • Context representation
      • Resolve co-references (pay-as-you-go)
      • Conceptual model independency
    • 17. Examples: Context representation
    • 18. Examples
    • 19. Examples: Represent recurrent discourse structures
    • 20. Examples
      • Represent recurrent discourse structures
      • Resolve co-references (pay-as-you-go)
    • 21. Examples: Represent recurrent discourse structures
    • 22. SDG Elements
      • Named, non-named entities and properties
      • Quantifiers & operators
      • Triple Trees
      • Context elements
      • Co-Referential elements
      • Resolved & normalized entities
    • 23. Graph Patterns
    • 24. [[Interpretation]]
      • Graph traversal – deref sequence
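To make the representation slides concrete, the sentence from slide 9 can be segmented into (s,p,o) triples with attached context elements and then traversed as a graph. This is only a sketch under stated assumptions: the `Triple` class, the `build_graph` and `find_path` helpers, and the particular segmentation are hypothetical and are not Graphia's output; the `Jakarta located in Indonesia` triple is an assumed extra fact that does not appear in the slides; and plain breadth-first search stands in for the "deref sequence" interpretation, which the slides do not spell out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    s: str
    p: str
    o: str
    context: tuple = ()  # reified (key, value) pairs qualifying the triple


# Hypothetical segmentation of the sentence on slide 9.
TRIPLES = [
    Triple("Obama", "attended", "local schools",
           context=(("in", "Jakarta"), ("time", "from age six to ten"))),
    Triple("local schools", "including", "Besuki Public School"),
    Triple("local schools", "including", "St Francis Assisi School"),
    # Assumption: a triple that would come from some other sentence.
    Triple("Jakarta", "located in", "Indonesia"),
]


def build_graph(triples):
    """Build an undirected adjacency map of (edge label, neighbour) pairs."""
    adj = {}

    def link(a, label, b):
        adj.setdefault(a, []).append((label, b))
        adj.setdefault(b, []).append((label, a))

    for t in triples:
        link(t.s, t.p, t.o)
        # Context values are reachable from the subject of the triple.
        for key, value in t.context:
            link(t.s, f"{t.p} ({key})", value)
    return adj


def find_path(triples, start, goal):
    """Breadth-first search returning an alternating node/edge-label path."""
    adj = build_graph(triples)
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None


path = find_path(TRIPLES, "Obama", "Indonesia")
print(" -> ".join(path))
```

The path found here runs through the context element rather than through the object of the main triple, which illustrates why the slides treat context as part of the representation instead of discarding it during normalization.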
    • 25. Extraction
    • 26. SBE Graph Extraction Tool
    • 27. Extraction Pipeline Architecture
      • Subject
      • Predicate
      • Object
      • Prepositional phrase & Noun complement
      • Reification
      • Time
    • 28. Preliminary Evaluation
      • 1033 relations (triples) from 150 sentences from 5 randomly selected Wikipedia articles
      • Manually classified the graphs: error categories and accuracy.
    • 29. Preliminary Evaluation
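The counts on slide 28 give a sense of extraction density, and the manual classification can be tallied with a few lines of code. The sketch below uses only the three figures stated on the slide; the `summarize` helper, the label names, and the toy label list are hypothetical, since the transcript does not include the actual error categories or accuracy results.

```python
from collections import Counter

# Figures stated on slide 28.
TRIPLE_COUNT = 1033
SENTENCE_COUNT = 150
ARTICLE_COUNT = 5

triples_per_sentence = TRIPLE_COUNT / SENTENCE_COUNT    # roughly 6.89
sentences_per_article = SENTENCE_COUNT / ARTICLE_COUNT  # 30 on average


def summarize(labels):
    """Tally manual labels and return (accuracy, counts per label).

    Assumption: a graph judged fully correct is labelled "correct" and
    any other label names an error category.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    accuracy = counts["correct"] / total if total else 0.0
    return accuracy, counts


# Toy labels for illustration only; these are not the paper's results.
toy_labels = ["correct", "correct", "wrong_object", "correct"]
accuracy, counts = summarize(toy_labels)

print(f"{triples_per_sentence:.2f} triples per sentence")
print(f"toy accuracy: {accuracy:.2f}")
```

An average of almost seven triples per sentence is consistent with the wider extraction scope the approach aims for, where context, time, and prepositional elements are all kept rather than reduced to one relation per sentence.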
    • 30-41. Other Extraction Examples (slide content not captured in the transcript)
    • 42. Conclusion
      • Main direction for improvement is completeness
      • Aligned with the pay-as-you-go scenario
      • Still need to define clear criteria for what cannot be extracted
      • There is still a long way to go (e.g. complex subordination)
      • Investigation using existing n-ary relation patterns
      • Context (reification) should be a first-class citizen in the representation of natural language
      • Focus on getting the semantic pivots (rigid designators) right
      • Worth putting effort into enumerable patterns (timestamps, operators)
