
Transcript

  • 1. On Distance between Deep Syntax and Semantic Representation
      Václav Novák, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
      Frontiers in Linguistically Annotated Corpora, July 22, 2006, 16:00 – 16:30, Sydney, Australia
      novak@ufal.mff.cuni.cz
      Sections: Motivation; Issues of transformation; Conclusions. Running title: Syntax – Semantic Distance (20 frames).
  • 2–6. [No slide text recovered; only the running header and footer survive for these slides.]
  • 7. Presentation Outline
      1 Motivation: MultiNet – Knowledge Representation; Prague Dependency Treebank; Missing pieces
      2 Issues of transformation: Mapping; Topic-Focus Articulation; Additional Requirements
      3 Conclusions: Conclusions; Related Work; Future Work
  • 8. MultiNet
      What is MultiNet
      - Multilayered Semantic Network
      - University in Hagen, Germany
      - Hermann Helbig, Sven Hartrumpf
      - Parser: WOCADI for German (relies heavily on the HaGenLex lexicon)
      - MWR interface (Workbench of Knowledge Engineer)
      - Designed with respect to question answering and cognitive modeling
  • 9. Semantic Network
      Properties of Semantic Networks
      - Everything is represented as graph nodes
      - The utterances gradually build the graph
      - Inference rules can further connect the nodes (or add new ones)
      ⇒ A representation of knowledge, usable for inferencing and QA
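The properties listed on this slide can be illustrated with a minimal sketch. It is not MultiNet itself: the class name `SemanticNetwork`, the edge label `is_a`, and the transitivity rule are all assumptions chosen for illustration. It only shows facts accumulating in one graph and an inference rule adding a new edge.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNetwork:
    """Toy semantic network: concepts are nodes, relations are labelled edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, label, target) triples

    def add_fact(self, source, label, target):
        """Add one labelled edge, creating its end nodes if needed."""
        self.nodes.update((source, target))
        self.edges.add((source, label, target))

    def apply_transitivity(self, label):
        """Hypothetical inference rule: close the relation `label` transitively."""
        changed = True
        while changed:
            changed = False
            for (a, l1, b) in list(self.edges):
                for (c, l2, d) in list(self.edges):
                    if l1 == l2 == label and b == c and (a, label, d) not in self.edges:
                        self.edges.add((a, label, d))
                        changed = True


net = SemanticNetwork()
# Two "utterances", each contributing an edge to the same graph.
net.add_fact("sparrow", "is_a", "bird")
net.add_fact("bird", "is_a", "animal")
# The inference rule connects existing nodes with a new edge.
net.apply_transitivity("is_a")
```

After the rule runs, the graph also holds the derived edge from `sparrow` to `animal`, which is the kind of added connection a question-answering component could query.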
  • 10. [No slide text recovered.]
  • 11. MultiNet Example: “The car was damaged because of the impact.”
  • 12. MultiNet – technical info
      Properties of MultiNet
      - 93 relations + 18 functions
      - 7 layers of attributes
      - hierarchy of 46 sorts
      - 1 edge-end attribute distinguishing immanent (prototypical / categorical) vs. situational knowledge
      - encapsulation of concepts
      - default vs. categorical inference rules
  • 13. Prague Dependency Treebank
      - Developed at the Institute of Formal and Applied Linguistics, Charles University, Prague
      - Three layers of annotation
      - 3,168 documents ≈ 49,442 sentences ≈ 833,357 tokens annotated on all three layers
  • 14. Prague Dependency Treebank
  • 15. Prague Dependency Treebank
      Example sentences (Czech, with English translations)
      - Seď klidně, nehýbej se, nabádala mě přítelkyně. (“Sit still, don't move,” my friend urged me.)
      - Kritizovali hvězdný systém, věříce v autentičnost dosud neokoukaných tváří, které se však záhy také staly hvězdami (a není to jen osud Belmondův). (They criticized the star system, believing in the authenticity of as yet unfamiliar faces, which nevertheless soon became stars as well (and this is not only Belmondo's fate).)
      - Pacient, vzpomenuv si na všechna příkoří způsobená mu společností, vztáhl na doktora ruku. (The patient, having recalled all the wrongs society had done to him, raised his hand against the doctor.)
  • 17. Prague Dependency Treebank
      [Figure: tectogrammatical trees of the three example sentences. The node labels of the three trees were interleaved in extraction, so only the tree headers are kept here.]
      - t-lnd94103-085-p1s21B: root, then nabádat.enunc (PRED, v)
      - t-ln94200-173-p2s6: root, then kritizovat.enunc (PRED, v)
      - t-ln94211-120-p5s4: root, then vztáhnout.enunc (PRED, v)
  • 18. Tectogrammatical Representation
      Properties of Tectogrammatical Layer
      - One sentence ≈ one tree
      - Auxiliaries and function words removed
      - Missing obligatory valents inserted
      Attributes of nodes
      - Functor
      - Semantic part of speech
      - 15 grammatemes (negation, tense, politeness, ...)
      - Topic-Focus distinction
      - Sentential modality
      - + technical attributes (coordinations, parentheses, IDs)
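The node attributes listed on this slide suggest a simple record type. The sketch below is an assumption about how such a node could be held in memory, not the treebank's actual file format: the class name `TNode` and its field names are hypothetical. The three example nodes (lemma, functor, semantic part of speech) are taken from the tree for the first example sentence.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class TNode:
    """Hypothetical in-memory tectogrammatical node."""
    lemma: str
    functor: str                       # e.g. PRED, ACT, ADDR
    sempos: str                        # semantic part of speech
    grammatemes: Dict[str, str] = field(default_factory=dict)
    tfa: Optional[str] = None          # topic-focus attribute: c, t, or f
    sentmod: Optional[str] = None      # sentential modality
    children: List["TNode"] = field(default_factory=list)

    def add_child(self, child: "TNode") -> "TNode":
        """Attach a dependent node and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "TNode"]]:
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Top of the tree for "Seď klidně, nehýbej se, nabádala mě přítelkyně."
pred = TNode("nabádat", "PRED", "v")
pred.add_child(TNode("přítelkyně", "ACT", "n.denot"))
pred.add_child(TNode("#PersPron", "ADDR", "n.pron.def.pers"))

visited = [(depth, node.lemma) for depth, node in pred.walk()]
```

Grammatemes and the topic-focus attribute are left empty because the slide does not give their values for these nodes.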
  • 19. Tectogrammatical Representation
      Tree t-lnd94103-085-p1s21B, nodes listed by depth as lemma, functor, semantic part of speech
      - depth 0: root
      - depth 1: nabádat.enunc, PRED, v
      - depth 2: přítelkyně, ACT, n.denot; #PersPron, ADDR, n.pron.def.pers; #Comma.enunc, CONJ, coap
      - depth 3: #PersPron, ACT, n.pron.def.pers; sedět, PAT, v; hýbat_se, PAT, v
      - depth 4: klidný, MANN, adj.denot; #Neg, RHEM, atom
  • 20. Additional Required Information
      Missing Pieces
      1 Named entity recognition
          - Numbers
          - Places
          - People
          - ...
      2 Metadata
          - Author
          - Date
          - Place
          - Document type
          - Intended recipient of the text
          - Bibliographical and other references
  • 21. Presentation Outline Again
      1 Motivation: MultiNet – Knowledge Representation; Prague Dependency Treebank; Missing pieces
      2 Issues of transformation: Mapping; Topic-Focus Articulation; Additional Requirements
      3 Conclusions: Conclusions; Related Work; Future Work
  • 22. Mapping of Representational Means
      Main Issues of Transformation
      1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
      2 Mapping of TR nodes to MultiNet concepts
      3 Mapping of various natural language constructs to attribute-value assignments
      4 Mapping of verbal tenses to temporal axis
  • 23. Mapping of Representational Means
      Main Issues of Transformation – closer look 1: Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
      - Actor and Patient are highly ambiguous
      - Location functors are also used where no location is involved (ELMT, CTXT, SITU)
      - However, other functors correspond quite straightforwardly to MultiNet roles (a table is presented in the paper)
  • 24. Mapping of Representational Means
      Main Issues of Transformation – closer look 2: Mapping of TR nodes to MultiNet concepts
      - Typically, a TR node corresponds to a MultiNet concept (i.e., also a node)
      - Quite often, a TR node corresponds to a subnetwork in MultiNet
      - Sometimes, the TR node corresponds to an edge in MultiNet (e.g., CORR, CTXT)
  • 25. Mapping of Representational Means
      Main Issues of Transformation – closer look 3: Mapping of various natural language constructs to attribute-value assignments
      - Paraphrases sharing one representation: “The color of x is y.”, “x has y color.”, “x is y.”, “y is the color of x.”
      - [Figure: MultiNet fragment with the nodes x, y, and color; recovered edge labels are SUB, VAL, and the fragments “AT” and “TR” (presumably ATTR).]
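The attribute-value mapping on this slide can be sketched as a function that returns the same edges for all four paraphrases. The edge labels SUB and VAL are recovered from the slide, while ATTR is a reading of the truncated label. The edge directions, the intermediate attribute node, and the helper name `attribute_value_edges` are assumptions made for illustration.

```python
def attribute_value_edges(obj, attribute, value, attr_node=None):
    """Return edge triples for 'the <attribute> of <obj> is <value>'.

    Assumed layout: the object points to an attribute node, which is
    subordinated to the attribute concept and carries the value.
    """
    node = attr_node or f"{attribute}_of_{obj}"
    return [
        (obj, "ATTR", node),        # obj has this attribute instance
        (node, "SUB", attribute),   # the instance is a kind of <attribute>
        (node, "VAL", value),       # the instance has this value
    ]


paraphrases = [
    "The color of x is y.",
    "x has y color.",
    "x is y.",
    "y is the color of x.",
]

# Every paraphrase is assumed to be analyzed into the same (obj, attribute, value).
edges_per_paraphrase = {
    sentence: attribute_value_edges("x", "color", "y") for sentence in paraphrases
}
edges = edges_per_paraphrase["x is y."]
```

The hard part in practice is the analysis step this sketch skips: recognizing that “x is y.” is about color at all requires knowing that y is a color value.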
  • 26. Mapping of Representational Means
      Main Issues of Transformation – closer look 4: Mapping of verbal tenses to temporal axis
      - Verbal tenses are encoded in grammatemes
      - In MultiNet, the TEMP, ANTE, DUR, STRT, and FIN relations can be used
  • 27. Topic-Focus Articulation
      TFA in PDT
      - TFA is annotated on the Tectogrammatical layer
      - Every word has an attribute: c, t, or f
      - The nodes are ordered with respect to “communicative dynamism”
      ⇓
      TFA in MultiNet
      Content expressed by TFA is further analyzed into:
      1 Encapsulation of concepts
      2 Scope of quantifiers
      3 Layer attributes (GENER, REFER, VARIA, ...)
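As a rough illustration of the PDT side of this slide, the sketch below orders nodes by communicative dynamism and groups them by their c/t/f attribute. It assumes that c and t mark contextually bound items and f marks non-bound ones. Grouping purely by attribute value is a simplification, not the treebank's procedure for deriving topic and focus, and the example words, attribute values, and order positions are invented.

```python
def split_by_tfa(nodes):
    """Split (lemma, tfa, order) triples into bound and non-bound lemmas.

    `order` stands for the position in the communicative-dynamism ordering.
    """
    ordered = sorted(nodes, key=lambda node: node[2])
    bound = [lemma for lemma, tfa, _ in ordered if tfa in ("c", "t")]
    non_bound = [lemma for lemma, tfa, _ in ordered if tfa == "f"]
    return bound, non_bound


# Invented example values, not taken from the treebank.
example = [
    ("bought", "f", 2),
    ("Mary", "t", 0),
    ("book", "f", 3),
    ("yesterday", "c", 1),
]
bound, non_bound = split_by_tfa(example)
```

Mapping this to MultiNet would then require deciding, per sentence, which of the three devices on the slide (encapsulation, quantifier scope, layer attributes) carries the distinction.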
  • 28. Additional Requirements
      1 Spatio-Temporal Representation: for simple inferences about space and time
      2 Calendar: for computations with dates
      3 Ontology: for all kinds of inferences
          - Ontology is an inherent part of MultiNet semantic network design
          - Upper conceptual ontology represented by sorts
  • 29. Conclusions
      - MultiNet is a suitable formalism for inferences and QA
      - It’s difficult to transform texts into MultiNet
      - Tectogrammatical representation is not designed for inferencing and QA
      - There are tools for text-to-TR conversion
      - TR is a good starting point for conversion to MultiNet (structural similarity, disambiguation in TR)
      - We have presented issues arising in such a process
  • 30. Related Work
      - Helbig (1986): automatic transformation to MultiNet
      - Horák (2001): automatic transformation to Transparent Intensional Logic
      - Callmeier et al. (2004): DeepThought project, automatic transformation to Robust Minimal Recursion Semantics
      - Bos (2005): automatic transformation to Discourse Representation Theory
      - Bolshakov and Gelbukh (2000): automatic transformation in the Meaning–Text Theory framework
      - Kruijff-Korbayová (1998): automatic TR-to-DRT transformation
  • 31. Future Work
      1 Stage I – Preparation: annotation tools; annotation guidelines
      2 Stage II – Annotation: pilot study; automated preprocessing; evaluation of annotators
      3 Stage III – Application: supervised “parsing”; assessment of TR necessity