Perspectives of Turning Prague Dependency Treebank into a Knowledge Base

LREC 2006, Genoa, Italy


Usage Rights

CC Attribution-NonCommercial License


    Presentation Transcript

    • Perspectives of Turning Prague Dependency Treebank into a Knowledge Base
        Václav Novák, Jan Hajič
        Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
        May 25, 2006
        novak@ufal.mff.cuni.cz
    • Presentation Outline
        1. Motivation
            • MultiNet – Knowledge Representation
            • Prague Dependency Treebank
            • Missing pieces
        2. Issues of transformation
            • Topic-Focus Articulation
            • Mapping
            • Additional Requirements
        3. Conclusions
    • MultiNet: What is MultiNet
        • Multilayered Semantic Network
        • University in Hagen, Germany
        • Hermann Helbig, Sven Hartrumpf
        • Parser: WOCADI for German (relies heavily on the HaGenLex lexicon)
        • MWR interface (Workbench of Knowledge Engineer)
        • Designed with respect to question answering and cognitive modeling
    • Semantic Network: Properties of Semantic Networks
        • Everything is represented as graph nodes
        • The utterances gradually build the graph
        • Inference rules can further connect the nodes (or add new ones)
        ⇒ Representation of knowledge, usable for inferencing and QA
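
The properties listed on this slide can be illustrated with a minimal sketch. It is not MultiNet itself: the graph is just a set of labelled edges, the two example utterances are made up, and treating the `SUB` relation as transitive is an assumption chosen only to show an inference rule adding a new connection.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, target node)


def add_utterance(graph: Set[Edge], edges: Set[Edge]) -> None:
    """Merge the edges contributed by one utterance into the graph."""
    graph.update(edges)


def apply_transitivity(graph: Set[Edge], relation: str) -> int:
    """Add edges implied by transitivity of `relation` until a fixpoint.

    Returns the number of edges that were added.
    """
    added = 0
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(graph):
            for b2, r2, c in list(graph):
                if r1 == r2 == relation and b == b2:
                    if (a, relation, c) not in graph:
                        graph.add((a, relation, c))
                        added += 1
                        changed = True
    return added


# Two made-up utterances gradually build the graph.
graph: Set[Edge] = set()
add_utterance(graph, {("dog", "SUB", "mammal")})     # "A dog is a mammal."
add_utterance(graph, {("mammal", "SUB", "animal")})  # "A mammal is an animal."

# An inference rule (assumed here: SUB is transitive) connects more nodes.
new_edges = apply_transitivity(graph, "SUB")
```

After the rule is applied, the graph also contains the edge from `dog` to `animal`, which no single utterance stated.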
    • MultiNet – technical info: Properties of MultiNet
        • 93 relations + 18 functions
        • 7 layers of attributes
        • Hierarchy of 46 sorts
        • 1 edge-end attribute distinguishing immanent (prototypical / categorical) vs. situational knowledge
        • Encapsulation of concepts
        • Default vs. categorical inference rules
    • Prague Dependency Treebank
        • Developed at the Institute of Formal and Applied Linguistics, Charles University, Prague
        • Three layers of annotation
        • 3,168 documents ≈ 49,442 sentences ≈ 833,357 tokens annotated on all three layers
    • Prague Dependency Treebank: example sentences
        • Seď klidně, nehýbej se, nabádala mě přítelkyně. (Sit still, don't move, my friend urged me.)
        • Kritizovali hvězdný systém, věříce v autentičnost dosud neokoukaných tváří, které se však záhy také staly hvězdami (a není to jen osud Belmondův). (They criticized the star system, believing in the authenticity of as yet unfamiliar faces, which however soon became stars as well (and this is not only Belmondo's fate).)
        • Pacient, vzpomenuv si na všechna příkoří způsobená mu společností, vztáhl na doktora ruku. (The patient, having recalled all the wrongs done to him by society, raised his hand against the doctor.)
        [Figure: tectogrammatical trees of the three sentences, t-lnd94103-085-p1s21B, t-ln94200-173-p2s6 and t-ln94211-120-p5s4, rooted in the predicates nabádat, kritizovat and vztáhnout; each node carries a lemma, a functor and a semantic part of speech]
    • Tectogrammatical Representation: Properties of Tectogrammatical Layer
        • One sentence ≈ one tree
        • Auxiliaries and function words removed
        • Missing obligatory valents inserted
        • Attributes of nodes
            • Functor
            • Semantic part of speech
            • 15 grammatemes (negation, tense, politeness, ...)
            • Topic-Focus distinction
            • Sentential modality
            • + technical attributes (coordinations, parentheses, IDs)
    • Tectogrammatical Representation: tree t-lnd94103-085-p1s21B
        [Figure: tectogrammatical tree; nodes listed level by level as lemma (functor, semantic part of speech or node type)]
        • Level 1: nabádat.enunc (PRED, v)
        • Level 2: přítelkyně (ACT, n.denot); #PersPron (ADDR, n.pron.def.pers); #Comma.enunc (CONJ, coap)
        • Level 3: #PersPron (ACT, n.pron.def.pers); sedět (PAT, v); hýbat_se (PAT, v)
        • Level 4: klidný (MANN, adj.denot); #Neg (RHEM, atom)
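
As a rough illustration of the node attributes named on the previous slide, the tree above could be held in a small data structure like the one below. The class and field names are hypothetical and not the PDT file format. The parent-child attachments are a reading of the flattened figure and are an assumption: the three level-3 nodes are hung under the coordination node, klidný under sedět, and #Neg under hýbat_se.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class TNode:
    """One tectogrammatical node (illustrative field names)."""
    lemma: str
    functor: str
    sempos: str                         # third label as printed on the slide
    grammatemes: Dict[str, str] = field(default_factory=dict)
    tfa: Optional[str] = None           # 'c', 't' or 'f'; not given on the slide
    children: List["TNode"] = field(default_factory=list)

    def add(self, child: "TNode") -> "TNode":
        """Attach a child and return it, so trees can be built top-down."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "TNode"]]:
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Tree t-lnd94103-085-p1s21B; the attachments below are assumed.
pred = TNode("nabádat", "PRED", "v")
pred.add(TNode("přítelkyně", "ACT", "n.denot"))
pred.add(TNode("#PersPron", "ADDR", "n.pron.def.pers"))
conj = pred.add(TNode("#Comma", "CONJ", "coap"))
conj.add(TNode("#PersPron", "ACT", "n.pron.def.pers"))
sit = conj.add(TNode("sedět", "PAT", "v"))
move = conj.add(TNode("hýbat_se", "PAT", "v"))
sit.add(TNode("klidný", "MANN", "adj.denot"))
move.add(TNode("#Neg", "RHEM", "atom"))

for depth, node in pred.walk():
    print("  " * depth + f"{node.lemma} ({node.functor}, {node.sempos})")
```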
    • Additional Required Information: Missing Pieces
        1. Named entity recognition
            • Numbers
            • Places
            • People
            • ...
        2. Metadata
            • Author
            • Date
            • Place
            • Document type
            • Intended recipient of the text
            • Bibliographical and other references
    • Presentation Outline Again
        1. Motivation
            • MultiNet – Knowledge Representation
            • Prague Dependency Treebank
            • Missing pieces
        2. Issues of transformation
            • Topic-Focus Articulation
            • Mapping
            • Additional Requirements
        3. Conclusions
    • Topic-Focus Articulation
        TFA in PDT
        • TFA is annotated on the Tectogrammatical layer
        • Every word has an attribute: c, t, or f
        • The nodes are ordered with respect to “communicative dynamism”
        ⇓
        TFA in MultiNet
        Content expressed by TFA is further analyzed into:
        1. Encapsulation of concepts
        2. Scope of quantifiers
        3. Layer attributes (GENER, REFER, VARIA, ...)
    • Mapping of Representational Means: Main Issues of Transformation
        1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
        2. Mapping of TR nodes to MultiNet concepts
        3. Mapping of various natural language constructs to attribute-value assignments
        4. Mapping of verbal tenses to temporal axis
    • Mapping of Representational Means: Main Issues of Transformation – closer look 1
        1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
            • Actor and Patient are highly ambiguous
            • Location functors are also used where no location is involved (ELMT, CTXT, SITU)
            • However, other functors correspond quite straightforwardly to MultiNet roles (a table is presented in the paper)
    • Mapping of Representational Means: Main Issues of Transformation – closer look 2
        2. Mapping of TR nodes to MultiNet concepts
            • Typically, a TR node corresponds to a MultiNet concept (i.e., also a node)
            • Quite often, a TR node corresponds to a subnetwork in MultiNet
            • Sometimes, the TR node corresponds to an edge in MultiNet (e.g., CORR, CTXT)
    • Mapping of Representational Means: Main Issues of Transformation – closer look 3
        3. Mapping of various natural language constructs to attribute-value assignments
            Different surface constructs express the same attribute-value assignment:
            • The colour of x is y.
            • x has y colour.
            • x is y.
            • y is the colour of x.
            [Figure: a single MultiNet network over the nodes x, colour and y, with edges labelled ATTR, SUB and VAL]
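
The point of this slide, that several surface constructs should end up as one attribute-value structure, can be sketched as follows. This is a toy over plain English strings, not the tree-based transformation the talk discusses. The regular expressions, the `VALUE_LEXICON` table, the intermediate node name and the edge directions (object `ATTR` attribute node, attribute node `SUB` attribute concept, attribute node `VAL` value) are all assumptions made for illustration.

```python
import re
from typing import FrozenSet, Optional, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, target node)

# Hypothetical lexicon: which attribute a bare value belongs to ("x is y.").
VALUE_LEXICON = {"brown": "colour"}

# Each pattern names the role of its capture groups, in order.
PATTERNS = [
    (re.compile(r"^The (\w+) of (\w+) is (\w+)\.$"), ("attr", "obj", "val")),
    (re.compile(r"^(\w+) has (\w+) (\w+)\.$"), ("obj", "val", "attr")),
    (re.compile(r"^(\w+) is the (\w+) of (\w+)\.$"), ("val", "attr", "obj")),
    (re.compile(r"^(\w+) is (\w+)\.$"), ("obj", "val")),
]


def to_network(sentence: str) -> Optional[FrozenSet[Edge]]:
    """Map one sentence to an ATTR/SUB/VAL edge set, or None if unmatched."""
    for pattern, roles in PATTERNS:
        match = pattern.match(sentence)
        if not match:
            continue
        slots = dict(zip(roles, match.groups()))
        value = slots["val"].lower()
        # "x is y." names no attribute, so fall back to the lexicon.
        attribute = slots.get("attr") or VALUE_LEXICON.get(value)
        if attribute is None:
            return None
        obj = slots["obj"]
        node = f"{attribute}_of_{obj}"  # hypothetical intermediate node name
        return frozenset({
            (obj, "ATTR", node),
            (node, "SUB", attribute),
            (node, "VAL", value),
        })
    return None


paraphrases = [
    "The colour of Rex is brown.",
    "Rex has brown colour.",
    "Rex is brown.",
    "Brown is the colour of Rex.",
]
networks = [to_network(s) for s in paraphrases]
```

All four paraphrases produce the same edge set, which is the behaviour the mapping is meant to have; the third one only works because the lexicon supplies the missing attribute.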
    • Mapping of Representational Means: Main Issues of Transformation – closer look 4
        4. Mapping of verbal tenses to temporal axis
            • Verbal tenses are encoded in grammatemes
            • In MultiNet, the TEMP, ANTE, DUR, STRT, and FIN relations can be used
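
One possible shape of this step is sketched below. It is a guess at a rule, not the mapping from the paper. The tense values `ant`, `sim` and `post`, the utterance-time node `now`, and the reading of `ANTE` as "first argument precedes second" and `TEMP` as "holds at" are all assumptions; DUR, STRT and FIN are left out because the slide does not say when they apply.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, target node)

UTTERANCE_TIME = "now"  # hypothetical node standing for the time of speaking


def tense_to_edges(event: str, tense: str) -> List[Edge]:
    """Turn a tense grammateme value into temporal edges (assumed rule)."""
    if tense == "ant":    # event before the utterance time
        return [(event, "ANTE", UTTERANCE_TIME)]
    if tense == "post":   # event after the utterance time
        return [(UTTERANCE_TIME, "ANTE", event)]
    if tense == "sim":    # event simultaneous with the utterance time
        return [(event, "TEMP", UTTERANCE_TIME)]
    return []             # no tense information, add nothing


past = tense_to_edges("nabádat_1", "ant")
future = tense_to_edges("nabádat_1", "post")
```

Anchoring every event to a single utterance-time node is the simplest choice; a fuller treatment would also have to relate events to each other.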
    • Additional Requirements
        1. Spatio-Temporal Representation
            • For simple inferences about space and time
        2. Calendar
            • For computations with dates
        3. Ontology
            • For all kinds of inferences
            • Ontology is an inherent part of MultiNet semantic network design
            • Upper conceptual ontology represented by sorts
    • Conclusions
        • MultiNet is a suitable formalism for inferences and QA
        • It is difficult to transform texts into MultiNet
        • Tectogrammatical representation is not designed for inferencing and QA
        • There are tools for text-to-TR conversion
        • TR is a good starting point for conversion to MultiNet
        • We have presented issues arising in such a process