Cross-lingual event-mining using wordnet as a shared knowledge interface

                  Piek Vossen                    Aitor Soroa, Beñat Zapirain, German Rigau
             VU University Amsterdam                   University of the Basque Country
            p.t.j.m.vossen@vu.nl                             a.soroa@ehu.es
                                                        benat.zapirain@ehu.es
                                                         german.rigau@ehu.es


Abstract

We describe a concept-based event-mining system that maximizes the information extracted
from text and is not restricted to predefined knowledge templates. Such a system needs to
handle a wide range of expressions while being able to extract precise semantic relations.
The system uses simple patterns of linguistic and ontological constraints that are applied
to a uniform representation of the text. It uses a generic ontology based on DOLCE and
wordnets in different languages to extract events from text in these languages in an
interoperable way. The system performs unsupervised, domain-independent event-mining with
promising results. Error analysis showed that the main causes of errors are not the
semantic model or the mapping of text to concepts through word-sense disambiguation (WSD),
but the complexity of the grammatical structures and the quality of parsing. Using the
same semantic model and the cross-wordnet links, our English event-mining patterns were
transferred to Dutch in less than a day's work. The platform was tested on the environment
domain but can be applied to any other domain.

1   Introduction

Traditionally, Information Extraction (IE) is the task of filling template information
from previously unseen text which belongs to a predefined domain (Peshkin and Pfeffer,
2003). Standard IE systems are based on language-specific pattern matching (Kaiser and
Miksch, 2005), consisting of language-specific regular expressions and associated mappings
from syntactic to logical forms. The major disadvantage of traditional IE systems is that
they focus on satisfying precise, narrow, pre-specified requests, e.g. all names of
directors of movies. Compared to full-text indexes in Information Retrieval (IR), IE
systems cover only a small portion of the knowledge in texts, while capturing deeper
semantic relations.
   Alternatively, the KYOTO system¹ combines the comprehensiveness of IR systems with the
depth of IE systems. Furthermore, the system can be applied to different languages and
domains in a uniform way. It tries to extract any event and its participants from the
text, and relate them to time and place. To achieve this, it uses a full-text
representation format and a wide range of knowledge in the form of wordnets and a generic
ontology (Vossen and Rigau, 2010). Word-sense disambiguation (WSD) is a crucial step to
map text to concepts. We implemented a two-phase WSD strategy and show the effects on
event extraction. We first apply a state-of-the-art WSD to words in context, scoring all
possible synsets in a wordnet. Each of these synsets is mapped to a shared ontology. From
this mapping, all possible ontological implications are derived. Next, the mining system
extracts all possible interpretations of all sequences of ontological concepts that match
the patterns. In the second phase, we use the WSD score to select an interpretation, but
only where there is a choice. The system has been tested on texts from the environment
domain. However, the knowledge resources and patterns are generic and can be applied to
any other domain.
   In the next section, we describe the general architecture of the KYOTO system and the
knowledge structure. In section 3, we describe the module for mining knowledge from the
text. In section 4, we describe the evaluation results and an error analysis for English.
Since the profiles use

   ¹ Available at http://www.kyoto-project.eu/
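
The two-phase WSD strategy described in the introduction can be illustrated with a small sketch. It is not the KYOTO implementation: the `Interpretation` record, the `select_interpretations` function and the example scores are hypothetical, and the reading of "only where there is a choice" as "several matches compete for the same token span" is an assumption. Phase one is taken to have happened already (all senses scored, all pattern matches extracted); the code shows only the second phase.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Interpretation:
    """One pattern match for a token span (hypothetical record, not a KAF type)."""
    span: tuple        # token ids covered by the match
    relation: str      # relation proposed by the pattern
    wsd_score: float   # WSD score of the word sense that licensed the match


def select_interpretations(matches):
    """Phase two: keep one interpretation per span, consulting the WSD score
    only when several interpretations compete for the same span."""
    selected = []
    by_span = sorted(matches, key=lambda m: m.span)
    for _, group in groupby(by_span, key=lambda m: m.span):
        candidates = list(group)
        if len(candidates) == 1:
            # No choice to make, so the WSD score plays no role here.
            selected.append(candidates[0])
        else:
            selected.append(max(candidates, key=lambda m: m.wsd_score))
    return selected


# Assumed output of phase one: every match was kept, including competing ones.
matches = [
    Interpretation(("w267", "w268"), "patient", 0.33),
    Interpretation(("w267", "w268"), "done-by", 0.12),
    Interpretation(("w301",), "generic-location", 0.05),
]
chosen = select_interpretations(matches)
for m in chosen:
    print(m.span, m.relation, m.wsd_score)
```

The low-scoring `generic-location` match survives because nothing competes with it, which is the point of delaying the WSD decision until after pattern matching.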
                     Figure 2: System Architecture

language-neutral ontological constraints, they can be easily transferred to another
language. Therefore, in section 5, we describe how the system was transferred from
English to Dutch through the wordnet-equivalence links.

2   KYOTO overview

The KYOTO system starts with linguistic processors that apply tokenization, segmentation,
morpho-syntactic analysis and semantic tagging of the text. The semantic tagging involves
detection of named entities and of the meaning of words according to a given wordnet. The
output of the linguistic processors is stored in an XML annotation format that is the same
for all the languages, called the KYOTO Annotation Format (KAF; Bosma et al., 2009). KAF
is compatible with the Linguistic Annotation Framework (LAF; Ide and Romary, 2003). In
KAF, words, terms, constituents and syntactic dependencies are stored in separate layers
with references across the structures. All modules in KYOTO draw their input from these
XML structures. Likewise, WSD is done on the same KAF annotation in different languages
and is therefore the same module for all the languages (Agirre and Soroa, 2009). The
current system includes processors for English, Dutch, Italian, Spanish, Basque, Chinese
and Japanese. Figure 1 shows a simplified example of a KAF structure with the two basic
layers: the <text> layer (tokenization, segmentation) and the <terms> layer, containing
morpho-syntactic and semantic information drawn from WordNet and the KYOTO ontology for
the words water and pollution.
   In KYOTO, the knowledge extraction is done by so-called Kybots (Knowledge Yielding
Robots). Kybots are defined by a set of profiles representing information patterns. In a
profile, conceptual relations are expressed using ontological and morpho-syntactic
patterns. Since the semantics is defined through the ontology, it is possible to detect
similar data even if expressed differently. In Figure 2, we show an example of a
conceptual pattern for the environment domain that relates organisms that live in
habitats. The pattern uses labels from the central ontology, whereas each wordnet synset
is directly or indirectly related to these labels. Such a pattern can be used to extract
events from text, such as frogs that live in cropland in France during the period
2000-2010.
   The system exploits a 3-layered knowledge architecture (Vossen and Rigau, 2010), using
a central ontology, wordnets in different languages and potential background vocabularies
linked to the wordnets. The ontology consists of around 2,000 classes divided over three
layers (Hicks and Herold, 2009). The top layer is based on DOLCE²

   ² DOLCE-Lite-Plus version 3.9.7

<kaf xml:lang="en" doc="example1">
 <text>
  <wf page="1" sent="40" wid="w267" fileoffset="6,11">water</wf>
  <wf page="1" sent="40" wid="w268" fileoffset="12,21">pollution</wf>
  <!--...-->
 </text>
 <terms>
  <term tid="t241" lemma="water" pos="N">
   <span><target id="w267"></target></span>
   <externalReferences>
    <externalRef conf="0.29" ref="14845743-n"
       resource="wn30g"/>
    <externalRef ref="Kyoto#water"
       reftype="sc_equivalenceOf" resource="ontology"/>
    <!--...-->
    <externalRef reftype="SubClassOf"
       reference="DOLCE-Lite.owl#endurant"
       status="implied"/>
    <!--...-->
   </externalReferences>
  </term>
  <term tid="t242" lemma="pollution" pos="N" type="open">
   <span><target id="w268"></target></span>
   <externalReferences>
    <externalRef conf="0.33" ref="14516743-n"
       resource="wn30g"/>
    <externalRef ref="Kyoto#contamination_pollution"
       reftype="sc_equivalenceOf" resource="ontology"/>
    <!--...-->
    <externalRef reftype="SubClassOf"
       reference="DOLCE-Lite.owl#perdurant"
       status="implied"/>
    <!--...-->
   </externalReferences>
  </term>
 </terms>
 <!-- Additional layers (chunking, dependencies, ...) -->
</kaf>

     Figure 1: Example of a KAF document
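
The term layer of Figure 1 can be read with any standard XML parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`, not the KYOTO tooling: the `read_terms` function and the dictionary layout it returns are hypothetical, and the embedded document is a cut-down copy of the pollution term from Figure 1. It assumes that WordNet references are the ones with `resource="wn30g"` and that all other external references carry ontology classes.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the "pollution" term from Figure 1.
KAF = """<kaf xml:lang="en" doc="example1">
 <terms>
  <term tid="t242" lemma="pollution" pos="N" type="open">
   <span><target id="w268"/></span>
   <externalReferences>
    <externalRef conf="0.33" ref="14516743-n" resource="wn30g"/>
    <externalRef ref="Kyoto#contamination_pollution"
                 reftype="sc_equivalenceOf" resource="ontology"/>
    <externalRef reftype="SubClassOf"
                 reference="DOLCE-Lite.owl#perdurant" status="implied"/>
   </externalReferences>
  </term>
 </terms>
</kaf>"""


def read_terms(kaf_xml):
    """Return one dict per <term>: lemma, POS, scored synsets, ontology classes."""
    root = ET.fromstring(kaf_xml)
    terms = []
    for term in root.iter("term"):
        synsets, classes = [], []
        for ref in term.iter("externalRef"):
            if ref.get("resource") == "wn30g":
                # WordNet synset with its WSD score (the conf attribute).
                synsets.append((ref.get("ref"), float(ref.get("conf"))))
            else:
                # Figure 1 uses both `ref` and `reference` for ontology classes.
                classes.append(ref.get("ref") or ref.get("reference"))
        terms.append({
            "tid": term.get("tid"),
            "lemma": term.get("lemma"),
            "pos": term.get("pos"),
            "synsets": synsets,
            "classes": classes,
        })
    return terms


terms = read_terms(KAF)
for t in terms:
    print(t["lemma"], t["synsets"], t["classes"])
```

Keeping explicit and implied classes in one list mirrors the idea that a pattern only needs to test whether a class is present on a term, whatever its origin.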
(Gangemi et al., 2003) and OntoWordNet. The second layer consists of the Base Concepts³
(BCs), which cover an intermediate level of abstraction for all nominal and verbal WordNet
synsets (Izquierdo et al., 2007). Examples of BCs are: building, vehicle, animal, plant,
change, move, size, weight. A third layer consists of domain classes introduced for
detecting events and qualities in a particular domain (i.e. environment).
   The semantic model also provides complete mappings to the ontology for all nominal,
verbal and adjectival WordNet 3.0 synsets (Fellbaum, 1998)⁴. The mappings also harmonize
predicate information across different parts of speech (POS). For instance, migratory
events represented by different synsets of the verb migrate, the noun migration or the
adjective migratory inherit the same ontological information corresponding to the
ChangeOfResidence class in the ontology.
   This generic knowledge model provides an extremely powerful basis for semantic
processing in any domain. Furthermore, through the equivalence relations of wordnets in
other languages to the English WordNet, this semantic framework can also be applied to
other languages, as shown in Section 5.
   The WSD module assigns concepts to each word with a score based on the context.
Ontological tagging of the text is then the last step in the pre-processing before the
extraction of events. For each synset associated with a word, we use the
wordnet-to-ontology mappings to look up its associated ontological classes and inherited
properties. The Base Concept mapping guarantees that every synset is mapped to an
ontological class. Next, we insert into KAF all the ontological implications that apply to
each concept. By making the implicit ontological statements explicit, Kybots are able to
find the same relations hidden in different expressions with different surface
realizations, e.g.: water pollution, polluted water, pollution of water, water that is
polluted directly or indirectly express the same relations. Figure 1 shows how ontological
statements are represented in KAF as external references related to synsets with a score
from the WSD (the value of the attribute conf). Words in the term structure usually get
many ontological implications for each word meaning. The implications reflect subclass
relations from the ontology but also other relations, such as events in which the concept
denoted by the word plays a role (e.g. the word polluted water denotes water that plays a
role in the event pollution) or, the other way around, the roles involved in the event
denoted by the word (e.g. the word water pollution denotes events in which water is a
patient).

3   Event extraction

A set of abstract patterns called Kybots uses the central ontology to extract actual
concept instances and relations from KAF documents. Event-mining is done by processing
these abstract patterns on the enriched documents. These patterns are defined in a
declarative format using profiles, which describe general morpho-syntactic and semantic
conditions on sequences of KAF terms (which are lemmas in the text). These profiles are
compiled to XQueries to efficiently scan over KAF documents uploaded into an XML database.
These patterns extract the relevant information from each match.
   Figure 3 shows an example of a simple Kybot profile. Profiles are described using XML
syntax and consist of three main parts:

  • Variable declaration (<variables> element). In this part, the search entities are
    defined, e.g.: X (terms whose part-of-speech is noun and whose lemma is not "system"),
    Y (terms whose lemma is either "release", "produce" or "generate") and Z (terms linked
    to a subclass of the ontological class DOLCE-Lite.owl#contamination_pollution, meaning
    being contaminated with harmful substances).

  • Relations among variables (<rel> element): This part specifies the relations among the
    previously defined variables, e.g.: Y is the main pivot, variable X must precede
    variable Y in the same sentence, and variable Z must follow variable Y. Thus, this
    relation declares patterns like 'X → Y → Z' in a sentence.

  • Output template: describes the output to be produced for every match, e.g.: each match
    generates a new event from the term Y and two roles: the 'done-by' role, filled by
    term X, and the 'patient' role, filled by Z.

   We created 261 generic profiles for English. These profiles capture very simple
sequences of parts-of-speech or words, e.g. noun-verb or

   ³ http://adimen.si.ehu.es/web/BLC
   ⁴ This knowledge model is freely available through the KYOTO website as open-source
     data.
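
The three parts of a profile (variables, relations, output template) can be mimicked in a few lines of Python. This is a sketch under assumptions, not the Kybot engine: the paper compiles profiles to XQuery over an XML database, whereas the code below checks the same kind of conditions over an in-memory list of terms. The `Term` class, the `match_profile` function and the example sentence are hypothetical. The subclass test is simplified to set membership, on the assumption that ontological tagging has already added all implied classes to each term.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Simplified stand-in for a KAF term (hypothetical, for illustration)."""
    tid: str
    lemma: str
    pos: str
    sent: int
    classes: set = field(default_factory=set)  # explicit and implied ontology classes


POLLUTION = "DOLCE-Lite.owl#contamination_pollution"


# Variable declarations: one predicate per variable.
def is_x(t):
    return t.pos == "N" and t.lemma != "system"


def is_y(t):
    return t.lemma in {"produce", "generate", "release"}


def is_z(t):
    return POLLUTION in t.classes


def match_profile(terms):
    """Relations and output: for every pivot Y, pair each X that precedes it
    with each Z that follows it in the same sentence, and emit one event."""
    for j, y in enumerate(terms):
        if not is_y(y):
            continue
        xs = [t for t in terms[:j] if t.sent == y.sent and is_x(t)]
        zs = [t for t in terms[j + 1:] if t.sent == y.sent and is_z(t)]
        for x in xs:
            for z in zs:
                yield {
                    "event": (y.tid, y.lemma),
                    "roles": [("done-by", x.tid, x.lemma),
                              ("patient", z.tid, z.lemma)],
                }


# Invented example sentence: "factory release pollution".
sentence = [
    Term("t1", "factory", "N", 1),
    Term("t2", "release", "V", 1),
    Term("t3", "pollution", "N", 1, {POLLUTION}),
]
events = list(match_profile(sentence))
print(events)
```

Emitting every X and Z combination is a choice of this sketch; a real profile engine may restrict matches further, for example by distance or constituent structure.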
<kprofile>
 <variables>
  <var name="x" type="term" pos="N" lemma="! system"/>
  <var name="y" type="term"
       lemma="produce | generate | release"/>
  <var name="z" type="term"
       ref="DOLCE-Lite.owl#contamination_pollution"
       reftype="SubClassOf"/>
 </variables>
 <relations>
  <root span="y"/>
  <rel span="x" pivot="y" direction="preceding"/>
  <rel span="z" pivot="y" direction="following"/>
 </relations>
 <events>
  <event target="$y/@tid" lemma="$y/@lemma"
         pos="$y/@pos"/>
  <role target="$x/@tid" rtype="done-by"
        lemma="$x/@lemma"/>
  <role target="$z/@tid" rtype="patient"
        lemma="$z/@lemma"/>
 </events>
</kprofile>

        Figure 3: Example of a Kybot profile

adjective-noun, where each word is restricted to classes from the ontology, e.g. a motion
event followed by a geographical region. Note that it is important that all possible
expressions of a relation are modeled by the profiles.

4   Evaluation

4.1   Triplet representation for events

The event structure in KYOTO is rather specific and events can be complex, including many
different roles and relations. Below is an example of such a structure extracted from the
sentence: "Forests also absorb air pollution and retain up to 85 percent of the nitrogen
from sources such as automobiles and power plants."

<event eid="e203" target="t4260"
   lemma="absorb" pos="V"
   synset="eng-30-01539633-v" rank="0.25"/>
<role rid="r280" event="e203" target="t4258"
   lemma="forest" pos="N" rtype="done-by"
   synset="eng-30-09284015-n" rank="0.15"/>
<role rid="r976" event="e203" target="t4262mw"
   lemma="air pollution" pos="N" rtype="patient"
   synset="eng-30-14517412-n" rank="1"/>
<role rid="r1609" event="e203" target="t4277mw"
   lemma="power plant" pos="N" rtype="simple-cause-of"
   synset="eng-30-03996655-n" rank="1"/>
<role rid="r276" event="e203" target="t4274"
   lemma="automobile" pos="N" rtype="simple-cause-of"
   synset="eng-30-02958343-n" rank="1"/>

   To be able to compare our results with the output of other systems and gold standards,
we defined a more neutral and simple triplet format. A triplet consists of:

    • a relation

    • a list of text token ids that represent the event

    • a list of text token ids that represent a participant

   If an event has multiple participants, a separate triplet is created for each
event-participant pair. The triplet identifier is used to mark which triplets relate to
the same event.

4.2   Evaluation results for English

We created a gold standard in the triplet format. An annotation tool that reads KAF and
can assign any set of tags to tokens in KAF was used to make a gold standard for a
document about the Chesapeake Bay, a large estuary in the US⁵. The document has 16,145
word tokens. We manually annotated all relevant relations in 127 sentences, corresponding
to 1,416 tokens, 353 triplets and 201 events.
   The first column in Table 2 shows the annotated relations.⁶ The patient relation is
most frequent (38%), followed by done-by (15%) and simple-cause-of (14%).
   As a baseline, we created triplets for all heads of constituents in a single sentence
according to the constituent representation of the text in KAF. The baseline generates
3,427 triplets for the annotated sentences. Since no relation is predicted, we assume the
most frequent relation, patient.
   To evaluate the Kybots, we used the 261 generic profiles. The profiles generated 548
triplets for the annotated sentences. In total, 169 profiles or combinations of profiles
(since multiple profiles can propose the same triplet) have been applied to the annotated
fragment.
   To measure the proportion of relevant events that are detected by these heuristics, we
compare the baseline and Kybot events with the event tokens in the gold standard. The gold
standard has 201 events and the baseline 1,627 events, of which 249 overlap with the gold
standard events. This results in a recall of 1.24 and a precision of 0.15. The Kybot
profiles detect 733 events, of which 209 are relevant. Recall is 1.04 and precision is
0.29. Recall of events is similar to the baseline and precision is almost twice as high.
Recall is higher than 1 because the gold standard sometimes marks larger phrases as a
single event which may be separate events in

   ⁵ The tool and evaluation data are available at the KYOTO website.
   ⁶ The relations are taken from the DOLCE part of the KYOTO ontology. Please consult the
     DOLCE ontology for their formal definition.
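
The event-level scores reported in section 4.2 follow directly from the raw counts. The sketch below recomputes them; the function and variable names are illustrative, and the only inputs are the counts given in the text (201 gold events; 1,627 baseline events with 249 overlapping; 733 Kybot events with 209 relevant).

```python
def precision(n_correct, n_system):
    """Share of system items that overlap with the gold standard."""
    return n_correct / n_system if n_system else 0.0


def recall(n_correct, n_gold):
    """Overlapping system items relative to the number of gold items."""
    return n_correct / n_gold if n_gold else 0.0


GOLD_EVENTS = 201  # events in the gold standard

# name -> (system events, system events overlapping with gold events)
runs = {"baseline": (1627, 249), "kybots": (733, 209)}

scores = {}
for name, (n_system, n_overlap) in runs.items():
    r = recall(n_overlap, GOLD_EVENTS)
    p = precision(n_overlap, n_system)
    scores[name] = (round(r, 2), round(p, 2))
    print(f"{name}: recall={r:.2f} precision={p:.2f}")
```

Because the overlap is counted over system events, several system events can fall inside one larger gold-standard event, which is how recall can exceed 1.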
   Relation           Gold        %   System   Correct   R (%)   P (%)   Missed
   destination-of       27    7.65%       17         6      22      35       21
   use-of                4    1.13%        1         1      25     100        3
   generic-location     11    3.12%       22         8      72      36        3
   source-of             4    1.13%       10         1      25      10        3
   instrument            2    0.57%        0         0       0       0        2
   product-of            2    0.57%        0         0       0       0        2
   part-of               1    0.28%        3         0       0       0        1
   purpose-of            7    1.98%        9         3      42      33        4
   patient             133   37.68%      195        85      63      43       48
   path-of               1    0.28%        0         0       0       0        1
   result-of             4    1.13%        7         0       0       0        4
   participant           0     0.0%        3         0       0       0        0
   has-state            32    9.07%       42        11      34      26       21
   state-of             22    6.23%       25        11      50      44       11
   done-by              52   14.73%       89        22      42      24       30
   simple-cause-of      51   14.45%      125        26      50      20       25
   Total               353     100%      548       174      49      31      179

   Table 2: Generic processing with 261 profiles differentiated per relation

the baseline and the Kybot output. Both the baseline and the Kybot profiles thus do not
miss any relevant events but do extract a substantial number of irrelevant events. The
precision for the profiles is still reasonable, given that no relevance ranking has been
applied and only generic profiles have been used. Note that events that are not annotated
can still be proper events.
   Table 1 shows the results for the triplet evaluation of the relevant events.

                       Ignoring relations       With relations
                       Baseline     Kybots      Baseline     Kybots
        Nr. correct    306          222         115          174
        Precision      0.09         0.49        0.03         0.32
        Recall         0.86         0.63        0.33         0.49

        Table 1: Baseline and Kybot results

   When ignoring the relation, recall for the baseline is 86%, which shows that the
baseline matches a substantial part of the annotated triplets. It also shows that 14% is
missed. This is because the parser marks only one word as the head in the case of a
coordination of heads, e.g. in the phrase "birds and fish" only "bird" is marked as the
head. Precision of the baseline is very low, even when we ignore the relation itself. If
we take the patient relation as the default, we see that the precision and recall drop
even more. The Kybot profiles clearly outperform the baseline in terms of precision: 49%
when ignoring the relation and 32% considering all relations. In terms of recall, we see
that 63% is covered when we ignore the relation. This indicates that the profiles do
consider the majority of structures, but still miss 37% of the structures. When we
consider the relations, recall drops to 49%, which is still well above the baseline.

4.2.1   Error analysis

We did a separate error analysis for recall and precision. First of all, we checked the
1,023 term tokens of content words (nouns, verbs and adjectives) that occurred in the 127
gold-standard sentences. It turned out that there are 70 tokens with the wrong POS
assigned (7%). The major errors are nouns and verbs interpreted as adjectives and common
nouns considered as proper names, most notably "wetlands" and "wastewater", occurring 3
and 5 times respectively. If the wrong POS is assigned, the words cannot be found in
WordNet or the wrong synsets are assigned. In that case, wrong or no ontological
statements are inserted for a word.
   To analyze the recall in more detail, we looked at the most frequently missed
relations: patient (48) and done-by (30) (see Table 2).
   From the patient triplets, we missed 25% due to parser errors, among which wrong POS,
missed verb-particle combinations and multiwords. Another 15% of the patient triplets were
not found because the parser does not provide detailed and reliable dependency information
to distinguish between subjects and objects, and the ontology does not distinguish
sufficiently between events with participants that control the process (e.g. "to swim")
and participants that do not (e.g. "to flow"). Remarkably, only 4% of the errors are due
to a missing concept in WordNet or a wrong mapping of WordNet to the ontology. Another 4%
could have been found by making more profiles.
   In the case of the done-by relation, 30% of the missed relations are the result of
parser errors (mainly coordination of NPs and VPs in which only one is marked as the head)
and another 30% are missed because the structures of simple-cause-of and done-by are the
same and the ontology does not provide sufficient information on the events to distinguish
them.
   Precision errors are mostly caused by the fact that patient, done-by and
simple-cause-of are easily confused, not only by the Kybots but also by humans. The
patient relation performs above average in precision (43%), but done-by (24%),
simple-cause-of (20%) and has-state (26%) perform below average. The simple-cause-of
relation in particular lowers the overall precision, since it represents 125 triplets
(15%). The simple-cause-of relation applies to perdurants related to other perdurants. Due
to the ambiguity in English of nouns to denote either an endurant or
a perdurant, the system is likely to over-generate this relation. The reverse
holds for the done-by relation. Both relations typically hold for the same
structures, such as nouns in subject position of a verb. A related common
error concerns cases such as forest destroyed and houses built. Since the
parser provides no information on the inflection of the verbs or on
passive/active form, the profile can only detect a noun+verb pattern and
assigns a done-by relation where a patient relation should be assigned.
Again, more information from the parser in the KAF representation can help
here.
   The main conclusion is that major improvement in both recall and precision
can be achieved by better and more input from the linguistic pre-processing,
by richer ontological information, e.g. control of events, and by extending
the number of profiles. Furthermore, precision could also be improved if we
can resolve the ambiguity of nouns between endurants and perdurants, to
distinguish for example done-by from simple-cause-of. It thus makes sense to
consider the effect of WSD on the precision of the mining. This is discussed
in the next section.

4.2.2 Effects of Word-Sense-Disambiguation

The generic processing considers all the possible meanings of the words and
does not take the WSD into account.
   To see the effect of the WSD, we implemented a filter on the Kybot output
that selects interpretations with the highest WSD score for each word in the
output that has multiple interpretations. By interpretation we mean: being
either an event or a role, or having different relations assigned. By
excluding low scoring concepts only when there is a choice to be made, we
hope to capture as much recall as possible and to gain precision. Note that
the WSD scored a precision of 48% in the SemEval2010 task on domain specific
WSD, which used documents from the same domain as KYOTO (Agirre et al.,
2010). We set a threshold for eliminating relations in proportion to the
maximum WSD scores of each word. The results are shown in Table 3. A
threshold of 0 means that all interpretations are considered, a threshold of
100 means that only the highest scoring interpretations are considered.

    WSD threshold   #triplets   #correct      P.     R.     F1
    0                     548        174    0.32   0.49   0.39
    10                    500        169    0.34   0.48   0.40
    20                    479        167    0.35   0.47   0.40
    30                    470        167    0.36   0.47   0.41
    40                    461        166    0.36   0.47   0.41
    50                    446        164    0.37   0.46   0.41
    60                    434        164    0.38   0.46   0.42
    70                    429        162    0.38   0.46   0.41
    80                    427        161    0.38   0.46   0.41
    90                    426        161    0.38   0.46   0.41
    100                   377        148    0.39   0.42   0.41
    manual                364        141    0.39   0.40   0.39

Table 3: Generic processing with different WSD thresholds.

   We can see that there is a positive correlation between WSD threshold and
precision, where precision increases from 32% to 39% using the highest WSD
scores only. Recall drops from 49% to 42%. We get the optimal settings using
a threshold for WSD of 60%. This gives an F-measure of 42%, with precision
38% and recall 46%.
   We also applied a manual disambiguation of the benchmark file. The results
for the manually disambiguated file are shown in the last row. We can see
that fewer triplets are generated (364), but the number is close to that of
the 100% WSD threshold (377). Remarkably, the precision is the same as for
100% WSD while recall is a bit lower (40%). This shows that the errors of WSD
apparently do not have a big impact on the extraction. For recall, it is thus
better to use less perfect WSD. This is in line with the error analysis in
the previous section, which showed that structural processing is more of a
problem than the mapping of the text to concepts.
   We also checked the effect of WSD on the extraction of relevant events.
Eliminating synsets through WSD did not show any effect. Precision remains
the same (29%), and recall only drops slightly from 104% to 97% when we limit
the events to the 100% WSD threshold. In the case of manual WSD, we do get a
much higher precision (49%) and a bit lower recall (83%). This clearly shows
that the Kybots over-generate many events due to the event-object ambiguity
of words in English. The fact that the precision of the manually tagged file
is much lower than the recall also suggests that relevance of extracted
events is not considered by our system: even after manual (perfect) WSD, the
system detects events that are not annotated. If we consider 49% of event
detection as the upper limit here, which seems reasonable, we can say that
the Kybots reach a precision of 60% of the upper limit in detecting events.

4.2.3 Effects of selecting best performing profiles

The profiles perform very differently in terms of recall and precision. We
therefore derived the
precision for each profile, using the optimal WSD setting of 60% of the
maximum score. We implemented a filter that checks every conflict across
triplets. If two triplets involve the same events and roles but have a
different relation, we choose the triplet generated by the higher scoring
profile. The results are shown in Table 4. The first row of the table shows
the results for a WSD threshold of 60% using 129 profiles. The remaining rows
show the results when this 60% WSD output is post-filtered using profiles
with precision scores of at least 1, 5, 10, 25, 50 and 75.

                    #profiles   #triplets   #correct      P.     R.     F1
    All profiles          129         434        164    0.38   0.46   0.42
    profiles 1%           104         332        147    0.44   0.42   0.43
    profiles 5%           103         312        147    0.47   0.42   0.44
    profiles 10%          103         312        147    0.47   0.42   0.44
    profiles 25%           93         284        141    0.50   0.40   0.44
    profiles 50%           76         219        115    0.53   0.33   0.40
    profiles 75%           22          46         32    0.70   0.09   0.16

Table 4: Generic processing with WSD threshold of 60% and using best
performing profiles.

   We see a clear increase in precision and a drop in recall, as expected.
However, we also see an increase in the F-measure from 41% to 44% using a
subset of the profiles with higher precision. Using profiles with a precision
score of at least 25%, we obtain a precision of 50% and a recall of 40%. With
these settings, 90 profiles have been used compared to 129 profiles using
just the WSD threshold of 60%. This shows that the set of profiles can be
optimized for specific document collections by annotating a representative
proportion of the collection and deriving a precision score for the different
profiles. Likewise, we can pair style of writing to the type of relation
expressed.
   If we compare these results with the manually annotated file in Table 3,
we see that the best profiles have a much higher precision (50% against 39%
manual) and the same recall. This again confirms that the challenge for
getting more precision is in resolving the structural relations in the text
rather than assigning better concepts through WSD.

5 Transferring Kybots to another language

An important aspect of the KYOTO system is the sharing of the central
ontology and the possibility to extract semantic relations in different
languages in a uniform way. To test the feasibility of sharing the same
semantic backbone and transferring Kybot profiles, we carried out a transfer
experiment from English to Dutch. We collected 93 Dutch documents on a Dutch
estuary (the Westerschelde) and related topics. We created KAF files and
applied WSD to these KAF files using the Dutch wordnet data.
   To apply the profiles to the Dutch KAF documents, we need to apply the
ontotagger program to the Dutch KAF. We created tables that match every Dutch
synset to the English Base Concepts and to the ontology using the equivalence
relations. We generated 145,189 Dutch synset to English Base Concept mappings
(for comparison, for English we have 114,477 mappings) and 326,667 Dutch
synset to ontology mappings (186,383 for English). These ontotag tables were
used to insert the ontological implications into the Dutch KAF files.
   Next, we adapted the 261 English Kybot profiles to replace all
English-specific elements by Dutch. This mainly involved:

  • replacing English prepositions and relative clause complementizers by
    Dutch equivalents;

  • adapting the word order sequences for relative clauses in Dutch;

  • adapting profiles that include adverbials, since they occur in different
    positions in Dutch;

  • eliminating profiles for multiword compounds, which mostly occur in Dutch
    as a one word compound;

  • eliminating profiles for explicit English structures that express causal
    relations.

   We kept all the ontological constraints exactly as they were for English.
Only superficial syntactic properties were thus changed. It took us half a
day to adapt the profiles for Dutch. From the original 261 English profiles,
we obtained 134 Dutch profiles.
   We ran the profiles on the 93 Dutch KAF files (42,697 word tokens) and 65
profiles generated output: 4,095 events and 6,862 roles. In terms of
relations, we see a similar distribution as for English, as shown in Table 5.
The patient relation is most frequent, followed by relations such as
generic-location, has-state and done-by. We did a
preliminary inspection and the results look reasonable. For instance, two
frequent words denoting events (the noun toename (increase) and the verb
stijgen (increase)) appear to have sensible patients (number, activity,
consumption, pollution, trade, pressure, ground sea level, earth).

    Relation                #         %
    destination-of         10     0.15%
    patient             2,067    30.12%
    path-of                23     0.34%
    has-state           1,236    18.01%
    generic-location      396     5.77%
    state-of              748    10.90%
    source-of             669     9.75%
    done-by               792    11.54%
    part-of                87     1.27%
    simple-cause-of       573     8.35%
    purpose-of            261     3.80%
    Total               6,862

Table 5: Relations extracted for Dutch documents.

6 Conclusions

We described an open platform for event-mining using wordnets and a central
ontology that aims at maximizing the information extracted from text. The
system uses a limited set of generic patterns with structural and ontological
constraints on elements from the text. We have shown that wordnets can be
used to map text to ontological classes and to extract events and
participants from text. Our error analysis showed that recall is mostly
hampered by the structural complexity of the text and the inability of the
parser to handle this complexity. The knowledge resources, wordnet and the
ontology, did not play a major role in recall. However, precision of the
event relations is more affected by the richness and quality of the semantic
analysis. We have shown that WSD has a positive effect on the precision of
the extracted relations and that precision can be further optimized by tuning
the structural profiles to the genre of the target text. The system can be
easily transferred to any language that has a wordnet connected to the
English WordNet, as was shown for Dutch. In the future, we want to further
improve recall and precision using richer event data and machine learning
techniques, and use the output for reconstruction of relations between
events. We will also experiment with other parsers for English and Dutch to
see the effect on the quality.

Acknowledgments

This work has been possible thanks to the support provided by the KYOTO
ICT-2007-211423, PATHS ICT-2010-270082 and KNOW2 TIN2009-14715-C04-04
projects.

References

E. Agirre and A. Soroa. 2009. Personalizing pagerank for word sense
   disambiguation. In EACL, pages 33–41.

Eneko Agirre, Montse Cuadros, German Rigau, and Aitor Soroa. 2010. Exploring
   knowledge bases for similarity. In Nicoletta Calzolari (Conference Chair),
   Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios
   Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the
   Seventh conference on International Language Resources and Evaluation
   (LREC’10), Valletta, Malta, May. European Language Resources Association
   (ELRA).

W. E. Bosma, Piek Vossen, Aitor Soroa, German Rigau, Maurizio Tesconi, Andrea
   Marchetti, Monica Monachini, and Carlo Aliprandi. 2009. Kaf: a generic
   semantic annotation format. In Proceedings of the GL2009 Workshop on
   Semantic Annotation, Pisa, Italy.

C. Fellbaum. 1998. WordNet: An Electronic Lexical Database. The MIT Press,
   Cambridge, MA.

A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari. 2003. Sweetening wordnet
   with dolce. AI Mag., 24(3):13–24.

A. Hicks and A. Herold. 2009. Evaluating ontologies with rudify. In Jan L. G.
   Dietz, editor, Proceedings of the 2nd International Conference on
   Knowledge Engineering and Ontology Development (KEOD’09), pages 5–12.
   INSTICC Press.

N. Ide and L. Romary. 2003. Outline of the international standard linguistic
   annotation framework. In Proceedings of ACL’03 Workshop on Linguistic
   Annotation: Getting the Model, pages 1–5.

R. Izquierdo, A. Suarez, and G. Rigau. 2007. Exploring the automatic
   selection of basic level concepts. In Galia Angelova et al., editor,
   International Conference Recent Advances in Natural Language Processing,
   pages 298–302, Borovets, Bulgaria.

K. Kaiser and S. Miksch. 2005. Information extraction. a survey. Technical
   report, Vienna University of Technology. Institute of Software Technology
   and Interactive Systems.

L. Peshkin and A. Pfeffer. 2003. Bayesian information extraction network. In
   Proc. of the 18th International Joint Conference on Artificial
   Intelligence.

P. Vossen and G. Rigau. 2010. Division of semantic labor in the global
   wordnet grid. In Proc. of Global WordNet Conference (GWC’2010). Mumbai,
   India.
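
The precision, recall and F1 columns of Tables 3 and 4 appear to follow the standard definitions, and the relationship between the raw counts and the scores can be checked with a few lines of code. The following is a minimal sketch, not the evaluation script used for the paper. The function `score`, the `Scores` tuple and the constant `N_GOLD_ASSUMED` are names introduced here for illustration. In particular, the size of the gold standard is not stated in this section: the value 353 is an assumption, chosen only because it is consistent with the rounded recall values in both tables.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score(n_extracted: int, n_correct: int, n_gold: int) -> Scores:
    """Precision, recall and F1 (harmonic mean) from raw triplet counts."""
    p = n_correct / n_extracted if n_extracted else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return Scores(p, r, f1)


# ASSUMPTION: the number of gold-standard triplets is not given in this
# section. 353 is a hypothetical value that is merely consistent with the
# rounded recall figures reported in Tables 3 and 4.
N_GOLD_ASSUMED = 353

# (label, #triplets, #correct) copied from Tables 3 and 4.
ROWS = [
    ("WSD threshold 0", 548, 174),
    ("WSD threshold 60", 434, 164),
    ("WSD threshold 100", 377, 148),
    ("manual WSD", 364, 141),
    ("profiles 25%", 284, 141),
    ("profiles 75%", 46, 32),
]

if __name__ == "__main__":
    for label, n_extracted, n_correct in ROWS:
        s = score(n_extracted, n_correct, N_GOLD_ASSUMED)
        print(f"{label:<18} P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

Precision depends only on the two count columns of the tables, so it can be verified without any assumption. Recall and F1 depend on the assumed gold-standard size and should be read as a plausibility check only.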

A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITSijasuc
 

Similar to Cross-lingual event-mining using wordnet as a shared knowledge interface (20)

PA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.docPA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.doc
 
AICOL2015_paper_16
AICOL2015_paper_16AICOL2015_paper_16
AICOL2015_paper_16
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
 
SOFIA - RDF Recipes for Context Aware Interoperability in Pervasive Systems. NXP
SOFIA - RDF Recipes for Context Aware Interoperability in Pervasive Systems. NXPSOFIA - RDF Recipes for Context Aware Interoperability in Pervasive Systems. NXP
SOFIA - RDF Recipes for Context Aware Interoperability in Pervasive Systems. NXP
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
Improving Robustness and Flexibility of Concept Taxonomy Learning from Text
 
Project proposal for a fishery ontology service
Project proposal for a fishery ontology serviceProject proposal for a fishery ontology service
Project proposal for a fishery ontology service
 
The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
NL Context Understanding 23(6)
NL Context Understanding 23(6)NL Context Understanding 23(6)
NL Context Understanding 23(6)
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
An Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain OntologyAn Incremental Method For Meaning Elicitation Of A Domain Ontology
An Incremental Method For Meaning Elicitation Of A Domain Ontology
 
CL2009_ANNIS_pre
CL2009_ANNIS_preCL2009_ANNIS_pre
CL2009_ANNIS_pre
 
CL2009_ANNIS_pre
CL2009_ANNIS_preCL2009_ANNIS_pre
CL2009_ANNIS_pre
 
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
Cordon review-mamdani-gf ss-ijar-52-6-2011-pp894-913
 
Authoring OWL 2 ontologies with the TEX-OWL syntax
Authoring OWL 2 ontologies with the TEX-OWL syntaxAuthoring OWL 2 ontologies with the TEX-OWL syntax
Authoring OWL 2 ontologies with the TEX-OWL syntax
 
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITSA NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
 

More from pathsproject

Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...
Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...
Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...pathsproject
 
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...pathsproject
 
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...pathsproject
 
Implementing Recommendations in the PATHS system, SUEDL 2013
Implementing Recommendations in the PATHS system, SUEDL 2013Implementing Recommendations in the PATHS system, SUEDL 2013
Implementing Recommendations in the PATHS system, SUEDL 2013pathsproject
 
User-Centred Design to Support Exploration and Path Creation in Cultural Her...
 User-Centred Design to Support Exploration and Path Creation in Cultural Her... User-Centred Design to Support Exploration and Path Creation in Cultural Her...
User-Centred Design to Support Exploration and Path Creation in Cultural Her...pathsproject
 
Generating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperGenerating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperpathsproject
 
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...pathsproject
 
PATHS state of the art monitoring report
PATHS state of the art monitoring reportPATHS state of the art monitoring report
PATHS state of the art monitoring reportpathsproject
 
Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...pathsproject
 
Semantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHSSemantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHSpathsproject
 
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperGenerating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperpathsproject
 
PATHS @ LATECH 2013
PATHS @ LATECH 2013PATHS @ LATECH 2013
PATHS @ LATECH 2013pathsproject
 
PATHS at the eChallenges conference
PATHS at the eChallenges conferencePATHS at the eChallenges conference
PATHS at the eChallenges conferencepathsproject
 
PATHS at the EAA conference 2013
PATHS at the EAA conference 2013PATHS at the EAA conference 2013
PATHS at the EAA conference 2013pathsproject
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013pathsproject
 
Comparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationComparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationpathsproject
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similaritypathsproject
 
Comparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentsComparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentspathsproject
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0pathsproject
 

More from pathsproject (20)

Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...
Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...
Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulti...
 
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...Aletras, Nikolaos  and  Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
Aletras, Nikolaos and Stevenson, Mark (2013) "Evaluating Topic Coherence Us...
 
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...
PATHSenrich: A Web Service Prototype for Automatic Cultural Heritage Item Enr...
 
Implementing Recommendations in the PATHS system, SUEDL 2013
Implementing Recommendations in the PATHS system, SUEDL 2013Implementing Recommendations in the PATHS system, SUEDL 2013
Implementing Recommendations in the PATHS system, SUEDL 2013
 
User-Centred Design to Support Exploration and Path Creation in Cultural Her...
 User-Centred Design to Support Exploration and Path Creation in Cultural Her... User-Centred Design to Support Exploration and Path Creation in Cultural Her...
User-Centred Design to Support Exploration and Path Creation in Cultural Her...
 
Generating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paperGenerating Paths through Cultural Heritage Collections Latech2013 paper
Generating Paths through Cultural Heritage Collections Latech2013 paper
 
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
Supporting User's Exploration of Digital Libraries, Suedl 2012 workshop proce...
 
PATHS state of the art monitoring report
PATHS state of the art monitoring reportPATHS state of the art monitoring report
PATHS state of the art monitoring report
 
Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...Recommendations for the automatic enrichment of digital library content using...
Recommendations for the automatic enrichment of digital library content using...
 
Semantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHSSemantic Enrichment of Cultural Heritage content in PATHS
Semantic Enrichment of Cultural Heritage content in PATHS
 
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paperGenerating Paths through Cultural Heritage Collections, LATECH 2013 paper
Generating Paths through Cultural Heritage Collections, LATECH 2013 paper
 
PATHS @ LATECH 2013
PATHS @ LATECH 2013PATHS @ LATECH 2013
PATHS @ LATECH 2013
 
PATHS at the eChallenges conference
PATHS at the eChallenges conferencePATHS at the eChallenges conference
PATHS at the eChallenges conference
 
PATHS at the EAA conference 2013
PATHS at the EAA conference 2013PATHS at the EAA conference 2013
PATHS at the EAA conference 2013
 
PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013PATHS at the eCult dialogue day 2013
PATHS at the eCult dialogue day 2013
 
Comparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentationComparing taxonomies for organising collections of documents presentation
Comparing taxonomies for organising collections of documents presentation
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
A pilot on Semantic Textual Similarity
A pilot on Semantic Textual SimilarityA pilot on Semantic Textual Similarity
A pilot on Semantic Textual Similarity
 
Comparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documentsComparing taxonomies for organising collections of documents
Comparing taxonomies for organising collections of documents
 
PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0PATHS Final prototype interface design v1.0
PATHS Final prototype interface design v1.0
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Cross-lingual event-mining using wordnet as a shared knowledge interface

  • 1. Cross-lingual event-mining using wordnet as a shared knowledge interface

    Piek Vossen, VU University Amsterdam, p.t.j.m.vossen@vu.nl
    Aitor Soroa, Beñat Zapirain, German Rigau, University of the Basque Country, a.soroa@ehu.es, benat.zapirain@ehu.es, german.rigau@ehu.es

    Abstract

    We describe a concept-based event-mining system that maximizes the information extracted from text and is not restricted to predefined knowledge templates. Such a system needs to handle a wide range of expressions while being able to extract precise semantic relations. The system uses simple patterns of linguistic and ontological constraints that are applied to a uniform representation of the text. It uses a generic ontology based on DOLCE and wordnets in different languages to extract events from text in these languages in an interoperable way. The system performs unsupervised, domain-independent event-mining with promising results. Error analysis showed that the main causes of errors are the complexity of the grammatical structures and the quality of parsing, rather than the semantic model or the mapping of text to concepts through word-sense disambiguation (WSD). Using the same semantic model and the cross-wordnet links, our English event-mining patterns were transferred to Dutch in less than a day's work. The platform was tested on the environment domain but can be applied to any other domain.

    1 Introduction

    Traditionally, Information Extraction (IE) is the task of filling template information from previously unseen text which belongs to a predefined domain (Peshkin and Pfeffer, 2003). Standard IE systems are based on language-specific pattern matching (Kaiser and Miksch, 2005), consisting of language-specific regular expressions and associated mappings from syntactic to logical forms. The major disadvantage of traditional IE systems is that they focus on satisfying precise, narrow, pre-specified requests, e.g. all names of directors of movies. Compared to full-text indexes in Information Retrieval (IR), IE systems cover only a small portion of the knowledge in texts, although they capture deeper semantic relations.

    Alternatively, the KYOTO system [1] combines the comprehensiveness of IR systems with the depth of IE systems. Furthermore, the system can be applied to different languages and domains in a uniform way. It tries to extract any event and its participants from the text, and relate them to time and place. To achieve this, it uses a full-text representation format and a wide range of knowledge in the form of wordnets and a generic ontology (Vossen and Rigau, 2010). Word-sense disambiguation (WSD) is a crucial step in mapping text to concepts. We implemented a two-phase WSD strategy and show its effects on event extraction. We first apply a state-of-the-art WSD to words in context, scoring all possible synsets in a wordnet. Each of these synsets is mapped to a shared ontology. From this mapping, all possible ontological implications are derived. Next, the mining system extracts all possible interpretations of all sequences of ontological concepts that match the patterns. In the second phase, we use the WSD score to select an interpretation, but only where there is a choice. The system has been tested on texts from the environment domain. However, the knowledge resources and patterns are generic and can be applied to any other domain.

    In the next section, we describe the general architecture of the KYOTO system and the knowledge structure. In section 3, we describe the module for mining knowledge from the text. In section 4, we describe the evaluation results and an error analysis for English. Since the profiles use

    [1] Available at http://www.kyoto-project.eu/
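
    The two-phase WSD strategy described in the introduction can be illustrated with a small Python sketch. It is not the KYOTO implementation: the synset identifiers, scores, ontology classes, the toy pattern and the rule of summing WSD scores to rank competing interpretations are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical phase-1 output: every candidate synset of a word keeps its WSD score.
WSD_SCORES = {
    "water": {"water-n-1": 0.29, "water-n-2": 0.10},
    "pollution": {"pollution-n-1": 0.33, "pollution-n-2": 0.05},
}

# Hypothetical mapping from synsets to ontology classes.
SYNSET_TO_CLASS = {
    "water-n-1": "endurant",
    "water-n-2": "endurant",
    "pollution-n-1": "perdurant",
    "pollution-n-2": "quality",
}

# Toy pattern: an endurant immediately followed by a perdurant.
PATTERN = ("endurant", "perdurant")


def all_interpretations(words):
    """Enumerate every synset combination whose ontology classes match PATTERN."""
    candidates = [WSD_SCORES[word].items() for word in words]
    matches = []
    for combo in product(*candidates):
        classes = tuple(SYNSET_TO_CLASS[synset] for synset, _ in combo)
        if classes == PATTERN:
            matches.append(combo)
    return matches


def select_interpretation(matches):
    """Second phase: consult the WSD scores only when several interpretations matched."""
    if not matches:
        return None
    if len(matches) == 1:
        best = matches[0]
    else:
        # Assumption: rank competing interpretations by the sum of their WSD scores.
        best = max(matches, key=lambda combo: sum(score for _, score in combo))
    return tuple(synset for synset, _ in best)


matches = all_interpretations(["water", "pollution"])
best = select_interpretation(matches)
```

    In this toy run two synset combinations satisfy the pattern, so the WSD scores are used only to choose between those two; combinations that do not match the pattern are never ranked.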
  • 2. language-neutral ontological constraints, they can be easily transferred to another language. Therefore, in section 5, we describe how the system was transferred from English to Dutch through the wordnet-equivalence links.

    2 KYOTO overview

    The KYOTO system starts with linguistic processors that apply tokenization, segmentation, morpho-syntactic analysis and semantic tagging of the text. The semantic tagging involves detection of named entities and of the meaning of words according to a given wordnet. The output of the linguistic processors is stored in an XML annotation format that is the same for all the languages, called the KYOTO Annotation Format (KAF; Bosma et al., 2009). KAF is compatible with the Linguistic Annotation Framework (LAF; Ide and Romary, 2003). In KAF, words, terms, constituents and syntactic dependencies are stored in separate layers with references across the structures. All modules in KYOTO draw their input from these XML structures. Likewise, WSD is done on the same KAF annotation in different languages and is therefore the same module for all the languages (Agirre and Soroa, 2009). The current system includes processors for English, Dutch, Italian, Spanish, Basque, Chinese and Japanese. Figure 1 shows a simplified example of a KAF structure with the two basic layers: the <text> layer (tokenization, segmentation) and the <terms> layer, containing morpho-syntactic and semantic information drawn from WordNet and the KYOTO ontology for the words water and pollution.

    <kaf xml:lang="en" doc="example1">
      <text>
        <wf page="1" sent="40" wid="w267" fileoffset="6,11">water</wf>
        <wf page="1" sent="40" wid="w268" fileoffset="12,21">pollution</wf>
        <...>
      </text>
      <terms>
        <term id="t241" lemma="water" pos="N">
          <span><target id="w267"></target></span>
          <externalReferences>
            <externalRef conf="0.29" ref="14845743-n" resource="wn30g"/>
            <externalRef ref="Kyoto#water" reftype="sc_equivalenceOf" resource="ontology"/>
            <!--...-->
            <externalRef reftype="SubClassOf" reference="DOLCE-Lite.owl#endurant" status="implied"/>
            <!--...-->
          </externalReferences>
        </term>
        <term tid="t242" lemma="pollution" pos="N" type="open">
          <span><target id="w268"></target></span>
          <externalReferences>
            <externalRef conf="0.33" ref="14516743-n" resource="wn30g"/>
            <externalRef ref="Kyoto#contamination_pollution" reftype="sc_equivalenceOf" resource="ontology"/>
            <!--...-->
            <externalRef reftype="SubClassOf" reference="DOLCE-Lite.owl#perdurant" status="implied"/>
            <!--...-->
          </externalReferences>
        </term>
      </terms>
      <!-- Additional layers (chunking, dependencies, ...) -->
    </kaf>

    Figure 1: Example of a KAF document

    In KYOTO, the knowledge extraction is done by so-called Kybots (Knowledge Yielding Robots). Kybots are defined by a set of profiles representing information patterns. In the profile, conceptual relations are expressed using ontological and morpho-syntactic patterns. Since the semantics is defined through the ontology, it is possible to detect similar data even if expressed differently. In Figure 2, we show an example of a conceptual pattern for the environment domain that relates organisms that live in habitats. The pattern uses labels from the central ontology, whereas each wordnet synset is directly or indirectly related to these labels. Such a pattern can be used to extract events from text, such as frogs that live in cropland in France during the period 2000-2010.

    [Figure 2: System Architecture]

    The system exploits a 3-layered knowledge architecture (Vossen and Rigau, 2010), using a central ontology, wordnets in different languages and potential background vocabularies linked to the wordnets. The ontology consists of around 2,000 classes divided over three layers (Hicks and Herold, 2009). The top layer is based on DOLCE [2]

    [2] DOLCE-Lite-Plus version 3.9.7
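
    As a rough illustration of how the KAF layers described on this slide refer to each other, the following Python sketch reads a cut-down, well-formed KAF-like document with the standard library XML parser. The sample is modelled on Figure 1, but the attribute choices (for instance using `tid` on every term), the function name `read_terms` and the output structure are assumptions for illustration, not part of the KYOTO tooling.

```python
import xml.etree.ElementTree as ET

# Simplified KAF-like sample modelled on Figure 1 (assumed to use tid on every term).
KAF_SAMPLE = """<kaf xml:lang="en" doc="example1">
  <text>
    <wf wid="w267" sent="40">water</wf>
    <wf wid="w268" sent="40">pollution</wf>
  </text>
  <terms>
    <term tid="t241" lemma="water" pos="N">
      <span><target id="w267"/></span>
      <externalReferences>
        <externalRef conf="0.29" ref="14845743-n" resource="wn30g"/>
        <externalRef ref="Kyoto#water" reftype="sc_equivalenceOf" resource="ontology"/>
      </externalReferences>
    </term>
    <term tid="t242" lemma="pollution" pos="N">
      <span><target id="w268"/></span>
      <externalReferences>
        <externalRef conf="0.33" ref="14516743-n" resource="wn30g"/>
        <externalRef ref="Kyoto#contamination_pollution" reftype="sc_equivalenceOf" resource="ontology"/>
      </externalReferences>
    </term>
  </terms>
</kaf>"""


def read_terms(kaf_xml):
    """Collect each term with its tokens, scored synsets and ontology labels."""
    root = ET.fromstring(kaf_xml)
    # The <text> layer: word-form id -> token string.
    words = {wf.get("wid"): (wf.text or "").strip() for wf in root.iter("wf")}
    terms = []
    for term in root.iter("term"):
        refs = term.findall("./externalReferences/externalRef")
        terms.append({
            "tid": term.get("tid"),
            "lemma": term.get("lemma"),
            # The <terms> layer points back into the <text> layer through <target id=...>.
            "tokens": [words[t.get("id")] for t in term.findall("./span/target")],
            "synsets": [(r.get("ref"), float(r.get("conf")))
                        for r in refs if r.get("resource") == "wn30g"],
            "ontology": [r.get("ref") for r in refs if r.get("resource") == "ontology"],
        })
    return terms


terms = read_terms(KAF_SAMPLE)
```

    Each returned entry keeps the cross-layer reference explicit: the term's `tokens` are resolved from the <text> layer, while `synsets` and `ontology` come from the term's external references.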
  • 3. (Gangemi et al., 2003) and OntoWordNet. The second layer consists of the Base Concepts (BCs; footnote 3: http://adimen.si.ehu.es/web/BLC), which cover an intermediate level of abstraction for all nominal and verbal WordNet synsets (Izquierdo et al., 2007). Examples of BCs are: building, vehicle, animal, plant, change, move, size, weight. A third layer consists of domain classes introduced for detecting events and qualities in a particular domain (i.e. environment).

The semantic model also provides complete mappings to the ontology for all nominal, verbal and adjectival WordNet 3.0 synsets (Fellbaum, 1998) (footnote 4: this knowledge model is freely available through the KYOTO website as open-source data). The mappings also harmonize predicate information across different parts of speech (POS). For instance, migratory events represented by different synsets of the verb migrate, the noun migration or the adjective migratory inherit the same ontological information corresponding to the ChangeOfResidence class in the ontology.

This generic knowledge model provides an extremely powerful basis for semantic processing in any domain. Furthermore, through the equivalence relations of wordnets in other languages to the English WordNet, this semantic framework can also be applied to other languages, as shown in Section 5.

The WSD module assigns concepts to each word with a score based on the context. Ontological tagging of the text is then the last step in the pre-processing before the extraction of events. For each synset associated to a word, we use the wordnet-to-ontology mappings to look up its associated ontological classes and inherited properties. The Base Concept mapping guarantees that every synset is mapped to an ontological class. Next, we insert into KAF all the ontological implications that apply to each concept. By making the implicit ontological statements explicit, Kybots are able to find the same relations hidden in different expressions with different surface realizations, e.g. water pollution, polluted water, pollution of water and water that is polluted directly or indirectly express the same relations. Figure 1 shows how ontological statements are represented in KAF as external references related to synsets with a score from the WSD (the value of the attribute conf). Words in the term structure usually get many ontological implications for each word meaning. The implications reflect subclass relations from the ontology but also other relations, such as events in which the concept denoted by the word plays a role (e.g. the word polluted water denotes water that plays a role in the event pollution) or, the other way around, the roles involved in the event denoted by the word (e.g. the word water pollution denotes events in which water is a patient).

3 Event extraction

A set of abstract patterns called Kybots use the central ontology to extract actual concept instances and relations from KAF documents. Event-mining is done by processing these abstract patterns on the enriched documents. These patterns are defined in a declarative format using profiles, which describe general morpho-syntactic and semantic conditions on sequences of KAF terms (which are lemmas in the text). These profiles are compiled to XQueries to efficiently scan over KAF documents uploaded into an XML database. These patterns extract the relevant information from each match.

Figure 3 shows an example of a simple Kybot profile. Profiles are described using XML syntax and consist of three main parts:

• Variable declaration (<variables> element). In this part, the search entities are defined, e.g.: X (terms whose part-of-speech is noun and whose lemma is not “system”), Y (terms whose lemma is either “release”, “produce” or “generate”) and Z (terms linked to a subclass of the ontological class DOLCE-Lite.owl#contamination_pollution, meaning being contaminated with harmful substances).

• Relations among variables (<rel> element). This part specifies the relations among the previously defined variables, e.g.: Y is the main pivot, variable X must precede variable Y in the same sentence, and variable Z must follow variable Y. Thus, this relation declares patterns like 'X → Y → Z' in a sentence.

• Output template. This part describes the output to be produced for every match, e.g.: each match generates a new event from the term Y and two roles: the 'done-by' role, filled by term X, and the 'patient' role, filled by Z.

We created 261 generic profiles for English. These profiles capture very simple sequences of parts-of-speech or words, e.g. noun-verb or
  • 4. adjective-noun, where each word is restricted to classes from the ontology, e.g. a motion event followed by a geographical region. Note that it is important that all possible expressions of a relation are modeled by the profiles.

    <kprofile>
      <variables>
        <var name="x" type="term" pos="N" lemma="! system"/>
        <var name="y" type="term" lemma="produce | generate | release"/>
        <var name="z" type="term" ref="DOLCE-Lite.owl#contamination_pollution" reftype="SubClassOf"/>
      </variables>
      <relations>
        <root span="y"/>
        <rel span="x" pivot="y" direction="preceding"/>
        <rel span="z" pivot="y" direction="following"/>
      </relations>
      <events>
        <event target="$y/@tid" lemma="$y/@lemma" pos="$y/@pos"/>
        <role target="$x/@tid" rtype="done-by" lemma="$x/@lemma"/>
        <role target="$z/@tid" rtype="patient" lemma="$z/@lemma"/>
      </events>
    </kprofile>

Figure 3: Example of a Kybot profile

4 Evaluation

4.1 Triplet representation for representing events

The event structure in KYOTO is rather specific and events can be complex, including many different roles and relations. Below is an example of such a structure extracted from the sentence: “Forests also absorb air pollution and retain up to 85 percent of the nitrogen from sources such as automobiles and power plants.”

    <event eid="e203" target="t4260" lemma="absorb" pos="V" synset="eng-30-01539633-v" rank="0.25"/>
    <role rid="r280" event="e203" target="t4258" lemma="forest" pos="N" rtype="done-by" synset="eng-30-09284015-n" rank="0.15"/>
    <role rid="r976" event="e203" target="t4262mw" lemma="air pollution" pos="N" rtype="patient" synset="eng-30-14517412-n" rank="1"/>
    <role rid="r1609" event="e203" target="t4277mw" lemma="power plant" pos="N" rtype="simple-cause-of" synset="eng-30-03996655-n" rank="1"/>
    <role rid="r276" event="e203" target="t4274" lemma="automobile" pos="N" rtype="simple-cause-of" synset="eng-30-02958343-n" rank="1"/>

To be able to compare our results with the output of other systems and gold standards, we defined a more neutral and simple triplet format. A triplet consists of:

• a relation

• a list of text token ids that represent the event

• a list of text token ids that represent a participant

If an event has multiple participants, a separate triplet is created for each event-participant pair. The triplet identifier is used to mark which triplets relate to the same event.

4.2 Evaluation results for English

We created a gold standard in the triplet format. An annotation tool that reads KAF and can assign any set of tags to tokens in KAF was used to make a gold standard for a document about the Chesapeake Bay, a large estuary in the US (footnote 5: the tool and evaluation data are available at the KYOTO website). The document has 16,145 word tokens. We manually annotated all relevant relations in 127 sentences, corresponding to 1,416 tokens, 353 triplets and 201 events.

The first column in Table 2 shows the annotated relations (footnote 6: the relations are taken from the DOLCE part of the KYOTO ontology; please consult the DOLCE ontology for their formal definition). The patient relation is most frequent (38%), followed by done-by (15%) and simple-cause-of (14%).

As a baseline, we created triplets for all heads of constituents in a single sentence according to the constituent representation of the text in KAF. The baseline generates 3,427 triplets for the annotated sentences. Since no relation is predicted, we assume the most frequent relation, patient.

To evaluate the Kybots, we used the 261 generic profiles. The profiles generated 548 triplets for the annotated sentences. In total, 169 profiles or combinations of profiles (since multiple profiles can propose the same triplet) have been applied to the annotated fragment.

To measure the proportion of relevant events that are detected by these heuristics, we compare the baseline and Kybot events with the event tokens in the gold standard. The gold standard has 201 events and the baseline 1,627 events, of which 249 overlap with the gold standard events. This results in a recall of 1.24 and a precision of 0.15. The Kybot profiles detect 733 events, of which 209 are relevant. Recall is 1.04 and precision is 0.29. Recall of events is similar to the baseline and precision is almost twice as high. The fact that the recall is higher than 1 is caused by the fact that the gold standard sometimes marks larger phrases as a single event which may be separate events in
  • 5. the baseline and the Kybot output. Both the baseline and the Kybot profiles thus do not miss any relevant events but do extract a substantial number of irrelevant events. The precision for the profiles is still reasonable, given the fact that no relevance ranking has been applied and only generic profiles have been used. Note that events that are not annotated can still be proper events.

Table 1 shows the results for the triplet evaluation of the relevant events.

                  Ignoring relations      With relations
                  Baseline    Kybots      Baseline    Kybots
    Nr. correct   306         222         115         174
    Precision     0.09        0.49        0.03        0.32
    Recall        0.86        0.63        0.33        0.49

Table 1: Baseline and Kybot results

When ignoring the relation, recall for the baseline is 86%, which shows that the baseline matches a substantial part of the annotated triplets. It also shows that 14% is missed. This is due to the fact that the parser only marks one word as the head in the case of a coordination of heads, e.g. in the phrase “birds and fish” only “bird” is marked as the head. Precision of the baseline is very low, even when we ignore the relation itself. If we take the patient relation as the default, we see that the precision and recall drop even more. The Kybot profiles clearly outperform the baseline in terms of precision: 49% when ignoring the relation and 32% considering all relations. In terms of recall, we see that 63% is covered when we ignore the relation. This indicates that the profiles do consider the majority of structures, but still miss 37% of the structures. When we consider the relations, recall drops to 49%, which is still well above the baseline.

4.2.1 Error analysis

We did a separate error analysis for recall and precision. First of all, we checked the 1,023 term tokens of content words (nouns, verbs and adjectives) that occurred in the 127 gold-standard sentences. It turned out that there are 70 tokens with the wrong POS assigned (7%). The major errors are nouns and verbs interpreted as adjectives and common nouns considered as proper names, most notably “wetlands” and “wastewater”, occurring 3 and 5 times respectively. If the wrong POS is assigned, the words cannot be found in WordNet or the wrong synsets are assigned. In that case, wrong or no ontological statements are inserted for a word.

To analyze the recall in more detail, we looked at the most frequently missed relations: patient (48) and done-by (30) (see Table 2).

    Relation           Gold   %         System   Correct   R.    P.    Missed
    destination-of     27     7.65%     17       6         22    35    21
    use-of             4      1.13%     1        1         25    100   3
    generic-location   11     3.12%     22       8         72    36    3
    source-of          4      1.13%     10       1         25    10    3
    instrument         2      0.57%     0        0         0     0     2
    product-of         2      0.57%     0        0         0     0     2
    part-of            1      0.28%     3        0         0     0     1
    purpose-of         7      1.98%     9        3         42    33    4
    patient            133    37.68%    195      85        63    43    48
    path-of            1      0.28%     0        0         0     0     1
    result-of          4      1.13%     7        0         0     0     4
    participant        0      0.0%      3        0         0     0     0
    has-state          32     9.07%     42       11        34    26    21
    state-of           22     6.23%     25       11        50    44    11
    done-by            52     14.73%    89       22        42    24    30
    simple-cause-of    51     14.45%    125      26        50    20    25
    Total              353    100%      548      174       49    31    179

Table 2: Generic processing with 261 profiles, differentiated per relation (R. and P. in percent)

From the patient triplets, we missed 25% due to parser errors, among which are wrong POS, missed verb-particle combinations and multiwords. Another 15% of the patient triplets was not found because the parser does not provide detailed and reliable dependency information to distinguish between subjects and objects, and the ontology does not distinguish sufficiently between events with participants that control the process (e.g. “to swim”) and participants that do not (e.g. “to flow”). Remarkably, only 4% of the errors are due to a missing concept in WordNet or a wrong mapping of WordNet to the ontology. Another 4% could have been found by making more profiles.

In the case of the done-by relation, 30% of the missed relations are the result of parser errors (mainly coordination of NPs and VPs in which only one is marked as the head) and another 30% arise because the structures of simple-cause-of and done-by are the same and the ontology does not provide sufficient information on the events to distinguish them.

Precision errors are mostly caused by the fact that patient, done-by and simple-cause-of are easily confused, not only by the Kybots but also by humans. The patient relation performs slightly above average precision (43%), but done-by (24%), simple-cause-of (20%) and has-state (26%) perform below average. In particular, the simple-cause-of relation decreases the overall precision since it represents 125 triplets (15%). The simple-cause-of relation applies to perdurants related to other perdurants. Due to the ambiguity in English of nouns to denote either an endurant or
  • 6. a perdurant, the system is likely to over-generate this relation. The reverse holds for the done-by relation. Both relations typically hold for the same structures, such as nouns in subject position of a verb. Another common, related error concerns cases such as forest destroyed and houses built. Since the parser does not provide information on the inflection of the verbs nor on passive/active form, the profile can only detect a noun+verb pattern and assigns a done-by relation where a patient relation should be assigned. Again, more information from the parser in the KAF representation can help here.

The main conclusion is that major improvement both in recall and precision can be achieved by better and more input from the linguistic pre-processing, by richer ontological information (e.g. control of events), and by extending the number of profiles. Furthermore, precision could also be improved if we can resolve the ambiguity of nouns between endurants and perdurants, to distinguish for example done-by from simple-cause-of. It thus makes sense to consider the effect of WSD on the precision of the mining. This is discussed in the next section.

4.2.2 Effects of Word-Sense-Disambiguation

The generic processing considers all the possible meanings of the words and does not take the WSD into account.

To see the effect of the WSD, we implemented a filter on the Kybot output that selects interpretations with the highest WSD score for each word in the output that has multiple interpretations. By interpretation we mean: being either an event or a role, or having different relations assigned. By excluding low-scoring concepts only when there is a choice to be made, we hope to capture as much recall as possible and to gain precision. Note that the WSD scored a precision of 48% in the SemEval2010 task on domain-specific WSD, which used documents from the same domain as KYOTO (Agirre et al., 2010). We set a threshold for eliminating relations in proportion to the maximum WSD scores of each word. The results are shown in Table 3. A threshold of 0 means that all interpretations are considered; a threshold of 100 means only the highest scoring interpretations.

    WSD threshold   #triplets   #correct   P.     R.     F1
    0               548         174        0.32   0.49   0.39
    10              500         169        0.34   0.48   0.40
    20              479         167        0.35   0.47   0.40
    30              470         167        0.36   0.47   0.41
    40              461         166        0.36   0.47   0.41
    50              446         164        0.37   0.46   0.41
    60              434         164        0.38   0.46   0.42
    70              429         162        0.38   0.46   0.41
    80              427         161        0.38   0.46   0.41
    90              426         161        0.38   0.46   0.41
    100             377         148        0.39   0.42   0.41
    manual          364         141        0.39   0.40   0.39

Table 3: Generic processing with different WSD thresholds.

We can see that there is a positive correlation between WSD threshold and precision, where precision increases from 32% to 39% using the highest WSD scores only. Recall drops from 49% to 42%. We get the optimal settings using a threshold for WSD of 60%. This gives an F-measure of 42%, with precision 38% and recall 46%.

We also applied a manual disambiguation of the benchmark file. The results for the manually disambiguated file are shown in the last row. We can see that fewer triplets are generated (364), but close to the number for the 100% WSD threshold (377). Remarkably, the precision is the same as for 100% WSD while recall is a bit lower (40%). This shows that the errors of WSD apparently do not have a big impact on the extraction. For recall, it is thus better to use less perfect WSD. This is in line with the error analysis in the previous section, which showed that structural processing is more of a problem than the mapping of the text to concepts.

We also checked the effect of WSD on the extraction of relevant events. Eliminating synsets through WSD did not show any effect. Precision remains the same (29%), and recall only drops slightly from 104% to 97% when we limit the events to the 100% WSD threshold. In the case of manual WSD, we do get a much higher precision (49%) and a somewhat lower recall (83%). This clearly shows that the Kybots over-generate many events due to the event-object ambiguity of words in English. The fact that precision of the manually tagged file is much lower than the recall also suggests that relevance of extracted events is not considered by our system: even after manual (perfect) WSD, the system detects events that are not annotated. If we consider 49% of event detection as the upper limit here, which seems reasonable, we can say that the Kybots reach a precision of 60% of the upper limit in detecting events.

4.2.3 Effects of selecting best performing profiles

The profiles perform very differently in terms of recall and precision. We therefore derived the
  • 7. precision for each profile, using the optimal WSD setting of 60% of the maximum score. We implemented a filter that checks every conflict across triplets. If two triplets involve the same events and roles but have a different relation, we choose the triplet generated by the higher scoring profile. The results are shown in Table 4. The first row of the table shows the results for a WSD threshold of 60% using 129 profiles. The remaining rows show the results when this 60% WSD output is post-filtered using profiles with precision scores 1, 5, 10, 25, 50 and 75.

                    #profiles   #triplets   #correct   P.     R.     F1
    All profiles    129         434         164        0.38   0.46   0.42
    profiles 1%     104         332         147        0.44   0.42   0.43
    profiles 5%     103         312         147        0.47   0.42   0.44
    profiles 10%    103         312         147        0.47   0.42   0.44
    profiles 25%    93          284         141        0.50   0.40   0.44
    profiles 50%    76          219         115        0.53   0.33   0.40
    profiles 75%    22          46          32         0.70   0.09   0.16

Table 4: Generic processing with WSD threshold of 60% and using best performing profiles.

We see a clear increase in precision and a drop in recall, as expected. However, we also see an increase in the F-measure from 41% to 44% using a subset of the profiles with higher precision. Using profiles with a precision score of at least 25%, we obtain a precision of 50% and a recall of 40%. With these settings, 90 profiles have been used compared to 129 profiles using just the WSD threshold of 60%. This shows that the set of profiles can be optimized for specific document collections by annotating a representative proportion of the collection and deriving a precision score for the different profiles. Likewise, we can pair style of writing to the type of relation expressed.

If we compare these results with the manually annotated file in Table 3, we see that the best profiles have a much higher precision (50% against 39% manual) and the same recall. This again confirms that the challenge for getting more precision is in resolving the structural relations in the text rather than assigning better concepts through WSD.

5 Transferring Kybots to another language

An important aspect of the KYOTO system is the sharing of the central ontology and the possibility to extract semantic relations in different languages in a uniform way. To test the feasibility of sharing the same semantic backbone and transferring Kybot profiles, we carried out a transfer experiment from English to Dutch. We collected 93 Dutch documents on a Dutch estuary (the Westerschelde) and related topics. We created KAF files and applied WSD to these KAF files using the Dutch wordnet data.

To apply the profiles to the Dutch KAF documents, we need to apply the ontotagger program to the Dutch KAF. We created tables that match every Dutch synset to the English Base Concepts and to the ontology using the equivalence relations. We generated 145,189 Dutch synset to English Base Concept mappings (for comparison, for English we have 114,477 mappings) and 326,667 Dutch synset to ontology mappings (186,383 for English). These ontotag tables were used to insert the ontological implications into the Dutch KAF files.

Next, we adapted the 261 English Kybot profiles to replace all English-specific elements by Dutch ones. This mainly involved:

• replacing English prepositions and relative clause complementizers by Dutch equivalents;

• adapting the word order sequences for relative clauses in Dutch;

• adapting profiles that include adverbials, since they occur in different positions in Dutch;

• eliminating profiles for multiword compounds, which mostly occur in Dutch as a one-word compound;

• eliminating profiles for explicit English structures that express causal relations.

We kept all the ontological constraints exactly as they were for English. Only superficial syntactic properties were thus changed. It took us half a day to adapt the profiles for Dutch. From the original 261 English profiles, we obtained 134 Dutch profiles.

We ran the profiles on the 93 Dutch KAF files (42,697 word tokens) and 65 profiles generated output: 4,095 events and 6,862 roles. In terms of relations, we see a similar distribution as for English, as shown in Table 5. The patient relation is most frequent, followed by relations such as generic-location, has-state and done-by. We did a
  • 8. preliminary inspection and the results look reasonable. For instance, two frequent words denoting events (the noun toename (increase) and the verb stijgen (increase)) appear to have sensible patients (number, activity, consumption, pollution, trade, pressure, ground sea level, earth).

    Relation           #       %
    destination-of     10      0.15%
    patient            2067    30.12%
    path-of            23      0.34%
    has-state          1236    18.01%
    generic-location   396     5.77%
    state-of           748     10.90%
    source-of          669     9.75%
    done-by            792     11.54%
    part-of            87      1.27%
    simple-cause-of    573     8.35%
    purpose-of         261     3.80%
    Total              6,862

Table 5: Relations extracted for Dutch documents.

6 Conclusions

We described an open platform for event-mining using wordnets and a central ontology that aims at maximizing the information extracted from text. The system uses a limited set of generic patterns with structural and ontological constraints on elements from the text. We have shown that wordnets can be used to map text to ontological classes and extract events and participants from text. Our error analysis showed that recall is mostly hampered by the structural complexity of the text and the inability of the parser to handle this phenomenon. The knowledge resources, wordnet and the ontology, did not play a major role in recall. However, precision of the event relations is more affected by the richness and quality of the semantic analysis. We have shown that WSD has a positive effect on the precision of the extracted relations and that precision can be further optimized by tuning the structural profiles to the genre of the target text. The system can be easily transferred to any language that has a wordnet connected to the English WordNet, as was shown for Dutch. In the future, we want to further improve recall and precision using richer event data and machine learning techniques, and use the output for reconstruction of relations between events. We will also experiment with other parsers for English and Dutch to see the effect on the quality.

Acknowledgments

This work has been possible thanks to the support provided by the KYOTO ICT-2007-211423, PATHS ICT-2010-270082 and KNOW2 TIN2009-14715-C04-04 projects.

References

E. Agirre and A. Soroa. 2009. Personalizing pagerank for word sense disambiguation. In EACL, pages 33–41.

Eneko Agirre, Montse Cuadros, German Rigau, and Aitor Soroa. 2010. Exploring knowledge bases for similarity. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta, May. European Language Resources Association (ELRA).

W. E. Bosma, Piek Vossen, Aitor Soroa, German Rigau, Maurizio Tesconi, Andrea Marchetti, Monica Monachini, and Carlo Aliprandi. 2009. Kaf: a generic semantic annotation format. In Proceedings of the GL2009 Workshop on Semantic Annotation, Pisa, Italy.

C. Fellbaum. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.

A. Gangemi, N. Guarino, C. Masolo, and A. Oltramari. 2003. Sweetening wordnet with dolce. AI Mag., 24(3):13–24.

A. Hicks and A. Herold. 2009. Evaluating ontologies with rudify. In Jan L. G. Dietz, editor, Proceedings of the 2nd International Conference on Knowledge Engineering and Ontology Development (KEOD’09), pages 5–12. INSTICC Press.

N. Ide and L. Romary. 2003. Outline of the international standard linguistic annotation framework. In Proceedings of ACL’03 Workshop on Linguistic Annotation: Getting the Model, pages 1–5.

R. Izquierdo, A. Suarez, and G. Rigau. 2007. Exploring the automatic selection of basic level concepts. In Galia Angelova et al., editor, International Conference Recent Advances in Natural Language Processing, pages 298–302, Borovets, Bulgaria.

K. Kaiser and S. Miksch. 2005. Information extraction. a survey. Technical report, Vienna University of Technology. Institute of Software Technology and Interactive Systems.

L. Peshkin and A. Pfeffer. 2003. Bayesian information extraction network. In Proc. of the 18th International Joint Conference on Artificial Intelligence.

P. Vossen and G. Rigau. 2010. Division of semantic labor in the global wordnet grid. In Proc. of Global WordNet Conference (GWC’2010). Mumbai, India.