SlideShare a Scribd company logo
1 of 8
Coreference Resolution using hybrid approach


Abstract                                           resolution system is to find out this relationship
This work presents a novel approach to find        between them.
structured information from the vast repository              In         information      retrieval,   when
of unstructured text in digital libraries using    performing a search on “Taj Mahal” we would
Coreference Resolution. Our approach uses a        like it if the page ranking algorithm also includes
salience based technique to find the antecedents   “it”, “the monument”, etc. which refer to “Taj
of pronouns, while it uses a classifier based      Mahal” for evaluating the rank of that page.
technique for the resolution of other noun         Similarly a question answering system would be
phrases. A comparison of the proposed approach     able to fetch better answers, and improve the
with several baselines methods shows that the      recall if a coreference resolution system is
suggested solution significantly outperforms       incorporated in it. It was suggested by Peral et al
them with 66.1 % precision, 58.4% recall and a     that identification of coreferences is essential for
balanced f-measure of 62.0%.                       an efficient machine translation system [1].
                                                             The algorithm due to Hobbs [2] is one
Key words: Coreference Resolution, Hybrid
                                                   of the earliest attempts to resolve personal
approach, Machine learning, anaphoricity
features.                                          pronouns he, she, it and they in three texts by
                                                   finding their antecedents. Lappin and Leass [3]
1 Introduction
                                                   identifies     the     noun-phrase      antecedents   of
          The availability of vast amounts of
                                                   personal pronouns, including the reflexive
machine readable texts in digital libraries has
                                                   pronouns. These algorithms are good examples
resulted in an ever increasing need to find
                                                   of approaches that are language specific and use
methods     for   extracting   some   structured
                                                   hand crafted rules. As an alternative, several
information from them. This has given an
                                                   knowledge-poor approaches have been proposed
impetus to the development of methods for
                                                   by Kennedy and Boguraev [4] and Mitkov [5].
Information Extraction. One of these tasks is
                                                   These approaches work with tools like part-of-
Coreference Resolution. Coreference is said to
                                                   speech tagger and shallow parser. The algorithm
occur when one expression has the same referent
                                                   used for resolution of pronouns is similar to the
as another expression. Coreference resolution is
                                                   one proposed in [3]. These approaches show
the process of finding coreferences in a
                                                   good accuracy. However, these approaches are
document. For example “Mohandas Karamchand
                                                   limited      to      third   person      pronouns.    A
Gandhi” can be referred to as “Mr. Gandhi”,
                                                   computational theory for discourse structure was
“Mahatma Gandhi”, “he” or even “Bapu”. All
                                                   proposed by Grosz et al [6]. further elaborated in
these phrases corefer and the purpose of the
                                                   the Centering Theory [7] which fits into this
                                                   computational model and gives the relationship
of the attentional state with local coherence. The    classifier is trained for pairs of coreferring
algorithm proposed by Vieira and Poesio [8]           markables which in turn is used for the
works by checking the definite description if it is   coreference resolution procedure. A set of
a special noun followed by a test for appositive      features are determined for a pair of markables
construction. The performance of this approach        before giving it to the classifier. The researchers
was measured in two ways, first by examining          trained a separate decision tree for MUC-6 [11]
how well the system performed coreference             and MUC-7 [13], each using 30 training texts
resolution on the anaphoric definite Noun             annotated with anaphoric links. The system
Phrases (NP), achieving 72% recall and 82%            generated 59% recall and 67% precision for
precision, and then by examining how well the         MUC-6 and 56% recall and 66% precision
system identified non-anaphoric definite NPs,         forMUC-7. The model of [12] was extended by
achieving 74% recall and 85% precision. Later         Ng and Cardie [14] making few changes to the
on, an extension of this approach was also            learning framework      and by adding more
proposed by the same authors where the bridging       features. They implemented their approach using
anaphors were also taken into consideration [9].      both a decision tree learner (C4.5) and a rule
The performance of this approach was found for        based learner (RIPPER). This model was tested
the system with and without considering the           for both MUC-6 and MUC-7 test corpus giving
heuristic bridging anaphors. The precision and        a recall of 64.2%, precision of 78.0% and an f-
recall for the first case were found to be 76% and    measure of 70.4% for MUC-6 test set and a
53% respectively and in the second case were          recall of 55.7%, precision of 72.8% and an f-
70% and 57% respectively. An unsupervised             measure of 63.1% for MUC-7 test set. Another
algorithm for noun-phrase coreference resolution      model [15], was proposed by the same authors
was presented by Cardie and Wagsta [10]. The          based on anaphoricity determination. The model
algorithm was tested for MUC-6 coreference            uses two classifiers: one to determine if a
data [11]. It contained two sets of 30 documents      particular NP will have an antecedent or not and
annotated with coreference links. The first set       the other was similar to the one used in the first
gave 48.8% recall, 57.4% precision and an f-          model [14]. The complete system gives a recall
measure of 52.8%. The second set gave 52.7%           of 57.2%, a precision of 71.6% and an f-measure
recall, 54.6% precision and an f-measure of           of 63.7 for MUC-6 test set while it gives a recall
52.6%. The problem with this approach is that         of 47.0 %, a precision of 77.1% and an f-
once a link is formed it cannot be broken.            measure of 58.4 for MUC-7 test set. Iida et al
         In the references cited above, the prime     [16] proposed a modified approach to noun-
emphasis was on pronoun resolution only. For          phrase coreference resolution by reversing the
the more general case of anaphora resolution a        steps of the model proposed in [15]. This model
corpus-based,    supervised   machine     learning    does not solve the problem of skewed training
approach to noun phrase coreference resolution        dataset. To overcome the drawback of skewed
was adopted by Soon et al [12]. A decision tree       training dataset this work presents a tournament
model [17]. However, the tournament model is         sentences and the current sentence. It is found
not capable of anaphoricity determination as it      empirically that in most of the cases the
always selects a candidate for a given NP.           antecedent of the pronoun is within 2 sentences.
         The rest of the paper is organized as       The pronoun model processes each sentence one
follows. In the following section the present        by one in chronological order. The local focus of
approach is discussed. Some of the tools and         the discourse is found by evaluating the salience
techniques are described in the third section. The   value for each noun / pronoun that occurs in
results are presented in the fourth section while    these sentences. When the pronoun occurs,
the last section presents some concluding            entities in the salience-list are passed to a
remarks including possible improvements.             syntactic-filter. The filter discards those entities
                                                     from the salience-list which do not satisfy the
2 Proposed Approach
                                                     agreement constraints and binding constraints.
         The proposed approach tries to rework
                                                     From the remaining entities, the one which is
the approach given [12]. Study of the discourse
                                                     most salient is selected as its antecedent. The
theory [7] reveals that the local focus plays an
                                                     detailed list of constraints and the method for
important role for resolution of pronouns. A
                                                     finding out salience is given in [3].
classifier like the one proposed in [12] will not
be able to determine the focus and so tend to err      2.2. Method for Non-Pronouns
                                                             The method for finding out coreferring
more   during    resolving   pronouns.   Another
                                                     expressions      (if   any)   for   non-pronominal
problem with this approach is in clustering. In
                                                     markables is based on the output of a classifier.
order to overcome these problems we are solving
                                                     The classifier takes two markables ((Markablei)
the coreference resolution problem by finding
                                                     and a candidate antecedent (Markablej)) as input
out links between referencing expressions- the
                                                     and classifies the pair as coreferring or non-
markables. Therefore, in the present approach,
                                                     coreferring. The classifier is similar to the one
the markables are divided into two categories: i)
                                                     used in [12]. The attributes used by our classifier
Pronouns and ii) Non-pronouns. We use
                                                     is given in Table 1.
different methods for finding antecedents of
                                                               In Table 1, the attribute head is the head
pronouns and non-pronouns. Both these methods
                                                     word of the noun phrase. A head word is the
are explained below.
                                                     main word of focus in a noun phrase. For
  2.1. Method for Pronouns                           instance, in the noun-phrase “the next biggest
        The method for resolving pronouns
                                                     achievement of the century”, the noun in focus is
should take into consideration the local focus of
                                                     “achievement”. So, the head of this noun phrase
the discourse in addition to the basic binding and
                                                     is “achievement”. The noun phrases which start
agreement constraints. We have adopted the
                                                     with “the” are considered as definite while the
pronoun resolution method proposed in [3].
                                                     noun phrases which start with “this”, “that”,
Whenever a pronoun occurs in a sentence, it is
                                                     “these”    and     “those”    are   considered   as
sent to a pronoun model with previous n-
                                                     demonstrative. The hypernymy relation is used
to find out the semantic class of a markable. The      semantic class of two markables, the hypernymy
class from the seven classes defined in the            relation of wordnet is used. Each markable is
previous section which is a hypernym of the            assigned a synset corresponding to the first noun
head of the markable is assigned as the semantic       sense of its head. Two markables match in
                                                       semantic class if their synsets are same or they
Table 1: Attributes for non-pronoun classifier
                                                       are related directly by the hypernymy-hyponymy
Attribute             Description
                                                       relationship of wordnet. According to the output
String Match          True if Markablei is same as
                      Markablej, else false            of the classifier, the markables are grouped into
Same Head             True if Markablei and            coreference classes.
                      Markablej has the same head,              The classifier is trained using a slightly
                      else false
Definitei             True if Markablei is definite,
                                                       modified approach to that of [12]. Instead of

                      else false                       using only the immediate antecedent of a
Demonstrativei        True if Markablei is             markable in a coreference chain, all the non-
                      demonstrative, else false
                                                       pronominal antecedents are paired with the
Number Agreement      True if Markablei and
                      Markablej match in number,       markable to form positive samples. Negative
                      else false                       samples are generated by pairing up the
Gender Agreement      True if Markablei & Markablej    markable with the intermediate markables that
                      have same gender
                                                       are not present in the chain. Finally, the resultant
                      False if Markablei &
                                                       coreference classes will be generated as output.
                      Markablej have different
                                                       Each coreference class will contain markables
                      gender
                      Unknown if the gender of         that corefer with each other.
                      either markable is unknown
Both Proper Name      True if both markables are       3 Tools and Techniques
                                                                This section summarizes some of the
                      proper names, else false
Part of               True if either markable is       tools and techniques used for achieving the
                      totally contained in the other   proposed technique outlined in previous section.
                      markable, else false
Semantic Class        True if both markables have
                                                       3.1 Named Entity Recognizer
Agreement             the same semantic class.                A named entity recognizer (NER)
                      False if both markables have     identifies the named entities occurring in a text.
                      different semantic class.
                                                       Named entities are the names of people,
                      Unknown if semantic class of
                                                       organizations, locations, dates, etc. The Stanford
                      either of the markables cannot
                                                       Named Entity Recognizer API [19] is used to
                      be determined.
                                                       extract the named entities present in a noun
class of that markable. If none of the seven           phrase and classifies them into three different
classes is a hypernym, the semantic class of that      types: person, organization and location. The
markable is made ‘unknown’. For implementing           NER is used to determine if the markable is a
the solution, freely available API namely,             named entity, extract the alias feature and also
JWordnet [18] is used. For matching the                determine if the markable is a proper noun.
The open source Java API Weka is used
3.2 Derivation of Gender Information
                                                     [23] to learn the decision tree. This API provides
        In order to evaluate the feature “gender
                                                     implementation of various machine learning
agreement” the system finds the gender of a
                                                     algorithms for data mining. We use the C4.5
noun phrase from a huge database containing
                                                     learning algorithm provided in the API.
gender information. The entries in the database
give the number of occurrences of an expression
                                                     4 Results and Analysis
as   masculine,   feminine     or   neuter.   This
                                                                 This     section     provides        a     detailed
probabilistic gender database is generated using
                                                     description of the experiments performed for the
a large number of texts [20,21]. For finding the
                                                     coreference resolution of Noun Phrases for
gender of a noun phrase, the head of the noun
                                                     English. The approach of [12] is taken as a
phrase is taken as the input of a database query
                                                     baseline. Out of the 78 pair of files, 38 pairs are
and the counts of the head as masculine,
                                                     used for training and 40 pairs are used for
feminine and neuter are yielded as output. The
                                                     testing. Table 2 shows the number of pronouns,
gender having the maximum count is taken as
                                                     definite/demonstrative           Noun         Phrases       and
the gender of the noun phrase. For example the
                                                     remaining      Noun       Phrases           participating    in
noun phrase “President Fujimori” having the
                                                     coreference relation.
head “Fujimori” is assigned male as its gender.
                                                     Table 2: Noun Phrases and their occurrences in
As gender information plays an important role in
                                                     coreference relation
coreference resolution, both for pronouns and                                       Definite /
                                                                  Pronouns                            Others     Overall
non-pronouns, the accuracy of this database will                               Demonstrative
                                                       Co-
govern the overall accuracy.                                        352                380                725     1457
                                                     referring
                                                       Total        559               1247             3732       5538
3.3 Parsing
         The system uses the Stanford Parser
[22] API for determining the head word for each        4.1 Experimental Results of Soon’s
                                                       Approach
of the noun phrases and for determining its
                                                             A detailed analysis of the approach [12]
grammatical role like subject, direct object,
                                                     was made. Decision trees were learned using
indirect object etc. for the pronoun resolution
                                                     various confidence factors for pruning (0.25,
method. It is a probabilistic parser which derives
                                                     0.4). Lower confidence factor for pruning
knowledge from hand-parsed sentences. The API
                                                     produces a generalized tree while a higher
provides three parses, probabilistic context free
                                                     confidence factor produces a detailed tree having
grammar (PCFG) parser, dependency parser and
                                                     more branches. Our analysis shows that in case
lexicalized PCFG parser. We use the lexicalized
                                                     of pronominal anaphor, the output is false when
PCFG parser to obtain the required information
                                                     all the agreement features like gender, number
for the noun phrases.
                                                     and semantic class are true. However we know
                                                     that for a pronoun and its antecedent, all these
3.4 Decision Tree Learner
                                                     agreement features do match. The reason for this
anomaly is that while generating training                               Total       Total
                                                                                               Correct
                                                      Approach        links in    links in
samples for a pronoun, a positive sample is                                                     links
                                                                        Key       Output
generated for one pair while negative samples          Approach
                                                                        238         246           41
are generated for all other pairs. Thus, the             [12]
predominance of negative samples outweighs             Proposed
                                                                        238         253           92
those of the positive samples, leading to the          Approach

anomaly mentioned above. We can infer from
                                                     An analysis of tree, given in Figure 1, generated
these experiments that the features used in [12]
                                                     using our approach shows that the “Same Head”
are not adequate to resolve pronouns. The
                                                     feature is the most important feature. In case
agreement     features   provide   the   necessary
                                                     where the markables do not have a same head,
conditions but they are not sufficient. So, extra
                                                     their semantic class proves is the next significant
information is required for resolving pronouns.
                                                     feature. The other agreement features also play
        One may suggest that salience of a
                                                     their part in determining the coreference between
candidate antecedent would prove decisive for
                                                     non-pronominal noun phrases.
pronoun resolution. However, salience is a
relative measure and is significant only if it is    Same Head = true
compared to the salience of another markable.        | Part of = true: false(163.0/3.0)
                                                     | Part of = false: true(2837.0/723.0)
So, salience would not resolve the anomaly           Same Head = false
either. We present the results of using the          | Semantic Class Agreement = true
                                                     | | Number Agreement = true
approach in [12], together with that of the          | | | Part of = true: false(14.0)
present work in Table 3 below.                       | | | Part of = false
                                                     | | | | Gender Agreement = true
                                                     | | | | | Definitei = true: true(201.0/87.0)
  4.2 Experimental Results of our Approach           | | | | | Definitei = false: false(85.0/33.0)
        The decision tree for non-pronominal         | | | | Gender Agreement = false:
noun phrases of the proposed approach is shown       false(36.0/10.0)
                                                     | | | | Gender Agreement = unknown:
in Figure 1. For resolution of pronouns, the         false(0.0)
method proposed by Lappin and Leass is used.         | | Number Agreement = false
                                                     | | | Demonstrativei = true: true(2.0)
Table 3 shows a comparison between the results       | | | Demonstrativei = false: false(146.0/20.0)
for pronouns. The total number of pronouns in        | Semantic Class Agreement = false:
                                                     false(69190.0/1432.0)
the input having an antecedent, the total number     | Semantic Class Agreement = unknown:
of pronouns that have been resolved by the           false(0.0)

approach and the number of resolved pronouns         Number of Leaves : 11
which have correct antecedent are given in           Size of the tree : 19

columns 2, 3 and 4 respectively. We can see          Figure 1: Decision Tree for Non-Pronouns

clearly that the proposed approach outperforms                The proposed model gives an overall

the approach of [12] for resolving pronouns.         precision of 66.1% and recall of 58.4% with a

Table 3: Comparison of Output for Resolution         balanced F-measure of 62.0%. Table 4 gives a

of Pronouns                                          comparison of the results between different
baselines with the proposed approach. In                 the resolution procedure is seen as a clustering
addition to the result of [12], two other baselines      task for the non-pronouns so that each member
are taken into consideration to determine                of a coreference chain corefers with all the other
coreference relations using: i) only String Match        members of the same chain. This idea of
attribute and ii) only the Same Head attribute.          consideration of a coreference chain as an
Table 4: Comparison of Overall Results                   equivalence class has proved fruitful. The
                                              F-         current system takes texts which are marked for
 Approach          Precision    Recall
                                           measure       noun phrases as input. Addition of a Noun
  Baseline                                               Phrase Identification module will allow the
                     59.1%      41.4%       48.7%
     [12]                                                current system to deal with plain text files. The
String Match
                     60.0%      35.1%       44.3%        decision tree for the non-pronominal noun
   only
Same Head                                                phrases shows that there is a scope for addition
                     51.6%      57.0%       54.2%
    only                                                 of more attributes to the classifier. Various other
  Proposed                                               machine learning algorithms can be applied in
                     66.1%      58.4%       62.0%
 Approach                                                place of Decision Tree which may improve the
                                                         results further.

5 Conclusion and Future scope
                                                         Acknowledgements
             Coreference resolution is a necessary
                                                         The authors gratefully acknowledge the support
step for effective information retrieval in
                                                         from Indian Institute of Information Technology-
collection of large text corpora like digital
                                                         Allahabad, India.
library. In this work we have proposed and
implemented a novel approach for the problem
                                                         References
of coreference resolution. The experiments show
                                                          [1] J Peral, M Palomar and A Ferrdndez,
that the approach of [12] is not adequate for
                                                         “Coreference-oriented Interlingual Slot Structure
resolving pronouns. The reason is that the local-
                                                         and Machine Translation”, Proc. of the ACL
focus of the discourse is very important for
                                                         Workshop Coreference & its Applications,
finding antecedents of pronouns. This makes it
                                                         pp.69-76, 1999.
difficult for the classifier to classify a pronoun
                                                         [2] J R Hobbs, ``Pronoun Resolution'', Research
and a candidate antecedent without considering
                                                         Report 76-1, Dept. of Comp. Sc., City College,
the other possible candidates. The results for
                                                         City Univ. of NY, 1976.
non-pronouns were also only slightly better than
                                                         [3] S Lappin and H J Leass, “An algorithm for
the baseline which used only string-match
                                                         pronominal     coreference   resolution”,   Comp.
attribute.
                                                         Linguistics, Vol 20, pp. 535-561, 1994.
             The proposed approach suggests two
                                                         [4] C Kennedy and B Boguraev, “Anaphora for
separate methods for resolution of pronouns and
                                                         everyone: Pronominal coreference resolution
non-pronouns.       Keeping    in   mind   that    the
                                                         without a parser”, Proc. 16th conf. on Comp.
coreference relation is an equivalence relation,
                                                         Linguistics, Vol. 1, pp. 113 – 118, 1996.
[5] R Mitkov, “Robust pronoun resolution with                     Resolution”, Proc. 40th Annual Meeting of the
limited knowledge”, Proceedings of the 18th Int.                  Assoc. Comp. Linguistics, pp. 104-111, 2002.
Conf. on Comp. Linguistics, Vol. 2, pp. 869-875,                  [15] V Ng and C Cardie, “Identifying anaphoric
1998.                                                             and non-anaphoric noun phrases to improve
[6] B Grosz and C Sidner, “Attention, intentions                  coreference resolution”, Proc. 19th Int. Conf.
and     the     structure      of    discourse”,       Comp.      Comp. Linguistics, pp. 730–736, 2002.
Linguistics, Vol. 12, pp.175-204, 1986                            [16] R Iida, K Inui and Y Matsumoto,
[7] B J Grosz, A K Joshi and S Weinstein                          “Anaphora        resolution    by      antecedent
"Centering: a framework for modelling the local                   identification    followed     by    anaphoricity
coherence of discourse", Comp. Linguistics, Vol.                  determination”, ACM Transactions on Asian
21, pp. 203-226, 1995.                                            Language Information Processing (TALIP), Vol.
[8] R Vieira and M Poesio. Processing definite                    4, pp. 417–434, 2005.
descriptions in corpora. In S. Botley and M.                      [17] R Iida, K Inui and Y Takamura and
McEnery,          editors,          Corpus-based           and    Matsumoto, “Incorporating contextual cues in
Computational          Approaches            to      Discourse    trainable models for coreference resolution”,
Anaphora. UCL Press, 1997                                         Proceedings of the 10th Conf. of the European
[9] R. Vieira and M. Poesio, “An empirically-                     Chapter of the Association for Comp. Linguistics
based         system     for        processing         definite   Workshop on Comp. Treatment of Anaphora, pp.
descriptions”, Comp. Linguistics, Vol. 26, pp.                    23–30, 2003.
539-593, 2000.                                                    [18] http://jwn.sourceforge.net/
 [10] C Cardie and K Wagsta, "Noun phrase                         [19] http://nlp.stanford.edu/ner/index.shtml
coreference as clustering", Proc. Joint Conf. on                  [20]http://www.cs.ualberta.ca/~bergsma/Gender/
Empirical Methods in NLP and Very Large                           [21] S Bergsma and D Lin, “Bootstrapping Path-
Corpora,1999                                                      Based Pronoun Resolution”, ACL '06: Proc. 21st
[11] MUC-6, Coreference task definition (v2.3),                   Int. Conf. on Comp. Linguistics and the 44th
Proc.    Sixth Message Understanding                     Conf.    annual meeting of the ACL, pp. 33-40, 2006.
(MUC-6), pages 335-344, 1995.                                     [22] http://www-nlp.stanford.edu/downloads/lex-
[12] W Soon, H Ng and D Lim “A machine                            parser.shtml
learning approach to coreference of noun                          [23] http://www.cs.waikato.ac.nz/ml/weka/
phrases”, Comp. Linguistics, Vol. 27, pp.
521-541, 2001.
[13]    MUC-7,         Coreference          task    definition,
(Vol.3.0)      Proc.    of     the    Seventh         Message
Understanding Conf. (MUC-7), 1997.
[14] V Ng and C Cardie, “Improving Machine
Learning         Approaches            to          Coreference

More Related Content

What's hot

Legal Document
Legal DocumentLegal Document
Legal Documentlegal4
 
4213ijaia04
4213ijaia044213ijaia04
4213ijaia04ijaia
 
A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...Alexander Decker
 
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...CSCJournals
 
Performance analysis of linkage learning techniques
Performance analysis of linkage learning techniquesPerformance analysis of linkage learning techniques
Performance analysis of linkage learning techniqueseSAT Publishing House
 
Performance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmsPerformance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmseSAT Journals
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2Saman Sara
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
Neural perceptual model to global local vision for the recognition of the log...
Neural perceptual model to global local vision for the recognition of the log...Neural perceptual model to global local vision for the recognition of the log...
Neural perceptual model to global local vision for the recognition of the log...ijaia
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...IJERA Editor
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Daniele Di Mitri
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...ijaia
 
Main single agent machine learning algorithms
Main single agent machine learning algorithmsMain single agent machine learning algorithms
Main single agent machine learning algorithmsbutest
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
 

What's hot (16)

Legal Document
Legal DocumentLegal Document
Legal Document
 
4213ijaia04
4213ijaia044213ijaia04
4213ijaia04
 
A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...
 
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
 
Performance analysis of linkage learning techniques
Performance analysis of linkage learning techniquesPerformance analysis of linkage learning techniques
Performance analysis of linkage learning techniques
 
Performance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmsPerformance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithms
 
The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2The effect of number of concepts on readability of schemas 2
The effect of number of concepts on readability of schemas 2
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
Er
ErEr
Er
 
Neural perceptual model to global local vision for the recognition of the log...
Neural perceptual model to global local vision for the recognition of the log...Neural perceptual model to global local vision for the recognition of the log...
Neural perceptual model to global local vision for the recognition of the log...
 
G04124041046
G04124041046G04124041046
G04124041046
 
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...
The Cortisol Awakening Response Using Modified Proposed Method of Forecasting...
 
Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation Lifelong Topic Modelling presentation
Lifelong Topic Modelling presentation
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
 
Main single agent machine learning algorithms
Main single agent machine learning algorithmsMain single agent machine learning algorithms
Main single agent machine learning algorithms
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
 

Similar to Coreference Resolution using Hybrid Approach

Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learningcsandit
 
Automatic multiple choice question generation system for
Automatic multiple choice question generation system forAutomatic multiple choice question generation system for
Automatic multiple choice question generation system forAlexander Decker
 
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATION
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATIONA CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATION
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATIONijfls
 
Methods of Combining Neural Networks and Genetic Algorithms
Methods of Combining Neural Networks and Genetic AlgorithmsMethods of Combining Neural Networks and Genetic Algorithms
Methods of Combining Neural Networks and Genetic AlgorithmsESCOM
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
 
Prediction of Answer Keywords using Char-RNN
Prediction of Answer Keywords using Char-RNNPrediction of Answer Keywords using Char-RNN
Prediction of Answer Keywords using Char-RNNIJECEIAES
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441IJRAT
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodijcsa
 
Nature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebNature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebStefan Ceriu
 
A Review of Constraint Programming
A Review of Constraint ProgrammingA Review of Constraint Programming
A Review of Constraint ProgrammingEditor IJCATR
 
Comparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systemsComparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systemsijfcstjournal
 
Parallel evolutionary approach paper
Parallel evolutionary approach paperParallel evolutionary approach paper
Parallel evolutionary approach paperPriti Punia
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...dannyijwest
 
QA4MRE LIMSI-CNRS - Gleize et al. 2013
QA4MRE LIMSI-CNRS - Gleize et al. 2013QA4MRE LIMSI-CNRS - Gleize et al. 2013
QA4MRE LIMSI-CNRS - Gleize et al. 2013Frédéric Giannetti
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresIJwest
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresdannyijwest
 
A COMPARATIVE STUDY OF FEATURE SELECTION METHODS
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSA COMPARATIVE STUDY OF FEATURE SELECTION METHODS
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSijnlc
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case StudyIJERA Editor
 

Similar to Coreference Resolution using Hybrid Approach (20)

Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Automatic multiple choice question generation system for
Automatic multiple choice question generation system forAutomatic multiple choice question generation system for
Automatic multiple choice question generation system for
 
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATION
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATIONA CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATION
A CONSENSUS MODEL FOR GROUP DECISION MAKING WITH HESITANT FUZZY INFORMATION
 
Methods of Combining Neural Networks and Genetic Algorithms
Methods of Combining Neural Networks and Genetic AlgorithmsMethods of Combining Neural Networks and Genetic Algorithms
Methods of Combining Neural Networks and Genetic Algorithms
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
Prediction of Answer Keywords using Char-RNN
Prediction of Answer Keywords using Char-RNNPrediction of Answer Keywords using Char-RNN
Prediction of Answer Keywords using Char-RNN
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
 
Nature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic WebNature Inspired Models And The Semantic Web
Nature Inspired Models And The Semantic Web
 
A Review of Constraint Programming
A Review of Constraint ProgrammingA Review of Constraint Programming
A Review of Constraint Programming
 
Comparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systemsComparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systems
 
Parallel evolutionary approach paper
Parallel evolutionary approach paperParallel evolutionary approach paper
Parallel evolutionary approach paper
 
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
HYPONYMY EXTRACTION OF DOMAIN ONTOLOGY CONCEPT BASED ON CCRFS AND HIERARCHY C...
 
QA4MRE LIMSI-CNRS - Gleize et al. 2013
QA4MRE LIMSI-CNRS - Gleize et al. 2013QA4MRE LIMSI-CNRS - Gleize et al. 2013
QA4MRE LIMSI-CNRS - Gleize et al. 2013
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical features
 
Question Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical featuresQuestion Classification using Semantic, Syntactic and Lexical features
Question Classification using Semantic, Syntactic and Lexical features
 
A COMPARATIVE STUDY OF FEATURE SELECTION METHODS
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSA COMPARATIVE STUDY OF FEATURE SELECTION METHODS
A COMPARATIVE STUDY OF FEATURE SELECTION METHODS
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case Study
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Coreference Resolution using Hybrid Approach

  • 1. Coreference Resolution using hybrid approach Abstract resolution system is to find out this relationship This work presents a novel approach to find between them. structured information from the vast repository In information retrieval, when of unstructured text in digital libraries using performing a search on “Taj Mahal” we would Coreference Resolution. Our approach uses a like it if the page ranking algorithm also includes salience based technique to find the antecedents “it”, “the monument”, etc. which refer to “Taj of pronouns, while it uses a classifier based Mahal” for evaluating the rank of that page. technique for the resolution of other noun Similarly a question answering system would be phrases. A comparison of the proposed approach able to fetch better answers, and improve the with several baselines methods shows that the recall if a coreference resolution system is suggested solution significantly outperforms incorporated in it. It was suggested by Peral et al them with 66.1 % precision, 58.4% recall and a that identification of coreferences is essential for balanced f-measure of 62.0%. an efficient machine translation system [1]. The algorithm due to Hobbs [2] is one Key words: Coreference Resolution, Hybrid of the earliest attempts to resolve personal approach, Machine learning, anaphoricity features. pronouns he, she, it and they in three texts by finding their antecedents. Lappin and Leass [3] 1 Introduction identifies the noun-phrase antecedents of The availability of vast amounts of personal pronouns, including the reflexive machine readable texts in digital libraries has pronouns. These algorithms are good examples resulted in an ever increasing need to find of approaches that are language specific and use methods for extracting some structured hand crafted rules. As an alternative, several information from them. This has given an knowledge-poor approaches have been proposed impetus to the development of methods for by Kennedy and Boguraev [4] and Mitkov [5]. Information Extraction. One of these tasks is These approaches work with tools like part-of- Coreference Resolution. Coreference is said to speech tagger and shallow parser. The algorithm occur when one expression has the same referent used for resolution of pronouns is similar to the as another expression. Coreference resolution is one proposed in [3]. These approaches show the process of finding coreferences in a good accuracy. However, these approaches are document. For example “Mohandas Karamchand limited to third person pronouns. A Gandhi” can be referred to as “Mr. Gandhi”, computational theory for discourse structure was “Mahatma Gandhi”, “he” or even “Bapu”. All proposed by Grosz et al [6]. further elaborated in these phrases corefer and the purpose of the the Centering Theory [7] which fits into this computational model and gives the relationship
  • 2. of the attentional state with local coherence. The classifier is trained for pairs of coreferring algorithm proposed by Vieira and Poesio [8] markables which in turn is used for the works by checking the definite description if it is coreference resolution procedure. A set of a special noun followed by a test for appositive features are determined for a pair of markables construction. The performance of this approach before giving it to the classifier. The researchers was measured in two ways, first by examining trained a separate decision tree for MUC-6 [11] how well the system performed coreference and MUC-7 [13], each using 30 training texts resolution on the anaphoric definite Noun annotated with anaphoric links. The system Phrases (NP), achieving 72% recall and 82% generated 59% recall and 67% precision for precision, and then by examining how well the MUC-6 and 56% recall and 66% precision system identified non-anaphoric definite NPs, forMUC-7. The model of [12] was extended by achieving 74% recall and 85% precision. Later Ng and Cardie [14] making few changes to the on, an extension of this approach was also learning framework and by adding more proposed by the same authors where the bridging features. They implemented their approach using anaphors were also taken into consideration [9]. both a decision tree learner (C4.5) and a rule The performance of this approach was found for based learner (RIPPER). This model was tested the system with and without considering the for both MUC-6 and MUC-7 test corpus giving heuristic bridging anaphors. The precision and a recall of 64.2%, precision of 78.0% and an f- recall for the first case were found to be 76% and measure of 70.4% for MUC-6 test set and a 53% respectively and in the second case were recall of 55.7%, precision of 72.8% and an f- 70% and 57% respectively. An unsupervised measure of 63.1% for MUC-7 test set. Another algorithm for noun-phrase coreference resolution model [15], was proposed by the same authors was presented by Cardie and Wagsta [10]. The based on anaphoricity determination. The model algorithm was tested for MUC-6 coreference uses two classifiers: one to determine if a data [11]. It contained two sets of 30 documents particular NP will have an antecedent or not and annotated with coreference links. The first set the other was similar to the one used in the first gave 48.8% recall, 57.4% precision and an f- model [14]. The complete system gives a recall measure of 52.8%. The second set gave 52.7% of 57.2%, a precision of 71.6% and an f-measure recall, 54.6% precision and an f-measure of of 63.7 for MUC-6 test set while it gives a recall 52.6%. The problem with this approach is that of 47.0 %, a precision of 77.1% and an f- once a link is formed it cannot be broken. measure of 58.4 for MUC-7 test set. Iida et al In the references cited above, the prime [16] proposed a modified approach to noun- emphasis was on pronoun resolution only. For phrase coreference resolution by reversing the the more general case of anaphora resolution a steps of the model proposed in [15]. This model corpus-based, supervised machine learning does not solve the problem of skewed training approach to noun phrase coreference resolution dataset. To overcome the drawback of skewed was adopted by Soon et al [12]. A decision tree training dataset this work presents a tournament
  • 3. model [17]. However, the tournament model is sentences and the current sentence. It is found not capable of anaphoricity determination as it empirically that in most of the cases the always selects a candidate for a given NP. antecedent of the pronoun is within 2 sentences. The rest of the paper is organized as The pronoun model processes each sentence one follows. In the following section the present by one in chronological order. The local focus of approach is discussed. Some of the tools and the discourse is found by evaluating the salience techniques are described in the third section. The value for each noun / pronoun that occurs in results are presented in the fourth section while these sentences. When the pronoun occurs, the last section presents some concluding entities in the salience-list are passed to a remarks including possible improvements. syntactic-filter. The filter discards those entities from the salience-list which do not satisfy the 2 Proposed Approach agreement constraints and binding constraints. The proposed approach tries to rework From the remaining entities, the one which is the approach given [12]. Study of the discourse most salient is selected as its antecedent. The theory [7] reveals that the local focus plays an detailed list of constraints and the method for important role for resolution of pronouns. A finding out salience is given in [3]. classifier like the one proposed in [12] will not be able to determine the focus and so tend to err 2.2. Method for Non-Pronouns The method for finding out coreferring more during resolving pronouns. Another expressions (if any) for non-pronominal problem with this approach is in clustering. In markables is based on the output of a classifier. order to overcome these problems we are solving The classifier takes two markables ((Markablei) the coreference resolution problem by finding and a candidate antecedent (Markablej)) as input out links between referencing expressions- the and classifies the pair as coreferring or non- markables. Therefore, in the present approach, coreferring. The classifier is similar to the one the markables are divided into two categories: i) used in [12]. The attributes used by our classifier Pronouns and ii) Non-pronouns. We use is given in Table 1. different methods for finding antecedents of In Table 1, the attribute head is the head pronouns and non-pronouns. Both these methods word of the noun phrase. A head word is the are explained below. main word of focus in a noun phrase. For 2.1. Method for Pronouns instance, in the noun-phrase “the next biggest The method for resolving pronouns achievement of the century”, the noun in focus is should take into consideration the local focus of “achievement”. So, the head of this noun phrase the discourse in addition to the basic binding and is “achievement”. The noun phrases which start agreement constraints. We have adopted the with “the” are considered as definite while the pronoun resolution method proposed in [3]. noun phrases which start with “this”, “that”, Whenever a pronoun occurs in a sentence, it is “these” and “those” are considered as sent to a pronoun model with previous n- demonstrative. The hypernymy relation is used
  • 4. to find out the semantic class of a markable. The semantic class of two markables, the hypernymy class from the seven classes defined in the relation of wordnet is used. Each markable is previous section which is a hypernym of the assigned a synset corresponding to the first noun head of the markable is assigned as the semantic sense of its head. Two markables match in semantic class if their synsets are same or they Table 1: Attributes for non-pronoun classifier are related directly by the hypernymy-hyponymy Attribute Description relationship of wordnet. According to the output String Match True if Markablei is same as Markablej, else false of the classifier, the markables are grouped into Same Head True if Markablei and coreference classes. Markablej has the same head, The classifier is trained using a slightly else false Definitei True if Markablei is definite, modified approach to that of [12]. Instead of else false using only the immediate antecedent of a Demonstrativei True if Markablei is markable in a coreference chain, all the non- demonstrative, else false pronominal antecedents are paired with the Number Agreement True if Markablei and Markablej match in number, markable to form positive samples. Negative else false samples are generated by pairing up the Gender Agreement True if Markablei & Markablej markable with the intermediate markables that have same gender are not present in the chain. Finally, the resultant False if Markablei & coreference classes will be generated as output. Markablej have different Each coreference class will contain markables gender Unknown if the gender of that corefer with each other. either markable is unknown Both Proper Name True if both markables are 3 Tools and Techniques This section summarizes some of the proper names, else false Part of True if either markable is tools and techniques used for achieving the totally contained in the other proposed technique outlined in previous section. markable, else false Semantic Class True if both markables have 3.1 Named Entity Recognizer Agreement the same semantic class. A named entity recognizer (NER) False if both markables have identifies the named entities occurring in a text. different semantic class. Named entities are the names of people, Unknown if semantic class of organizations, locations, dates, etc. The Stanford either of the markables cannot Named Entity Recognizer API [19] is used to be determined. extract the named entities present in a noun class of that markable. If none of the seven phrase and classifies them into three different classes is a hypernym, the semantic class of that types: person, organization and location. The markable is made ‘unknown’. For implementing NER is used to determine if the markable is a the solution, freely available API namely, named entity, extract the alias feature and also JWordnet [18] is used. For matching the determine if the markable is a proper noun.
  • 5. The open source Java API Weka is used 3.2 Derivation of Gender Information [23] to learn the decision tree. This API provides In order to evaluate the feature “gender implementation of various machine learning agreement” the system finds the gender of a algorithms for data mining. We use the C4.5 noun phrase from a huge database containing learning algorithm provided in the API. gender information. The entries in the database give the number of occurrences of an expression 4 Results and Analysis as masculine, feminine or neuter. This This section provides a detailed probabilistic gender database is generated using description of the experiments performed for the a large number of texts [20,21]. For finding the coreference resolution of Noun Phrases for gender of a noun phrase, the head of the noun English. The approach of [12] is taken as a phrase is taken as the input of a database query baseline. Out of the 78 pair of files, 38 pairs are and the counts of the head as masculine, used for training and 40 pairs are used for feminine and neuter are yielded as output. The testing. Table 2 shows the number of pronouns, gender having the maximum count is taken as definite/demonstrative Noun Phrases and the gender of the noun phrase. For example the remaining Noun Phrases participating in noun phrase “President Fujimori” having the coreference relation. head “Fujimori” is assigned male as its gender. Table 2: Noun Phrases and their occurrences in As gender information plays an important role in coreference relation coreference resolution, both for pronouns and Definite / Pronouns Others Overall non-pronouns, the accuracy of this database will Demonstrative Co- govern the overall accuracy. 352 380 725 1457 referring Total 559 1247 3732 5538 3.3 Parsing The system uses the Stanford Parser [22] API for determining the head word for each 4.1 Experimental Results of Soon’s Approach of the noun phrases and for determining its A detailed analysis of the approach [12] grammatical role like subject, direct object, was made. Decision trees were learned using indirect object etc. for the pronoun resolution various confidence factors for pruning (0.25, method. It is a probabilistic parser which derives 0.4). Lower confidence factor for pruning knowledge from hand-parsed sentences. The API produces a generalized tree while a higher provides three parses, probabilistic context free confidence factor produces a detailed tree having grammar (PCFG) parser, dependency parser and more branches. Our analysis shows that in case lexicalized PCFG parser. We use the lexicalized of pronominal anaphor, the output is false when PCFG parser to obtain the required information all the agreement features like gender, number for the noun phrases. and semantic class are true. However we know that for a pronoun and its antecedent, all these 3.4 Decision Tree Learner agreement features do match. The reason for this
  • 6. anomaly is that while generating training Total Total Correct Approach links in links in samples for a pronoun, a positive sample is links Key Output generated for one pair while negative samples Approach 238 246 41 are generated for all other pairs. Thus, the [12] predominance of negative samples outweighs Proposed 238 253 92 those of the positive samples, leading to the Approach anomaly mentioned above. We can infer from An analysis of tree, given in Figure 1, generated these experiments that the features used in [12] using our approach shows that the “Same Head” are not adequate to resolve pronouns. The feature is the most important feature. In case agreement features provide the necessary where the markables do not have a same head, conditions but they are not sufficient. So, extra their semantic class proves is the next significant information is required for resolving pronouns. feature. The other agreement features also play One may suggest that salience of a their part in determining the coreference between candidate antecedent would prove decisive for non-pronominal noun phrases. pronoun resolution. However, salience is a relative measure and is significant only if it is Same Head = true compared to the salience of another markable. | Part of = true: false(163.0/3.0) | Part of = false: true(2837.0/723.0) So, salience would not resolve the anomaly Same Head = false either. We present the results of using the | Semantic Class Agreement = true | | Number Agreement = true approach in [12], together with that of the | | | Part of = true: false(14.0) present work in Table 3 below. | | | Part of = false | | | | Gender Agreement = true | | | | | Definitei = true: true(201.0/87.0) 4.2 Experimental Results of our Approach | | | | | Definitei = false: false(85.0/33.0) The decision tree for non-pronominal | | | | Gender Agreement = false: noun phrases of the proposed approach is shown false(36.0/10.0) | | | | Gender Agreement = unknown: in Figure 1. For resolution of pronouns, the false(0.0) method proposed by Lappin and Leass is used. | | Number Agreement = false | | | Demonstrativei = true: true(2.0) Table 3 shows a comparison between the results | | | Demonstrativei = false: false(146.0/20.0) for pronouns. The total number of pronouns in | Semantic Class Agreement = false: false(69190.0/1432.0) the input having an antecedent, the total number | Semantic Class Agreement = unknown: of pronouns that have been resolved by the false(0.0) approach and the number of resolved pronouns Number of Leaves : 11 which have correct antecedent are given in Size of the tree : 19 columns 2, 3 and 4 respectively. We can see Figure 1: Decision Tree for Non-Pronouns clearly that the proposed approach outperforms The proposed model gives an overall the approach of [12] for resolving pronouns. precision of 66.1% and recall of 58.4% with a Table 3: Comparison of Output for Resolution balanced F-measure of 62.0%. Table 4 gives a of Pronouns comparison of the results between different
  • 7. baselines with the proposed approach. In the resolution procedure is seen as a clustering addition to the result of [12], two other baselines task for the non-pronouns so that each member are taken into consideration to determine of a coreference chain corefers with all the other coreference relations using: i) only String Match members of the same chain. This idea of attribute and ii) only the Same Head attribute. consideration of a coreference chain as an Table 4: Comparison of Overall Results equivalence class has proved fruitful. The F- current system takes texts which are marked for Approach Precision Recall measure noun phrases as input. Addition of a Noun Baseline Phrase Identification module will allow the 59.1% 41.4% 48.7% [12] current system to deal with plain text files. The String Match 60.0% 35.1% 44.3% decision tree for the non-pronominal noun only Same Head phrases shows that there is a scope for addition 51.6% 57.0% 54.2% only of more attributes to the classifier. Various other Proposed machine learning algorithms can be applied in 66.1% 58.4% 62.0% Approach place of Decision Tree which may improve the results further. 5 Conclusion and Future scope Acknowledgements Coreference resolution is a necessary The authors gratefully acknowledge the support step for effective information retrieval in from Indian Institute of Information Technology- collection of large text corpora like digital Allahabad, India. library. In this work we have proposed and implemented a novel approach for the problem References of coreference resolution. The experiments show [1] J Peral, M Palomar and A Ferrdndez, that the approach of [12] is not adequate for “Coreference-oriented Interlingual Slot Structure resolving pronouns. The reason is that the local- and Machine Translation”, Proc. of the ACL focus of the discourse is very important for Workshop Coreference & its Applications, finding antecedents of pronouns. This makes it pp.69-76, 1999. difficult for the classifier to classify a pronoun [2] J R Hobbs, ``Pronoun Resolution'', Research and a candidate antecedent without considering Report 76-1, Dept. of Comp. Sc., City College, the other possible candidates. The results for City Univ. of NY, 1976. non-pronouns were also only slightly better than [3] S Lappin and H J Leass, “An algorithm for the baseline which used only string-match pronominal coreference resolution”, Comp. attribute. Linguistics, Vol 20, pp. 535-561, 1994. The proposed approach suggests two [4] C Kennedy and B Boguraev, “Anaphora for separate methods for resolution of pronouns and everyone: Pronominal coreference resolution non-pronouns. Keeping in mind that the without a parser”, Proc. 16th conf. on Comp. coreference relation is an equivalence relation, Linguistics, Vol. 1, pp. 113 – 118, 1996.
  • 8. [5] R Mitkov, “Robust pronoun resolution with Resolution”, Proc. 40th Annual Meeting of the limited knowledge”, Proceedings of the 18th Int. Assoc. Comp. Linguistics, pp. 104-111, 2002. Conf. on Comp. Linguistics, Vol. 2, pp. 869-875, [15] V Ng and C Cardie, “Identifying anaphoric 1998. and non-anaphoric noun phrases to improve [6] B Grosz and C Sidner, “Attention, intentions coreference resolution”, Proc. 19th Int. Conf. and the structure of discourse”, Comp. Comp. Linguistics, pp. 730–736, 2002. Linguistics, Vol. 12, pp.175-204, 1986 [16] R Iida, K Inui and Y Matsumoto, [7] B J Grosz, A K Joshi and S Weinstein “Anaphora resolution by antecedent "Centering: a framework for modelling the local identification followed by anaphoricity coherence of discourse", Comp. Linguistics, Vol. determination”, ACM Transactions on Asian 21, pp. 203-226, 1995. Language Information Processing (TALIP), Vol. [8] R Vieira and M Poesio. Processing definite 4, pp. 417–434, 2005. descriptions in corpora. In S. Botley and M. [17] R Iida, K Inui and Y Takamura and McEnery, editors, Corpus-based and Matsumoto, “Incorporating contextual cues in Computational Approaches to Discourse trainable models for coreference resolution”, Anaphora. UCL Press, 1997 Proceedings of the 10th Conf. of the European [9] R. Vieira and M. Poesio, “An empirically- Chapter of the Association for Comp. Linguistics based system for processing definite Workshop on Comp. Treatment of Anaphora, pp. descriptions”, Comp. Linguistics, Vol. 26, pp. 23–30, 2003. 539-593, 2000. [18] http://jwn.sourceforge.net/ [10] C Cardie and K Wagsta, "Noun phrase [19] http://nlp.stanford.edu/ner/index.shtml coreference as clustering", Proc. Joint Conf. on [20]http://www.cs.ualberta.ca/~bergsma/Gender/ Empirical Methods in NLP and Very Large [21] S Bergsma and D Lin, “Bootstrapping Path- Corpora,1999 Based Pronoun Resolution”, ACL '06: Proc. 21st [11] MUC-6, Coreference task definition (v2.3), Int. Conf. on Comp. Linguistics and the 44th Proc. Sixth Message Understanding Conf. annual meeting of the ACL, pp. 33-40, 2006. (MUC-6), pages 335-344, 1995. [22] http://www-nlp.stanford.edu/downloads/lex- [12] W Soon, H Ng and D Lim “A machine parser.shtml learning approach to coreference of noun [23] http://www.cs.waikato.ac.nz/ml/weka/ phrases”, Comp. Linguistics, Vol. 27, pp. 521-541, 2001. [13] MUC-7, Coreference task definition, (Vol.3.0) Proc. of the Seventh Message Understanding Conf. (MUC-7), 1997. [14] V Ng and C Cardie, “Improving Machine Learning Approaches to Coreference