Reference Scope Identification in Citing Sentences
This is material of reading camp (http://www.cl.ecei.tohoku.ac.jp/~y-matsu/snlp4/)

  • Related work notes that Nanba and the authors themselves have addressed identifying the scope of a citation when it is explained across multiple sentences. Applications include summarization and the like.
  • With two annotators, the probability P(E) of agreeing by chance is 1/2; P(A) is a bit over 80%.
    1. Reference Scope Identification in Citing Sentences
       Authors: Amjad Abu-Jbara, Dragomir Radev (University of Michigan)
       Conference: NAACL 2012
       Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo)
    2. Abstract
       ● Problem:
         ● Multiple citations in one sentence, e.g.:
           "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English."
       ● Approach: preprocessing and 2 + 1 + 2×3 + 1 = 10 methods
    3. Preprocessing & Methods
    4. Reference Preprocessing (tagging, grouping, non-syntactical element removal)
       ● These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6).
       ● These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3).
       ● (GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2).
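The tagging-and-grouping step on this slide (each citation becomes REF/TREF, and adjacent citations collapse into one GREF/GTREF placeholder, with T marking the target reference) can be approximated with a single regex pass. The function below is a hypothetical illustration, not the authors' preprocessing code; it assumes parenthesized author-year citations separated by semicolons.

```python
import re

def tag_and_group(sentence, target_author):
    """Replace parenthesized author-year citations with placeholders:
    REF/TREF for single citations, GREF/GTREF for groups of adjacent
    citations; the T prefix marks the target reference.
    Hypothetical sketch, not the authors' actual preprocessing."""
    cite = re.compile(r'\(([^()]*?\d{4}[^()]*)\)')
    counter = [0]  # placeholder index, shared across matches

    def repl(match):
        parts = match.group(1).split(';')
        counter[0] += 1
        has_target = any(target_author in p for p in parts)
        if len(parts) > 1:  # adjacent citations collapse into a group
            prefix = 'GTREF' if has_target else 'GREF'
        else:
            prefix = 'TREF' if has_target else 'REF'
        return f'({prefix}.{counter[0]})'

    return cite.sub(repl, sentence)
```

On a sentence shaped like the first bullet, this yields the grouped form of the second bullet, e.g. `(GREF.1)`, `(TREF.2)`, `(GREF.3)`.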
    5. Approach 1 (SVM, LR)
       ● Word classification
         ● with an SVM and a logistic regression classifier
       ● Features: Distance, Position (Before/After), Same Segment (,.; and, but, for, nor, or, so, yet), POS tag, Dependency Distance, Dependency Relations, Common Ancestor Node, Syntactic Distance
       ● Problem example:
         ● "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English."
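As a toy illustration of the word-classification setup, the sketch below computes three of the listed features (distance, position, same-segment) for one word relative to the target reference. The function name, tokenization, and feature encoding are assumptions; in the talk's setup such vectors feed an SVM (LibSVM) or a logistic regression classifier (Weka).

```python
def word_features(words, i, ref_i):
    """Compute a handful of the slide's features for word i relative to
    the target reference at index ref_i. Illustrative sketch only."""
    breaks = {',', ';', '.', 'and', 'but', 'for', 'nor', 'or', 'so', 'yet'}
    lo, hi = sorted((i, ref_i))
    return {
        'distance': abs(i - ref_i),  # word count between token and target ref
        'position': 'before' if i < ref_i else 'after',
        # same segment iff no segment-breaking token lies between them
        'same_segment': not any(w in breaks for w in words[lo + 1:hi]),
    }
```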
    6. Approach 2 (CRF)
       ● Sequence labeling with a CRF
         ● features are the same as in Approach 1
    7. Approach 3-S1-* (CRF/segment)
       ● segmentation (1)
         ● punctuation marks
         ● coordinating conjunctions
           – and, but, for, nor, or, so, yet
         ● a set of special expressions
           – "for example", "for instance", "including", "includes", "such as", "like", etc.
       ● [Rerankers have been successfully applied to numerous NLP tasks such as] [parse selection (GTREF)], [parse reranking (GREF)], [question-answering (REF)].
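Segmentation (1) can be approximated with one regex that splits on the punctuation, conjunctions, and cue phrases listed above. This is a rough stdlib-only sketch, not the authors' segmenter; the multiword phrases come first in the alternation so "for example" is not consumed as the conjunction "for".

```python
import re

# Split points from the slide: cue phrases first (so "for example" wins
# over the conjunction "for"), then punctuation, then conjunctions.
SPLIT = re.compile(
    r'\s*(?:\b(?:for example|for instance|including|includes|such as|like)\b'
    r'|[,;:]'
    r'|\b(?:and|but|for|nor|or|so|yet)\b)\s*'
)

def segment(sentence):
    """Return the sentence's segments with the delimiters removed."""
    return [s for s in SPLIT.split(sentence) if s]
```

On the slide's example sentence this reproduces the bracketed segments.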
    8. Approach 3-S2-* (CRF/segment)
       ● segmentation (2)
         ● chunking tool
           – noun groups
           – verb groups
           – preposition groups
           – adjective groups
           – adverb groups
           – other parts form segments by themselves
       ● [To] [score] [the output] [of] [the coreference models], [we] [employ] [the commonly-used MUC scoring program (REF)] [and] [the recently-developed CEAF scoring program (TREF)].
    9. Approach 3-*-R1,2,3 (CRF/segment)
       ● R1: majority label of the words the segment contains
       ● R2: inside if any word is inside
       ● R3: outside if any word is outside
       ● [I O O O O] [I I I] [O O]
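The three segment-labeling rules fit in a few lines; here is a minimal sketch assuming per-word labels 'I' (inside the reference scope) and 'O' (outside):

```python
from collections import Counter

def segment_label(word_labels, rule):
    """Collapse a segment's per-word I/O labels into one segment label.
    R1: majority label; R2: 'I' if any word is inside;
    R3: 'O' if any word is outside."""
    if rule == 'R1':
        return Counter(word_labels).most_common(1)[0][0]
    if rule == 'R2':
        return 'I' if 'I' in word_labels else 'O'
    if rule == 'R3':
        return 'O' if 'O' in word_labels else 'I'
    raise ValueError(f'unknown rule: {rule}')
```

On the slide's example segments [I O O O O] [I I I] [O O], R1 gives O, I, O; R2 gives I, I, O; R3 gives O, I, O.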
    10. AR2011: the link grammar parser (Sleator and Temperley, 1991)
    11. Experiment
    12. Data
       ● ACL Anthology Network Corpus
       ● 3300 sentences, each with ≧ 2 citations
       ● Annotation agreement
         ● 500 of the 3300
         ● Preprocessing is perfect
         ● Kappa coefficient for scope: K = (P(A) − P(E)) / (1 − P(E)) = 2P(A) − 1 = 0.61
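The agreement figure follows from the standard kappa formula; with two equally likely labels the chance agreement is P(E) = 1/2, so K = 2·P(A) − 1, and K = 0.61 corresponds to P(A) = 0.805. A quick check:

```python
def kappa(p_a, p_e):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    return (p_a - p_e) / (1 - p_e)

# With two labels (inside/outside scope), P(E) = 0.5, so K = 2*P(A) - 1.
```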
    13. Tools
       ● Edinburgh Language Technology Text Tokenization Toolkit (LT-TTT)
         ● text tokenization, part-of-speech tagging, chunking, and noun phrase head identification
       ● Stanford parser
         ● syntactic and dependency parsing
       ● LibSVM with a linear kernel
       ● Weka
         ● logistic regression classification
    14. Tools
       ● Machine Learning for Language Toolkit (MALLET)
         ● CRF
       ● Validation
         ● 10-fold cross validation
    15. Experiment (Preprocessing)
       ● These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6).
       ● These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3).
       ● (GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2).
       ● Tagging: 98.3% precision and 93.1% recall
       ● Grouping: perfect
       ● Non-syntactic reference removal: 90.08% precision and 90.1% recall
    16. Experiment (Main)
       ● CRF
       ● Chunking
       ● Majority
    17. Feature Analysis
       ● Features: Distance, Position (Before/After), Same Segment (,.; and, but, for, nor, or, so, yet), POS tag, Dependency Distance, Dependency Relations, Common Ancestor Node, Syntactic Distance
    18. Summary
       ● Identified reference scope in sentences that contain multiple citations
       ● CRF
       ● Chunking
       ● Majority
