An Alignment-based Pattern Representation Model for Information Extraction
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee
{megaup, stardust, gblee}@postech.ac.kr

Abstract
In this paper, we propose an alternative pattern representation model and an effective method of utilizing it. While previous pattern representation models depend entirely on the result of dependency analysis, our approach is based on lexical alignment and treats the result of dependency analysis only as one meaningful feature of the alignment process. In this way, we can cope with the errors of incomplete dependency analysis. An evaluation on a scenario template task shows that our proposed model outperforms the previous syntax-dependent models.

Pattern Representation Models for Information Extraction

Information Extraction
  • Extracting a defined number of relevant arguments from natural language documents
  • Subtasks, by number of arguments:
      1 → named-entity recognition
      2 → binary relation extraction
      more than 2 → relation/event extraction
  • Ex) "About 50 peasants have been kidnapped by terrorists of the FMNL"
      incident type: kidnapping; hum_tgt: peasants; perp_ind: terrorists; perp_org: FMNL

Related Work
  • Lexical sequence pattern models: a set of lexical sequences
      Ex) "<HUM_TGT> be kidnapped", "be kidnapped by <PERP_IND>"
  • Syntax-dependent pattern models: a set of subtrees of the dependency tree
      Ex) (kidnapped ({HUM_TGT}-nsubjpass))
          (kidnapped ({PERP_IND}-agent))
          (kidnapped ({PERP_IND}-agent ({PERP_ORG}-prep_of)))

Approach
  • Automatic pattern learning, consisting of:
      - a pattern representation model
      - a pattern learning algorithm

Method: Pattern Sequence Extraction
  1) Search the source documents for sentences containing all arguments of each tuple
  2) Segment out the subpart of the sentence based on clausal boundaries
  3) Replace the argument parts of the sub-sentence with argument labels
  • Ex) "about 50 peasants have been kidnapped by terrorists of the FMNL"
      → <HUM_TGT> of [NP] have been kidnapped by <PERP_IND>
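The three extraction steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, a simple lowercase string replacement stands in for argument matching, and the clausal-boundary segmentation of step 2 (which requires a parse) is omitted.

```python
def extract_pattern_sequence(sentence, arguments):
    """Step 3 of the procedure: replace each argument occurrence in a
    (sub-)sentence with its slot label. `arguments` maps a slot label
    to the literal argument text found in the source document."""
    pattern = sentence.lower()
    for label, text in arguments.items():
        pattern = pattern.replace(text.lower(), f"<{label}>")
    return pattern.split()

# Illustrative tuple from the kidnapping scenario:
tokens = extract_pattern_sequence(
    "About 50 peasants have been kidnapped by terrorists",
    {"HUM_TGT": "about 50 peasants", "PERP_IND": "terrorists"},
)
print(tokens)
# → ['<HUM_TGT>', 'have', 'been', 'kidnapped', 'by', '<PERP_IND>']
```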
Method: Pattern Model
  • A lexical sequence pattern, plus a term weight for each term derived from dependency analysis
  • Computing term weights:
      wi = (ri + c) / (di + c)
      wi : weight of the i-th term
      ri : number of relevant terms within the subtree with ti as root
      di : distance from the root node
      c  : smoothing constant (default: 1)
  • Ex) for the pattern "<HUM_TGT> of [NP] have been kidnapped by <PERP_IND>":
      kidnapped (root):        (3 + 1) / (0 + 1) = 4
      <HUM_TGT> (nsubjpass):   (1 + 1) / (1 + 1) = 1
      <PERP_IND> (agent):      (2 + 1) / (1 + 1) = 1.5
      of:                      (0 + 1) / (2 + 1) = 0.33
      <PERP_ORG> (prep_of):    (1 + 1) / (2 + 1) = 0.67
  • Soft pattern matching by sequence alignment

Pattern Matching
  • Sequence alignment based on dynamic programming
  • Alignment matrix recurrence:
      Mi,j = max( Mi-1,j-1 + simi-1,j-1 * wi-1,
                  Mi-1,j + gp * wi-1,
                  Mi,j-1 + gp * wi,
                  0 )
  • Ex) alignment matrix for the pattern against "peasants have been kidnapped by terrorists":

                   peasants  have  been  kidnapped  by    terrorists
      <HUM_TGT>    1         0     0     0          0     0
      of           0         1     0     0          0     0
      [NP]         0         0.66  1     0          0     0
      have         0         4     3     2          1     0
      been         0         0     8     7          6     5
      kidnapped    0         0     4     12         11    10
      by           0         0     0     8          13.5  12.5
      <PERP_IND>   1.5       0.5   0     0          0     15
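A minimal sketch of the weighted local alignment: `term_weight` implements wi = (ri + c) / (di + c), and `align` fills the matrix with the recurrence above (indexing shifted to 0-based; the poster's wi-1/wi distinction is read as "the weight of the pattern term involved in each move"). The gap penalty value and the exact-match similarity function are assumptions for illustration; the (ri, di) pairs for kidnapped, <HUM_TGT>, and <PERP_IND> follow the worked example, while the rest are hypothetical.

```python
def term_weight(r, d, c=1.0):
    # w_i = (r_i + c) / (d_i + c): relevant terms in the subtree rooted
    # at the term, over its distance from the dependency root (c smooths).
    return (r + c) / (d + c)

def align(pattern, weights, sentence, sim, gap=-0.5):
    """Smith-Waterman-style local alignment of a weighted pattern
    against a candidate sentence; returns the best alignment score."""
    n, m = len(pattern), len(sentence)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        w = weights[i - 1]  # weight of the pattern term being consumed
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sim(pattern[i - 1], sentence[j - 1]) * w,
                M[i - 1][j] + gap * w,  # gap in the sentence
                M[i][j - 1] + gap * w,  # gap in the pattern
                0.0,                    # local-alignment floor
            )
            best = max(best, M[i][j])
    return best

exact = lambda a, b: 1.0 if a == b else 0.0  # assumed similarity function
pattern = ["<HUM_TGT>", "have", "been", "kidnapped", "by", "<PERP_IND>"]
weights = [term_weight(1, 1), term_weight(0, 2), term_weight(0, 2),
           term_weight(3, 0), term_weight(0, 1), term_weight(2, 1)]
score = align(pattern, weights,
              "peasants have been kidnapped by terrorists".split(), exact)
print(round(score, 2))
```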
Experiment

Experimental Setup
  • Data: MUC-3/4 data, about terrorism events
      Simpler template structure with 4 slots: perp_ind, perp_org, phys_tgt, hum_tgt
      Dev-set for training, Test-set for evaluation
  • Preprocessing: dependency parsing and NP-chunking with the Stanford Parser
  • Extracting pattern candidates: all pattern candidates are selected for the test, without pattern filtering, in order to compare the representational performance of the pattern models rather than any pattern filtering method
  • Pattern models compared:
      SVO model (Yangarber '00)
      Linked-Chain model (Greenwood '06)
      Subtree model (Sudo '03)
      Our model (Alignment)

Experimental Result

      Model         Precision  Recall  F-measure
      SVO           21.74      20.62   21.16
      Linked-Chain  20.04      26.55   22.84
      Subtree       23.34      32.73   27.25
      Alignment     23.35      45.62   30.89

  • Our proposed model achieved much higher recall than the other models, with similar precision.
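As a sanity check, the F-measure column is the harmonic mean of precision and recall; recomputing it from the table's rounded P/R values reproduces the reported numbers to within about 0.01:

```python
def f_measure(precision, recall):
    # Balanced F-measure: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

results = {
    "SVO": (21.74, 20.62),
    "Linked-Chain": (20.04, 26.55),
    "Subtree": (23.34, 32.73),
    "Alignment": (23.35, 45.62),
}
for model, (p, r) in results.items():
    print(f"{model}: F = {f_measure(p, r):.2f}")
```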