Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

An alignment-based approach to semi-supervised relation extraction including multiple arguments

295 views

Published on

Published in: Business, Technology
  • Be the first to comment

  • Be the first to like this

An alignment-based approach to semi-supervised relation extraction including multiple arguments

  1. 1. An alignment-based Approach to Semi-supervised Relation Extraction Including Multiple Arguments Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee, Kwangil Ko, and Zino Lee {megaup, stardust, gblee}@postech.ac.kr, {kik, zino}@alticast.com Abstract - We present an alignment-based approach to semi-supervised relation extraction task including more than two arguments. We concentrate on improving not only the precision of the extracted result, but also on the coverage of the method. Our relation extraction method is based on an alignment-based pattern matching approach which provides more flexibility of the method. In addition, we extract all relationships including two or more arguments at once in order to obtain the integrated result with high quality. We present experimental results which indicate the effectiveness of our method. Alignment-based Information Extractionv Information Extraction v Sentence Alignment for Information Extraction w Matrix Computationw Extracting the defined number of relevant w Example M i 1, j 1 sim iarguments from natural language documents the character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is 1, j 1 M i 1, j gpw Subtasks M i, j max M i , j 1 gp # of arguments subtask 0 1 named-entity recognition character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is { 1, if PTNi = RAWj 2 binary relation extraction w Alignment Matrix simi,j = or PTNi = <label> more than 2 relation/event extraction character the character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0, otherwise <ROLE> 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2w Approaches portrayed by 1 1 1 1 2 2 2 2 3 3 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 w Trace Back <ACTOR> 1 2 2 3 3 4 5 5 5 5 5 5 5 5 5 w Supervised in the 1 1 2 2 2 2 3 3 3 3 4 4 5 5 5 5 6 6 6 7 6 7 6 7 6 7 6 7 6 7 M i,j next position w Un/Semi-Supervised television series 1 1 2 2 2 2 3 3 3 3 4 4 5 5 5 5 6 6 7 7 7 7 7 8 7 8 7 8 7 8 M i,j-1 +gp [i, j-1] M i-1,j-1 + simi,j [i-1, j-1] <PROGRAM> 1 2 3 3 4 4 5 6 6 7 8 8 9 9 9 is 1 2 3 3 4 4 5 6 6 7 8 8 9 9 10 M i-1,j +gp [i-1, j] Semi-supervised Relation Extraction Including Multiple Arguments v Overall Architecture v Context Patterns Extraction v Alignment-based Verification 1) Searching the sentences containing all w Aligning between two candidate arguments arguments of each tuple in source documents Seed Data 2) Segmenting out subpart of the sentence with max{M(A, B)}× 2 n arguments similarity(A,B) = the window size w length(A) + length(B) 3) Replacing the parts of arguments in the sub- Seed Data Seed Data Seed Data Seed Data Seed Data Seed Data Seed Data w Tuple clustering based on 2 arguments k arguments n args sentence with argument labels Extracting Extracting … Extracting … Extracting Extracting … Extracting … Extracting v Relation Extraction based on sim(tuple1, tuple2) = Context Context Context Context Context Context Context Patterns Patterns Patterns Patterns Patterns Patterns Patterns Relation Relation Relation Relation Relation Relation Relation Pairwise Alignment |args| tuple2i) i=1 similarity(tuple1i, Extraction Extraction Extraction Extraction Extraction Extraction Extraction w Alignment score |arguments| Validation & max{M(PTN, RAW)} Integration Results score(PTN, RAW) = w Selecting the most probable tuple for each n arguments length(PTN) cluster Experimental Resultsv Experimental Setupw 930 Korean news documents (13,175 sents) about TV seriesw Only a tuple with 4 arguments (CHANNEL, PROGRAM, ACTOR, ROLE) is used as a seed v Comparison on the Coverage forw Each result is collected after the first iteration and evaluated manually Various Threshold Valuesv Result of the verification v Result of the integration 90 80 before after with only type of with all 70 verification verification type of binary relations intermediates 60 |tuples| P |tuples| P relations relations # of correct results (A,R) 249 36.55 79 73.42 |tuples| P |tuples| P 50 (P,R) 19 52.63 17 58.82 (P,A,R) 9 77.78 9 88.89 40 (P,A) 10 60 10 60 (C,P,R) 11 81.82 16 87.5 30 (C,P) 12 33.33 6 66.67 (C,P,A) 12 58.33 9 77.78 20 (P,A,R) 7 42.86 5 60 (C,P,A,R) 8 87.5 16 87.5 including 2 arguments (C,P,R) 18 55.56 16 81.25 10 including 3 arguments including 4 arguments (C,P,A) 8 62.5 8 75 w th = 0.85 0 1.00 0.95 0.90 0.85 0.80 0.75 0.70 (C,P,A,R) 15 60 14 85.71 w C(Channel), P(Program), A(Actor), R(Role) threshold

×