An alignment-based approach to semi-supervised relation extraction including multiple arguments

Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee, Kwangil Ko, and Zino Lee
{megaup, stardust, gblee}@postech.ac.kr, {kik, zino}@alticast.com

Abstract - We present an alignment-based approach to semi-supervised relation extraction for relations with more than two arguments. We aim to improve
both the precision of the extracted results and the coverage of the method. Our relation extraction method is based on alignment-based pattern
matching, which makes the matching more flexible. In addition, we extract all relations with two or more arguments at once and integrate them to
obtain a high-quality result. Experimental results indicate the effectiveness of our method.

Alignment-based Information Extraction
Information Extraction
- Extracting the defined number of relevant arguments from natural language documents
- Subtasks

  # of arguments   subtask
  1                named-entity recognition
  2                binary relation extraction
  more than 2      relation/event extraction

- Approaches
  - Supervised
  - Un/Semi-Supervised

Sentence Alignment for Information Extraction
- Example

  the character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is
  character Michael Scofield portrayed by Wentworth Miller in the TV series Prison Break is

- Alignment Matrix (rows: pattern tokens; columns: raw sentence tokens)

  Columns: 1 the, 2 character, 3 Michael, 4 Scofield, 5 portrayed, 6 by, 7 Wentworth, 8 Miller,
           9 in, 10 the, 11 TV, 12 series, 13 Prison, 14 Break, 15 is

               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  character    0  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  <ROLE>       1  1  2  2  2  2  2  2  2  2  2  2  2  2  2
  portrayed    1  1  2  2  3  3  3  3  3  3  3  3  3  3  3
  by           1  1  2  2  3  4  4  4  4  4  4  4  4  4  4
  <ACTOR>      1  2  2  3  3  4  5  5  5  5  5  5  5  5  5
  in           1  2  2  3  3  4  5  5  6  6  6  6  6  6  6
  the          1  2  2  3  3  4  5  5  6  7  7  7  7  7  7
  television   1  2  2  3  3  4  5  5  6  7  7  7  7  7  7
  series       1  2  2  3  3  4  5  5  6  7  7  8  8  8  8
  <PROGRAM>    1  2  3  3  4  4  5  6  6  7  8  8  9  9  9
  is           1  2  3  3  4  4  5  6  6  7  8  8  9  9 10

- Matrix Computation

  $$M_{i,j} = \max \begin{cases} M_{i-1,j-1} + sim_{i,j} \\ M_{i-1,j} + gp \\ M_{i,j-1} + gp \\ 0 \end{cases}$$

  $$sim_{i,j} = \begin{cases} 1, & \text{if } PTN_i = RAW_j \text{ or } PTN_i = \langle label \rangle \\ 0, & \text{otherwise} \end{cases}$$

- Trace Back

  M_{i,j}                      next position
  M_{i-1,j-1} + sim_{i,j}      [i-1, j-1]
  M_{i-1,j} + gp               [i-1, j]
  M_{i,j-1} + gp               [i, j-1]
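
The matrix computation and trace-back rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Several points are assumptions: gp is read as a gap penalty and set to 0 (the poster does not state its value, but 0 reproduces the matrix values shown for the example), row 0 and column 0 are initialised to 0, a label is any token of the form <...>, and ties in the trace back are broken in the order diagonal, up, left. The function names are hypothetical.

```python
from typing import List, Tuple


def is_label(token: str) -> bool:
    """True for argument labels such as <ROLE> (assumed surface form)."""
    return token.startswith("<") and token.endswith(">")


def sim(ptn_token: str, raw_token: str) -> int:
    """sim_{i,j}: 1 if the tokens are equal or the pattern token is a label."""
    return 1 if ptn_token == raw_token or is_label(ptn_token) else 0


def alignment_matrix(ptn: List[str], raw: List[str], gp: float = 0) -> List[List[float]]:
    """Fill M with the recurrence from the poster; row 0 and column 0 stay 0."""
    M = [[0] * (len(raw) + 1) for _ in range(len(ptn) + 1)]
    for i in range(1, len(ptn) + 1):
        for j in range(1, len(raw) + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sim(ptn[i - 1], raw[j - 1]),  # match or label
                M[i - 1][j] + gp,                               # skip a pattern token
                M[i][j - 1] + gp,                               # skip a raw token
                0,
            )
    return M


def trace_back(M, ptn: List[str], raw: List[str], gp: float = 0) -> List[Tuple[int, int]]:
    """Walk back from the largest cell until a zero cell or a border is reached."""
    cells = [(i, j) for i in range(len(M)) for j in range(len(M[0]))]
    i, j = max(cells, key=lambda c: M[c[0]][c[1]])
    path = []
    while i > 0 and j > 0 and M[i][j] > 0:
        path.append((i, j))
        if M[i][j] == M[i - 1][j - 1] + sim(ptn[i - 1], raw[j - 1]):
            i, j = i - 1, j - 1
        elif M[i][j] == M[i - 1][j] + gp:
            i -= 1
        else:
            j -= 1
    return path[::-1]


PTN = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
RAW = ("the character Michael Scofield portrayed by Wentworth Miller "
       "in the TV series Prison Break is").split()

M = alignment_matrix(PTN, RAW)
path = trace_back(M, PTN, RAW)
print(M[-1][-1])  # 10, the bottom-right value of the alignment matrix
```

With these assumptions, every row of M (ignoring the zero border) matches the corresponding row of the alignment matrix in the example. How the aligned cells are turned into multi-token argument values is not specified on the poster, so the sketch stops at the path.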

Semi-supervised Relation Extraction Including Multiple Arguments
Overall Architecture
[Figure: the seed data with n arguments is split into seed data for each argument subset (2 arguments, ..., k arguments, ..., n arguments). Each subset goes through Extracting Context Patterns and then Relation Extraction. The outputs are combined in Validation & Integration to give the results with n arguments.]

Context Patterns Extraction
1) Searching the sentences containing all arguments of each tuple in source documents
2) Segmenting out the subpart of the sentence with the window size w
3) Replacing the parts of arguments in the sub-sentence with argument labels

Relation Extraction based on Pairwise Alignment
- Alignment score

  $$score(PTN, RAW) = \frac{\max\{M(PTN, RAW)\}}{length(PTN)}$$

Alignment-based Verification
- Aligning between two candidate arguments

  $$similarity(A, B) = \frac{\max\{M(A, B)\} \times 2}{length(A) + length(B)}$$

- Tuple clustering based on

  $$sim(tuple1, tuple2) = \frac{\sum_{i=1}^{|arguments|} similarity(tuple1_i, tuple2_i)}{|arguments|}$$

- Selecting the most probable tuple for each cluster
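
The three context pattern extraction steps listed in this section can be sketched as below. This is only an illustration under stated assumptions: sentences are tokenised on whitespace, the window size w is read as the number of context tokens kept before the first argument and after the last one, and arguments do not overlap. The names `find_span` and `extract_pattern` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_span(tokens: List[str], arg: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of arg in tokens, or None."""
    n = len(arg)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == arg:
            return start, start + n
    return None


def extract_pattern(sentence: str, seed: Dict[str, str], w: int = 1) -> Optional[List[str]]:
    tokens = sentence.split()

    # Step 1: keep only sentences that contain every argument of the seed tuple.
    spans = {}
    for label, value in seed.items():
        span = find_span(tokens, value.split())
        if span is None:
            return None
        spans[label] = span

    # Step 2: cut out the sub-sentence, keeping w context tokens on each side
    # (this reading of "window size w" is an assumption).
    lo = max(0, min(start for start, _ in spans.values()) - w)
    hi = min(len(tokens), max(end for _, end in spans.values()) + w)

    # Step 3: replace each argument with its label.
    starts = {start: (label, end) for label, (start, end) in spans.items()}
    pattern, i = [], lo
    while i < hi:
        if i in starts:
            label, end = starts[i]
            pattern.append(f"<{label}>")
            i = end
        else:
            pattern.append(tokens[i])
            i += 1
    return pattern


seed = {"ROLE": "Michael Scofield", "ACTOR": "Wentworth Miller", "PROGRAM": "Prison Break"}
sentence = ("the character Michael Scofield portrayed by Wentworth Miller "
            "in the TV series Prison Break is popular")
print(" ".join(extract_pattern(sentence, seed, w=1)))
```

The seed values and the sentence reuse the example from the alignment section; the trailing word "popular" is added only to show the effect of the window.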

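The alignment score, the argument similarity, and the tuple similarity defined in this section can be computed as in the sketch below. The assumptions are: gp = 0 (not stated on the poster), arguments are aligned token by token, and clustering is a simple greedy pass that adds a tuple to the first cluster whose first member is similar enough. The poster does not describe the clustering procedure or how the most probable tuple is chosen, so `cluster` is a stand-in and all function names are hypothetical.

```python
from typing import List, Sequence


def max_alignment(a: Sequence[str], b: Sequence[str], gp: float = 0.0) -> float:
    """max{M(a, b)}: the largest cell of the alignment matrix of a against b."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            # Labels such as <ROLE> match any token (assumed surface form).
            s = 1.0 if x == y or (x.startswith("<") and x.endswith(">")) else 0.0
            cur.append(max(prev[j - 1] + s, prev[j] + gp, cur[j - 1] + gp, 0.0))
        best = max(best, max(cur))
        prev = cur
    return best


def score(ptn: Sequence[str], raw: Sequence[str]) -> float:
    """score(PTN, RAW) = max{M(PTN, RAW)} / length(PTN)."""
    return max_alignment(ptn, raw) / len(ptn)


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """similarity(A, B) = 2 * max{M(A, B)} / (length(A) + length(B))."""
    return 2 * max_alignment(a, b) / (len(a) + len(b))


def tuple_sim(t1, t2) -> float:
    """Mean argument similarity over aligned argument positions."""
    return sum(similarity(x, y) for x, y in zip(t1, t2)) / len(t1)


def cluster(tuples: List, th: float) -> List[List]:
    """Greedy threshold clustering (an assumed stand-in, not from the poster)."""
    clusters: List[List] = []
    for t in tuples:
        for c in clusters:
            if tuple_sim(c[0], t) >= th:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


ptn = "character <ROLE> portrayed by <ACTOR> in the television series <PROGRAM> is".split()
raw = ("the character Michael Scofield portrayed by Wentworth Miller "
       "in the TV series Prison Break is").split()
print(round(score(ptn, raw), 3))  # 0.909, i.e. 10 / 11

t1 = (["Prison", "Break"], ["Wentworth", "Miller"], ["Michael", "Scofield"])
t2 = (["Prison", "Break"], ["Miller"], ["Scofield"])
t3 = (["Lost"], ["Matthew", "Fox"], ["Jack"])
print(len(cluster([t1, t2, t3], th=0.7)))  # 2
```

For the example pattern and sentence, 10 of the 11 pattern tokens align, so the score is 10/11. The tuples t1, t2, and t3 are made-up inputs used only to exercise the functions.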
Experimental Results
Experimental Setup
- 930 Korean news documents (13,175 sentences) about TV series
- Only a tuple with 4 arguments (CHANNEL, PROGRAM, ACTOR, ROLE) is used as a seed
- Each result is collected after the first iteration and evaluated manually

Comparison on the Coverage for Various Threshold Values
[Figure: number of correct results (y-axis, 0 to 90) against threshold (x-axis, 1.00 down to 0.70 in steps of 0.05), with one series each for results including 2 arguments, 3 arguments, and 4 arguments.]

Result of the verification

  type of      before verification    after verification
  relations    |tuples|   P           |tuples|   P
  (A,R)        249        36.55       79         73.42
  (P,R)        19         52.63       17         58.82
  (P,A)        10         60          10         60
  (C,P)        12         33.33       6          66.67
  (P,A,R)      7          42.86       5          60
  (C,P,R)      18         55.56       16         81.25
  (C,P,A)      8          62.5        8          75
  (C,P,A,R)    15         60          14         85.71

Result of the integration

  type of      with only binary relations    with all intermediates
  relations    |tuples|   P                  |tuples|   P
  (P,A,R)      9          77.78              9          88.89
  (C,P,R)      11         81.82              16         87.5
  (C,P,A)      12         58.33              9          77.78
  (C,P,A,R)    8          87.5               16         87.5

- th = 0.85
- C(Channel), P(Program), A(Actor), R(Role)
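
As a reading aid for the verification table, the sketch below converts each |tuples| and P pair into an estimated number of correct tuples. It assumes that the column P is precision in percent, that is, correct tuples divided by |tuples|; the poster does not define the column. The resulting counts are derived estimates, not figures reported by the authors.

```python
# (relation type, |tuples| before, P before, |tuples| after, P after),
# copied from the verification table.
ROWS = [
    ("(A,R)", 249, 36.55, 79, 73.42),
    ("(P,R)", 19, 52.63, 17, 58.82),
    ("(P,A)", 10, 60.0, 10, 60.0),
    ("(C,P)", 12, 33.33, 6, 66.67),
    ("(P,A,R)", 7, 42.86, 5, 60.0),
    ("(C,P,R)", 18, 55.56, 16, 81.25),
    ("(C,P,A)", 8, 62.5, 8, 75.0),
    ("(C,P,A,R)", 15, 60.0, 14, 85.71),
]


def estimated_correct(n_tuples: int, precision_pct: float) -> int:
    """Estimated correct tuples, assuming P = 100 * correct / |tuples|."""
    return round(n_tuples * precision_pct / 100)


for rel, n_before, p_before, n_after, p_after in ROWS:
    before = estimated_correct(n_before, p_before)
    after = estimated_correct(n_after, p_after)
    print(f"{rel:10s} before: {before:3d} of {n_before:3d}   after: {after:3d} of {n_after:3d}")
```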
