Learning to Extract Relations from the Web
using Minimal Supervision
Razvan C. Bunescu
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
razvan@cs.utexas.edu
Raymond J. Mooney
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
mooney@cs.utexas.edu
Introduction: Relation Extraction
• People are often interested in finding relations between
entities:
– What proteins interact with IRAK1?
– Which companies were acquired by Google?
– In which city was Mozart born?
• Relation Extraction (RE) is the task of automatically
locating predefined types of relations in text documents.
1
Introduction: Relation Extraction
• Relation Examples:
1) Protein Interactions:
– The phosphorylation of Pellino2 by activated IRAK1 could trigger
the translocation of IRAKs from complex I to II.
2) Company Acquisitions:
– Search engine giant Google has bought video-sharing website
YouTube in a controversial $1.6 billion deal.
3) People Birthplaces:
– Wolfgang Amadeus Mozart was born to Leopold and Ana Maria
Mozart, in the front room of Getreidegasse 9 in Salzburg.
2
Motivation: Minimal Supervision
• Developing an RE system usually requires a significant
amount of human effort:
– Extraction patterns designed by a human expert [Blaschke et al.,
2002].
– Extraction patterns learned from a corpus of manually annotated
examples [Zelenko et al., 2003; Culotta and Sorensen, 2004].
• A different RE approach:
– Extraction patterns learned from weak supervision derived from a
significantly reduced amount of human supervision.
3
Relation Extraction with Minimal Supervision
• Human supervision = a handful of pairs of entities known
to exhibit (+) or not exhibit (–) a particular relation.
• Weak supervision = bags of sentences containing the
pairs, automatically extracted from a very large corpus.
• Use bags of sentences in a Multiple Instance Learning
framework [Dietterich et al., 1997] to train a relation
extraction model.
4
Types of Supervision for RE
• Single Instance Learning (SIL):
– A corpus of positive and negative sentence examples, with the two
entity names annotated.
– A sentence example is positive iff it explicitly asserts the target
relationship between the two annotated entities.
• Multiple Instance Learning (MIL):
– A corpus of positive and negative bags of sentences.
– A bag is positive iff it contains at least one positive sentence
example.
5
RE from Web with Minimal Supervision
+/–  Argument a1      Argument a2
 +   Google           YouTube
 +   Adobe Systems    Macromedia
 +   Viacom           DreamWorks
 +   Novartis         Eon Labs
 –   Yahoo            Microsoft
 –   Pfizer           Teva
Example pairs of named entities for R = Corporate Acquisitions.
6
Minimal Supervision: Positive bags
Use a search engine to extract bags of sentences containing
both entities in a pair.
Google, YouTube
S1
Search engine giant Google has bought video-sharing website YouTube in a
controversial $1.6 billion deal.
S2
The companies will merge Google's search expertise with YouTube's video
expertise, pushing what executives believe is a hot emerging market of video
offered over the Internet.
. .
. .
. .
Sn
Google has acquired social media company YouTube for $1.65 billion in a
stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
7
Minimal Supervision: Negative Bags
Use a search engine to extract bags of sentences containing
both entities in a pair.
Yahoo, Microsoft
S1
Yahoo is starting to look more like Microsoft and less like the innovative,
unified service that got my loyalty in the first place.
S2
Whatever it is, Yahoo is dashing in front, with Microsoft close behind.
. .
. .
. .
Sn
Yahoo and Microsoft teamed up on October 12 to make their instant
messaging software compatible.
10
MIL Background: Domains
• Originally introduced to solve a Drug Activity prediction
problem in biochemistry [Dietterich et al., 1997]
– Each molecule has a limited set of low-energy conformations ⇒
bags of 3D conformations.
– A bag is positive if at least one of the conformations binds to a
predefined target.
– MUSK dataset [Dietterich et al., 1997]
• A bag is positive if the molecule smells “musky”.
• Content Based Image Retrieval [Zhang et al., 2002]
• Text categorization [Andrews et al., 03], [Ray et al., 05].
12
MIL Background: Algorithms
• Axis Parallel Rectangles [Dietterich, 1997]
• Diverse Density [Maron, 1998]
• Multiple Instance Logistic Regression [Ray & Craven, 05]
• Multi-Instance SVM kernels of [Gartner et al., 2002]
– Normalized Set Kernel.
– Statistic Kernel.
13
MIL for Relation Extraction
• Focus on SVM approaches
– Through kernels, they can work efficiently with instances that implicitly
belong to a high-dimensional feature space.
– They can reuse existing relation extraction kernels.
• The multi-instance kernels of [Gartner et al., 2002] are not appropriate
when there are very few bags:
– Bags (not instances) are treated as training examples.
– The number of support vectors (SVs) is upper bounded by the number of bags.
– Very few bags ⇒ very few SVs ⇒ insufficient capacity.
14
MIL for Relation Extraction
• A simple approach to MIL is to transform it into a standard supervised
learning problem:
– Apply the bag label to all instances inside the bag.
– Train a standard supervised algorithm on the transformed dataset.
– Despite the class noise, this obtains competitive results [Ray & Craven, 05]
(a sketch of the transformation follows below).
Google, YouTube
S1 Search engine giant Google has bought video-sharing website YouTube in a controversial
$1.6 billion deal.
S2 The companies will merge Google's search expertise with YouTube's video expertise, pushing
what executives believe is a hot emerging market of video offered over the Internet.
. .
. .
. .
Sn Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock
transaction as announced by Google Inc. on October 9, 2006.
15
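The following is a minimal sketch of the MIL-to-SIL transformation described above, not the authors' code; the bag structure and entity pairs are illustrative. Each sentence simply inherits the label of the bag it came from.

```python
# Sketch of the MIL -> SIL transformation: every sentence inherits its bag's label.
# The bag structure and entity pairs are illustrative, not the original dataset.

def bags_to_instances(bags):
    """Flatten labeled bags of sentences into (sentence, label) training instances."""
    instances = []
    for pair, label, sentences in bags:
        for sentence in sentences:
            instances.append((sentence, label))  # bag label applied to each instance
    return instances

bags = [
    (("Google", "YouTube"), +1,
     ["Search engine giant Google has bought video-sharing website YouTube ...",
      "The companies will merge Google's search expertise with YouTube's video expertise ..."]),
    (("Yahoo", "Microsoft"), -1,
     ["Yahoo and Microsoft teamed up on October 12 to make their IM software compatible."]),
]

train = bags_to_instances(bags)
# -> three (sentence, label) pairs; any standard supervised learner can be trained on them.
```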
SVM Framework with MIL Supervision

minimize:
  J(w, b, ξ) = ½‖w‖² + (C / L) · ( cp · Ln · Σ_{X∈Xp} Σ_{x∈X} ξx  +  cn · Lp · Σ_{X∈Xn} Σ_{x∈X} ξx )

subject to:
  w · Φ(x) + b ≥ +1 − ξx,  for all x ∈ X, X ∈ Xp
  w · Φ(x) + b ≤ −1 + ξx,  for all x ∈ X, X ∈ Xn
  ξx ≥ 0

• ½‖w‖² is the regularization term; the first sum of slacks is the error on positive bags,
the second sum is the error on negative bags.
• Xp / Xn are the sets of positive / negative bags, Lp and Ln are the total numbers of
sentences in the positive and negative bags, and L is the total number of sentences.
SVM Framework with MIL Supervision
• cp, cn > 0 with cp + cn = 1 control the relative influence that false negatives
vs. false positives have on the objective function.
• We want cp < 0.5 (penalize false negatives less than false positives);
cp = 0.1 was used (see the sketch below).
21
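A hedged sketch of how this objective can be approximated with an off-the-shelf SVM: train on all bag instances with a precomputed kernel and per-instance weights that encode cp/cn and a bag-size normalization. The placeholder kernel and the exact weighting scheme are illustrative assumptions; scikit-learn's SVC stands in for the authors' SVM implementation.

```python
# Sketch: approximate the MIL objective with per-instance weights in a standard SVM.
# K would be the precomputed subsequence-kernel matrix between all bag instances;
# a trivial placeholder kernel is used here so the snippet runs.
import numpy as np
from sklearn.svm import SVC

c_p, c_n = 0.1, 0.9                      # penalize false negatives less than false positives

# toy data: 3 instances from positive bags, 2 from negative bags
y = np.array([+1, +1, +1, -1, -1])
X = np.random.RandomState(0).rand(5, 4)  # placeholder features; SSK would replace this
K = X @ X.T                              # placeholder precomputed kernel

L_p = np.sum(y == +1)                    # total instances in positive bags
L_n = np.sum(y == -1)                    # total instances in negative bags
w = np.where(y == +1, c_p / L_p, c_n / L_n)   # per-instance weights (one possible scheme)

clf = SVC(kernel="precomputed", C=10.0)
clf.fit(K, y, sample_weight=w)
scores = clf.decision_function(K)        # extraction confidence per instance
```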
SVM Framework with MIL Supervision
• Dual formulation ⇒ only a kernel between bag instances is needed: K(x1, x2) = Φ(x1) · Φ(x2).
• Use SSK, a subsequence kernel customized for relation extraction [Bunescu & Mooney, 2005].
22
The Subsequence Kernel for Relation
Extraction
• Implicit features are sequences of words anchored at the
two entity names.
• s = a word sequence, e.g.:  e1 … bought … e2 … billion … deal
• x = an example sentence containing s as a subsequence [Bunescu & Mooney, 2005]:
  Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
  gaps: g1 = 1, g2 = 3, g3 = 4, g4 = 0
• φs(x) = the value of feature s in example x:
  φs(x) = λ^gap(s,x),  where gap(s,x) = Σi gi = 1 + 3 + 4 + 0 = 8
  (a small computational sketch follows below)
23
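A minimal sketch of the feature value defined above, using a greedy left-to-right matching simplification of subsequence matching; the whitespace tokenization and the λ value are illustrative assumptions, not the authors' implementation.

```python
# Sketch: value of an anchored subsequence feature, phi_s(x) = lambda ** gap(s, x).
# Greedy left-to-right matching is a simplification of full subsequence matching.

def feature_value(s, x_tokens, lam=0.75):
    """Return lam ** (total number of gap words) for the first match of s in x, or 0.0."""
    gap, pos = 0, 0
    for word in s:
        try:
            nxt = x_tokens.index(word, pos)
        except ValueError:
            return 0.0                    # s is not a subsequence of x
        gap += nxt - pos                  # words skipped before this match
        pos = nxt + 1
    return lam ** gap

x = ("e1 has bought video-sharing website e2 in a controversial "
     "$1.6 billion deal .").split()
s = ["e1", "bought", "e2", "billion", "deal"]
print(feature_value(s, x))               # lam ** total_gap (lam ** 7 with this tokenization)
```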
The Subsequence Kernel for Relation Extraction
• K(x1, x2) = Φ(x1) · Φ(x2) = the number of common “anchored”
subsequences between x1 and x2, weighted by their total gap.
• Many relations require at least one content word ⇒ modify the
kernel to optionally ignore sequences formed exclusively of
stop words and punctuation signs.
• Kernel is computed efficiently by a generalized version of
the dynamic programming procedure from [Lodhi et al., 2002].
[Bunescu & Mooney, 2005].
24
Two Types of Bias
• The MIL approach to RE differs from other MIL problems
in two respects:
– The training dataset contains very few bags.
– The bags can be very large.
• These properties lead to two types of bias:
– [Type I] Combinations of words that are correlated to the two
relation arguments are given too much weight in the learned
model.
– [Type II] Words specific to a particular relation instance are given
too much weight.
25
Type I Bias
Google, YouTube
S1 Search engine giant Google has bought video-sharing website YouTube
in a controversial $1.6 billion deal.
S2 The companies will merge Google's search expertise with YouTube's
video expertise, pushing what executives believe is a hot emerging
market of video offered over the Internet.
• Overweighted Patterns:
– search … e1 … video … e2
– … e1 … video … e2
– e1 … search … e2
– e1 … search … e2 … video
26
Type II Bias
Google, YouTube
S1
Ever since Google paid $1.65 billion for YouTube in October, plenty of
pundits – from Mark Cuban to yours truly – have been waiting for the other
shoe to drop.
S2
Google Gobbles Up YouTube for $1.6 BILLION – October 9, 2006
S3
Google has acquired social media company YouTube for $1.65 billion in a
stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
• Overweighted Patterns:
– … e1 … for … e2 … October
– … e1 … has … e2 … October
27
A Solution for Type I Bias
• Use the SSK approach, with a new feature weight:
  original:  φs(x) = λ^gap(s,x)
  new:       φs(x) = λ^gap(s,x) · Π_{w∈s} ν(w)
• Modify the subsequence kernel computations to use the word
weights ν(w).
• Want small ν(w) for words w correlated with either of the
two relation arguments.
28
A Solution for Type I Bias: Word Weights
Use a formula for word weights ν(w) that discounts the effect of correlations
of w with either of the two arguments a1 and a2:

  ν(w) = [ C(X, w) − P(w | X.a1 ∨ X.a2) · C(X) ] / C(X, w)

where:
• C(X) = the number of sentences in bag X.
• C(X, w) = the number of sentences in bag X that contain the word w.
• P(w | X.a1 ∨ X.a2) = the probability that the word w appears in a sentence due
only to the presence of X.a1 or X.a2, assuming X.a1 and X.a2 are independent
causes for w:
  P(w | X.a1 ∨ X.a2) = 1 − (1 − P(w | X.a1)) · (1 − P(w | X.a2))
                     = P(w | X.a1) + P(w | X.a2) − P(w | X.a1) · P(w | X.a2)
• P(w | a) is the probability that w appears in a sentence due to the presence of a;
it is estimated using counts from a separate bag of sentences containing a.
(A sketch of this computation follows below.)
32
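A sketch of the word-weight computation above. `bag` is the set of sentences retrieved for the pair (a1, a2), and `aux1`/`aux2` are the separate bags used to estimate P(w|a1) and P(w|a2); the clipping at zero and the default weight for unseen words are assumptions of this sketch, and all data is illustrative.

```python
# Sketch: word weight that discounts correlation with the two relation arguments.
def p_word_given_arg(word, aux_bag):
    """Estimate P(w|a) as the fraction of sentences mentioning a that contain w."""
    return sum(word in s for s in aux_bag) / max(len(aux_bag), 1)

def word_weight(word, bag, aux1, aux2):
    c_x = len(bag)                                   # sentences in bag X
    c_xw = sum(word in s for s in bag)               # sentences in X containing w
    if c_xw == 0:
        return 1.0                                   # unseen words keep full weight (assumption)
    p1 = p_word_given_arg(word, aux1)
    p2 = p_word_given_arg(word, aux2)
    p_or = p1 + p2 - p1 * p2                         # noisy-or: P(w | X.a1 or X.a2)
    return max(0.0, (c_xw - p_or * c_x) / c_xw)      # clipped at 0 (assumption)

bag  = [{"google", "video", "bought", "youtube"}, {"google", "search", "youtube", "video"}]
aux1 = [{"google", "search"}, {"google", "video", "search"}]   # sentences containing a1 only
aux2 = [{"youtube", "video"}, {"youtube", "video", "clips"}]   # sentences containing a2 only
print(word_weight("video", bag, aux1, aux2))   # heavily discounted: correlated with the args
print(word_weight("bought", bag, aux1, aux2))  # retains full weight
```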
MIL Relation Extraction Datasets
• Given two arguments a1 and a2, submit query string
“a1 * * * * * * * a2” to Google.
• Download the resulting documents (less than 1000).
• Split text into sentences and tokenize using the OpenNLP
package.
• Keep only sentences containing both a1 and a2.
• Replace the closest occurrences of a1 and a2 with the generic tags
e1 and e2 (a sketch of this pipeline follows below).
33
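A rough sketch of the bag-construction step under stated assumptions: the search and download steps are elided, NLTK's sentence tokenizer stands in for the OpenNLP package used by the authors, and the closest-occurrence substitution is simplified to the first occurrence of each argument.

```python
# Sketch: build a bag of sentences for an argument pair from downloaded page text.
# nltk stands in for OpenNLP (requires the NLTK 'punkt' tokenizer models).
import re
import nltk

def build_bag(page_texts, a1, a2):
    """Return sentences containing both a1 and a2, with the arguments replaced by e1/e2."""
    bag = []
    for text in page_texts:
        for sent in nltk.sent_tokenize(text):
            if a1 in sent and a2 in sent:
                sent = re.sub(re.escape(a1), "e1", sent, count=1)  # closest-occurrence step simplified
                sent = re.sub(re.escape(a2), "e2", sent, count=1)
                bag.append(sent)
    return bag

pages = ["Search engine giant Google has bought video-sharing website YouTube "
         "in a controversial $1.6 billion deal. Analysts were not surprised."]
print(build_bag(pages, "Google", "YouTube"))
```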
MIL Relation Extraction Datasets
Corporate Acquisitions Dataset

Training pairs (all bag sentences):
+/–  Argument a1      Argument a2         Bag size
 +   Google           YouTube             1375
 +   Adobe Systems    Macromedia          622
 +   Viacom           DreamWorks          323
 +   Novartis         Eon Labs            311
 –   Yahoo            Microsoft           163
 –   Pfizer           Teva                247

Testing pairs (manually labeled):
 +   Pfizer           Rinat Neuroscience  50 (41)
 +   Yahoo            Inktomi             433 (115)
 –   Google           Apple               281
 –   Viacom           NBC                 231
34
MIL Relation Extraction Datasets
Person–Birthplace Dataset

Training pairs (all bag sentences):
+/–  Argument a1       Argument a2   Bag size
 +   Franz Kafka       Prague        522
 +   Andre Agassi      Las Vegas     386
 +   Charlie Chaplin   London        292
 +   George Gershwin   New York      260
 –   Luc Besson        New York      74
 –   W. A. Mozart      Vienna        288

Testing pairs (manually labeled):
 +   Luc Besson        Paris         126 (6)
 +   Marie Antoinette  Vienna        39 (10)
 –   Charlie Chaplin   Hollywood     266
 –   George Gershwin   London        104
35
Experimental Results: Systems
• [SSK-MIL] MIL formulation using the original SSK.
• [SSK-T1] MIL formulation with the SSK modified to use
word weights in order to reduce Type I bias.
• [BW-MIL] MIL formulation using a bag-of-words kernel.
• [SSK-SIL] SIL formulation using the original subsequence
kernel:
– Use manually labeled instances from the test bags.
– Train on instances from one positive bag and one negative bag, test
on instances from the other two bags.
– Average results over all four combinations.
36
Experimental Results: Evaluation
1) Plot Precision vs. Recall (PR) graphs:
– vary a threshold on the extraction confidence.
2) Report the Area Under the PR Curve (AUC).
(A sketch of this computation follows below.)
37
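A small sketch of the evaluation, assuming per-instance extraction confidences `scores` and gold labels `y_true` (both illustrative); scikit-learn is used for the PR curve and the area under it.

```python
# Sketch: precision-recall curve by sweeping a threshold on extraction confidence,
# then area under the PR curve (AUC).
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([1, 0, 1, 1, 0, 0, 1])                 # gold labels for test sentences
scores = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.5, 0.8])   # extraction confidences

precision, recall, thresholds = precision_recall_curve(y_true, scores)
print("AUC(PR) = %.3f" % auc(recall, precision))
```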
[Figure: Precision–Recall curves for Company Acquisitions]
38
[Figure: Precision–Recall curves for Person–Birthplace]
39
Experimental Results: AUC
• SSK-T1 is significantly more accurate than SSK-MIL.
• SSK-T1 is competitive with SSK-SIL, however:
– SSK-T1 supervision: only 6 pairs (4 positive).
– SSK-SIL average supervision:
• ~500 manually labeled sentences (78 positive) for Acquisitions.
• ~300 manually labeled sentences (22 positive) for Birthplaces.
Dataset SSK-MIL SSK-T1 BW-MIL SSK-SIL
Company Acquisitions 76.9% 81.1% 45.8% 80.4%
People Birthplace 72.5% 78.2% 69.2% 73.4%
40
Applications & Extensions
• A “Google Sets” system for relation extraction
– Ideally, the user provides only positive pairs.
– Likely negative examples are created by pairing the argument
entity with other named entities in the same sentence.
– Any pair of entities different from the relation pair is likely to be
negative ⇒ implicit negative evidence (see the sketch below).
[Diagram: an input list of seed pairs is extended to an output list of acquisition pairs]
Google           YouTube
Adobe Systems    Macromedia
Viacom           DreamWorks
Novartis         Eon Labs
Pfizer           Rinat Neuroscience
Yahoo            Inktomi
…
41
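A sketch of the implicit-negative idea: given the entity mentions found in a sentence (by an NER step that is not shown) and the seed relation pair, every other pair of mentions is taken as a likely negative example. The entity list is illustrative.

```python
# Sketch: derive implicit negative pairs from the other entities mentioned in a sentence.
from itertools import combinations

def implicit_negatives(entities_in_sentence, positive_pair):
    """All unordered entity pairs in the sentence except the known positive pair."""
    pos = frozenset(positive_pair)
    return [pair for pair in combinations(sorted(entities_in_sentence), 2)
            if frozenset(pair) != pos]

entities = {"Google", "YouTube", "Mark Cuban"}        # output of an NER step (assumed)
print(implicit_negatives(entities, ("Google", "YouTube")))
# -> [('Google', 'Mark Cuban'), ('Mark Cuban', 'YouTube')]
```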
Future Work
• Investigate methods for reducing Type II bias.
• Experiment with other, more sophisticated MIL algorithms.
• Explore the effect of Type I and Type II bias when using
dependency information in the relation extraction kernel.
42
Conclusion
• Presented a new approach to Relation Extraction, trained
using only a handful of pairs of entities known to exhibit or
not exhibit the target relationship.
• Extended an existing subsequence kernel to resolve
problems caused by the minimal supervision provided.
• The new MIL approach is competitive with its SIL
counterpart that uses significantly more human supervision.
43