PATTY: A Taxonomy of Relational Patterns with Semantic Types
1. PATTY: A Taxonomy of Relational Patterns with Semantic Types
Authors:
Ndapandula Nakashole, Gerhard Weikum, Fabian Suchanek
(Max Planck Institute for Informatics)
Expositor:
Akihiro Kameda
(Aizawa Lab. in The University of Tokyo)
3. SOL Pattern and synset
● Example: Syntactic + Ontological + Lexical
● <person>'s [adj] voice * <song>
● “Amy Winehouse's soft voice in 'Rehab'”
● Type signature: <person> × <song>
● Support set: {(Amy, Rehab), (Elvis, AllshookUp)}
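● Synset: a set of synonymous patterns
● B is syntactically more general than A: {strings matching A} ⊆ {strings matching B}
● B is semantically more general than A: support set of A ⊆ support set of B (A ⊆sem B)
● A and B are synonymous: A ⊆sem B ∧ B ⊆sem A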
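The set-based definitions above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `support` dictionary is toy data, and the pattern "<person> sang <song>" is a hypothetical pattern added for the example, not one taken from the slides.

```python
# Toy data (hypothetical): each pattern maps to its support set of entity pairs.
support = {
    "<person>'s [adj] voice * <song>": {("Amy", "Rehab"), ("Elvis", "AllshookUp")},
    "<person>'s soft voice in <song>": {("Amy", "Rehab")},
    "<person> sang <song>": {("Amy", "Rehab"), ("Elvis", "AllshookUp")},  # invented pattern
}

def semantically_subsumes(general, specific):
    """True if every entity pair supporting `specific` also supports `general`."""
    return support[specific] <= support[general]

def synonymous(p, q):
    """Two patterns are synonymous if each semantically subsumes the other."""
    return semantically_subsumes(p, q) and semantically_subsumes(q, p)
```

With this toy data, the [adj] pattern subsumes the "soft voice" pattern, and it is synonymous with the invented "sang" pattern because the two support sets are equal.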
4. Mining algorithm
● Pattern extraction and generalization
● Lexical + Ontological → +Syntactic
● Taxonomy Construction
● Find subsumption relationship
● Integrate them into DAG (directed acyclic graph)
5. Pattern Extraction
● Prepare a dictionary of entity surface names and their semantic types
– YAGO2, Freebase
● Disambiguation
– Context-similarity prior proposed by Suchanek 2009
● Extract the dependency path that connects two named entities (NE)
– Stanford Parser
● “Winehouse effortlessly performed her song Rehab”
→ “Amy Winehouse effortlessly performed Rehab (song)”
6. Syntactic Pattern Generalization
● Replace words with POS tags, wildcards, or types
● Amy Winehouse's soft voice in 'Rehab'
● <person>'s soft voice in <song>
● <person>'s [adj] voice * <song>
● Generate all possible generalizations first.
● Reject a generalization if it subsumes multiple patterns with disjoint support sets.
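A minimal sketch of the two steps above, under stated assumptions: the token list, the POS tag names, and the choice of offering exactly three options per word (keep it, replace it with its POS tag, replace it with a wildcard) are hypothetical, and "disjoint support sets" is read here as "any two subsumed patterns share no entity pair".

```python
from itertools import combinations, product

def generalizations(tokens):
    """tokens: list of (word, pos); pos is None for type placeholders such as <person>.

    Returns every pattern obtained by keeping each word, replacing it with
    its POS tag, or replacing it with the wildcard "*".
    """
    options = []
    for word, pos in tokens:
        if pos is None:
            options.append([word])  # semantic types are kept as they are
        else:
            options.append([word, f"[{pos}]", "*"])
    return set(product(*options))

def keep_generalization(subsumed_supports):
    """Reject (return False) if any two subsumed patterns have disjoint support sets."""
    return not any(a.isdisjoint(b) for a, b in combinations(subsumed_supports, 2))

# Hypothetical tagging of "<person>'s soft voice in <song>".
tokens = [("<person>", None), ("'s", "poss"), ("soft", "adj"),
          ("voice", "noun"), ("in", "prep"), ("<song>", None)]
candidates = generalizations(tokens)
```

The pattern from the slide, <person>'s [adj] voice * <song>, is among the candidates as the tuple ("<person>", "'s", "[adj]", "voice", "*", "<song>").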
7. Taxonomy Construction
● Compare the support sets of every pair of patterns?
● Too slow. → Use the prefix-tree method (Han 2005)
● Entity pairs ordered by frequency (descending)
● Tree size <= total number of entity pairs in all support sets
● Tree depth <= |largest support set|
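The prefix-tree idea can be sketched as follows. This is an illustration, not the authors' implementation: the `Node` class, the helper names, and the toy support sets are assumptions. Each pattern's entity pairs are sorted by descending global frequency and inserted as one path, so patterns that share frequent pairs share a prefix.

```python
from collections import Counter

class Node:
    def __init__(self, pair=None):
        self.pair = pair
        self.patterns = set()  # patterns whose sorted support set passes through this node
        self.children = {}

def build_prefix_tree(support):
    """support: dict mapping pattern name -> set of entity pairs."""
    freq = Counter(pair for pairs in support.values() for pair in pairs)
    root = Node()
    for pattern, pairs in support.items():
        node = root
        # Descending frequency, ties broken by the pair itself for determinism.
        for pair in sorted(pairs, key=lambda p: (-freq[p], p)):
            node = node.children.setdefault(pair, Node(pair))
            node.patterns.add(pattern)
    return root

def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.children.values())

def depth(node):
    return max((1 + depth(child) for child in node.children.values()), default=0)

# Toy support sets (hypothetical).
support = {
    "p1": {("A", "x"), ("B", "y")},
    "p2": {("A", "x"), ("B", "y"), ("C", "z")},
    "p3": {("A", "x"), ("D", "w")},
}
root = build_prefix_tree(support)
```

In this toy example the three patterns share the node for ("A", "x"), so the tree stays within both bounds listed on the slide.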
8. Taxonomy Construction
● Traverse the tree bottom-up
● Find subsumptions by detecting set inclusion between support sets
● Soft inclusion: p3 is nearly included in p4
9. Wilson estimator
[Figure: diagrams of sets S and B]
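● Naively, deg(S ⊆ B) = |S∩B| / |S|
● But the size of S should also be taken into account
● Regard S as a random sample from a larger set S'
● Wilson confidence interval [c-d, c+d]: for small |S|, c ≒ 0.5 and d ≒ 0.5; for large |S|, c ≒ |S∩B|/|S| and d ≒ 0
● deg(S ⊆ B) = c-d (the lower bound of the interval)
● λ = z_{α/2} = 1.96 (95% confidence)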
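The score above can be computed with the standard Wilson score interval. The sketch below assumes that standard formula; the function names are hypothetical, and whether PATTY uses exactly this variant is not stated on the slide.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Return (c, d), the center and half-width of the Wilson score interval.

    hits = |S ∩ B|, n = |S|, z = z_{alpha/2} (1.96 for 95% confidence).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center, half_width

def inclusion_degree(hits, n, z=1.96):
    """deg(S ⊆ B) = c - d, the lower bound of the interval."""
    center, half_width = wilson_interval(hits, n, z)
    return center - half_width
```

With the same naive ratio |S∩B|/|S| = 1, a support set of size 1 scores much lower than one of size 100, which is the effect the slide asks for.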
10. DAG Construction
● Remove as few edges as possible to eliminate cycles
● … which is NP-hard.
● Greedy algorithm
● Add edges in descending order of Wilson score
● If a path for the relation already exists, or the edge would create a cycle, do not add it.
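The greedy procedure can be sketched as below. The edge representation (score, specific pattern, general pattern) and the function names are assumptions made for the example; reachability is checked with a plain depth-first search.

```python
def reachable(graph, src, dst):
    """True if dst can be reached from src by following edges in graph."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

def build_dag(scored_edges):
    """scored_edges: iterable of (wilson_score, specific, general) tuples.

    Returns a dict mapping each specific pattern to the set of more general
    patterns it points to.
    """
    graph = {}
    for _score, specific, general in sorted(scored_edges, reverse=True):
        if specific == general:
            continue  # ignore self-loops
        if reachable(graph, specific, general):
            continue  # the relation is already implied by an existing path
        if reachable(graph, general, specific):
            continue  # adding this edge would create a cycle
        graph.setdefault(specific, set()).add(general)
    return graph

# Toy edges (hypothetical scores).
edges = [(0.9, "a", "b"), (0.8, "b", "c"), (0.7, "a", "c"), (0.6, "c", "a")]
dag = build_dag(edges)
```

In the toy input, the edge a → c is skipped because the path a → b → c already exists, and c → a is skipped because it would close a cycle.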
11. Mined Result (5 experiments)
● 2 corpora
● The New York Times archive (NYT), which includes about 1.8
million newspaper articles from the years 1987 to 2007
● The English edition of Wikipedia (WKP), which contains about
3.8 million articles (as of June 21, 2011)
● 2 knowledge bases
● YAGO2, which consists of about 350,000 semantic classes from
WordNet and the Wikipedia category system
● Freebase, which consists of 85 domains and a total of about 2,000
types within these domains
● Ordered or random sampling
● Typed/untyped order
15. Summary of experiment
● High precision
● High recall
● WKP > NYT
● YAGO2 > Freebase
● Type is strong information
● Interesting
16. Summary
● Syntactic + Ontological + Lexical patterns with a taxonomy tree
● 350,569 synsets / precision 84.7%
● 8,162 subsumptions / precision 75.0%
● Available online: http://www.mpi-inf.mpg.de/yago-naga/patty/