1
Hierarchical Entity Extraction and Ranking with
Unsupervised Graph Convolutions
Presenter: Zhexiong Liu
Advisor: Jinho D. Choi
2
Outline
• Introduction
• Approaches
• Experiments
• Analysis
• Conclusion
• Reference
4
Information Extraction aims to extract structured data
from unstructured text
• Named Entities:
Persons, Organizations, Locations, etc.
• Attributes:
Name, Descriptors, Categories, etc.
• Relationships:
Person works for Organization
• Events:
Kansas City Chiefs won Super Bowl 2020
Background
5
• Keyword Extraction
– Generates a list of key phrases from a text
– No coreference resolution
– No importance ranking
Related Works
6
• Named Entity Extraction
– Finds named entities in a text
– Classifies them into predefined types
– No coreference resolution
– No importance ranking
Related Works
7
• Entity Salience Detection
– Identifies the main entities in a document
– Only a binary salient/non-salient decision
– No ranked order
– No coreference resolution
Related Works
8
• This thesis focuses on
– Entity Extraction
– Entity Ranking
“Three Miami Dolphins players were spotted taking a
knee on the sideline during the singing of the national
anthem: Kenny Stills, Michael Thomas and Julius
Thomas.”
Proposed Task
9
1. the national anthem
2. a knee
3. Three Miami Dolphins players
Proposed Task
11
Input:
• Given a document D with a sequence of tokens
Outputs:
• Extracted entities
• Entity rank scores
• Entity hierarchy
Task Formulation
12
• Investigate unsupervised approaches for entity
extraction and ranking
• Determine the effectiveness of contextualized
embeddings for mention representation
• Examine coreference-based and parsing-based graph
convolution for embedding normalization
Research Questions
13
• Baseline model
– Constituency parsing
– Mention embedding
Proposed Approaches
14
• The purpose of constituency parsing is to
obtain mention candidates
• Processes plain text without labels
• A noun-phrase chunker is developed on top of
the constituency parse tree
Constituency Parsing
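The chunking step can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the parse tree is hand-encoded as nested (label, children) tuples, and maximal NP spans are collected as mention candidates.

```python
# Minimal noun-phrase chunker over a constituency parse tree.
# A node is (label, child, ...) for phrases, or (tag, word) for leaves.
def np_chunks(node, chunks=None):
    """Collect maximal NP spans: recurse, but do not descend into an NP."""
    if chunks is None:
        chunks = []
    if node[0] == "NP":
        chunks.append(" ".join(leaf_words(node)))
        return chunks
    for child in node[1:]:
        if isinstance(child, tuple):
            np_chunks(child, chunks)
    return chunks

def leaf_words(node):
    """Return the words under a node in left-to-right order."""
    if len(node) == 2 and isinstance(node[1], str):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaf_words(child))
    return words

# "Kansas City Chiefs won their first Super Bowl on February 2020"
tree = ("S",
        ("NP", ("NNP", "Kansas"), ("NNP", "City"), ("NNP", "Chiefs")),
        ("VP", ("VBD", "won"),
               ("NP", ("PRP$", "their"), ("JJ", "first"),
                      ("NNP", "Super"), ("NNP", "Bowl")),
               ("PP", ("IN", "on"),
                      ("NP", ("NNP", "February"), ("CD", "2020")))))

mentions = np_chunks(tree)
# → ["Kansas City Chiefs", "their first Super Bowl", "February 2020"]
```

Stopping at the first NP on each path yields maximal phrases; descending into nested NPs instead would yield the smaller embedded mentions.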
15
Constituency parsing
“Kansas City Chiefs won their first Super Bowl on
February 2020”
16
• Rooted trees represent the syntactic
structure of a sentence
• Chunked noun phrases may contain one or
more tokens (words)
“Kansas City Chiefs won their first Super Bowl on
February 2020”
Constituency parsing
17
• Maps words into vectors
• Contextualized word embeddings
Word Embedding
18
BERT Word Embedding
http://jalammar.github.io/illustrated-bert/
20
• Sums the last four layers of BERT to obtain
contextualized word embeddings
• Averages the word embeddings in a mention
to represent its phrase embedding
• Maps phrases (mentions) into vectors
Phrase Embedding
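The two aggregation steps above can be sketched with NumPy. The hidden states here are random stand-ins for BERT's per-layer outputs (layers × tokens × dimension), and the mention's token span is assumed to be known from the chunker.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_tokens, dim = 13, 10, 768   # 12 BERT layers + the embedding layer
hidden_states = rng.normal(size=(num_layers, num_tokens, dim))

# 1) Sum the last four layers to get one contextualized vector per token.
word_emb = hidden_states[-4:].sum(axis=0)    # shape: (num_tokens, dim)

# 2) Average the token vectors inside a mention span for its phrase embedding.
span = slice(2, 5)                           # hypothetical mention = tokens 2..4
phrase_emb = word_emb[span].mean(axis=0)     # shape: (dim,)
```

In practice the hidden states would come from a BERT forward pass with all layer outputs retained; only the sum-then-average aggregation is the point of this sketch.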
21
• A graph built for clustering
• Nodes are the set of mentions
• Each node carries a phrase embedding
• Edges are weighted by the distance between two mention embeddings
Construct Cluster Graph
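The cluster-graph construction can be sketched as below, assuming each mention already has a phrase embedding. Cosine distance is used here as a plausible edge weight; the slide does not reproduce the thesis's exact distance formula.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distances between row vectors (mention embeddings)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = unit @ unit.T                    # cosine similarities in [-1, 1]
    return 1.0 - np.clip(sims, -1.0, 1.0)  # distances in [0, 2]

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(5, 8))    # 5 toy mentions, 8-dim embeddings
A = cosine_distance_matrix(embeddings)  # dense adjacency of the cluster graph
```

The resulting symmetric matrix, with a zero diagonal, serves as the weighted adjacency of the fully connected cluster graph that the clustering step consumes.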
22
• The density of the adjacency matrix for the cluster graph
Perform Clustering
23
HDBSCAN Clustering
• No need to specify the number of clusters
• Handles noise points, which may become singletons
• Produces a hierarchy over the clustering results
Cluster Graph → Minimum Spanning Tree → Single-linkage Hierarchy → Cluster results
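The pipeline above (cluster graph → MST → single-linkage hierarchy → clusters) can be illustrated with SciPy. This sketch uses plain Euclidean distances on toy points rather than HDBSCAN's mutual-reachability distances, and cuts the hierarchy at an arbitrary threshold.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated toy groups of mention embeddings.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0]])
D = squareform(pdist(points))        # dense distance matrix = cluster graph

mst = minimum_spanning_tree(D)       # sparse MST over the cluster graph
hierarchy = linkage(pdist(points), method="single")   # single-linkage dendrogram
labels = fcluster(hierarchy, t=1.0, criterion="distance")  # cut the hierarchy
```

HDBSCAN additionally condenses the dendrogram and selects the most stable clusters, which is what makes the explicit cluster count unnecessary.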
24
Entity Ranking
• Based on the mention hierarchy, each mention receives a score
• Each cluster or singleton maintains a score
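The slide does not spell out the scoring formula, so the sketch below uses a simple stand-in score, mention count per cluster, purely to illustrate the cluster → score → rank flow; the thesis's actual scores come from the mention hierarchy.

```python
from collections import Counter

# Hypothetical clustering output: one cluster label per mention (-1 = noise).
mention_labels = [0, 0, 0, 1, 1, -1, 2, 2, 2, 2]

def rank_clusters(labels):
    """Score each cluster (here: by its size) and return labels in ranked order."""
    scores = Counter(l for l in labels if l != -1)   # ignore noise points
    return [label for label, _ in scores.most_common()]

ranking = rank_clusters(mention_labels)   # → [2, 0, 1]
```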
25
• Relies heavily on contextualized embeddings
• Cannot capture syntactic structure within sentences
• May extract duplicate entities
• The cluster graph is not sparse-dense enough for HDBSCAN to work well
Baseline Drawbacks
26
• Coreference model
– Entity coreference
– Mention normalization
Proposed Approaches
27
• Links entity mentions that refer to the
same entity across a document
Coreference Resolution
28
• Based on the BERT model
• Masks spans instead of individual tokens
SpanBERT Coreference
29
Problem: Two mentions have different spans
• Mention cluster from SpanBERT:
– {Kansas City Chiefs, This team, their, the Chiefs}
• Mention extracted from Constituency Parsing:
– {Kansas City Chiefs, their first Super Bowl victory,
the chiefs' first NFL champion}
Mention Alignments
30
Alignment rule: Half-Half
• If two mentions overlap by at least half, the
parser-based mention is assigned to the entity cluster
• Coreference-based mention clusters are thus
converted into parser-based mention clusters
Mention Alignments
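The Half-Half rule can be sketched with token-index spans. The span representation, the overlap denominator (the shorter span), and the example offsets are illustrative assumptions, not details taken from the thesis.

```python
def overlap_ratio(a, b):
    """Token overlap of spans a=(start, end) and b, relative to the shorter span."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / min(a[1] - a[0], b[1] - b[0])

def align(parser_spans, coref_clusters, threshold=0.5):
    """Assign a parser span to a coref cluster if any member overlaps >= half."""
    assignment = {}
    for p in parser_spans:
        for cid, cluster in enumerate(coref_clusters):
            if any(overlap_ratio(p, m) >= threshold for m in cluster):
                assignment[p] = cid
                break
    return assignment

# Spans as (start, end) token offsets in the document.
parser_spans = [(0, 3), (4, 9)]      # "Kansas City Chiefs", "their first Super Bowl victory"
coref_clusters = [[(0, 3), (4, 5)]]  # coref mentions: "Kansas City Chiefs", "their"
aligned = align(parser_spans, coref_clusters)
```

Here "their" (one token) is wholly contained in "their first Super Bowl victory", so the parser-based span inherits that coreference cluster.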
31
• Normalizes mention embeddings
• Mentions outside coreference clusters keep their original embeddings
• Mentions inside coreference clusters are normalized within the cluster
Coreference Model
32
• Constructs the cluster graph
• Each node takes a normalized embedding based on its
coreference cluster if it falls into one; otherwise, it
keeps its original embedding
• Edges are weighted by embedding distance
• Performs HDBSCAN
• Obtains the entity ranking and hierarchy
Coreference Model
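One simple normalization consistent with the description above replaces each in-cluster mention embedding with its cluster mean and leaves out-of-cluster mentions untouched; the centroid choice is an assumption, since the slides do not show the exact formula.

```python
import numpy as np

def normalize_by_coref(embeddings, clusters):
    """embeddings: (n, d) mention vectors; clusters: lists of mention indices."""
    out = embeddings.copy()
    for members in clusters:
        out[members] = embeddings[members].mean(axis=0)  # cluster centroid
    return out

emb = np.array([[1.0, 0.0],    # mention 0: in coref cluster
                [0.0, 1.0],    # mention 1: in the same cluster
                [4.0, 4.0]])   # mention 2: outside any cluster
normalized = normalize_by_coref(emb, clusters=[[0, 1]])
```

Pulling coreferent mentions onto a shared vector is what makes the cluster graph denser inside entities and sparser between them.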
33
• Alignment problems
– “their first Super Bowl victory” vs “their”
• Coreference introduces noise
– “the chiefs’” vs “the chiefs’ first championship”
• Misses syntactic information within a sentence
Coreference Model Drawbacks
34
• Graph-convolution model
– Perform graph convolution
– Dependency parsing
Proposed Model
35
Graph-convolution model
• A dependency parsing graph that captures mention
relations within a sentence
• A coreference graph that captures mention
relations across sentences
36
• Structures capturing syntactic dependencies
between tokens in a sentence
Dependency Parsing
37
• Constructs a dependency graph and a
coreference graph
• Nodes are entity mentions with embeddings
• Edges encode the intra-sentence or inter-
sentence distance
Graph-convolution Model
38
• The distance on the dependency graph is defined as the
number of edges on the shortest path connecting the
head nouns of the two mentions
Graph-convolution Model
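The inner-sentence distance above, edges on the shortest dependency path between two mentions' head nouns, can be computed with a small breadth-first search; the toy dependency edges below are illustrative.

```python
from collections import deque

def shortest_path_length(edges, src, dst):
    """BFS path length over an undirected graph given as (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # disconnected

# Toy dependency tree: "Chiefs" <- "won" -> "Bowl" -> "on" -> "February"
edges = [("won", "Chiefs"), ("won", "Bowl"), ("Bowl", "on"), ("on", "February")]
dist = shortest_path_length(edges, "Chiefs", "Bowl")   # head nouns of two mentions
```

Here "Chiefs" and "Bowl" are two hops apart via their shared head "won", so the edge weight between those two mentions would be 2.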
39
• The distance on the coreference graph is defined by the
size of the coreference cluster
Graph-convolution Model
40
• Given the adjacency matrix A, degree matrix D,
identity matrix I, and feature (mention
embedding) matrix X
• Normalized Laplacian matrix: L = I − D^(−1/2) A D^(−1/2)
• A filter for the dependency graph
Graph-convolution Model
41
• Graph convolution aims to update node
embeddings
• K-order convolution leverages the K-order
neighbors of each node
• The updated embedding feature matrix is obtained
by applying the filter to X
Graph-convolution Model
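A minimal K-order graph-convolution step under common conventions: the renormalized adjacency Â = D̃^(−1/2)(A + I)D̃^(−1/2) with self-loops, applied K times so each node aggregates its K-hop neighborhood. The exact filter in the thesis is not shown on the slide, so treat this as a representative instance rather than the thesis's formula.

```python
import numpy as np

def normalized_adjacency(A):
    """Renormalized adjacency with self-loops: D̃^(-1/2) (A + I) D̃^(-1/2)."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def k_order_convolution(A, X, k):
    """Smooth node features X over the graph: X <- Â X, repeated k times."""
    A_norm = normalized_adjacency(A)
    for _ in range(k):
        X = A_norm @ X
    return X

# Path graph 0-1-2; one-hot features so mixing is easy to read off.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X2 = k_order_convolution(A, np.eye(3), k=2)
```

After k = 2 hops, node 0's feature has mixed in node 2's, which a 1-hop convolution cannot reach on this path graph.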
42
• Combines the mention embeddings from the dependency
graph and the coreference graph
Graph-convolution Model
43
• Word embeddings based on BERT
• Mention embeddings via averaging tokens
• Coreference embeddings based on SpanBERT
• Normalized embeddings based on dependency-
graph convolution and coreference-graph
convolution
Graph-convolution Model
44
• Political news dataset NELA2017, with news
from April 2017 to October 2017
• A sample of 60 articles
• Entities annotated by three annotators
• Each annotator annotated all entities in the
text and marked the top 3, top 5, and top 10
Data Annotation
46
Data Annotation
47
• Entity statistics in the annotated articles
• Total entities annotated vs. entity length
Data Exploration
48
• Ran the baseline, the coreference model, and the
graph-based model on the 60 articles
Experimental Results
50
WASHINGTON NFL Sunday kicked off in London with more kneeling during the
national anthem, as President Trump continues to admonish players who don’t stand
for the flag.
Three Miami Dolphins players were spotted taking a knee on the sideline during
the singing of the national anthem: Kenny Stills, Michael Thomas and Julius Thomas.
The three kneeled side-by-side at one of the sidelines while the rest of their team stood
for the anthem.
Meanwhile, their opposing team, the New Orleans Saints, took a unified knee on
the sidelines but rose in time for the national anthem. Many of Saints linked arms as
Grammy-winning artist Darius Rucker sang the anthem in Wembley Stadium, the first
televised game of the day on Fox. The cameras didn’t zoom in on the Dolphins
kneelers, but attendees and journalists in London quickly posted pictures of the trio of
kneelers on social media.
President Trump has blasted a kneeling player as a son of a bitch and urged
owners to fire athletes who don’t stand up for the flag.
Article Example
51
• Right: embeddings based on BERT
• Left: embeddings based on BERT & coreference
Entity Embedding Comparison
52
• Baseline, coreference model, graph-based model
with coreference embeddings, and graph-based
model without coreference embeddings
Sparsity and Density
53
MST on Cluster Graph
54
• Top 10 entities (clusters) and their ranking
Entity Extraction & Ranking
55
• Meaningful hierarchy with importance
Entity Hierarchy
57
• Unsupervised approaches for entity extraction and
ranking demonstrate promising results
• Contextualized word & phrase embeddings capture
considerable semantic information
• Coreference-based and parsing-based graph convolutions
for embedding normalization achieve good results
• Constituency parsing and dependency parsing provide
significant syntactic information
Conclusion
58
• Proposed a novel task combining entity extraction, entity
ranking, and entity hierarchy
• Performed unsupervised learning without training data
• Collaboratively annotated 60 articles based on NELA2017
• Demonstrated a simple yet effective normalization
mechanism with coreference resolution
• Employed entity relation graphs for graph convolution
Contribution
59
• Construct more meaningful graphs that can be used
to update the node embeddings
• Fine-tune SpanBERT on NELA2017 to improve
mention representation
• Leverage title and position features to improve entity
ranking
• Apply entity extraction and ranking to events
Future Work
60
Outline
• Introduction
• Approaches
• Experiments
• Analysis
• Conclusion
• Reference

Bhaskar Mitra
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Hierarchical Entity Extraction and Ranking with Unsupervised Graph Convolutions

  • 1. 1 Hierarchical Entity Extraction and Ranking with Unsupervised Graph Convolutions Presenter: Zhexiong Liu Advisor: Jinho D. Choi
  • 2. 2 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 3. 3 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 4. 4 Information Extraction aims to extract structured data from unstructured data • Named Entities: Persons, Organizations, Locations, etc • Attributes: Name, Descriptors, Categories, etc • Relationships: Person works for Organization • Events: Kansas City Chiefs won Super Bowl 2020 Background
  • 5. 5 • Keyword Extraction – Generates a list of phrases from texts – Without coreference – Without importance Related Works
  • 6. 6 • Named Entity Extraction – Find named entities in a text – Conduct classification – Without coreference – Without importance Related Works
  • 7. 7 • Entity Salience Detection – Identify main entities – Binary categories – Without ranked order – Without coreference Related Works
  • 8. 8 • This thesis focuses on – Entity Extraction – Entity Ranking “Three Miami Dolphins players were spotted taking a knee on the sideline during the singing of the national anthem: Kenny Stills, Michael Thomas and Julius Thomas.” Proposed Task
  • 9. 9 “Three Miami Dolphins players were spotted taking a knee on the sideline during the singing of the national anthem: Kenny Stills, Michael Thomas and Julius Thomas.” 1. the national anthem 2. a knee 3. Three Miami Dolphins players Proposed Task
  • 10. 10 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 11. 11 Input: • Given a document D with a token sequence Outputs: • Extracted entities • Entity rank scores • Entity hierarchy Task Formulation
  • 12. 12 • Investigate unsupervised approaches for entity extraction and ranking • Determine the effectiveness of contextualized embeddings for mention representation • Examine coreference-based and parsing-based graph convolution for embedding normalization Research Questions
  • 13. 13 • Baseline model – Constituency parsing – Mention embedding Proposed Approaches
  • 14. 14 • The purpose of constituency parsing is to obtain mention candidates • Process plain data without labels • Develop a noun phrase chunker based on the constituency parse tree Constituency parsing
  • 15. 15 Constituency parsing “Kansas City Chiefs won their first Super Bowl on February 2020”
  • 16. 16 • Rooted trees representing the syntactic structures • Chunked phrases may contain multiple or single tokens (words) “Kansas City Chiefs won their first Super Bowl on February 2020” Constituency parsing
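The NP chunking step above can be sketched over a bracketed parse string. This is a hedged toy sketch, not the thesis implementation: it assumes Penn-Treebank-style bracketing for the example sentence and extracts maximal NP spans (it stops descending once an NP is found).

```python
# Toy NP chunker over a bracketed constituency parse (assumed PTB-style
# bracketing; the actual parser used in the thesis is not shown here).

def tokenize(bracketed):
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Recursively build (label, children) trees; leaves are plain strings.
    Returns the tree and the number of tokens consumed."""
    assert tokens[0] == "("
    label = tokens[1]
    children, i = [], 2
    while tokens[i] != ")":
        if tokens[i] == "(":
            subtree, consumed = parse(tokens[i:])
            children.append(subtree)
            i += consumed
        else:
            children.append(tokens[i])
            i += 1
    return (label, children), i + 1

def leaves(node):
    if isinstance(node, str):
        return [node]
    _, children = node
    return [w for c in children for w in leaves(c)]

def np_chunks(node):
    """Yield maximal NP spans: stop descending once an NP is found."""
    if isinstance(node, str):
        return []
    label, children = node
    if label == "NP":
        return [" ".join(leaves(node))]
    return [span for c in children for span in np_chunks(c)]

sent = ("(S (NP (NNP Kansas) (NNP City) (NNP Chiefs))"
        " (VP (VBD won)"
        " (NP (PRP$ their) (JJ first) (NNP Super) (NNP Bowl))"
        " (PP (IN on) (NP (NNP February) (CD 2020)))))")
tree, _ = parse(tokenize(sent))
print(np_chunks(tree))
```

On the example sentence this yields the three noun-phrase mention candidates, mirroring the chunking behavior described above.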
  • 17. 17 • Map words into vectors • Contextualized word embedding Word Embedding
  • 20. 20 • Sum the last four layers of BERT to obtain contextualized word embeddings • Average the word embeddings in a mention to represent the phrase embedding • Map phrases (mentions) into vectors Phrase Embedding
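The two embedding steps above can be sketched with toy vectors. This is a minimal sketch only: in the thesis the per-layer vectors come from BERT's hidden states, while here small made-up vectors stand in so the arithmetic is visible.

```python
# Sketch of the phrase-embedding pipeline: sum the last four layers per
# token, then average token vectors within a mention. Vectors are toy
# stand-ins for BERT hidden states.

def sum_last_four(layer_vectors):
    """Element-wise sum of a token's vectors from the last four layers."""
    return [sum(vals) for vals in zip(*layer_vectors[-4:])]

def mention_embedding(token_vectors):
    """Average the word embeddings of a mention into one phrase vector."""
    n = len(token_vectors)
    return [sum(vals) / n for vals in zip(*token_vectors)]

# Two tokens, each with (pretend) four top layers of 3-dim hidden states.
tok1 = sum_last_four([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
tok2 = sum_last_four([[2, 0, 0], [0, 2, 0], [0, 0, 2], [0, 0, 0]])
phrase = mention_embedding([tok1, tok2])
print(phrase)
```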
  • 21. 21 • A graph for clustering • Nodes are the mention set • Each node has a phrase embedding • Edges are distances between pairs of nodes Construct Cluster Graph
  • 22. 22 • The density of adjacency matrix for cluster graph Perform Clustering
  • 23. 23 HDBSCAN Clustering • No need to specify the number of clusters • Clusters with noise, which could be singletons • Produces a hierarchy of cluster results (pipeline: Cluster Graph → Minimum Spanning Tree → Single-linkage Hierarchy → Cluster results)
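The first stage of the HDBSCAN pipeline shown above can be sketched directly: build a minimum spanning tree over the pairwise distances, whose edges, taken in weight order, define the single-linkage merge order. This is a hedged sketch of the MST/hierarchy idea only (Kruskal's algorithm on a toy point set), not HDBSCAN's density transform or condensation step.

```python
# Kruskal's MST over a toy cluster graph; edges sorted by weight give the
# single-linkage merge order that HDBSCAN's hierarchy is built from.
from itertools import combinations

def mst_edges(points, dist):
    """Return MST edges (weight, i, j) in merge (weight) order."""
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # skip edges that would form a cycle
            parent[ri] = rj
            tree.append((w, i, j))
    return tree

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
merges = mst_edges(pts, lambda a, b: ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5)
print(merges)
```

The far-away third point only joins the tree through a long edge, i.e. it merges last in the hierarchy; that is the kind of point HDBSCAN would leave as noise (a singleton).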
  • 24. 24 Entity Ranking • Based on the mention hierarchy, each mention has a score • Each cluster or singleton maintains a score
  • 25. 25 • Heavily relies on contextualized embeddings • Cannot handle syntax within sentences • May extract duplicate entities • The cluster graph lacks the sparse/dense structure needed for HDBSCAN to perform well Baseline Drawbacks
  • 26. 26 • Coreference model – Entity coreference – Mention normalization Proposed Approaches
  • 27. 27 • Link the entity mentions that refer to the same entity across a document. Coreference Resolution
  • 28. 28 • Based on BERT model • Mask spans instead of tokens SpanBERT Coreference
  • 29. 29 Problem: Two mentions have different spans • Mention cluster from SpanBERT: – {Kansas City Chiefs, This team, their, the Chiefs} • Mentions extracted from Constituency Parsing: – {Kansas City Chiefs, their first Super Bowl victory, the chiefs' first NFL champion} Mention Alignments
  • 30. 30 Alignment rule: Half-Half • If two mentions overlap by at least half, the parser-based one is assigned to the entity cluster. • Convert coreference-based mention clusters to parser-based mention clusters Mention Alignments
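The Half-Half rule can be sketched as a token-overlap check. The thesis does not spell out the exact threshold here, so this hedged sketch assumes "at least half of the shorter mention's tokens overlap"; the mentions come from the running example.

```python
# Sketch of the Half-Half alignment rule: a parser-based mention joins a
# coreference cluster if it overlaps one of the cluster's mentions by at
# least half (assumed: half of the shorter mention's tokens).

def overlaps_half(parser_mention, coref_mention):
    a = set(parser_mention.lower().split())
    b = set(coref_mention.lower().split())
    shorter = min(len(a), len(b))
    return len(a & b) >= shorter / 2

def align(parser_mentions, coref_cluster):
    """Keep the parser-based span for every mention that aligns."""
    return [m for m in parser_mentions
            if any(overlaps_half(m, c) for c in coref_cluster)]

coref = ["Kansas City Chiefs", "their", "the Chiefs"]
parsed = ["Kansas City Chiefs", "their first Super Bowl victory", "a knee"]
print(align(parsed, coref))
```

Note that this toy rule aligns “their first Super Bowl victory” to the cluster via the single token “their”, which is exactly the kind of noisy match the drawbacks slide later points out.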
  • 31. 31 • Normalize mention embedding • Mentions outside coreference clusters • Mentions inside coreference clusters Coreference Model
  • 32. 32 • Construct the cluster graph • Each node has a normalized embedding based on its coreference resolution cluster if it falls into a cluster; otherwise, it keeps its own embedding • Edge distances • Perform HDBSCAN • Obtain entity ranking and hierarchy Coreference Model
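The normalization step above can be sketched in a few lines. The slide's exact formulas are elided, so this hedged sketch assumes the simplest version: a mention inside a coreference cluster is replaced by the mean embedding of its cluster, and a mention outside any cluster keeps its own embedding.

```python
# Hedged sketch of coreference-based embedding normalization (assumed:
# in-cluster mentions take the cluster mean; others are unchanged).

def normalize(embeddings, clusters):
    """embeddings: {mention: [floats]}; clusters: list of mention lists."""
    out = dict(embeddings)
    for cluster in clusters:
        dim = len(embeddings[cluster[0]])
        mean = [sum(embeddings[m][d] for m in cluster) / len(cluster)
                for d in range(dim)]
        for m in cluster:
            out[m] = mean
    return out

emb = {"the Chiefs": [1.0, 0.0], "their": [0.0, 1.0], "a knee": [0.4, 0.6]}
norm = normalize(emb, [["the Chiefs", "their"]])
print(norm["the Chiefs"], norm["a knee"])
```

Pulling coreferent mentions onto a shared vector is what lets HDBSCAN treat them as one entity in the cluster graph.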
  • 33. 33 • Alignment problems – “their first Super Bowl victory” vs “their” • Coreference introduces noise – “the chiefs’” vs “the chiefs’ first championship” • Missing syntactic information within sentences Coreference Model Drawbacks
  • 34. 34 • Graph-convolution model – Perform graph convolution – Dependency parsing Proposed Model
  • 35. 35 Graph-convolution model • A dependency parsing graph that captures mention relations within a sentence • A coreference graph that captures mention relations across sentences
  • 36. 36 • Structures capturing implicit dependencies between tokens in a sentence Dependency Parsing
  • 37. 37 • Construct a dependency graph and a coreference graph • Nodes are entity mentions with embeddings • Edges are the intra-sentence or inter-sentence distances Graph-convolution Model
  • 38. 38 • The distance on the dependency graph is defined as the number of edges on the shortest path that connects the head nouns of the two mentions Graph-convolution Model
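The dependency-graph distance above is a shortest-path length, which a breadth-first search computes directly. This is a sketch on a toy edge list standing in for a real dependency parse of the running example.

```python
# BFS sketch of the dependency-graph distance: the number of edges on the
# shortest path between the head tokens of two mentions. The toy edge
# list below stands in for a real dependency parse.
from collections import deque

def shortest_path_len(edges, src, dst):
    adj = {}
    for a, b in edges:               # treat the parse as undirected
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None                      # mentions in disconnected parts

# Toy head-word dependencies for "Chiefs won Bowl on February":
deps = [("won", "Chiefs"), ("won", "Bowl"), ("Bowl", "February")]
print(shortest_path_len(deps, "Chiefs", "February"))
```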
  • 39. 39 • The distance on the coreference graph is defined as the size of the coreference cluster Graph-convolution Model
  • 40. 40 • Given adjacency matrix A, degree matrix D, identity matrix I, and feature (mention embedding) matrix X • Normalized Laplacian Matrix • A filter for the dependency graph Graph-convolution Model
  • 41. 41 • Graph convolution aims to update node embeddings • K-order convolution leverages K-order neighbors of current nodes • Updated embedding feature matrix is Graph-convolution Model
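The filter and update formulas on the two slides above are elided in the transcript, so the following is a hedged sketch of the standard renormalized propagation used in graph convolutions: with self-loops added, Â = D^(-1/2)(A + I)D^(-1/2), and a K-order convolution updates features as X' = Â^K X, mixing each node with its K-hop neighborhood. It is written in plain Python over a tiny graph; the thesis's exact filter may differ.

```python
# Sketch of K-order graph-convolution propagation with the renormalized
# adjacency A_hat = D^{-1/2}(A + I)D^{-1/2} (assumed standard GCN form).

def normalized_adj(A):
    n = len(A)
    AI = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
          for i in range(n)]                       # add self-loops
    d = [sum(row) for row in AI]                   # degrees of A + I
    return [[AI[i][j] / (d[i] ** 0.5 * d[j] ** 0.5) for j in range(n)]
            for i in range(n)]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def propagate(A, x, k):
    """Apply the normalized filter k times (a k-order convolution)."""
    A_hat = normalized_adj(A)
    for _ in range(k):
        x = matvec(A_hat, x)
    return x

# 3-node path graph: 0 - 1 - 2
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [2 ** 0.5, 3 ** 0.5, 2 ** 0.5]  # sqrt-degree vector, a fixed point of A_hat
print(propagate(A, x, 2))
```

The square-root-of-degree vector is an eigenvector of Â with eigenvalue 1, so propagating it any number of steps leaves it unchanged; a quick sanity check on the filter.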
  • 42. 42 • Combine the dependency graph mention embedding and coreference graph mention embedding Graph-convolution Model
  • 43. 43 • Word embedding based on BERT • Mention embedding via averaging tokens • Coreference embedding based on SpanBERT • Normalized embedding based on dependency graph convolution and coreference graph convolution Graph-convolution Model
  • 44. 44 • Political news dataset NELA2017 with news between April 2017 and October 2017 • A sample of 60 articles • Entities annotated by three annotators • Each annotator annotated all the entities in the text and marked the top 3, top 5, and top 10 Data Annotation
  • 45. 45 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 47. 47 • Entity statistics in annotated articles • Total entities annotated vs entity length Data Exploration
  • 48. 48 • Perform baseline, coreference model, and graph-based model on 60 articles Experimental Results
  • 49. 49 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 50. 50 WASHINGTON NFL Sunday kicked off in London with more kneeling during the national anthem, as President Trump continues to admonish players who don’t stand for the flag. Three Miami Dolphins players were spotted taking a knee on the sideline during the singing of the national anthem: Kenny Stills, Michael Thomas and Julius Thomas. The three kneeled side-by-side at one of the sidelines while the rest of their team stood for the anthem. Meanwhile, their opposing team, the New Orleans Saints, took a unified knee on the sidelines but rose in time for the national anthem. Many of Saints linked arms as Grammy-winning artist Darius Rucker sang the anthem in Wembley Stadium, the first televised game of the day on Fox. The cameras didn’t zoom in on the Dolphins kneelers, but attendees and journalists in London quickly posted pictures of the trio of kneelers on social media. President Trump has blasted a kneeling player as a son of a bitch and urged owners to fire athletes who don’t stand up for the flag. Article Example
  • 51. 51 • Right: embedding based on BERT • Left: embedding based on BERT & Coreference Entity Embedding Comparison
  • 52. 52 • baseline, coreference model, graph-based model with coreference embedding, and graph-based model without coreference embedding Sparsity and Density
  • 54. 54 • Top 10 entities (clusters) and their ranking Entity Extraction & Ranking
  • 55. 55 • Meaningful hierarchy with importance Entity Hierarchy
  • 56. 56 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 57. 57 • Unsupervised approaches for entity extraction and ranking demonstrate promising results • Contextualized word & phrase embeddings capture considerable semantics • Coreference-based and parsing-based graph convolution for embedding normalization achieves good results • Constituency parsing and dependency parsing provide significant syntactic information Conclusion
  • 58. 58 • Proposed a novel task with entity extraction, entity ranking, and entity hierarchy • Performed unsupervised learning without training • Collaboratively annotated 60 articles based on NELA2017 • Demonstrated a simple yet effective normalization mechanism with coreference resolution • Employed an entity relation graph for graph convolution Contribution
  • 59. 59 • Construct more meaningful graphs that could be used to update the node embeddings • Fine-tune SpanBERT on NELA2017 to improve mention representation • Leverage title and position features to improve entity ranking • Apply entity extraction and ranking to events Further Works
  • 60. 60 Outline • Introduction • Approaches • Experiments • Analysis • Conclusion • Reference
  • 61. 61 [1] Alan Akbik, Tanja Bergmann, and Roland Vollgraf. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 724–728, 2019. [2] Felipe Almeida and Geraldo Xexéo. Word embeddings: A survey. arXiv preprint arXiv:1901.09069, 2019. [3] Gabriel Bernier-Colborne. Identifying semantic relations in a specialized corpus through distributional analysis of a cooccurrence tensor. In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014), pages 57–62, 2014. [4] Parminder Bhatia, E Busra Celikkaya, and Mohammed Khalilia. End-to-end joint entity extraction and negation detection for clinical text. In International Workshop on Health Intelligence, pages 139–148. Springer, 2019. [5] Ann Bies, Mark Ferguson, Karen Katz, Robert MacIntyre, Victoria Tredinnick, Grace Kim, Mary Ann Marcinkiewicz, and Britta Schasberger. Bracketing guidelines for Treebank II style Penn Treebank project. University of Pennsylvania, 97:100, 1995. [6] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013. [7] Jose Camacho-Collados and Mohammad Taher Pilehvar. From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Int. Res., 63(1):743–788, September 2018. ISSN 1076-9757. doi: 10.1613/jair.1.11259. URL https://doi.org/10.1613/jair.1.11259. [8] Ricardo JGB Campello, Davoud Moulavi, and Jörg Sander. Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining, pages 160–172. Springer, 2013. [9] Jing Chen, Chenyan Xiong, and Jamie Callan. An empirical study of learning to rank for entity search.
In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 737–740, 2016. [10] Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou. A walk-based model on entity graphs for relation extraction. arXiv preprint arXiv:1902.07023, 2019. [11] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American society for information science, 41(6):391–407, 1990. [12] Gianluca Demartini, Tereza Iofciu, and Arjen P De Vries. Overview of the INEX 2009 entity ranking track. In International Workshop of the Initiative for the Evaluation of XML Retrieval, pages 254–264. Springer, 2009. [13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. [14] Jesse Dunietz and Dan Gillick. A new entity salience task with millions of training examples. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 205–209, 2014. [15] Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S Weld, and Alexander Yates. Unsupervised named-entity extraction from the web: An experimental study. Artificial intelligence, 165(1):91–134, 2005. [16] Yuntian Feng, Hongjun Zhang, Wenning Hao, and Gang Chen. Joint extraction of entities and relations using reinforcement learning and deep learning. Computational intelligence and neuroscience, 2017, 2017. [17] Kavita Ganesan and Chengxiang Zhai. Opinion-based entity ranking. Information retrieval, 15(2):116–150, 2012. [18] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151:78–94, 2018. [19] Hannaneh Hajishirzi, Leila Zilles, Daniel S Weld, and Luke Zettlemoyer.
Joint coreference resolution and named-entity linking with multi-pass sieves. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 289–299, 2013. [20] Kazi Saidul Hasan and Vincent Ng. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273, 2014. Reference
  • 62. 62 [21] Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. Discovering relations among named entities from large corpora. In Proceedings of the 42nd annual meeting of the Association for Computational Linguistics, page 415. Association for Computational Linguistics, 2004. [22] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015. [23] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016. [24] Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 782–792. Association for Computational Linguistics, 2011. [25] Benjamin Horne. NELA2017, 2019. URL https://doi.org/10.7910/DVN/ZCXSKG. [26] Alpa Jain and Marco Pennacchiotti. Open entity extraction from web search query logs. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 510–518, USA, 2010. Association for Computational Linguistics. [27] Xin Jiang, Yunhua Hu, and Hang Li. A ranking approach to keyphrase extraction. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 756–757, 2009. [28] Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S Weld, Luke Zettlemoyer, and Omer Levy. SpanBERT: Improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529, 2019. [29] Arzoo Katiyar and Claire Cardie. Nested named entity recognition revisited. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 861–871, 2018. [30] Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley.
Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design, 30(8):595–608, 2016. [31] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016. [32] Rémi Lebret and Ronan Collobert. Word embeddings through Hellinger PCA. arXiv preprint arXiv:1312.5542, 2013. [33] Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End-to-end neural coreference resolution. arXiv preprint arXiv:1707.07045, 2017. [34] Kenton Lee, Luheng He, and Luke Zettlemoyer. Higher-order coreference resolution with coarse-to-fine inference. arXiv preprint arXiv:1804.05392, 2018. [35] Qimai Li, Xiao-Ming Wu, Han Liu, Xiaotong Zhang, and Zhichao Guan. Label efficient semi-supervised learning via graph filtering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9582–9591, 2019. [36] Zhiyuan Liu, Wenyi Huang, Yabin Zheng, and Maosong Sun. Automatic keyphrase extraction via topic decomposition. In Proceedings of the 2010 conference on empirical methods in natural language processing, pages 366–376. Association for Computational Linguistics, 2010. [37] Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized PageRank estimation and search: A bidirectional approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 163–172, 2016. [38] Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, and Hannaneh Hajishirzi. A general framework for information extraction using dynamic span graphs. arXiv preprint arXiv:1904.03296, 2019. [39] Kevin Lund and Curt Burgess. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior research methods, instruments, & computers, 28(2):203–208, 1996. [40] Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. 1993. Reference
  • 63. 63 [41] Fionn Murtagh and Pedro Contreras. Algorithms for hierarchical clustering: an overview. WIREs Data Mining and Knowledge Discovery, 2(1):86–97, 2012. doi: 10.1002/widm.53. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/widm.53. [42] Vincent Ng and Claire Cardie. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 104–111. Association for Computational Linguistics, 2002. [43] Thien Huu Nguyen and Ralph Grishman. Graph convolutional networks with argument-aware pooling for event detection. In Thirty-second AAAI conference on artificial intelligence, 2018. [44] Xiaodong Ning, Lina Yao, Boualem Benatallah, Yihong Zhang, Quan Z Sheng, and Salil S Kanhere. Source-aware crisis-relevant tweet identification and key information summarization. ACM Transactions on Internet Technology (TOIT), 19(3):1–20, 2019. [45] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999. [46] Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5:101–115, 2017. [47] Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014. [48] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018. [49] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. CoRR, abs/1802.05365, 2018. URL http://arxiv.org/abs/1802.05365.
[50] Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL-Shared Task, pages 1–40. Association for Computational Linguistics, 2012. [51] Douglas LT Rohde, Laura M Gonnerman, and David C Plaut. An improved method for deriving word meaning from lexical co-occurrence. Cognitive Psychology, 7:573–605, 2004. [52] Sunil Kumar Sahu, Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou. Inter-sentence relation extraction with document-level graph convolutional neural network. arXiv preprint arXiv:1906.04684, 2019. [53] Richard Socher, John Bauer, Christopher D Manning, and Andrew Y Ng. Parsing with compositional vector grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 455–465, 2013. [54] Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. A machine learning approach to coreference resolution of noun phrases. Computational linguistics, 27(4):521–544, 2001. [55] Peter D Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Journal of artificial intelligence research, 37:141–188, 2010. [56] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017. [57] Alex D Wade, Kuansan Wang, Yizhou Sun, and Antonio Gulli. WSDM Cup 2016: Entity ranking challenge. In Proceedings of the ninth ACM international conference on web search and data mining, pages 593–594, 2016. [58] Chengyu Wang, Guomin Zhou, Xiaofeng He, and Aoying Zhou. NERank+: a graph-based approach for entity ranking in document collections. Frontiers of Computer Science, 12(3):504–517, 2018. [59] Yi-fang Brook Wu, Quanzhi Li, Razvan Stefan Bot, and Xin Chen. Domain-specific keyphrase extraction. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 283–284, 2005. [60] Chenyan Xiong, Russell Power, and Jamie Callan. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web, pages 1271–1279, 2017. [61] Ying Xiong, Yedan Shen, Yuanhang Huang, Shuai Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Jun Yan, and Yi Zhou. A deep learning-based system for PharmaCoNER. In Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, pages 33–37, 2019. [62] Vikas Yadav and Steven Bethard. A survey on recent advances in named entity recognition from deep learning models, 2019. [63] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014. [64] Meishan Zhang, Yue Zhang, and Guohong Fu. End-to-end neural relation extraction with global optimization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1730–1740, 2017. [65] Yuhao Zhang, Peng Qi, and Christopher D Manning. Graph convolution over pruned dependency trees improves relation extraction. arXiv preprint arXiv:1809.10185, 2018. Reference
Domain- specific keyphrase extraction. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 283–284, 2005. [60] Chenyan Xiong, Russell Power, and Jamie Callan. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web, pages 1271– 1279, 2017. [61] Ying Xiong, Yedan Shen, Yuanhang Huang, Shuai Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Jun Yan, and Yi Zhou. A deep learning-based system for pharmaconer. In Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, pages 33–37, 2019. [62] Vikas Yadav and Steven Bethard. A survey on recent advances in named entity recognition from deep learning models, 2019. [63] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embed- ding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014. [64] Meishan Zhang, Yue Zhang, and Guohong Fu. End-to-end neural relation ex- traction with global optimization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1730– 1740, 2017. [65] Yuhao Zhang, Peng Qi, and Christopher D Manning. Graph convolution over pruned dependency trees improves relation extraction. arXiv preprint arXiv:1809.10185, 2018. Reference