Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons
Anastasia Zhukova (1), Felix Hamborg (2, 4), Karsten Donnay (3, 4), Bela Gipp (1, 4)
(1) University of Wuppertal, Germany
(2) University of Konstanz, Germany
(3) University of Zurich, Switzerland
(4) Heidelberg Academy of Sciences and Humanities, Germany
07 February 2023
Motivation
07 February 2023 2
A. Zhukova et al. "Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons"
Migrant caravan of asylum seekers reaches U.S. border. By Sunday afternoon, a group of about 150 of the
caravan's members had begun crossing into the U.S.
People who request protection at a United States entry point must be referred to an asylum officer for a
screening. Asylum-seekers are typically held up to three days at the border.
The human stakes for the individual migrants planning to seek asylum Sunday were at least as high. About
80 U.S. families have also offered to sponsor migrants seeking asylum. Lawyers who went to Tijuana
denied any coaching of the roughly 400 people in the caravan.
Central American migrants and supporters of the migrant caravan from the U.S. side looking south into
Mexico on April 29, 2018.
The message was intended as a show of support for the Central American transgender women seeking
asylum.
These migrants will decide whether to present themselves to U.S Border officers at the San Ysidro port of
entry and apply for asylum.
Named Entity Recognition & Coreference Resolution?
Concept extraction: groups of persons
Research question
How to automatically identify
conceptually fine-grained clusters of related mentions
referring to groups of people
in an unsupervised way?
Direct mentions (same group of people): demonstrators; 28,000 attendees of the demonstration; Mr. Trump’s critics; people opposing Trump’s visit
Indirect mentions (associated with a geo-political entity or organization): White House officials; Trump Administration; U.S. officials; American diplomats
Pipeline
Main principles
• OPTICS: merge points in order of decreasing density
• Hierarchical clustering: aggregate linkage criteria to merge clusters
Steps: Cluster cores → Cluster bodies → Border mentions and non-core clusters → Merge clusters
Preprocessing
[Figure: named-entity grid over the entities GOP, Republicans, United_States, U.S., American, Americans, Spanish, Mexico]
- extract noun phrases
- keep only headwords and their adjective, compound, noun, and number modifiers
- vectorize words with word2vec
- construct a named-entity grid
- vectorize each phrase as a weighted average of its word vectors
- similarity = cosine similarity
[Figure: word and phrase vectors for Americans, U.S., citizens, U.S. + citizens, Russian, Russian + citizens, Russians. Difference and similarity between phrases are hard to distinguish.]
[Figure: word and phrase vectors for Americans, U.S., 2 × U.S. + citizens, Russian, 2 × Americans, 2 × Russian + citizens, 2 × Russians, citizens, Russians. Related phrases are easier to identify.]
Phrase reduction example: “people from migrant caravan” → “people migrant caravan”
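The weighted phrase vectorization and cosine similarity from this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 3-dimensional vectors are hypothetical stand-ins for word2vec embeddings, the set `NAMED_ENTITY_WORDS` is an assumed substitute for the named-entity grid, and the weight of 2 is taken from the "2 ×" labels in the figure.

```python
import numpy as np

# Hypothetical toy vectors standing in for word2vec embeddings.
# Dimensions are loosely [U.S.-relatedness, Russia-relatedness, person-relatedness].
WORD_VECTORS = {
    "U.S.": np.array([1.0, 0.0, 0.0]),
    "Russian": np.array([0.0, 1.0, 0.0]),
    "citizens": np.array([0.0, 0.0, 1.0]),
    "Americans": np.array([0.8, 0.0, 0.6]),
    "Russians": np.array([0.0, 0.8, 0.6]),
}

# Assumption: words that the named-entity grid would mark as entity-related.
NAMED_ENTITY_WORDS = {"U.S.", "Russian", "Americans", "Russians"}


def phrase_vector(words, ne_weight=1.0):
    """Weighted average of word vectors; entity-related words get ne_weight."""
    weights = np.array(
        [ne_weight if w in NAMED_ENTITY_WORDS else 1.0 for w in words]
    )
    vectors = np.stack([WORD_VECTORS[w] for w in words])
    return weights @ vectors / weights.sum()


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def phrase_similarity(a, b, ne_weight=1.0):
    """Cosine similarity between two phrases given as lists of words."""
    return cosine(phrase_vector(a, ne_weight), phrase_vector(b, ne_weight))


us_citizens = ["U.S.", "citizens"]
ru_citizens = ["Russian", "citizens"]

plain = phrase_similarity(us_citizens, ru_citizens)                    # no weighting
weighted = phrase_similarity(us_citizens, ru_citizens, ne_weight=2.0)  # entities x2
related = phrase_similarity(us_citizens, ["Americans"], ne_weight=2.0)
```

With these toy vectors, the similarity between "U.S. citizens" and "Russian citizens" drops from 0.5 to 0.2 once entity words are doubled, while "U.S. citizens" stays close to "Americans".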
Cluster cores
1) A and B are similar to each other: A ~ B
2) A and B are similar to a sufficient number of other phrases, e.g., A: C, D, E, G and B: C, D, E, F, H
3) Assemble similarity chains: A ~ B, B ~ C → {A, B, C}
Core phrases form distinctive initial clusters
[Figure: a cluster with core mentions, specializing mentions, and generalizing mentions; example phrases: Republican establishment; GOP leaders, Republicans; a Republican attorney general]
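One way to read the three steps is as a graph problem: phrases are nodes, similar pairs are edges, core phrases are nodes with enough neighbors, and similarity chains are connected components among core phrases. The sketch below follows that reading. The function name `core_clusters` and the values of `threshold` and `min_neighbors` are assumptions for illustration; the slides do not state them.

```python
import numpy as np


def core_clusters(sim, threshold=0.6, min_neighbors=2):
    """Group core phrases into clusters by chaining pairwise similarities.

    sim: symmetric matrix of phrase similarities.
    A phrase counts as core if it is similar to at least min_neighbors others.
    Two similar core phrases are linked, and links are chained transitively.
    """
    n = sim.shape[0]
    adjacent = (sim >= threshold) & ~np.eye(n, dtype=bool)
    is_core = adjacent.sum(axis=1) >= min_neighbors

    parent = list(range(n))  # union-find forest over phrase indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if adjacent[a, b] and is_core[a] and is_core[b]:
                parent[find(a)] = find(b)  # A ~ B joins their chains

    groups = {}
    for i in range(n):
        if is_core[i]:
            groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Toy similarity matrix for six phrases (indices 0..5).
sim = np.eye(6)
for a, b in [(0, 1), (1, 2), (0, 3), (2, 3), (0, 5)]:
    sim[a, b] = sim[b, a] = 0.8

clusters = core_clusters(sim)
```

In this toy input, phrases 0 to 3 chain into one core cluster, phrase 4 has no similar phrase, and phrase 5 has only one neighbor, so it is left for the later body and border steps.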
Bodies and borders
Body phrases:
similar to at least 1 core phrase
Resolve conflicts:
assign the phrase to the core cluster whose phrases it is most similar to
Border phrases:
similar to at least 2 clustered phrases
Resolve conflicts:
assign the phrase to the cluster in which it is similar to the most phrases
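The two attachment rules above differ only in the minimum number of similar phrases and in how conflicts are resolved, so a single helper can illustrate both. This is a sketch under assumptions: `attach_phrases`, the `threshold` value, and the use of mean similarity as the "most similar" criterion are hypothetical choices, not taken from the slides.

```python
import numpy as np


def attach_phrases(sim, clusters, threshold=0.6, min_similar=1, by="mean"):
    """Attach unclustered phrases to existing clusters.

    sim: symmetric matrix of phrase similarities.
    clusters: dict mapping cluster id -> list of phrase indices.
    min_similar: required number of cluster phrases above the threshold.
    by: "mean" picks the cluster with the highest mean similarity,
        "count" picks the cluster with the most similar phrases.
    Candidates are compared against the original clusters only.
    """
    assigned = {i for members in clusters.values() for i in members}
    result = {cid: list(members) for cid, members in clusters.items()}
    for p in range(sim.shape[0]):
        if p in assigned:
            continue
        best, best_score = None, -1.0
        for cid, members in clusters.items():
            sims = sim[p, members]
            n_similar = int((sims >= threshold).sum())
            if n_similar < min_similar:
                continue
            score = float(sims.mean()) if by == "mean" else n_similar
            if score > best_score:
                best, best_score = cid, score
        if best is not None:
            result[best].append(p)
    return result


# Toy input: two core clusters and one unclustered phrase (index 4).
sim = np.eye(5)
for other, value in [(0, 0.7), (1, 0.2), (2, 0.65), (3, 0.65)]:
    sim[4, other] = sim[other, 4] = value
cores = {0: [0, 1], 1: [2, 3]}

# Body rule: at least 1 similar core phrase, conflict resolved by mean similarity.
with_bodies = attach_phrases(sim, cores, min_similar=1, by="mean")
# Border rule: at least 2 similar clustered phrases, conflict resolved by count.
with_borders = attach_phrases(sim, cores, min_similar=2, by="count")
```

Phrase 4 is similar to one phrase of cluster 0 and two phrases of cluster 1; both rules place it in cluster 1 here, the body rule because of the higher mean similarity and the border rule because only cluster 1 has two similar phrases.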
Non-core and merge clusters
Non-core clusters:
at least 2 phrases are similar
The remaining unclustered phrases that are similar to each other form a cluster
Merge clusters:
clusters are semantically similar
Two clusters are merged if the semantic similarity of their TF-IDF-weighted phrases exceeds a threshold
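The merge criterion can be illustrated by treating each cluster as a document, weighting its words by TF-IDF, averaging the word vectors with those weights, and merging clusters whose vectors have a cosine similarity above a threshold. The sketch below is one possible reading: the smoothed IDF formula, the single greedy merge pass, the threshold of 0.8, and the toy 2-dimensional vectors are all assumptions.

```python
import math
from collections import Counter

import numpy as np


def cluster_vectors(clusters, word_vectors):
    """One TF-IDF-weighted average word vector per cluster.

    clusters: list of clusters, each a list of phrases (strings).
    Each cluster is treated as one document for the TF-IDF statistics.
    """
    docs = [Counter(w for phrase in c for w in phrase.split()) for c in clusters]
    n_docs = len(docs)
    doc_freq = Counter(w for d in docs for w in d)
    vectors = []
    for d in docs:
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
        weights = {
            w: tf * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, tf in d.items()
        }
        total = sum(weights.values())
        vectors.append(sum(weights[w] * word_vectors[w] for w in d) / total)
    return vectors


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def merge_clusters(clusters, word_vectors, threshold=0.8):
    """Greedy single pass: merge later clusters into the first similar one."""
    vectors = cluster_vectors(clusters, word_vectors)
    merged, used = [], set()
    for i in range(len(clusters)):
        if i in used:
            continue
        group = list(clusters[i])
        for j in range(i + 1, len(clusters)):
            if j not in used and cosine(vectors[i], vectors[j]) >= threshold:
                group += clusters[j]
                used.add(j)
        merged.append(group)
    return merged


# Hypothetical 2-d word vectors standing in for word2vec embeddings.
toy_vectors = {
    "migrants": np.array([1.0, 0.0]),
    "asylum": np.array([0.9, 0.1]),
    "seekers": np.array([0.9, 0.1]),
    "immigrants": np.array([1.0, 0.05]),
    "refugees": np.array([0.95, 0.1]),
    "officials": np.array([0.0, 1.0]),
}
toy_clusters = [["migrants", "asylum seekers"], ["immigrants", "refugees"], ["officials"]]
merged = merge_clusters(toy_clusters, toy_vectors)
```

In the toy data, the two migrant-related clusters point in nearly the same direction and are merged, while the "officials" cluster stays separate.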
Qualitative evaluation: indirect mentions
Hierarchical clustering Our approach
Title: Phrases
Americans: Former intelligence officials, American officials, White House officials, outside experts, Officials, Trump administration, intelligence community, officials, administration
Iranians: brutal regime, Iran leaders, exhaustive regimes, inspectors, inspection regime, Iranian regime, regime
Israelis: senior Israeli official, Israelis, Israeli networks, Israeli leader, Israeli officials
Europeans: Europeans, European leaders

Title: Phrases
officials: American officials, White House officials, outside experts, Officials, officials, Israeli officials
regime: administration, brutal regime, exhaustive regimes, Iranian regime, regime
leaders: Iran leaders, Israeli leader, European leaders
?: senior Israeli official, Israelis, Europeans
Qualitative evaluation: direct mentions
Hierarchical clustering Our approach
Phrases
- Central American migrants, asylum-seekers, Similar migrant groups, Central Americans, gay migrants, American sponsors, Central American children, several American advocacy groups, Asylum-seeking immigrant, Central American transgender women, refugees, undocumented immigrants, immigrant rights activists, Asylum-seekers, individuals, queer, migrant families, legitimate asylum-seekers, Migrant caravan, migrants, individual, caravan main organizing group, several groups, asylum seekers, families, his case, smugglers, immigration judges, particular group, caravan, sponsor, several groups, American sponsor, nonprofit group, children, Migrants, groups, protesters, his children, many migrants, group, her children, Immigrants, activists, their children, immigrants

Phrases
- Central American migrants, Central American children, several American advocacy groups, several groups, Other administration officials
- asylum-seekers, gay migrants, refugees, undocumented immigrants, Asylum-seekers, migrants, asylum seekers, smugglers, Migrants, Immigrants, immigrants
- Similar migrant groups, caravan main organizing group, several groups, groups, protesters, group, activists
- migrant families, families, children, his children, her children, their children, U.S. families, his family
Conclusion and Future work
• Reliably resolved mentions related to geo-political entities or organizations
• Clustered mentions while maintaining a fine-grained level of conceptualization
Future work
• Use as a component of cross-document coreference resolution
• Quantitative evaluation
• Context-dependent word/phrase sense disambiguation
References
1. Hamborg, F., Zhukova, A., Gipp, B.: Automated identification of media bias by word choice and labeling in news articles. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)
(Jun 2019)
2. Hamborg, F., Zhukova, A., Gipp, B.: Illegal aliens or undocumented immigrants? Towards the automated identification of bias by word choice and labeling. In: Proceedings of the iConference 2019
(Mar 2019)
3. Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: Ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of
Data. pp. 49–60. SIGMOD ’99, Association for Computing Machinery, New York, NY, USA (1999)
4. Cambria, E., Poria, S., Hazarika, D., Kwok, K.: Senticnet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In: Thirty-Second AAAI Conference on Artificial
Intelligence (2018)
5. Cha, M., Gwon, Y., Kung, H.: Language modeling by clustering with word embeddings for text readability assessment. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge
Management. pp. 2003–2006 (2017)
6. Chen, N.C., Suh, J., Verwey, J., Ramos, G., Drucker, S., Simard, P.: Anchorviz: Facilitating classifier error discovery through interactive semantic data exploration. In: 23rd International Conference on
Intelligent User Interfaces. pp. 269–280 (2018)
7. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: Proceedings of the IEEE International Conference on Computer
Vision. pp. 1463–1471 (2017)
8. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Association for Computational Linguistics (ACL) System
Demonstrations. pp. 55–60 (2014)
9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems. pp.
3111–3119 (2013)
10. Murtagh, F., Contreras, P.: Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(1), 86–97 (2012)
11. Subramanian, S., Roth, D.: Improving generalization in coreference resolution via adversarial training. In: Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM
2019). pp. 192–197. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019)
12. Zheng, G., Callan, J.: Learning to reweight terms with distributed representations. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information
retrieval. pp. 575–584 (2015)
Thank you for your attention!
Questions?
07 February 2023 15
A. Zhukova et al. "Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons"

More Related Content

More from Anastasia Zhukova

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...Anastasia Zhukova
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Anastasia Zhukova
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...Anastasia Zhukova
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Anastasia Zhukova
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Anastasia Zhukova
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingAnastasia Zhukova
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Anastasia Zhukova
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Anastasia Zhukova
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Anastasia Zhukova
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildAnastasia Zhukova
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsAnastasia Zhukova
 

More from Anastasia Zhukova (11)

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the Wild
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 

Recently uploaded

PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)kushbuR
 
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeGBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeAreesha Ahmad
 
Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentslevieagacer
 
Mining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxMining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxKyawThanTint
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfPharmatech-rx
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxGOWTHAMIM22
 
Heads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfHeads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfbyp19971001
 
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...mikehavy0
 
Fun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfFun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfhoangquan21999
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsSérgio Sacani
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfmarcuskenyatta275
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationSérgio Sacani
 
GBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationGBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationAreesha Ahmad
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...yogeshlabana357357
 
A Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert EinsteinA Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert Einsteinxgamestudios8
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfRevenJadePalma
 
GBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) EnzymologyGBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) EnzymologyAreesha Ahmad
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Fabiano Dalpiaz
 
Technical english Technical english.pptx
Technical english Technical english.pptxTechnical english Technical english.pptx
Technical english Technical english.pptxyoussefboujtat3
 
NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.syedmuneemqadri
 

Recently uploaded (20)

PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
 
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday LifeGBSN - Microbiology (Unit 7) Microbiology in Everyday Life
GBSN - Microbiology (Unit 7) Microbiology in Everyday Life
 
Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 students
 
Mining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptxMining Activity and Investment Opportunity in Myanmar.pptx
Mining Activity and Investment Opportunity in Myanmar.pptx
 
Film Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdfFilm Coated Tablet and Film Coating raw materials.pdf
Film Coated Tablet and Film Coating raw materials.pdf
 
Isolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptxIsolation of AMF by wet sieving and decantation method pptx
Isolation of AMF by wet sieving and decantation method pptx
 
Heads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfHeads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdf
 
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
Abortion uae unmarried price +27791653574 Contact Us Dubai Abu Dhabi Sharjah ...
 
Fun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdfFun for mover student's book- English book for teaching.pdf
Fun for mover student's book- English book for teaching.pdf
 
Continuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discsContinuum emission from within the plunging region of black hole discs
Continuum emission from within the plunging region of black hole discs
 
TEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdfTEST BANK for Organic Chemistry 6th Edition.pdf
TEST BANK for Organic Chemistry 6th Edition.pdf
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence acceleration
 
GBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolationGBSN - Microbiology (Unit 5) Concept of isolation
GBSN - Microbiology (Unit 5) Concept of isolation
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
A Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert EinsteinA Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert Einstein
 
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdfMODERN PHYSICS_REPORTING_QUANTA_.....pdf
MODERN PHYSICS_REPORTING_QUANTA_.....pdf
 
GBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) EnzymologyGBSN - Biochemistry (Unit 8) Enzymology
GBSN - Biochemistry (Unit 8) Enzymology
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
Technical english Technical english.pptx
Technical english Technical english.pptxTechnical english Technical english.pptx
Technical english Technical english.pptx
 
NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.NUMERICAL Proof Of TIme Electron Theory.
NUMERICAL Proof Of TIme Electron Theory.
 

Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons

  • 1. Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons Anastasia Zhukova1 Felix Hamborg2,4 Karsten Donnay3,4 Bela Gipp1,4 1University of Wuppertal, Germany 2University of Konstanz, Germany 3University of Zurich, Switzerland 4Heidelberg Academy of Sciences and Humanities, Germany 07 February 2023
  • 2. Motivation 07 February 2023 2 A. Zhukova et al. "Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons" Migrant caravan of asylum seekers reaches U.S. border. By Sunday afternoon, a group of about 150 of the caravan's members had begun crossing into the U.S. People who request protection at a United States entry point must be referred to an asylum officer for a screening. Asylum-seekers are typically held up to three days at the border. The human stakes for the individual migrants planning to seek asylum Sunday were at least as high. About 80 U.S. families have also offered to sponsor migrants seeking asylum. Lawyers who went to Tijuana denied any coaching of the roughly 400 people in the caravan. Central American migrants and supporters of the migrant caravan from the U.S. side looking south into Mexico on April 29, 2018. The message was intended as a show of support for the Central American transgender women seeking asylum. These migrants will decide whether to present themselves to U.S Border officers at the San Ysidro port of entry and apply for asylum.
  • 3. Motivation 07 February 2023 3 A. Zhukova et al. "Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons" Migrant caravan of asylum seekers reaches U.S. border. By Sunday afternoon, a group of about 150 of the caravan's members had begun crossing into the U.S. People who request protection at a United States entry point must be referred to an asylum officer for a screening. Asylum-seekers are typically held up to three days at the border. The human stakes for the individual migrants planning to seek asylum Sunday were at least as high. About 80 U.S. families have also offered to sponsor migrants seeking asylum. Lawyers who went to Tijuana denied any coaching of the roughly 400 people in the caravan. Central American migrants and supporters of the migrant caravan from the U.S. side looking south into Mexico on April 29, 2018. The message was intended as a show of support for the Central American transgender women seeking asylum. These migrants will decide whether to present themselves to U.S Border officers at the San Ysidro port of entry and apply for asylum. Named Entity Recognition & Coreference Resolution?
  • 4. Motivation Same news excerpt as on slide 2, now framed as: Concept extraction: groups of persons
  • 5. Research question How can conceptually fine-grained clusters of related mentions referring to groups of people be identified automatically and in an unsupervised way?
      Direct mentions (same group of people): demonstrators; 28,000 attendees of the demonstration; Mr. Trump’s critics; people opposing Trump’s visit
      Indirect mentions (associated with a geo-political entity or organization): White House officials; Trump Administration; U.S. officials; American diplomats
  • 6. Pipeline Stages: Cluster cores → Cluster bodies → Border mentions and non-core clusters → Merge clusters. Main principles • OPTICS: merge points in order of decreasing density • Hierarchical clustering: aggregate linkage criteria to merge clusters
  • 7. Preprocessing
      - extract noun phrases
      - keep only headwords, adjectives, compound, noun, and number modifiers, e.g., “people from migrant caravan” becomes “people migrant caravan”
      - vectorize words with word2vec
      - construct a named-entity grid (example entries: GOP, Republicans, United_States, U.S., American, Americans, Spanish, Mexico)
      - vectorize each phrase as a weighted average of its word vectors
      - similarity = cosine similarity
      Unweighted averaging (Americans; U.S. + citizens; Russian + citizens; Russians): difference & similarity between phrases is hard to distinguish.
      Named entities weighted 2× (2 × Americans; 2 × U.S. + citizens; 2 × Russian + citizens; 2 × Russians): easier to identify related phrases.
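The weighting idea on the preprocessing slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3-dimensional toy vectors stand in for word2vec embeddings, and the names `EMBEDDINGS`, `NAMED_ENTITIES`, `NE_WEIGHT`, `phrase_vector`, and `cosine` are hypothetical. The weight of 2 for named-entity words follows the "2 ×" labels on the slide.

```python
import math

# Toy 3-d vectors standing in for word2vec embeddings (made-up values).
EMBEDDINGS = {
    "U.S.": [0.9, 0.1, 0.0],
    "Americans": [0.8, 0.2, 0.3],
    "Russian": [0.1, 0.9, 0.0],
    "Russians": [0.2, 0.8, 0.3],
    "citizens": [0.1, 0.1, 0.9],
}
NAMED_ENTITIES = {"U.S.", "Americans", "Russian", "Russians"}
NE_WEIGHT = 2.0  # assumed weight for named-entity words


def phrase_vector(words, ne_weight=NE_WEIGHT):
    """Weighted average of word vectors; named-entity words count ne_weight times."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for word in words:
        weight = ne_weight if word in NAMED_ENTITIES else 1.0
        total = [t + weight * x for t, x in zip(total, EMBEDDINGS[word])]
        weight_sum += weight
    return [t / weight_sum for t in total]


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


us_citizens = ["U.S.", "citizens"]
ru_citizens = ["Russian", "citizens"]

# Similarity of the two unrelated phrases without and with named-entity weighting.
plain = cosine(phrase_vector(us_citizens, 1.0), phrase_vector(ru_citizens, 1.0))
weighted = cosine(phrase_vector(us_citizens), phrase_vector(ru_citizens))

# Similarity of the related pair "U.S. citizens" and "Americans" with weighting.
related = cosine(phrase_vector(us_citizens), phrase_vector(["Americans"]))
```

With these toy values, weighting lowers the similarity between "U.S. citizens" and "Russian citizens" while "U.S. citizens" stays close to "Americans", which is the effect the slide describes.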
  • 8. Cluster cores Core phrases form distinctive initial clusters.
      1) A & B are similar to each other: A ~ B
      2) A & B are similar to a sufficient number of other phrases, e.g., A ~ B: C, D, E, G and B ~ A: C, D, E, F, H (other similar phrases)
      3) Assemble similarity chains: A ~ B, B ~ C → {A, B, C}
      Example: core mentions "GOP leaders, Republicans"; specializing mention "a Republican attorney general"; generalizing mention "Republican establishment"
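The three steps for cluster cores can be sketched with a small union-find. This is an illustrative reading of the slide, not the authors' code: the function name `find_cores`, the threshold `min_other`, and the toy similarity lists are assumptions. "Sufficient number of other phrases" is interpreted here as each phrase of a pair having at least `min_other` similar phrases besides its partner.

```python
from itertools import combinations


def find_cores(similar, min_other=2):
    """Return core clusters from a symmetric map phrase -> set of similar phrases."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(similar), 2):
        # Step 1: A and B are similar to each other.
        if b not in similar[a] or a not in similar[b]:
            continue
        # Step 2: both are similar to enough other phrases (assumed threshold).
        if len(similar[a] - {b}) < min_other or len(similar[b] - {a}) < min_other:
            continue
        # Step 3: chain the pair into one cluster (A ~ B, B ~ C -> {A, B, C}).
        parent[find(a)] = find(b)

    clusters = {}
    for phrase in list(parent):
        clusters.setdefault(find(phrase), set()).add(phrase)
    return list(clusters.values())


# Toy similarity lists using the letters from the slide; X and Y are extra phrases.
similar = {
    "A": {"B", "D", "E"},
    "B": {"A", "C", "D", "F"},
    "C": {"B", "E", "F"},
    "D": {"A", "B"},
    "E": {"A", "C"},
    "F": {"B", "C"},
    "X": {"Y"},
    "Y": {"X"},
}
cores = find_cores(similar)
```

In this toy input only the pairs A ~ B and B ~ C pass both checks, so the chain yields the single core {A, B, C}; the pair X ~ Y has no other similar phrases and forms no core.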
  • 9. Bodies and borders Body phrases: similar to at least 1 core phrase. Resolving conflicting terms: assign the phrase to the core cluster whose phrases it is most similar to. Border phrases: similar to at least 2 clustered phrases. Resolving conflicts: assign the phrase to the cluster in which it is similar to the most clustered terms.
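A possible reading of the body and border rules is sketched below. Everything here is an assumption made for illustration: the pairwise similarity scores are made up, the threshold of 0.5 is arbitrary, "most similar to phrases of a core cluster" is taken as the highest mean similarity, and the function names `assign_bodies` and `assign_borders` are hypothetical.

```python
# Made-up pairwise similarities for illustration; unlisted pairs count as 0.0.
_PAIRS = {
    ("refugees", "migrants"): 0.8,
    ("refugees", "asylum seekers"): 0.7,
    ("refugees", "officials"): 0.1,
    ("immigration officers", "migrants"): 0.6,
    ("immigration officers", "asylum seekers"): 0.2,
    ("immigration officers", "officials"): 0.7,
    ("immigration officers", "U.S. officials"): 0.6,
    ("families", "migrants"): 0.55,
    ("families", "refugees"): 0.6,
    ("smugglers", "migrants"): 0.5,
}
SIM = {frozenset(pair): score for pair, score in _PAIRS.items()}


def sim(a, b):
    """Symmetric lookup of the toy similarity scores."""
    return SIM.get(frozenset((a, b)), 0.0)


def assign_bodies(cores, candidates, sim, threshold=0.5):
    """Body phrases: similar to at least one core phrase.
    Conflicts go to the core with the highest mean similarity (assumed rule)."""
    bodies = [set(core) for core in cores]
    for phrase in candidates:
        best, best_score = None, 0.0
        for i, core in enumerate(cores):
            scores = [sim(phrase, member) for member in core]
            if max(scores) >= threshold:
                mean = sum(scores) / len(scores)
                if mean > best_score:
                    best, best_score = i, mean
        if best is not None:
            bodies[best].add(phrase)
    return bodies


def assign_borders(clusters, candidates, sim, threshold=0.5, min_similar=2):
    """Border phrases: similar to at least min_similar clustered phrases.
    Conflicts go to the cluster holding the most similar phrases."""
    result = [set(cluster) for cluster in clusters]
    for phrase in candidates:
        counts = [sum(sim(phrase, m) >= threshold for m in c) for c in clusters]
        if sum(counts) >= min_similar:
            best = max(range(len(clusters)), key=lambda i: counts[i])
            result[best].add(phrase)
    return result


cores = [{"migrants", "asylum seekers"}, {"officials", "U.S. officials"}]
bodies = assign_bodies(cores, ["refugees", "immigration officers"], sim)
clusters = assign_borders(bodies, ["families", "smugglers"], sim)
```

In this toy run, "immigration officers" is similar to phrases in both cores and is resolved to the officials cluster, "families" joins the migrants cluster as a border phrase, and "smugglers" stays unclustered because it is similar to only one clustered phrase.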
  • 10. Non-core and merge clusters Non-core clusters (form a cluster): at least 2 phrases are similar, i.e., the remaining unclustered phrases are similar to each other. Merge clusters (merge?): clusters are semantically similar, i.e., the semantic similarity of the TF-IDF-weighted clusters’ phrases exceeds a threshold.
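The merge criterion can be illustrated with a small sketch. The slide does not specify how the TF-IDF weights are applied, so the following is one assumed variant: each cluster is treated as a document, its words are weighted by a smoothed TF-IDF, and the cluster vector is the weighted average of toy word vectors. The names `EMB`, `cluster_vectors`, and `merge_candidates`, the vectors, and the threshold of 0.9 are all hypothetical.

```python
import math
from collections import Counter

# Toy 2-d vectors standing in for word embeddings (made-up values).
EMB = {
    "migrants": [0.9, 0.1],
    "immigrants": [0.85, 0.2],
    "asylum": [0.8, 0.3],
    "seekers": [0.7, 0.3],
    "officials": [0.1, 0.9],
    "administration": [0.2, 0.8],
}


def cluster_vectors(clusters):
    """One TF-IDF-weighted average vector per cluster (cluster = document)."""
    docs = [Counter(w for phrase in c for w in phrase.split()) for c in clusters]
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in doc)
    dim = len(next(iter(EMB.values())))
    vectors = []
    for doc in docs:
        total, weight_sum = [0.0] * dim, 0.0
        for word, tf in doc.items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0  # smoothed
            weight = tf * idf
            total = [t + weight * x for t, x in zip(total, EMB[word])]
            weight_sum += weight
        vectors.append([t / weight_sum for t in total])
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def merge_candidates(clusters, threshold):
    """Index pairs of clusters whose vector similarity reaches the threshold."""
    vectors = cluster_vectors(clusters)
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


clusters = [
    ["migrants", "asylum seekers"],
    ["immigrants"],
    ["officials", "administration officials"],
]
pairs = merge_candidates(clusters, threshold=0.9)
```

With these toy vectors only the first two clusters are similar enough to be proposed for merging; the officials cluster stays separate.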
  • 11. Qualitative evaluation: indirect mentions
      Our approach (Title: Phrases)
        Americans: Former intelligence officials, American officials, White House officials, outside experts, Officials, Trump administration, intelligence community, officials, administration
        Iranians: brutal regime, Iran leaders, exhaustive regimes, inspectors, inspection regime, Iranian regime, regime
        Israelis: senior Israeli official, Israelis, Israeli networks, Israeli leader, Israeli officials
        Europeans: Europeans, European leaders
      Hierarchical clustering (Title: Phrases)
        officials: American officials, White House officials, outside experts, Officials, officials, Israeli officials
        regime: administration, brutal regime, exhaustive regimes, Iranian regime, regime
        leaders: Iran leaders, Israeli leader, European leaders
        ?: senior Israeli official, Israelis, Europeans
  • 12. Qualitative evaluation: direct mentions (hierarchical clustering compared with our approach)
      One result groups the phrases into a single cluster: Central American migrants, asylum-seekers, Similar migrant groups, Central Americans, gay migrants, American sponsors, Central American children, several American advocacy groups, Asylum-seeking immigrant, Central American transgender women, refugees, undocumented immigrants, immigrant rights activists, Asylum-seekers, individuals, queer, migrant families, legitimate asylum-seekers, Migrant caravan, migrants, individual, caravan main organizing group, several groups, asylum seekers, families, his case, smugglers, immigration judges, particular group, caravan, sponsor, several groups, American sponsor, nonprofit group, children, Migrants, groups, protesters, his children, many migrants, group, her children, Immigrants, activists, their children, immigrants
      The other result splits the phrases into four clusters:
        1) Central American migrants, Central American children, several American advocacy groups, several groups, Other administration officials
        2) asylum-seekers, gay migrants, refugees, undocumented immigrants, Asylum-seekers, migrants, asylum seekers, smugglers, Migrants, Immigrants, immigrants
        3) Similar migrant groups, caravan main organizing group, several groups, groups, protesters, group, activists
        4) migrant families, families, children, his children, her children, their children, U.S. families, his family
  • 13. Conclusion and Future work Conclusion • Reliably resolved mentions related to geo-political entities or organizations • Clustered mentions while maintaining a fine-grained level of conceptualization Future work • Use as a component of cross-document coreference resolution • Quantitative evaluation • Context-dependent word/phrase sense disambiguation
  • 14. References
      1. Hamborg, F., Zhukova, A., Gipp, B.: Automated identification of media bias by word choice and labeling in news articles. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) (Jun 2019)
      2. Hamborg, F., Zhukova, A., Gipp, B.: Illegal aliens or undocumented immigrants? Towards the automated identification of bias by word choice and labeling. In: Proceedings of the iConference 2019 (Mar 2019)
      3. Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: Ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. pp. 49–60. SIGMOD ’99, Association for Computing Machinery, New York, NY, USA (1999)
      4. Cambria, E., Poria, S., Hazarika, D., Kwok, K.: SenticNet 5: Discovering conceptual primitives for sentiment analysis by means of context embeddings. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
      5. Cha, M., Gwon, Y., Kung, H.: Language modeling by clustering with word embeddings for text readability assessment. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. pp. 2003–2006 (2017)
      6. Chen, N.C., Suh, J., Verwey, J., Ramos, G., Drucker, S., Simard, P.: AnchorViz: Facilitating classifier error discovery through interactive semantic data exploration. In: 23rd International Conference on Intelligent User Interfaces. pp. 269–280 (2018)
      7. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1463–1471 (2017)
      8. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Association for Computational Linguistics (ACL) System Demonstrations. pp. 55–60 (2014)
      9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013)
      10. Murtagh, F., Contreras, P.: Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(1), 86–97 (2012)
      11. Subramanian, S., Roth, D.: Improving generalization in coreference resolution via adversarial training. In: Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019). pp. 192–197. Association for Computational Linguistics, Minneapolis, Minnesota (Jun 2019)
      12. Zheng, G., Callan, J.: Learning to reweight terms with distributed representations. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 575–584 (2015)
  • 15. Thank you for your attention! Questions?