SlideShare a Scribd company logo
1 of 1
Download to read offline
Interpretable and Comparative Textual Dataset
Exploration Using Near-Identity Mention Relations
Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1
Problem
• Text dataset exploration lacks of methods
allow text comparison and content
interpretation
• SoTA: uni-/bigram instead of phrase level
• Lack of identification of complex
coreferential relations
2. Detailed representation
On cluster level: NI-CDCR (near-identity cross-document coreference resolution)
Contact
Web: https://dke.uni-wuppertal.de/en/
Email: zhukova@uni-wuppertal.de, felix.hamborg@uni.kn
1 University of Wuppertal, 2 University of Konstanz
Research question
How to explore text datasets to ensure
comparability across texts, interpretability of
representation, and employment of the
complex semantic relations among text
components?
Features
• Histogram-matrix → comparative text
exploration
• Dimensions
o LDA-based clusters
o user-defined metadata / extracted
information
• Resolution of semantically complex and
lexical diverse concepts (NI-CDCR)
o Identity: “Cuba” = “the Caribbean Pearl”
o Near-identity: “US government” ~ “Trump
Administration” ~ “White House”
o Context identity: “resuming its nuclear
program” = “Tehran's nuclear ambitions”
• Easy-to-interpret document representation
• Three levels of analysis: cluster, detailed,
and document view
• User-controlled number of concepts
Metadata:
news
slant
/
frame
/
subtopic
0+2 5 6 7 3 1 4+9
8
RR
R
M
LL
L
0 2
Document view
Metadata: event id
Detailed view
Methodology
1. High-level representation
On article level: LDA
• cosine similarity
between LDA-topic
vectors
• small squares: articles
• 𝑠𝑖𝑚 ≥ 0.7 → related
0
1
2
3
4
5
6
7
8
9
4+9
4+9
0+2
0+2
Event ids
“Trump” entity is mentioned across all clusters
253
Cluster view

More Related Content

Similar to Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations

Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging DataHang Dong
 
Creating metadata for data visualization
Creating metadata for data visualizationCreating metadata for data visualization
Creating metadata for data visualizationgesinaphillips
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionLuca Nannini
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...ssuser4b1f48
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Paris Sud University
 
Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...paper_reader
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
Network analyses of SPSSI
Network analyses of SPSSINetwork analyses of SPSSI
Network analyses of SPSSIKevin Lanning
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdfMartin Wynne
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Anastasia Zhukova
 
Concurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsConcurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsParang Saraf
 
DH Tools Workshop #1: Text Analysis
DH Tools Workshop #1:  Text AnalysisDH Tools Workshop #1:  Text Analysis
DH Tools Workshop #1: Text Analysiscjbuckner
 
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...ssuser4b1f48
 
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCFueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
 
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingAuto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingShalin Hai-Jew
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildAnastasia Zhukova
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingNimrita Koul
 

Similar to Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations (20)

Learning Relations from Social Tagging Data
Learning Relations from Social Tagging DataLearning Relations from Social Tagging Data
Learning Relations from Social Tagging Data
 
Creating metadata for data visualization
Creating metadata for data visualizationCreating metadata for data visualization
Creating metadata for data visualization
 
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an IntroductionModeling Causal Reasoning in Complex Networks through NLP: an Introduction
Modeling Causal Reasoning in Complex Networks through NLP: an Introduction
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...NS-CUK Seminar:  H.B.Kim,  Review on "metapath2vec: Scalable representation l...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation l...
 
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
Tutorial@BDA 2017 -- Knowledge Graph Expansion and Enrichment
 
Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Network analyses of SPSSI
Network analyses of SPSSINetwork analyses of SPSSI
Network analyses of SPSSI
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdf
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
 
Concurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector RepresentationsConcurrent Inference of Topic Models and Distributed Vector Representations
Concurrent Inference of Topic Models and Distributed Vector Representations
 
DH Tools Workshop #1: Text Analysis
DH Tools Workshop #1:  Text AnalysisDH Tools Workshop #1:  Text Analysis
DH Tools Workshop #1: Text Analysis
 
A Bridge Not too Far
A Bridge Not too FarA Bridge Not too Far
A Bridge Not too Far
 
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Heterogeneous Graph Attentio...
 
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCFueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
 
DCLA14_Haythornthwaite_Absar_Paulin
DCLA14_Haythornthwaite_Absar_PaulinDCLA14_Haythornthwaite_Absar_Paulin
DCLA14_Haythornthwaite_Absar_Paulin
 
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and SensemakingAuto Mapping Texts for Human-Machine Analysis and Sensemaking
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
 
Semantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representationSemantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representation
 
XCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the WildXCoref: Cross-document Coreference Resolution in the Wild
XCoref: Cross-document Coreference Resolution in the Wild
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

More from Anastasia Zhukova

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...Anastasia Zhukova
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Anastasia Zhukova
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...Anastasia Zhukova
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Anastasia Zhukova
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Anastasia Zhukova
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingAnastasia Zhukova
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Anastasia Zhukova
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Anastasia Zhukova
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsAnastasia Zhukova
 

More from Anastasia Zhukova (9)

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 

Recently uploaded

A Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert EinsteinA Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert Einsteinxgamestudios8
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandRcvets
 
Taphonomy and Quality of the Fossil Record
Taphonomy and Quality of the  Fossil RecordTaphonomy and Quality of the  Fossil Record
Taphonomy and Quality of the Fossil RecordSangram Sahoo
 
Polyethylene and its polymerization.pptx
Polyethylene and its polymerization.pptxPolyethylene and its polymerization.pptx
Polyethylene and its polymerization.pptxMuhammadRazzaq31
 
Heads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfHeads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfbyp19971001
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...yogeshlabana357357
 
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfFORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfSuchita Rawat
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfpablovgd
 
Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentslevieagacer
 
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...dkNET
 
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)kushbuR
 
Heat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree daysHeat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree daysBrahmesh Reddy B R
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Fabiano Dalpiaz
 
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxSaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxPat (JS) Heslop-Harrison
 
Warming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptxWarming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptxGlendelCaroz
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneySérgio Sacani
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Ansari Aashif Raza Mohd Imtiyaz
 
Classification of Kerogen, Perspective on palynofacies in depositional envi...
Classification of Kerogen,  Perspective on palynofacies in depositional  envi...Classification of Kerogen,  Perspective on palynofacies in depositional  envi...
Classification of Kerogen, Perspective on palynofacies in depositional envi...Sangram Sahoo
 

Recently uploaded (20)

A Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert EinsteinA Scientific PowerPoint on Albert Einstein
A Scientific PowerPoint on Albert Einstein
 
Factor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary GlandFactor Causing low production and physiology of mamary Gland
Factor Causing low production and physiology of mamary Gland
 
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
Chemistry Data Delivery from the US-EPA Center for Computational Toxicology a...
 
Taphonomy and Quality of the Fossil Record
Taphonomy and Quality of the  Fossil RecordTaphonomy and Quality of the  Fossil Record
Taphonomy and Quality of the Fossil Record
 
Polyethylene and its polymerization.pptx
Polyethylene and its polymerization.pptxPolyethylene and its polymerization.pptx
Polyethylene and its polymerization.pptx
 
Heads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdfHeads-Up Multitasker: CHI 2024 Presentation.pdf
Heads-Up Multitasker: CHI 2024 Presentation.pdf
 
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
Soil and Water Conservation Engineering (SWCE) is a specialized field of stud...
 
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdfFORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
FORENSIC CHEMISTRY ARSON INVESTIGATION.pdf
 
NuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdfNuGOweek 2024 programme final FLYER short.pdf
NuGOweek 2024 programme final FLYER short.pdf
 
Electricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 studentsElectricity and Circuits for Grade 9 students
Electricity and Circuits for Grade 9 students
 
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
dkNET Webinar: The 4DN Data Portal - Data, Resources and Tools to Help Elucid...
 
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)PHOTOSYNTHETIC BACTERIA  (OXYGENIC AND ANOXYGENIC)
PHOTOSYNTHETIC BACTERIA (OXYGENIC AND ANOXYGENIC)
 
Heat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree daysHeat Units in plant physiology and the importance of Growing Degree days
Heat Units in plant physiology and the importance of Growing Degree days
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptxSaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
SaffronCrocusGenomicsThessalonikiOnlineMay2024TalkOnline.pptx
 
Warming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptxWarming the earth and the atmosphere.pptx
Warming the earth and the atmosphere.pptx
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
 
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
Molecular and Cellular Mechanism of Action of Hormones such as Growth Hormone...
 
HIV AND INFULENZA VIRUS PPT HIV PPT INFULENZA VIRUS PPT
HIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPTHIV AND INFULENZA VIRUS PPT HIV PPT  INFULENZA VIRUS PPT
HIV AND INFULENZA VIRUS PPT HIV PPT INFULENZA VIRUS PPT
 
Classification of Kerogen, Perspective on palynofacies in depositional envi...
Classification of Kerogen,  Perspective on palynofacies in depositional  envi...Classification of Kerogen,  Perspective on palynofacies in depositional  envi...
Classification of Kerogen, Perspective on palynofacies in depositional envi...
 

Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations

  • 1. Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1 Problem • Text dataset exploration lacks of methods allow text comparison and content interpretation • SoTA: uni-/bigram instead of phrase level • Lack of identification of complex coreferential relations 2. Detailed representation On cluster level: NI-CDCR (near-identity cross-document coreference resolution) Contact Web: https://dke.uni-wuppertal.de/en/ Email: zhukova@uni-wuppertal.de, felix.hamborg@uni.kn 1 University of Wuppertal, 2 University of Konstanz Research question How to explore text datasets to ensure comparability across texts, interpretability of representation, and employment of the complex semantic relations among text components? Features • Histogram-matrix → comparative text exploration • Dimensions o LDA-based clusters o user-defined metadata / extracted information • Resolution of semantically complex and lexical diverse concepts (NI-CDCR) o Identity: “Cuba” = “the Caribbean Pearl” o Near-identity: “US government” ~ “Trump Administration” ~ “White House” o Context identity: “resuming its nuclear program” = “Tehran's nuclear ambitions” • Easy-to-interpret document representation • Three levels of analysis: cluster, detailed, and document view • User-controlled number of concepts Metadata: news slant / frame / subtopic 0+2 5 6 7 3 1 4+9 8 RR R M LL L 0 2 Document view Metadata: event id Detailed view Methodology 1. High-level representation On article level: LDA • cosine similarity between LDA-topic vectors • small squares: articles • 𝑠𝑖𝑚 ≥ 0.7 → related 0 1 2 3 4 5 6 7 8 9 4+9 4+9 0+2 0+2 Event ids “Trump” entity is mentioned across all clusters 253 Cluster view