Informed Neural Topic Model with Pre-trained Word Embeddings
Document Informed Neural Autoregressive Topic Models with Distributional Prior
Pankaj Gupta1,2, Yatin Chaudhary2, Florian Buettner2 & Hinrich Schütze1
1 CIS, University of Munich (LMU), Germany
2 Corporate Technology, Machine-Intelligence, Siemens AG Munich, Germany
pankaj.gupta@campus.lmu.de | pankaj.gupta@siemens.com
Introduction
A novel neural autoregressive topic model for short and long texts, empowered by:
• Context-awareness for learning better representations
• Distributional semantics, i.e., pre-trained word embeddings as prior knowledge (sketched in the equations below)
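As a worked equation for these two points, the autoregressive factorization below follows DocNADE [1]; the bidirectional variant is a sketch of how iDocNADE informs each prediction with the full surrounding context (the paper's exact parameterization may differ):

```latex
% DocNADE [1]: each word v_i is predicted from the preceding words only
\log p(\mathbf{v}) = \sum_{i=1}^{D} \log p(v_i \mid \mathbf{v}_{<i})

% iDocNADE (sketch): average forward and backward autoregressive
% log-likelihoods, so position i is informed by the full document
\log p(\mathbf{v}) \approx \frac{1}{2} \sum_{i=1}^{D}
  \Big[ \log \overrightarrow{p}(v_i \mid \mathbf{v}_{<i})
      + \log \overleftarrow{p}(v_i \mid \mathbf{v}_{>i}) \Big]
```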
Problem Statement / Motivation
1. “Need for context-awareness in representation learning”?
• To determine the actual meaning of ambiguous words
• To improve word and document representations
Figure 1: Need for context-awareness in learning representations
2. “Need for prior knowledge in limited-context settings”?
• Sparse word co-occurrences in short texts (e.g., headlines, tweets) or small corpora
• Difficult to learn good representations → generates incoherent topics
Figure 2: (left) Word embedding similarity; (right) topic examples
Figure 3: Contributions in this work
Evaluation and Analysis
• 8 short-text and 7 long-text datasets from news, Q&A, sentiment and industrial domains
• Generalization (perplexity), interpretability (topic coherence), text retrieval (IR) and classification (see the metric sketch after the tables below)
Table 1: Perplexity (PPL) and IR-precision (at retrieval fraction 0.02) scores for short and long texts
Table 2: IR-precision at different retrieval fractions
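For concreteness, a minimal sketch of the two headline metrics, assuming a model that returns per-document log-likelihoods and document vectors (function names and inputs here are illustrative, not from the released code): PPL exponentiates the negative per-word log-likelihood, and IR-precision at fraction f is the share of the top f·N retrieved documents (ranked by cosine similarity) that carry the query's label.

```python
import numpy as np

def perplexity(doc_log_likelihoods, doc_lengths):
    """Per-word perplexity: exp(- sum_d log p(v^d) / sum_d |v^d|)."""
    return np.exp(-np.sum(doc_log_likelihoods) / np.sum(doc_lengths))

def ir_precision(query_vec, query_label, doc_vecs, doc_labels, fraction=0.02):
    """Share of the top `fraction` nearest documents sharing the query's label."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    k = max(1, int(fraction * len(doc_labels)))
    top_k = np.argsort(-sims)[:k]          # indices of the k most similar docs
    return float(np.mean(doc_labels[top_k] == query_label))
```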
Methodology: Document Neural Autoregressive Topic Models
Figure 4: (left) DocNADE [1] (baseline); (right) iDocNADE (DocNADE + context-awareness)
Figure 5: (left) DocNADE [1] (baseline); (right) DocNADEe (DocNADE + word embeddings)
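A minimal NumPy sketch of the core computation behind Figures 4 and 5, assuming the embedding matrix E shares the hidden dimension (shapes and the mixing weight lam are illustrative; the released TensorFlow code linked below is the reference): DocNADE accumulates learned word columns W[:, v_k] over the preceding words, and DocNADEe additionally injects the matching columns of a fixed pre-trained embedding matrix E as prior knowledge.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_states(doc, W, c, E=None, lam=0.5):
    """h_i = g(c + sum_{k<i} W[:, v_k] [+ lam * E[:, v_k]]), g = sigmoid as in [1].

    doc : word indices v_1..v_D          W : learned matrix, shape (H, V)
    c   : hidden bias, shape (H,)        E : pre-trained embeddings, shape (H, V)
    lam : mixing weight for the DocNADEe embedding prior (hyperparameter)
    """
    acc, hs = np.zeros_like(c), []
    for v in doc:
        hs.append(sigmoid(c + acc))   # h_i sees only the words before position i
        acc += W[:, v]                # accumulate the learned representation
        if E is not None:
            acc += lam * E[:, v]      # DocNADEe: add distributional prior knowledge
    return np.stack(hs)               # shape (D, H)
```

Each h_i then parameterizes a softmax over the vocabulary for p(v_i | v_<i) (a binary-tree softmax in practice, for speed); iDocNADE runs the same accumulation in both directions.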
Table 3: (left) Topic coherence with the top 10 and 20 words; (right) qualitative example
Table 4: (left) Text classification (F1 and accuracy) scores for short texts
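The coherence score in Table 3 can be sketched as follows: topic k's top-N words are the largest entries of row k of W, and coherence averages a pairwise co-occurrence statistic over those words. The NPMI-style score below is a common simplified variant, not necessarily the exact measure used in the paper:

```python
import itertools
import numpy as np

def top_words(W, k, vocab, n=10):
    """Top-n words of topic k = largest entries in row k of W (shape H x V)."""
    return [vocab[j] for j in np.argsort(-W[k])[:n]]

def npmi_coherence(words, doc_sets, n_docs, eps=1e-12):
    """Mean NPMI over word pairs; doc_sets[w] = set of reference docs containing w."""
    scores = []
    for w1, w2 in itertools.combinations(words, 2):
        p1 = len(doc_sets[w1]) / n_docs
        p2 = len(doc_sets[w2]) / n_docs
        p12 = len(doc_sets[w1] & doc_sets[w2]) / n_docs
        p12 = min(max(p12, eps), 1.0 - eps)        # clamp to keep logs finite
        pmi = np.log(p12 / (p1 * p2 + eps))
        scores.append(pmi / -np.log(p12))          # normalize PMI into [-1, 1]
    return float(np.mean(scores))
```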
Conclusion & Key Takeaways
• Leverage full context + pre-trained word embeddings in a neural autoregressive topic model
• Gains of 5.2% (404 vs 426) in PPL, 2.8% (.74 vs .72) in topic coherence, 11.1% (.60 vs .54) in IR-precision, and 5.2% (.664 vs .631) in F1 for text categorization, on average over 15 datasets
• Demonstrate learning of better word/document representations for short and long texts
• Try it out: code available at https://github.com/pgcool/iDocNADEe
Our recent extension of this work: “textTOvec”
Pankaj Gupta, Yatin Chaudhary, Florian Buettner and Hinrich Schütze. textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior. To appear at ICLR 2019. TL;DR: a neural topic model with language structures.
References
[1] Hugo Larochelle and Stanislas Lauly. A neural autoregressive topic model. In Advances in Neural Information Processing Systems 25, pages 2708–2716. Curran Associates, Inc., 2012.