Document Informed Neural Autoregressive
Topic Models with Distributional Prior
Authors: Pankaj Gupta (1,2), Yatin Chaudhary (2), Florian Büttner (2) and Hinrich Schütze (1)
Presenter: Pankaj Gupta | AAAI-19, Honolulu, Hawaii, USA | January 2019
(1) CIS, University of Munich (LMU) | (2) Machine Intelligence, Siemens AG
Outline
Motivation
→ Context awareness in Topic Models
→ Distributional semantics in Neural Topic Model
→ Baseline: DocNADE
Proposed Models
→ DocNADE + context awareness, i.e., iDocNADE
→ DocNADE + word embeddings priors, i.e., DocNADEe
Evaluation
→ Generalization, Topic Coherence, Text retrieval and classification
This Work: CONTRIBUTIONS
Incorporate context information around words
→ determining the actual meaning of ambiguous words
→ improving word and document representations
Improve topic modeling for short-text and long-text documents
Incorporate external knowledge for each word
→ using distributional semantics, i.e., pre-trained word embeddings
→ improving document representations and topics
Motivation 1: Need for Full Contextual Information

Source Text → Sense/Topic
In biological brains, we study noisy neurons at cellular level → “biological neural network”
Like biological brains, study of noisy neurons in artificial neural networks → “artificial neural network”

Preceding Context + Following Context → Sense/Topic of “neurons”
Like biological brains, study of noisy (preceding context only) → “biological neural network” (wrong sense)
Like biological brains, study of noisy + in artificial neural networks → “artificial neural network” (correct sense)

Context information around words helps in determining their actual meaning!
Motivation 2: Need for Distributional Semantics or Prior Knowledge
➢ “Lack of context” in short-text documents, e.g., headlines, tweets, etc.
➢ “Lack of context” in a corpus of few documents

Small number of word co-occurrences → lack of context
→ difficult to learn good representations
→ generates incoherent topics

Example topics for ‘trading’:
Topic 1: price, wall, china, fall, shares (incoherent)
Topic 2: shares, price, profits, rises, earnings (coherent)

With 1-hot encodings, two documents about ‘trading’ may share no word overlap even though they belong to the same topic class.

TO THE RESCUE: use external/additional information, e.g., WORD EMBEDDINGS
(word embeddings encode semantic and syntactic relatedness of words in a vector space)
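To make this concrete, below is a minimal sketch contrasting 1-hot encodings with embedding vectors; the embedding values are made-up toy numbers standing in for real pre-trained vectors (e.g., GloVe), not any actual embedding model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab = ["shares", "profits", "china"]

# 1-hot encodings: distinct words are always orthogonal, so "shares" and
# "profits" look exactly as unrelated as "shares" and "china".
one_hot = np.eye(len(vocab))
print(cosine(one_hot[0], one_hot[1]))  # 0.0

# Toy embedding vectors (illustrative values only): semantically related
# words lie close together in the vector space.
emb = {"shares":  np.array([0.9, 0.1, 0.3]),
       "profits": np.array([0.8, 0.2, 0.4]),
       "china":   np.array([0.1, 0.9, 0.2])}
print(cosine(emb["shares"], emb["profits"]))  # high similarity (~0.98)
print(cosine(emb["shares"], emb["china"]))    # low similarity  (~0.27)
```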
Baseline: Neural Autoregressive Topic Model (DocNADE)
A probabilistic graphical model inspired by the RBM, RSM and NADE models
➢ learns topics over sequences of words, v
➢ learns distributed word representations based on word co-occurrences
➢ follows the encoder-decoder principle of unsupervised learning
➢ computes the joint distribution (and thus the log-likelihood) of a document v in language-modeling fashion via autoregressive conditionals, i.e., predicts the word v_i given the sequence of preceding words v_{<i}:

p(\mathbf{v}) = \prod_{i=1}^{D} p(v_i \mid \mathbf{v}_{<i})

Each autoregressive conditional, e.g., p(v_3 \mid \mathbf{v}_{<3}), is computed via a feed-forward neural network:

Encoding (embedding aggregation): \mathbf{h}_i(\mathbf{v}_{<i}) = g\big(\mathbf{c} + \sum_{k<i} \mathbf{W}_{:,v_k}\big)
Decoding: p(v_i = w \mid \mathbf{v}_{<i}) = \frac{\exp(b_w + \mathbf{U}_{w,:}\,\mathbf{h}_i(\mathbf{v}_{<i}))}{\sum_{w'} \exp(b_{w'} + \mathbf{U}_{w',:}\,\mathbf{h}_i(\mathbf{v}_{<i}))}

where \mathbf{W} is the topic (word-representation) matrix, \mathbf{U} the decoding matrix, \mathbf{c}, \mathbf{b} the biases, and g(\cdot) a non-linearity.
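The following is a minimal NumPy sketch of these autoregressive conditionals with random stand-in parameters; it mirrors the encoding/decoding equations above but is not the authors' implementation (their code is linked at the end of the deck).

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 1000, 50                    # vocabulary size, hidden (topic) dimension
W = rng.normal(0.0, 0.01, (H, V))  # topic / word-representation matrix
U = rng.normal(0.0, 0.01, (V, H))  # decoding matrix
c, b = np.zeros(H), np.zeros(V)    # hidden and visible biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(doc):
    """log p(v) = sum_i log p(v_i | v_<i), for a document given as word indices."""
    logp, agg = 0.0, c.copy()
    for v_i in doc:
        h_i = sigmoid(agg)                 # encoding: h_i(v_<i)
        logits = b + U @ h_i               # decoding: scores over the vocabulary
        logits -= logits.max()             # numerical stability for the softmax
        probs = np.exp(logits) / np.exp(logits).sum()
        logp += np.log(probs[v_i])         # autoregressive conditional p(v_i | v_<i)
        agg += W[:, v_i]                   # aggregate v_i's embedding for the next step
    return logp

print(log_likelihood([3, 42, 7, 99]))      # log-likelihood of a toy 4-word document
```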
Baseline: Neural Autoregressive Topic Model (DocNADE)
Limitations
➢ does not take into account the following words v>i in the sequence
➢ poor at modeling short-text documents due to limited context
➢ does not use pre-trained word embeddings (or external knowledge)
Proposed Neural Architectures, extending DocNADE to
→ incorporate full contextual information
→ incorporate pre-trained word embeddings
Proposed Variant 1: Contextualized DocNADE (iDocNADE)
➢ incorporates full contextual information around words in a document (preceding words v<i and following words v>i)
➢ boosts the likelihood of each word and, subsequently, the document likelihood
➢ improved representation learning
Need for full context: DocNADE → incomplete context around words; iDocNADE → full context around words
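A hedged sketch of iDocNADE's bidirectional likelihood, reusing log_likelihood() from the DocNADE sketch above: the document log-likelihood is taken as the mean of a forward pass (conditioning on v<i) and a backward pass (conditioning on v>i). The actual model maintains separate forward/backward biases with a shared topic matrix, which this simplification glosses over.

```python
def bidirectional_log_likelihood(doc):
    """iDocNADE-style sketch: average forward and backward autoregressive
    log-likelihoods, i.e. 1/2 * [sum_i log p(v_i|v_<i) + sum_i log p(v_i|v_>i)].
    Reversing the document makes each conditional depend on the following words."""
    forward = log_likelihood(doc)
    backward = log_likelihood(list(reversed(doc)))
    return 0.5 * (forward + backward)
```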
Proposed Variant 2: DocNADE + Embedding Priors ‘e’ (DocNADEe)
➢ introduces a weighted aggregation of pre-trained word embeddings at each autoregressive step k
➢ E: matrix of pre-trained embeddings, used as a fixed prior
➢ generates topics with embeddings
➢ learns a complementary textual representation
Encoding with the embedding prior: \mathbf{h}_i(\mathbf{v}_{<i}) = g\big(\mathbf{c} + \sum_{k<i} \mathbf{W}_{:,v_k} + \lambda \sum_{k<i} \mathbf{E}_{:,v_k}\big), where \lambda is the mixture weight
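A sketch of this embedding prior, modifying the DocNADE sketch above: a fixed embedding matrix E (here randomly initialized as a stand-in for real pre-trained embeddings) is aggregated alongside W at every autoregressive step, scaled by the mixture weight λ.

```python
lam = 0.5                                   # mixture weight lambda (a hyperparameter)
E = rng.normal(0.0, 0.01, (H, V))           # stand-in for fixed pre-trained embeddings

def log_likelihood_e(doc):
    """DocNADEe-style sketch: DocNADE's hidden state additionally aggregates
    the fixed pre-trained embeddings E, weighted by the mixture weight lam."""
    logp, agg = 0.0, c.copy()
    for v_i in doc:
        h_i = sigmoid(agg)
        logits = b + U @ h_i
        logits -= logits.max()
        probs = np.exp(logits) / np.exp(logits).sum()
        logp += np.log(probs[v_i])
        agg += W[:, v_i] + lam * E[:, v_i]  # embedding-prior term added here
    return logp
```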
Proposed Variant 3: iDocNADE + Embedding Priors ‘e’ (iDocNADEe)
Variant 1 (iDocNADE) + Variant 2 (DocNADEe) → Variant 3 (iDocNADEe)
Evaluation: Datasets, Statistics and Properties
➢ 8 short-text and 7 long-text datasets
➢ short-text → fewer than 25 words per document
➢ includes a corpus of few documents (e.g., 20NSsmall)
➢ topics: 50 and 200 (hidden-layer size)
➢ Quantitative evaluation via:
- generalization (perplexity, PPL)
- interpretability (topic coherence)
- text/information retrieval (IR)
- text classification
Evaluation: Generalization via Perplexity (PPL)
Generalization (PPL) → lower is better
→ on short-text datasets: Gain (%) of 4.1%, 4.3% and 5.1%
→ on long-text datasets: Gain (%) of 5.3%, 4.8% and 5.5%
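Perplexity here is the exponentiated negative average per-word log-likelihood. A minimal sketch, reusing log_likelihood() from the DocNADE sketch above and assuming each document is a non-empty list of word indices:

```python
def perplexity(docs):
    """PPL = exp( -(1/N) * sum_t log p(v_t) / |v_t| ); lower is better."""
    avg = np.mean([log_likelihood(doc) / len(doc) for doc in docs])
    return float(np.exp(-avg))
```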
Evaluation: Applicability (Information Retrieval)
IR-precision → precision at retrieval fraction 0.02 → higher is better
→ on short-text datasets: Gain (%) of 5.6%, 7.4% and 11.1%
→ on long-text datasets: Gain (%) of 7.1%, 7.1% and 7.1%
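A sketch of this IR setup under stated assumptions: document representations (e.g., the models' hidden vectors) are compared by cosine similarity, and for each query we report the fraction of same-label documents among the top 2% retrieved. The array names and shapes here are illustrative, not the paper's evaluation code.

```python
def ir_precision(query_vecs, query_labels, doc_vecs, doc_labels, fraction=0.02):
    """Average label-match precision among the top `fraction` of the corpus,
    ranked by cosine similarity of document representations."""
    doc_unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    k = max(1, int(fraction * len(doc_vecs)))
    precisions = []
    for q, y in zip(query_vecs, query_labels):
        sims = doc_unit @ (q / np.linalg.norm(q))   # cosine similarities
        top_k = np.argsort(-sims)[:k]               # indices of best matches
        precisions.append(np.mean(doc_labels[top_k] == y))
    return float(np.mean(precisions))
```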
Evaluation: Applicability (Information Retrieval)
[Figure: precision vs. retrieval-fraction curves on the TMNtitle and AGnewstitle datasets]
Evaluation: Interpretability (Topic Coherence)
→ assesses the meaningfulness of the underlying topics captured
→ coherence measure proposed by Roeder, Both, and Hinneburg (2015)
→ higher scores imply more coherent topics
→ reported on both short-text and long-text datasets
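One widely used implementation of the Roeder et al. (2015) coherence measures is gensim's CoherenceModel; below is a hedged sketch with placeholder topics and a placeholder corpus, not the paper's data or exact configuration.

```python
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Placeholder tokenized corpus and one candidate topic (illustrative only).
texts = [["shares", "price", "profits", "rises", "earnings"],
         ["price", "shares", "market", "profits", "earnings"]]
topics = [["shares", "price", "profits", "rises", "earnings"]]

dictionary = Dictionary(texts)
cm = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary,
                    coherence="c_v")
print(cm.get_coherence())  # higher scores imply more coherent topics
```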
Evaluation: Qualitative Topics (e.g., ‘religion’)
[Figure: example topics showing coherent topic words]
Conclusion: Take Away
➢ Leverages full contextual information in a neural autoregressive topic model
➢ Introduces distributional priors via pre-trained word embeddings
➢ Gains, on average over 15 datasets:
- 5.2% (404 vs 426) in perplexity
- 2.8% (.74 vs .72) in topic coherence
- 11.1% (.60 vs .54) in precision at retrieval fraction 0.02
- 5.2% (.664 vs .631) in F1 for text categorization
➢ Learns better word/document representations for short and long texts
➢ Unifies state-of-the-art topic models with textual representation learning

Try out: the code and data are available at https://github.com/pgcool/iDocNADEe
Thanks!
“textTOvec”: latest work, to appear at ICLR-19 | A Neural Topic Model with Language Structures