Malek Chebil, Rim Jallouli, Mohamed Anis Bach Tobji and Chiheb Eddine Ben Ncir. Topic modeling of marketing scientific papers: An experimental survey. (ICDEc 2021)
Language Models for Information Retrieval (Dustin Smith)
The document provides background information on Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, authors of the book "Introduction to Information Retrieval", whose chapter "Language models for information retrieval" the presentation follows. It then outlines the presentation, which discusses language models for information retrieval, including query likelihood models, estimation of query generation probabilities, and experiments comparing language modeling approaches to other IR techniques.
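As a rough illustration of the query likelihood idea, here is a minimal sketch (the corpus and query are invented for illustration) that ranks documents by log P(q|d) under Jelinek-Mercer smoothing:

```python
import math
from collections import Counter

def jm_score(query, doc, collection, lam=0.5):
    """Log query likelihood log P(q|d) with Jelinek-Mercer smoothing:
    P(t|d) = (1 - lam) * tf(t, d) / |d| + lam * cf(t) / |C|."""
    tf, cf = Counter(doc), Counter(collection)
    score = 0.0
    for t in query:
        p = (1 - lam) * tf[t] / len(doc) + lam * cf[t] / len(collection)
        score += math.log(p) if p > 0 else float("-inf")
    return score

# Toy two-document collection; the document that generates the query
# terms with higher probability is ranked first.
docs = ["the quick brown fox".split(),
        "language models for retrieval".split()]
collection = [t for d in docs for t in d]
query = "language retrieval".split()
ranked = sorted(docs, key=lambda d: jm_score(query, d, collection), reverse=True)
```

The interpolation with collection frequencies is what keeps unseen query terms from zeroing out the whole document score.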
The document presents an overview of probabilistic models for information retrieval. It discusses how probability theory can be applied to model the uncertain nature of retrieval, where queries only vaguely represent user needs and relevance is uncertain. The document outlines different probabilistic IR models including the classical probabilistic retrieval model, probability ranking principle, binary independence model, Bayesian networks, and language modeling approaches. It also describes datasets used to evaluate these models, including collections from TREC, Cranfield, and others. Basic probability theory concepts are reviewed, including joint probability, conditional probability, and rules relating probabilities.
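The basic probability identities the document reviews can be checked on a toy contingency table (the counts below are invented for illustration):

```python
from fractions import Fraction

# Invented counts: (term state, relevance state) over 10 judged documents.
counts = {("present", "relevant"): 3, ("present", "nonrelevant"): 1,
          ("absent", "relevant"): 1, ("absent", "nonrelevant"): 5}
total = sum(counts.values())

def p(event):                       # marginal probability of one state
    return Fraction(sum(v for k, v in counts.items() if event in k), total)

def p_joint(term_state, rel_state):
    return Fraction(counts[(term_state, rel_state)], total)

def p_cond(term_state, rel_state):  # P(term_state | rel_state)
    return p_joint(term_state, rel_state) / p(rel_state)

# Chain rule: P(a, b) = P(a | b) * P(b)
chain_ok = p_joint("present", "relevant") == p_cond("present", "relevant") * p("relevant")
# Bayes' rule: P(relevant | present) = P(present | relevant) * P(relevant) / P(present)
posterior = p_cond("present", "relevant") * p("relevant") / p("present")
```

Using exact rationals makes the identities hold with equality rather than up to floating-point error.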
The document discusses probabilistic retrieval models in information retrieval. It introduces three influential probabilistic models: (1) Maron and Kuhns' 1960 model which calculates the probability of relevance based on historical user data; (2) Salton's model which estimates the probability of term occurrence in relevant documents; (3) A model that ranks documents by the probability of relevance and considers retrieval as a decision between costs of retrieving non-relevant vs. not retrieving relevant documents. The document provides background on the development of probabilistic IR models and challenges of estimating probabilities for evaluation.
This summarizes an academic paper that proposes an automatic ontology creation method for classifying research papers. It uses text mining techniques like classification and clustering algorithms. It first builds a research ontology by extracting keywords and patterns from previous papers. It then uses a decision tree algorithm to classify new papers into disciplines defined in the ontology. The classified papers are then clustered based on similarities to group them. The method was tested on a dataset of 100 papers and achieved average precision of 85.7% for term-based and 89.3% for pattern-based keyword extraction.
The document discusses a study that trained a GPT-2 model to generate contextual definitions for words based on the provided context. The model was trained on a new dataset containing definition and context pairs from various sources. It was evaluated through surveys where human raters assessed definitions generated by the model for short and long contexts, as well as real human-generated definitions. The results found that while the model performed significantly better at generating definitions for short contexts compared to long ones, human-generated definitions were still significantly more accurate. Areas for improvement included reducing fluctuations depending on context and better interpreting some contexts.
Mining from Open Answers in Questionnaire Data (feiwin)
The document summarizes a system called Survey Analyzer (SA) that analyzes open-ended answers in questionnaire data. SA combines rule analysis, through classification and association rules, with correspondence analysis to automatically summarize open answers and mine useful information. It employs statistical learning methods such as stochastic complexity to acquire rules from categorized text and to classify new text. SA pairs each analysis target with its associated open answers in order to learn rules and analyze relationships between targets and words.
Order out of Chaos: Construction of Knowledge Models from PDF Textbooks (Isaac Alpizar-Chacon)
Textbooks are educational documents created, structured and formatted by domain experts with the main purpose to explain the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry the elements of domain knowledge implicitly encoded by their authors. Our paper presents an extendable approach towards automated extraction of this knowledge from textbooks taking into account their formatting rules and internal structure. We focus on PDF as the most common textbook representation format; however, the overall method is applicable to other formats as well. The evaluation experiments examine the accuracy of the approach, as well as the pragmatic quality of the obtained knowledge models using one of their possible applications --- semantic linking of textbooks in the same domain. The results indicate high accuracy of model construction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.
Presented at Document Engineering 2020
This document provides an overview of content analysis as a research technique. It defines content analysis as objective, systematic analysis and categorization of communication content. The workshop covers the procedures of content analysis, including design, unitizing, sampling, coding, drawing inferences and validation. Examples of using content analysis to analyze text data from interviews are presented, showing coding categories, frequency calculations and correlations. Content analysis is described as a useful technique for making inferences from qualitative data in an objective and replicable manner.
Mathematical Language Processing via Tree Embeddings (Sergey Sosnovsky)
This document proposes a framework that uses tree embeddings to process mathematical language via encoding equations as trees. The framework includes a novel encoder-decoder architecture that learns representations of mathematical formulae. This approach achieves state-of-the-art performance on formula retrieval tasks by computing the similarity between embedding vectors of query and dataset equations. Future work will explore joint processing of math and text, deploying the system for textbook search, and using the embeddings for open-ended math problem solving.
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks (Leonardo Di Donato)
Experimental work on using topic modeling to implement and improve common Information Retrieval and Word Sense Disambiguation tasks.
It first describes the scenario, the pre-processing pipeline, and the framework used, then discusses the investigation of different hyperparameter configurations for the LDA algorithm.
The work then addresses the retrieval of relevant documents through two approaches: inferring the topic distribution of a held-out document (or query) and comparing it against the collection's documents to retrieve similar ones, or ranking via probabilistic querying. The last part of the work is devoted to the word sense disambiguation task.
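One way to compare an inferred topic distribution of a query against those of collection documents, as the first approach describes, is a divergence measure over topic mixtures. A minimal sketch using Jensen-Shannon divergence (the distributions below are invented):

```python
import math

def kl(p, q):   # Kullback-Leibler divergence in bits (0 * log 0 treated as 0)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with base-2 log."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

query_topics = [0.7, 0.2, 0.1]   # topic mixture inferred for the held-out query
doc_a = [0.6, 0.3, 0.1]
doc_b = [0.1, 0.1, 0.8]
# smaller divergence = more topically similar, so doc_a ranks above doc_b
best = min([doc_a, doc_b], key=lambda d: js(query_topics, d))
```

JS divergence is a common choice here because, unlike plain KL, it is symmetric and defined even when a topic has zero probability on one side.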
Search: Probabilistic Information Retrieval (Vipul Munot)
Probabilistic information retrieval ranks documents by their estimated probability of relevance. The Binary Independence Model assumes binary relevance and independence between terms: documents and queries are represented as binary term-incidence vectors, and documents are ranked by their odds of relevance based on query term matches. In practice, with no relevance information available, the probability estimates fall back on collection statistics such as document frequencies. Extensions allow dependencies between terms and non-binary representations.
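A minimal sketch of BIM-style ranking under the usual no-relevance-information fallback, where each term weight reduces to an idf-like quantity (toy collection invented for illustration):

```python
import math

def bim_weights(docs):
    """BIM term weights with no relevance information, using the usual
    add-0.5 smoothed estimate: c_t = log((N - df_t + 0.5) / (df_t + 0.5))."""
    N = len(docs)
    vocab = {t for d in docs for t in d}
    return {t: math.log((N - sum(t in d for d in docs) + 0.5) /
                        (sum(t in d for d in docs) + 0.5)) for t in vocab}

def rsv(query, doc, weights):
    # retrieval status value: sum the weights of query terms present in the doc
    return sum(weights[t] for t in query if t in doc)

docs = [{"binary", "independence", "model"},
        {"markov", "chain"},
        {"news", "sports"},
        {"weather", "news"}]
w = bim_weights(docs)
scores = [rsv({"independence", "model"}, d, w) for d in docs]
```

Note the weights are per-term and binary in document representation: a term contributes the same amount whether it occurs once or many times, which is exactly the limitation the non-binary extensions address.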
The Boolean model is a classical information retrieval model based on set theory and Boolean logic. Queries are specified as Boolean expressions to retrieve documents that either contain or do not contain the query terms. All term frequencies are binary and documents are retrieved based on an exact match to the query terms. However, this model has limitations as it does not rank documents and queries are difficult for users to translate into Boolean expressions, often returning too few or too many results.
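The exact-match behaviour is easy to sketch with an inverted index of postings sets; Python set operators stand in for AND, OR, and AND NOT (toy collection invented for illustration):

```python
# Minimal Boolean retrieval: an inverted index mapping each term to the
# set of document ids containing it, queried with set operations.
docs = {
    0: "information retrieval with boolean logic",
    1: "probabilistic models of retrieval",
    2: "boolean algebra for circuits",
}
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def postings(term):
    return index.get(term, set())

# "boolean AND retrieval"
and_result = postings("boolean") & postings("retrieval")
# "boolean OR probabilistic"
or_result = postings("boolean") | postings("probabilistic")
# "retrieval AND NOT boolean"
not_result = postings("retrieval") - postings("boolean")
```

The results are unranked sets, which illustrates the model's main limitation: every matching document is returned with equal status.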
This document reviews probabilistic matrix factorization, topic modeling, collaborative topic modeling, and collaborative deep learning approaches. It describes the generative processes and graphical models of probabilistic matrix factorization, latent Dirichlet allocation, collaborative topic modeling, and collaborative deep learning. It also compares collaborative topic modeling and collaborative deep learning methods and discusses potential extensions, including using deep learning models instead of LDA to more deeply extract features from documents.
Integrating Textbooks with Smart Interactive Content for Learning Programming (Isaac Alpizar-Chacon)
Online textbooks with interactive content have emerged as a popular medium for learning programming and other computer science topics. While the textbook component supports acquisition of programming concepts by reading, various types of ``smart'' interactive learning content such as worked examples, code animations, Parson's puzzles, and coding problems allow students to immediately practice and master the newly learned concepts. This paper attempts to automate the time-consuming manual process of augmenting textbooks with ``smart'' interactive content. We introduce an ontology-based approach that can link fragments of text with ``smart'' content activities, demonstrate its application to two practical linking cases, and present the results of its pilot evaluation.
This document summarizes a research paper that proposes a new representation for relational learning that allows the use of propositional learning algorithms. The paper argues that traditional inductive logic programming (ILP) approaches have limitations like intractability and inefficiency. It presents a representation using a restricted first-order logic and graph structures that can be converted to propositions, enabling the use of propositional and probabilistic learning algorithms. An information extraction system using this approach achieved better performance than other ILP-based systems. The paper contributes a new paradigm for relational learning but did not fully analyze the contributions of its two-stage architecture.
Semantics2018 Zhang, Petrak, Maynard: Adapted TextRank for Term Extraction: A G... (Johann Petrak)
Slides for the talk about the paper:
Ziqi Zhang, Johann Petrak and Diana Maynard, 2018: Adapted TextRank for Term Extraction: A Generic Method of Improving Automatic Term Extraction Algorithms. Semantics-2018, Vienna, Austria
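For orientation, here is a plain TextRank-style sketch, not the paper's adapted variant: build a word co-occurrence graph over a sliding window and rank words by PageRank-style power iteration (toy token list invented for illustration):

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iters=50):
    # Build an undirected co-occurrence graph over a sliding window.
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    nodes = list(graph)
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):   # PageRank-style power iteration
        new = {}
        for n in nodes:
            new[n] = (1 - damping) + damping * sum(
                score[m] / len(graph[m]) for m in graph[n])
        score = new
    return sorted(nodes, key=score.get, reverse=True)

tokens = ("term extraction improves term ranking "
          "term graphs rank candidate term strings").split()
ranking = textrank(tokens)
```

Well-connected words (here, the repeated word "term") accumulate score and surface as term candidates; real systems add stopword filtering, POS filters, and multi-word candidate assembly on top.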
An efficient classification model for unstructured text document (SaleihGero)
The document presents a classification model for unstructured text documents that aims to support both generality and efficiency. The model follows the logical sequence of text classification steps and proposes a combination of techniques for each step. Specifically, it uses multinomial naive bayes classification with term frequency-inverse document frequency (TF-IDF) representation. The model is tested on the 20-Newsgroups dataset and results show improved performance over precision, recall, and f-score compared to other models.
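A minimal multinomial naive Bayes sketch with add-one smoothing; note it uses raw term counts rather than the TF-IDF weighting the model combines it with (toy training data invented for illustration):

```python
import math
from collections import Counter, defaultdict

def train_mnb(labeled_docs):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    class_docs = defaultdict(int)       # documents per class (for priors)
    class_terms = defaultdict(Counter)  # term counts per class
    vocab = set()
    for label, tokens in labeled_docs:
        class_docs[label] += 1
        class_terms[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    return class_docs, class_terms, vocab, total

def predict(model, tokens):
    class_docs, class_terms, vocab, total = model
    best, best_lp = None, float("-inf")
    for label in class_docs:
        lp = math.log(class_docs[label] / total)   # log prior
        denom = sum(class_terms[label].values()) + len(vocab)
        for t in tokens:                           # smoothed log likelihoods
            lp += math.log((class_terms[label][t] + 1) / denom)
        best, best_lp = (label, lp) if lp > best_lp else (best, best_lp)
    return best

model = train_mnb([
    ("sport", "goal match team win".split()),
    ("sport", "match score team".split()),
    ("tech", "code bug compiler".split()),
    ("tech", "compiler code release".split()),
])
```

Add-one smoothing keeps unseen words from driving a class's likelihood to zero, which matters greatly on small training sets like the 20-Newsgroups splits.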
Two Level Disambiguation Model for Query Translation (IJECEIAES)
Selecting the most suitable translation among all candidates returned by a bilingual dictionary has always been quite a challenging task in cross-language query translation. Researchers have frequently tried to use word co-occurrence statistics to determine the most probable translation of a user query. Algorithms using such statistics have certain shortcomings, which this paper addresses. We propose a novel method for ambiguity resolution, named the "two level disambiguation model". At the first level of disambiguation, the model properly weighs the importance of the translation alternatives of query terms obtained from the dictionary. The importance factor measures the probability of a translation candidate being selected as the final translation of a query term, which removes the problem of making a binary decision for translation candidates. At the second level of disambiguation, the model treats the user query as a single concept and deduces the translations of all query terms simultaneously, taking the weights of the translation alternatives into account. This is contrary to previous research, which selects the translation of each word in the source-language query independently. Experimental results with English-Hindi cross-language information retrieval show that the proposed two level disambiguation model achieved 79.53% and 83.50% of monolingual translation, and 21.11% and 17.36% improvement over greedy disambiguation strategies, in terms of MAP for short and long queries respectively.
Slides: Concurrent Inference of Topic Models and Distributed Vector Represen... (Parang Saraf)
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This document summarizes the agenda and key topics for a CIS 890 project final presentation on topic modelling with LDA. The presentation covers LDA modelling, HMM-LDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It discusses topic modelling approaches like LDA, discriminative vs. generative methods, and limitations of bag-of-words assumptions.
Survey of Generative Clustering Models 2008 (Roman Stanchak)
Survey of Generative Clustering Models "Probabilistic Topic Models" circa 2008. Class presentation by Roman Stanchak and Prithviraj Sen for University of Maryland College Park cmsc828g, Link Mining and Dynamic Graph Analysis. Spring 2008. Instructor: Prof. Lise Getoor
Topic modeling with big data analytics makes it possible to analyze very large datasets. It involves installing Hadoop on multiple nodes for distributed processing, preprocessing data into the desired format, and using modeling tools to parallelize computation and select algorithms. Topic modeling identifies patterns in corpora to develop new ways to search, browse, and summarize large text archives. Tools like Mallet implement algorithms such as LDA and PLSI to run topic modeling on Hadoop, applying it to analyze news articles, search engine rankings, genetic data, image data, and more.
Topic modeling is a technique for discovering hidden semantic patterns in large document collections. It represents documents as probability distributions over latent topics, where each topic is characterized by a distribution over words. Two common probabilistic topic models are latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (pLSA). LDA assumes each document exhibits multiple topics in different proportions, with topics modeled as distributions over words. Topic modeling provides dimensionality reduction and can be applied to problems like text classification, collaborative filtering, and computer vision tasks like image classification.
This document provides an overview of various information retrieval models and techniques used in search engines, including:
- Boolean, vector space, probabilistic models such as BM25, and language models are described as the classical retrieval models.
- Learning to rank uses machine learning techniques to optimize ranking functions using features and training data.
- Relevance feedback, query likelihood models, and pseudo-relevance feedback are discussed as techniques for improving retrieval effectiveness by incorporating user feedback.
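To make one of the listed models concrete, here is a sketch of Okapi BM25 scoring (toy collection invented for illustration; k1 and b set to common defaults):

```python
import math

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25: idf-weighted, saturated term frequency with
    document-length normalization against the average length."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for t in set(query):
        df = sum(t in d for d in docs)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        tf = doc.count(t)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = ["bm25 ranks documents by term weights".split(),
        "boolean retrieval returns unranked sets".split(),
        "language models smooth term estimates".split()]
scores = [bm25_score("bm25 term".split(), d, docs) for d in docs]
```

The k1 parameter controls how quickly repeated term occurrences saturate, and b controls how strongly long documents are penalized.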
Comparative study of classification algorithm for text based categorization (eSAT Journals)
Abstract
Text categorization is a process in data mining which assigns predefined categories to free-text documents using machine learning techniques. Any document in the form of text, image, music, etc. can be classified using some categorization technique. It provides conceptual views of the collected documents and has important applications in the real world. Text-based categorization is used for document classification with pattern recognition and machine learning. The advantages of a number of classification algorithms are studied in this paper to classify documents; examples of these algorithms are the Naive Bayes algorithm, K-Nearest Neighbor, and Decision Tree. This paper presents a comparative study of the advantages and disadvantages of the above-mentioned classification algorithms.
Keywords: Data Mining, Text Mining, Text Categorization, Machine Learning, Pattern Analysis, Naive Bayes, KNN, Decision Tree.
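A k-nearest-neighbour text classifier, one of the compared algorithms, can be sketched with cosine similarity over term-count vectors and a majority vote (toy training data invented for illustration):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, tokens, k=3):
    vec = Counter(tokens)
    # Take the k most similar training documents and vote on the label.
    neighbours = sorted(train, key=lambda lv: cosine(vec, lv[1]), reverse=True)[:k]
    votes = Counter(label for label, _ in neighbours)
    return votes.most_common(1)[0][0]

train = [("spam", Counter("win prize money now".split())),
         ("spam", Counter("free money prize".split())),
         ("ham", Counter("meeting agenda notes".split())),
         ("ham", Counter("project meeting schedule".split()))]
```

kNN needs no training phase at all, which is the flip side of its main disadvantage in the comparison: every prediction scans the whole training set.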
Combining IR with Relevance Feedback for Concept Location (Sonia Haiduc)
This document discusses using information retrieval and relevance feedback techniques to help with the concept location task during software maintenance and evolution. It describes the concept location process, challenges with query formulation, and how relevance feedback can help developers iteratively refine queries to more accurately locate relevant code. The document outlines two studies conducted that show relevance feedback generally improves the results of information retrieval for concept location, though the benefits depend on properly calibrating the relevance feedback parameters for each software system.
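Relevance feedback of the kind the studies evaluate is classically implemented with Rocchio-style query refinement: move the query vector toward documents the developer marked relevant and away from non-relevant ones (a sketch with invented weights; the studies' own parameter calibration is not reproduced here):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query refinement over sparse term-weight dicts:
    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    new = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new[t] = max(w, 0.0)   # negative weights are conventionally clipped to zero
    return new

q = {"parser": 1.0}
rel = [{"parser": 1.0, "grammar": 1.0}]
nonrel = [{"parser": 1.0, "network": 1.0}]
q2 = rocchio(q, rel, nonrel)
```

After one round, the refined query gains weight on terms from relevant code ("grammar") and suppresses terms from non-relevant code ("network"); the alpha, beta, gamma knobs are exactly the kind of parameters whose calibration the studies found to matter per system.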
What to read next? Challenges and Preliminary Results in Selecting Represen... (MOVING Project)
1. The document presents an approach for selecting representative documents from a set of search results to provide users with an overview of the content and subtopics. It compares different document representations, clustering algorithms, and selection methods on two datasets.
2. The evaluation measures of coverage and redundancy were found to be insufficient for accurately evaluating representativeness, as the scores increased with the number of selected documents and were sometimes independent of the actual selection method.
3. The research questions explored how document representation, clustering algorithm, and selection method influence coverage and redundancy, finding the choice of clustering had the largest impact. Coverage and redundancy were found to be inflated and not directly reflect representativeness.
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
This document provides an overview of topic modeling. It defines topic modeling as discovering the thematic structure of a corpus by modeling relationships between words and documents through learned topics. The document introduces Latent Dirichlet Allocation (LDA) as a widely used topic modeling technique. It outlines LDA's generative process and inference methods like Gibbs sampling and variational inference. The document also discusses extensions to LDA, evaluation strategies, open questions, and applications like topic labeling and browsing.
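The collapsed Gibbs sampling inference mentioned above can be sketched on a toy corpus (illustrative only; the data and hyperparameters are invented, and no optimization is attempted):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    ndk = [[0] * K for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
    nk = [0] * K                                # tokens assigned to each topic
    z = []                                      # topic assignment per token
    for di, d in enumerate(docs):               # random initialization
        zs = []
        for w in d:
            t = rng.randrange(K)
            zs.append(t)
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):                      # resample each token's topic
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]                   # remove the current assignment
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # full conditional: P(z=k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + V * beta) for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk, nkw

docs = ["apple banana apple fruit".split(),
        "banana fruit apple".split(),
        "gpu code compiler".split(),
        "code gpu kernel".split()]
ndk, nkw = lda_gibbs(docs, K=2)
```

The returned count tables, smoothed by alpha and beta, give the usual point estimates of the document-topic and topic-word distributions.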
Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic ... (Sergey Sosnovsky)
As textbooks evolve into digital platforms, they open a world of opportunities for Artificial Intelligence in Education (AIED) research. This paper delves into the novel use of textbooks as a source of high-quality labeled data for automatic keyword extraction, demonstrating an affordable and efficient alternative to traditional methods. By utilizing the wealth of structured information provided in textbooks, we propose a methodology for annotating corpora across diverse domains, circumventing the costly and time-consuming process of manual data annotation. Our research presents a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on this newly labeled dataset. This model is applied to keyword extraction tasks, with the model’s performance surpassing established baselines. We further analyze the transformation of BERT’s embedding space before and after the fine-tuning phase, illuminating how the model adapts to specific domain goals. Our findings substantiate textbooks as a resource-rich, untapped well of high-quality labeled data, underpinning their significant role in the AIED research landscape.
Mathematical Language Processing via Tree EmbeddingsSergey Sosnovsky
This document proposes a framework that uses tree embeddings to process mathematical language via encoding equations as trees. The framework includes a novel encoder-decoder architecture that learns representations of mathematical formulae. This approach achieves state-of-the-art performance on formula retrieval tasks by computing the similarity between embedding vectors of query and dataset equations. Future work will explore joint processing of math and text, deploying the system for textbook search, and using the embeddings for open-ended math problem solving.
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
Experimental work done regarding the use of Topic Modeling for the implementation and the improvement of some common tasks of Information Retrieval and Word Sense Disambiguation.
First of all it describes the scenario, the pre-processing pipeline realized and the framework used. After we we face a discussion related to the investigation of some different hyperparameters configurations for the LDA algorithm.
This work continues dealing with the retrieval of relevant documents mainly through two different approaches: inferring the topics distribution of the held out document (or query) and comparing it to retrieve similar collection’s documents or through an approach driven by probabilistic querying. The last part of this work is devoted to the investigation of the word sense disambiguation task.
Search: Probabilistic Information RetrievalVipul Munot
Probabilistic Information Retrieval uses probability rankings to effectively retrieve documents. It assumes binary relevance and independence between documents. The Binary Independence Model represents documents and queries as term vectors and estimates probabilities of relevance using term frequencies. Documents are ranked by their odds of relevance based on query term matches. In practice, probability estimates use collection frequencies. Extensions allow dependencies between terms and non-binary representations.
The Boolean model is a classical information retrieval model based on set theory and Boolean logic. Queries are specified as Boolean expressions to retrieve documents that either contain or do not contain the query terms. All term frequencies are binary and documents are retrieved based on an exact match to the query terms. However, this model has limitations as it does not rank documents and queries are difficult for users to translate into Boolean expressions, often returning too few or too many results.
This document reviews probabilistic matrix factorization, topic modeling, collaborative topic modeling, and collaborative deep learning approaches. It describes the generative processes and graphical models of probabilistic matrix factorization, latent Dirichlet allocation, collaborative topic modeling, and collaborative deep learning. It also compares collaborative topic modeling and collaborative deep learning methods and discusses potential extensions, including using deep learning models instead of LDA to more deeply extract features from documents.
Integrating Textbooks with Smart Interactive Content for Learning ProgrammingIsaac Alpizar-Chacon
Online textbooks with interactive content emerged as a popular medium for learning programming and other computer science topics. While the textbook component supports acquisition of programming concepts by reading, various types of ``smart'' interactive learning content such as worked examples, code animations, Parson's puzzles, and coding problems allow students to immediately practice and master the newly learned concepts. This paper attempts to automate the time-consuming manual process of augmenting textbooks with ``smart'' interactive content. We introduce an ontology-based approach that can link fragment of text with ``smart'' content activities, demonstrate its application to two practical linking cases, and present the results of its pilot evaluation.
This document summarizes a research paper that proposes a new representation for relational learning that allows the use of propositional learning algorithms. The paper argues that traditional inductive logic programming (ILP) approaches have limitations like intractability and inefficiency. It presents a representation using a restricted first-order logic and graph structures that can be converted to propositions, enabling the use of propositional and probabilistic learning algorithms. An information extraction system using this approach achieved better performance than other ILP-based systems. The paper contributes a new paradigm for relational learning but did not fully analyze the contributions of its two-stage architecture.
Semantics2018 Zhang, Petrak, Maynard: Adapted TextRank for Term Extraction: A G... (Johann Petrak)
Slides for the talk about the paper:
Ziqi Zhang, Johann Petrak and Diana Maynard, 2018: Adapted TextRank for Term Extraction: A Generic Method of Improving Automatic Term Extraction Algorithms. Semantics-2018, Vienna, Austria
An efficient-classification-model-for-unstructured-text-document (SaleihGero)
The document presents a classification model for unstructured text documents that aims to support both generality and efficiency. The model follows the logical sequence of text classification steps and proposes a combination of techniques for each step. Specifically, it uses multinomial naive Bayes classification with term frequency-inverse document frequency (TF-IDF) representation. The model is tested on the 20-Newsgroups dataset, and the results show improved precision, recall, and F-score compared to other models.
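The pipeline described above (TF-IDF features fed to a multinomial naive Bayes classifier) can be sketched in miniature. The toy corpus and labels below are invented for illustration, not the paper's 20-Newsgroups setup, and the from-scratch implementation stands in for whatever library the authors used:

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for the 20-Newsgroups data; labels are invented.
train = [
    ("sports", "the team won the game"),
    ("sports", "the player scored a goal"),
    ("tech", "the cpu runs the program"),
    ("tech", "the program uses memory"),
]
docs = [text.split() for _, text in train]
n_docs = len(docs)

# Inverse document frequency over the training corpus.
df = Counter(w for doc in docs for w in set(doc))
idf = {w: math.log(n_docs / df[w]) + 1.0 for w in df}
vocab = set(df)

def tfidf(tokens):
    tf = Counter(tokens)
    return {w: tf[w] * idf.get(w, 1.0) for w in tf}

# Multinomial naive Bayes trained on TF-IDF weights instead of raw counts.
class_counts = Counter(label for label, _ in train)
class_totals = defaultdict(float)
word_weights = defaultdict(lambda: defaultdict(float))
for label, text in train:
    for w, wt in tfidf(text.split()).items():
        word_weights[label][w] += wt
        class_totals[label] += wt

def predict(text):
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus Laplace-smoothed log likelihood of each token.
        score = math.log(class_counts[label] / n_docs)
        denom = class_totals[label] + len(vocab)
        for w in text.split():
            score += math.log((word_weights[label].get(w, 0.0) + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("the player won"))  # → sports
print(predict("the cpu memory"))  # → tech
```

Weighting the naive Bayes counts by TF-IDF rather than raw frequency is one common way to combine the two techniques the summary names.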
Two Level Disambiguation Model for Query Translation (IJECEIAES)
Selecting the most suitable translation among all the candidates returned by a bilingual dictionary has always been quite a challenging task in cross-language query translation. Researchers have frequently tried to use word co-occurrence statistics to determine the most probable translation for a user query. Algorithms using such statistics have certain shortcomings, which this paper addresses. We propose a novel method for ambiguity resolution, named the "two level disambiguation model". At the first level of disambiguation, the model properly weighs the importance of the translation alternatives of query terms obtained from the dictionary. The importance factor measures the probability of a translation candidate being selected as the final translation of a query term, which removes the problem of making a binary decision for translation candidates. At the second level of disambiguation, the model treats the user query as a single concept and deduces the translations of all query terms simultaneously, taking the weights of the translation alternatives into account. This is contrary to previous research, which selects the translation for each word in the source-language query independently. Experimental results with English-Hindi cross-language information retrieval show that the proposed two level disambiguation model achieved 79.53% and 83.50% of monolingual retrieval performance and improvements of 21.11% and 17.36% over greedy disambiguation strategies, in terms of MAP, for short and long queries respectively.
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent... (Parang Saraf)
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This document summarizes the agenda and key topics for a CIS 890 final project presentation on topic modelling with LDA. The presentation covers LDA modelling, HMM-LDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It also discusses topic modelling approaches such as LDA, discriminative vs. generative methods, and limitations of the bag-of-words assumption.
Survey of Generative Clustering Models 2008 (Roman Stanchak)
Survey of Generative Clustering Models "Probabilistic Topic Models" circa 2008. Class presentation by Roman Stanchak and Prithviraj Sen for University of Maryland College Park cmsc828g, Link Mining and Dynamic Graph Analysis. Spring 2008. Instructor: Prof. Lise Getoor
Topic modeling with big data analytics can scale to very large datasets. It involves installing Hadoop on multiple nodes for distributed processing, preprocessing data into the desired format, and using modeling tools to parallelize computation and select algorithms. Topic modeling identifies patterns in corpora, enabling new ways to search, browse, and summarize large text archives. Tools like Mallet implement algorithms such as LDA and PLSI on Hadoop, with applications to news articles, search-engine rankings, genetic data, image data, and more.
Topic modeling is a technique for discovering hidden semantic patterns in large document collections. It represents documents as probability distributions over latent topics, where each topic is characterized by a distribution over words. Two common probabilistic topic models are latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (pLSA). LDA assumes each document exhibits multiple topics in different proportions, with topics modeled as distributions over words. Topic modeling provides dimensionality reduction and can be applied to problems like text classification, collaborative filtering, and computer vision tasks like image classification.
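A collapsed Gibbs sampler is one standard way to fit the LDA model described above. The from-scratch sketch below (toy corpus; K, alpha, and beta are illustrative choices) shows the core update, in which each token's topic is resampled conditioned on all other assignments:

```python
import random
from collections import Counter

random.seed(0)

# Toy corpus with two obvious themes; K, alpha, beta are illustrative choices.
docs = [
    "ball goal team ball win".split(),
    "team goal win ball".split(),
    "code bug python code test".split(),
    "python test code bug".split(),
]
K, alpha, beta = 2, 0.1, 0.01
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Random initial topic assignment for every token, plus count tables.
z = [[random.randrange(K) for _ in d] for d in docs]
doc_topic = [Counter(zd) for zd in z]          # n_{d,k}
topic_word = [Counter() for _ in range(K)]     # n_{k,w}
topic_total = [0] * K
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        topic_word[z[d][i]][w] += 1
        topic_total[z[d][i]] += 1

def sample(weights):
    r = random.uniform(0, sum(weights))
    for k, wt in enumerate(weights):
        r -= wt
        if r <= 0:
            return k
    return len(weights) - 1

# Collapsed Gibbs sampling: resample each token's topic given all the others.
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            doc_topic[d][k_old] -= 1
            topic_word[k_old][w] -= 1
            topic_total[k_old] -= 1
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                for k in range(K)
            ]
            k_new = sample(weights)
            z[d][i] = k_new
            doc_topic[d][k_new] += 1
            topic_word[k_new][w] += 1
            topic_total[k_new] += 1

for k in range(K):
    print(f"topic {k}:", [w for w, _ in topic_word[k].most_common(3)])
```

On this tiny, well-separated corpus the two topics typically split into the "ball/goal/team" theme and the "code/python/bug" theme; production systems use optimized samplers or variational inference instead.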
This document provides an overview of various information retrieval models and techniques used in search engines, including:
- Boolean, vector space, probabilistic models like BM25, and language models are described as older retrieval models.
- Learning to rank uses machine learning techniques to optimize ranking functions using features and training data.
- Relevance feedback, query likelihood models, and pseudo-relevance feedback are discussed as techniques for improving retrieval effectiveness by incorporating user feedback.
Comparative study of classification algorithm for text based categorization (eSAT Journals)
Abstract
Text categorization is a process in data mining which assigns predefined categories to free-text documents using machine learning techniques. Any document in the form of text, image, music, etc. can be classified using some categorization technique. It provides conceptual views of the collected documents and has important applications in the real world. Text-based categorization is used for document classification with pattern recognition and machine learning. The advantages of a number of classification algorithms for classifying documents are studied in this paper; examples of these algorithms include the Naive Bayes algorithm, K-Nearest Neighbor, and Decision Tree. The paper presents a comparative study of the advantages and disadvantages of the above-mentioned classification algorithms.
Keywords: Data Mining, Text Mining, Text Categorization, Machine Learning, Pattern Analysis, Naive Bayes, KNN, Decision Tree.
Combining IR with Relevance Feedback for Concept Location (Sonia Haiduc)
This document discusses using information retrieval and relevance feedback techniques to help with the concept location task during software maintenance and evolution. It describes the concept location process, challenges with query formulation, and how relevance feedback can help developers iteratively refine queries to more accurately locate relevant code. The document outlines two studies conducted that show relevance feedback generally improves the results of information retrieval for concept location, though the benefits depend on properly calibrating the relevance feedback parameters for each software system.
What to read next? Challenges and Preliminary Results in Selecting Represen... (MOVING Project)
1. The document presents an approach for selecting representative documents from a set of search results to provide users with an overview of the content and subtopics. It compares different document representations, clustering algorithms, and selection methods on two datasets.
2. The evaluation measures of coverage and redundancy were found to be insufficient for accurately evaluating representativeness, as the scores increased with the number of selected documents and were sometimes independent of the actual selection method.
3. The research questions explored how document representation, clustering algorithm, and selection method influence coverage and redundancy, finding the choice of clustering had the largest impact. Coverage and redundancy were found to be inflated and not directly reflect representativeness.
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
This document provides an overview of topic modeling. It defines topic modeling as discovering the thematic structure of a corpus by modeling relationships between words and documents through learned topics. The document introduces Latent Dirichlet Allocation (LDA) as a widely used topic modeling technique. It outlines LDA's generative process and inference methods like Gibbs sampling and variational inference. The document also discusses extensions to LDA, evaluation strategies, open questions, and applications like topic labeling and browsing.
Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic ... (Sergey Sosnovsky)
As textbooks evolve into digital platforms, they open a world of opportunities for Artificial Intelligence in Education (AIED) research. This paper delves into the novel use of textbooks as a source of high-quality labeled data for automatic keyword extraction, demonstrating an affordable and efficient alternative to traditional methods. By utilizing the wealth of structured information provided in textbooks, we propose a methodology for annotating corpora across diverse domains, circumventing the costly and time-consuming process of manual data annotation. Our research presents a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on this newly labeled dataset. This model is applied to keyword extraction tasks, with the model’s performance surpassing established baselines. We further analyze the transformation of BERT’s embedding space before and after the fine-tuning phase, illuminating how the model adapts to specific domain goals. Our findings substantiate textbooks as a resource-rich, untapped well of high-quality labeled data, underpinning their significant role in the AIED research landscape.
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly... (Angelo Salatino)
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of re-search areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles yielding a significant improvement over alternative methods.
This document outlines a course on Knowledge Representation (KR) on the Web. The course aims to expose students to challenges of applying traditional KR techniques to the scale and heterogeneity of data on the Web. Students will learn about representing Web data through formal knowledge graphs and ontologies, integrating and reasoning over distributed datasets, and how characteristics such as volume, variety and veracity impact KR approaches. The course involves lectures, literature reviews, and milestone projects where students publish papers on building semantic systems, modeling Web data, ontology matching, and reasoning over large knowledge graphs.
This document discusses topic extraction for domain ontology. It describes domain ontology as a collection of vocabularies and conceptualization of a given domain. The purpose of topic extraction is to identify relevant concepts in documents, obtain domain-specific terms, classify documents, and identify key concepts and relationships for an ontology. The project stages include obtaining domain knowledge, preprocessing documents, and applying either K-Means clustering or Latent Dirichlet Allocation to extract topics. K-Means partitions data into clusters while LDA represents documents as mixtures over topics characterized by word distributions.
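The K-Means alternative mentioned above can be sketched with a tiny from-scratch implementation over term-frequency vectors. The documents, vocabulary, and farthest-point initialisation below are illustrative choices, not the project's actual setup:

```python
import math
from collections import Counter

# Toy documents from two domains; vocabulary and k=2 are illustrative choices.
docs = [
    "ontology concept domain term",
    "domain ontology concept vocabulary",
    "neural network training layer",
    "network layer neural weights",
]
vocab = sorted({w for d in docs for w in d.split()})

def vectorize(text):
    # Raw term-frequency vector over the shared vocabulary.
    tf = Counter(text.split())
    return [tf[w] for w in vocab]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # Farthest-point initialisation keeps this toy example deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every document.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

labels = kmeans([vectorize(d) for d in docs], 2)
print(labels)  # → [0, 0, 1, 1]: the two ontology documents vs. the two neural ones
```

Unlike LDA, which assigns each document a mixture over topics, this partitions every document into exactly one cluster, which is the contrast the summary draws.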
Introduction to Text Mining and Topic Modelling (David Paule)
A brief introduction to Text Mining and Topic Modelling given at the Urban Big Data Centre (University of Glasgow).
Want to know more? Visit my website davidpaule.es
This document provides an overview of design-based research (DBR) for studying educational innovations. It discusses DBR as a flexible methodology that uses iterative design, development, implementation, and analysis to improve educational practices and develop design principles and theories. Key aspects of DBR include collaboration between researchers and practitioners in real-world settings, qualitative and multimethod approaches, and exploring new domains to design effective solutions while allowing theories to emerge. The document also provides recommendations for conducting DBR, such as rigorous data collection and clear project structure.
1 Nova Southeastern University College of Computing.docx (ShiraPrater50)
Nova Southeastern University
College of Computing and Engineering
Master of Management Information Systems
MMIS 643 Data Mining
Fall 2019 (August 19 – December 8, 2019)
Class Project
Due Date: November 17, 2019 (Firm)
Instructor: Dr. Junping Sun

In this project, you will be expected to do a comprehensive literature search and survey, select and study a specific topic in one subject area of data mining and its applications in business intelligence and analytics (BIA), and write a research paper on the selected topic by yourself. The research paper can be a detailed comprehensive study of a specific topic or original research work done by yourself.

Requirements and Instructions for the Research Paper:
1. The objective of the paper should be very clear about the subject, scope, domain, and the goals to be achieved.
2. The paper should address important advanced and critical issues in a specific area of data mining and its applications in business intelligence and analytics. Your research paper should emphasize not only breadth of coverage but also depth of coverage in the specific area.
3. The research paper should give measurable conclusions and future research directions (this is your contribution).
4. It may be beneficial to review or browse through about 15 to 20 relevant technical articles before deciding on the topic of the research project.
5. The research paper can be:
a. A literature review of data mining techniques and their applications for business intelligence and analytics.
b. An in-depth study and examination of data mining techniques with technical details.
c. Applied research that applies a data mining method to solve a real-world application in the domain of BIA.
6. The research paper should reflect quality at an academic research level.
7. The paper should be at least 3000-3500 words, double-spaced.
8. The paper should include an adequate abstract or introduction, and a reference list.
9. Please write the paper in your own words, and give the names of references, citations, and sources of reference materials if you use statements from other reference articles.
10. From a systematic study point of view, you may want to read a list of technical papers from relevant magazines, journals, conference proceedings, and theses in the area of the topic you choose.
11. For the format and style of your research paper, please refer to the CEC Dissertation Guide (http://cec.nova.edu/doctoral/documents/nsu-cec-dissertation-guides.html), the Publication Manual of the APA, or the format of ACM and IEEE journal publications.

Suggested and Possible Topics for the Written Report (But Not Limited To):
Supervised Learning Methods:
Classification Methods:
Regression Methods
Multiple Linear Regression
Logistic Regression ...
Topic Extraction using Machine Learning (Sanjib Basak)
This document discusses topic extraction using machine learning techniques. It provides a history of topic models, including TF-IDF, LSI, pLSI and LDA. It describes how LDA uses a hierarchical Bayesian model to represent documents as mixtures of topics and topics as mixtures of words. The document demonstrates LDA and k-means topic modeling in R and Spark. It concludes that LDA provides mixtures of topics while k-means provides distinct topics, and unsupervised LDA may need domain experts to improve topic representation.
May 2024 - Top10 Cited Articles in Natural Language Computing (kevig)
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that can analyze, understand, and generate the languages humans naturally use to address computers.
Cross-domain Document Retrieval: Matching between Conversational and Formal W... (Jinho Choi)
This paper challenges a cross-genre document retrieval task, where the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode in a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ the current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach to improve the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when structure reranking is applied, which is very promising.
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title... (Jennifer D'Souza)
We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
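Rule-based title mining of this kind can be illustrated with a couple of regular expressions. The patterns and entity labels below are hypothetical simplifications in the spirit of the approach, not the paper's actual hand-crafted rules:

```python
import re

# Hypothetical lexico-syntactic patterns loosely in the spirit of the paper
# (the real rules were hand-crafted over 50,237 ACL Anthology titles).
patterns = [
    # "X for Y" → X is a solution, Y is the research problem.
    (re.compile(r"^(.+?)\s+for\s+(.+)$", re.I), ("solution", "research_problem")),
    # "X using Y" → X is the research problem, Y is the method.
    (re.compile(r"^(.+?)\s+using\s+(.+)$", re.I), ("research_problem", "method")),
]

def extract(title):
    # Return (entity_type, span) pairs from the first matching pattern.
    for regex, labels in patterns:
        m = regex.match(title)
        if m:
            return [(lab, m.group(i + 1).strip()) for i, lab in enumerate(labels)]
    return []

print(extract("Neural Models for Coreference Resolution"))
print(extract("Relation Extraction using Distant Supervision"))
```

Because each pattern positionally indicates an entity type, precision depends on how unambiguous the trigger words ("for", "using") are in real titles, which is why the authors selected only frequent, easily recognizable patterns.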
Neural Models for Information Retrieval (Bhaskar Mitra)
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models will also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
We begin this talk with a discussion on text embedding spaces for modelling different types of relationships between items which makes them suitable for different IR tasks. Next, we present how topic-specific representations can be more effective than learning global embeddings. Finally, we conclude with an emphasis on dealing with rare terms and concepts for IR, and how embedding based approaches can be augmented with neural models for lexical matching for better retrieval performance. While our discussions are grounded in IR tasks, the findings and the insights covered during this talk should be generally applicable to other NLP and machine learning tasks.
MULTI-LEARNING SPECIAL SESSION / EDUCON 2018 / EMADRID TEAM (eMadrid network)
1) The document proposes an approach to assist course creators in generating or restructuring courses by exploiting text mining techniques, semantic information from DBpedia, and linking educational resources.
2) The approach was implemented as a prototype that retrieves online courses, identifies key elements from text, formulates queries to other courses, and returns related courses to help creators generate mashups.
3) Preliminary tests on 265 computer science courses showed promising results, though future work is needed to improve similarity measures and generate concept maps between related courses.
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M... (Giannis Tsakonas)
This document proposes a methodology for discovering patterns in scientific literature using a case study of digital library evaluation. It involves:
1. Classifying documents to identify relevant papers using naive Bayes classification.
2. Semantically annotating papers with concepts from a Digital Library Evaluation Ontology using the GoNTogle annotation tool. Over 2,600 annotations were generated.
3. Clustering the annotated papers into coherent groups using k-means clustering.
4. Interpreting the clusters with the assistance of the ontology to discover patterns and trends in the literature. Benchmarking tests were performed to evaluate effectiveness of the methodology.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING (ijnlc)
In this paper, we propose a novel algorithm that rearranges the topic assignment results obtained from topic modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how closely the results conform to expert opinion, represented by a data structure we define, called TDAG, which captures the probability that a pair of highly correlated words appears together. To ensure that the internal structure does not change too much under the rearrangement, coherence, a well-known metric for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure. We develop two ways to systematically obtain the expert opinion from data, depending on whether the data has relevant expert writing or not. The final algorithm, which takes into account both coherence and expert opinion, is presented. Finally, we compare the amount of adjustment needed for each topic modeling method, NMF and LDA.
This document provides an overview of probabilistic topic models. It discusses latent Dirichlet allocation (LDA), a commonly used topic model, and how it represents documents as mixtures over latent topics and words as generated by those topics. Parameter selection and inference algorithms for LDA are also summarized. Evaluation methods for topic models like held-out likelihood, topic coherence, and topic intrusion are outlined to assess how well models fit data and how interpretable topics are for humans.
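Topic coherence, one of the evaluation methods mentioned above, can be computed directly from document co-occurrence counts. The sketch below implements a simplified UMass-style coherence over a toy reference corpus; real evaluations use a large external corpus and typically a library implementation:

```python
import math
from itertools import combinations
from collections import Counter

# Toy reference corpus (each document reduced to its set of words).
corpus = [
    {"ball", "goal", "team"},
    {"ball", "team", "win"},
    {"code", "bug", "python"},
    {"code", "python", "test"},
]

# Document frequency of single words and of unordered word pairs.
doc_freq = Counter(w for doc in corpus for w in doc)
co_freq = Counter(frozenset(p) for doc in corpus for p in combinations(sorted(doc), 2))

def umass_coherence(topic_words):
    """Simplified UMass coherence: sum of log((D(wi,wj)+1)/D(wj)) over pairs."""
    score = 0.0
    for wi, wj in combinations(topic_words, 2):
        score += math.log((co_freq[frozenset((wi, wj))] + 1) / doc_freq[wj])
    return score

good_topic = ["ball", "team", "goal"]     # words that co-occur in the corpus
mixed_topic = ["ball", "python", "goal"]  # words drawn from unrelated themes
print(umass_coherence(good_topic) > umass_coherence(mixed_topic))  # → True
```

A topic whose top words frequently co-occur in documents scores higher, which is why coherence correlates better with human topic-intrusion judgments than held-out likelihood does.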
Similar to Topic modeling of marketing scientific papers: An experimental survey (20)
Two faces of the same coin: Exploring the multilateral perspective of informa... (ICDEcCnferenece)
Adriana AnaMaria Davidescu, Professor, PhD, Department of Statistics and Econometrics. Two faces of the same coin: Exploring the multilateral perspective of informality in relation to Sustainable Development Goals. Fostering formal work with digital tools. (ICDEc 2022)
The document summarizes an upcoming special issue of the Journal of Telecommunications and the Digital Economy on the topic of digital technologies and innovation. It provides details on the topics covered in the special issue such as banking/finance, business data, and social media. It also outlines the submission process and acceptance rates. Additionally, it discusses future special issues that will focus on areas like AI technologies for smart cities and women's participation in the digital economy.
Possibilities and limitations of the Croatian police in communication via soc... (ICDEcCnferenece)
Ivana Radic, Robert Idlbek and Irena Cajner Mraović. Possibilities and limitations of the Croatian police in communication via social networks. (ICDEc 2022)
Changes in Global Virtual Team Conflict Over Time: The Role of Openness to Li... (ICDEcCnferenece)
Longzhu Dong, Robert Stephens and Ana Maria Soares. Changes in Global Virtual Team Conflict Over Time: The Role of Openness to Linguistic Diversity. (ICDEc 2022)
Cause-related marketing: towards an exploration of the factors favoring the p... (ICDEcCnferenece)
The document presents research on factors favoring purchase intention of Tunisian consumers towards cause-related marketing campaigns. Through qualitative interviews, the study found that congruence between the cause and brand/consumer had the strongest impact on purchase intention. Additionally, consumer identification with the cause and attribution of altruistic motivations to the company positively influenced purchase intention. The research contributes to understanding consumer behavior towards cause marketing and provides recommendations for companies on aligning causes with brands and consumers. Limitations included a small sample size and non-representative group, calling for future quantitative research.
Relationship between culture and user's behavior in the context of informatio... (ICDEcCnferenece)
Olfa Ismail. Relationship between culture and user's behavior in the context of information security systems: A qualitative study in SMEs. (ICDEc 2022)
A Maturity Model for Open Educational Resources in Higher Education Instituti... (ICDEcCnferenece)
Carla Reinken, Nicole Draxler-Weber and Uwe Hoppe. A Maturity Model for Open Educational Resources in Higher Education Institutions - Development and Evaluation. (ICDEc 2022)
AI-based Business Models in Healthcare: An Empirical Study of Clinical Decisi... (ICDEcCnferenece)
Marija Radic, Claudia Vienken, Laurin Nikschat, Thore Dietrich, Holger König, Lorenz Laderick and Dubravko Radic. AI-based Business Models in Healthcare: An Empirical Study of Clinical Decision Support Systems. (ICDEc 2022)
Towards a better digital transformation: learning from the experience of a di... (ICDEcCnferenece)
Houda Mahboub and Hicham Sadok. Towards a better digital transformation: learning from the experience of a digital transformation project. (ICDEc 2022)
Consumer Satisfaction using fitness technology innovation (ICDEcCnferenece)
This document summarizes research into the determinants of customer satisfaction with fitness technology innovations. It reviews theories of diffusion of innovation, planned behavior, and technology acceptance. The research methodology related the independent variables (service quality, device friendliness, helpfulness, and quickness) to the dependent variable, customer satisfaction. The results found that all hypotheses about positive relationships between the independent and dependent variables were accepted. The practical implication is that focusing on customer satisfaction through service quality, device usability, and the other determinants can improve research and development, product and service quality, and customer satisfaction with fitness technologies.
Closing session: awards for best papers and reviewers (ICDEcCnferenece)
The document summarizes the 6th International Conference on Digital Economy (ICDEC) 2021, including statistics about papers submitted and accepted. It provides details on the best papers selected, which covered topics on business innovation and emerging technologies. The top three papers and their authors are described. Statistics on paper reviews and reviewers from over 20 countries are provided. The document concludes by recognizing the best paper reviewers based on criteria like providing rich, positive comments and proposing new directions and references in their reviews. The two reviewers receiving awards are named and described.
Transition to Tertiary Education and eLearning in Lebanon against the backdro... (ICDEcCnferenece)
Jacqueline Saad Harfouche and Nizar Hariri. Transition to Tertiary Education and eLearning in Lebanon against the backdrop of economic collapse and Covid-19 pandemic. (ICDEc 2021)
Internet of Things healthcare system for reducing economic burden (ICDEcCnferenece)
The document outlines a proposed IoT monitoring healthcare system for patients with COPD. It discusses:
1) The problems with existing COPD treatment and monitoring, including a lack of comprehensive systems that can accurately assess risk and enable fast intervention.
2) The goals of extending lifetime, improving quality of life, and reducing economic burden for COPD patients.
3) The proposed architecture of the monitoring system, which would use an IoT approach with three dimensions - an ontological model, medical rules, and context awareness - to continuously monitor patients' medical, environmental, and behavioral contexts.
Ready to Unlock the Power of Blockchain! (Toptal Tech)
Imagine a world where data flows freely, yet remains secure. A world where trust is built into the fabric of every transaction. This is the promise of blockchain, a revolutionary technology poised to reshape our digital landscape.
Toptal Tech is at the forefront of this innovation, connecting you with the brightest minds in blockchain development. Together, we can unlock the potential of this transformative technology, building a future of transparency, security, and endless possibilities.
Gen Z and the marketplaces - let's translate their needs (Laura Szabó)
The product workshop focused on exploring the requirements of Generation Z in relation to marketplace dynamics. We delved into their specific needs, examined their shopping preferences, and analyzed their preferred methods for accessing information and making purchases within a marketplace. Through the study of real-life cases, we tried to gain valuable insights into enhancing the marketplace experience for Generation Z.
The workshop was held on the DMA Conference in Vienna June 2024.
Discover the benefits of outsourcing SEO to India (davidjhones387)
"Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
HijackLoader Evolution: Interactive Process Hollowing (Donato Onofri)
CrowdStrike researchers have identified a HijackLoader (aka IDAT Loader) sample that employs sophisticated evasion techniques to enhance the complexity of the threat. HijackLoader, an increasingly popular tool among adversaries for deploying additional payloads and tooling, continues to evolve as its developers experiment and enhance its capabilities.
In their analysis of a recent HijackLoader sample, CrowdStrike researchers discovered new techniques designed to increase the defense evasion capabilities of the loader. The malware developer used a standard process hollowing technique coupled with an additional trigger that was activated by the parent process writing to a pipe. This new approach, called "Interactive Process Hollowing", has the potential to make defense evasion stealthier.
Topic modeling of marketing scientific papers: An experimental survey
1. Presentation of the article
Title: Topic modeling of marketing scientific papers: An experimental survey
Presented by: Malek Chebil
Authors: Malek Chebil, Rim Jallouli, Mohamed Anis Bach Tobji, Chiheb Eddine Ben Ncir
2020-2021
Video link: https://drive.google.com/file/d/1ppGoL0qirOlZ4-ecdNg3JG_v85-ZohQ-/view?usp=sharing
2. Introduction
Plan
Needs analysis
Design
Implementation
Conclusion and perspectives
Introduction
Natural Language Processing (NLP)
Topic modeling
Application of topic modeling techniques on
marketing scientific papers' corpus
Objective and subjective evaluation
Conclusion and perspectives
5.
Topic modeling: definition
• A type of statistical model for discovering the abstract "topics" that occur in a collection of documents (corpus).
• A topic is a cluster of words that frequently occur together in the corpus.
• Each document consists of a mixture of topics.
• Each topic consists of a collection of words.
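The last two bullets can be made concrete with a toy example (the topic-word and document-topic numbers below are illustrative, not from the paper): each document's word distribution is the mixture of its topics' word distributions.

```python
import numpy as np

# Hypothetical toy model: 2 topics over a 4-word vocabulary.
vocab = ["price", "brand", "model", "data"]

# Each topic is a probability distribution over words (rows sum to 1).
topic_word = np.array([
    [0.50, 0.40, 0.05, 0.05],   # topic 0: "marketing" words
    [0.05, 0.05, 0.40, 0.50],   # topic 1: "analytics" words
])

# Each document is a mixture of topics (rows sum to 1).
doc_topic = np.array([
    [0.9, 0.1],   # document 0 is mostly topic 0
    [0.2, 0.8],   # document 1 is mostly topic 1
])

# Word probability per document: P(w|d) = sum_k P(k|d) * P(w|k)
doc_word = doc_topic @ topic_word
print(doc_word.round(3))
```

Each row of `doc_word` is again a valid probability distribution over the vocabulary, which is exactly the "mixture of topics" view of a document.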
6.
Topic modeling: techniques

Latent Semantic Analysis (LSA)
• Algebraic method.
• Analyzes relationships between a set of documents and the terms contained within.
• Assumes that words that are similar in meaning will appear in similar pieces of text (distributional hypothesis).
• Uses singular value decomposition (SVD) to scan unstructured data and find hidden relationships between terms and concepts.

Latent Dirichlet Allocation (LDA)
• Generative probabilistic model.
• Improves on mixture models by capturing the exchangeability of both words and documents.
• Each document is a probability distribution over topics, and each topic is a probability distribution over words from the corpus.
• Uses a Dirichlet prior to model the variability among the topic proportions.

Correlated Topic Model (CTM)
• Generative probabilistic model.
• Like LDA, improves on mixture models by capturing the exchangeability of both words and documents; each document is a probability distribution over topics and each topic a probability distribution over words.
• Uses the logistic normal distribution to model the pairwise topic correlations.
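LSA's reliance on SVD can be sketched with a minimal numpy example (the term-document counts are made up): truncating the SVD to rank k gives the best rank-k approximation of the matrix, and the k retained latent dimensions play the role of topics.

```python
import numpy as np

# Hypothetical tiny term-document count matrix (4 terms x 3 documents).
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

# LSA: truncated SVD keeping k = 2 latent "topics".
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation of X
print(X_k.round(2))
```

By the Eckart-Young theorem, the Frobenius error of `X_k` equals the first discarded singular value, which is why truncation degrades the reconstruction gracefully.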
7.
Application of topic modeling techniques on marketing scientific papers' corpus (1/4)
NLP process
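The NLP process is not detailed on the slide; a minimal preprocessing sketch of the usual first steps (lowercasing, tokenisation, stop-word and short-token removal) could look like the following. The stop-word list here is illustrative, not the one used in the paper.

```python
import re

# Illustrative stop-word list; a real pipeline would use a full list.
STOPWORDS = {"the", "of", "on", "a", "and", "in", "for", "to", "is"}

def preprocess(text):
    # Lowercase, keep alphabetic tokens, drop stop words and short tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

abstract = "Topic modeling of marketing scientific papers: an experimental survey"
print(preprocess(abstract))
```

The resulting token lists are what a document-term matrix is built from before fitting LSA, LDA or CTM.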
8.
Application of topic modeling techniques on marketing scientific papers' corpus (2/4)
Topic model generated by LSA for k=6
9.
Application of topic modeling techniques on marketing scientific papers' corpus (3/4)
Topic model generated by LDA for k=6
10.
Application of topic modeling techniques on marketing scientific papers' corpus (4/4)
Topic model generated by CTM for k=6
11.
Objective evaluation (1/4)
Probabilistic coherence
• Measures how well the topics are extracted.
• Scores a topic by measuring the degree of coherence between its words.
• A higher probabilistic coherence score indicates a better model.
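One common formulation of probabilistic coherence averages P(w_j | w_i) − P(w_j) over pairs of a topic's top words: it rewards pairs that co-occur more often than the lower-ranked word occurs on its own. A minimal sketch on an illustrative binary document-term matrix (not the paper's data):

```python
import numpy as np
from itertools import combinations

def probabilistic_coherence(dtm, top_words):
    """Average of P(w_j | w_i) - P(w_j) over pairs of top words.

    dtm: binary document-term matrix (documents x vocabulary).
    top_words: column indices of the topic's top words, best first.
    """
    scores = []
    for i, j in combinations(top_words, 2):
        p_j = dtm[:, j].mean()                       # P(w_j)
        co_occurrences = (dtm[:, i] * dtm[:, j]).sum()
        p_j_given_i = co_occurrences / max(dtm[:, i].sum(), 1)  # P(w_j | w_i)
        scores.append(p_j_given_i - p_j)
    return float(np.mean(scores))

# Illustrative 4-document, 3-word presence matrix.
dtm = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 0, 1],
                [1, 0, 0]])
score = probabilistic_coherence(dtm, [0, 1])
print(score)
```

A score near zero means the top words co-occur no more than chance; positive scores indicate a coherent topic.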
12.
Objective evaluation (2/4)
R-squared
• Known as the coefficient of determination (or the coefficient of multiple determination in multiple regression).
• Evaluates how well the model fits the data.
• Interpretable as the proportion of variability in the data explained by the model.
• A higher R-squared indicates a better fit; a value of 1 means the model fits the data perfectly.
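The definition above reduces to one line of arithmetic: R² = 1 − (residual sum of squares / total sum of squares). A minimal sketch with made-up observations and predictions:

```python
import numpy as np

def r_squared(y_true, y_pred):
    # Residual sum of squares: unexplained variability.
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: variability around the mean.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
r2 = r_squared(y_true, y_pred)
print(r2)
```

Here the model explains 98% of the variability in the toy data; R² = 1 only when every prediction matches exactly.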
13.
Objective evaluation (3/4)
Perplexity
• Measures how well a probability distribution or probability model predicts a set of data.
• Applies only to probability models (LDA, CTM), not to algebraic models such as LSA.
• Lower perplexity suggests a better model.
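Perplexity is the exponential of the negative average log-likelihood the model assigns to held-out words; intuitively, it is the size of the uniform distribution that would be equally "confused". A minimal sketch:

```python
import numpy as np

def perplexity(word_probs):
    """Perplexity from the model's probability of each held-out word."""
    return float(np.exp(-np.mean(np.log(word_probs))))

# A model assigning uniform probability 1/4 to every held-out word
# has perplexity 4: it is as uncertain as choosing among 4 options.
pp = perplexity([0.25, 0.25, 0.25, 0.25])
print(pp)
```

A topic model that concentrates probability on the words that actually occur drives the held-out probabilities up and the perplexity down.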
14.
Objective evaluation (4/4)
Arun2010, CaoJuan2009 and Griffiths2004
• Arun2010: computed from the symmetric KL-divergence between two matrices (Topic-Word and Document-Topic). The lower the value, the better.
• CaoJuan2009: calculates the cosine distance between topics. The minimum value indicates that the corresponding K is the optimal number of topics.
• Griffiths2004: computed from an estimated multinomial distribution of the K topics over the words of the corpus. The maximum value indicates that the corresponding K is the optimal number of topics.
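The idea behind CaoJuan2009 can be sketched by measuring how similar topics are to each other: the average pairwise cosine similarity between topic-word distributions, where lower similarity (greater cosine distance) means better-separated topics. The two toy topics below are illustrative.

```python
import numpy as np
from itertools import combinations

def avg_topic_cosine(topic_word):
    """Average cosine similarity between all pairs of topic-word rows."""
    sims = []
    for i, j in combinations(range(topic_word.shape[0]), 2):
        a, b = topic_word[i], topic_word[j]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))

# Two illustrative topics over a 3-word vocabulary.
topics = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.2, 0.7]])
sim = avg_topic_cosine(topics)
print(sim)
```

Sweeping K and picking the value that minimizes this average similarity selects the K with the most distinct topics, matching the "minimum value is optimal" rule on the slide.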
18.
Subjective evaluation (4/4)
Summary comparative table of topic modeling technique rankings in the context of a scientific papers' corpus
19.
Conclusion and perspectives
• A comparative study of LSA, LDA and CTM on a corpus of marketing scientific papers.
• Objective evaluation using different metrics.
• Subjective evaluation by a marketing expert.
• The LDA and CTM models perform better than LSA.
Perspectives:
• Using a larger corpus of scientific papers from other fields or contexts.
• Applying topic modeling techniques to the full text of the corpus.
• Comparing other topic modeling techniques.
• Applying cognitive analytics to certain tasks, such as improving topic labels, to minimize the cost and effort required of the experts.