Dynamic Topic Modeling
via Non-negative Matrix
Factorization
Derek Greene
University College Dublin
Overview
• Topic Modeling
• Non-negative Matrix Factorization

• Dynamic Topic Modeling

• Proposed Approach
• Dynamic Topic Modeling via Non-negative Matrix Factorization

• Application
• Topic Modeling European Parliamentary Speeches
September 2016 2
Topic Modeling
• Goal: Discover hidden thematic structure in a corpus of text
(e.g. tweets, Facebook posts, news articles, political speeches).

• Unsupervised approach, no prior annotation required.
[Figure: topic modeling pipeline. Input, Data Preparation, Topic Modeling Algorithm, Output (Topic 1, Topic 2, ..., Topic k)]
• Output of topic modeling is a set of k topics. Each topic has:

1. A descriptor, based on highest-ranked terms for the topic.

2. Membership weights for all documents relative to the topic.
Topic Modeling with NMF
• Non-negative Matrix Factorization (NMF): Family of linear algebra
algorithms for identifying the latent structure in data represented
as a non-negative matrix (Lee & Seung, 1999).

• NMF can be applied for topic modeling, where the input is a
document-term matrix, typically TF-IDF normalized.
• Input: Document-term matrix A; user-specified number of topics k.

• Output: Two k-dimensional factors W and H approximating A.

[Figure: NMF factorizes the input matrix A (n documents x m terms) into factor W (n documents x k topics) and factor H (k topics x m terms), so that A ≈ W · H]
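As a rough illustration of the input described above, the following sketch builds a small TF-IDF weighted document-term matrix A in plain Python. It assumes whitespace tokenization, a smoothed IDF, and L2 row normalization; the function name `tfidf_matrix` and the three sample documents are hypothetical, and the preprocessing used for the talk may differ.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, A) where A[i][j] is the TF-IDF weight of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    A = []
    for toks in tokenized:
        tf = Counter(toks)
        # Assumed weighting: raw term count times a smoothed IDF.
        row = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        # L2-normalize each document row so document length does not dominate.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        A.append([v / norm for v in row])
    return vocab, A

# Hypothetical mini-corpus.
docs = [
    "health patient disease",
    "budget finance banking",
    "research education health",
]
vocab, A = tfidf_matrix(docs)
print(len(A), "documents x", len(vocab), "terms")
```

Every entry of A is non-negative, which is the property NMF requires of its input.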
Example: NMF Topic Modeling
• Apply standard NMF to document-term matrix A (6 rows x 10
columns) for k=3 topics…
[Figure: document-term matrix A; rows: document 1 to document 6; columns: research, stem, education, disease, patient, health, budget, finance, banking, bonds]
Example: NMF Topic Modeling
[Figure: Factor H, weights for the terms research, stem, education, disease, patient, health, budget, finance, banking, bonds across Topic 1, Topic 2, Topic 3; Factor W, weights for document 1 to document 6 across Topic 1, Topic 2, Topic 3]
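The example can be sketched in a few lines of NumPy. This is only an illustration: the matrix values are hypothetical (the slides do not give them), and the solver uses multiplicative updates in the style of Lee & Seung, which is an assumption and not necessarily the NMF variant used for the talk.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative A (n x m) as W (n x k) @ H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

terms = ["research", "stem", "education", "disease", "patient",
         "health", "budget", "finance", "banking", "bonds"]

# Hypothetical term weights for 6 documents x 10 terms.
A = np.array([
    [3, 2, 3, 0, 0, 0, 0, 0, 0, 0],
    [2, 3, 2, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 3, 0, 0, 0, 0],
    [1, 0, 0, 2, 3, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 2, 3, 2],
    [0, 0, 0, 0, 0, 1, 2, 3, 2, 3],
], dtype=float)

W, H = nmf(A, k=3)

# Topic descriptor: the highest-ranked terms in each row of H.
for t in range(3):
    top = [terms[j] for j in np.argsort(H[t])[::-1][:3]]
    print(f"Topic {t + 1}: {top}")

# Membership weights: each row of W gives one document's weight per topic.
print(np.round(W, 2))
```

Each row of H yields a topic descriptor and each row of W yields the membership weights of one document, matching the two outputs of topic modeling listed earlier.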
Dynamic Topic Models
• Standard topic modeling approaches assume the order of
documents does not matter, which makes them unsuitable for
time-stamped data.

• Dynamic topic modeling: Approaches to track how language
changes and topics evolve over time in a time-stamped corpus.
[Figure: "Inaugural address" example (D. Blei, 2012)]
Dynamic Topic Modeling
via Non-negative Matrix
Factorization
Proposed Approach
• Two-Level approach: Link together related topics found in
different time windows to track topics over time.
Rank  Topic in Window 1  Topic in Window 2  Topic in Window 3
1     eurozone           greece             greece
2     greece             debt               russia
3     imf                germany            debt
4     loan               reparations        eu
5     debt               eu                 loan
Divide the corpus into 𝜏 time windows of equal duration (e.g. days,
weeks, months, quarters, or years).

Level 1: Apply NMF topic modeling to documents in each
window to produce window topics.

Level 2: Apply another layer of NMF to all topics from Level 1 to
find dynamic topics which span multiple time windows.
Proposed Approach
• Key Idea for Level 2:
• View the topic basis vectors (rows of factor H) found in
each time window as “topic documents”.

• Construct a new combined representation from these H
factors. Similar to the idea of “stacking” in supervised ensembles.

• Apply NMF to this new representation.
[Figure: 𝜏 time window datasets yield 𝜏 NMF H factors (Factor H from Window 1, Window 2, Window 3, …, Window 𝜏), which are stacked into a topic-term matrix of n’ topic documents x m’ terms]
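A minimal sketch of Level 2 follows, under several assumptions: each window supplies its own term list and an H factor shaped (topics x terms), terms missing from a window get weight zero, and each window topic is assigned to the dynamic topic on which it has the largest weight. The helper names (`nmf`, `stack_window_topics`), the `Window1-01` style labels reused from the slides, and all numeric values are illustrative only.

```python
import numpy as np

def nmf(A, k, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF (an assumed solver): A ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def stack_window_topics(window_results):
    """Stack per-window H factors into one topic-term matrix.

    window_results: list of (terms, H) pairs, H shaped (k_w, len(terms)).
    Returns (vocab, labels, B) with B shaped (n' topic documents, m' terms).
    """
    vocab = sorted({t for terms, _ in window_results for t in terms})
    col = {t: j for j, t in enumerate(vocab)}
    rows, labels = [], []
    for w, (terms, H) in enumerate(window_results, start=1):
        for i, topic in enumerate(H, start=1):
            row = np.zeros(len(vocab))
            for t, weight in zip(terms, topic):
                row[col[t]] = weight  # align this window's terms to the shared vocabulary
            rows.append(row)
            labels.append(f"Window{w}-{i:02d}")
    return vocab, labels, np.vstack(rows)

# Hypothetical Level 1 output for two windows with two topics each.
window_results = [
    (["health", "patient", "budget"], np.array([[0.9, 0.8, 0.0], [0.0, 0.1, 0.7]])),
    (["health", "finance", "budget"], np.array([[0.7, 0.0, 0.1], [0.0, 0.9, 0.6]])),
]
vocab, labels, B = stack_window_topics(window_results)

# Level 2: treat each row of B as a "topic document" and factorize again.
U, V = nmf(B, k=2)
assignment = {label: int(i) for label, i in zip(labels, U.argmax(axis=1))}
print(assignment)
```

Here the rows of V describe the dynamic topics over the combined vocabulary, and U links each window topic to them.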
Example: Dynamic Topic Modeling
Topic-term matrix for 2 time window results, each with 3 topics.
[Figure: topic-term matrix heatmap; rows: Window1-01, Window1-02, Window1-03 (topics for time window 1) and Window2-01, Window2-02, Window2-03 (topics for time window 2); columns: health, patient, disease, citizen, research, education, budget, finance, banking]
Application: European Parliament
Collaboration with Dr. James Cross,
UCD School of Politics & International Relations
Exploring the European Parliament Agenda
• Directly elected parliamentary
institution of the EU.

• 8th term began in July 2014.

• 751 Members of the European
Parliament (MEPs) from 28
member states.
• 12 plenary sessions per year are held in Strasbourg.

• During sessions, members may speak after being called by the
President. Speaking time available to MEPs is strictly limited.

• MEPs use speeches to state their positions on policies, to
explain votes, and to demonstrate to their electorates that they
are representing their interests in Europe.
Data Collection
• In Autumn 2014 we collected
~400k records from Europarl.

• Covers activities of MEPs in the
European Parliament during
terms 5-7 (1999-2014).

• Focus on records of speeches
in plenary, which account for
54.3% of all Europarl records.
http://europarl.europa.eu
Data Collection
• Original corpus contains 269,696 plenary speeches.

• Identified subset of 210,247 English language speeches, either
native or translated.
• Divided these into 60 “time window” datasets. Each time
window is a quarter from 1999-Q3 to 2014-Q2.
[Figure: number of speeches per time window; x-axis: Time Window (Quarter Number); y-axis: Number of Speeches]
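The quarterly windowing can be sketched with the standard library alone. The helper names `quarter_label` and `quarter_range` and the sample speeches are hypothetical; only the 1999-Q3 to 2014-Q2 range comes from the slides.

```python
from collections import defaultdict
from datetime import date

def quarter_label(d):
    """Map a date to a label such as '2003-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def quarter_range(start_year, start_q, end_year, end_q):
    """List all quarter labels from the start quarter to the end quarter, inclusive."""
    labels = []
    y, q = start_year, start_q
    while (y, q) <= (end_year, end_q):
        labels.append(f"{y}-Q{q}")
        y, q = (y, q + 1) if q < 4 else (y + 1, 1)
    return labels

windows = quarter_range(1999, 3, 2014, 2)
print(len(windows))  # 60

# Hypothetical (date, text) records grouped into their time windows.
speeches = [
    (date(2003, 2, 14), "speech a"),
    (date(2003, 3, 20), "speech b"),
    (date(2014, 6, 30), "speech c"),
]
by_window = defaultdict(list)
for d, text in speeches:
    by_window[quarter_label(d)].append(text)
```

Each entry of `by_window` then corresponds to one time window dataset to which Level 1 topic modeling would be applied.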
Time Window Topic Modeling
• Applied NMF to document-term matrix for the speeches in
each of the 60 time windows. 

• Used an automated topic coherence approach to choose the number
of topics k for each window (O’Callaghan et al., 2015).

➡ Output: 60 sets of time window topics.
Time Window Topic Modeling
Example Topic: 2003-Q1
Top 10 terms suggest that this
topic relates to the Iraq war.
Top 10 speeches for this topic
provide the context.
Dynamic Topic Modeling Results
• Applying dynamic topic modeling to the resulting topic-term
matrix with parameter selection yields 57 dynamic topics,
which show the varied nature of the European Parliament’s agenda…
Example: Climate Change
[Figure: number of speeches over time for the climate change topic; x-axis: Year (2000 to 2014); y-axis: Number of Speeches (0 to 600); annotations: Montreal, Copenhagen, Cancun, Climate Change Package]
Example: Financial & Euro Crisis
[Figure: number of speeches over time; series: Financial crisis, Euro crisis; x-axis: Year (2000 to 2014); y-axis: Number of Speeches (0 to 1200); annotated points: A, B, C, D]
Dynamic Topics by Politician
We associate each MEP with dynamic topics based on the number of
that MEP’s speeches assigned to the window topics that make up
each dynamic topic.
[Figure: Pat Cox (Ireland), top 10 most relevant dynamic topics]
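One way to compute such an association is a simple count, sketched below. It assumes each speech has already been assigned to a single window topic and each window topic to a single dynamic topic; the MEP names, topic labels, and the function `mep_topic_counts` are all hypothetical.

```python
from collections import Counter, defaultdict

def mep_topic_counts(speeches, window_topic_to_dynamic):
    """Count, per MEP, the speeches that fall under each dynamic topic.

    speeches: iterable of (mep, window_topic) pairs.
    window_topic_to_dynamic: dict mapping window topic label -> dynamic topic label.
    """
    counts = defaultdict(Counter)
    for mep, window_topic in speeches:
        dynamic = window_topic_to_dynamic.get(window_topic)
        if dynamic is not None:  # skip window topics not linked to any dynamic topic
            counts[mep][dynamic] += 1
    return counts

# Hypothetical assignments.
speeches = [
    ("MEP A", "Window1-01"),
    ("MEP A", "Window2-03"),
    ("MEP A", "Window1-02"),
    ("MEP B", "Window2-03"),
]
window_topic_to_dynamic = {
    "Window1-01": "D1",
    "Window2-03": "D1",
    "Window1-02": "D2",
}

counts = mep_topic_counts(speeches, window_topic_to_dynamic)
# Most relevant dynamic topics for one MEP, ranked by speech count.
print(counts["MEP A"].most_common(10))
```

Ranking each MEP's counts with `most_common` gives a top-N list of dynamic topics similar in spirit to the per-politician view on the slide.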
Dynamic Topics by Country
[Figure: dynamic topics by country; panels: Ireland, Cyprus]
More Information
European Parliament Speeches - Topic Explorer
http://erdos.ucd.ie/europarl

Python Code and Documentation
https://github.com/derekgreene/dynamic-nmf

D. Greene, J. P. Cross. “Unveiling the Political Agenda of the
European Parliament Plenary: A Topical Analysis”. In Proc. ACM Web
Science’15, 2015.

D. Greene, J. P. Cross. “Exploring the political agenda of the
European parliament using a dynamic topic modeling approach”.
Political Analysis, 2017 (in press).

derek.greene@ucd.ie @derekgreene
References
• D. Blei, A. Y. Ng, M. Jordan. “Latent Dirichlet allocation”. Journal of
Machine Learning Research, 3:993–1022, 2003.

• D. Blei. “Probabilistic topic models”. Communications of the ACM, 2012.

• D. D. Lee & H. S. Seung. “Learning the parts of objects by non-negative
matrix factorization”. Nature, 401:788–91, 1999.

• D. O’Callaghan, D. Greene, J. Carthy & P. Cunningham. “An analysis of the
coherence of descriptors in topic modeling”. Expert Systems with
Applications (ESWA), 2015.

• W. X. Zhao et al. “Comparing Twitter and traditional media using
topic models”. Advances in Information Retrieval, 2011.

• J. Grimmer. “A Bayesian Hierarchical Topic Model for Political Texts:
Measuring Expressed Agendas in Senate Press Releases”. Political
Analysis, 18(1):1–35, 2010.
