Conceptual Text Mining

Pim Huijnen
Utrecht University & University of Sheffield Digital Humanities Workshop
May 12, 2016

What to do with 11 million newspaper pages?

a) distant reading
In: Het Centrum, 10 October 1919, p. 4.

b) finding the needle in the haystack
In: Het Volk: Dagblad voor de arbeiderspartij, 29 January 1921, p. 3.

How to know what to look at?
1 How to define a concept?
Efficiency ≠ “efficiency”
Eugenetica ≠ “eugenetica” + “eugenetiek” + “eugeniek” + “rassenleer”
2 How to study its changing uses, contexts, and meaning over time?

1) Eugenics
* topic modeling newspaper articles containing “eugenics”
* using meaningful words to find articles about eugenics that do not contain the word “eugenics”
* in the given example: querying Texcavator with ‘regulation AND health AND race’ (575 results)
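The AND query in the example only returns articles that contain all three words, whether or not they mention “eugenics”. A minimal sketch of that matching logic follows, assuming simple lowercase word tokens; it does not model Texcavator's actual search engine (stemming, phrases, ranking), and the function names and toy articles are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def and_query(documents, terms):
    """Return the ids of documents that contain every query term.

    A simplified stand-in for a Boolean query such as
    'regulation AND health AND race' (hypothetical helper, not Texcavator's API).
    """
    wanted = {t.lower() for t in terms}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= set(tokenize(text))]

# Hypothetical toy articles, not taken from the newspaper archive.
articles = {
    "a1": "State regulation of marriage to protect the health of the race.",
    "a2": "New regulation of the health service announced.",
    "a3": "Race and health: a plea for regulation of immigration.",
}

print(and_query(articles, ["regulation", "health", "race"]))  # ['a1', 'a3']
```

Neither matching toy article contains the word “eugenics”, which is the point of this search strategy.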

Texcavator
* plotting the results on a time scale (relative to the total number of articles per year)
* extracting distinctive words from the query results per year (tf-idf)
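Both steps can be sketched in a few lines. This is a hedged illustration, not Texcavator's implementation: it assumes the hit counts and total article counts per year are already known, and it uses one common tf-idf variant (raw count times log of years over years-containing-the-word), treating each year's query results as a single document. All names and data are hypothetical.

```python
import math
import re
from collections import Counter

def relative_frequency(hits_per_year, totals_per_year):
    """Share of all articles in a year that matched the query."""
    return {year: hits_per_year.get(year, 0) / totals_per_year[year]
            for year in sorted(totals_per_year)}

def distinctive_words(texts_by_year, top_n=3):
    """Rank words per year by tf-idf, treating each year's results as one document.

    tf  = raw count of the word in that year's texts
    idf = log(number of years / number of years in which the word occurs)
    A word found in every year gets idf 0, so it is never distinctive.
    """
    counts = {year: Counter(re.findall(r"[a-z]+", " ".join(texts).lower()))
              for year, texts in texts_by_year.items()}
    n_years = len(counts)
    # Document frequency: in how many years does each word occur?
    doc_freq = Counter(word for c in counts.values() for word in c)
    result = {}
    for year, c in counts.items():
        scores = {w: tf * math.log(n_years / doc_freq[w]) for w, tf in c.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[year] = [w for w, s in ranked[:top_n] if s > 0]
    return result

# Hypothetical numbers and snippets, for illustration only.
hits = {1919: 12, 1920: 30}
totals = {1919: 4000, 1920: 5000}
texts = {
    1919: ["race hygiene health", "race health"],
    1920: ["health insurance law", "law marriage health"],
}

rel = relative_frequency(hits, totals)
distinctive = distinctive_words(texts)
print(rel)          # {1919: 0.003, 1920: 0.006}
print(distinctive)  # {1919: ['race', 'hygiene'], 1920: ['law', 'insurance', 'marriage']}
```

Normalizing by the total number of articles matters because the archive does not contain the same volume of text for every year; a raw rise in hits may only reflect more digitized pages.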

2) Scientific management
* using close reading to find all significant Dutch equivalents of “scientific management”
* extracting the results, dividing them per year, and uploading them to Voyant Tools
* studying the changing vocabulary of the subset over time
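One way to prepare such a per-year subset is sketched below, assuming the exported results are (date, text) pairs with ISO dates; writing one plain-text file per year for upload is an assumption about the workflow, not a documented step of it. The `new_words` helper is a hypothetical, much cruder stand-in for exploring vocabulary change interactively in Voyant Tools.

```python
import os
import tempfile
from collections import defaultdict

def group_by_year(articles):
    """Group (date, text) pairs by year; dates are assumed to be ISO strings 'YYYY-MM-DD'."""
    by_year = defaultdict(list)
    for date, text in articles:
        by_year[int(date[:4])].append(text)
    return dict(sorted(by_year.items()))

def write_year_files(by_year, out_dir):
    """Write one plain-text file per year (e.g. 1921.txt), one article per paragraph."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for year, texts in by_year.items():
        path = os.path.join(out_dir, f"{year}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n\n".join(texts))
        paths.append(path)
    return paths

def new_words(by_year):
    """For each year, the words not seen in any earlier year: a crude view of vocabulary change."""
    seen, result = set(), {}
    for year, texts in by_year.items():
        words = set(" ".join(texts).lower().split())
        result[year] = sorted(words - seen)
        seen |= words
    return result

# Hypothetical search results (fabriek = factory, loon = wage).
results = [
    ("1921-01-29", "taylorstelsel fabriek"),
    ("1925-03-02", "wetenschappelijke bedrijfsleiding fabriek"),
    ("1921-06-10", "taylorstelsel loon"),
]
by_year = group_by_year(results)
with tempfile.TemporaryDirectory() as tmp:
    written = [os.path.basename(p) for p in write_year_files(by_year, tmp)]
print(written)             # ['1921.txt', '1925.txt']
print(new_words(by_year))  # {1921: ['fabriek', 'loon', 'taylorstelsel'], 1925: ['bedrijfsleiding', 'wetenschappelijke']}
```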

Scientific management query (number of hits per query)
“wetenschappelijke bedrijfsleiding” (233)
“wetenschappelijke bedrijfsorganisatie” (216)
“wetenschappelijke bedrijfsvoering” (32)
“scientific management” (28)
‘taylorstelsel OR taylor-stelsel’ (330)
‘taylorsysteem OR taylor-systeem’ (369)
‘taylorisme’ (42)
Combined in a single query, these yield 1,175 hits (fewer than the sum of 1,250, because some articles match more than one variant).
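The gap between the summed counts and the combined result follows from how an OR query works: it returns the union of the individual hit sets, so an article matching several variants is counted once. The sketch below illustrates this; the Lucene-style quoting in `or_query` is an assumption about the search syntax, and both helper names and the toy article ids are hypothetical.

```python
def or_query(variants):
    """Join search variants into one Boolean OR query, quoting multi-word phrases.

    Assumes Lucene-style syntax ("phrase" OR term); check the search tool's own documentation.
    """
    return " OR ".join(f'"{v}"' if " " in v else v for v in variants)

def union_size(hits_by_variant):
    """Number of distinct articles matched by at least one variant, i.e. what an OR query returns."""
    return len(set().union(*hits_by_variant.values()))

# Hit counts per variant as reported on the slide.
slide_counts = [233, 216, 32, 28, 330, 369, 42]
print(sum(slide_counts))  # 1250, versus 1175 hits for the combined query

print(or_query(["wetenschappelijke bedrijfsleiding", "taylorisme"]))
# "wetenschappelijke bedrijfsleiding" OR taylorisme

# Hypothetical article ids: article 3 matches both variants, so it is counted once.
toy_hits = {"taylorstelsel": {1, 2, 3}, "taylorisme": {3, 4}}
print(union_size(toy_hits))  # 4, not 3 + 2 = 5
```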

The third way: distributional semantics
* Our implementation combines a) creating dictionaries and b) tracing meaning over time in a single workflow
* by finding ‘most similar words’ (i.e. words whose vectors are close to each other, meaning they are used in similar contexts)
* using the cluster of most similar words from one ten-year period to find the most similar words in the next (partly overlapping) period
* tracing the vocabulary of a concept over time without depending on single terms or predefined dictionaries
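The chaining step can be sketched as follows, assuming word vectors have already been trained separately for each period (e.g. with a tool such as word2vec; the slides do not specify the model). Here the “cluster” is taken to be the `top_n` words closest, by cosine similarity, to the centroid of the previous period's cluster; that rule, the helper names, and the toy vectors are assumptions for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(vectors, seeds, top_n):
    """Words whose vectors lie closest to the centroid of the seed words found in this period."""
    present = [vectors[w] for w in seeds if w in vectors]
    if not present:
        return []
    dim = len(present[0])
    centroid = [sum(vec[i] for vec in present) / len(present) for i in range(dim)]
    scored = sorted(((cosine(centroid, vec), w) for w, vec in vectors.items()),
                    key=lambda sw: (-sw[0], sw[1]))
    return [w for _, w in scored[:top_n]]

def trace_concept(vectors_by_period, seeds, top_n=3):
    """Follow a concept through consecutive periods: each period's cluster seeds the next one."""
    clusters = {}
    for period, vectors in vectors_by_period.items():
        cluster = most_similar(vectors, seeds, top_n)
        clusters[period] = cluster
        if cluster:  # keep the previous seeds if none of them occur in this period
            seeds = cluster
    return clusters

# Hypothetical 2-d vectors (erfelijkheid = heredity, spoorweg = railway).
periods = {
    "1900-1910": {"eugenetica": [1.0, 0.1], "rassenleer": [0.9, 0.2],
                  "erfelijkheid": [0.8, 0.3], "spoorweg": [0.0, 1.0]},
    "1905-1915": {"eugeniek": [0.95, 0.15], "rassenleer": [0.9, 0.2],
                  "erfelijkheid": [0.8, 0.3], "spoorweg": [0.1, 1.0]},
}
print(trace_concept(periods, ["eugenetica"]))
# {'1900-1910': ['eugenetica', 'rassenleer', 'erfelijkheid'],
#  '1905-1915': ['rassenleer', 'erfelijkheid', 'eugeniek']}
```

In the toy data, “eugenetica” is absent from the second period, yet the chained cluster still picks up “eugeniek”, so the concept is followed without relying on a single term or a fixed dictionary.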
