Concepts Through Time: Tracing Concepts in Dutch Newspaper Discourse using Sequential Word Vector Spaces

1. CONCEPTS THROUGH TIME
Tracing Concepts in Dutch Newspaper Discourse using Sequential Word Vector Spaces
Translantis Project (Digital Humanities Approaches to Reference Cultures: The Emergence of the United States in Dutch Public Discourse 1890-1990)
Melvin Wevers, Tom Kenter & Pim Huijnen
Utrecht University & University of Amsterdam, the Netherlands
2. PROBLEM = CHALLENGE
• Conceptual history / intellectual history studies the emergence and transformation of concepts, ideas, and thoughts.
• Problems with existing methods:
• Use of predefined lists of words (N-gram viewers / full-text search)
• Top-down approaches (NER, word classification lists) make use of pre-established models that are often ahistorical
• Topic modeling is useful but quite static
• How to trace the genealogy of a concept?
3. CONCEPTS THROUGH TIME
• We would like to study changes in the meaning (constitution) of concepts over time
• Question: What words were used in the past to talk about particular concepts?
4. OUR APPROACH
• Multi-dimensional word-vector space using Google’s word2vec (neural language model)
• Data: 500,000 digitized newspaper issues from the Dutch National Library
• Semantic and syntactic information represented by geometry (Baroni, Dinu & Kruszewski, 2014; Wijaya & Yeniterzi, 2011)
• 1 model = 10 years; 40 models for the period 1950-1990 (timeline figure marking 1950, 1960, 1970)
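The raw output on slide 6 labels each model with a ten-year span whose start moves forward one year at a time (1950_1959, 1951_1960, ..., 1977_1986). A minimal sketch of that windowing, assuming a fixed 10-year width and a 1-year step; `decade_windows` and `articles_in_window` are hypothetical helpers, not the project's code, and articles are assumed to arrive as (year, text) pairs.

```python
def decade_windows(first_start, last_start, width=10, step=1):
    """Return (label, start, end) tuples for overlapping year windows.

    Assumption: each model covers `width` years and consecutive windows
    start `step` years apart, matching labels such as 1950_1959, 1951_1960.
    """
    windows = []
    for start in range(first_start, last_start + 1, step):
        end = start + width - 1
        windows.append((f"{start}_{end}", start, end))
    return windows


def articles_in_window(articles, start, end):
    """Keep the texts of (year, text) pairs whose year lies in [start, end]."""
    return [text for year, text in articles if start <= year <= end]


windows = decade_windows(1950, 1977)
print(len(windows))                    # 28
print(windows[0][0], windows[-1][0])   # 1950_1959 1977_1986
```

Each window's selection would then be passed to a word2vec trainer (for example gensim's `Word2Vec` class) to obtain one model per label. Overlapping windows make neighbouring models share most of their data, which smooths the transitions between them at the cost of training many more models than disjoint decades would need.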
5. TRACING CONCEPTS
• One or more words as entry points into a concept
• Concepts defined by in- and out-links, inspired by Deleuze’s notion of the rhizome
• Model ambiguity: see which words remain in and disappear from the network
• Fast and relatively light
• Works forwards and backwards in time
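One way to read "in and out links" in code, offered as a hypothetical sketch rather than the authors' algorithm: a word's out-links are its nearest neighbours in one model, its in-links are the words that list it among their own neighbours, and a seed's concept cloud keeps the words linked to it in both directions. The neighbour lists are toy data for illustration, and `link_graph` and `concept_cloud` are invented names.

```python
from collections import defaultdict


def link_graph(neighbours):
    """Build out-links and in-links from per-word nearest-neighbour lists."""
    out_links = {word: set(ns) for word, ns in neighbours.items()}
    in_links = defaultdict(set)
    for word, ns in neighbours.items():
        for n in ns:
            in_links[n].add(word)  # `word` points to `n`, so `n` gains an in-link
    return out_links, in_links


def concept_cloud(seeds, neighbours):
    """Seeds plus every word linked to a seed in both directions (hypothetical rule)."""
    out_links, in_links = link_graph(neighbours)
    cloud = set(seeds)
    for seed in seeds:
        cloud |= out_links.get(seed, set()) & in_links.get(seed, set())
    return cloud


# Toy neighbour lists for a single period (illustrative, not model output).
toy = {
    "buitenlanders": ["vreemdelingen", "toeristen", "mensen"],
    "vreemdelingen": ["buitenlanders", "immigranten"],
    "toeristen": ["buitenlanders", "kampeerders"],
    "mensen": ["gezinnen"],
}
print(sorted(concept_cloud(["buitenlanders"], toy)))
# ['buitenlanders', 'toeristen', 'vreemdelingen']
```

Requiring links in both directions can filter out generic words such as mensen ("people"), which appear in many neighbour lists but do not point back to the seed.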
6. RAW OUTPUT
>>> tc.trackWord(dModels, 'buitenlanders')
1950_1959: vreemdelingen (0.76), nederlanders (0.69), indonesiërs (0.65), toeristen (0.62), europeanen (0.61), vacantiegangers (0.58), mensen (0.57), vakantiegangers (0.56), duitsers (0.54), dagjesmensen (0.54)
1951_1960: vreemdelingen (0.76), nederlanders (0.74), toeristen (0.64), indonesiërs (0.64), europeanen (0.64), bezoekers (0.59), immigranten (0.58), duitsers (0.57), mensen (0.57), kampeerders (0.57)
1952_1961: vreemdelingen (0.74), toeristen (0.69), nederlanders (0.68), indonesiërs (0.61), dagjesmensen (0.61), bezoekers (0.61), kampeerders (0.60), europeanen (0.59), vakantiegangers (0.59), duitsers (0.57)
1953_1962: vreemdelingen (0.74), toeristen (0.70), bezoekers (0.64), nederlanders (0.63), vacantiegangers (0.62), kampeerders (0.59), vakantiegangers (0.59), dagjesmensen (0.57), mensen (0.57), automobilisten (0.55)
1954_1963: toeristen (0.69), vreemdelingen (0.68), nederlanders (0.66), bezoekers (0.62), vakantiegangers (0.60), kampeerders (0.59), vacantiegangers (0.58), immigranten (0.56), jongelui (0.55), jongeren (0.55)
1955_1964: toeristen (0.70), vreemdelingen (0.70), nederlanders (0.64), vakantiegangers (0.64), bezoekers (0.63), kampeerders (0.63), vacantiegangers (0.59), mensen (0.59), dagjesmensen (0.56), jongelui (0.55)
1956_1965: vreemdelingen (0.71), toeristen (0.70), vakantiegangers (0.64), kampeerders (0.63), nederlanders (0.62), bezoekers (0.62), mensen (0.61), duitsers (0.57), vacantiegangers (0.56), gezinnen (0.56)
1957_1966: vreemdelingen (0.68), toeristen (0.68), nederlanders (0.63), kampeerders (0.62), vakantiegangers (0.60), mensen (0.59), bezoekers (0.58), duitsers (0.57), sportvissers (0.56), vacantiegangers (0.55)
1958_1967: toeristen (0.71), vreemdelingen (0.71), nederlanders (0.68), vakantiegangers (0.64), kampeerders (0.63), bezoekers (0.60), marokkanen (0.59), duitsers (0.58), dagjesmensen (0.58), mensen (0.57)
1959_1968: toeristen (0.69), nederlanders (0.68), vreemdelingen (0.66), kampeerders (0.62), bezoekers (0.61), vacantiegangers (0.61), vakantiegangers (0.58), sportvissers (0.58), hotelgasten (0.57), mensen (0.57)
1960_1969: toeristen (0.72), vreemdelingen (0.70), nederlanders (0.68), kampeerders (0.61), vakantiegangers (0.61), zakenmensen (0.59), marokkanen (0.59), mensen (0.59), zakenlieden (0.58), bezoekers (0.58)
1961_1970: vreemdelingen (0.71), toeristen (0.68), nederlanders (0.65), kampeerders (0.63), vakantiegangers (0.62), reizigers (0.61), marokkanen (0.59), bezoekers (0.59), vacantiegangers (0.59), mensen (0.58)
1962_1971: vreemdelingen (0.71), nederlanders (0.68), toeristen (0.67), kampeerders (0.63), indonesiërs (0.59), vakantiegangers (0.59), dagjesmensen (0.59), marokkanen (0.58), sportvissers (0.57), vakantiegasten (0.57)
1963_1972: vreemdelingen (0.72), nederlanders (0.71), toeristen (0.68), indonesiërs (0.62), kampeerders (0.61), mensen (0.58), gezinnen (0.58), scandinaviërs (0.58), turken (0.57), duitsers (0.57)
1964_1973: nederlanders (0.69), vreemdelingen (0.67), toeristen (0.66), surinamers (0.62), indonesiërs (0.62), marokkanen (0.61), sportvissers (0.60), turken (0.59), mensen (0.58), antillianen (0.57)
1965_1974: nederlanders (0.73), vreemdelingen (0.71), toeristen (0.64), marokkanen (0.62), turken (0.60), kampeerders (0.59), indonesiërs (0.59), surinamers (0.59), spanjaarden (0.57), duitsers (0.56)
1966_1975: nederlanders (0.70), vreemdelingen (0.69), toeristen (0.68), indonesiërs (0.64), prostituées (0.61), marokkanen (0.60), gezinnen (0.59), mensen (0.59), surinamers (0.58), kampeerders (0.58)
1967_1976: nederlanders (0.71), toeristen (0.65), indonesiërs (0.63), vreemdelingen (0.63), chilenen (0.57), surinamers (0.57), kampeerders (0.57), gezinnen (0.57), duitsers (0.56), jongelui (0.55)
1968_1977: nederlanders (0.72), vreemdelingen (0.68), toeristen (0.64), vakantiegangers (0.62), kampeerders (0.62), indonesiërs (0.59), duitsers (0.59), loeristen (0.58), mensen (0.58), tunesiërs (0.58)
1969_1978: nederlanders (0.73), vreemdelingen (0.72), toeristen (0.66), surinamers (0.63), indonesiërs (0.61), tunesiërs (0.59), guyanezen (0.58), gezinnen (0.58), chilenen (0.58), vakantiegangers (0.58)
1970_1979: nederlanders (0.75), surinamers (0.65), vreemdelingen (0.64), toeristen (0.63), indonesiërs (0.62), guyanezen (0.60), vakantiegangers (0.60), gastarbeiders (0.59), antillianen (0.59), chilenen (0.59)
1971_1980: nederlanders (0.71), surinamers (0.65), toeristen (0.64), vreemdelingen (0.63), vakantiegangers (0.61), chinezen (0.61), antillianen (0.58), guyanezen (0.57), mensen (0.57), gezinnen (0.57)
1972_1981: nederlanders (0.72), surinamers (0.66), vreemdelingen (0.63), toeristen (0.60), gastarbeiders (0.60), chinezen (0.59), vietnamezen (0.59), indonesiërs (0.59), illegalen (0.58), vakantiegangers (0.58)
1973_1982: surinamers (0.71), vreemdelingen (0.70), nederlanders (0.69), gastarbeiders (0.63), guyanezen (0.62), illegalen (0.61), indonesiërs (0.61), chinezen (0.60), zigeuners (0.60), molukkers (0.59)
1974_1983: surinamers (0.70), vreemdelingen (0.69), gastarbeiders (0.69), nederlanders (0.67), antillianen (0.63), zigeuners (0.59), illegalen (0.58), immigranten (0.58), jongeren (0.58), turken (0.57)
1975_1984: surinamers (0.59), gastarbeiders (0.58), vreemdelingen (0.57), turken (0.55), marokkanen (0.54), nederlanders (0.52), jongeren (0.51), antillianen (0.50), zigeuners (0.50), illegalen (0.49)
1976_1985: gastarbeiders (0.57), surinamers (0.55), vreemdelingen (0.55), turken (0.53), migranten (0.52), turks (0.52), marokkanen (0.50), zigeuners (0.50), nederlanders (0.49), jongeren (0.48)
1977_1986: surinamers (0.58), gastarbeiders (0.57), vreemdelingen (0.55), turken (0.53), nederlanders (0.53), migranten (0.52), marokkanen (0.50), antillianen (0.49), visumplichtige (0.49), illegalen (0.48)
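The implementation of tc.trackWord is not shown in the slides. A minimal pure-Python sketch of what such a call could do, assuming `dModels` maps each window label to a {word: vector} dictionary and that neighbours are ranked by cosine similarity (the scores above are consistent with that, but it is an assumption). `track_word`, `most_similar`, and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(model, word, topn=10):
    """Rank every other word in one model by cosine similarity to `word`."""
    target = model[word]
    scored = [(w, cosine(target, vec)) for w, vec in model.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def track_word(models, word, topn=10):
    """Hypothetical re-creation of tc.trackWord: one neighbour line per period."""
    lines = []
    for label in sorted(models):
        model = models[label]
        if word not in model:
            continue  # word absent from this period's vocabulary
        neighbours = most_similar(model, word, topn)
        lines.append(label + ": " + ", ".join(f"{w} ({s:.2f})" for w, s in neighbours))
    return lines


# Toy vectors for two periods (illustrative only).
toy_models = {
    "1950_1959": {"buitenlanders": [1.0, 0.0], "toeristen": [1.0, 0.0], "surinamers": [0.0, 1.0]},
    "1970_1979": {"buitenlanders": [0.0, 1.0], "toeristen": [1.0, 0.0], "surinamers": [0.0, 1.0]},
}
for line in track_word(toy_models, "buitenlanders", topn=2):
    print(line)
# 1950_1959: toeristen (1.00), surinamers (0.00)
# 1970_1979: surinamers (1.00), toeristen (0.00)
```

Because each window's model is trained separately, its vector space is not aligned with those of the other windows, so the sketch compares ranked neighbour lists per period rather than vectors across periods. With gensim, the per-model lookup would typically be `KeyedVectors.most_similar`.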
7. PROPAGANDA
TRACE CONCEPT
tc.trackClouds3(dModels, ['propaganda'], fMinDist=.6, bSumOfDistances=True)
1950-1959: propaganda
1960-1969: advertising, commercial, non-commercial, commercial messages
1970-1979: TV broadcasting, advertising, propaganda, TV programs
1980-1989: sports broadcasting, television broadcasting, advertising, radio broadcasting
RELATED WORDS
tc.trackWord(dModels, 'propaganda', fMinDist=0.5)
1950-1959: agitation, campaign, campaigns, infiltration, election propaganda, advertising
1960-1969: agitation, campaign, nuclear protest, nuclear arms protest, anti, activities
1968-1977: campaign, agitation, imperialistic, sovietism, soviet campaign, soviet propaganda, militaristic, strikes
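Only the call signature of tc.trackClouds3 is visible here. A hypothetical sketch of cloud tracing across periods, assuming `fMinDist` acts as a minimum cosine similarity (here `min_sim`) and `bSumOfDistances` switches between ranking candidates by their summed similarity to all cloud members or by their single best similarity (here `sum_of_similarities`); the top-ranked words are carried forward as the next period's cloud. The function name, the `topn` cut-off, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def track_clouds(models, seeds, min_sim=0.6, sum_of_similarities=True, topn=4):
    """Trace a cloud of words through consecutive periods (hypothetical sketch).

    In each period every vocabulary word is compared with the cloud members
    present in that model. Words whose best similarity reaches `min_sim` are
    ranked by summed similarity (or by best similarity), and the top `topn`
    become the cloud carried into the next period.
    """
    cloud = list(seeds)
    history = []
    for label in sorted(models):
        model = models[label]
        members = [w for w in cloud if w in model]
        if not members:
            history.append((label, []))  # cloud absent here; keep it for later periods
            continue
        scores = {}
        for word, vec in model.items():
            sims = [cosine(model[m], vec) for m in members]
            if max(sims) < min_sim:
                continue
            scores[word] = sum(sims) if sum_of_similarities else max(sims)
        ranked = sorted(scores, key=lambda w: (-scores[w], w))[:topn]
        history.append((label, ranked))
        cloud = ranked or cloud
    return history


# Toy vectors for two periods (illustrative only; "reclame" = advertising).
toy_models = {
    "1950_1959": {"propaganda": [1.0, 0.0], "reclame": [0.0, 1.0]},
    "1960_1969": {"propaganda": [1.0, 1.0], "reclame": [0.0, 1.0], "sport": [1.0, -1.0]},
}
for label, words in track_clouds(toy_models, ["propaganda"]):
    print(label, words)
# 1950_1959 ['propaganda']
# 1960_1969 ['propaganda', 'reclame']
```

In this sketch, summing similarities favours words close to several cloud members at once, which makes the cloud less likely to drift after a single outlying neighbour; ranking by the maximum lets it follow any one member more readily.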
8. ALIENS
tc.trackWord(dModels, 'aliens')
1950-1959: aliens, foreigners, tourists, Indonesians, Europeans, traveling worker
1960-1969: foreigners, tourists, holiday people, automobile drivers, islanders, campers
1970-1979: foreigners, Surinamese, gypsies, Ambonesians, Guyanese, delinquents, country men, minors, illegal aliens, drug users
1980-1990: illegals, Surinamese, gypsies, asylum seekers, immigrants, guest workers, trailer people, Antilleans, Tamils
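Comparing the translated lists from consecutive decades makes the shift in this cloud explicit, in the sense of slide 5's "see which words remain and disappear". A small sketch using the 1950s, 1960s, and 1970s lists as given on this slide; `compare_periods` is a hypothetical helper, and a plain set comparison ignores the similarity scores behind each list.

```python
def compare_periods(clouds):
    """For consecutive periods, report which terms remain, appear, and disappear."""
    labels = list(clouds)  # dicts preserve insertion order (Python 3.7+)
    report = []
    for prev, curr in zip(labels, labels[1:]):
        before, after = set(clouds[prev]), set(clouds[curr])
        report.append({
            "from": prev,
            "to": curr,
            "remain": sorted(before & after),
            "appear": sorted(after - before),
            "disappear": sorted(before - after),
        })
    return report


aliens = {
    "1950-1959": ["aliens", "foreigners", "tourists", "Indonesians", "Europeans",
                  "traveling worker"],
    "1960-1969": ["foreigners", "tourists", "holiday people", "automobile drivers",
                  "islanders", "campers"],
    "1970-1979": ["foreigners", "Surinamese", "gypsies", "Ambonesians", "Guyanese",
                  "delinquents", "country men", "minors", "illegal aliens", "drug users"],
}
for step in compare_periods(aliens):
    print(step["from"], "->", step["to"], "remain:", step["remain"])
# 1950-1959 -> 1960-1969 remain: ['foreigners', 'tourists']
# 1960-1969 -> 1970-1979 remain: ['foreigners']
```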
9. CONCLUSIONS
• Trace concepts over long periods of time
• Greater sensitivity to semantic change, because the models are derived from the corpus itself
• Greater heuristic interactivity with the researcher
10. FUTURE WORK
• Optimize the algorithm for different types of conceptual change
• Query expansion: use this technique to find relevant related words within specific periods
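A sketch of period-specific query expansion as described in the future-work bullet, assuming one model per period stored as {word: vector} and cosine similarity as the measure; `expand_query`, `to_boolean_query`, `min_sim`, `per_seed`, and the toy vectors are hypothetical. The expanded terms are joined with OR to form a full-text search query.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(model, seeds, min_sim=0.5, per_seed=5):
    """Seed terms plus up to `per_seed` close neighbours of each seed in one period."""
    terms = [s for s in seeds if s in model]
    for seed in list(terms):  # iterate over the original seeds only
        candidates = [
            (word, cosine(model[seed], vec))
            for word, vec in model.items()
            if word not in terms
        ]
        candidates = [(w, s) for w, s in candidates if s >= min_sim]
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        terms.extend(w for w, _ in candidates[:per_seed])
    return terms


def to_boolean_query(terms):
    """Join expanded terms into an OR query for a full-text search engine."""
    return " OR ".join(terms)


# Toy vectors standing in for one period's model (illustrative only).
model_1970s = {
    "buitenlanders": [1.0, 0.2],
    "gastarbeiders": [0.9, 0.3],
    "surinamers": [0.8, 0.1],
    "sport": [-1.0, 0.5],
}
print(to_boolean_query(expand_query(model_1970s, ["buitenlanders"], min_sim=0.9)))
# buitenlanders OR surinamers OR gastarbeiders
```

Running the same seeds against a different period's model would yield a different expansion, which is the point of restricting it to specific periods.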
11. THANK YOU!
@melvinwevers // melvinwevers@gmail.com
www.translantis.nl
12. REFERENCES
… (2009): 71.
Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. “Don’t Count, Predict! A Systematic Comparison of Context-Counting vs. Context-Predicting Semantic Vectors.” Accessed September 11, 2014. http://anthology.aclweb.org/P/P14/P14-1023.xhtml.
Deleuze, Gilles. A Thousand Plateaus: Capitalism and Schizophrenia. University of Minnesota Press, 1987.
Huijnen, Pim, Fons Laan, Maarten de Rijke, and Toine Pieters. “A Digital Humanities Approach to the History of Science.” In Social Informatics, edited by Akiyo Nadamoto, Adam Jatowt, Adam Wierzbicki, and Jochen L. Leidner, 71–85. Lecture Notes in Computer Science 8359. Springer Berlin Heidelberg, 2014.
Kenter, Tom, Melvin Wevers, and Pim Huijnen. “Ad Hoc Monitoring of Vocabulary Shifts over Time.” To be published.
Kim, Yoon, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, and Slav Petrov. “Temporal Analysis of Language through Neural Language Models.” arXiv:1405.3515 [cs], May 14, 2014. http://arxiv.org/abs/1405.3515.
Klingenstein, S., T. Hitchcock, and S. DeDeo. “The Civilizing Process in London’s Old Bailey.” Proceedings of the National Academy of Sciences 111, no. 26 (July 1, 2014): 9419–24.
Wang, Xuerui, and Andrew McCallum. “Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends.” In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 424–33. ACM, 2006.
Wiedemann, Gregor, Andreas Niekler, and others. “Document Retrieval for Large Scale Content Analysis Using Contextualized Dictionaries.” In Terminology and Knowledge Engineering 2014, 2014. http://hal.archives-ouvertes.fr/hal-01005879/.
Wijaya, Derry Tanti, and Reyyan Yeniterzi. “Understanding Semantic Change of Words over Centuries.” In Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web, 35–40. ACM, …
