Using Topic Modeling to Study Everyday "Civic Talk" and Proto-political Engagements


We present a two-step topic modeling method for analysing political articulations in everyday proto-political "civic talk" on online social media and for interpreting them in terms of cultural and political sociology.

  1. 1. Using Topic Modeling to Study Everyday “Civic Talk” and Proto-political Engagements • Veikko Eranti & Tuukka Ylä-Anttila, Universities of Helsinki & Tampere • “Citizens in the Making” (Kone Foundation 2015–2017) • @VeikkoEranti @TuukkaYlaAnt
  2. 2. Background • A larger project combining ethnographic and digital methods • Citizenship as action, as process – “grown into” • Our subfield: online proto-politics and politics • How does everyday “civic talk” get articulated and raised to the level of political discourse and participation? • Proof-of-concept empirical analysis of a discussion forum dataset
  3. 3. Materials • Project: several social media datasets • Here: Suomi24 (Finland24) forum • Subset of 2.5M words (whole 2001–2015 dataset: 2.5B words) • A general interest forum, largest of its kind in Finland • Sub-forums: local municipalities, cars, hobbies, home & DIY, pets, travel, Jesus, sex, and Jesus & sex • Dedicated sections for political discussion, but it also “leaks” to other discussion areas • We look at proto-political talk on the forum as a whole
  4. 4. Theory • Online political talk is not an ideal Habermasian speech situation or public sphere • Not necessarily political arguments: grievances, expressions of resentment... below the threshold of argumentation and deliberation (Mouffe, Young, Habermas, Laclau, Dahlgren, Thévenot, Klofstad) • Working hypothesis: articulation of grievances (and the bigger idea of civic culture) is reflected in pre-/proto-political discussions
  5. 5. Methods • Topic modeling: unsupervised machine learning • Takes text, gives you “topics”: sets of words that occur together in documents • Can frames, discourses, justifications and other objects of cultural sociology be operationalized as such topics? (DiMaggio, Nag & Blei 2013) • We run a 50-topic LDA model with MALLET to find (proto)political talk in everyday debates • The result: 50 sets of words which often occur together, i.e. topics of discussion
  6. 6. Examples of topics (top 15 words) • topic17: new need Finland through produce change problem build small action future use nowadays opportunity option • topic23: Finland Sweden language church Finnish Swedish speak school country learn Catholic religion belong study Islam • topic32: Finland pay Euro money tax billion state million poor cut government economy rich count large
  7. 7. Interpreting topics • These are political words, but they don’t really represent a political articulation (a position, a justification or even a policy theme) • We interpret 9 of 50 topics as political or proto-political • How to get closer to political articulations from this general “civic talk”? • Let’s pick the “proto-political” topics from the 50 and reduce the dataset to the 100 most important messages from each • Reduced to 827 messages (from ~42,000) • 30-topic LDA model on them
  8. 8. But first… an aside on VALIDATION of interpretations • This is a proof-of-concept, so we validated these very superficially • In actual work… VALIDATE, VALIDATE, VALIDATE! • Context-specific deep knowledge of your data – read it! • Internal validation, external validation (Evans 2014, Grimmer & Stewart 2013) • ICCSS2015 poster: more systematic validation
  9. 9. Examples of topics in “submodel” • topic3: Marx work workingclass capitalism teacher socialism worker create pay workingtime value long wellbeing production product • topic12: Finland Niinistö parliament Soini president TrueFinn party Halla-aho choose minister leader chairman foreignminister memberofparliament Russia • topic22: member association function union expel organization name right important only Halonen membershipfee forum join DDR
  10. 10. 21 of 30 topics are rather clear political articulations! Example:
  11. 11. Conclusions • 50-topic model of a general interest forum: no political articulations, or only vague ones • However, a reduced dataset of “proto-political” discussions produces much more coherent articulations • Locating proto-political talk in big data and then, further, pinpointing political articulations arising from that • Drawing a map of big datasets for further qualitative exploration • Sub-model topics are still largely thematic rather than representing practices, frames, justifications etc. • Can we get at these through vocabulary? • Note: this demo was 1/1000 of the entire Suomi24 dataset • Importance of theory and conceptual work
  12. 12. Extra idea Could we model 1) fringe forums, 2) “mid-level” forums and 3) general forums/media to plot the emergence, spreading and mainstreaming of articulations?
  13. 13. References • Dahlgren, Peter. 2000. “The Internet and the Democratization of Civic Culture.” Political Communication 17: 335–40. • DiMaggio, Paul, Manish Nag, and David M. Blei. 2013. “Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics 41(6): 570–606. • Evans, Michael S. 2014. “A Computational Approach to Qualitative Analysis in Large Textual Datasets.” PLoS ONE 9(2): 1–10. • Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21(3): 267–97. • Klofstad, Casey A. 2011. Civic Talk: Peers, Politics, and the Future of Democracy. Temple University Press. • Thévenot, Laurent. 2014. “Voicing Concern and Difference: From Public Spaces to Common-Places.” European Journal of Cultural and Political Sociology 1(1): 7–34. • Etc.
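
The reduction step described on the "Interpreting topics" slide (pick the proto-political topics, keep the top messages from each, then model the reduced set again) can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how "most important" was measured, so the sketch assumes it means the messages with the highest proportion of a given topic, as found in a document-topic matrix of the kind an LDA tool outputs. The function names, the toy matrix and the topic indices are all hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Sequence, Set


def top_documents_per_topic(
    doc_topics: Sequence[Sequence[float]],
    selected_topics: Sequence[int],
    n_per_topic: int,
) -> Dict[int, List[int]]:
    """For each selected topic, return the indices of the n documents
    in which that topic has the highest proportion (assumed criterion)."""
    result: Dict[int, List[int]] = {}
    for topic in selected_topics:
        # Rank all documents by their weight on this topic, highest first.
        ranked = sorted(
            range(len(doc_topics)),
            key=lambda d: doc_topics[d][topic],
            reverse=True,
        )
        result[topic] = ranked[:n_per_topic]
    return result


def reduced_corpus(top_docs: Dict[int, List[int]]) -> List[int]:
    """Union of the per-topic selections, with duplicates removed."""
    keep: Set[int] = set()
    for docs in top_docs.values():
        keep.update(docs)
    return sorted(keep)


# Toy document-topic matrix: 6 messages x 4 topics (each row sums to 1).
doc_topics = [
    [0.70, 0.10, 0.10, 0.10],  # message 0
    [0.10, 0.60, 0.20, 0.10],  # message 1
    [0.05, 0.05, 0.10, 0.80],  # message 2
    [0.40, 0.10, 0.10, 0.40],  # message 3
    [0.10, 0.10, 0.70, 0.10],  # message 4
    [0.25, 0.25, 0.25, 0.25],  # message 5
]

# Hypothetical: topics 0 and 3 were hand-coded as "proto-political".
proto_political = [0, 3]

top_docs = top_documents_per_topic(doc_topics, proto_political, n_per_topic=2)
kept = reduced_corpus(top_docs)

print(top_docs)  # {0: [0, 3], 3: [2, 3]}
print(kept)      # [0, 2, 3]
```

In the toy data, message 3 ranks in the top two for both selected topics, so the reduced set holds three messages rather than four. Overlap of this kind would be one plausible reason why 9 topics with 100 messages each yield 827 messages rather than 900. The second-step model would then be fitted on the kept messages only.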