A War of Words? Text Mining Political Speeches in Britain in the 19th and 20th Centuries


Dr Luke Blaxill (Leverhulme Early Career Research Fellow, Anglia Ruskin University) - Negotiated Texts Network Seminar (29 June 2017)

Historians are increasingly surrounded by an ever-growing forest of machine-readable textual sources. The old challenge of scarcity has been replaced by that of abundance. Despite this, the impact of text mining in History has been remarkably weak. Historians, who continue to be extremely interested in language, continue overwhelmingly to prize sharply focussed analyses based on close readings. Macroscopic computational approaches based on large (or even small) corpora remain at the fringes, despite the traditional barrier of cost and manpower being considerably mitigated by the march of technology.

This paper explores the transformative potential of text mining in this field in two areas of political language where large corpora have become available. The first example is based on election platform speeches in the late Victorian and Edwardian era. In this age of emerging democracy, even local constituency candidates would routinely hold over a hundred public meetings in an election campaign, speaking at length to large audiences, and their speeches were often reported very thoroughly by a diligent and wordy press. I argue that even very simple text mining techniques in a relatively small corpus (4 million words) can challenge historical consensus on the contents of general election campaigns, on the significance of issues such as imperialism and Irish Home Rule, and on the respective visibility of party leaders such as Gladstone and Disraeli.

The second is an analysis of the language of women MPs in Parliament since 1945. Drawing upon the outputs of the Digging into Linked Parliamentary Data ('Dilipad') project – which has added gender and party coding to the digital edition of Hansard – I present a wide-ranging empirical analysis of the role of gender in the 677 million words of Commons debates from 1945 to 2015. I investigate whether there is strong evidence to support the central feminist claim that women's contributions to Commons debates are substantively different to those of men, and ask whether the 'gender effect' has been strengthening or weakening as the number of women in Parliament has increased since the 1997 election. I also look at the effect of party, such as the oft-made claim that Labour (with its greater proportion of female MPs and ideological sympathy for feminism) was more focussed on representing women than the Conservatives.

Overall, I argue these techniques, whether used conservatively in a supplementary capacity alongside traditional approaches, or more boldly to lead analysis, have considerable potential to reshape historians' work in the digital age. They allow us to analyse texts too large in size to read, help overcome flaws in human ability to intuitively estimate frequency, allow greater verifiability, more precise communication of quantity, and a more empirical approach to working.
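
As a rough illustration of how simple such frequency counting can be, the sketch below reports seed-word hits per million words, so that sub-corpora of different sizes can be compared. The five seed groups follow the taxonomy of imperialism listed on slide 8 of the transcript. The regular expressions (one guess at what "all variants" covers), the function names `tokenize` and `per_million`, and the sample sentence are assumptions made for illustration, not the software or settings used in the talk.

```python
import re
from collections import Counter

# Seed groups follow the five-word taxonomy on slide 8. The patterns are an
# assumed reading of "all variants"; "empire" and "flag" are matched exactly.
SEED_PATTERNS = {
    "imperial": r"imperial\w*",
    "empire": r"empire",
    "flag": r"flag",
    "british": r"brit(?:ish|ain|ons?)",
    "colony": r"colon(?:y|ies|ial\w*|ists?)",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def per_million(text, patterns=SEED_PATTERNS):
    """Return hits per million tokens for each seed group."""
    tokens = tokenize(text)
    compiled = {label: re.compile(p) for label, p in patterns.items()}
    counts = Counter()
    for tok in tokens:
        for label, rx in compiled.items():
            if rx.fullmatch(tok):
                counts[label] += 1
                break  # a token belongs to at most one seed group
    # Normalise by corpus size; an empty text yields zero rates.
    scale = 1_000_000 / len(tokens) if tokens else 0.0
    return {label: counts[label] * scale for label in patterns}


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the corpus.
    sample = "The Empire and the imperial flag; our colonies and British colonists."
    for label, rate in per_million(sample).items():
        print(f"{label:10s} {rate:12.1f}")
```

Raw counts like these say nothing about how a word was used, which is why the slides pair the counting step with a check of each hit in its original context.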

Published in: Education

  1. 1. Negotiated Texts Network Seminar: A War of Words? Text Mining Political Speeches in Britain in the 19th and 20th Centuries. Dr. Luke Blaxill, Leverhulme Early Career Research Fellow
  2. 2. Data Deluge • Billions of words of digitised text are now available to historians. We use digital tools to access and to search these archives. But the advent of the ‘digital turn’ has not fundamentally changed methods of analysis.
  3. 3. Political Historians, the Linguistic Turn, and the Controversy of Text Mining… • Great deal of recent interest in language in political history since the ‘linguistic turn’ • The turn has given us a great deal, but its focus on deep readings of specific texts in isolation has often caused historians, especially of language, to be gun-shy of offering broader explanations. • Many historians have called for a ‘reintegration’ of the explanatory ambition of the old social-scientific historical tradition with the new sensitivity to language. But how could such a reintegration occur? Text Mining is a controversial answer in a field which prizes ‘close readings’, is fearful of classification and ontology, and has an awkward relationship with computing after widespread perceived failures in the 1960s and 1970s.
  4. 4. My Proposal: Quantitative language analysis with a corpus • A corpus: a huge bank of text of several million words • Candidates’ speeches from the press, scanned into a computerised corpus: millions of words of freely searchable text
  5. 5. My Corpora • East Anglian Speeches 1880-1910, subdivided per party, per election (1 million words) • National Speeches 1880-1910, subdivided per party, per election (4 million words) • Grassroots Speeches 1880-1910, outside East Anglia (3 million words) • Hansard Corpus, 1800-present, marked up by gender and party (2.5 billion words)
  6. 6. Corpus-driven Quantification in action: Example One: Ireland, 1880-1900
  7. 7. Corpus-driven Quantification in action: Example Two: Imperialism 1880-1900 How can we measure the ‘language of Imperialism’ with a corpus? 1. Historical Judgement 2. Topic Modelling 3. Seed Words
  8. 8. Corpus-driven Quantification in action: Example Two: Imperialism 1880-1900 A five word Taxonomy of Imperialism: • Imperial (and all variants) • Empire • Flag • British (and all variants) • Colony (and all variants)
  9. 9. Corpus-driven Quantification in action: Example Two: Imperialism 1880-1900 To ensure the ascriptions of ‘the language of imperialism’ are correct, each word is checked in its original context:
  10. 10. Corpus-driven Quantification in action: Example Two: Imperialism 1880-1900
  11. 11. ‘Gladstone’: Top Contexts 42% Foreign Policy Weakness 17% Inferiority to Disraeli 8% Good orator 16% Disunity 11% Disestablishment 7% General Gordon 7% Financial incompetence 57% Irish Home Rule 11% Liberal Unionists 7% Abandoning Land Reform 56% Irish Home Rule 14% Disunity 12% Newcastle Programme 11% General Greatness General Greatness 20% Financial competence 17% Bringer of Peace 14% Superiority to Disraeli 14% Manifesto/Programme 21% Party unity 21% General Greatness 20% Irish Home Rule 39% General Greatness 30% Newcastle Programme 12% 1880 1885 1892 Irish Home Rule 40% General Greatness 29% Party unity 17% Superiority to Chamberlain 13% 1886
  12. 12. The Case for using Corpus-driven quantification to study Political Language Four Arguments 1. Difficult to measure quantity by intuition. 2. Corpora can help establish typicality better than selected quotations from speech. 3. Numerical conclusions easy to verify 4. More empirical approach to working can illuminate unexpected patterns otherwise hard to detect
  13. 13. Digging into Linked Parliamentary Data (Dilipad) Project • Hansard Corpus 1945-present: 0.7 billion words • Enhanced corpora for the UK, Canada, and the Netherlands, since c.1800 for all three countries • Data semantically enhanced… • Coded by gender • Coded by constituency • Coded by party • Coded by whether MP was minister or backbencher.
  14. 14. Women’s Parliamentary Language
  15. 15. Women’s Parliamentary Language
  16. 16. Women MPs: Keywords
  17. 17. Women’s Parliamentary Language
  18. 18. Table 1: Percentage of speeches delivered in the Commons since 1945 by CAP topic codes

      Topic                             %
      Macroeconomics                    12.30
      Civil Rights, Immigration          3.26
      Health                             6.27
      Agriculture                        4.02
      Labour and Employment              3.78
      Education                          4.93
      Environment                        1.76
      Energy                             3.25
      Transportation                     6.39
      Law, Crime, and Family             6.36
      Social Welfare                     3.90
      Planning and Housing               5.43
      Banking, Finance                   4.12
      Defence                           11.61
      Space, Science, Technology         1.41
      Foreign Trade                      0.61
      International Affairs              8.36
      Government Operations             11.10
      Colonial and Territorial Issues    1.14
  19. 19. Table 3: Variables that are included in our model (variable name, then explanation)
      • Gender (Female=1): Dummy variable: the main explanatory variable of interest.
      • Party (Labour=1): Dummy variable, with Conservative=0, Labour=1, and Liberal=2. We excluded the other parties because they didn’t have a consistent presence in the House of Commons since 1945.
      • Party Status (Government=1): Dummy variable that indicates if the member sits on the government or opposition benches.
      • Seniority: Integer variable that counts the number of times a member was elected.
      • Portfolio (Responsible=1): Dummy variable: set to 1 for members whose portfolio applies to the specific policy area. It is not surprising, for example, that the Health Secretary will talk more about Health issues, and this variable allows us to control for this.
      • Topic Topology: Dummy variable that aligns with the topology introduced in table ??. Soft issues are coded as 0; hard issues as 1; and all others as 2.
      • Parliamentary Session: Categorical Variable: Each parliamentary s
      • Total Speeches (Exposure): Integer variable that records the total number of speeches across all topics
  20. 20. Table 4: z-scores and significance for the gender and party variables. * p < 0.05, ** p < 0.01, *** p < 0.001. Standard errors are clustered by MP.

      Topic                        Gender (Female=1)   Party (Lab)
      Macroeconomics               -4.941 ***          -0.732
      Civil Rights, Immigration     7.712 ***           0.458
      Health                        6.703 ***           1.975 *
      Agriculture                   0.507              -4.153 ***
      Labour and Employment        -2.532 *             7.423 ***
      Education                     2.458 *             1.401
      Environment                   0.336              -3.381 **
      Energy                       -4.559 ***           6.038 ***
      Transportation               -2.131 *            -0.508
      Law, Crime, and Family        4.628 ***          -0.306
      Social Welfare                5.534 ***           2.422 *
      Planning and Housing          0.371               3.089 **
      Banking, Finance             -2.631 **           -3.332 **
      Defence                      -5.037 ***          -2.466 *
      Space, Science               -3.419 **            0.261
      Foreign Trade                -0.940              -0.357
      International Affairs        -1.646              -3.703 ***
      Government Operations        -4.352 ***           1.568
      Commonwealth Issues          -1.111              -2.079 *

      We therefore code as ‘Male Topics’: Macroeconomics; Energy; Defence; Space, Science; Govt Operations.
      We therefore code as ‘Female Topics’: Civil Rights + Immigration; Health; Law, Crime and Family; Social Welfare; Education.
  21. 21. Figure 1: ratio of speeches on civil rights, health, education and social welfare. Red line represents the ratio for all MPs, black line for male MPs and grey line for female MPs
  22. 22. HEALTH DEBATES. Table 5: Z-scores for topics most related to gender. Topic model is trained on all health debates.

      z-score   topic words
       6.31     abort pregnanc women foetus termin contracept woman rape babi gynaecologist
       2.87     cancer research transplant cell organ screen fluorid human donor blood
       2.83     smoke tobacco alcohol cigarett advertis drink ban smoker pub product
       0.98     vaccin osteopath immunis whoop acupunctur hairdress osteopat regi cough measl
      -5.29     nhs bill trust mental amend nurs carer communiti doctor claus
  23. 23. Figure 7: predicted number of speeches for male (left) and female MPs (right). The blue line represents the ‘male’ topics, the red line the ‘female’ topics. Dotted lines show the 95% confidence intervals.
  24. 24. Thank You.
  25. 25. • Note: The following few pages show extra tables and graphs, which aren’t part of the paper but might be referred to in discussions.
  26. 26. Corpus-driven Quantification in action: Gladstone, 1880-1892 EAST ANGLIA NATIONAL SPEAKERS
  27. 27. Education Issue in 1885: top contexts

      Liberal mentions of School, Child, Education (163 mentions)
      Context                                                        Score   % of total mentions
      Poor people priced out of education                            34      21%
      General expressions of support in favour of free education     34      21%
      Will give dignity to poor/ will help poor                      19      12%
      Improve social mobility of poor                                14       9%
      Reassurance that religious aspect to education will be kept     9       6%

      Conservative mentions of School, Child, Education (146 mentions)
      Context                                                        Score   % of total mentions
      Weaken voluntary schools                                       26      18%
      Expensive, wasteful                                            24      16%
      Highlighting Liberal attack on religious basis of education    19      13%
      Stops people being stakeholders*                               13       9%
      Criticising compulsory and universal education                 16      11%
      Poor standard of board Schools                                  5       3%
  28. 28. Church Disestablishment in 1885: top contexts

      Context of Liberal mentions of Church (105 total)
      Context                                                        Score   % of total mentions
      Proclamations in favour of Disestablishment                    37      35%
      Disestablishment as a route to Religious Equality              17      16%
      Candidate distancing themselves from Disestablishment          10      10%
      Attacks on Conservatives for politicising the Church            9       9%

      Context of Conservative mentions of Church (111 total)
      Context                                                        Score   % of total mentions
      Attacks on Liberals for trying to weaken/ abolish Church       36      32%
      General vows to protect Church                                 31      28%
      Benefits of Church (Education)                                  5       5%
      Benefits of Church (Church sponsored charities)                11      10%
      Benefits of Church (Classless, available for all Classes)      10       9%
      Benefits of Church (Education)                                  3       3%
      Benefits of Church (Improvement of character)                   3       3%
  29. 29. Mentions of Tariff Reform in East Anglia, Jan 1910 compared to 1906: Liberals
  30. 30. Mentions of Tariff Reform in East Anglia, Jan 1910 compared to 1906: Liberals + Conservatives
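
Slide 9 states that each taxonomy word is checked in its original context before it is counted as ‘the language of imperialism’. One common way to support that kind of manual check is a keyword-in-context (KWIC) listing, sketched below. This is an assumed illustration rather than the tool used in the talk: the function name `kwic`, the window width, and the sample sentence are all hypothetical.

```python
import re


def kwic(text, pattern, width=5):
    """Return (left context, keyword, right context) for every token that
    fully matches `pattern`, with up to `width` tokens on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rx = re.compile(pattern, re.IGNORECASE)
    rows = []
    for i, tok in enumerate(tokens):
        if rx.fullmatch(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


if __name__ == "__main__":
    # Hypothetical sentence, not a quotation from the corpus.
    speech = ("We shall defend the Empire against all who threaten "
              "the flag of the empire.")
    for left, key, right in kwic(speech, r"empire", width=3):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>30} [{key}] {right}")
```

A reader can then scan the aligned keyword column and tag each hit by hand, which is one way the context percentages shown on the ‘top contexts’ slides could be produced.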