Magnus Huber - The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries


  1. 1. The Old Bailey Corpus: Spoken English in the 18th and 19th centuries. The use of historical court records in the investigation of language change. Digital History Seminar, 21 February 2012. Magnus Huber, Department of English, University of Giessen, Otto-Behaghel-Str. 10B, D-35394 Giessen, Germany. magnus.huber@anglistik.uni-giessen.de
  2. 2. Structure
     1. Introduction
        1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
        1.2 The Proceedings of the Old Bailey
        1.3 Turning the Proceedings into a linguistic corpus
     2. How linguistically accurate is OBC?
        2.1 Comparison with alternative accounts
        2.2 Language event and its representation
        2.3 Internal consistency: negative contraction
        2.4 Sociolinguistic potential: relative clauses
     3. Brief summary
  3. 3. 1. Introduction. 1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
     Definition of linguistic corpus: generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses.
     Importance of spoken language: spoken language precedes written language.
  4. 4. Peter Trudgill (1974), The social differentiation of English in Norwich
     [Chart: percentage of (ng):[n] by social class and sex (female, male); y-axis 0-100]
     Example: drinking with (ng):[n] = [drɪnkɪn]
     MMC = middle middle class, LMC = lower middle class, UWC = upper working class, MWC = middle working class, LWC = lower working class
  5. 5. Historical linguistics: language change. ye > you in subject position
     "when ye come set it in sech rewle as ye seeme best" (1465)
     "And thus in hast fare you hartely well" (1545)
  6. 6. Sociohistorical linguistics. Gender-related change: ye > you
  7. 7. 1.2 The Proceedings of the Old Bailey
     • Old Bailey = London's Central Criminal Court
     • meets 8 times/year, from 1830s 10 times/year
     • "Proceedings" published 1674-1913
     • start as a commercial enterprise: publishers send scribes into courtroom
     • proceedings taken down in shorthand
     • sold privately by publishers
     • City of London gains more and more control during 18th century
  8. 8. • 2100+ volumes • ca. 200,000 trials • ca. 134 million words
  9. 9. www.oldbaileyonline.org
  10. 10. Original computerized Proceedings (Sheffield) <unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header><pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info><p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders</surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood</surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p><p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  12. 12. Sociolinguistically useful XML tags in Sheffield Proceedings
     • name: <given>Sarah</given> <surname>Sanders</surname>
     • year: <identifier>t17180110-1</identifier>
     • gender: <defend gender="f">
     • age: <age>43</age>
     • profession: <deflabel>Servant</deflabel>
     • origin: <crimeloc>Tottenham</crimeloc>
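As a rough illustration of how the tags listed above could be read out, here is a minimal Python sketch using only the standard library. The fragment is a shortened, hand-made version of the Sanders trial shown on the slides; the function name `defendant_records` and the rule "year = characters 2 to 5 of the identifier" are assumptions made for this example and are not taken from the Old Bailey Online tooling.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment modelled on the trial shown in the slides.
FRAGMENT = """
<unit id="t17330510-1">
  <trial>
    <info><identifier>t17330510-1</identifier></info>
    <p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders</surname></defend></person>, was indicted for stealing.</p>
    <p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>.</p>
  </trial>
</unit>
"""


def defendant_records(xml_text):
    """Return one dict per <defend> element with name, gender, year and labels."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(".//identifier", default="")
    # Assumption: identifiers look like "tYYYYMMDD-n", so the year is chars 1..4.
    year_text = identifier[1:5]
    year = int(year_text) if year_text.isdigit() else None
    # <deflabel> elements are collected per trial, not per defendant.
    labels = [el.text.strip() for el in root.iter("deflabel") if el.text]
    records = []
    for defend in root.iter("defend"):
        records.append({
            "given": defend.findtext("given", default="").strip(),
            "surname": defend.findtext("surname", default="").strip(),
            "gender": defend.get("gender"),
            "year": year,
            "labels": labels,
        })
    return records


if __name__ == "__main__":
    for record in defendant_records(FRAGMENT):
        print(record)
```

In a trial with several defendants the labels would need to be matched to individual people; the sketch deliberately leaves that step out.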
  13. 13. 1.3 Turning the Proceedings into a linguistic corpus of early spoken English
  14. 14. <unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header><pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info><p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders</surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of <speech>Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood</surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p><p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
  15. 15. Tagging spoken language
     • Need for automatic annotation
     • Perl script identifying non-linguistic patterns indicating spoken language in the original proceedings
       – layout
       – metalinguistic information
     • Linguistic markers indicating spoken language? > 1st + 2nd person pronouns
  16. 16. Automatic speech tagging with <speech> </speech>, e.g. "Q. – A." sequences: Q. Did you see him on Sunday night? - A.<speech> Yes, at Walworth, on Sunday night, the 12th of January, at one o'clock - I am sure of that.</speech></p>
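The "Q. – A." pattern on this slide lends itself to a small illustration. The sketch below is in Python rather than Perl and is not the project's script; it only mirrors the slide's example by wrapping the text that follows an "A." marker in `<speech>` tags. The function name `tag_answers` and the marker pattern are assumptions made for this example, and a real tagger would need to handle initials, abbreviations and other layouts.

```python
import re

# "Q." or "A." at a word boundary; the capturing group keeps the marker
# in the result of re.split so the text can be reassembled unchanged.
MARKER = re.compile(r"(\b[QA]\.)")


def tag_answers(paragraph):
    """Wrap the text after each 'A.' marker in <speech>...</speech>."""
    parts = MARKER.split(paragraph)
    # parts = [prefix, marker1, text1, marker2, text2, ...]
    out = [parts[0]]
    for marker, text in zip(parts[1::2], parts[2::2]):
        if marker == "A." and text.strip():
            body = text.rstrip()
            tail = text[len(body):]  # keep trailing whitespace outside the tag
            out.append(f"{marker}<speech>{body}</speech>{tail}")
        else:
            out.append(marker + text)
    return "".join(out)


if __name__ == "__main__":
    sample = "Q. Did you see him on Sunday night? - A. Yes, at Walworth."
    print(tag_answers(sample))
```

Questions are left untagged here only because the slide's example tags the answer; extending the condition to "Q." turns would follow the same pattern.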
  17. 17. Sociobiographical speech event annotation: The New Bailey Tag Assistant
  18. 18. Social data file
     • XML format
     • attributes of every speaker in OBC
     • plus: scribe, printer, publisher
     <xml>
       <document name="19100426">
         ...
         <speaker id="271">
           <sex>m</sex>
           <age></age>
           <given>Thomas</given>
           <surname>Tuckey</surname>
           <occupation>Warder</occupation>
           <occupation2></occupation2>
           <hiscolabel>Prison Guard</hiscolabel>
           <hiscocode>58930</hiscocode>
           <hiscolabel2></hiscolabel2>
           <hiscocode2></hiscocode2>
           <crimescene></crimescene>
           <birthplace></birthplace>
           <workplace>Wormwood Scrubs Prison</workplace>
           <placeofresidence></placeofresidence>
           <role>witness</role>
         </speaker>
         ...
       </document>
     </xml>
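A social data file of this shape can be loaded into a per-speaker lookup with a few lines of Python. The sketch below is an illustration only: the sample keeps just the `<document>` element from the slide and a subset of its fields, and the function name `load_speakers` and the decision to drop empty fields are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Reduced sample based on the speaker entry shown on the slide.
SAMPLE = """<document name="19100426">
  <speaker id="271">
    <sex>m</sex>
    <age></age>
    <given>Thomas</given>
    <surname>Tuckey</surname>
    <occupation>Warder</occupation>
    <hiscolabel>Prison Guard</hiscolabel>
    <hiscocode>58930</hiscocode>
    <workplace>Wormwood Scrubs Prison</workplace>
    <role>witness</role>
  </speaker>
</document>"""


def load_speakers(xml_text):
    """Map each speaker id to a dict of its non-empty attributes."""
    document = ET.fromstring(xml_text)
    speakers = {}
    for speaker in document.iter("speaker"):
        fields = {child.tag: (child.text or "").strip() for child in speaker}
        # Empty elements such as <age></age> are treated as "unknown" and dropped.
        speakers[speaker.get("id")] = {k: v for k, v in fields.items() if v}
    return speakers


if __name__ == "__main__":
    for speaker_id, attributes in load_speakers(SAMPLE).items():
        print(speaker_id, attributes)
```

Keying the lookup on the speaker id makes it straightforward to join these attributes to speech events that carry the same id, assuming the corpus files reference speakers that way.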
  19. 19. 2. How linguistically accurate is OBC? 2.1 Comparison with alternative accounts, e.g. trial of John Ayliffe, 17591024-27, vs. alternative account The tryal at large of John Ayliffe
     Proceedings (718 words): Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
     Tryal (1290 words): Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.
     Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
     Tryal: Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.
     Proceedings: ./.
     Tryal: John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.
  20. 20. Proceedings (718 words) vs. Tryal (1290 words)
     Proceedings: Hargrave. Because he said he was not willing Mr Fox should know of it?
     Tryal: Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.
     Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemen's desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
     Tryal: Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.
  21. 21. 2.2 Language event ↔ written representation
     Letters: formulation → writing
     Trial proceedings (e.g. Old Bailey Proceedings): speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting
  22. 22. Gurney (1752), Brachygraphy: or short-writing: "to take a Speech, or Sermon verbatim, as a Person talks in common" (p. 3)
     Scribes: Thomas Gurney (1749-1770), Joseph Gurney (1770-1782)
  23. 23. Recording linguistic details
     • no distinction between inflected and uninflected auxiliaries: = may or mayst, = can or canst, = should or shouldst
     • dot placed on the top left of the noun phrase = allomorphs a and an
     • auxiliary contractions: you will (you w-il) vs. you'll (you-l), but │ it will ~ 'twill (│ = <t> and it)
  24. 24. 2.3 Internal consistency: negative contraction, e.g. do not > don't, need not > needn't, was not > wasn't. N = 1,344,244
     [Chart: NEG contraction in %, y-axis 0-18, by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1818-1913]
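To make the measure concrete, the following Python sketch computes a contraction rate as contracted / (contracted + full) for a handful of auxiliaries. It rests on assumptions that the slides do not confirm: that N counts both variants together, that the forms are matched as plain lower-cased strings, and that apostrophes are present in the text. The names `PAIRS` and `contraction_rates` are made up for this example.

```python
import re

# Full form -> contracted form; a small illustrative subset.
PAIRS = {
    "do not": "don't",
    "will not": "won't",
    "shall not": "shan't",
    "cannot": "can't",
    "did not": "didn't",
}


def count_form(form, text):
    """Count whole-word occurrences of a (possibly multi-word) form."""
    return len(re.findall(r"\b" + re.escape(form) + r"\b", text))


def contraction_rates(text):
    """Return {full form: (percent contracted, total tokens)} for attested forms."""
    lowered = text.lower()
    rates = {}
    for full, short in PAIRS.items():
        n_full = count_form(full, lowered)
        n_short = count_form(short, lowered)
        total = n_full + n_short
        if total:
            rates[full] = (100 * n_short / total, total)
    return rates


if __name__ == "__main__":
    sample = "I do not know. I don't know. I don't care. He did not go."
    for form, (percent, total) in contraction_rates(sample).items():
        print(f"{form}: {percent:.1f}% contracted (N = {total})")
```

Running the same function over the speech sections of each period would give per-period figures comparable in kind, though not necessarily in method, to the chart on the slide.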
  25. 25. Negative contraction in the OBC, 1732-1912. 1. Lexeme?
     AUX form      % contr.         N
     do not          28.9     189,776
     will not        27.7      17,302
     shall not       20.6       4,172
     cannot          13.3     106,005
     are not          3.2      11,552
     dare not         3.1         260
     need not         0.6       2,136
     did not          0.4     429,143
     does not         0.4       9,539
     have not         0.4      44,038
     could not        0.2      85,361
     is not           0.2      47,142
     must not         0.2       1,620
     would not        0.2      52,123
     had not          0.1      72,395
     has not          0.1       9,244
     should not       0.1      20,192
     was not          0.1      64,574
     may not          0.0       1,271
     might not        0.0       2,404
     ought not        0.0       1,221
  26. 26. Negative contraction in the OBC, 1732-1912. 2. Frequency? (same table as on slide 25)
  27. 27. Negative contraction in the OBC, 1732-1912. 3. Tense? (same table as on slide 25)
  28. 28. Explaining the absence of negative contraction
     • combination of phonology and genre
     • n't is phonetically reduced, less salient than not
     • do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]; can-can't vs. could-couldn't; will-won't vs. would-wouldn't; shall-shan't vs. should-shouldn't
     • negative contraction is (near) absent where the context (e.g. change in the stem vowel in the negative) does not allow disambiguation
  29. 29. Hierarchy of perceptive difference between positive and negative contracted forms
     Form             V change   C change/addition   Score
     do-don(t)           1              1              2
     will-won(t)         1              1              2
     shall-shan(t)       0.5            1              1.5
     can-can(t)          0.5            0              0.5
  30. 30. 2.4 Sociolinguistic potential: relative clauses
     • random extracts of speech events from OBC: 20,000 words/decade (10,000 words each for m + f)
     • 2500+ relative clauses, of which 1533 restrictive
               1720-1779     %    1780-1839     %    1840-1913     %       ∑       %
     that         259      53.8      240      45.4      136      26.0     635    41.4
     zero         107      22.2      118      22.3      201      38.4     426    27.8
     which         70      14.6       97      18.3       92      17.6     259    16.9
     who           38       7.9       69      13.0       89      17.0     196    12.8
     whom           6       1.2        2       0.4        5       1.0      13     0.8
     whose          1       0.2        3       0.6        0       0.0       4     0.3
     ∑            481                529                523              1533
  31. 31. Diagram 1: Distribution of that with regard to animacy of the head (stacked bar chart, 0-100%)
                   1720-1779   1780-1839   1840-1913
     non-human        121         164         105
     human            137          76          31
     1720-1779 vs 1780-1839: p = 0.000
     1720-1779 vs 1840-1913: p = 0.000
     1780-1839 vs 1840-1913: p = 0.070
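The slides report p-values without naming the test behind them. As a hedged illustration of how two periods can be compared, the Python sketch below runs a Pearson chi-square test (no continuity correction, one degree of freedom) on a 2×2 table of the Diagram 1 counts. The choice of test and the function name `chi2_2x2` are assumptions made for this example; the author may have used a different procedure.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value). With one degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Rows: period; columns: non-human head, human head (counts from Diagram 1).
    comparisons = {
        "1720-1779 vs 1780-1839": (121, 137, 164, 76),
        "1720-1779 vs 1840-1913": (121, 137, 105, 31),
        "1780-1839 vs 1840-1913": (164, 76, 105, 31),
    }
    for label, cells in comparisons.items():
        statistic, p_value = chi2_2x2(*cells)
        print(f"{label}: chi2 = {statistic:.2f}, p = {p_value:.3f}")
```

With several pairwise comparisons on the same data, a correction for multiple testing would normally be considered; the sketch does not apply one.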
  32. 32. Diagram 2: Distribution of that and pronominal relativizers with human heads (stacked bar chart, 0-100%)
             1720-1779   1780-1839   1840-1913
     PRN         49          72          93
     that       137          76          31
     1720-1779 vs 1780-1839: p = 0.000
     1720-1779 vs 1840-1913: p = 0.000
     1780-1839 vs 1840-1913: p = 0.000
  33. 33. Diagram 3: Relativizers by gender (excl. genitives) (stacked bar chart, 0-100%)
     Chart labels: p = 0.135, p = 0.001, p = 0.000
              1720-1779     1780-1839     1840-1913
               f     m       f     m       f     m
     PRN      43    71      56   112      66   119
     zero     53    54      66    52     110    73
     that    124   134     108   132      72    64
     f 1720-1779 vs 1780-1839: p = 0.135    m 1720-1779 vs 1780-1839: p = 0.033
     f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.000
     f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.000
  34. 34. Diagram 4: Zero relativizer by gender (excl. genitives) (stacked bar chart, 0-100%)
              1720-1779     1780-1839     1840-1913
               f     m       f     m       f     m
     other   167   205     164   244     138   173
     zero     53    54      66    52     110    73
     f 1720-1779 vs 1780-1839: p = 0.268    m 1720-1779 vs 1780-1839: p = 0.326
     f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.022
     f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.001
  35. 35. Thank you
  36. 36. References
     • Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].
     • Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds). 1996. Sociolinguistics and language history: studies based on the corpus of early English correspondence. Amsterdam: Rodopi.
     • Trudgill, Peter. 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.
     • van Leeuwen, Marco H.D., Ineke Maas and Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.
