Ig ihr 2012

  1. Using texts to explore historical texts: Examples from Lake District literature and the Registrar General’s Reports
     Ian Gregory, Lancaster University
     Acknowledgements:
     • Alistair Baron, Patricia Murrieta-Flores, Andrew Hardie, and Paul Rayson (Lancaster)
     • Claire Grover (Edinburgh) – providing access to the geo-referenced Histpop data
     • Richard Deswarte – help with the HistPop data
  2. What is GIS?
  3. Change in Infant Mortality in England & Wales, 1851-2001
     [Chart: infant mortality rate (IMR, axis 0 to 180) by census year, 1851 to 2001]
  4. Traditional HGIS: Infant mortality decline in England & Wales, 1851-1911
     [Chart: % national rate (axis -30 to 30) by decade, 1850s to 1900s, for series 1 to 8]
     Source: Gregory (2008) Annals of the Assoc. of American Geographers
  5. Distant Reading
     • Graphs (p. 16)
     • Maps (p. 55)
     • Trees (p. 73)
     Moretti (2005) Graphs, Maps, Trees
  6. Literary Mapping of the Lakes
     • British Academy funded pilot project with David Cooper and Sally Bushell
     • Two tours of the Lake District
       – Thomas Gray, 1769 (9,000 words)
         • Proto-Picturesque
       – ST Coleridge, 1802 (10,000 words)
         • Romantic
     • Aims:
       – Can we create a GIS of text?
       – What can it offer to literary research?
     • Method:
       – Texts typed up by hand
       – Places tagged manually
       – Conversion
       – Analysis
  7. Place names coded in XML
     <p in_text="Y">On Sunday Augt. 1st - half after 12 I had a Shirt, cravat, 2 pair of Stockings, a little paper &amp; half a dozen Pens, a German Book (Vosss Poems) &amp; a little Tea &amp; Sugar, with my Night Cap, packed up in my natty green oil-skin, neatly squared, and put into my <format format_type="I">net</format> Knapsack / and the Knap-sack on my back &amp; the Besom stick in my hand, which for want of a better, and in spite of <person>Mrs C.</person> &amp; <person>Mary</person>, who both raised their voices against it, especially as I left the Besom scattered on the Kitchen Floor, off I sallied - over the Bridge<my_comment><pl_name visited="Y">Greta Bridge, Keswick</pl_name></my_comment>, thro the Hop-Field, thro the <pl_name visited="Y">Prospect Bridge</pl_name> at <pl_name visited="Y">Portinscale</pl_name>, so on by the tall Birch that grows out of the center of the huge Oak, along into <pl_name visited="Y">Newlands</pl_name>--<pl_name visited="Y">Newlands</pl_name> is indeed a lovely Place-the houses…
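A minimal sketch of how tagged place names like these could be pulled out of the XML. It assumes a shortened, well-formed excerpt (the slide's own sample is cut off before its closing tag), and the function name `extract_place_names` is illustrative, not part of the project.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt based on the tagged text on this slide.
SAMPLE = (
    '<p in_text="Y">off I sallied - over the Bridge'
    '<my_comment><pl_name visited="Y">Greta Bridge, Keswick</pl_name></my_comment>, '
    'thro the <pl_name visited="Y">Prospect Bridge</pl_name> at '
    '<pl_name visited="Y">Portinscale</pl_name>, along into '
    '<pl_name visited="Y">Newlands</pl_name></p>'
)

def extract_place_names(xml_text):
    """Return (name, visited) pairs for every <pl_name> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (" ".join((el.text or "").split()), el.get("visited") == "Y")
        for el in root.iter("pl_name")
    ]

places = extract_place_names(SAMPLE)
```

Each pair keeps the `visited` attribute, so the "all mentions" and "visits" layers used on later slides can be separated at this step.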
  8. Convert to a GIS
     OS 1:50,000 gazetteer – all places on 1:50,000 maps
     • Accuracy
     • Spelling problems
     • Disambiguation
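One way the spelling and disambiguation problems might be handled when matching tagged names to a gazetteer is sketched here. Everything in it is an assumption for illustration: the gazetteer rows and their coordinates are invented placeholders (not Ordnance Survey values), the fuzzy matching uses Python's `difflib` rather than whatever the project used, and "nearest to the previous place" is only one possible disambiguation rule.

```python
import difflib

# Hypothetical gazetteer rows: (name, easting, northing). The coordinates are
# invented placeholders, not real Ordnance Survey values.
GAZETTEER = [
    ("Portinscale", 1000, 2000),
    ("Newlands", 1100, 1900),
    ("Newlands", 5000, 7000),  # same name elsewhere: needs disambiguation
    ("Keswick", 1200, 2100),
]

def candidates(name, gazetteer, cutoff=0.8):
    """Return gazetteer rows matching exactly, or else by closest spelling."""
    exact = [row for row in gazetteer if row[0].lower() == name.lower()]
    if exact:
        return exact
    names = sorted({row[0] for row in gazetteer})
    close = difflib.get_close_matches(name, names, n=1, cutoff=cutoff)
    return [row for row in gazetteer if close and row[0] == close[0]]

def disambiguate(rows, anchor):
    """Pick the candidate nearest to an anchor point, e.g. the previous place on the tour."""
    ax, ay = anchor
    return min(rows, key=lambda r: (r[1] - ax) ** 2 + (r[2] - ay) ** 2)
```

For example, `candidates("Portinscal", GAZETTEER)` recovers the Portinscale row despite the misspelling, and `disambiguate(candidates("Newlands", GAZETTEER), (1000, 2000))` picks the Newlands entry closest to the anchor.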
  9. Coleridge & Gray in a GIS
  10. Smoothed surface of Gray’s places
      [Maps: All mentions; Visits]
  11. Smoothed surface of Coleridge’s places
      [Maps: All mentions; Visits]
      Class intervals are 10 equal intervals of the all mentions. Bandwidth = 10 km
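A smoothed surface of this kind can be approximated with a kernel density estimate. The sketch below assumes a Gaussian kernel, which the slides do not specify; only the 10 km bandwidth comes from the slide. Points are (x, y, weight) tuples in metres, with the weight standing for the number of mentions at that place.

```python
import math

def kernel_density(x, y, points, bandwidth=10_000.0):
    """Gaussian kernel density estimate at (x, y) from (px, py, weight) points.

    Coordinates and bandwidth share one unit (metres here, so 10_000 = 10 km).
    The Gaussian kernel is an assumption; the slides give only the bandwidth.
    """
    total = 0.0
    for px, py, weight in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += weight * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / (2.0 * math.pi * bandwidth ** 2)
```

Evaluating this function over a regular grid of cells gives the surface; dividing the range of the "all mentions" surface into 10 equal intervals would then give class breaks that can be reused for the "visits" map.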
  12. Comparing Coleridge and Gray
      [Maps: All mentions; Visits]
      • Green: Only in Gray
      • Yellow: Evenly in both
      • Red: Only in Coleridge
  13. Mapping Emotional Response
      [Maps: Gray; Coleridge]
  14. Physical Characteristics of Tours
      [Charts, Altitude of mentions: % of mentions by height band (0 to 99 up to 800+) for Gray and for Coleridge, split into Visited and Didn’t visit/Unclear]
      [Charts, Population density: normal and logged scales, for STC Not visited, STC Visited, Gray Not visited, Gray Visited]
  15. Close Reading with Internet Mapping
      • http://www.lancs.ac.uk/mappingthelakes
      • http://www.lancs.ac.uk/mappingthelakes/v2
  16. The Histpop Collection
      • Covers the printed reports published in the Census and the Registrar General’s Annual Reports, 1801-1937
      • Nearly 13,000,000 words
      • Georeferenced by C. Grover (University of Edinburgh)
      • Just concerned with the Registrar General’s Reports, 1851-1911
      • Total: 3,750,000 words
      • England & Wales: 2,000,000 words
      • http://www.histpop.org
  17. Dot maps of place-name instances
  18. Place-name instances, 1850s
      [Maps: Density Smoothing; Cluster identification: Standard deviations of density]
      www.histpop.org
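Identifying clusters from standard deviations of density could look roughly like the following. This is a sketch under assumptions: the slides do not state the threshold, so the value of 2 standard deviations above the mean is a placeholder, and `flag_clusters` is a hypothetical name.

```python
import statistics

def flag_clusters(densities, threshold=2.0):
    """Return indexes of cells whose density lies more than `threshold`
    standard deviations above the mean density of all cells.

    The threshold is an assumed value, not one taken from the slides.
    """
    mean = statistics.fmean(densities)
    sd = statistics.pstdev(densities)
    if sd == 0:
        return []  # a flat surface has no clusters
    return [i for i, d in enumerate(densities) if (d - mean) / sd > threshold]
```

Applied to the cells of a smoothed density surface, the flagged cells mark the areas the corpus concentrates on; contiguous flagged cells would then be grouped into the numbered clusters.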
  19. Extract place-names
      By frequency:
      Word             Cnt
      North Shields    300
      London           294
      Durham           207
      Nottingham       193
      Liverpool        171
      Hawarden         145
      Grantham         131
      Cardington       125
      Linslade         121
      Wakefield        121
      By kernel density:
      Word             Density   Cnt
      Bermondsey       .5849     6
      Newington        .5842     4
      Spitalfields     .5835     1
      Whitechapel      .5835     1
      Stepney          .5823     2
      Rotherhithe      .5809     5
      London           .5803     294
      Shoreditch       .5794     1
      Bethnal Green    .5788     4
      Camberwell       .5787     12
      58th: Southwick (nr Sunderland)   .3498   1
  20. Collocation
      • “In Southwick and Monkwearmouth offensive nuisances abound.”
      • “At Royton, in Oldham, where the drainage was imperfect, typhoid fever was prevalent”
      • “The deaths in the Liverpool workhouse, in the Mount Pleasant sub-district of Liverpool, were above 100 more than in the same period of the two previous years, owing chiefly to an epidemic of measles among children of German emigrants temporarily located in this institution; there were also 101 deaths from typhus, nearly all of which occurred in the workhouse.”
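Collocation here means words that occur close to a place-name. A minimal sketch, assuming a simple window of tokens either side of the name (the window size of 5 and the crude letters-only tokeniser are assumptions, not the settings used for the slides):

```python
import re
from collections import Counter

def collocates(text, place_name, window=5):
    """Count words occurring within `window` tokens either side of a place-name."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    target = place_name.lower().split()
    n = len(target)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + n:i + n + window]
            counts.update(left + right)
    return counts

# Sentence quoted on this slide.
sentence = ("At Royton, in Oldham, where the drainage was imperfect, "
            "typhoid fever was prevalent")
near_oldham = collocates(sentence, "Oldham", window=5)
```

With a window of 5, "drainage" falls inside the span around "Oldham" but "typhoid" does not; widening the window to 10 brings it in, which shows how sensitive results are to that choice.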
  21. KWIC of “West Bromwich”
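A KWIC (key word in context) display lists every occurrence of a search term with a fixed amount of text either side. The sketch below is a generic illustration, not the concordancer used for the slide; the context width of 30 characters is an assumption, and the sample text is the Liverpool sentence quoted on the Collocation slide.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = ("The deaths in the Liverpool workhouse, in the Mount Pleasant "
          "sub-district of Liverpool, were above 100 more than in the same "
          "period of the two previous years")
concordance = kwic(sample, "Liverpool")
```

Printing `concordance` line by line gives one row per hit with the keyword in a fixed column, which makes it quick to scan what is being said about a place.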
  22. Most common words in clusters
      • Uses Mutual Information scores – top 10 for each cluster, excluding place-names, numbers, and punctuation
      • 1 (North-East): Fog, took [changes in rainfall or temperature took place], largest [changes in weather], least [as largest], dense [weather related], greatest [weather], observatory, Asiatic [cholera], Halos [lunar or solar], thunder. WEATHER
      • 2 (Wakefield): Falls, rain, seen [meteorological phenomena or “swallows”], reading, fell [snow or rain], number [met. readings], June, March. WEATHER
      • 3 (South Lancs): declining [marriages, births or mortality], incorporated [boundary changes], noted [health or weather], cubic [cubic feet – earth movement for sanitation], workhouse, sail [Irish emigrants sailing from Liverpool], observatory, aurora, salutary [salutary effects that led to death], took [weather]. MIXED
      • 4 (Oxon to Beds): cuckoo [was first heard], infirmary, Regius Professor, intermittent [intermittent fevers], sleet, solar, halos, least [rainfall or temperature], heard [thunder], thunder. WEATHER
      • 5 (London): changed [changed water supply], anemometer, exclusively [supplied by one water company], hospital, command [front matter], Junction [Grand Junction Water Company], Company [almost always water company], pipes, Bills [Bills of Mortality], asylum, sewage. WATER SUPPLY
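The slide does not give the exact Mutual Information formula, so the sketch below assumes the common pointwise form: the log (base 2) of observed co-occurrences over the number expected if the word were spread evenly through the corpus. All counts in the usage note are invented for illustration.

```python
import math

def mutual_information(joint, freq_word, context_size, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence.

    joint        -- times the word occurs in the cluster's text
    freq_word    -- times the word occurs in the whole corpus
    context_size -- number of tokens in the cluster's text
    corpus_size  -- number of tokens in the whole corpus
    """
    expected = freq_word * context_size / corpus_size
    return math.log2(joint / expected)

def top_words(joint_counts, word_freqs, context_size, corpus_size, n=10, exclude=()):
    """Rank words in one cluster by MI, skipping excluded items such as place-names."""
    scored = {
        word: mutual_information(count, word_freqs[word], context_size, corpus_size)
        for word, count in joint_counts.items()
        if word not in exclude and count > 0
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]
```

With invented counts, `top_words({"fog": 10, "the": 50, "london": 30}, {"fog": 20, "the": 500, "london": 30}, 100, 1000, exclude={"london"})` ranks "fog" above "the": a frequent function word scores near zero because it is no more common in the cluster than anywhere else.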
  23. “Company” in Cluster 5
  24. Mentions of diseases collocating to place-names
      Mentions of diseases from 1850 to 1910 (series label: Mentions_1850-1911)
      Disease           Frequency
      Diarrhoea         1555
      Diphtheria        1261
      Dysentery         332
      Measles           1513
      Scarlet-Fever     964
      Smallpox          333
      Whooping-cough    23
      [Chart: Diseases related to placenames. Mentions (axis 0 to 700) by decade, 1850 to 1910, for Whooping cough, Smallpox, Dysentery, Scarlet Fever, Diphtheria, Measles, Diarrhoea]
  25. Places that collocate with “measles”
      www.histpop.org
  26. Comparing texts with statistics
      [Chart: % (axis 0 to 40) by urban level 1 to 8, for Mentions of measles, Districts, Population]
      Urban Level   % national pop (1911)   Sample areas
      1             9.4                     Stow on the Wold (Glou), Whitchurch (Hants.), Hexham (N’humb), Oakham (Rutland), Northallerton (N.Rid.), Holbeach (Lincs)
      2             13.0                    Cockermouth (Cumb), Chippenham (Wilts), Bridport (Dorset), Bangor (Carn), Alton (Hants), Pembroke (Pembs)
      3             17.8                    Guildford (Surrey), Redruth (Corn), York (E.Rid), Bucklow (Chesh), Chorley (Lancs), Maidstone (Kent)
      4             18.7                    Swansea, Canterbury, Hastings, Rochdale, Bolton, Wolverhampton
      5             18.0                    Sheffield, Leeds, Oxford, Southampton, Coventry, Edmonton (Mdlsex)
      6             11.9                    Exeter, Hull, Nottingham, Portsmouth, Leicester, Salford (Lancs)
      7             9.0                     Most of London, also Manchester, Liverpool and Birmingham
      8             2.1                     Only London, mainly East End
  27. Do mentions of “Diarrhoea, dysentery and cholera” correlate with deaths from these diseases?
      Correlations between IMRchdidy and Mchdiady (N = 626)
      Measure            Correlation Coefficient   Sig. (1-tailed)
      Kendall’s tau_b    .225**                    .000
      Spearman’s rho     .290**                    .000
      **. Correlation is significant at the 0.01 level (1-tailed).
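Both coefficients in the table are rank correlations between two paired variables. The sketch below shows how each is computed from first principles; it does not reproduce the slide's figures, does not compute significance levels, and any input values are the reader's own. Spearman's rho is taken as the Pearson correlation of average ranks, and Kendall's tau-b uses the standard tie correction.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def kendall_tau_b(xs, ys):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: dropped from both terms
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x) * (pairs + ties_y))
```

Tau-b is usually the smaller of the two for the same data, as in the table above, because it counts pair orderings rather than distances between ranks.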
  28. Geographical Text Analysis
      • Combination of Corpus Linguistics and GIS allows us to:
        – 1. Geographical approach:
          • Ask where is this corpus talking about?
          • Identify place-names in areas that the corpus concentrates on.
          • Find out what it is saying about these places
        – 2. Theme of interest approach:
          • Find out which places are associated with our theme
          • Find out what it is saying in relation to this theme
          • Find out what other themes are associated with these places
          • Compare geography of place-name mentions with statistical evidence to explore biases in sources
  29. Further work
      • HistPop
      • BL’s C19th Century Newspapers
      • Other sources
