Regional Science Presentation
Published in: Education, Technology
  • It is worth noting that the trends and regions I am seeking to identify are best represented by “Anglo-Saxon” names, or those with origins in Britain. Migrant names, although interesting, are included in the calculations but do not exert a significant influence on regional characteristics. The exception is London in the 2001 data.
  • Kernel Density Estimation maps show the areas of highest frequency of a particular name in Britain. Two extremely common names are at the top, two rarer names at the bottom.
  • Data are analyzed at district level in this study. The two years are kept separate and provide interesting comparisons of the changing regions. No previous study has utilized this volume of data in the mapping or regionalizing of British surnames. Only a couple have attempted a study on the national scale, and none have attempted comparisons with a historical dataset.
  • Lasker’s Coefficient of Isonymy is widely used in surname studies and extends the idea of monophyly (sharing a single common ancestor) between two populations. The measure can be explained as the probability that members of two populations or subpopulations have genes in common by descent, as estimated from their sharing the same surnames. It is not the intention of this talk to go into significant depth regarding this measure.
  • In this example, you can see from the 1881 Matrix that Yeovil is more similar in surname structure to Aberayron than to Aberdeen. The diagram represents how the districts would look if projected into Lasker’s Distance Space. Clustering (represented by dashed circles) enables groups of districts with similar names to be regionalized.
  • K-means is stochastic. Only a couple of results, from the Corby and Audlem examples, have maps produced from K-means results due to time constraints in this talk.
  • Animated cube on the left shows the relative position of each of the 2001 districts following the MDS applied to the Lasker’s Distances between districts. One can clearly see that the clusters in the cube represent geographical locations. Animated colour cube on the right illustrates how each of the MDS coordinates is converted into colour values according to the axes of the right-hand cube. These values allow colours to be assigned to districts to produce the choropleth map in the next slide. Districts that are more similar have coordinates that place them closer together in the colour cube. This means that they receive similar colour values.
  • Results from the MDS analysis of the Lasker’s Distances for 1881 (left) and 2001 (right). The more gradual change for 2001 clearly shows isolation of names by distance but also a more homogeneous Britain than the one mapped for 1881.
  • Explain what the tree means from the top down: in 1881 the first split occurs between England and Wales, followed by a north/south split in England, then a split between north England and Scotland. In 2001 the first split occurs between England and Scotland, then England, Wales and Scotland, then north/south England.
  • Map of Ward’s Clustering, splitting Britain into 15 clusters. Although no spatial information about the geographical locations of the districts has been included in the clustering, and there are no contiguity constraints, the resulting regions at 15 clusters are surprisingly homogeneous.
  • The town of Corby is consistently clustered and highlighted as a Scottish district in 2001, not a central English one as would be expected given its location in Northamptonshire. This is not the case with the 1881 data, suggesting a Scottish migration into the area.
  • This migration theory appears to be plausible.
  • Finally, the town that voted to be Welsh. Do the surnames of its population get clustered into the Welsh group or an English one?
  • Political motives, such as free prescriptions, rather than genealogical or cultural motives appear to be driving the locals to vote to be Welsh. It could of course also have been tongue in cheek!
  • I suggest that MDS is a more elegant way of mapping surnames. It does not require a preconception of the number of clusters and also facilitates an impression of gradual change in surname structure, if one exists, rather than the abrupt changes inevitably implied by the Ward’s and K-means clustering.
  • Transcript

    • 1. Surnames as Indicators of Cultural Regions
      • James Cheshire
      • PhD Supervisors: Prof. Paul Longley, Dr Pablo Mateos
      • Department of Geography, University College London
      • Research Blog: jamescheshire.co.uk
      • Email: james.cheshire@ucl.ac.uk
    • 2. Outline
      • Regional identity in Britain.
      • Surnames and regions.
      • Data and Lasker’s Distance.
      • Regionalization Results: Multidimensional Scaling, Clustering.
      • Corby: an interesting example.
      • Future Work
      • Conclusions.
    • 3. Regional identity in Britain
    • 4. Surnames and Regions
      • Many surnames originate from a specific area.
      • The highest frequency of these names still exists in their place of origin.
      • We can therefore expect areas to possess unique combinations of names.
      • We can also expect certain types of surname to occur more frequently in some areas rather than others.
      • This study draws on the above assertions to identify areas/ populations that have similar surname structures within Great Britain.
    • 5. Some Examples: Lewis, Smith, Macleod, Buckley
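The example maps for these four names are kernel density estimates of where bearers of each name live. A minimal sketch of the idea follows, assuming a Gaussian kernel and a fixed bandwidth; the talk does not state which kernel or bandwidth was used, and the bearer locations below are made up for illustration.

```python
import math


def kde(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from a list of (px, py) points."""
    h2 = bandwidth ** 2
    total = sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h2)) for px, py in points
    )
    # Normalise so the estimated surface integrates to 1.
    return total / (len(points) * 2 * math.pi * h2)


# Hypothetical bearer locations on an arbitrary grid (not data from the study):
# a tight group near (2, 2) and one isolated bearer far away.
bearers = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1), (2.1, 2.3), (9.0, 9.0)]

print("density near the group:", kde(bearers, 2.0, 2.0, bandwidth=1.0))
print("density far from it:   ", kde(bearers, 6.0, 6.0, bandwidth=1.0))
```

Evaluating `kde` over a grid of map cells and shading by the result would give a surface of the kind shown on the slide; a larger bandwidth smooths the surface, a smaller one sharpens it.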
    • 6. Data
      • 2001 Enhanced Electoral Roll: 45.6 million people; 1,597,805 surnames; 1,457,681 with < 10 occurrences; 1.5 million postcodes; 436 districts.
      • 1881 Census: 29 million people; 425,793 surnames; 345,781 with < 10 occurrences; 657 districts.
      • Worldnames Database: approx. 300 million individuals; 26 countries.
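The surname counts on this slide imply that most distinct surnames are rare. The short calculation below uses only the figures quoted on the slide; the function name `rare_share` is introduced here for illustration.

```python
def rare_share(total_surnames, rare_surnames):
    """Fraction of distinct surnames that occur fewer than 10 times."""
    return rare_surnames / total_surnames


# Figures as quoted on the slide: (distinct surnames, surnames with < 10 occurrences).
datasets = {
    "2001 Enhanced Electoral Roll": (1_597_805, 1_457_681),
    "1881 Census": (425_793, 345_781),
}

for name, (total, rare) in datasets.items():
    print(f"{name}: {rare_share(total, rare):.1%} of surnames occur fewer than 10 times")
```

On these figures, roughly nine in ten surnames in 2001 and roughly eight in ten in 1881 fall below the 10-occurrence threshold.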
    • 7. Creating Regions: Aggregating Surname Data
      • Isonymy: The occurrence of the same name in marriage.
        • The smaller the surname ‘pool’ the greater the probability of isonymy.
      • Geneticists developed the Coefficient of Isonymy to estimate the probability of isonymy between two populations.
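      $L_{x,y} = -\log_e\left(2R_{x,y}\right)$, where $R_{x,y} = \tfrac{1}{2}\sum_i x_i y_i$ is the Coefficient of Isonymy.
        • x and y: districts.
        • i: surname.
        • x_i and y_i: frequency of surname i as a proportion of the total population of x and y respectively.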
      • The Coefficient of Isonymy has been extended to a distance measure, the Lasker’s Distance, for comparison between populations.
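The distance on this slide can be sketched in a few lines. The code assumes the standard form of Lasker's coefficient, R = ½ Σ x_i y_i over surnames i, with x_i and y_i the relative frequencies of surname i in each district; the toy districts and their counts are hypothetical and not drawn from the study's data.

```python
import math
from collections import Counter


def lasker_distance(counts_x, counts_y):
    """Lasker distance between two districts given surname -> count mappings."""
    total_x = sum(counts_x.values())
    total_y = sum(counts_y.values())
    # Only surnames present in both districts contribute to the sum.
    shared = counts_x.keys() & counts_y.keys()
    # R = 1/2 * sum_i x_i * y_i, with x_i and y_i as relative frequencies.
    r = 0.5 * sum(
        (counts_x[name] / total_x) * (counts_y[name] / total_y) for name in shared
    )
    if r == 0:
        return math.inf  # no shared surnames at all
    return -math.log(2 * r)


# Hypothetical toy districts (made-up counts).
district_a = Counter({"Smith": 50, "Jones": 30, "Macleod": 20})
district_b = Counter({"Smith": 40, "Jones": 50, "Lewis": 10})
district_c = Counter({"Macleod": 70, "Smith": 10, "Buckley": 20})

print("A-B:", lasker_distance(district_a, district_b))
print("A-C:", lasker_distance(district_a, district_c))
```

A smaller distance means the two districts share more of their surname structure. Computing the function for every pair of districts would give a square matrix of the kind shown on the next slide.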
    • 8. Creating Regions: Aggregating Surname Data
      • Each district in Britain is assigned a position in “surname space” based on a matrix of Lasker’s Distances (District x Lasker’s Distance).
      • 2001 Matrix:
                   95Z       99ZZ      OOLN
        00BL   7.520982  7.336616  7.219516
        00BM   7.428889  7.315671  7.425037
        00BN   7.347616  7.356772  7.394888
        00BP   7.452982  7.299915  7.330886
        00BQ   7.410027  7.300150  7.387787
      • 1881 Matrix:
                      Yarmouth    Yeovil      York
        Aberayron     6.389540  6.289929  6.438361
        Aberdeen      6.356152  7.019357  6.213222
        Abergavenny   6.412893  6.361753  6.566717
        Aberystwith   6.327093  6.319481  6.467985
        Abingdon      6.353814  6.559106  6.621873
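The 1881 excerpt on this slide can be read programmatically. The sketch below copies the fifteen values shown and finds, for each column district, the row district with the smallest Lasker's Distance; the helper name `closest_row` is introduced here for illustration. It also inverts L = -log_e(2R) to compare the implied isonymy of two pairs.

```python
import math

# Excerpt of the 1881 Lasker's Distance matrix, values as shown on the slide.
matrix_1881 = {
    "Aberayron":   {"Yarmouth": 6.389540, "Yeovil": 6.289929, "York": 6.438361},
    "Aberdeen":    {"Yarmouth": 6.356152, "Yeovil": 7.019357, "York": 6.213222},
    "Abergavenny": {"Yarmouth": 6.412893, "Yeovil": 6.361753, "York": 6.566717},
    "Aberystwith": {"Yarmouth": 6.327093, "Yeovil": 6.319481, "York": 6.467985},
    "Abingdon":    {"Yarmouth": 6.353814, "Yeovil": 6.559106, "York": 6.621873},
}


def closest_row(matrix, column):
    """Return the row district with the smallest distance to `column`."""
    return min(matrix, key=lambda row: matrix[row][column])


for column in ("Yarmouth", "Yeovil", "York"):
    print(column, "is closest to", closest_row(matrix_1881, column))

# Since L = -log_e(2R), the ratio of 2R between two pairs is exp(L2 - L1).
ratio = math.exp(matrix_1881["Aberdeen"]["Yeovil"] - matrix_1881["Aberayron"]["Yeovil"])
print("Yeovil-Aberayron isonymy relative to Yeovil-Aberdeen:", ratio)
```

Within this excerpt, Yeovil's smallest distance is to Aberayron, and the implied isonymy between Yeovil and Aberayron is roughly twice that between Yeovil and Aberdeen.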
    • 9. Creating Regions: Grouping Lasker’s Distance
      • District x Lasker’s Distance
      • Multidimensional Scaling
      • Clustering: Ward’s Hierarchical Clustering, K-Means
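K-means depends on randomly chosen starting centres, so different runs can give different regions. The sketch below is a plain Lloyd's-algorithm implementation with an explicit seed and a restart wrapper. It assumes the districts are already expressed as coordinates (for example MDS output), which the slides do not confirm, and the points are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iterations=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # random initial centres: the stochastic part
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the centres are stable too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, wss


def best_of(points, k, restarts=10):
    """Run K-means from several seeds and keep the tightest solution."""
    return min((kmeans(points, k, seed) for seed in range(restarts)), key=lambda run: run[1])


# Hypothetical district coordinates (not data from the study).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for seed in range(3):
    labels, wss = kmeans(points, 3, seed)
    print("seed", seed, labels, round(wss, 2))
print("best of 10:", best_of(points, 3))
```

Fixing the seed makes a single run reproducible; keeping the lowest within-cluster sum of squares over several restarts is one common way to reduce the dependence on the starting centres.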
    • 10. Creating Regions: Multidimensional Scaling
      • http://www.let.rug.nl/~kleiweg/indexs.html
      • Map legend: North East, North West, Yorkshire and the Humber, East Midlands, West Midlands, East of England, South East, South West, Wales, Scotland, Northern Ireland
      • Map: 1881
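Multidimensional scaling turns a distance matrix into coordinates whose pairwise distances approximate the original ones. The sketch below implements classical (Torgerson) MDS with power iteration so that it needs only the standard library. It is an illustration under stated assumptions, not the software used for the talk: it assumes a symmetric matrix, and for non-Euclidean dissimilarities power iteration may pick up large negative eigenvalues, which the sketch simply clips to zero. The test points are hypothetical.

```python
import math
import random


def classical_mds(dist, dims=3, iterations=500, seed=0):
    """Classical MDS via power iteration; returns one coordinate list per row."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to the converged vector.
        eigenvalue = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate B so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


def euclid(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))


# Hypothetical check: distances between the corners of a 3 x 4 rectangle.
corners = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[euclid(p, q) for q in corners] for p in corners]
for row in classical_mds(dist, dims=2):
    print([round(x, 3) for x in row])
```

The recovered coordinates match the input only up to rotation and reflection, which is why MDS output is interpreted through relative positions rather than axis values.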
    • 11. Creating Regions: Multidimensional Scaling (maps: 1881 and 2001)
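The maps on this slide colour each district by its position in the three-dimensional MDS space. A minimal sketch of that conversion follows, assuming each axis is rescaled linearly to 0-255 and that axes one, two and three map to red, green and blue in that order; the talk does not state the exact mapping, and the coordinates are hypothetical.

```python
def coords_to_rgb(coords):
    """Rescale each of three MDS axes to 0-255 and read the result as (R, G, B)."""
    axes = list(zip(*coords))
    lows = [min(axis) for axis in axes]
    # Fall back to a span of 1.0 when an axis is constant, to avoid dividing by zero.
    spans = [(max(axis) - min(axis)) or 1.0 for axis in axes]
    return [
        tuple(round(255 * (v - lo) / span) for v, lo, span in zip(point, lows, spans))
        for point in coords
    ]


def to_hex(rgb):
    """Format an (R, G, B) tuple as a hex colour string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical 3-D MDS coordinates for three districts.
districts = ["District A", "District B", "District C"]
coords = [(-1.0, 0.0, 2.0), (1.0, 0.5, 0.0), (0.0, 1.0, 1.0)]

for name, rgb in zip(districts, coords_to_rgb(coords)):
    print(name, rgb, to_hex(rgb))
```

Because the mapping is linear, districts that sit close together in MDS space receive similar colours, which is what lets gradual change show up on the choropleth map.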
    • 12. Creating Regions: Ward’s Hierarchical Clustering (maps: 1881 and 2001)
    • 13. Creating Regions: Ward’s Hierarchical Clustering (maps: 1881 and 2001)
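Ward's method repeatedly merges the two clusters whose union increases within-cluster variance the least, and the tree is then cut at the desired number of clusters (15 in the maps). The sketch below uses the Lance-Williams update on squared dissimilarities. Whether the study applied Ward's criterion to raw or squared Lasker's Distances is not stated, so the squaring is an assumption, and the input matrix is a hypothetical toy example.

```python
def ward_clusters(dist, k):
    """Agglomerate with Ward's criterion (Lance-Williams update) down to k clusters."""
    n = len(dist)
    # Squared dissimilarities, keyed by an ordered pair of cluster ids.
    d = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > k:
        # Merge the closest pair of clusters under the current dissimilarities.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        na, nb = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        for c in members:
            nc = len(members[c])
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Lance-Williams formula for Ward's method.
            d[(c, next_id)] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)
        del d[(a, b)]
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Hypothetical toy input: six districts placed at positions on a line.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in positions] for p in positions]
print(ward_clusters(dist, k=2))
```

Note that the function only ever sees the dissimilarity matrix: no coordinates or adjacency information enter the clustering, which mirrors the point in the notes that the resulting regions are compact without any spatial constraint.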
    • 14. Danish Rule
    • 15. Corby: A Scottish Town? (maps: 1881 and 2001; MDS, Ward’s, K-Means)
    • 16. Corby: A Scottish Town?
      • In 1932 Stewarts and Lloyds built a new iron and steel works in Corby.
      • Workforce sourced from closing Scottish steelworks, mainly in Lanarkshire.
      • Into the 1970s, 50% of the incoming population was Scottish.
      • Transformed population from 1,500 to 34,000.
      • Annual Highland Games.
    • 17. Future Work
      • Methodological:
      • Different input geographies.
      • Narrow focus to specific areas/ groups of names.
      • Validation:
      • Comparison with genetics data.
      • Telephone call flows.
      • Application:
      • Genetic sampling strategy: “local names”.
      • Expansion:
      • Incorporating Worldnames data for regionalisation of Europe.
    • 18. Audlem…Is it Welsh?
    • 19. Back to Audlem…Is it Welsh?
    • 20. Conclusions
      • Unprecedented mobility over the last century has failed to erase surname regions.
      • Clustering and MDS provide powerful methods for drawing out surname trends.
      • More research is required into the methods and scales to which they can be applied.
      jamescheshire.co.uk
    • 21. References
      • Lasker Distance: Lasker, G. W. and C. G. N. Mascie-Taylor (2001). "The genetic structure of English villages: surname diversity changes between 1976 and 1997." Annals of Human Biology 28(5): 546-553.
      • K-Means: Adnan, M., Singleton, A. D., Brunsdon, C. and Longley, P. A. (2009). "Moving to Real-Time Segmentation: Efficient Computation of Geodemographic Classification." GISRUK 2009.
      • Multidimensional Scaling Plots: Kleiweg, P.: http://www.let.rug.nl/~kleiweg/L04/
      • Monmonier Algorithm: Manni, F., E. Guerard, et al. (2004). "Geographic Patterns of (Genetic, Morphologic, Linguistic) Variation: How Barriers Can Be Detected by Using Monmonier’s Algorithm." Human Biology 76(2): 173-190.
      • KDE: Crimestat Workbook: http://www.icpsrdirect.org/CRIMESTAT/workbook/CrimeStat_III_Workbook_PowerPoint.ppt
      • R Packages: adegenet, cluster, maptools, rgl, sm, spdep, splancs from http://cran.r-project.org; iL04_1.13 from http://www.let.rug.nl/~kleiweg/L04/
      • All boundary data in the maps: Crown Copyright Ordnance Survey 2009.
    • 22. Please visit for slides: jamescheshire.co.uk
