3A_3_Informing population genetics through spatial analysis of surnames



  1. Informing Population Genetics through Spatial Analysis of Surnames. James Cheshire, University College London, Department of Geography. spatialanalysis.co.uk, @spatialanalysis
  2. Outline
     • Context: the geographer's perspective.
     • Context: the geneticist's perspective.
     • Combining the two: requirements.
     • Proposed solution: Kernel Density Estimation.
     • Applications to health.
     • Further analysis.
     • Conclusions.
  3. Surnames as Spatial Data (Context: the geographer's perspective)
     • Everyone has a surname.
     • Surnames are widely recorded alongside location.
     • The majority originated in a specific area.
     • All maintain distinct spatial patterns.
     • For many, these patterns are closely tied to where the name originated.
  4. Surnames as Genetic Data (Context: the geneticist's perspective)
     • In many cultures surnames are inherited characteristics.
     • In Britain they generally follow the male line, which is similar to the inheritance of the Y chromosome.
     • If you share a surname with someone, you are more likely to be related to them than if you don't.
     nist.gov
  5. Surnames as Genetic Data (Context: the geneticist's perspective). Figure: Genetic Variation in Europe, Cavalli-Sforza 2001.
  6. Requirements (Combining the two: requirements)
     • Representative and valid spatial analysis.
     • Comprehensive coverage of surnames.
     • Ability to filter out "non-local" names.
     • Flexibility in the degree of filtering.
     • Common software platform.
  7. Data (Combining the two: requirements)
     • 2001 Enhanced Electoral Roll: 45.6 million people; 1,597,805 surnames; 1,457,681 surnames with < 10 occurrences; 1.5 million postcodes.
     • 1881 Census: 4,679,574 people; 425,793 surnames; 345,781 surnames with < 10 occurrences; 657 districts.
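To put the slide 7 counts in proportion, the short sketch below computes the share of surnames with fewer than 10 occurrences in each dataset. It assumes the "< 10 occurrences" figures are counts of distinct surnames, which is how the slide reads; the function and dictionary names are illustrative only. On that reading, roughly 91% of surnames in the 2001 roll and roughly 81% in the 1881 Census are rare.

```python
# Counts copied from slide 7; the interpretation of "rare" as
# "distinct surnames with fewer than 10 occurrences" is an assumption.
DATASETS = {
    "2001 Enhanced Electoral Roll": {"surnames": 1_597_805, "rare": 1_457_681},
    "1881 Census": {"surnames": 425_793, "rare": 345_781},
}


def rare_share(surnames: int, rare: int) -> float:
    """Fraction of distinct surnames that occur fewer than 10 times."""
    return rare / surnames


if __name__ == "__main__":
    for name, counts in DATASETS.items():
        share = rare_share(counts["surnames"], counts["rare"])
        print(f"{name}: {share:.1%} of surnames have < 10 occurrences")
```

Such a long tail of rare names is one reason the degree of filtering needs to be flexible.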
  8. Data: surnames and places of birth of 842 volunteers* and their maternal and paternal grandparents.
     * To qualify for the study, volunteers had to be born within 60 km of the birthplaces of 3 out of 4 grandparents. All birthplaces should be "rural".
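The 60 km qualification rule on slide 8 can be expressed as a simple distance filter. The sketch below is a minimal illustration, not the study's actual procedure: it assumes birthplaces are given as (latitude, longitude) pairs in degrees, uses great-circle (haversine) distance, and reads "3 out of 4" as "at least 3". The "rural" condition is not modelled, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def qualifies(volunteer, grandparents, max_km=60.0, min_close=3):
    """True if at least `min_close` grandparent birthplaces lie within
    `max_km` of the volunteer's birthplace (assumed reading of the rule)."""
    close = sum(haversine_km(volunteer, g) <= max_km for g in grandparents)
    return close >= min_close


if __name__ == "__main__":
    volunteer = (52.0, 0.0)  # made-up coordinates for illustration
    grandparents = [(52.1, 0.0), (52.0, 0.5), (52.3, 0.1), (55.0, -3.0)]
    print(qualifies(volunteer, grandparents))
```

Planar distances on projected coordinates (for example, British National Grid eastings and northings) would serve equally well at this scale.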
  9. Kernel Density Estimation (Proposed solution)
     Calculates the probability of a surname occurring in an area. It can be adjusted by altering the bandwidth size and model, the grid cell size, the sample size within the kernel (interval), the use of dual or single KDE, and the weight applied to each point.
     The following KDE parameters were used:
     • 50 x 50 grid.
     • Fixed bandwidth that changes with each name.
     • Each point weighted by the location quotient of surname occurrence.
     • Constrained by coast.
     Kernel shapes illustrated: Normal, Uniform, Quartic, Triangular.
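The KDE parameters on slide 9 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the author's implementation: it uses the quartic kernel (one of the four shapes shown), a single bandwidth supplied per surname, a 50 x 50 grid of cell centres, and point weights given by the standard location quotient (local surname share divided by national surname share). The coastline constraint is omitted, and all function and parameter names are hypothetical.

```python
import numpy as np


def location_quotient(name_count, area_pop, name_total, total_pop):
    """Standard location quotient: local share of a surname divided by
    its national share. Values above 1 mean local over-representation."""
    return (name_count / area_pop) / (name_total / total_pop)


def weighted_quartic_kde(points, weights, bandwidth, grid_size=50, extent=None):
    """Weighted 2-D KDE with a quartic kernel on a grid_size x grid_size grid.

    points    : (n, 2) array of x, y coordinates
    weights   : (n,) array, e.g. location quotients
    bandwidth : kernel radius, fixed for one surname
    extent    : (xmin, xmax, ymin, ymax); defaults to the points' bounding
                box padded by one bandwidth so no kernel is cut off
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if extent is None:
        xmin, ymin = points.min(axis=0) - bandwidth
        xmax, ymax = points.max(axis=0) + bandwidth
    else:
        xmin, xmax, ymin, ymax = extent

    # Coordinates of the grid cell centres.
    xs = xmin + (np.arange(grid_size) + 0.5) * (xmax - xmin) / grid_size
    ys = ymin + (np.arange(grid_size) + 0.5) * (ymax - ymin) / grid_size
    gx, gy = np.meshgrid(xs, ys)

    # Scaled distance from every cell centre to every point: (grid, grid, n).
    d = np.hypot(gx[..., None] - points[:, 0],
                 gy[..., None] - points[:, 1]) / bandwidth

    # Radial quartic kernel; 3/pi makes it integrate to 1 over the unit disc.
    k = np.where(d <= 1.0, (3.0 / np.pi) * (1.0 - d ** 2) ** 2, 0.0)

    density = (k * weights).sum(axis=-1) / (weights.sum() * bandwidth ** 2)
    return xs, ys, density


if __name__ == "__main__":
    pts = [(2.0, 2.0), (5.0, 5.0), (8.0, 3.0)]   # made-up coordinates
    lqs = [location_quotient(10, 1000, 100, 100000), 1.0, 0.5]
    xs, ys, dens = weighted_quartic_kde(pts, lqs, bandwidth=3.0)
    print(dens.shape)
```

A coastline constraint could be approximated afterwards by masking grid cells that fall outside a land polygon and renormalising the remaining density.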
  10. Parameters (Proposed solution)
  11. Results (Proposed solution): Approx. 40% of the 842 people sampled
  12. Applications to health
     • Enables genetic variation due to geographic isolation to be quantified.
     • This is important when establishing the extent to which certain illnesses are controlled by genetics.
     • Efficiency in sampling.
  13. Further Analysis
  14. Further Analysis
  15. Further Analysis
  16. Conclusions
     • Combination of established spatial data analysis methods with a novel data source and application.
     • It will never be perfect, but KDE and other techniques offer a large improvement on current genetic sampling strategies.