
Open data and Free UK Genealogy


  1. Family History and Linked Data. Free UK Genealogy Open Data Conference, 30 January 2016. Richard Light
  2. Lights
  3. Kerridges
  4. Kerridges + Lights!
  5. Kerridge and Light
     • … and Weissbeck
     • relatively uncommon names
     • How can FreeBMD and FreeCen help?
  6. Other people were here first …
     • Lots of Kerridge research
     • Lights actually feature in a book: Common People (Alison Light)
  7. Kerridge
  8. Light
  9. Pooling results
     • Do we want to do it? (Not everyone does …)
     • If so, how can it be done?
     • How do you say that you’re both talking about the same person?
  10. Current FreeUKGen search facilities
      • BMD search is sophisticated and flexible
      • Only one result type: people who match
      • Census search has same approach, with links to individual households
  11. BMD search
  12. Register search
  13. Census search
  14. Limitations of current search
      • Limit of 3000 hits per BMD search
      • Difficult to get to household info
      • Result pages can’t be bookmarked
      • Main problem: searches all return HTML!
  15. Getting machine-processible data
      • Save FreeBMD HTML results page
      • Copy table of results
      • Paste into spreadsheet
      • Save as CSV file
      • Convert to XML and load into Modes
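
The last step on this slide (CSV to XML) can be sketched in a few lines of Python. This is only an illustration: the column headings, the sample row and the element names `entries`/`entry` are assumptions, and the output is generic XML rather than the schema Modes actually expects.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> ET.Element:
    """Turn each CSV row into an <entry> element with one child per column."""
    root = ET.Element("entries")  # hypothetical wrapper element
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        for column, value in row.items():
            # Column headings become element names, e.g. "Surname" -> <surname>.
            tag = column.strip().lower().replace(" ", "_")
            ET.SubElement(entry, tag).text = (value or "").strip()
    return root


# Made-up sample data; the headings are assumptions, not FreeBMD's own.
sample = (
    "Surname,Forename,District,Year,Quarter\n"
    "KERRIDGE,Example,Exampleton,1880,Mar\n"
)
root = csv_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

Reading the spreadsheet export with `csv.DictReader` keeps the column headings attached to each value, so the XML stays readable even if the column order changes between saved result pages.
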
  16. BMD data in Modes
  17. Limitations
      • Imprecision
        – temporal, e.g. BMD ‘after the event’ and grouped by quarter
        – geographical: BMD only specifies District; Census -> Parish
        – names: variations in spelling
        – copying/transcription errors
      • Incompleteness
        – overseas births/deaths
        – non-registration
        – transcription backlog
  18. Encoding a BMD entry as XML
  19. Indexed search, e.g. places
  20. Inference of birth data
  21. Speculative matching death -> birth
  22. Working with census data
      • Initial efforts ‘broke’ FreeCen!
      • Data had to be loaded from a full dump
      • Loaded all Districts, Pieces and Households
      • Selectively loaded Light and Kerridge records
      • Then loaded all people registered in one of these Light or Kerridge households
      • Shows up Lights/Kerridges as servants, in institutions, etc.
  23. Districts
  24. Pieces
  25. Households
  26. Census data: co-contextuality
      • Each ‘household’ records relationships between people
      • Binary links between ‘Head’ and others, but other family relationships can be inferred
      • Nothing like the completeness of FreeBMD, but more can be done with the data that is there
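
As a rough sketch of how further relationships might be inferred from the ‘relation to Head’ links, the Python below pairs up the Head's recorded children as likely siblings and flags the Head's wife as a possible parent. The record layout, the names and the relation labels are assumptions made for illustration, not FreeCen's actual data model, and the inferred links are only hypotheses (a wife may be a stepmother, for example).

```python
from itertools import combinations


def infer_links(household):
    """Derive tentative person-to-person links from relation-to-Head values."""
    wives = [p["name"] for p in household if p["relation"] == "Wife"]
    children = [p["name"] for p in household
                if p["relation"] in {"Son", "Daughter"}]
    links = []
    # Two people who are both children of the Head are likely siblings.
    for a, b in combinations(children, 2):
        links.append((a, "sibling of", b))
    # The Head's wife may be the children's mother, but this is not certain.
    for wife in wives:
        for child in children:
            links.append((wife, "possible parent of", child))
    return links


# Made-up household for illustration.
household = [
    {"name": "A", "relation": "Head"},
    {"name": "B", "relation": "Wife"},
    {"name": "C", "relation": "Son"},
    {"name": "D", "relation": "Daughter"},
    {"name": "E", "relation": "Servant"},
]
links = infer_links(household)
for link in links:
    print(link)
```
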
  27. Household summaries
  28. Occupations - Kerridge
      Occupations of Kerridges (>1)
      KERRIDGE  Scholar
      KERRIDGE  -
      KERRIDGE  Ag Labr
      KERRIDGE  Agricultural Labourer(Em'ee)
      KERRIDGE  Farmer's Son
      KERRIDGE  Farm Labourer(Em'ee)
      KERRIDGE  Farmer(Em'er)
      KERRIDGE  Labourer(Em'ee)
      KERRIDGE  Domestic Servant
      KERRIDGE  Farm Labr
      KERRIDGE  Agricultural Laborer(Em'ee)
      KERRIDGE  Brickmaker(Em'ee)
      KERRIDGE  Farm Labourer (Em'ee)
      KERRIDGE  Retired Ag Labr
  29. Occupations - Light
      Occupations of Lights (>1)
      LIGHT  Scholar
      LIGHT  Ag Lab
      LIGHT  Ag Laborer
      LIGHT  Labourer
      LIGHT  Copper Miner
      LIGHT  Female Servant
      LIGHT  Miner
      LIGHT  Pauper
      LIGHT  Sawyer
      LIGHT  Tin Miner(Em'ee)
      LIGHT  -
      LIGHT  Butcher(Em'ee)
      LIGHT  Coal Miner(Em'ee)
      LIGHT  Cordwainer
      LIGHT  Gardener
      LIGHT  General Servant
      LIGHT  Independent
      LIGHT  Mariner
      LIGHT  Milliner
      LIGHT  Miner Copper
  30. Cross-linking census data to BMD
      • Census records include place of birth and age
      • Can use same inference techniques to match against BMD data
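
One way to picture this matching is to turn a census age into a window of possible birth dates and test whether a BMD registration quarter could fall inside it. The sketch below is an assumption about how such inference might work, not the method shown in the talk. The function names and the 90-day `slack_days` allowance for registration after the event are hypothetical, and reported census ages are often inexact, so a real matcher would need wider tolerances.

```python
from datetime import date, timedelta

# BMD quarters are labelled by their final month (Mar, Jun, Sep, Dec).
QUARTER_MONTHS = {"Mar": (1, 3), "Jun": (4, 6), "Sep": (7, 9), "Dec": (10, 12)}


def birth_window(census_date, age):
    """Earliest and latest birth dates for someone aged `age` on census night."""
    latest = census_date.replace(year=census_date.year - age)
    earliest = census_date.replace(year=census_date.year - age - 1) + timedelta(days=1)
    return earliest, latest


def quarter_span(year, quarter):
    """First and last day of a registration quarter."""
    first_month, last_month = QUARTER_MONTHS[quarter]
    start = date(year, first_month, 1)
    if last_month == 12:
        end = date(year, 12, 31)
    else:
        end = date(year, last_month + 1, 1) - timedelta(days=1)
    return start, end


def could_match(census_date, age, reg_year, reg_quarter, slack_days=90):
    """True if a birth in the window could have been registered in the quarter."""
    earliest, latest = birth_window(census_date, age)
    q_start, q_end = quarter_span(reg_year, reg_quarter)
    # Registration follows the birth, so allow the quarter to begin up to
    # `slack_days` after the latest possible birth date.
    return q_end >= earliest and q_start <= latest + timedelta(days=slack_days)


census_1881 = date(1881, 4, 3)  # census night, 3 April 1881
print(birth_window(census_1881, 30))
print(could_match(census_1881, 30, 1850, "Jun"))
print(could_match(census_1881, 30, 1849, "Dec"))
```

A match on dates alone is weak evidence; in practice it would be combined with the surname and with a comparison of the census birthplace against the registration District.
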
  31. An Open Data FreeUKGen API …
      • … could be HTTP-based; RESTful
      • would support a wide variety of information needs
      • would deliver a variety of machine-processible formats
      • would allow re-use of the data
  32. The problem of identity
      • All my data files use invented primary keys for people, places, … which are only significant within my database
      • In general, how do we assert that two statements are about the same person?
      • None of these is sufficient on its own:
        – Name
        – Date of birth/death
        – Place of birth/death
  33. Linked Data
      • One step beyond Open Data
      • Combines idea of machine-processible data with a persistent identity for each concept
      • Uses content negotiation to return RDF, XML, JSON, … for each URL
      • Allows programmatic access to data; processing chains (‘follow your nose’)
      • Requires suitably open licensing
  34. Linked Data example: Wordsworth Trust
  35. Museum catalogue data as RDF
  36. Everything comes from the same URL
      • By default, return HTML
      • When RDF is requested (in the Accept header), redirect to a variant URL
      • Can support lots of variant formats, e.g. XML, JSON, …
      • This approach relies on a technique called Content Negotiation
      • Linked Data URLs are unique; persistent; dereferenceable
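
The content-negotiation behaviour described on this slide could be sketched as a small routing function. Everything specific here is an assumption for illustration: the example URL, the file-extension scheme for variant URLs, and the use of a 303 redirect (a common Linked Data convention, not something the slide specifies).

```python
# Hypothetical mapping from requested media type to a variant URL suffix.
VARIANTS = {
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
    "application/xml": ".xml",
}


def parse_accept(header):
    """List the media types in an Accept header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            prefs.append((q, media_type))
    prefs.sort(key=lambda pref: -pref[0])  # stable: ties keep header order
    return [media_type for _, media_type in prefs]


def negotiate(resource_url, accept=""):
    """Return (status, location) for a request to a Linked Data URL."""
    for media_type in parse_accept(accept or "*/*"):
        if media_type in ("text/html", "*/*"):
            return 200, resource_url  # default: serve HTML at the same URL
        if media_type in VARIANTS:
            return 303, resource_url + VARIANTS[media_type]  # redirect to variant
    return 406, None  # nothing acceptable on offer


url = "https://example.org/id/object/1"  # placeholder URL
print(negotiate(url, "application/rdf+xml"))
print(negotiate(url, "text/html,application/xhtml+xml;q=0.9"))
```
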
  37. What FreeUKGen resources could we publish as Linked Data?
      • Can only assign identifiers to data we have
        – BMD registration events
        – Census return events
        – Pieces, Districts etc.
      • Can’t assign identifiers to people
      • Problem: current database update strategy generates identifiers afresh each time
        – Conflicts with need for persistent identifiers
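
One possible way around regenerated identifiers is to derive each identifier from the content of the record rather than from a database counter, so that reloading the data yields the same URL. The sketch below assumes a BMD reference made up of event type, year, quarter, district, volume and page; the base URL and the field choice are hypothetical. Several index entries can share a volume and page, so an identifier for an individual entry would need a further distinguishing component.

```python
import re

BASE = "https://example.org/id/bmd/"  # hypothetical base URL


def slug(value):
    """Lower-case a value and collapse anything non-alphanumeric to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")


def registration_uri(event_type, year, quarter, district, volume, page):
    """Build the same URI every time from the same index reference."""
    parts = [event_type, year, quarter, district, volume, page]
    return BASE + "/".join(slug(part) for part in parts)


# Made-up reference for illustration.
uri = registration_uri("birth", 1880, "Mar", "St. Example", "4a", 123)
print(uri)
```
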
  38. Potential Linked Data projects
      • Produce authorities which can be integrated into current approach:
        – Geographical units: Districts, Parishes, Pieces, named places. Link to Geonames, OS Gazetteer
        – Occupations: potential for useful groupings (e.g. Ag Lab and variants). Link to SIC, SHIC?
      • Generate persistent identifiers for the primary references published by FreeUKGen
        – e.g. a page within the BMD index
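
The grouping of occupation variants suggested here can be tried out on the strings from the two occupation slides. The normalisation rules below are guesses made for illustration (for instance, treating the "(Em'ee)" and "(Em'er)" suffixes as employee/employer markers to be dropped); a real authority file would need curated mappings.

```python
import re
from collections import defaultdict


def normalise_occupation(raw):
    """Reduce spelling and abbreviation variants to a rough common key."""
    text = raw.lower()
    text = re.sub(r"\(em['’]e[er]\)", "", text)  # drop (Em'ee) / (Em'er)
    text = re.sub(r"\bagricultural\b", "ag", text)
    text = re.sub(r"\blab(?:ourer|orer|r)?\b", "lab", text)
    return " ".join(text.split())


def group_occupations(occupations):
    """Map each normalised key to the list of raw strings that produced it."""
    groups = defaultdict(list)
    for raw in occupations:
        groups[normalise_occupation(raw)].append(raw)
    return dict(groups)


# Occupation strings taken from the Kerridge and Light slides.
raw_values = [
    "Ag Labr", "Agricultural Labourer(Em'ee)", "Ag Lab", "Ag Laborer",
    "Agricultural Laborer(Em'ee)", "Farm Labourer(Em'ee)",
    "Farm Labourer (Em'ee)", "Farm Labr", "Farmer(Em'er)", "Scholar",
]
groups = group_occupations(raw_values)
for key, variants in groups.items():
    print(key, "<-", variants)
```
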
  39. Let the computer work harder!
      • Current approach makes very little use of the computer as a data-processing tool
      • FreeUKGen resources as Open Data would support new types of research and simplify e.g. Single Name Studies
      • FreeUKGen resources as Linked Data would give the community a common frame of reference for its work
  40. Cultural Heritage Linked Data
  41. Thank you! Richard Light, FreeUKGen Trustee, @richardofsussex