Family History and Linked Data
Free UK Genealogy Open Data
Conference, 30 January 2016
Richard Light

Open data and Free UK Genealogy

  1. Family History and Linked Data
     Free UK Genealogy Open Data Conference, 30 January 2016
     Richard Light
  2. Lights
  3. Kerridges
  4. Kerridges + Lights!
  5. Kerridge and Light
     • … and Weissbeck
     • relatively uncommon names
     • How can FreeBMD and FreeCen help?
  6. Other people were here first …
     • Lots of Kerridge research
     • Lights actually feature in a book: Common People (Alison Light)
  7. Kerridge
  8. Light
  9. Pooling results
     • Do we want to do it? (Not everyone does …)
     • If so, how can it be done?
     • How do you say that you’re both talking about the same person?
  10. Current FreeUKGen search facilities
      • BMD search is sophisticated and flexible
      • Only one result type: people who match
      • Census search has same approach, with links to individual households
  11. BMD search
  12. Register search
  13. Census search
  14. Limitations of current search
      • Limit of 3000 hits per BMD search
      • Difficult to get to household info
      • Result pages can’t be bookmarked
        – http://www.freecen.org.uk/cgi/search.pl
      • Main problem: searches all return HTML!
  15. Getting machine-processible data
      • Save FreeBMD HTML results page
      • Copy table of results
      • Paste into spreadsheet
      • Save as CSV file
      • Convert to XML and load into Modes
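
The "save as CSV, convert to XML" step above could be sketched roughly as follows. This is only an illustration: the column headers, the sample rows (including the district names) and the XML element names are all invented for the example, and the output is not the schema that Modes actually expects.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, event_type):
    """Turn CSV text (e.g. results pasted via a spreadsheet) into an XML tree.

    Each CSV row becomes one <entry>; each column becomes a child element
    named after its header. All element names here are hypothetical.
    """
    root = ET.Element("entries", {"type": event_type})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        for column, value in row.items():
            if column is None:
                continue  # surplus cells with no header are ignored
            # XML element names cannot contain spaces: "Given name" -> "given_name"
            tag = column.strip().lower().replace(" ", "_")
            ET.SubElement(entry, tag).text = (value or "").strip()
    return root


# Invented sample data, for illustration only.
SAMPLE = """Surname,Given name,District,Volume,Page
KERRIDGE,John,Exampleton,1a,123
LIGHT,Mary,Sampleford,2b,456
"""

root = csv_to_xml(SAMPLE, "births")
print(ET.tostring(root, encoding="unicode"))
```

Keeping one element per column means no information from the results table is lost, at the cost of leaving any interpretation (dates, places) to a later step.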
  16. BMD data in Modes
  17. Limitations
      • Imprecision
        – temporal, e.g. BMD ‘after the event’ and grouped by quarter
        – geographical: BMD only specifies District; Census -> Parish
        – names: variations in spelling
        – copying/transcription errors
      • Incompleteness
        – overseas births/deaths
        – non-registration
        – transcription backlog
  18. Encoding a BMD entry as XML
  19. Indexed search, e.g. places
  20. Inference of birth data
  21. Speculative matching death -> birth
  22. Working with census data
      • Initial efforts ‘broke’ FreeCen!
      • Data had to be loaded from a full dump
      • Loaded all Districts, Pieces and Households
      • Selectively loaded Light and Kerridge records
      • Then loaded all people registered in one of these Light or Kerridge households
      • Shows up Lights/Kerridges as servants, in institutions, etc.
  23. Districts
  24. Pieces
  25. Households
  26. Census data: co-contextuality
      • Each ‘household’ records relationships between people
      • Binary links between ‘Head’ and others, but other family relationships can be inferred
      • Nothing like the completeness of FreeBMD, but more can be done with the data that is there
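
As a minimal sketch of how "other family relationships can be inferred" from the binary links to the Head, the example below derives sibling and likely-parent links within one household. The relationship terms, the sample household and the output wording are assumptions made for the example; real census entries use many more terms, and step-children or half-siblings mean every inferred link is only a possibility.

```python
from itertools import combinations

# Assumed relationship-to-Head terms; real returns contain many variants.
CHILD_TERMS = {"Son", "Daughter"}
SPOUSE_TERMS = {"Wife", "Husband"}


def infer_relationships(household):
    """Infer links between non-Head members of one household.

    `household` is a list of (name, relationship_to_head) tuples.
    Returns a set of (person_a, relation, person_b) triples. Only links
    that are NOT already recorded against the Head are produced.
    """
    spouses = [name for name, rel in household if rel in SPOUSE_TERMS]
    children = [name for name, rel in household if rel in CHILD_TERMS]

    inferred = set()
    # Two children of the same Head share at least one parent.
    for a, b in combinations(children, 2):
        inferred.add((a, "possible sibling of", b))
    # The Head's spouse is a plausible (not certain) parent of the Head's children.
    for spouse in spouses:
        for child in children:
            inferred.add((spouse, "possible parent of", child))
    return inferred


# Invented household, for illustration only.
SAMPLE_HOUSEHOLD = [
    ("William", "Head"),
    ("Ann", "Wife"),
    ("George", "Son"),
    ("Eliza", "Daughter"),
    ("Sarah", "Servant"),
]

for triple in sorted(infer_relationships(SAMPLE_HOUSEHOLD)):
    print(triple)
```

Servants, lodgers and visitors are deliberately left out: their link to the Head says nothing about kinship with anyone else in the household.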
  27. Household summaries
  28. Occupations - Kerridge
      Occupations of Kerridges (>1)
      KERRIDGE Scholar
      KERRIDGE -
      KERRIDGE Ag Labr
      KERRIDGE Agricultural Labourer(Em'ee)
      KERRIDGE Farmer's Son
      KERRIDGE Farm Labourer(Em'ee)
      KERRIDGE Farmer(Em'er)
      KERRIDGE Labourer(Em'ee)
      KERRIDGE Domestic Servant
      KERRIDGE Farm Labr
      KERRIDGE Agricultural Laborer(Em'ee)
      KERRIDGE Brickmaker(Em'ee)
      KERRIDGE Farm Labourer (Em'ee)
      KERRIDGE Retired Ag Labr
  29. Occupations - Light
      Occupations of Lights (>1)
      LIGHT Scholar
      LIGHT Ag Lab
      LIGHT Ag Laborer
      LIGHT Labourer
      LIGHT Copper Miner
      LIGHT Female Servant
      LIGHT Miner
      LIGHT Pauper
      LIGHT Sawyer
      LIGHT Tin Miner(Em'ee)
      LIGHT -
      LIGHT Butcher(Em'ee)
      LIGHT Coal Miner(Em'ee)
      LIGHT Cordwainer
      LIGHT Gardener
      LIGHT General Servant
      LIGHT Independent
      LIGHT Mariner
      LIGHT Milliner
      LIGHT Miner Copper
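
The two occupation lists above contain several spellings of what look like the same occupation. A rough sketch of grouping such variants is given below. The lookup table is a hand-made assumption for the example, not an established occupational authority, and whether "Farm Labourer" should be merged with "Agricultural Labourer" is a judgment call that is left open here.

```python
import re
from collections import defaultdict

# Hypothetical lookup from a normalised spelling to a group label.
GROUPS = {
    "ag lab": "Agricultural labourer",
    "ag labr": "Agricultural labourer",
    "ag laborer": "Agricultural labourer",
    "agricultural labourer": "Agricultural labourer",
    "agricultural laborer": "Agricultural labourer",
    "farm labourer": "Farm labourer",
    "farm labr": "Farm labourer",
}


def normalise(occupation):
    """Drop a bracketed employment status such as "(Em'ee)" and tidy case/spacing."""
    text = re.sub(r"\s*\([^)]*\)", "", occupation)
    return " ".join(text.lower().split())


def group_occupations(occupations):
    """Map each group label to the original strings that fall under it.

    Strings with no entry in GROUPS are kept under their own original text.
    """
    groups = defaultdict(list)
    for occupation in occupations:
        label = GROUPS.get(normalise(occupation), occupation)
        groups[label].append(occupation)
    return dict(groups)


# A few strings taken from the Kerridge list above.
sample = [
    "Ag Labr",
    "Agricultural Labourer(Em'ee)",
    "Agricultural Laborer(Em'ee)",
    "Farm Labourer(Em'ee)",
    "Farm Labourer (Em'ee)",
    "Farm Labr",
    "Scholar",
]

grouped = group_occupations(sample)
for label, members in grouped.items():
    print(label, "<-", members)
```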
  30. Cross-linking census data to BMD
      • Census records include place of birth and age
      • Can use same inference techniques to match against BMD data
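
One way to turn a census age into a BMD search window is sketched below. It is not taken from the talk: the function names are invented, the choice of one extra quarter to allow for registration "after the event" is an assumption, and it takes the stated age at face value even though census ages are often unreliable. The example uses the 1881 census night, 3 April 1881.

```python
from datetime import date, timedelta


def birth_window(census_date, age):
    """Earliest and latest birth dates for someone aged `age` on census night."""
    latest = census_date.replace(year=census_date.year - age)
    earliest = census_date.replace(year=census_date.year - age - 1) + timedelta(days=1)
    return earliest, latest


def candidate_quarters(earliest, latest, extra_quarters=1):
    """List the (year, quarter) index quarters worth searching.

    Quarters are numbered 1-4 (Jan-Mar = 1, ..., Oct-Dec = 4).
    `extra_quarters` extends the window for late registration; the default
    of one quarter is an assumption, not a rule.
    """
    start = earliest.year * 4 + (earliest.month - 1) // 3
    end = latest.year * 4 + (latest.month - 1) // 3 + extra_quarters
    return [(n // 4, n % 4 + 1) for n in range(start, end + 1)]


earliest, latest = birth_window(date(1881, 4, 3), 30)
print(earliest, latest)
print(candidate_quarters(earliest, latest))
```

A person recorded as 30 on 3 April 1881 would, on these assumptions, have been born between 4 April 1850 and 3 April 1851, which gives six index quarters to check.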
  31. An Open Data FreeUKGen API …
      • … could be HTTP-based; RESTful
      • would support a wide variety of information needs
      • would deliver a variety of machine-processible formats
      • would allow re-use of the data
  32. The problem of identity
      • All my data files use invented primary keys for people, places, … which are only significant within my database
      • In general, how do we assert that two statements are about the same person?
      • None of these is sufficient on its own:
        – Name
        – Date of birth/death
        – Place of birth/death
  33. Linked Data
      • One step beyond Open Data
      • Combines idea of machine-processible data with a persistent identity for each concept
      • Uses content negotiation to return RDF, XML, JSON, … for each URL
      • Allows programmatic access to data; processing chains (‘follow your nose’)
      • Requires suitably open licensing
  34. Linked Data example: Wordsworth Trust
  35. Museum catalogue data as RDF
  36. Everything comes from the same URL
      http://collections.wordsworth.org.uk/Object/WTcoll/id/GRMDC.C144.9
      By default, return HTML:
      http://collections.wordsworth.org.uk/Object/WTcoll/id/html/GRMDC.C144.9
      When RDF requested (in Accept header), redirect to a variant URL:
      http://collections.wordsworth.org.uk/Object/WTcoll/id/rdf/GRMDC.C144.9
      Can support lots of variant formats, e.g. XML, JSON, …
      This approach relies on a technique called Content Negotiation
      Linked Data URLs are unique; persistent; dereferenceable
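
The redirect rule described above can be illustrated with a small sketch. The URL pattern is the one shown on the slide; everything else is an assumption, in particular the idea that `application/rdf+xml` is the media type that selects the RDF variant. The sketch also ignores `q` preference weights, which a real server would normally honour, and it only builds a request object without contacting the server.

```python
import urllib.request

BASE = "http://collections.wordsworth.org.uk/Object/WTcoll/id"

# Assumed mapping from media type to the variant segment of the URL.
VARIANTS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
}


def variant_url(identifier, accept_header):
    """Pick the variant URL for the first recognised media type in Accept.

    Falls back to the HTML variant, matching the "by default, return HTML" rule.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in VARIANTS:
            return f"{BASE}/{VARIANTS[media_type]}/{identifier}"
    return f"{BASE}/html/{identifier}"


def build_request(identifier, media_type):
    """Client side: ask for a format by setting the Accept header (not sent here)."""
    return urllib.request.Request(
        f"{BASE}/{identifier}", headers={"Accept": media_type}
    )


print(variant_url("GRMDC.C144.9", "application/rdf+xml"))
print(variant_url("GRMDC.C144.9", "*/*"))
request = build_request("GRMDC.C144.9", "application/rdf+xml")
print(request.full_url, request.get_header("Accept"))
```

The point of the design is that the identifier URL itself never changes: only the representation it redirects to depends on what the client asks for.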
  37. What FreeUKGen resources could we publish as Linked Data?
      • Can only assign identifiers to data we have
        – BMD registration events
        – Census return events
        – Pieces, Districts etc.
      • Can’t assign identifiers to people
      • Problem: current database update strategy generates identifiers afresh each time
        – Conflicts with need for persistent identifiers
  38. Potential Linked Data projects
      • Produce authorities which can be integrated into current approach:
        – Geographical units: Districts, Parishes, Pieces, named places. Link to Geonames, OS Gazetteer
        – Occupations: potential for useful groupings (e.g. Ag Lab and variants). Link to SIC, SHIC?
      • Generate persistent identifiers for the primary references published by FreeUKGen
        – e.g. a page within the BMD index
  39. Let the computer work harder!
      • Current approach makes very little use of the computer as a data-processing tool
      • FreeUKGen resources as Open Data would support new types of research and simplify e.g. Single Name Studies
      • FreeUKGen resources as Linked Data would give the community a common frame of reference for its work
  40. Cultural Heritage Linked Data
  41. Thank you!
      Richard Light
      FreeUKGen Trustee
      @richardofsussex
      richard@light.demon.co.uk