AddressingHistory - Crowdsourcing the Past

 Stuart Macdonald
 Associate Data Librarian
 EDINA & Data Library
 University of Edinburgh

 stuart.macdonald@ed.ac.uk



Association of American Geographers Annual Meeting - Working Digitally with Historical Maps, New York Public Library, 25 Feb. 2012
Phase 1
JISC-funded Community Content
project

6 months (April 2010 – September
2010)

Partner with National Library of
Scotland

Advisory Board
To create an online crowdsourcing tool which will combine
data from digitised historical Scottish Post Office
Directories (PODs) with contemporaneous historical maps

Similar to Australian Historic Newspapers project
provided by National Library of Australia where members
of the public correct and improve OCR’d text of old
newspapers - http://www.nla.gov.au/ndp/project_details/

JISC-funded Great War Archive (Univ. Oxford) that asked
members of the general public to digitise any First World
War artefacts and upload them to a purpose built website.
PODs offer a fine-grained spatial
and temporal view on social,
economic and demographic
circumstances

They provide residential names,
occupations, and addresses.

Each contains 3 sub-directories:
general, street, and trades

May also contain misc. trade
directories e.g. banking,
education, law, insurance,
medical
Phase 1 focused on 3 vols. of
Edinburgh PODs: 1784-5; 1865;
1905-6

Historic Scottish maps geo-
referenced by NLS

PODs digitised by NLS in
conjunction with the Internet
Archive

c.700 PODs (1773 to 1911)
covering 28 of Scotland's towns
and counties now online

Openly licensed (CC BY-NC-SA 2.5 UK: Scotland)
Using OpenLayers as the web-based
mapping client

Tool allows ‘the crowd’ to
georeference a POD entry by
moving a ‘map pin’ on a
digitised map, thus adding
a grid reference to the
OCR’d POD entry held in XML
format in a database
(PostgreSQL)

API available allowing web
developers access to the raw
data in multiple output formats
(JSON, XML, CSV)

Parsed POD addresses are
geo-coded against the Google
geocoder
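
The multiple output formats offered by the API can be illustrated with a minimal sketch. It serialises one record to JSON, XML and CSV using only the Python standard library. The record, its field names and the element name `entry` are hypothetical; this is not the AddressingHistory API itself, whose actual schema is not given in these slides.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical POD record; field names and values are illustrative only.
record = {
    "surname": "Smith",
    "profession": "baker",
    "address": "1 High Street",
    "year": "1865",
}


def to_json(rec):
    """Serialise a record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialise a record as an <entry> element with one child per field."""
    root = ET.Element("entry")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(rec):
    """Serialise a record as a header row followed by one data row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


if __name__ == "__main__":
    for render in (to_json, to_xml, to_csv):
        print(render(record))
```

Keeping one internal record shape and rendering it on request is one simple way to serve several formats from the same data.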
Interface had to be easy-to-use for a
range of users

Robust and scalable to accommodate
c.700 digitised Scottish PODs

Mechanism to check user-generated
content such as geo-references,
name or address edits/annotations

Crowdsourcing of geo-coded grid
references

View original scanned directory page

Amplification of tool and API via
Social Media Channels – Facebook,
Twitter, Blog, Flickr, YouTube
Search people, place, profession

[Screenshot of the search interface, with callouts: Historic map overlay selected; Record edits by the ‘crowd’; View original; Search results; Download options]
Phase 2 sought to develop functionality in line with JISC’s
vision of building sustainable and durable deliverables, and to
complement Phase 1 by broadening both geographic and temporal
coverage

Feb. – Sept. 2011 (EDINA
Sustainability Funding)

New content (Aberdeen, Glasgow,
Edinburgh for 1881 & 1891)

Re-evaluate (and enhance) parsing
tool performance

Old parser:
• Exact geotag – 60%
• Professions – 25%
New parser (no configuration file):
• Exact geotag – 72%
• Professions – 76%
New parser (with configuration file):
• Exact geotag – 88%
• Professions – 82%
Phase 2
Additional features include:

   •   Spatial searching (bounding box)

   •   Associate map pin with search
       results

   •   Search across multiple addresses

   •   Aid searching by applying Standard
       Industrial Classification (SIC) codes
       to Professions
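
Spatial searching by bounding box can be sketched as a simple filter over geo-referenced entries. This is a minimal illustration, assuming entries carry longitude and latitude in decimal degrees; the `Entry` class, the sample data and the box coordinates are hypothetical, and the project's own implementation is not described in these slides.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical geo-referenced directory entry."""
    name: str
    profession: str
    lon: float
    lat: float


def in_bbox(entry, min_lon, min_lat, max_lon, max_lat):
    """Return True if the entry lies inside (or on the edge of) the box."""
    return min_lon <= entry.lon <= max_lon and min_lat <= entry.lat <= max_lat


def bbox_search(entries, bbox):
    """Keep only entries that fall inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    return [e for e in entries if in_bbox(e, *bbox)]


# Illustrative data: one point near central Edinburgh, one near central Glasgow.
ENTRIES = [
    Entry("A. Example", "baker", -3.19, 55.95),
    Entry("B. Example", "grocer", -4.25, 55.86),
]
EDINBURGH_BBOX = (-3.30, 55.90, -3.10, 56.00)

if __name__ == "__main__":
    for hit in bbox_search(ENTRIES, EDINBURGH_BBOX):
        print(hit.name, hit.profession)
```

A linear scan like this is only reasonable for small result sets; a database with spatial indexing would normally do the same test server-side.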
Augmented Reality

An AddressingHistory layer has
been created and published for
use with the ‘Layar’ Application
for either iPhone or Android

Geo-referenced Points of
Interest (POIs) are uploaded
into the BuildAR CMS

Each POI (e.g. each profession or
SIC code) has an image
associated with it


The App allows users to compare their current location (from phone)
with the geo-referenced AH records in order to establish which names
and professions are located in the local vicinity
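
The comparison of a phone's location with geo-referenced records can be sketched as a great-circle distance filter. This is an illustration only, assuming latitude and longitude in decimal degrees and a spherical Earth; the record layout and sample values are hypothetical, and this is not a description of how Layar or BuildAR work internally.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(records, lat, lon, radius_m):
    """Return (distance, record) pairs within radius_m, nearest first."""
    hits = []
    for rec in records:
        dist = haversine_m(lat, lon, rec["lat"], rec["lon"])
        if dist <= radius_m:
            hits.append((dist, rec))
    return sorted(hits, key=lambda pair: pair[0])


# Hypothetical geo-referenced records.
RECORDS = [
    {"name": "A. Example", "profession": "baker", "lat": 55.9500, "lon": -3.1900},
    {"name": "B. Example", "profession": "grocer", "lat": 55.9600, "lon": -3.1900},
]

if __name__ == "__main__":
    for dist, rec in nearby(RECORDS, 55.9501, -3.1900, 500):
        print(f"{rec['name']} ({rec['profession']}): {dist:.0f} m")
```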
Lessons Learned
Critical mass – does geographic & temporal coverage attract and
engage the crowd?

Separate out parsing from interface and back end
storage - to allow any refinements to be implemented without
impacting on tool and API

Externalise ‘configuration’ files – editable XML-based files
that accommodate recurring OCR and content inconsistencies –
these are run in conjunction with the POD parser to refine the parsed
content and hence improve searching

Parsing and refining process is almost unending -
Identify what is realistically achievable with available resources
and time constraints
- i.e. perform proper requirements analysis

Consult with others - involved in digitising and parsing
city/town/post office directories e.g. Richard Marciano
(UNC Chapel-Hill), Matt Knutzen (NYPL)
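
The idea of externalised, editable XML configuration files can be sketched as search-and-replace rules that are loaded at run time and applied to parsed text. This is a minimal illustration: the `<corrections>`/`<rule>` layout, the attribute names and the sample rules are all assumptions, since the real files' schema is not given in these slides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration; in practice this would live in an external,
# editable file rather than in the source code.
CONFIG_XML = r"""
<corrections>
  <rule search="\bbaber\b" replace="baker"/>
  <rule search="\bHigh Slreet\b" replace="High Street"/>
</corrections>
"""


def load_rules(xml_text):
    """Parse the configuration into (compiled pattern, replacement) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [
        (re.compile(rule.get("search")), rule.get("replace"))
        for rule in root.iter("rule")
    ]


def apply_rules(text, rules):
    """Apply every correction rule to the text, in configuration order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = load_rules(CONFIG_XML)
    print(apply_rules("Smith John, baber, 1 High Slreet", rules))
```

Because the rules sit outside the parser, corrections can be added or revised without touching parser code, interface or storage.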
Sustainability
Given the broad applicability of the
resource, a range of communities may be
interested in the longer-term curation of
the project tools, e.g. the OpenStreetMap
community, NLS

Evaluation of possible business models
for sustainability:

• revenue generation via online donations

• subscription model (e.g. per annum, per
month, per use)

• ‘freemium model’ (e.g. free API
download of a certain number of records
with payment for further downloads)

• academic advertising.
Second last slide…

New content and features to be made available at the start of
March 2012

Gauging the success of the project goes beyond the
delivery of engaging and innovative online tools. It will
ultimately be measured by continual and extended
use within the wider community.
Website:
http://addressinghistory.edina.ac.uk/


  THANKING YOU!


   Credits:
   Image by aroid - http://www.flickr.com/photos/selago/34843234/ - CC BY 2.0
   Image by konqui - http://www.flickr.com/photos/konqui/2301314089/ - CC BY-NC 2.0
   Image by mosilager - http://www.flickr.com/photos/mosilager/2260598271/ - CC BY-NC-SA 2.0
   Image by racoles - http://www.flickr.com/photos/racoles/5719938981/ - CC BY-NC 2.0
   Image by James Bowe - http://www.flickr.com/photos/jamesrbowe/3351247547/ - CC BY 2.0
   Image by yelnoc - http://www.flickr.com/photos/yelnoc/361303918/ - CC BY-NC-SA 2.0
   Image by epSos.de - http://www.flickr.com/photos/epsos/3384297473/ - CC BY 2.0
   Image by bek30 - http://www.flickr.com/photos/bek30/6107854810/ - CC BY-NC 2.0
   Image by karen horton - http://www.flickr.com/photos/karenhorton/3261277303/ - CC BY-NC 2.0
   Image by lofaesofa - http://www.flickr.com/photos/lofaesofa/227019975/ - CC BY 2.0
   Image by Psycho Delia - http://www.flickr.com/photos/24557420@N05/5588473657/ - CC BY-NC 2.0
   Image by wdj(0) - http://www.flickr.com/photos/davidjoyner/534893725/ - CC BY-SA 2.0
   Image by Symic - http://www.flickr.com/photos/symic/2870349309/ - CC BY-SA 2.0
   Image by ~milj - http://www.flickr.com/photos/21989292@N07/4938052014/ - CC BY-NC-SA 2.0

   Acknowledgements:
   JISC - http://www.jisc.ac.uk/
   NLS Geo-referenced maps and applications - http://geo.nls.uk/
   Visualising Urban Geographies (VUG) project – http://geo.nls.uk/urbhist/
   Edinburgh City Libraries – http://www.edinburgh.gov.uk/libraries/

Editor's Notes

  • #3 UK Digitisation programme. Developing Community Content strand of the JISC Digitisation and e-Content programme. Welsh Voices of the Great War in Wales – Cardiff University.
  • #4 Online engagement tool based on web 2.0 principles. Galaxy Zoo is an online astronomy project which invites members of the public to assist in classifying over sixty million galaxies. Old Weather is a web-based effort to transcribe weather observations made by Royal Navy ships around the time of World War I. The Great War Archive was a 2008 project led by the University of Oxford that asked members of the general public to digitise any artefacts they held relating to the First World War and upload them to a purpose-built website.
  • #5 Bank directory listing banks and banking companies. Educational directory listing educational institutions and teachers by their subject. Law directory listing juridical institutions and practitioners. Medical directory listing medical and surgical institutions and practitioners. Insurance directory listing insurance companies. Rich source of adverts which give an idea as to lifestyles and spending habits. Of interest to genealogists, local or family historians, academic researchers.
  • #6 44,000 historical maps of Scotland – county maps, town plans, admiralty charts (coastline), military maps, Historic OS series. Plus 600 of Edinburgh and its environs. Images, OCR text. Creative Commons licences - IPR free - Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland. Internet Archive team based at the National Library of Scotland for scanning the Scottish Post Office Directories used in the project.
  • #7 Registered users. Google Geocoding API assigns a georeference with scales of accuracy – from town to street to intersection to building.
  • #10 Identify and fix line returns; identify which fields belong to which column. Fix OCR errors – list of search patterns and their replace strings (for names, professions, addresses XML files). Name stop words to remove commercial entries.
  • #12 POIs in this case are POD entries – namely address, name and profession.
  • #14 Critical mass – it could be argued that the geographical & temporal coverage provided by AH doesn’t provide the critical mass of content required to attract and engage ‘the crowd’. This is borne out in our usage stats (and registered users), which, whilst not small, were modest.