AddressingHistory - Crowdsourcing the Past - Stuart Macdonald


Presentation given at the Geospatial in the Cultural Heritage Domain - Past, Present & Future event in London on 7th March 2012. The event was organised as part of the JISC GECO project.

Published in: Education, Technology, Sports

  • UK digitisation programme: the Developing Community Content strand of the JISC Digitisation and e-Content programme; Welsh Voices of the Great War in Wales (Cardiff University).
  • Online engagement tool based on web 2.0 principles. Galaxy Zoo is an online astronomy project which invites members of the public to assist in classifying over sixty million galaxies. Old Weather is a web-based effort to transcribe weather observations made by Royal Navy ships around the time of World War I. The Great War Archive was a 2008 project led by the University of Oxford that asked members of the general public to digitise any artefacts they held relating to the First World War and upload them to a purpose-built website.
  • Bank directory listing banks and banking companies. Educational directory listing educational institutions and teachers by their subject. Law directory listing juridical institutions and practitioners. Medical directory listing medical and surgical institutions and practitioners. Insurance directory listing insurance companies. Rich source of adverts which give an idea of lifestyles and spending habits. Of interest to genealogists, local or family historians, and academic researchers.
  • 44,000 historical maps of Scotland: county maps, town plans, admiralty charts (coastline), military maps, historic OS series, plus 600 of Edinburgh and its environs. Images and OCR text. Creative Commons licence (Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland), IPR free. An Internet Archive team based at the National Library of Scotland scanned the Scottish Post Office Directories used in the project.
  • Registered users. The Google Geocoding API assigns a georeference with scales of accuracy, from town to street to intersection to building.
  • Identify and fix line returns; identify which fields belong to which column. Fix OCR errors using a list of search patterns and their replacement strings (XML files for names, professions, addresses). Name stop words are used to remove commercial entries.
  • POIs in this case are POD entries, namely address, name and profession.
  • Critical mass: it could be argued that the geographical and temporal coverage provided by AH doesn’t provide the critical mass of content required to attract and engage ‘the crowd’. This is borne out in our usage stats (and registered users), which, whilst not small, were modest.
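The parsing notes above mention a list of search patterns with replacement strings for fixing OCR errors, plus name stop words for removing commercial entries. A minimal Python sketch of that idea follows. The patterns, stop words, field names and function names are invented for illustration; they are not taken from the project's XML files or its parser.

```python
import re

# Hypothetical search patterns and their replacement strings (illustrative only).
OCR_FIXES = [
    (re.compile(r"\bStrect\b"), "Street"),
    (re.compile(r"\bgroccr\b"), "grocer"),
    (re.compile(r"\s{2,}"), " "),  # collapse runs of whitespace left by line returns
]

# Hypothetical stop words that mark a name as a commercial entry.
NAME_STOP_WORDS = {"company", "co.", "bank", "society"}


def fix_ocr(text):
    """Apply each search/replace pair in order and trim the result."""
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    return text.strip()


def is_commercial(name):
    """Return True if any word in the name is a stop word."""
    return any(word in NAME_STOP_WORDS for word in name.lower().split())


def clean_entries(entries):
    """Fix OCR errors in every field, then drop commercial entries."""
    cleaned = []
    for entry in entries:
        fixed = {field: fix_ocr(value) for field, value in entry.items()}
        if not is_commercial(fixed["name"]):
            cleaned.append(fixed)
    return cleaned
```

Keeping the pattern list and stop words as data rather than code means they could be loaded from an external file and extended without touching the cleaning functions.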

    1. 1. AddressingHistory - Crowdsourcing the Past. Stuart Macdonald, Associate Data Librarian, EDINA & Data Library, University of Edinburgh. Association of American Geographers Annual Meeting - Working Digitally with Historical Maps, New York Public Library, 25 Feb. 2012
    2. 2. Phase 1
          • JISC-funded Community Content project
          • 6 months (April 2010 – September 2010)
          • Partner with National Library of Scotland
          • Advisory Board
    3. 3. To create an online crowdsourcing tool which will combine data from digitised historical Scottish Post Office Directories (PODs) with contemporaneous historical maps
          • Similar to the Australian Historic Newspapers project provided by the National Library of Australia, where members of the public correct and improve OCR’d text of old newspapers
          • Great War Archive (Univ. Oxford), which asked members of the general public to digitise any First World War artefacts and upload them to a purpose-built website
    4. 4. PODs offer a fine-grained spatial and temporal view on social, economic and demographic circumstances
          • They provide residential names, occupations, and addresses
          • Each contains 3 sub-directories: general, street, and trades
          • May also contain misc. trade directories, e.g. banking, education, law, insurance, medical
    5. 5. Phase 1 focused on 3 vols. of Edinburgh PODs: 1784-5; 1865; 1905-6
          • Historic Scottish maps geo-referenced by NLS
          • PODs digitised by NLS in conjunction with the Internet Archive
          • c.700 PODs (1773 to 1911) covering 28 of Scotland's towns and counties now online
          • Public domain (CC BY-NC-SA 2.5)
    6. 6. Using OpenLayers as web-based mapping client
          • Tool allows ‘the crowd’ to georeference a POD entry by moving a ‘map pin’ on a digitised map, thus facilitating the addition of a grid reference to the OCR’d POD held in XML format in a database structure (PostgreSQL)
          • API available allowing web developers access to the raw data in multiple output formats (JSON, XML, CSV)
          • Geo-coding of POD addresses parsed against Google geocoder
    7. 7. Interface had to be easy-to-use for a range of users
          • Robust and scalable to accommodate c.700 digitised Scottish PODs
          • Mechanism to check user-generated content such as geo-references, name or address edits/annotations
          • Crowdsourcing of geo-coded grid references
          • View original scanned directory page
          • Amplification of tool and API via social media channels: Facebook, Twitter, Blog, Flickr, YouTube
    8. 8. Screenshot labels:
          • Search people, place, profession
          • Historic Map overlay selected
          • Record edits by the ‘crowd’
          • View original
          • Search results
          • Download options
    9. 9. Phase 2 sought to develop functionality to resonate with JISC’s vision to build sustainable and durable deliverables and to complement phase 1 by broadening both geographic and temporal coverage
          • Feb. – Sept. 2011 (EDINA Sustainability Funding)
          • New content (Aberdeen, Glasgow, Edinburgh for 1881 & 1891)
          • Re-evaluate (and enhance) parsing tool performance
          • Old parser: exact geotag 60%, professions 25%
          • New parser (no configuration file): exact geotag 72%, professions 76%
          • New parser (with configuration file): exact geotag 88%, professions 82%
    10. 10. Phase 2: other additional features include:
          • Spatial searching (bounding box)
          • Associate map pin with search results
          • Search across multiple address
          • Aid searching by applying Standard Industrial Classification (SIC) codes to professions
    11. 11. Augmented Reality
          • An AddressingHistory layer has been created and published for use with the ‘Layar’ application for either iPhone or Android
          • Geo-referenced Points of Interest (POIs) are uploaded into the BuildAR CMS
          • POIs (e.g. each profession or SIC code) have an image associated with them
          • The app allows users to compare their current location (from phone) with the geo-referenced AH records in order to establish which names and professions are located in the local vicinity
    12. 12. Lessons Learned
          • Critical mass: does geographic & temporal coverage attract and engage the crowd?
          • Separate out parsing from interface and back-end storage, to allow any refinements to be implemented without impacting on tool and API
          • Externalise ‘configuration’ files: editable XML-based files that accommodate repeated OCR and content inconsistencies. These are run in conjunction with the POD parser to refine the parsed content, hence improved searching
          • Parsing and refining process is almost unending: identify what is realistically achievable with available resources and time constraints, i.e. perform proper requirements analysis
          • Consult with others involved in digitising and parsing city/town/post office directories, e.g. Richard Marciano (UNC Chapel-Hill), Matt Knutzen (NYPL)
    13. 13. Sustainability
          • Given the broad applicability of the resource, a range of communities may be interested in the longer-term curation of the project tools, e.g. the OpenStreetMap community, NLS
          • Evaluation of possible business models for sustainability:
          • revenue generation via online donations
          • subscription model (e.g. per annum, per month, per use)
          • ‘freemium model’ (e.g. free API download of a certain number of records with payment for further downloads)
          • academic advertising
    14. 14. Second last slide…
          • New content and features to be made available start of March 2012
          • Gauging the success of the project goes beyond the delivery of engaging and innovative online tools. It will ultimately be measured by continual and extended use within the wider community.
    15. 15. Website: THANKING YOU!
          Credits:
          • Image by aroid - CC BY 2.0
          • Image by konqui - CC BY-NC 2.0
          • Image by mosilager - CC BY-NC-SA 2.0
          • Image by racoles - CC BY-NC 2.0
          • Image by James Bowe - CC BY 2.0
          • Image by yelnoc - CC BY-NC-SA 2.0
          • Image by - - CC BY 2.0
          • Image by bek30 - CC BY-NC 2.0
          • Image by karen horton - CC BY-NC 2.0
          • Image by lofaesofa - CC BY 2.0
          • Image by Psycho Delia - CC BY-NC 2.0
          • Image by wdj(0) - .com/photos/davidjoyner/534893725/ - CC BY-SA 2.0
          • Image by Symic - CC BY-SA 2.0
          • Image by ~milj - CC BY-NC-SA 2.0
          Acknowledgements:
          • JISC
          • NLS Geo-referenced maps and applications
          • Visualising Urban Geographies (VUG) project
          • Edinburgh City Libraries
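The slides list bounding-box spatial searching as a Phase 2 feature and describe the augmented reality layer as comparing the phone's location with geo-referenced records. Both can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, sample names, coordinates and function names are hypothetical, and the haversine distance is a stand-in technique, not a description of how AddressingHistory or Layar implement these features.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical geo-referenced records (names and coordinates are made up).
RECORDS = [
    {"name": "A. Baker", "profession": "baker", "lat": 55.9500, "lon": -3.1900},
    {"name": "B. Mason", "profession": "mason", "lat": 55.9533, "lon": -3.1883},
    {"name": "C. Weaver", "profession": "weaver", "lat": 55.8642, "lon": -4.2518},
]


def search_bbox(records, south, west, north, east):
    """Return records inside the box (assumes the box does not cross the antimeridian)."""
    return [
        r for r in records
        if south <= r["lat"] <= north and west <= r["lon"] <= east
    ]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))


def nearby(records, lat, lon, max_distance_m):
    """Return records within max_distance_m of the given location."""
    return [
        r for r in records
        if haversine_m(lat, lon, r["lat"], r["lon"]) <= max_distance_m
    ]
```

For example, `search_bbox(RECORDS, 55.9, -3.3, 56.0, -3.1)` keeps the two records with Edinburgh-area coordinates, while `nearby(RECORDS, 55.95, -3.19, 100)` keeps only the record at that exact point.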