Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them ... on a budget of no or not many euros.

The 1,000,000-image collection extracted by the British Library from 19th-century books is a wonderful resource, but one that Wikimedia Commons felt it could not accept other than through exhaustive hand-uploading: without good image-level metadata about the subject of each image, the images could not be categorised and so would simply not be discoverable. This talk describes a joint BL/Wikimedia initiative to go through the images systematically, which discovered 50,000 maps in eight weeks.

In the second stage of the process, now just getting under way, crowd geolocation of these map images is making it possible to use automated tools to group, organise and categorise them in different ways. The aim is to upload them to Commons with a full provisional categorisation, the key step in making them valuable and reusable.
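As a rough illustration of how georeferenced maps might be grouped automatically, the sketch below buckets a map by the ground extent of its bounding box. It is not the project's actual tooling: the function names, the use of the bounding-box diagonal, and the kilometre thresholds are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extent_bucket(south, west, north, east):
    """Bucket a georeferenced map by the diagonal of its bounding box.

    The thresholds (50 km, 1500 km) are illustrative guesses, not values
    taken from the talk.
    """
    diagonal_km = haversine_km(south, west, north, east)
    if diagonal_km < 50:
        return "city"
    if diagonal_km < 1500:
        return "country or region"
    return "continent or larger"


if __name__ == "__main__":
    # Hypothetical bounding boxes: (south, west, north, east) in degrees.
    print(extent_bucket(51.45, -0.25, 51.55, 0.0))   # a city-sized box
    print(extent_bucket(35.0, -10.0, 70.0, 40.0))    # a continent-sized box
```

A real pipeline would probably combine extent with the map's centre point, since extent alone cannot distinguish, say, a small country from a large city region.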

(25-minute talk given at GlamWiki 2015 in The Hague)

  1. 1. Wikimedia/British Library map mapping project – review and latest update. How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them ... on a budget of no or not many euros. James Heald, Wikimedia volunteer (User:Jheald); Kimberly Kowal, British Library (Kimberly.Kowal@bl.uk)
  2. 2. 1,000,000 images Fantastic, but …
  3. 3. Very limited metadata
  4. 4. Very limited metadata: Commons said no bulk upload
  5. 5. Volunteer response… Create a subject index by book…
  6. 6. … encouraging images to be uploaded by the book (20,000 so far – majority by one user)
  7. 7. … however, manual categorisation of images is very very time-consuming.
  8. 8. Could anything be done more automatically…
  9. 9. Could anything be done more automatically… Maps: natural classification, given co-ordinates
  10. 10. So: find the maps on Flickr, and tag them…
  11. 11. … using the index to drive the process 31 Oct
  12. 12. … using the index to drive the process 31 Oct
  13. 13. … using the index to drive the process 31 Oct
  14. 14. … using the index to drive the process 03 Nov
  15. 15. … using the index to drive the process 17 Dec
  16. 16. … using the index to drive the process 19 Dec
  17. 17. But how many maps were there? Oct 31
  18. 18. But how many maps were there? Oct 31
  19. 19. But how many maps were there? Nov 2
  20. 20. But how many maps were there? Nov 7
  21. 21. But how many maps were there? Nov 14
  22. 22. But how many maps were there? Dec 1
  23. 23. But how many maps were there? Dec 10
  24. 24. But how many maps were there? Dec 17
  25. 25. But how many maps were there? Dec 28
  26. 26. 50,000 maps in all -- including 20,000 found independently by @Quasimondo, machine-assisted using his own pattern recognition methods:
                     totals   classmark index   detailed index
                     ------   ---------------   --------------
      misc            16074             14091             1983
      Europe          13136              6254             6882
      British Isles    7191               269             6922
      North America    6758              1524             5234
      USA              5782              1209             4573
      Asia             2736              1280             1456
      Africa           2300              1075             1225
      South America     895               659              236
  27. 27. Next step: Geo-location, using the Klokan/BL Georeferencer (free alternatives are also available)
  28. 28. Next step: 10x more images than the BL has ever attempted before
  29. 29. Success allows the old map to be laid over the top of a modern one
  30. 30. Pilot run of 3,000 completed
  31. 31. Pilot run of 3,000 completed: now characterised by location …
  32. 32. ... and scale
  33. 33. All that is needed to identify individual continents …
  34. 34. … countries …
  35. 35. … nations …
  36. 36. … cities …
  37. 37. … and beyond.
  38. 38. Ready to be uploaded to Commons…
  39. 39. Ready to be uploaded to Commons… … almost
  40. 40. To do list: better subject identification; reasonable Commons categorisation
  41. 41. To do/1: Subject identification Current: OSM Nominatim, 4 votes out of 5
  42. 42. To do/1: Subject identification Small features: Look up on Wikidata, find plausible candidate
  43. 43. To do/1: Subject identification Large features: can be over-cautious. Need a better idea of the size of candidate features…
  44. 44. To do/1: Subject identification Large features: … so compare typical existing maps
  45. 45. To do/2: Categorisation. Principle on Commons is to refine into groups of 'human manageable' size: ~ 4 to 40 images (larger for series). Good for humans, less good for machines ... wildly different categorisation depths & naming.
  46. 46. To do/2: Categorisation. Routine upload and management categories ... straightforward enough:
        • Maps from collection uploaded on <date>
        • Maps from collection uploaded on <date> with categorisation to confirm
        • Images from <book>
      but then ...
  47. 47. To do/2: Categorisation.
      Countries: Old maps of <country>; Old maps of part of <country>
      Cities: Old maps of <city>; Old maps of cities in <country>; Old maps of cities in <part of country>; + "<city>" itself?
      Features (i.e. buildings, castles, cathedrals, battlefields, etc.): <Feature> / Plans of <Feature>; Plans of <feature-type>s in <place>
  48. 48. To do/3: Strengthening Wikidata. <feature-type> should be given by P31 ("instance of") -> church, castle, cathedral, battlefield, etc. But data often not yet there... Need to supply: WP category mining (care needed: "category spillage"), databases (if PD), etc.
  49. 49. To do list: There is work to do… But with some work (and some human mop-up), automated upload + reasonable categorisation should be possible.
  50. 50. State of play: Georeferencing is underway. Index pages now have "to georef" templates.
  51. 51. State of play: Main progress page is live
  52. 52. Conclusions: Tiered levels of wiki-pages leading to image searches can be used to drive large projects. Even ad-hoc rough indexes are useful. Commons's own old maps should be next (~ 60,000). Georeferencing is fun -- come and give it a try.
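
The "4 votes out of 5" rule on slide 41 and the category patterns on slide 47 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the slides do not say what the five votes are, so the sketch assumes they are five place names returned by separate lookups for one map, and the function names are hypothetical.

```python
from collections import Counter


def vote_subject(candidates, needed=4):
    """Return the most common candidate if it has at least `needed` votes.

    `candidates` is assumed to be a list of place names, one per lookup
    (an assumption; the slides only say "4 votes out of 5").
    Returns None when no candidate reaches the threshold.
    """
    if not candidates:
        return None
    name, votes = Counter(candidates).most_common(1)[0]
    return name if votes >= needed else None


def country_categories(country, whole_country=True):
    """Category names for a country map, following the slide 47 patterns."""
    if whole_country:
        return [f"Old maps of {country}"]
    return [f"Old maps of part of {country}"]


def city_categories(city, country):
    """Category names for a city map, following the slide 47 patterns."""
    return [f"Old maps of {city}", f"Old maps of cities in {country}"]


if __name__ == "__main__":
    votes = ["France", "France", "France", "France", "Spain"]
    subject = vote_subject(votes)
    if subject is not None:
        print(country_categories(subject))
    print(city_categories("Utrecht", "Netherlands"))
```

Generated names would still need checking against the categories that actually exist on Commons, given the "wildly different categorisation depths & naming" noted on slide 45.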