This document discusses extracting data from historical documents by crowdsourcing annotations on Wikisource. It describes a project to digitize the field notebooks of Junius Henderson, an early-20th-century curator at the University of Colorado Museum of Natural History. Volunteers transcribed and annotated the notebooks by adding images, text, and metadata templates on Wikisource, which allowed the data within the notebooks to be extracted and linked to other online resources. The project demonstrates how small, incremental steps taken over many years, such as the initial scanning and transcription efforts, made possible the fully digitized, annotated, and data-linked version of the notebooks.
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikisource
1. Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikisource
Andrea Thomer, Gaurav Vaidya*, Robert Guralnick, David Bloom, Laura Russell
8. http://pinterest.com/cumnh/
http://media-cache-ec3.pinterest.com/avatars/ucmnh-1346976471_600.jpg
University of Colorado Museum of Natural History (CUMNH), founded 1909
17. The Process
1. Images on the Wikimedia Commons.
2. Images + text on Wikisource.
3. Images + text + annotations on Wikisource.
4. Data using the MediaWiki APIs.
• Full details: http://dx.doi.org/10.3897/zookeys.209.3247
• Short URL: http://bit.ly/henderson-paper
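Step 4 of the process above can be sketched as follows. This is a minimal illustration, not the project's actual code: it only builds the request URL for fetching a page's raw wikitext through the MediaWiki Action API and does not send it. The page title in the usage line is a hypothetical placeholder, and the choice of parameters is an assumption about how one might query the text.

```python
from urllib.parse import urlencode

# English Wikisource endpoint for the MediaWiki Action API.
API_ENDPOINT = "https://en.wikisource.org/w/api.php"


def build_wikitext_query(page_title: str) -> str:
    """Return a URL that asks the API for the latest wikitext of one page."""
    params = {
        "action": "query",      # read-only query module
        "prop": "revisions",    # ask for revision data
        "rvprop": "content",    # include the revision text itself
        "rvslots": "main",      # main content slot of the revision
        "titles": page_title,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hypothetical page title, used only for illustration.
print(build_wikitext_query("Example notebook page"))
```

The JSON response to such a request would carry the page's wikitext, including any annotation templates, which can then be parsed offline.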
37. A template of our own
{{element|formal name of this element|element as written by Henderson}}
Examples:
{{taxon|Sayornis saya|Say Phoebe}}
{{taxon|Carduelis pinus|siskins}}
{{taxon|Siskin|siskins}}
38. A template of our own
{{element|formal name of this element|element as written by Henderson}}
Examples:
{{dated|1905-07-28|July 28, 1905}}
{{place|Boulder, Colorado|Boulder, Colo}}
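The two-argument template pattern shown on these slides lends itself to simple extraction. The sketch below assumes the three template names used in the examples (taxon, dated, place) and flat, non-nested templates; real wikitext can nest templates or use named parameters, which this regular expression does not handle. The sample sentence is invented for illustration and is assembled from the slide examples, not quoted from the notebooks.

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    kind: str      # template name: taxon, dated, or place
    formal: str    # formal name of the element
    verbatim: str  # element as written by Henderson


# Matches {{kind|formal|verbatim}} with no nested braces or extra pipes.
TEMPLATE_RE = re.compile(r"\{\{(taxon|dated|place)\|([^|{}]*)\|([^|{}]*)\}\}")


def _squash(text: str) -> str:
    """Collapse line breaks and repeated spaces inside a template argument."""
    return " ".join(text.split())


def extract_annotations(wikitext: str) -> List[Annotation]:
    """List every annotation template found in the wikitext, in order."""
    return [
        Annotation(m.group(1), _squash(m.group(2)), _squash(m.group(3)))
        for m in TEMPLATE_RE.finditer(wikitext)
    ]


def strip_annotations(wikitext: str) -> str:
    """Replace each template with the verbatim text it wraps."""
    return TEMPLATE_RE.sub(lambda m: _squash(m.group(3)), wikitext)


# Invented sample sentence (an assumption, not a notebook quotation).
sample = (
    "On {{dated|1905-07-28|July 28, 1905}} near "
    "{{place|Boulder, Colorado|Boulder,\nColo}} saw "
    "{{taxon|Sayornis saya|Say Phoebe}}."
)

for annotation in extract_annotations(sample):
    print(annotation)
print(strip_annotations(sample))
```

Keeping the verbatim text as the last argument means the transcription can always be recovered unchanged, while the formal name gives a normalized value for linking to other resources.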
66. 240. Physa anatina Lea .......................................... Identified by Bastsch
Creek 1 mile north of Loveland, Colo. June 9, 1906. Junius Henderson
Museum records
68. 240. Physa anatina Lea .......................................... Identified by Bastsch
Creek 1 mile north of Loveland, Colo. June 9, 1906. Junius Henderson
Problem: context