From documents to datasets -- mining the Junius Henderson Field Notes for species occurrence records
Slides from SPNHC 2012 presentation in the Archives and Special Collections session -- titled alternately "What Henderson Saw" or "From Documents to Datasets" depending on which author you ask. See http://soyouthinkyoucandigitize.wordpress.com/category/henderson-project/ for more detail. Contact: @an_dre_a_, @mrvaidya, @robgural, @dabblepop, @pagodarose

Published in: Technology, Health & Medicine
  • “First person precision” refers to the idiosyncratic, unatomizable narrative about nature, be it a drawing on a cave wall or a handwritten page in a field journal, that gives specimens and observations context that may not readily fit into a spreadsheet, and which may form the nucleus of an important new insight or discovery. Thus, field notes are the product of both qualitative and quantitative methods, in which structured and unstructured data are intertwined.
  • A classic “neat old guy” – this is a phrase I just made up, but the point is that Henderson is like a lot of the people whose notes you likely keep; he was influential in lasting ways but is little known beyond his immediate sphere of influence (in this case, Boulder, CO and malacology); he was a dutiful scientist; we as LIS professionals are charged with preserving his legacy
  • Poor man’s transcription platform

    1. What Henderson Saw: Extracting Observations from Century-Old Field Notebooks. Andrea Thomer (UIUC), Gaurav Vaidya (CU-B), Robert Guralnick (CU-B), David Bloom (UC-B) & Laura Russell (KU)
    2. or
    3. From Documents to Datasets: Mining the Junius Henderson Field Notes for Species Occurrence Records. Andrea Thomer (UIUC), Gaurav Vaidya (CU-B), Robert Guralnick (CU-B), David Bloom (UC-B) & Laura Russell (KU)
    4. Field notes and biodiversity science
       • Field work is central to biodiversity work
       • Field notes:
         • Are central to field work
         • Are typically stored in archives
         • But contain data
         • Data wants to be free!
    5. Biodiversity science and “first person precision”
       • We often forget that field notes store data
       • Value of field notes is in the combination of qualitative/quantitative data (Kramer, 2011)
       • Grinnell: “first person precision” (1912)
       • How do we free the data, while also preserving the record of its context of production?
    6. Junius Henderson
       • A typical natural history “old-timer”
         • Had a mustache
         • Wore suspenders
         • Wrote snarky comments in his field notes about young whippersnappers and trains
         • Studied clams
    7. Influential in small but lasting ways, but not well-known beyond Boulder
    8. Henderson’s field notes
       • 13 notebooks, 1 locality notebook
       • 1672 pages of notes total
       • Prolific collector
       • Numerous photographs
       • 1905: Began field work for CU Museum
       • 2000-2002: Transcribed by Dr. Peter Robinson
       • 2006: NSIDC scanned the Henderson notebooks
       • 2011-2012: Annotation and data extraction
    9. The Henderson Field Note Project
       • We were looking for a low-tech digitization project
       • Rob knew of the existence of the transcribed notes
       • “What can we accomplish with five hours of work each?”
       • Goals:
         • Make notes freely available
         • Try to engage volunteers on the internet
         • Produce one “neat thing” (a visualization, a map, etc.)
    10. Challenges in making notes available
       • No time!
       • No resources!
       • No time!
       • No repository!
       • No platform!
       • No time!
    11. Solutions to challenges (ver. 1)
       • No sleeping!
       • Use free resources!
       • Guerrilla takeover of Wikisource!
       • Profit!
    12. Wikisource
       • Part of the Wikimedia Foundation, as is Wikipedia
       • Has its own “collections” or “accessions” policies
         • All docs from before 1923
         • Post-1922: Documentary sources, peer-reviewed scientific research, analytical & artistic works
       • Support for “adding value” via transcription, translation, annotation, and more
    13. Basic Project Steps
       • Upload notebooks to Wikisource
       • Match transcriptions to scans by hand
       • Create templates to support annotation
       • Advertise project; attract volunteers
       • Write simple script to extract annotations
       • Publish those via IPT installation as a DwC-A
       • Sleep
    16. Annotation Templates
       • Anyone can annotate the transcribed text to tag elements
       • Ex. “I saw a white-tailed jack rabbit” becomes “I saw a {{taxon|Lepus townsendii|white tailed jack rabbit}}.”
    17. Annotation Templates: {{taxon|Lepus townsendii|white tailed jack rabbit}}
       • taxon: type of annotation
       • Lepus townsendii: Wikipedia link (note: “white tailed jack rabbit” would work here as well)
       • white tailed jack rabbit: verbatim text
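The "simple script to extract annotations" is not shown in the slides. As a rough sketch of what such a script might do, the Python below pulls the Wikipedia link and the verbatim text out of the two-argument `{{taxon|...|...}}` form shown above. The regular expression and the function names are assumptions made for illustration, not the project's actual code, and the pattern does not handle nested templates or other template variants.

```python
import re

# Assumed pattern for the two-argument form shown on the slide:
# {{taxon|<Wikipedia link>|<verbatim text>}}
TAXON_TEMPLATE = re.compile(r"\{\{taxon\|([^|{}]+)\|([^|{}]+)\}\}")


def extract_taxon_annotations(wikitext):
    """Return (wikipedia_link, verbatim_text) pairs for every taxon template found."""
    return [
        (link.strip(), verbatim.strip())
        for link, verbatim in TAXON_TEMPLATE.findall(wikitext)
    ]


def strip_annotations(wikitext):
    """Replace each taxon template with its verbatim text, recovering the plain sentence."""
    return TAXON_TEMPLATE.sub(lambda match: match.group(2), wikitext)


sample = "I saw a {{taxon|Lepus townsendii|white tailed jack rabbit}}."
print(extract_taxon_annotations(sample))
print(strip_annotations(sample))
```

Keeping the verbatim text alongside the link is what lets the extracted record point back to Henderson's own wording.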
    20. Basic Project Steps
       • Upload notebooks to Wikisource
       • Match transcriptions to scans by hand
       • Create templates to support annotation
       • Advertise project; attract volunteers
       • Write simple script to extract annotations
       • Write complex scripts to extract annotations and compile them into occurrences
       • Extensively review occurrences
       • Taxonomic referencing
       • Publish those via IPT installation as a DwC-A
       • Sleep
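The slides do not say how annotations were compiled into occurrences. The sketch below shows one plausible rule, purely as an assumption: each taxon annotation inherits the most recent date and place annotations that precede it in the text. The function names and the `(kind, value)` input shape are hypothetical. The column names `scientificName`, `verbatimLocality` and `verbatimEventDate` are Darwin Core terms. The sample values in the test data are invented and do not come from the notebooks.

```python
import csv
import io

FIELDS = ["scientificName", "verbatimLocality", "verbatimEventDate"]


def compile_occurrences(annotations):
    """Turn an ordered list of (kind, value) annotations into occurrence rows.

    Assumption (not from the slides): a taxon annotation inherits the most
    recent date and place annotations that precede it.
    """
    occurrences = []
    current_date = ""
    current_place = ""
    for kind, value in annotations:
        if kind == "date":
            current_date = value
            current_place = ""  # a new dated entry starts with no known place
        elif kind == "place":
            current_place = value
        elif kind == "taxon":
            occurrences.append({
                "scientificName": value,
                "verbatimLocality": current_place,
                "verbatimEventDate": current_date,
            })
    return occurrences


def to_csv(occurrences):
    """Serialize occurrence rows as CSV text with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(occurrences)
    return buffer.getvalue()
```

A flat file like this is only the data table. A Darwin Core Archive also bundles descriptor and metadata files, which is the part the IPT installation takes care of.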
    22. Taxonomic Referencing
       • Remember that “Wikipedia link”?
       • We want to check if that is a valid taxonomic name
       • How?
       • Easy, right? Just check against a resolver!
       • Hard! Which resolver? How to verify?
         1) Check name against ITIS and EOL.
         2) Possible outcomes:
            a) Both concordant! YAY!
            b) No results from both. Boo!
            c) Discordant results. Need HUMANS!
         3) This was LOTS of work (thanks, Gaurav!)
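The three outcomes above amount to a small decision rule. The sketch below expresses it in Python, assuming the ITIS and EOL lookups have already been run and each produced either a name string or nothing. The function names are hypothetical and no real resolver API is called. Treating a result from only one resolver as a case for human review is also an assumption, since the slide does not cover it.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivial differences do not count."""
    return " ".join(name.split()).casefold()


def classify_name(itis_match, eol_match):
    """Compare the names returned by two resolvers for one annotated taxon.

    Each argument is the name that resolver returned, or None if it
    returned nothing.
    """
    if itis_match is None and eol_match is None:
        return "no results"
    if (
        itis_match is not None
        and eol_match is not None
        and normalize(itis_match) == normalize(eol_match)
    ):
        return "concordant"
    # Assumption: a one-sided match is treated like a discordant one.
    return "needs human review"


print(classify_name("Lepus townsendii", "Lepus townsendii"))
print(classify_name(None, None))
print(classify_name("Lepus townsendii", "Lepus californicus"))
```

Only the first outcome can be accepted automatically, which is consistent with the slide's point that the review step was a lot of manual work.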
    24. Results!
       • 3 Notebooks posted and fully annotated

       |                       | Notebook 1              | Notebook 2               | Notebook 3                     |
       | Downloaded on         | March 27, 2012          | March 27, 2012           | March 27, 2012                 |
       | Pages processed       | 112 of 114              | 120 of 123               | 120 of 122                     |
       | Number of entries     | 62 of 64                | 62 of 63                 | 98 of 99                       |
       | Number of annotations | 632                     | 703                      | 1007                           |
       | Taxon annotations     | 349 (201 unique)        | 224 (125 unique)         | 514 (248 unique)               |
       | Place annotations     | 219 (115 unique)        | 419 (154 unique)         | 401 (139 unique)               |
       | Date annotations      | 64 (63 unique)          | 60 (59 unique)           | 92 (90 unique)                 |
       | Dates in range        | July 1905 to April 1907 | May 1907 to October 1908 | January 1909 to September 1909 |
    25. Results!... With caveats
       • 3 notebooks posted and mostly annotated
       • 1076 occurrences extracted
       • A published Darwin Core Archive!
         • Most of our project’s Skype calls were about DwC term use
       • A ZooKeys paper (hopefully)
       • A lot more questions…
    26. What challenges remain?
       • How do we georeference these occurrences?
       • How do we maintain ties between DwC records and field notes?
       • How do we assign unique identifiers to wiki tags?
       • Is Wikisource the best place for this data?
    30. Why this could work for you too:
       • Wikimedia projects really are community driven
       • We can all be a part of this community, if we do the work
       • Your lab, archive or library has as many or more potential contributors as our project
       • There are many flexible transcription platforms in addition to Wikipedia
    31. This entire project was only possible because people had been making small steps towards digitization over the last 10 years
    32. Questions?
       • References:
         • Grinnell J (1912) An Afternoon’s Field Notes. The Condor, 14(3), 104-107. Retrieved from http://www.jstor.org/stable/1362226.
         • Kramer KL (2011) The spoken and the unspoken. In M. R. Canfield (Ed.), Field Notes on Science & Nature. Cambridge, Massachusetts: Harvard University Press.
       • For more about Henderson, see our blog! http://soyouthinkyoucandigitize.wordpress.com/category/henderson-project/
