Crowdsourcing Historical Research


Presentation on the Founders and Survivors project for Drupal Downunder 2012.

Published in: Education, Technology
  • Abstract: Founders and Survivors is an Australian Research Council-funded research project to build biographies of the approx. 70,000 convicts transported to Tasmania, and their descendants. The project is a collaboration between historians, public health scientists, and a growing number of volunteer genealogists and amateur historians. Our tools include a massive and complex XML database, and social and collaboration tools built with Drupal and Google Docs. This presentation will describe the goals and challenges of the project, the motivations behind the adoption of these tools, and their implementation. I am one of two developers on the Founders and Survivors project. I will introduce the project and how we use 'crowdsourcing' methods to enrich our archival sources.
  • Introduction to the project. Founders and Survivors: Tasmanian convicts and their descendants -- health and resilience. Collaboration between researchers from Universities of Melbourne and Tasmania and elsewhere.
  • Digital history: 'problem' or opportunity. Historical, archival resources that have not been compiled or explored.
  • Some examples of current research.
  • Shoestring budget. Experimental. Lots of research questions, limited technical resources.
  • Users of this website. Research team: diverse backgrounds, locations. Amateur historians with interest in Tasmanian history.
  • Some existing archival material has been digitised. We want to incorporate material from lives of convicts after leaving the convict system, from a range of primary sources and family histories.
  • The project began with digitised images of archival documents from court trials, ships and prisons, recording the physical and behavioural characteristics of convicts before and during transportation and during their sentence in Van Diemen's Land.
  • Good starting point for quantitative history. High-quality data.
  • Less reliable. Less accessible. Collaboration between 'professional' and 'amateur' historians
  • Our volunteers include family historians, retired historians, librarians and engineers ... Interest in family or local histories, or convicts in general. Varying levels of experience with technology and historical research.
  • What happened to convicts after they left the convict system?
  • How to collate different sources of data and incorporate new data (from volunteers and other researchers or archives). Experimentation – solutions not planned from the start.
  • Our other developer has consolidated the different sources of tabular data into one massive XML database using the BaseX engine and a data format based on the Text Encoding Initiative.
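The consolidation step described above can be illustrated with a minimal sketch. It assumes a small CSV extract and maps each row to a TEI-style `<person>` element; the column names, record IDs and the use of `<note type="ship">` are invented for illustration and are not the project's actual schema. The project itself used Perl scripts and BaseX, so this Python version only shows the general shape of the transformation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample of tabular source data; column names and values are
# placeholders, not taken from the project's records.
SAMPLE_CSV = """record_id,forename,surname,ship
R001,John,Smith,Example Ship
R002,Mary,Jones,Example Ship
"""


def rows_to_tei(csv_text):
    """Build a TEI-style <listPerson> element from CSV text."""
    list_person = ET.Element("listPerson")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One <person> per row, keyed by the source record ID.
        person = ET.SubElement(list_person, "person", {"xml:id": row["record_id"]})
        pers_name = ET.SubElement(person, "persName")
        ET.SubElement(pers_name, "forename").text = row["forename"]
        ET.SubElement(pers_name, "surname").text = row["surname"]
        # Recording the ship in a <note> is a simplification for this sketch.
        ET.SubElement(person, "note", {"type": "ship"}).text = row["ship"]
    return list_person


tei = rows_to_tei(SAMPLE_CSV)
xml_text = ET.tostring(tei, encoding="unicode")
print(xml_text)
```

A document built this way could then be loaded into an XML database; how the project structures and loads its real files is not described in these notes.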
  • At the same time, I was experimenting with presenting some of the same data in Drupal, but it would not scale (73,000 convicts, many different source documents for each). Drupal is now used to document the project, collect some data from volunteers, and coordinate volunteer efforts.
  • Some tabular data has been captured in Excel or CSV form. Most textual/narrative documents are yet to be transcribed and will require more human intervention to incorporate them into the master database. Unfulfilled dream about GEDCOM import.
  • Public and staff views of consolidated convict biographies using XSLT. Link between BaseX and CCC (community contributed content): scripts add links to the BaseX records and are run as cron jobs.
  • Convict biographies are captured in Drupal. The XSLT template for a convict record includes a URL to create a new entry in a Drupal form, using the Prepopulate module to capture enough from the XML record (just the record ID) to assist in two-way linkage.
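As a rough illustration of the link described above, the sketch below builds a node-add URL that passes a record ID through a Prepopulate-style `edit[...]` query parameter. The site address, content type path, record ID and the choice of the `title` field are all assumptions made for this example; the notes do not say which form field receives the ID, and the exact parameter syntax for other fields depends on the Drupal version.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical site address and content type path; neither comes from the project.
BASE_URL = "https://example.org"
ADD_PATH = "/node/add/community-content"


def prepopulate_url(record_id, field="title"):
    """Return a node-add URL carrying record_id as an edit[<field>] parameter."""
    # urlencode percent-encodes the square brackets; the web server decodes them.
    query = urlencode({f"edit[{field}]": record_id})
    return f"{BASE_URL}{ADD_PATH}?{query}"


url = prepopulate_url("R001")  # "R001" is a placeholder record ID
print(url)

# Decoding the query string shows the parameter the form would receive.
params = parse_qs(urlsplit(url).query)
print(params)
```

In the workflow described, a link of this kind would be emitted by the XSLT template for each convict record, so that a contributor arrives at a form already tied to the right entry.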
  • Automated process to incorporate community-contributed content into the master database (Perl).
  • Consolidated source info from the XML entry and prepopulated Drupal form with link to Archive Index ID number. [NB some of our record IDs are obscure. Here: CON31/40...]
  • What if more than one person submits info on the same convict? These submissions will not be identical, because every descendant or researcher has different (but overlapping) info. All submissions are checked by staff before being added to the master database.
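One way to support the staff check described above is to group submissions by record ID so that overlapping contributions about the same convict are reviewed together. This is only a sketch under assumed field names (`record_id`, `contributor`, `text`) with placeholder values; it is not the project's actual review process.

```python
from collections import defaultdict

# Hypothetical submissions; field names and values are placeholders.
submissions = [
    {"record_id": "R001", "contributor": "volunteer_a", "text": "first contributor's notes"},
    {"record_id": "R002", "contributor": "volunteer_b", "text": "notes on another convict"},
    {"record_id": "R001", "contributor": "volunteer_c", "text": "second contributor's notes"},
]


def group_for_review(items):
    """Group submissions by the convict record they refer to."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["record_id"]].append(item)
    return dict(grouped)


review_queue = group_for_review(submissions)
for record_id, items in review_queue.items():
    # Records with more than one submission need a side-by-side comparison.
    flag = "multiple submissions" if len(items) > 1 else "single submission"
    print(record_id, flag, [i["contributor"] for i in items])
```

Nothing is merged automatically here; the grouping only makes it easier for a reviewer to see every contribution for a record before deciding what goes into the master database.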
  • More committed volunteers are assigned to ships and try to trace all convicts on that ship. In addition to convict biographies in Drupal, some summary data (targeted at analysis) is captured in Google Spreadsheets, one for each ship. Prepopulated using the Perl Google Docs API.
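The per-ship prepopulation described above can be sketched as follows. The project used Perl and the Google Docs API; this example instead produces one block of CSV text per ship, which could be imported into a spreadsheet, and makes no Google API calls. The record data, column names and URL pattern are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical URL pattern and records, for illustration only.
RECORD_URL = "https://example.org/record/{record_id}"
convicts = [
    {"record_id": "R001", "name": "Person A", "ship": "Ship One"},
    {"record_id": "R002", "name": "Person B", "ship": "Ship Two"},
    {"record_id": "R003", "name": "Person C", "ship": "Ship One"},
]


def sheets_by_ship(records):
    """Return a dict mapping each ship name to prepopulated CSV text."""
    by_ship = defaultdict(list)
    for rec in records:
        by_ship[rec["ship"]].append(rec)

    sheets = {}
    for ship, recs in by_ship.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # The empty "notes" column is where a volunteer would enter findings.
        writer.writerow(["record_id", "name", "record_link", "notes"])
        for rec in recs:
            link = RECORD_URL.format(record_id=rec["record_id"])
            writer.writerow([rec["record_id"], rec["name"], link, ""])
        sheets[ship] = buffer.getvalue()
    return sheets


sheets = sheets_by_ship(convicts)
for ship, text in sheets.items():
    print(ship)
    print(text)
```

Each sheet starts with one row per convict and a link back to the master record, so volunteers only fill in the remaining columns.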
  • Links to XML and Drupal records.
  • Scale: both developers started on this project around the same time, with our own experiments (Drupal and XML), and XML appeared more suitable to the scale of our dataset. That was when we had much less data than we do now. Complex nature of our data: combination of tabular, textual and image sources; XML was a more natural fit for presenting a whole individual's lifecourse. Expertise: Some staff and volunteers seemed to have difficulty navigating complex forms. For the ship project, which involved volunteers making lots of numerical entries, we decided to use spreadsheets with validation controls instead.
  • Building a web frontend which is more than the requisite "About the project" site – interface to XML database, data capture, and volunteer forums. BaseX and Drupal live on our own servers – not dependent on Google.
  • This model has evolved as new data has become available and new analytical questions have been proposed – we did not know exactly what we needed to do when we began 3-4 years ago.

    1. Crowdsourcing Historical Research. Claudine Chionh, Drupal Downunder 2012
    2. Founders and Survivors
         • Study of the 73,000 convicts transported to Van Diemen's Land (Tasmania) between 1803 and 1853
         • Records from the convict system and elsewhere
         • Health, environment, lifestyle, wellbeing
         • Effects on health and resilience of descendants
    3. Goals of the project
         • Compile (health and demographic) data about this population from a range of sources
         • Enable other researchers to use this data
         • Explore quantitative and geographic tools and analyses that are not commonly used in historical research
         • Combine professional expertise with the enthusiasm of volunteers
    4. Some research projects
         • Morbidity and mortality on the voyage to Australia
         • Crime and convicts in Tasmania, 1853-1900
         • Fertility decline in late C19 Tasmania
         • Prostitution and female convicts
         • Tracing convicts' descendants who served in WWI
    5. Project staff
         • Historians
         • Demographers
         • Epidemiologists
         • Two part-time developers
    6. Who are our users?
         • Research team
         • Other interested researchers
         • Genealogists/family historians
         • Local historians
    7. Data sources
         • Conduct records
         • Surgeons' journals
         • Newspaper reports
         • Births, deaths, marriages
         • Parish records
         • Family histories, memories, legends
    8. Official/formal sources
         • Records from the convict system
             • Trial, conviction documents
             • Conduct records
             • Ship surgeons' journals
             • Permissions to marry
             • Ticket of leave
         • Outside the convict system
             • Births, deaths, marriages
             • Later convictions
    9. Paper databases
         • Broader historical context:
             • Mass transportation
             • Modern record-keeping and statistics
    10. Informal sources
         • Newspaper reports
         • Family history: primary sources, compiled genealogies, anecdote and legend
    11. Our volunteers
         • Amateur historians, genealogists
         • Librarians
         • IT specialists
    12. How volunteers can contribute
         • Individual convict biographies
         • Tracing batches of convicts in ships
    13. Solutions
         • XML database
         • Drupal
         • Google Docs
    14. The Founders and Survivors database
         • XML (based on Text Encoding Initiative)
         • BaseX XML database engine
    15. Experimenting with Drupal
         • Used an older version of Migrate to import some tabular data as nodes
         • Problem of scale: 73,000 convicts
         • XML approach proved to be more efficient
    16. Getting data into our system
         • Formal sources
             • Collected by archives and individual researchers
             • CSV, Excel, Filemaker, Access ...
             • Incorporated into BaseX database with Perl scripts
         • Informal sources
             • Individual convicts' life histories are captured in a Drupal content type ('Community contributed content')
             • Some sub-projects also capture summary data in Google spreadsheets
    17. Viewing data
         • Master database in BaseX: presented in XSLT, different views for logged in researchers and others
         • Community contributed content (CCC): Drupal
         • Two-way link between master database and CCC
         • Google spreadsheets prepopulated with links to corresponding records in master database
    18. Data capture
         • Convict biographies captured in Drupal – Community Contributed Content (CCC)
         • Linked to entry in XML database
         • Perl scripts to incorporate CCC records into master database
    19. XML entry for an individual convict
    20. Prepopulated Drupal form
    21. Community contributed content
    22. Ships (batches of data)
         • Tracing all convicts on a ship
         • Summary data in Google Spreadsheets
         • Spreadsheets are prepopulated from the master database
    23. Ship summary data in Google Spreadsheets
    24. Drupal can't do everything
         • Scale
         • Complexity
         • Expertise
    25. Where Drupal is appropriate for our project
         • Web frontend
         • Data capture
         • Collaboration, forums
    26. Summary
         • Massive XML database with complex relations
         • Drupal for capturing slightly complex data and facilitating collaboration
         • Google Spreadsheets for capturing tabular data
    27. Questions?
         • Founders and Survivors
         • [email_address]
         • Claudine Chionh
         • [email_address]