Digital Medieval Data Curation

An overview of work currently being done by the Digital Manuscript Technology group. This presentation was given to the 2013 CLIR fellows in medieval data curation, and is a synthesis of earlier presentations, some of which were co-authored with Robert Sanderson.

  • Allows filtering by date, item, and manuscript, as well as search across the items
  • Digital Medieval Data Curation

    1. 1. Digital Medieval Data Curation. CLIR Postdoctoral Fellowship Seminar, Bryn Mawr, 2013. Benjamin Albritton, Stanford University Libraries (blalbrit@stanford.edu, @bla222)
    2. 2. Current State: A World of Silos • Roman de la Rose • Parker on the Web • e-codices • And so on…
    3. 3. Data Interoperability • Break down silos • Separate data from applications • Share data models and programming interfaces • Enable interactions at the tool and repository level
    4. 4. Designing Modular Repositories and Tools [Diagram labels: Image Data (Canonical); Non-image data (Canonical); Image Viewer; Discovery; Annotation; Transcription; Image Analysis; Tool X?; grouped under Repository, User Interface, and 3rd-Party Tools]
    5. 5. Designing Modular Repositories and Tools [Same diagram, with the Repository, User Interface, and 3rd-Party Tools groupings]
    6. 6. Designing Modular Repositories and Tools [Same diagram, without the grouping labels]
    7. 7. Iterative Interactions
    8. 8. Multiple Data Sources • Existing structured data (catalogs) • User-added data (comments, transcriptions, etc.) • Digital images • Machine processing
    9. 9. Motivating Questions What does this mean for medieval data? • How do we rethink medieval object data in a shared, distributed, global space? • How do we enable collaboration and encourage engagement? • How do we deal with tools that are producing new data on digital surrogates that are implicitly about a real world object?
    10. 10. Transcribing from Digital Surrogates La Terre de Secille
    11. 11. Naïve Approach: Attach Transcription to Image • One problem example: multiple representations • Image shown: CCC 26 f. iiiR
    12. 12. Naïve Approach: Attach Transcription to Image • One problem example: multiple representations • Images shown: CCC 26 f. iiiR; Fold A Open
    13. 13. Naïve Approach: Attach Transcription to Image • One problem example: multiple representations • Images shown: CCC 26 f. iiiR; Fold A Open; Fold A and B Open
    14. 14. Naïve Approach: Attach Transcription to Image • One problem example: multiple representations • Images shown: CCC 26 f. iiiR; Fold A Open; Fold A and B Open; f. iiiV
    15. 15. The Shared Canvas • Represents a real world thing we want to “talk” about • Has a unique name • http://dms-data.stanford.edu/Parker/CCC026/canvas-12
    16. 16. Data Model: SharedCanvas http://www.shared-canvas.org
    17. 17. Data is “about” a real thing
    18. 18. Canvas Paradigm • A Canvas is an empty space in which to build up a display • Makes explicit that the image is a surrogate
    19. 19. Open Annotation Model • Annotation (a document) • Body (the ‘comment’ of the annotation) • Target (the resource the Body is ‘about’)
    20. 20. Model: Annotations to Paint Canvas • The Canvas represents the empty page • Annotation links Image with Canvas
    21. 21. Model: Annotations to Paint Canvas • Annotation links Text with Canvas
    22. 22. Model: Annotations to Paint Canvas
    23. 23. Model: Missing Pages
    24. 24. Medieval Data Use-Cases: A Sampler • Structured data from existing sources • Transcription and glyphs • Structured data from new sources
    25. 25. Structured Data from Existing Sources A Catalog of the Manuscripts of Salisbury Cathedral Library
    26. 26. Drives Discovery
    27. 27. Transcription: T-PEN (Saint Louis University) http://t-pen.org • Transcription tool • Provides image parsing (columns) • Example: BNF fr. 9221, column parsing
    28. 28. T-PEN (Saint Louis University) http://t-pen.org • Transcription tool • Provides image parsing (columns, lines) • Example: BNF fr. 9221, line parsing
    29. 29. T-PEN (Saint Louis University) http://t-pen.org BNF fr. 9221 – transcription view
    30. 30. Drives Full-Text Search http://t-pen.org/TPEN
    31. 31. … and other interfaces http://stanford.edu/~blalbrit/v-machine-2/samples/DamedequiRF5.xml
    32. 32. T-PEN’s PaleoTool BNF fr. 1586 – glyph parsing
    33. 33. Results for “matching” glyphs
    34. 34. Glyphs with multiple letters
    35. 35. Comparing results across manuscripts BNF fr. 1586 CCCC 324
    36. 36. User-created Structured Data Beinecke MS 310, f. 1r • Each row = 1 day (January 1, here) • Lists the feast of the Circumcision • Optionally provides additional information
    37. 37. Distributed Resources / Distributed Environments
    38. 38. Data capture in T-PEN (Saint Louis University) http://t-pen.org
    39. 39. Front-end: Exhibit • A simple (really simple) Exhibit based on kalendar transcriptions: http://guillaumedemachaut.com/kalendar/sharedkalendar.html • Exhibit: http://www.simile-widgets.org/exhibit/
    40. 40. For each record:
    41. 41. Enabling rapid comparison Two mss. include the entry “Thimotheus apostel”
    42. 42. Distributed Resources / Distributed Environments
    43. 43. SharedCanvas Demo Implementation http://www.shared-canvas.org/impl/demodh
    44. 44. SharedCanvas Demo Implementation http://www.shared-canvas.org/impl/demodh
    45. 45. SharedCanvas Demo Implementation http://www.shared-canvas.org/impl/demodh
    46. 46. A Sea of Manuscript Data • Thousands of manuscripts currently available interoperably, with more coming rapidly • Discovery data is a mixed bag • Tools provide data back into the system that can be re-used • New data drives new discovery, new interfaces, and new visualization challenges • Management and manipulation of that “wild” data is a serious challenge
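
The canvas model described in slides 15 to 22 can be illustrated with a small sketch. It is not the SharedCanvas or Open Annotation serialization, only a toy in-memory version of the idea that images and transcriptions are both attached to a named Canvas rather than to each other. The class names, the `kind` field, the example.org image URIs, and the pairing of the slide's canvas URI with the label "CCC 26 f. iiiR" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """An annotation links a Body to a Target (slide 19)."""
    body: str    # an image URI or a piece of transcription text
    target: str  # URI of the Canvas the body is "about"
    kind: str    # hypothetical label for this sketch: "image" or "text"


@dataclass
class Canvas:
    """A named, empty space standing in for the real-world page (slide 18)."""
    uri: str
    label: str
    annotations: List[Annotation] = field(default_factory=list)

    def paint(self, body: str, kind: str) -> Annotation:
        # Every annotation targets the canvas, never another image.
        anno = Annotation(body=body, target=self.uri, kind=kind)
        self.annotations.append(anno)
        return anno

    def bodies(self, kind: str) -> List[str]:
        # Collect the bodies of all annotations of one kind.
        return [a.body for a in self.annotations if a.kind == kind]


# The canvas URI comes from slide 15; pairing it with this label is assumed.
canvas = Canvas(
    uri="http://dms-data.stanford.edu/Parker/CCC026/canvas-12",
    label="CCC 26 f. iiiR",
)

# Several photographs of the same page (folds closed and open) share one canvas.
canvas.paint("http://example.org/images/f3r-closed.jpg", "image")
canvas.paint("http://example.org/images/f3r-fold-a-open.jpg", "image")
canvas.paint("http://example.org/images/f3r-fold-ab-open.jpg", "image")

# The transcription is attached to the canvas, so it survives any choice of image.
canvas.paint("placeholder transcription text", "text")
```

Because the transcription targets the canvas rather than any single photograph, the multiple-representations problem from slides 11 to 14 does not arise: a viewer can swap between the three images while the text annotation stays attached to the same page.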
