SlideShare a Scribd company logo
1 of 24
Encoding and Designing for the Swift Poems Project
Jonathan Swift and the Text Encoding Initiative
James R. Griffin III
Digital Library Developer
Lafayette College Libraries
Introductions
James Woolley
(Emeritus) Frank Lee and Edna M. Smith Professor of English at Lafayette College
Stephen Karian
Associate Professor of English at the University of Missouri
James R. Griffin III
Digital Library Developer at the Lafayette College Libraries
Overview of the Swift Poems Project
Woolley and Karian seek to archive poems attributed to Jonathan Swift
Beginning in 1987, this has involved:
Identifying and Cataloging Primary Sources
Transcription
Copy-typing, encoding, and annotating the primary sources
This method has not relied upon the usage of the TEI Collation
Collation
Identifying the copy-text (and variant texts) for any given poem
Overview of the Swift Poems Project
The Libraries at Lafayette
In 2009 Woolley consulted with the Libraries for assistance with the Project
Visual Resources Curator (Paul Miller) developed a set of Microsoft Access Databases
These structure the catalogs maintained by Woolley and Karian
In 2012, the NEH awarded a Scholarly Editions Grant for the Project
This includes supporting the development of a digital edition
This will be integrated into volumes of the Cambridge Edition of the Works of Jonathan Swift
Digital Scholarship Services (Department in the Libraries)
Identifying and Cataloging Primary Sources
Identifying the Sources
Sources are 18th century printed and manuscript texts
To date, over 6500 manuscripts have been identified and cataloged
Few digital surrogates for the printed and manuscript texts are available
Of these only a restricted set aren’t protected under copyright
Cataloging the Sources
Bibliographic metadata are structured for each source
Elements are extracted from external catalogs (e. g. English Short Title Catalogue)
Transcribing the Primary Sources
On Poetry A Rapsody (Poem 640-35D-)
Transcribing the Primary Sources
The transcripts themselves are created using the Nota Bene application
Nota Bene encodes textual structure using a system of tags termed as “mode codes”
«MDUL»This mode encodes italicized style rendering«MDNM»
«MDBO»This mode encodes black letter«MDNM»
The researchers have further extended this system to support editorial annotation:
Lorem«MDUL»add·caret«MDNM»·this text added with a caretipsum
Dolor«MDUL»del«MDNM»·this was deleted·sit amet
«MDUL»This mode encodes italicized style rendering«MDNM»
«MDBO»This mode encodes black letter«MDNM»
Lorem«MDUL»add·caret«MDNM»·this text added with a caretipsum
Dolor«MDUL»del«MDNM»·this was deleted·sit amet
pasted·over
printed text
Accessing and Preserving the Transcripts
Accessing the Nota Bene comes with challenges
The Nota Bene release used by the researchers has been 3.0 (released in 1988)
Accessing the Nota Bene directly would require a virtualized environment for Microsoft DOS
The Nota Bene transcripts are managed as text/plain media resources
The Text Encoding Initiative P5
Provides a robust data model
Standardized and open format for interchange
A more effective solution for preservation
Encoding the Transcripts
On Poetry A Rapsody(Poem 640-35D-)
Encoding the Transcripts
Encoding using the TEI-P5 could not be a manual process
The researchers required a system to transform Nota Bene into a TEI-XML implementation
An API for Ruby using Nokogiri was developed to support this
Viewing the TEI-XML was of limited value
Research techniques driven by Nota Bene require a rendering of the text
Styled HTML5 (using Twitter Bootstrap) serves as a minimum viable product
Improvements can be rapidly prototyped for the encoding
This approach takes inspiration from Agile software development practices
Viewing the Encoded Transcript
On Poetry A Rapsody (Poem 640-35D-)
Enriching the Encoded Transcripts
Limits are obviously present with this approach
Researchers are not encoding the transcripts using the TEI P5
The developer for the Ruby API is not a literary scholar
How can this encoding be made collaborative?
The developer and the researcher could operate in a shared environment
This is inspired heavily by the pair-programming technique within eXtreme Programming
In this case, both the developer and a researcher share a physical working environment
Enriching the Encoded Transcripts
Collaborative encoding and quality control
The researchers will identify faults in the rendered transcripts
The developer can extend the Ruby API, XSL, or styling for the HTML5
This enables rapid prototyping of the interface
Delivery time for the researcher (in transcribing the sources) can be increased
In response, the developer can more readily scope improvement requests
Textual criticism is still not enabled by this approach
The researchers must identify variant readings to a given text
Collation for the Swift Poems Project
Collation as a solution
Originally the researchers collated the Nota Bene transcripts using a FoxPro program
Visualization was used to identify variation
Tokenization was customized
A set of controlled characters (&,~,|,#) symbolized differences in structure
664D721L 1 The Pulteney’s and Shippens & such folk
664D233Y 1 Well may Poultney & Shippen rant, grumble,
664D360Y 1 Your Pulteney & Shippen, ~ ~ folks
664D721L 2 How unlucky it is for the Nation #######
664D233Y 2 |& curse the hard Fate of the Nation;
664D360Y 2 |How hard is the Fate of the Nation
Collation within a Digital Scholarly Edition
A collation interface was scoped for the digital edition
This interface must enable the transition from the legacy collation engine
Collation features could be extended
Lines are still tokenized
Initially attempted to preprocess the text and use the Penn Treebank tokenizer
Ultimately found that abstracting the tokenizer was simply more effective
Alignment is addressed without the use of controlled characters
The edit distance between tokens can be calculated
Collation within a Digital Scholarly Edition
Collating Variants for the Poem !W190
Collation within a Digital Scholarly Edition
The collation can also address flaws in the encoding
By default, all unencoded Nota Bene markup is stripped from the TEI
Users will be able to collate the texts and visualize differences
Optionally Nota Bene can be preserved
Researchers still retain access to some of the controlled characters
Researchers and the developer can identify unencoded Nota Bene sequences
A heatmap is currently the supported visualization
This is a straightforward means of rendering the textual differences
Collation within a Digital Scholarly Edition
Collation for 640-
640-35D- (Copy-Text)
640-36L- (Variant)
640-34L2 (Variant)
Forthcoming Features
Design Improvements
Stakeholders have driven the requirements for the UI
Interviewing and testing for public users must be undertaken
Extending UI features using JavaScript frameworks
The digital edition is current implemented in the Tornado framework for Python
Solutions such as AngularJS and React reduce UI to a set of modular components
They also require a RESTful API to be implemented
Preservation
Encoding and Designing for the Swift Poems Project
Thank you for your attention
Contact Us
James Woolley (woolleyj@lafayette.edu)
Stephen Karian (karians@missouri.edu)
James R. Griffin III (griffinj@lafayette.edu)
Appendix: Workflow and System Architecture
Appendix: Collation Engine
Solutions for collating variants of the transcribed texts were explored
Juxta
Supporting the integration of the Juxta API was given the highest priority
Given existing infrastructure and resource concerns there were limitations:
Preprocessing and postprocessing the TEI-XML Documents was necessary
Juxta itself required performance optimization for our environment
CollateX
An extremely viable solution
Mature (and maintained) Module for Python
Appendix: Collation Engine
Prototyping a collation application in Python
The Tornado framework offered several advantages
Support for multiprocessing in collating larger sets of TEI-XML
Support for WebSockets (enabling asynchronous updates for a collation job)
Python Modules used to extend the features for the collation could be used
Natural Language Toolkit (supporting extensible tokenization)
NetworkX (supporting the building of stemmatic trees)
Integration with API’s for XML databases could also be explored
eXistdb

More Related Content

What's hot

Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueJinho Choi
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented TechnologiesUmesh Nikam
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
Various io stream classes .47
Various io stream classes .47Various io stream classes .47
Various io stream classes .47myrajendra
 
Python Programming language
Python Programming languagePython Programming language
Python Programming languageHadeelAlbedah
 
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.Guidance, Please! Towards a Framework for RDF-based Constraint Languages.
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.Kai Eckert
 
Java chapter 3 - OOPs concepts
Java chapter 3 - OOPs conceptsJava chapter 3 - OOPs concepts
Java chapter 3 - OOPs conceptsMukesh Tekwani
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language modelJiWenKim
 
OOP Comparative Study
OOP Comparative StudyOOP Comparative Study
OOP Comparative StudyDarren Tan
 
Oop(object oriented programming)
Oop(object oriented programming)Oop(object oriented programming)
Oop(object oriented programming)geetika goyal
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Rest style web services (google protocol buffers) prasad nirantar
Rest style web services (google protocol buffers)   prasad nirantarRest style web services (google protocol buffers)   prasad nirantar
Rest style web services (google protocol buffers) prasad nirantarIndicThreads
 
Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Krishna Sai
 

What's hot (19)

Graph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to DialogueGraph-to-Text Generation and its Applications to Dialogue
Graph-to-Text Generation and its Applications to Dialogue
 
Javaiostream
JavaiostreamJavaiostream
Javaiostream
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented Technologies
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Java chapter 3
Java   chapter 3Java   chapter 3
Java chapter 3
 
Various io stream classes .47
Various io stream classes .47Various io stream classes .47
Various io stream classes .47
 
Python Programming language
Python Programming languagePython Programming language
Python Programming language
 
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.Guidance, Please! Towards a Framework for RDF-based Constraint Languages.
Guidance, Please! Towards a Framework for RDF-based Constraint Languages.
 
Java chapter 3 - OOPs concepts
Java chapter 3 - OOPs conceptsJava chapter 3 - OOPs concepts
Java chapter 3 - OOPs concepts
 
Memory models in c#
Memory models in c#Memory models in c#
Memory models in c#
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
Compiler design Project
Compiler design ProjectCompiler design Project
Compiler design Project
 
OOP Comparative Study
OOP Comparative StudyOOP Comparative Study
OOP Comparative Study
 
Oop(object oriented programming)
Oop(object oriented programming)Oop(object oriented programming)
Oop(object oriented programming)
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Rest style web services (google protocol buffers) prasad nirantar
Rest style web services (google protocol buffers)   prasad nirantarRest style web services (google protocol buffers)   prasad nirantar
Rest style web services (google protocol buffers) prasad nirantar
 
Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-Principles of-programming-languages-lecture-notes-
Principles of-programming-languages-lecture-notes-
 

Similar to Encoding and Designing for the Swift Poems Project

Source-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructurekaveirious
 
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worldsmbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling WorldsIstvan Rath
 
MarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meetingMarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meetingmdabrowski
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosEUCLID project
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processorsRebaz Najeeb
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamDataWorks Summit
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?mikaelbarbero
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Julie Allinson
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceAntonio García-Domínguez
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...DataWorks Summit
 
Re-implementing Thrift using MDE
Re-implementing Thrift using MDERe-implementing Thrift using MDE
Re-implementing Thrift using MDESina Madani
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
Chapter One
Chapter OneChapter One
Chapter Onebolovv
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...source{d}
 
JSR 335 / java 8 - update reference
JSR 335 / java 8 - update referenceJSR 335 / java 8 - update reference
JSR 335 / java 8 - update referencesandeepji_choudhary
 

Similar to Encoding and Designing for the Swift Poems Project (20)

10-DesignPatterns.ppt
10-DesignPatterns.ppt10-DesignPatterns.ppt
10-DesignPatterns.ppt
 
Metamorphic Domain-Specific Languages
Metamorphic Domain-Specific LanguagesMetamorphic Domain-Specific Languages
Metamorphic Domain-Specific Languages
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Source-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructure
 
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worldsmbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
mbeddr meets IncQuer - Combining the Best Features of Two Modeling Worlds
 
MarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meetingMarcOnt Initiative - Protege meeting
MarcOnt Initiative - Protege meeting
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processors
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
 
Re-implementing Thrift using MDE
Re-implementing Thrift using MDERe-implementing Thrift using MDE
Re-implementing Thrift using MDE
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developments
 
Chapter One
Chapter OneChapter One
Chapter One
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
JSR 335 / java 8 - update reference
JSR 335 / java 8 - update referenceJSR 335 / java 8 - update reference
JSR 335 / java 8 - update reference
 

More from James Griffin

Developing a Staff-Only Samvera Application
Developing a Staff-Only Samvera ApplicationDeveloping a Staff-Only Samvera Application
Developing a Staff-Only Samvera ApplicationJames Griffin
 
Open Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingOpen Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingJames Griffin
 
Modeling Geospatial Data in Hydra Using GeoConcerns
Modeling Geospatial Data in Hydra Using GeoConcernsModeling Geospatial Data in Hydra Using GeoConcerns
Modeling Geospatial Data in Hydra Using GeoConcernsJames Griffin
 
(Open Hack Night 2014) GitHub Tutorial
(Open Hack Night 2014) GitHub Tutorial(Open Hack Night 2014) GitHub Tutorial
(Open Hack Night 2014) GitHub TutorialJames Griffin
 
(Open Hack Night Fall 2014) Hacking Tutorial
(Open Hack Night Fall 2014) Hacking Tutorial(Open Hack Night Fall 2014) Hacking Tutorial
(Open Hack Night Fall 2014) Hacking TutorialJames Griffin
 
(Open Hack Night Fall 2014) Overview
(Open Hack Night Fall 2014) Overview(Open Hack Night Fall 2014) Overview
(Open Hack Night Fall 2014) OverviewJames Griffin
 

More from James Griffin (6)

Developing a Staff-Only Samvera Application
Developing a Staff-Only Samvera ApplicationDeveloping a Staff-Only Samvera Application
Developing a Staff-Only Samvera Application
 
Open Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingOpen Bibliographic Data and Author Claiming
Open Bibliographic Data and Author Claiming
 
Modeling Geospatial Data in Hydra Using GeoConcerns
Modeling Geospatial Data in Hydra Using GeoConcernsModeling Geospatial Data in Hydra Using GeoConcerns
Modeling Geospatial Data in Hydra Using GeoConcerns
 
(Open Hack Night 2014) GitHub Tutorial
(Open Hack Night 2014) GitHub Tutorial(Open Hack Night 2014) GitHub Tutorial
(Open Hack Night 2014) GitHub Tutorial
 
(Open Hack Night Fall 2014) Hacking Tutorial
(Open Hack Night Fall 2014) Hacking Tutorial(Open Hack Night Fall 2014) Hacking Tutorial
(Open Hack Night Fall 2014) Hacking Tutorial
 
(Open Hack Night Fall 2014) Overview
(Open Hack Night Fall 2014) Overview(Open Hack Night Fall 2014) Overview
(Open Hack Night Fall 2014) Overview
 

Recently uploaded

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 

Recently uploaded (20)

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 

Encoding and Designing for the Swift Poems Project

  • 1. Encoding and Designing for the Swift Poems Project Jonathan Swift and the Text Encoding Initiative James R. Griffin III Digital Library Developer Lafayette College Libraries
  • 2. Introductions James Woolley (Emeritus) Frank Lee and Edna M. Smith Professor of English at Lafayette College Stephen Karian Associate Professor of English at the University of Missouri James R. Griffin III Digital Library Developer at the Lafayette College Libraries
  • 3. Overview of the Swift Poems Project Woolley and Karian seek to archive poems attributed to Jonathan Swift Beginning in 1987, this has involved: Identifying and Cataloging Primary Sources Transcription Copy-typing, encoding, and annotating the primary sources This method has not relied upon the usage of the TEI Collation Collation Identifying the copy-text (and variant texts) for any given poem
  • 4. Overview of the Swift Poems Project The Libraries at Lafayette In 2009 Woolley consulted with the Libraries for assistance with the Project Visual Resources Curator (Paul Miller) developed a set of Microsoft Access Databases These structure the catalogs maintained by Woolley and Karian In 2012, the NEH awarded a Scholarly Editions Grant for the Project This includes supporting the development of a digital edition This will be integrated into volumes of the Cambridge Edition of the Works of Jonathan Swift Digital Scholarship Services (Department in the Libraries)
  • 5. Identifying and Cataloging Primary Sources Identifying the Sources Sources are 18th century printed and manuscript texts To date, over 6500 manuscripts have been identified and cataloged Few digital surrogates for the printed and manuscript texts are available Of these only a restricted set aren’t protected under copyright Cataloging the Sources Bibliographic metadata are structured for each source Elements are extracted from external catalogs (e. g. English Short Title Catalogue)
  • 6. Transcribing the Primary Sources On Poetry A Rapsody (Poem 640-35D-)
  • 7. Transcribing the Primary Sources The transcripts themselves are created using the Nota Bene application Nota Bene encodes textual structure using a system of tags termed as “mode codes” «MDUL»This mode encodes italicized style rendering«MDNM» «MDBO»This mode encodes black letter«MDNM» The researchers have further extended this system to support editorial annotation: Lorem«MDUL»add·caret«MDNM»·this text added with a caretipsum Dolor«MDUL»del«MDNM»·this was deleted·sit amet «MDUL»This mode encodes italicized style rendering«MDNM» «MDBO»This mode encodes black letter«MDNM» Lorem«MDUL»add·caret«MDNM»·this text added with a caretipsum Dolor«MDUL»del«MDNM»·this was deleted·sit amet pasted·over printed text
  • 8. Accessing and Preserving the Transcripts Accessing the Nota Bene comes with challenges The Nota Bene release used by the researchers has been 3.0 (released in 1988) Accessing the Nota Bene directly would require a virtualized environment for Microsoft DOS The Nota Bene transcripts are managed as text/plain media resources The Text Encoding Initiative P5 Provides a robust data model Standardized and open format for interchange A more effective solution for preservation
  • 9. Encoding the Transcripts On Poetry A Rapsody(Poem 640-35D-)
  • 10. Encoding the Transcripts Encoding using the TEI-P5 could not be a manual process The researchers required a system to transform Nota Bene into a TEI-XML implementation An API for Ruby using Nokogiri was developed to support this Viewing the TEI-XML was of limited value Research techniques driven by Nota Bene require a rendering of the text Styled HTML5 (using Twitter Bootstrap) serves as a minimum viable product Improvements can be rapidly prototyped for the encoding This approach takes inspiration from Agile software development practices
  • 11. Viewing the Encoded Transcript On Poetry A Rapsody (Poem 640-35D-)
  • 12. Enriching the Encoded Transcripts Limits are obviously present with this approach Researchers are not encoding the transcripts using the TEI P5 The developer for the Ruby API is not a literary scholar How can this encoding be made collaborative? The developer and the researcher could operate in a shared environment This is inspired heavily by the pair-programming technique within eXtreme Programming In this case, both the developer and a researcher share a physical working environment
  • 13. Enriching the Encoded Transcripts Collaborative encoding and quality control The researchers will identify faults in the rendered transcripts The developer can extend the Ruby API, XSL, or styling for the HTML5 This enables rapid prototyping of the interface Delivery time for the researcher (in transcribing the sources) can be increased In response, the developer can more readily scope improvement requests Textual criticism is still not enabled by this approach The researchers must identify variant readings to a given text
  • 14. Collation for the Swift Poems Project Collation as a solution Originally the researchers collated the Nota Bene transcripts using a FoxPro program Visualization was used to identify variation Tokenization was customized A set of controlled characters (&,~,|,#) symbolized differences in structure 664D721L 1 The Pulteney’s and Shippens & such folk 664D233Y 1 Well may Poultney & Shippen rant, grumble, 664D360Y 1 Your Pulteney & Shippen, ~ ~ folks 664D721L 2 How unlucky it is for the Nation ####### 664D233Y 2 |& curse the hard Fate of the Nation; 664D360Y 2 |How hard is the Fate of the Nation
  • 15. Collation within a Digital Scholarly Edition A collation interface was scoped for the digital edition This interface must enable the transition from the legacy collation engine Collation features could be extended Lines are still tokenized Initially attempted to preprocess the text and use the Penn Treebank tokenizer Ultimately found that abstracting the tokenizer was simply more effective Alignment is addressed without the use of controlled characters The edit distance between tokens can be calculated
  • 16. Collation within a Digital Scholarly Edition Collating Variants for the Poem !W190
  • 17. Collation within a Digital Scholarly Edition The collation can also address flaws in the encoding By default, all unencoded Nota Bene markup is stripped from the TEI Users will be able to collate the texts and visualize differences Optionally Nota Bene can be preserved Researchers still retain access to some of the controlled characters Researchers and the developer can identify unencoded Nota Bene sequences A heatmap is currently the supported visualization This is a straightforward means of rendering the textual differences
  • 18. Collation within a Digital Scholarly Edition Collation for 640- 640-35D- (Copy-Text) 640-36L- (Variant) 640-34L2 (Variant)
  • 19. Forthcoming Features Design Improvements Stakeholders have driven the requirements for the UI Interviewing and testing for public users must be undertaken Extending UI features using JavaScript frameworks The digital edition is current implemented in the Tornado framework for Python Solutions such as AngularJS and React reduce UI to a set of modular components They also require a RESTful API to be implemented Preservation
  • 20. Encoding and Designing for the Swift Poems Project Thank you for your attention Contact Us James Woolley (woolleyj@lafayette.edu) Stephen Karian (karians@missouri.edu) James R. Griffin III (griffinj@lafayette.edu)
  • 21.
  • 22. Appendix: Workflow and System Architecture
  • 23. Appendix: Collation Engine Solutions for collating variants of the transcribed texts were explored Juxta Supporting the integration of the Juxta API was given the highest priority Given existing infrastructure and resource concerns there were limitations: Preprocessing and postprocessing the TEI-XML Documents was necessary Juxta itself required performance optimization for our environment CollateX An extremely viable solution Mature (and maintained) Module for Python
  • 24. Appendix: Collation Engine Prototyping a collation application in Python The Tornado framework offered several advantages Support for multiprocessing in collating larger sets of TEI-XML Support for WebSockets (enabling asynchronous updates for a collation job) Python Modules used to extend the features for the collation could be used Natural Language Toolkit (supporting extensible tokenization) NetworkX (supporting the building of stemmatic trees) Integration with API’s for XML databases could also be explored eXistdb

Editor's Notes

  1. testing