Digital Frontiers 2015: eMOP's Imprint (Printer's and Publisher's) DB

A description of the eMOP DB, the Imprint DB we created, and how we created it.

Published in: Education

  1. eMOP’s Printers and Publishers: Toward Crafting an Early Modern Print Database
     Matthew Christy, Elizabeth Grumbach
  2. eMOP Info
     • emop.tamu.edu
     eMOP Resources
     • eMOP ImprintDB: github.com/Early-Modern-OCR/ImprintDB
     • Mellon Grant Proposal: idhmc.tamu.edu/projects/Mellon/eMOPPublic.pdf
     More eMOP
     • Facebook: The Early Modern OCR Project
     • Twitter: #emop, @IDHMC_Nexus, @mandellc, @matt_christy, @EMGrumbach
     Digital Frontiers 2015 - eMOP ImprintDB, Sept. 18, 2015
  3. Goals
     • The Early Modern OCR Project (eMOP) is an Andrew W. Mellon Foundation-funded grant project run by the Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas A&M University. It develops and tests tools and techniques for applying Optical Character Recognition (OCR) to early modern English documents from the hand-press period, roughly 1475-1800.
     • eMOP aims to improve the visibility of early modern texts by making their contents fully searchable. The current paradigm of searching special collections for early modern materials by either metadata alone or “dirty” OCR is insufficient for scholarly research.
  5. Wrangling Data
     The Numbers
     • EEBO: ~125,000 documents, ~13 million page images (1475-1700)
     • ECCO: ~182,000 documents, ~32 million page images (1700-1800)
     • TCP: ~46,000 double-keyed hand transcriptions (44,000 EEBO, 2,200 ECCO), used as Groundtruth
     • Total: >300,000 documents and ~45 million page images
     The Data
     • ECCO page images (1 page per image)
     • ECCO original OCR results (doc-level XML files)
     • ECCO TCP transcriptions (doc-level XML and text files)
     • EEBO page images (2 pages per image)
     • EEBO TCP transcriptions (doc-level XML and text files)
  6. eMOP DB
     • Document metadata
     • File locations
     • Page images
     • Pages text
     • Groundtruth text
     • OCR Results
     • Pages text
     • Scores against Groundtruth
     • Results of analysis
     • noise measure
     • skew measure
     • multiple column coords
     • corrections made
  7. Early Modern Imprints: The Problems
     • Missing
     • Incorrect
       • accidentally by printer
       • accidentally by DB provider
       • purposefully
     • No standard format or consistent inclusion of information
     • Inconsistent spelling and use of initials
     • Use of conversational language
     • Use of non-English (Latin, Welsh) or a mix of languages
     Example: Imprinted at London : by John Jugge, dwellyng at the north doore of Paules
  8. Early Modern Imprints: The Solution
     • Iterative application of regular expressions to cull out the data:
       • Who the work was Printed By
       • Who the work was Printed For
       • Who the work was Sold By
       • The Place of printing (London, Cambridge, Dublin, etc.)
       • The Location of printing (“the north doore of Paules”)
       • Date (gathered from separate metadata field)
     Example: Printed by: Iohn Iugge Place: London Location: the north doore of Paules : 1580?
     Terms to identify the printer:
     • “printed”, sometimes also accompanied by “by”
     • prynted
     • reprinted or re-printed
     • imprinted
     • pressed
     • brintwyd (Welsh)
     • Typis, presso, pressare, excudebat, … (Latin)
     • etc.
  9. Results
     <work>
       <emopNO>140776</emopNO>
       <eccoNO>67101600</eccoNO>
       <tcpNO>NULL</tcpNO>
       <estcNO>T077294</estcNO>
       <imprintORIG>[London] : In the Savoy: printed by John Nutt; for John Walthoe, 1713.</imprintORIG>
       <date>1713</date>
       <imprintCLN>London : in the Savoy: printed by John Nutt; for John Walthoe,</imprintCLN>
       <place>London</place>
       <printedBy>John Nutt</printedBy>
       <printedFor>John Walthoe</printedFor>
       <location>in the Savoy</location>
     </work>
     source: http://bit.ly/1hXpVpd
  10. eMOP Outcomes - GitHub
      https://github.com/Early-Modern-OCR/ImprintDB
  11. Outcomes – DB of EM Printers
      source: http://blog.volkovlaw.com/2013/03/the-future-of-compliance-what-will-the-new-tools-look-like/
  12. The end
      For eMOP questions, please contact us at: mchristy@tamu.edu, egrumbac@tamu.edu, mandell@tamu.edu
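
The "iterative application of regular expressions" described in slide 8 can be illustrated with a minimal Python sketch. This is not the project's actual code (see the ImprintDB repository for that). The patterns, the function names parse_imprint and to_xml, and the soldBy tag are assumptions made for illustration, and the term list covers only a few of the English printer terms named in the slides. Each pattern is applied in turn, the matched text is removed, and whatever remains is treated as the location. The date is passed in separately, since the slides say it comes from a separate metadata field.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns for illustration only; not the project's actual rules.
PLACE = re.compile(r"^\s*(?:(?:im)?pr[iy]nted\s+at\s+)?(?P<v>[A-Za-z]+)\s*:", re.I)
TERMS = r"(?:re-?printed|imprinted|printed|prynted|pressed)"
FIELDS = [
    # Order matters: "sold by" must be consumed before the bare "by" rule runs.
    ("soldBy", re.compile(r"\bsold\s+by\s+(?P<v>[^;:,]+)", re.I)),
    ("printedBy", re.compile(rf"\b(?:{TERMS}\s+)?by\s+(?P<v>[^;:,]+)", re.I)),
    ("printedFor", re.compile(rf"\b(?:{TERMS}\s+)?for\s+(?P<v>[^;:,]+)", re.I)),
]


def parse_imprint(imprint, date=None):
    """Split an imprint string into fields by applying patterns one at a time."""
    # Clean: drop a trailing date such as "1713." and editorial square brackets.
    text = re.sub(r"\s*\d{4}\??\.?\s*$", "", imprint)
    text = text.replace("[", "").replace("]", "")
    record = {}
    # Pass 1: a leading place name followed by a colon.
    m = PLACE.search(text)
    if m:
        record["place"] = m.group("v")
        text = text[m.end():]
    # Passes 2..n: pull out each role and remove the matched text.
    for name, pattern in FIELDS:
        m = pattern.search(text)
        if m:
            record[name] = m.group("v").strip()
            text = text[:m.start()] + text[m.end():]
    # Whatever is left over is treated as the location of printing.
    location = text.strip(" :;,.")
    location = re.sub(r"^(?:dwell[iy]ng\s+)?at\s+", "", location, flags=re.I)
    if location:
        record["location"] = location
    if date:
        record["date"] = date  # supplied from a separate metadata field
    return record


def to_xml(record):
    """Serialize a parsed record as a <work> element like the one in slide 9."""
    work = ET.Element("work")
    for tag in ("date", "place", "printedBy", "printedFor", "soldBy", "location"):
        if tag in record:
            ET.SubElement(work, tag).text = record[tag]
    return ET.tostring(work, encoding="unicode")


if __name__ == "__main__":
    sample = "[London] : In the Savoy: printed by John Nutt; for John Walthoe, 1713."
    print(to_xml(parse_imprint(sample, date="1713")))
```

Run on the imprint from slide 9, the sketch finds London as the place, John Nutt as the printer, and John Walthoe as the person printed for. It does not normalize case or spelling (for example "In the Savoy" versus "in the Savoy", or "John Jugge" versus "Iohn Iugge"), and it does not handle the Welsh and Latin terms listed in slide 8.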
