There is an obvious convergence between the two projects, and we imagine many benefits from interoperability. The gazetteer and mapping features in MoEML would significantly enhance the critical apparatus of the plays, while tying MoEML's placeography into the works of Shakespeare and his contemporaries would reinforce links between the physical geography and the literature.
We cannot simply ask or expect the editors of ISE plays to tag all the placenames for us. They're busy with their own editorial work, and they can't see the payoff.
Our original plan involved the creation of a detailed London gazetteer, including all the variant spellings of placenames we know from our own texts, along with a training set of manually tagged plays, to serve as input to the NER process.
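The gazetteer's job is to map every known spelling of a placename to a single identifier. The sketch below is a minimal Python illustration that assumes a plain in-memory dictionary. The identifiers `HYP001` and `HYP002` are hypothetical placeholders, not real MoEML IDs. The lookup is not the NER process itself; it only shows the kind of list that was fed into that process. Keys are lowercased because capitalization is an unreliable signal in old-spelling texts.

```python
from typing import Dict, Iterable, Optional

def build_gazetteer(entries: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Invert {canonical_id: [variant spellings]} into {lowercased variant: id}."""
    lookup: Dict[str, str] = {}
    for place_id, variants in entries.items():
        for variant in variants:
            # Lowercase keys: capitalization is not a reliable clue
            # in old-spelling texts.
            lookup[variant.lower()] = place_id
    return lookup

def find_place(token: str, gazetteer: Dict[str, str]) -> Optional[str]:
    """Return the canonical ID for a known spelling, or None."""
    return gazetteer.get(token.lower())

# Hypothetical identifiers; spelling variants taken from the slides.
GAZETTEER = build_gazetteer({
    "HYP001": ["Tower", "Towre", "Tower of London"],
    "HYP002": ["London"],
})
```

With this table, `find_place("TOWRE", GAZETTEER)` returns `"HYP001"`. A non-place such as `"Ha"` returns `None`.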
Between the first and second plays, I substantially improved the gazetteer by importing a lot of non-London content. Each play after the first has also had a larger training set, which has led to better results. Precision is remarkably good (over 95% for the last two plays), but recall is very low and is improving only slowly. Note, though, that NER did find several placenames I had missed when tagging the plays by hand.
Places are people throughout the history plays: a name such as "Yorke" or "Salisbury" often denotes a nobleman rather than a location. Syntax is frequently convoluted. In the old-spelling plays, spelling is inconsistent even within a single play. Nouns are frequently capitalized, so capitalization is not the useful clue for the NER engine that it is in modern texts.
The point here is that the placenames we are most interested in are precisely the ones the tagger is worst at finding. It even missed "London" in one case. Note, too, that although it found "England" 29 times, it contrived to miss it 14 times.
No, because it's hopeless at the very thing we care about most; and we only have 10 plays in our Shakespeare set. Possibly, because it's slowly getting better, and although we have only 10 Shakespeare plays, we have up to 75 "city plays" coming in the future from Digital Renaissance Editions (out of 500 they intend to tag). Yes, because NER did catch a few instances of placenames I'd missed.
DH 2013, Nebraska
The Map of Early Modern London
and Internet Shakespeare Editions
Janelle Jenstad and Martin Holmes
University of Victoria
MoEML and ISE
● On UVic servers
● Overlapping teams
● Mutual need
The Map of Early Modern London
● Maps streets, sites and boundaries of London, 1560–1640
● Interface based on Agas Map
● Includes (1) gazetteer, (2) encyclopedia of London people and places, (3) library of primary source texts, (4) edition of A Survey of London
● Pure TEI XML throughout
Internet Shakespeare Editions
● Open-source digital anthology
● Also hosts and incubates Queen's Men Editions and Digital Renaissance Editions
● Goal: all plays of Shakespeare and
● SGML and non-standard XML :-(
Frequency of toponyms
The London locations in Richard III on the Agas Map, sized according to the number of references to them.
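The map sizes each location by how often the play refers to it. Below is a minimal sketch of one way to compute such sizes. It assumes each reference has already been reduced to a location identifier, and the identifiers and the 20-pixel maximum are illustrative assumptions; the slide does not say how the original figure was drawn. The sketch counts references, then scales each marker's radius by the square root of its share of the largest count. That way marker area, rather than radius, is proportional to frequency.

```python
import math
from collections import Counter
from typing import Dict, List

def marker_radii(references: List[str], max_radius: float = 20.0) -> Dict[str, float]:
    """Map each location ID to a marker radius whose area is proportional to its count."""
    counts = Counter(references)
    if not counts:
        return {}
    top = max(counts.values())
    # radius = max_radius * sqrt(n / top), so area = pi * max_radius**2 * n / top.
    return {place: max_radius * math.sqrt(n / top) for place, n in counts.items()}

# Hypothetical IDs: four references to one place, one to another.
radii = marker_radii(["HYP001"] * 4 + ["HYP002"])
```

Here `HYP001` gets a radius of 20 and `HYP002` a radius of 10. The second marker therefore has a quarter of the area, matching its quarter of the references.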
● How typical is Shakespeare's invocation of
● How do his characters move through the urban
● What is the relationship between London and
● How does this vision compare to other playwrights and to historians?
Is it reasonable to ask editors to revisit their
● Can we overcome the significant programming
● First rule of collaboration: You're on your own.
● The ISE agenda is not MoEML's agenda.
● MoEML can't ask the ISE editors to tag their
● MoEML can't depend on the ISE programmers to implement things for us.
● We must beware of making features on our site dependent on functionality on theirs.
● We take the ISE texts and tag them.
● We generate sets of links based on through line numbers (TLNs).
● We store the links in our database.
● We only depend on the fact that links to TLNs on their website work.
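Each tagged placename yields a (location identifier, TLN) pair, and that pair becomes a link stored on MoEML's side. Below is a minimal sketch of this step using Python's built-in `sqlite3` module. The table name, the `ISE_URL_TEMPLATE` pattern, the play abbreviation, and the TLN values in the example are all hypothetical, because the real ISE URL scheme and MoEML schema are not given here.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical URL pattern; the real ISE link format is not specified here.
ISE_URL_TEMPLATE = "https://ise.example.org/{play}/tln/{tln}"

def store_links(conn: sqlite3.Connection, play: str,
                tagged: Iterable[Tuple[str, int]]) -> int:
    """Store one row per (place_id, TLN) pair; return the number of new rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS literary_refs (
               place_id TEXT NOT NULL,
               play     TEXT NOT NULL,
               tln      INTEGER NOT NULL,
               url      TEXT NOT NULL,
               PRIMARY KEY (place_id, play, tln))"""
    )
    before = conn.total_changes
    for place_id, tln in tagged:
        url = ISE_URL_TEMPLATE.format(play=play, tln=tln)
        # INSERT OR IGNORE skips a duplicate reference at the same TLN.
        conn.execute(
            "INSERT OR IGNORE INTO literary_refs VALUES (?, ?, ?, ?)",
            (place_id, play, tln, url),
        )
    conn.commit()
    return conn.total_changes - before

conn = sqlite3.connect(":memory:")
added = store_links(conn, "R3", [("HYP001", 2345), ("HYP001", 2345), ("HYP002", 120)])
```

Each row records only the identifier and the TLN. The stored link therefore depends on nothing except the ISE's TLN links continuing to resolve, which is the single dependency the slide allows.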
● 4 plays:
– Richard II and Richard III (modern spelling)
– Henry VIII and Henry VI Part 2 (old spelling)
● 495 placenames marked up
● 95 linked to Map of Early Modern London
Difficulties for NER tagger
● <stage>Enter Yorke, Salisbury, and
● "Was not your husband / In Margaret's battle at St Albans slain?"
● Spelling variation ("Tower" versus "Towre")
● Capitalization is unhelpful in old-spelling texts.
● Short utterances confuse it:
– Queen Margaret: Richard.
– Richard: <LOC>Ha</LOC>?
The showstopper problem
● Henry VI pt 2:
– 210 placenames in the text
– tagger tagged 109 places, of which 106 were correct
– 29 of these were "England" and 38 "France"
– Among placenames missed:
● 48 were in Britain
● 20 of these were key London locations (Bedlam, Southwark, London Bridge & Smithfield)
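The standard definitions are: precision = correct tags ÷ all tags produced, and recall = correct tags ÷ placenames actually in the text. Applied to the Henry VI Part 2 figures above, they show where the problem lies. The short Python check below treats the 210 manually tagged placenames as the gold standard, which is an assumption about how the counts were obtained:

```python
from typing import Tuple

def precision_recall(true_positives: int, tagged: int, gold: int) -> Tuple[float, float]:
    """Precision = correct / tagged; recall = correct / placenames in the text."""
    return true_positives / tagged, true_positives / gold

# Henry VI Part 2: 109 tagged, 106 correct, 210 placenames in the text.
p, r = precision_recall(true_positives=106, tagged=109, gold=210)
print(f"precision {p:.1%}, recall {r:.1%}")
```

It prints `precision 97.2%, recall 50.5%`. Almost everything the tagger marks is right, but it finds only about half of the placenames.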
Is it worth using NER for
– It can function as a check on manual tagging.
– 75 "city plays" are eventually coming...
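Using NER as a check on manual tagging can be as simple as a set comparison. The minimal sketch below assumes that both the manual tags and the NER output have been reduced to (TLN, surface form) pairs. The TLNs in the example are invented for illustration.

```python
from typing import Dict, Set, Tuple

# A placename mention: (through line number, text as it appears in the play).
Mention = Tuple[int, str]

def compare_tagging(manual: Set[Mention], ner: Set[Mention]) -> Dict[str, Set[Mention]]:
    """Split mentions into agreements and the two kinds of disagreement."""
    return {
        "agreed": manual & ner,
        # Found by NER but not tagged by hand: either genuine misses in
        # the manual tagging or NER false positives (e.g. "Ha").
        "review": ner - manual,
        # Tagged by hand but missed by NER (the recall problem).
        "ner_missed": manual - ner,
    }

manual = {(100, "London"), (250, "Southwark")}
ner = {(100, "London"), (310, "Smithfield"), (412, "Ha")}
result = compare_tagging(manual, ner)
```

Here `result["review"]` holds `(310, "Smithfield")` and `(412, "Ha")`. Only a human can decide that the first is a missed placename and the second is not.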
● Second rule of collaboration: nobody wants to be left out.
● Now that the ISE editors have seen how we're linking to their plays, they want to tag placenames themselves.
● We'll just be able to harvest their tags for
Internal Links to MoEML's London Locations
We are moving towards interoperability with The Map of Early Modern London (MoEML). If your play includes references to London locations, you will identify each London location using the ilink element and the unique MoEML identifier for the
ISE guidelines, cont.
The purpose of this tagging is two-fold: (1) it will allow us to visualize the London locations in a play using a MoEML map in the ISE/DRE/QME environment, and (2) it will allow MoEML to import London references in ISE/DRE/QME plays into its database of literary references (with a link back to the ISE).
ISE will have various instructions in its "geo" component (England, France, Europe, London, stage
All we need is the mol:XXXX# and the TLN
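Once editors tag London locations themselves, MoEML needs only the identifier and the TLN for each one. The minimal harvester sketched below assumes, hypothetically, well-formed XML in which a `<tln n="..."/>` milestone precedes each line and each `ilink` carries its identifier in a `code` attribute. The ISE's actual markup is SGML or non-standard XML, so a real harvester would need a different parser and selectors.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical markup: the <tln> milestone, the "code" attribute, and the
# "other:" prefix are assumptions for illustration, not the ISE's encoding.
SAMPLE = """<play>
  <tln n="1201"/><l>line text <ilink code="mol:HYP001">Southwark</ilink></l>
  <tln n="1202"/><l>line text <ilink code="mol:HYP002">Smithfield</ilink></l>
  <tln n="1203"/><l>line text <ilink code="other:FR">France</ilink></l>
</play>"""

def harvest_ilinks(xml_text: str) -> List[Tuple[str, Optional[int]]]:
    """Return (MoEML identifier, TLN) pairs for every ilink pointing at MoEML."""
    root = ET.fromstring(xml_text)
    pairs: List[Tuple[str, Optional[int]]] = []
    current_tln: Optional[int] = None
    # iter() walks the tree in document order, so the most recent <tln>
    # milestone seen is the line containing the ilink.
    for elem in root.iter():
        if elem.tag == "tln":
            current_tln = int(elem.get("n"))
        elif elem.tag == "ilink":
            code = elem.get("code", "")
            if code.startswith("mol:"):  # keep only MoEML identifiers
                pairs.append((code, current_tln))
    return pairs
```

On the sample, `harvest_ilinks(SAMPLE)` returns the two `mol:` identifiers with TLNs 1201 and 1202. It skips the non-MoEML link.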
Should we continue to use NER?
ISE wants to use tags only in modern critical
ISE editions of 1 Henry IV, 2 Henry IV, and Henry V are “done.”
500+ plays in DRE