Chalice / Edinburgh Geoparser at GISRUK


Published in: Education

  1. 1. Chalice – Linked Data and Historic Place-names. Jo Walsh, Kate Byrne, Richard Tobin, Claire Grover
  2. 2. Overview of the Edinburgh Geoparser <ul><li>System to automatically recognise place names in text and disambiguate them with respect to a gazetteer. (Athens, Springfield) </li></ul><ul><li>Patchy development over past few years funded by a variety of projects applied to a range of data sets: </li></ul><ul><ul><li>GeoCrossWalk </li></ul></ul><ul><ul><li>BOPCRIS </li></ul></ul><ul><ul><li>GeoDigRef (Histpop, BOPCRIS, BL) </li></ul></ul><ul><ul><li>Embedding GeoCrossWalk (Stormont Papers) </li></ul></ul><ul><ul><li>SYNC3 (online news) </li></ul></ul><ul><ul><li>Chalice (EPNS) </li></ul></ul><ul><ul><li>Unlock </li></ul></ul><ul><li>Main concern has been to keep it generally usable while applying it to specific data sets. </li></ul>
  3. 3. Overview of the Edinburgh Geoparser <ul><li>Input: .txt, .html, .xml </li></ul><ul><li>Geotagging: format conversion, tokenisation, POS tagging, lemmatisation, named entity recognition; output .geotagged.xml </li></ul><ul><li>Georesolution: input .geotagged.xml; gazetteer lookup, resolution; output .gaz.xml </li></ul>
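The two stages of the pipeline can be illustrated with a minimal Python sketch. This is not the Edinburgh Geoparser's code: the capitalisation test is a crude stand-in for its named entity recognition, the "largest population wins" rule is only one plausible resolution heuristic, and the gazetteer entries, coordinates, population figures and function names are all hypothetical.

```python
import re

# Hypothetical toy gazetteer: lower-cased name -> candidate entries of
# (label, latitude, longitude, population). All values are illustrative only.
GAZETTEER = {
    "athens": [
        ("Athens, Greece", 37.98, 23.73, 660000),
        ("Athens, Georgia, USA", 33.96, -83.38, 125000),
    ],
    "edinburgh": [("Edinburgh, Scotland", 55.95, -3.19, 500000)],
}


def geotag(text):
    """Geotagging stand-in: treat every capitalised word as a candidate place name."""
    return [tok for tok in re.findall(r"[A-Za-z]+", text) if tok[0].isupper()]


def georesolve(candidates):
    """Georesolution stand-in: gazetteer lookup, then keep the most populous candidate."""
    resolved = {}
    for name in candidates:
        entries = GAZETTEER.get(name.lower(), [])
        if entries:  # candidates with no gazetteer entry are dropped
            resolved[name] = max(entries, key=lambda entry: entry[3])
    return resolved


def geoparse(text):
    """Run both stages in sequence."""
    return georesolve(geotag(text))


result = geoparse("She moved from Athens to Edinburgh.")
```

Here "She" is tagged as a candidate but dropped at lookup, and the ambiguous "Athens" resolves to the more populous entry. A real resolver would also weigh context, such as the other places mentioned nearby in the text.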
  4. 6. Chalice <ul><li>Connecting Historical Authorities with Linked Data, Contexts, and Entities. </li></ul><ul><li>Part of jiscEXPO - &quot;exposing digital content for education and research&quot;. </li></ul><ul><li>The project is exploring the viability of creating a historical gazetteer from digitized volumes from the English Place-Name Society (EPNS). </li></ul><ul><li>Partners: </li></ul><ul><ul><li>CDDA, Queen’s University, Belfast </li></ul></ul><ul><ul><li>School of Informatics, Edinburgh </li></ul></ul><ul><ul><li>EDINA, Edinburgh </li></ul></ul><ul><ul><li>CeRch, King’s College London </li></ul></ul>
  5. 7. English Place-Name Survey <ul><li>At the Institute of Name Studies in Nottingham </li></ul><ul><li>80+ volumes covering English counties </li></ul><ul><li>Over 1000 years of place-name history </li></ul><ul><li>Started in 1925 and still going! </li></ul>
  6. 9. Archaeology and Place-names and History <ul><li>&quot;The first point, already noted repeatedly but so important that it cannot be too strongly emphasised, is that historical evidence is documentary and therefore direct evidence only of a state of mind; that archaeological evidence is material and therefore direct evidence only of practical skills, technological processes, aesthetic interests and physical sequences; and that place-name evidence is linguistic and therefore direct evidence only of language and speech habits. Indirect inferences may be drawn in each case, and the evidence of place-names may be used to throw light on the date, nature and extent of settlements, on the movements of peoples and their relationships to each other, on certain aspects of their organisation and on many of the other problems that concern the historian and the archaeologist. But in all these cases the inferences depend to some extent on assumptions and they must be examined carefully before they are accepted as valid.&quot; – F.T. Wainwright </li></ul>
  7. 10. Chalice data <ul><li>Cheshire </li></ul><ul><ul><li>Cheshire Part I. EPNS Volume 44, 1970 </li></ul></ul><ul><ul><li>Cheshire Part II. EPNS Volume 45, 1970 </li></ul></ul><ul><ul><li>Cheshire Part III. EPNS Volume 46, 1971 </li></ul></ul><ul><ul><li>Cheshire Part IV. EPNS Volume 47, 1972 </li></ul></ul><ul><ul><li>Cheshire Part V (1:i). EPNS Volume 48, 1981 </li></ul></ul><ul><ul><li>Cheshire Part V (1:ii). EPNS Volume 54, 1981 </li></ul></ul><ul><li>Small samples from: </li></ul><ul><ul><li>Berkshire, Buckinghamshire (Vol. 2), Cambridgeshire (Vol. 19), Derbyshire (Vols. 27-29), Hertfordshire (Vol. 15) </li></ul></ul><ul><li>Shropshire: Pimhill Hundred (born digital) </li></ul>
  8. 11. EPNS <ul><li>Parishes are organised by the hundred to which they belong. </li></ul><ul><li>Towns and villages are referred to as townships and organised by the parish to which they belong. </li></ul><ul><li>Township entries often contain descriptions of buildings, bridges, lanes, woods and farms. </li></ul><ul><li>River and major road names are described separately from the inhabited places. </li></ul><ul><li>Entries give the names and spellings attested in historical sources and the etymology of names or name parts. </li></ul><ul><li>In Chalice we focus on capturing parishes, townships, sub-townships and attestations. </li></ul>
  9. 13. The start of the entry for the township of Willaston in the parish of Neston in Wirral Hundred.
  10. 18. Turtle-ish version <ul><li>@prefix owl: <> . </li></ul><ul><li>@prefix geo: <> . </li></ul><ul><li>@prefix gn: <> . </li></ul><ul><li>@prefix dc: <> </li></ul><ul><li>@prefix chalice: <> </li></ul><ul><li>:Bosley a chalice:Place; </li></ul><ul><li>dc:title Bosley . </li></ul><ul><li>owl:sameas <> </li></ul><ul><li>:Boselega a chalice:PlaceName; </li></ul><ul><li>dc:title Boselega . </li></ul><ul><li>#attested a chalice:PlaceNameAttestation; </li></ul><ul><li>chalice:place :Bosley ; </li></ul><ul><li>chalice:known_as :Boselega ; </li></ul><ul><li>chalice:source :DB ; </li></ul><ul><li>chalice:date 1086 . </li></ul><ul><li>:DB a chalice:Source dc:title 'Domesday Book' . </li></ul>
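As a rough illustration of how an attestation record like the one above might be produced from extracted data, the following Python sketch formats one attestation as Turtle-style text. The `Attestation` class and `to_turtle` function are hypothetical helpers, not part of the Chalice code, and the output omits the `@prefix` declarations because the namespace URIs are not recoverable from the slide.

```python
from dataclasses import dataclass


@dataclass
class Attestation:
    """One historical attestation of a place-name (field names follow the slide)."""
    place: str      # local name of the chalice:Place, e.g. "Bosley"
    known_as: str   # local name of the chalice:PlaceName, e.g. "Boselega"
    source: str     # local name of the chalice:Source, e.g. "DB"
    date: int       # year of the attestation


def to_turtle(att, attestation_id="attested"):
    """Format an attestation as Turtle-style triples.

    The chalice: prefix and the default (":") prefix must be declared
    separately; their URIs are deliberately not guessed here.
    """
    lines = [
        f":{attestation_id} a chalice:PlaceNameAttestation ;",
        f"    chalice:place :{att.place} ;",
        f"    chalice:known_as :{att.known_as} ;",
        f"    chalice:source :{att.source} ;",
        f"    chalice:date {att.date} .",
    ]
    return "\n".join(lines)


turtle_text = to_turtle(Attestation("Bosley", "Boselega", "DB", 1086))
```

Note that the slide's own version is only Turtle-ish: for example, string literals such as the Domesday Book title would need consistent quoting before a Turtle parser would accept the result.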
  11. 19. Linking Data <ul><li>A URI for each place-name </li></ul><ul><li>Links to information about each attestation </li></ul><ul><li>Links to nearby places </li></ul><ul><li>Links to other sources of place-name references </li></ul><ul><ul><li> (variable quality, wide usage) </li></ul></ul><ul><ul><li>Ordnance Survey Open Data (also variable quality) </li></ul></ul><ul><li>Then links from and between documentary sources </li></ul>
  12. 23. Issues <ul><li>OCR quality needs to be high: not just recognising characters correctly but getting font and layout information right. </li></ul><ul><li>Variation in use of layout and font to indicate structure </li></ul><ul><li>Different volumes reflect different decisions about where place name information should be put </li></ul><ul><li>Consider long-term preservation of URIs </li></ul><ul><li>Need to share vocabularies with other projects </li></ul><ul><li>(Pleiades, SPQR, geodataverse?) </li></ul>
  13. 24. Integrating (with) other sources <ul><li>Series of use cases by Stuart Dunn at KCL </li></ul><ul><li>Victoria County History </li></ul><ul><li>Clergy of the Church of England Database </li></ul><ul><li>Archaeology Data Service </li></ul>
  14. 29. GAP & Ancient Place-names <ul><li>Based on Pleiades set of ancient place names but extended in two ways: </li></ul><ul><li>by matching Pleiades place names against GeoNames place names in the same location and adding the GeoNames alternative names to the Pleiades+ list: </li></ul><ul><ul><li>adds three alternative names for the single Pleiades entry for &quot;Autricum&quot; (&quot;Chartrez&quot;, &quot;Chartres&quot;, &quot;Shartr&quot;), because &quot;Autricum&quot; is present in both Pleiades and GeoNames, with the same approximate location </li></ul></ul><ul><li>(We don't want to simply take places directly from GeoNames because, when we tried it, we were swamped with irrelevant modern places having names corresponding to ancient toponyms.) </li></ul>
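The name-plus-location matching described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the project's method: the record layout, the `build_pleiades_plus` function, the half-degree tolerance, the identifier `example-1`, the approximate coordinates and the second, far-away GeoNames record are all made up for the example.

```python
def normalise(name):
    """Lower-case and trim a name so that comparisons ignore case and padding."""
    return name.strip().lower()


def build_pleiades_plus(pleiades, geonames, max_degrees=0.5):
    """Extend each Pleiades place with GeoNames names for the same name and location.

    A GeoNames record matches when one of its names equals the Pleiades name
    and its coordinates lie within max_degrees in both latitude and longitude.
    """
    plus = []
    for place in pleiades:
        names = {place["name"]}
        for entry in geonames:
            entry_names = [entry["name"], *entry["alternates"]]
            same_name = normalise(place["name"]) in {normalise(n) for n in entry_names}
            nearby = (abs(place["lat"] - entry["lat"]) <= max_degrees
                      and abs(place["lon"] - entry["lon"]) <= max_degrees)
            if same_name and nearby:
                names.update(entry_names)
        plus.append({"id": place["id"], "names": sorted(names)})
    return plus


# Illustrative records only; identifiers and coordinates are hypothetical.
pleiades = [{"id": "example-1", "name": "Autricum", "lat": 48.45, "lon": 1.49}]
geonames = [
    {"name": "Chartres", "alternates": ["Autricum", "Chartrez", "Shartr"],
     "lat": 48.45, "lon": 1.48},
    # A made-up modern namesake far away, which the location test should reject.
    {"name": "Autricum", "alternates": ["Autricum Estates"],
     "lat": -30.0, "lon": 150.0},
]
pleiades_plus = build_pleiades_plus(pleiades, geonames)
```

Requiring a location match as well as a name match is what keeps out the irrelevant modern places mentioned on the slide: the far-away namesake contributes nothing to the Autricum entry.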
  15. 30. Pleiades+(+) <ul><li>Pleiades+: get alternative names for places that match in geonames </li></ul><ul><li>Pleiades++ is a runtime supercharging bit: </li></ul><ul><ul><li>if place X isn't in Pleiades+, </li></ul></ul><ul><ul><li>look at &quot;synonym ring&quot; of alternative names in geonames </li></ul></ul><ul><ul><li>try all of those against Pleiades+ </li></ul></ul><ul><ul><li>mysql> select distinct,p.plid,p.geonameId,p.fclass,p.fcode,,p.latitude,p.longitude,p.population,p.normname from plplus p join geonames.alternatename a on join geonames.geoname g on a.geonameid=g.geonameid join geonames.alternatename a2 on a2.geonameid=g.geonameid where a2.alternatename=&quot;Egypt&quot;; +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+ | name     | plid    | geonameId | fclass | fcode | country | latitude   | longitude  | population | normname | +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+ | Aegyptus |     766 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aegyptus | | Aegyptus |  981503 |         0 |        |       |         | 27.5000000 | 26.5476190 |          0 | aegyptus | | Aigyptos | 1001943 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aigyptos | +----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+ 3 rows in set (0.05 sec) </li></ul></ul>
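The Pleiades++ fallback can be sketched in Python as below. This is a simplified in-memory version under stated assumptions: the real lookup is the SQL join shown on the slide, while the dictionary layout, the `lookup_pleiades_plus_plus` function and the contents of the synonym ring are hypothetical. The three Pleiades identifiers are the `plid` values from the query output above.

```python
def lookup_pleiades_plus_plus(name, pleiades_plus, synonym_rings):
    """Return Pleiades ids for a name, falling back to GeoNames synonym rings.

    pleiades_plus: dict mapping a lower-cased name to a list of Pleiades ids.
    synonym_rings: list of sets, each holding the alternative names that
                   GeoNames records for one place.
    """
    key = name.strip().lower()
    if key in pleiades_plus:
        return list(pleiades_plus[key])  # direct hit in Pleiades+

    hits = []
    for ring in synonym_rings:
        if key not in {n.lower() for n in ring}:
            continue
        # Try every alternative name in the ring against Pleiades+.
        for alternative in sorted(ring):
            for plid in pleiades_plus.get(alternative.lower(), []):
                if plid not in hits:
                    hits.append(plid)
    return hits


# plid values taken from the slide's query output; the ring itself is assumed.
pleiades_plus = {"aegyptus": [766, 981503], "aigyptos": [1001943]}
synonym_rings = [{"Egypt", "Aegyptus", "Aigyptos"}]
egypt_ids = lookup_pleiades_plus_plus("Egypt", pleiades_plus, synonym_rings)
```

"Egypt" is not itself a Pleiades+ name, so the lookup goes through its synonym ring and reaches the same three rows that the SQL query returns.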
  16. 31. Thanks