Mining and mapping places with multiple names
James Butler & Christopher Donaldson
Lancaster University
Corpus of Lake District Literature
Timeline: 1688, 1789, 1837, 1901
• 80 texts, comprising more than 1,500,000 words
• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)
• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters
Counts shown along the timeline:
34 texts, 650K words
22 texts, 250K words
22 texts, 613K words
Sample sentence collocation: beautiful
‘Again entering the boat, we passed up the channel between Lord’s
Island and the shore, from whence beautiful prospects are obtained of the
majestic form of Skiddaw, with the woods of Castlehead and
Cockshot Park in the foreground.’ (Edward Baines, A Companion to the
Lakes [1829] 121.)
±5 tokens: No place-names identified
±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw
Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead &
Cockshot Park.
Average sentence length
Lake District corpus = 29.8 words
British National Corpus (BNC) = 16 words
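The effect of window size on the sample sentence can be sketched in a few lines of Python. This is only an illustration, not the method used for the corpus: the tokeniser, the function name `places_in_window`, and the hard-coded list of place-names are all assumptions, and a place-name counts only if every one of its tokens falls inside the window.

```python
import re

# Hypothetical mini-gazetteer: the four place-names in the sample sentence.
PLACE_NAMES = ["Lord's Island", "Skiddaw", "Castlehead", "Cockshot Park"]

# Sample sentence from Baines (1829); "and" after "Lord's Island" is assumed.
SENTENCE = (
    "Again entering the boat, we passed up the channel between Lord's Island "
    "and the shore, from whence beautiful prospects are obtained of the "
    "majestic form of Skiddaw, with the woods of Castlehead and Cockshot Park "
    "in the foreground."
)


def tokenize(text):
    """Split text into word tokens, keeping internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)


def places_in_window(tokens, node, span, place_names):
    """Return place-names lying wholly within ±span tokens of the node word.

    span=None means the whole sentence is searched.
    """
    i = tokens.index(node)
    if span is None:
        lo, hi = 0, len(tokens)
    else:
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
    window = tokens[lo:hi]
    found = []
    for name in place_names:
        parts = tokenize(name)
        n = len(parts)
        # Slide over the window looking for the full token sequence.
        if any(window[k:k + n] == parts for k in range(len(window) - n + 1)):
            found.append(name)
    return found


tokens = tokenize(SENTENCE)
for span in (5, 10, None):
    label = "sentence" if span is None else f"±{span}"
    print(label, places_in_window(tokens, "beautiful", span, PLACE_NAMES))
```

Run on the sentence above, the sketch finds no place-names at ±5 tokens, two at ±10, and all four within the sentence, which is why long sentences like those in this corpus favour sentence-level collocation.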
from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized
Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.
Diagram of the Edinburgh Geoparser System
Example of input/output from the Edinburgh Geoparser System
Geo-referenced Data from the Edinburgh Geoparser
Geo-referenced Data, Corrected
Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE naess ‘headland’
*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness
cf. D. Whaley, A Dictionary of Lake District Place Names (Nottingham: English Place-Name Society, 2006), 42.
Some common issues when geo-referencing against a generic gazetteer…
Spatial misattribution
Onomastic misassumption
Incorrect weighting
And these apply just to the items that are found!
An extract of our custom manually-collected gazetteer for the corpus
Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement
CONISTON (lake):
Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
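One way to picture how such an entry could be used is a lookup from every secondary name to its primary name. The sketch below is an assumption about structure, not the project's actual gazetteer format: the names `GAZETTEER`, `normalise`, `build_variant_index`, and `resolve` are hypothetical, and only a subset of the Coniston variants is included.

```python
# Hypothetical in-memory form of one gazetteer entry (subset of the variants).
GAZETTEER = {
    "CONISTON (lake)": [
        "Thurstan", "Coniston Lake", "Coniston Water", "Thurston",
        "Conistone", "Conistone Lake", "Cunnistone Lake", "Thurston Lake",
        "Coniston Mere", "Lake of Coniston", "Thorstane's watter",
    ],
}


def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_variant_index(gazetteer):
    """Map each normalised secondary name to its primary name."""
    index = {}
    for primary, variants in gazetteer.items():
        for variant in variants:
            index[normalise(variant)] = primary
    return index


def resolve(name, index):
    """Return the primary name for a variant, or None if it is unknown."""
    return index.get(normalise(name))


index = build_variant_index(GAZETTEER)
print(resolve("Thurston Lake", index))
print(resolve("Windermere", index))
```

An exact-match index like this only resolves spellings that have already been collected by hand, which is the point of the manual gazetteer: variants such as "Thurstan" share too little with "Coniston" to be recovered by spelling similarity alone.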
Geospatial categories chosen for flexibility and degree of universal referential specificity
An extract from the latest iteration of the corpus - allowing referential relationships to be analysed on a whole new level.
Lake, Vale, Specific - Farm, Waterfall


Editor's Notes

  1. Overview of corpus…
  2. Our interest in finding what attributes are given to places mentioned…
  3. The Edinburgh Geoparser: NLP tool on which we’ve relied
  4. What the Geoparser does…
  5. The Geoparser output is a bit ropey…
  6. Much correction required…
  7. One of the chief reasons for the poor performance of the Geoparser is place-name variation…
  8. Geospatial relationships between environmental types as well as connective strengths between any paired locations.
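
The "connective strengths" between paired locations mentioned in note 8 could be approximated, as a rough sketch, by counting how often two places are named in the same sentence. The measure here (raw sentence-level co-occurrence) is an assumption rather than the project's stated method, and the function name `pair_strengths` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_strengths(sentences_places):
    """Count, for each unordered pair of places, the sentences naming both."""
    counts = Counter()
    for places in sentences_places:
        # Deduplicate within a sentence and sort so each pair has one key.
        for a, b in combinations(sorted(set(places)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical input: the place-names already identified in each sentence.
sample = [
    ["Skiddaw", "Castlehead"],
    ["Skiddaw", "Castlehead", "Lord's Island"],
    ["Skiddaw"],
]

for pair, n in pair_strengths(sample).most_common():
    print(pair, n)
```

In this sample, Castlehead and Skiddaw co-occur in two sentences and every other pair in one. A fuller treatment would probably normalise by how often each place appears on its own.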