Automatic Reconstruction of Emperor Itineraries
from the Regesta Imperii
Juri Opitz, Leo Born, Vivi Nastase, Yannick Pultar
The RI corpus
● more than 150,000 “regests”
○ abstracts of charters issued by the Holy Roman Emperors
■ and also events (battles, births, etc.)
○ reference time span: almost 1,000 years
○ starting from the Carolingian dynasty
○ … to Maximilian I
Historical itinerary research
● examines the travel routes of historical entities
● in order to assess their influence and reach
Regests have a given place name
[Figure callouts: “was he really here?” / “empty?!”]
Let’s just put “Ofen” on a map: easy!
● Is it?
OpenStreetMap returns many places, but not the correct one
Google returns places where I can buy ovens (“Ofen” is German for “oven”).
GeoNames: 19 places scattered across the globe… The Ofen we want: Ofen (Buda), a part of today’s Budapest
To sum up...
● Mapping regest place names that span almost 1,000 years onto modern maps is not easy
● To address this, we tackle two main problems:
○ place name prediction: many place names are unknown or return zero candidates
○ coordinate prediction: place name queries return large candidate sets from which the correct point must be chosen
Place name prediction
● Experiments with logistic regression
○ features: last known place name, text unigrams, emperor
● baseline 1: most frequent place name
● baseline 2: last known place name
○ i.e. the place name of the closest preceding regest in time
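The two baselines are simple enough to sketch directly. A minimal Python sketch, assuming each regest of one issuer is a dict with hypothetical keys `date` (a sortable ISO date string) and `place` (`None` when the place name is unknown); the records below are toy data, and the logistic regression model itself is not shown:

```python
from collections import Counter
from typing import List, Optional

# Hypothetical record layout: {"date": "1190-02-10", "place": "Speyer"},
# with "place" set to None when the regest has no place name.

def most_frequent_place(regests: List[dict]) -> Optional[str]:
    """Baseline 1: the issuer's most frequent known place name."""
    counts = Counter(r["place"] for r in regests if r["place"])
    return counts.most_common(1)[0][0] if counts else None

def last_known_place(regests: List[dict], index: int) -> Optional[str]:
    """Baseline 2: the place name of the closest earlier regest that has one."""
    target_date = regests[index]["date"]
    earlier = [r for r in regests if r["place"] and r["date"] < target_date]
    if not earlier:
        return None
    return max(earlier, key=lambda r: r["date"])["place"]

# Toy data, not taken from the Regesta Imperii.
regests = [
    {"date": "1190-01-05", "place": "Worms"},
    {"date": "1190-02-10", "place": "Speyer"},
    {"date": "1190-03-01", "place": None},  # place name to predict
    {"date": "1190-04-12", "place": "Worms"},
]
print(most_frequent_place(regests))  # Worms
print(last_known_place(regests, 2))  # Speyer
```

Baseline 2 only looks backwards in time, matching "closest possible anterior regest"; the earliest regest of an issuer therefore gets no prediction from it.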
Place name prediction results
[Figure: per method, % of issuers for which the method performed best; % of correct choices; % of correct choices averaged over all issuers]
For every place name...
● place name prediction ensures that we have a non-empty set of candidate points
○ problem: sometimes the correct point is not contained in the candidate set (future work)
Coordinate prediction
● we model the itinerary of an emperor as a directed acyclic graph (DAG)
● Assumption: the lowest-cost path approximates the true itinerary
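If each regest contributes one layer of candidate points and edges only connect candidates of chronologically consecutive regests (our assumption about the graph layout), the graph is a layered DAG and the lowest-cost path can be found by dynamic programming in layer order. A minimal sketch, using plain great-circle distance as a stand-in for the full edge cost; the coordinates are illustrative toy values:

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (lat, lng) in degrees
EARTH_RADIUS_KM = 6371.0

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lng) points in kilometres."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def lowest_cost_path(layers: List[List[Point]],
                     cost: Callable[[Point, Point], float] = haversine_km) -> List[Point]:
    """Pick one candidate per regest so that the summed edge cost is minimal.

    layers[i] holds the candidate points of the i-th regest in chronological order.
    """
    best = [0.0] * len(layers[0])   # cheapest path cost ending at each candidate
    back: List[List[int]] = []      # back-pointers for every layer after the first
    for prev, cur in zip(layers, layers[1:]):
        new_best, pointers = [], []
        for c in cur:
            j = min(range(len(prev)), key=lambda k: best[k] + cost(prev[k], c))
            new_best.append(best[j] + cost(prev[j], c))
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the cheapest final candidate.
    i = min(range(len(best)), key=best.__getitem__)
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [layers[k][idx] for k, idx in enumerate(path)]

# Toy itinerary: Munich -> "Ofen" (two candidates) -> Vienna.
layers = [[(48.14, 11.58)], [(47.50, 19.04), (40.0, -90.0)], [(48.21, 16.37)]]
print(lowest_cost_path(layers)[1])  # (47.5, 19.04)
```

Because every edge goes forward in time, each layer is processed once, so the search costs roughly the sum over consecutive layers of (candidates × candidates) edge evaluations.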
Edge cost heuristic (terms of the cost formula)
● bias towards crowded places (many places of medieval significance are still crowded today, e.g. Rome, Nuremberg)
● straight-line distance
● bias towards high-ranked results
● bias towards exact name matches (we still want to keep inexact matches, e.g. Franckfurt → autocorrect → Frankfurt)
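The slide names the terms of the edge cost but not their exact functional form. The sketch below combines them additively; the `Candidate` fields, the log-scaled population term and all weights are hypothetical placeholders, not the authors' formula:

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0

@dataclass
class Candidate:
    lat: float
    lng: float
    population: int    # hypothetical proxy for how "crowded" a place is today
    rank: int          # position in the geo database's result list, 0 = top hit
    exact_match: bool  # True if the candidate's name equals the queried name

def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Straight-line (great-circle) distance in kilometres."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lng1, lat2, lng2))
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def edge_cost(u: Candidate, v: Candidate,
              w_pop: float = 100.0, w_rank: float = 20.0, w_name: float = 50.0) -> float:
    """Hypothetical cost of the edge u -> v; lower is better."""
    distance = haversine_km(u.lat, u.lng, v.lat, v.lng)    # straight-line distance
    crowd_penalty = w_pop / math.log10(10 + v.population)  # crowded places are penalised less
    rank_penalty = w_rank * v.rank                         # prefer high-ranked results
    name_penalty = 0.0 if v.exact_match else w_name        # prefer, but keep, inexact matches
    return distance + crowd_penalty + rank_penalty + name_penalty

# Toy candidates at the same location: a large, top-ranked exact match vs. a small,
# low-ranked inexact match.
u = Candidate(50.00, 8.27, 200_000, 0, True)
city = Candidate(50.11, 8.68, 750_000, 0, True)
village = Candidate(50.11, 8.68, 500, 3, False)
print(edge_cost(u, city) < edge_cost(u, village))  # True
```

Inexact matches receive only a finite penalty, so a misspelled but otherwise plausible candidate (Franckfurt → Frankfurt) can still win if it fits the itinerary. In practice the weights would need tuning against the gold standard.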
Shortest path selection enables us to obtain...
● for every regest/event, a predicted (lat, lng) coordinate pair
● additionally, we compute centroids
○ i.e. for every place name, we compute the most central coordinate
○ many frequent place names have unequivocal points of reference (Rome, Nuremberg, etc.)
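One way to read "the most central coordinate" is the medoid: among all event-level predictions for a place name, the point with the smallest summed distance to the others. A minimal sketch under that assumption (a mean or geometric median would be alternatives):

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lng) in degrees
EARTH_RADIUS_KM = 6371.0

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lng) points in kilometres."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def centroids(predictions: List[Tuple[str, Point]]) -> Dict[str, Point]:
    """For each place name, return its medoid: the predicted point with the
    smallest summed distance to all other predictions of that name."""
    by_name: Dict[str, List[Point]] = defaultdict(list)
    for name, point in predictions:
        by_name[name].append(point)
    return {
        name: min(points, key=lambda p: sum(haversine_km(p, q) for q in points))
        for name, points in by_name.items()
    }

# Toy predictions: three events resolved to slightly different points for "A".
preds = [("A", (0.0, 0.0)), ("A", (0.0, 1.0)), ("A", (0.0, 2.0)), ("B", (10.0, 10.0))]
print(centroids(preds))  # {'A': (0.0, 1.0), 'B': (10.0, 10.0)}
```

A medoid is always one of the actually predicted points, so the centroid stays a real candidate location rather than an interpolated point between two towns.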
Gold standard
● approx. 10k place names manually resolved by HiWis (student research assistants) at the place-name level
○ hence, the gold standard cannot account for a king visiting two different places that share the same name
● our resolutions, in contrast, are event-level
● nevertheless, it is the best we have to evaluate against
Results of different path searches vs. time
[Figure: distance to gold (lower = better) over time for the different path searches; annotation on the Staufers’ Italian travels: “very hard, even for historians”]
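Since the gold standard resolves place names rather than individual events, a natural way to compute "distance to gold" is to compare every event-level prediction with the gold point of its place name. A minimal sketch, assuming the metric is the mean great-circle distance in km and that predictions for place names without a gold entry are skipped:

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lng) in degrees
EARTH_RADIUS_KM = 6371.0

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lng) points in kilometres."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def mean_distance_to_gold(predictions: List[Tuple[str, Point]],
                          gold: Dict[str, Point]) -> float:
    """Mean distance (km) between event-level predictions and place-level gold points."""
    distances = [haversine_km(point, gold[name])
                 for name, point in predictions if name in gold]
    if not distances:
        raise ValueError("no prediction has a gold-standard place name")
    return sum(distances) / len(distances)

# Toy example: one exact hit, one prediction 2 degrees of longitude off on the equator,
# and one place name ("C") without a gold entry.
gold = {"A": (0.0, 0.0)}
preds = [("A", (0.0, 0.0)), ("A", (0.0, 2.0)), ("C", (5.0, 5.0))]
print(round(mean_distance_to_gold(preds, gold), 1))  # 111.2
```

Averaging per time window instead of over the whole corpus would yield a curve over time like the one in the figure.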
Did we find the correct Ofen?
Naive selection (random) vs optimal path
Detection of human labeling errors
[Figure: wrong: HiWi annotation; correct: automatic resolution]
Conclusions
● the optimal path yields better predictions than greedy search and much better ones than random selection
○ evidence that our edge cost heuristic captures some useful information
● the method can detect human annotation errors
● in some time periods, places are much harder to resolve than in others
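Why can a greedy search lose to the optimal path? Greedy commits to the locally cheapest next candidate and cannot undo that choice later. A toy illustration, with candidates as hypothetical positions on a line and absolute distance as the edge cost; the optimum is found by exhaustive search here, which is only feasible for tiny inputs:

```python
from itertools import product
from typing import Callable, List, Sequence

Cost = Callable[[float, float], float]

def path_cost(path: Sequence[float], cost: Cost) -> float:
    """Sum of edge costs along a path."""
    return sum(cost(a, b) for a, b in zip(path, path[1:]))

def greedy_path(layers: List[List[float]], cost: Cost) -> List[float]:
    """Start at the first (e.g. top-ranked) candidate, then always take the cheapest next step."""
    path = [layers[0][0]]
    for layer in layers[1:]:
        path.append(min(layer, key=lambda c: cost(path[-1], c)))
    return path

def optimal_path(layers: List[List[float]], cost: Cost) -> List[float]:
    """Exhaustive search over all paths; exponential, so only for toy inputs."""
    return min((list(p) for p in product(*layers)), key=lambda p: path_cost(p, cost))

def dist(a: float, b: float) -> float:
    """Toy edge cost: absolute distance on a line."""
    return abs(a - b)

layers = [[0.0], [1.0, -2.0], [-2.0, 50.0]]
greedy, best = greedy_path(layers, dist), optimal_path(layers, dist)
print(greedy, path_cost(greedy, dist))  # [0.0, 1.0, -2.0] 4.0
print(best, path_cost(best, dist))      # [0.0, -2.0, -2.0] 2.0
```

Greedy takes the cheap first step to 1.0 and then pays for it; the optimal path accepts a slightly costlier first step because it leads to a cheaper itinerary overall.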
Future work
● improve place name prediction
○ try time-series prediction models that better capture geo-spatio-temporal context
○ place name normalization (Franckfurt, Vrankenforde, Franckenfurt → Frankfurt a. Main)
● improve coordinate prediction
○ improve cost heuristic
○ try historical place gazetteers instead of modern geo databases
■ caveat: how well will they generalize across Europe and over almost 1k years?
● mine and resolve the wealth of place names and place references inside the regest texts
○ difficult, but would yield new large-scale resources and options for statistical historical itinerary research!
Thank you for your attention!

Session2 03.juri opitz
