SlideShare a Scribd company logo
1 of 35
Download to read offline
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Scaling up the Extraction of Canonical
Citations in Classics
Matteo Romanello (DAI / KCL) @mr56k
Humanitiés Numériques et Antiquité – 4 Sept. 2015
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Prologue
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Centrality of References
Referring as scholarly primitive (Unsworth):
Referring, Discovering, Annotating, Comparing, Sampling,
Illustrating, Representing
References in Classics:
canonical texts, fragmentary texts, inscriptions, papyri,
manuscripts, coins, etc.
Ubiquity of References
journal articles, reviews, monographs, indexes, commentaries
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Classical Commentaries
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Enhanced Reading (1)
Line-by-line Bibliographical Database of Wolfram von Eschenbach’s
Parzival, http://wolfram.lexcoll.com/txts/index.htm
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Enhanced Reading (2)
http://labs.jstor.org/shakespeare/
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Enhanced Reading (3)
Segetes, http://segetes.io/aeneid
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Enhanced Reading (4)
Hellespont Project
http://gapvis.hellespont.dainst.org/#book/1/read/113/
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Trends in Text Reception
Neville Morley, Number Crunching,
http://thesphinxblog.com/2015/06/25/number-crunching/
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Citation Networks
with applications to:
1 search
2 document clustering
3 formal network analysis
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Approach
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Rationale
beyond string-based search
references not quotations
scalable approach:
language independent
applicable to large amounts of documents
easily adaptable to different materials and ways of
referencing
Examples:
In Statius’ « Achilleid » (2, 96-102) Achilles describes […]
e.g. Vergil, Aen. 12, 101-109 ; Lucan 1, 204-212 ; Statius,
Th. 12, 736-740 […]
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Named Entity Recognition
Computer Science > Information Extraction > NER
Question Answering System
Q: where did Aaron Swartz die?
A: New York
Two days after the prosecution rejected a counter-offer by
Swartz, he was found dead in his Brooklyn, New York
apartment, where he had hanged himself.
3-step process:
1 Named Entity Recognition and Classification
2 Relation Extraction
3 Named Entity Disambiguation
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Citation Extraction: Step 1 (NER)
Named Entities (= citation components):
AAUTHOR = ancient author
AWORK = ancient work
REFAUWORK = concise reference to author, work or both
(“Pliny, nat.”, “Thuc.”)
REFSCOPE = indication of the cited passage (“11, 4, 11”)
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Citation Extraction: Step 2 (Relation Detection)
reference as relation vs. reference as monolithic entity
binary scope relation between two entities (arguments)
arg1: aauthor | awork | refauwork
arg2: refscope
examples:
Ammianus (15, 8, 7)
Trabajos 159–173”
Pliny, nat. 11, 4, 11
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Citation Extraction: Step 3 (Disambiguation)
assign each author/work/canonical reference a unique ID
IDs are CTS URNs
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Canonical Text Services (CTS) Unique Resource
Names (URNs)
A machine-readable syntax for canonical references [refs]
Pliny
urn:cts:latinLit:phi0978
Pliny’s NH
urn:cts:latinLit:phi0978.phi001
Pliny, Nat. 11,4,11
urn:cts:latinLit:phi0978.phi001:11.4.11
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
The Extraction Pipeline
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Evaluation
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
L’Année philologique (APh)
http://www.annee-philologique.com/
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
APh Example
APh 75-06697 => S. Braund & G. Gilbert. 2004. “An ABC of epic ira:
anger, beasts, and cannibalism” Yale Classical Studies 32:250-285
In Statius ’ « Achilleid » (2, 96-102) Achilles describes his diet
of wild animals in infancy, which rendered him fearless and may
indicate another aspect of his character - a tendency toward
aggression and anger.
The portrayal of angry warriors in Roman epic is effected for
the most part not by direct descriptions but indirectly, by
similes of wild beasts (e.g. Vergil, Aen. 12, 101-109;
Lucan 1, 204-212; Statius, Th. 12, 736-740; Silius 5, 306-315).
These similes may be compared to two passages from
Statius (Th. 1, 395-433 and 8, 383-394) that portray the onset
of anger in direct narrative. Analysis of these passages
demonstrates that the concept of « ira » in epic takes its moral
aspect from the context.
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
The Data
APh
analytical reviews (en, de, fr, es, it)
80 volumes (1924-)
autom. processed vol. 75 (2004)
6,694 abstracts (total = 6,946; errors = 252)
350k tokens
3k citations
man. corrected ~8 % of vol. 75
366 abstracts
26k tokens
380 citations
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Precision, Recall and F1 Score
https://commons.wikimedia.org/wiki/File:
Precisionrecall.svg
By Walber (Own work) [CC BY-SA 4.0]
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Evaluation Summary
Task Precision Recall F1 Score
NER 79.24% 69.62% 73.88%
RelEx 93.33% 91.87% 92.60%
NED 61.04% 90.94% 73.05%
methods:
NER: machine learning-based
RelEx: rule-based
NED: rule-based + knowledge base
manually corrected ~8 % of vol. 75
366 abstracts
26k tokens
380 citations
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NER: Method
1 Linguistic Features (PoS Tag, neighbouring words)
2 Word-level Features:
punctuation
final_dot, quotation_mark, has_hyphen, bracket
case
mixed_caps, all_caps, init_caps, all_lower
number
roman, year, range, mixed_alphanum
patterns
“Avien.” –> “Aaaaa-” (expanded)
“Avien.” –> “Aa-” (compressed)
3 Semantic Features (matches against dictionaries)
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NER: Evaluation
Task: extraction of entities aauthor, awork, refauwork,
refscope
Algorithm Precision Recall F1 Score
CRF 79.24% 69.62% 73.88%
MaxEnt 75.29% 66.75% 70.43%
SVM 74.44% 70.21% 71.93%
Aauthor : P = 91.15%, R = 39.67%, F1 = 54.53%
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
RelEx: Evaluation
rule-based method
Precision Recall F1 Score
93.33% 91.87% 92.60%
Missed scope relations:
du [REFSCOPE chant 4] de l’ [AWORK « Énéide » ]
Le [REFSCOPE livre 13 ] de la [AWORK « Chronique » ]
les [REFSCOPE v. 9–12 ] des [AWORK « Acharniens » ]
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NED: Method
Thuc. I 89, 1s.
1 match reference against knowledge base
exact and approximate string matching
approximate string matching:
edit_distance("Virgilio","Virgil") = 3
Thuc. → urn:cts:greekLit:tlg0003.tlg001
2 normalise the reference scope
e.g. 1.89.1–1.89.2
1.89.1–2
1, 89, 1–2
I 89, 1s.
3 assign unique ID
urn:cts:greekLit:tlg0003.tlg001:1.89.1–1.89.2
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NED: Knowledge Base
underlying model: HuCit, CIDOC-CRM & FRBRoo
usages:
extract abbreviations
resolve implicit refs:, e.g. “Herod. 4, 5-7”
validate citations, e.g. “Thuc. 1.100.9.4”
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NED: Evaluation
Matching Type Precision Recall F1 Score
Exact 58.33% 62.88% 60.52%
Approximate (n=4) 61.04% 90.94% 73.05%
Approximate (n=7) 58.94% 94.76% 72.67%
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NED: Error Types
1 abbreviation is highly ambiguous (without context)
But Horace undermines the suggestion that his own
poetry will forever represent the Augustan Age.
Carm. 4, 15 in fact […]
2 ambiguous author mention
Esame dell’ esegesi papiracea ad Aristofane :
permanenza del lavoro degli eruditi alessandrini […]
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
NED: Error Types (contd.)
3 implied context (title of reviewed publ.)
Dans son chap. 5 sur le squelette et la respiration,
Lactance utilise des sources disparates et arrive aux
limites de son savoir médical.
4 ambiguously expressed reference
Analysis of the pederastic poems in the Theocritean
corpus (12 ; 23 ; 29 ; 30) reveals that Theocritus
reflects on mutuality in a relationship […]
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Outlook
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Future Plans
1 improve overall accuracy
test other methods for each processing step
more training data (and more specialised)
expand knowledge base
2 make software available to others
streamline installation
improve documentation
offer as web service
offer as part of a research infrastructure
3 apply on a larger scale
improve performances (optimisation)
use of high performance and parallel computing
Scaling up the
Extraction of
Canonical
Citations in
Classics
Matteo
Romanello
(DAI / KCL)
@mr56k
Prologue
Approach
Evaluation
Outlook
.....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
....
.
....
.
.....
.
....
.
.....
.
....
.
....
.
Thank you for your attention!
Links
matteo.romanello@gmail.com
https://github.com/mromanello/CRefEx
https://github.com/mromanello/APh_Corpus

More Related Content

Similar to Scaling up the Extraction of Canonical Citations in Classics

Similar to Scaling up the Extraction of Canonical Citations in Classics (15)

Joel Christine_Master's Thesis-SU2012
Joel Christine_Master's Thesis-SU2012Joel Christine_Master's Thesis-SU2012
Joel Christine_Master's Thesis-SU2012
 
James_Atkinson_Dissertation
James_Atkinson_DissertationJames_Atkinson_Dissertation
James_Atkinson_Dissertation
 
3.alfonso faganet portman2014
3.alfonso faganet portman20143.alfonso faganet portman2014
3.alfonso faganet portman2014
 
Hssttx2
Hssttx2Hssttx2
Hssttx2
 
VILLAFRANCA-THESIS-2016
VILLAFRANCA-THESIS-2016VILLAFRANCA-THESIS-2016
VILLAFRANCA-THESIS-2016
 
misra et al,2014
misra et al,2014misra et al,2014
misra et al,2014
 
Francisco Davila resume
Francisco Davila resumeFrancisco Davila resume
Francisco Davila resume
 
Interpretation of European Cultural Heritage
Interpretation of European Cultural HeritageInterpretation of European Cultural Heritage
Interpretation of European Cultural Heritage
 
Duality in Physics
Duality in PhysicsDuality in Physics
Duality in Physics
 
Project Supernova Report
Project Supernova ReportProject Supernova Report
Project Supernova Report
 
BME 405 Final Report v03
BME 405 Final Report v03BME 405 Final Report v03
BME 405 Final Report v03
 
The Foundations of Faith in the Light of the Quran and the Sunnah
The Foundations of Faith in the Light of the Quran and the SunnahThe Foundations of Faith in the Light of the Quran and the Sunnah
The Foundations of Faith in the Light of the Quran and the Sunnah
 
Modelling The Human Cochlea, Ku, 2008
Modelling The Human Cochlea, Ku, 2008Modelling The Human Cochlea, Ku, 2008
Modelling The Human Cochlea, Ku, 2008
 
Ucs(gen)
Ucs(gen)Ucs(gen)
Ucs(gen)
 
doctor
doctordoctor
doctor
 

More from Matteo Romanello

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Matteo Romanello
 
Transforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksTransforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksMatteo Romanello
 
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Matteo Romanello
 
Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Matteo Romanello
 
Exploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsExploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsMatteo Romanello
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceMatteo Romanello
 
Greedy Enough for the Grid?
Greedy Enough for the Grid?Greedy Enough for the Grid?
Greedy Enough for the Grid?Matteo Romanello
 
Structured and Unstructured:Extracting Information From Classics Scholarly Texts
Structured and Unstructured:Extracting Information From Classics Scholarly TextsStructured and Unstructured:Extracting Information From Classics Scholarly Texts
Structured and Unstructured:Extracting Information From Classics Scholarly TextsMatteo Romanello
 
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsStuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsMatteo Romanello
 
[poster] Extracting Information From Classics Scholarly Texts
[poster] Extracting Information From Classics Scholarly Texts[poster] Extracting Information From Classics Scholarly Texts
[poster] Extracting Information From Classics Scholarly TextsMatteo Romanello
 
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
DIGITAL HUMANITIES   E FILOLOGIA   Un'introduzioneDIGITAL HUMANITIES   E FILOLOGIA   Un'introduzione
DIGITAL HUMANITIES E FILOLOGIA Un'introduzioneMatteo Romanello
 
Rethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesRethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesMatteo Romanello
 
Presentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoPresentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoMatteo Romanello
 
Linking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsLinking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsMatteo Romanello
 
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...Matteo Romanello
 
M.Romanello Ecal Presentation
M.Romanello Ecal PresentationM.Romanello Ecal Presentation
M.Romanello Ecal PresentationMatteo Romanello
 

More from Matteo Romanello (18)

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...
 
Transforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation NetworksTransforming Indexes Locorum into Citation Networks
Transforming Indexes Locorum into Citation Networks
 
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...
 
Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014Introduction to the Text Reuse panel at DH 2014
Introduction to the Text Reuse panel at DH 2014
 
Exploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in ClassicsExploring Citation Networks to Study Intertextuality in Classics
Exploring Citation Networks to Study Intertextuality in Classics
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and Space
 
Greedy Enough for the Grid?
Greedy Enough for the Grid?Greedy Enough for the Grid?
Greedy Enough for the Grid?
 
Structured and Unstructured:Extracting Information From Classics Scholarly Texts
Structured and Unstructured:Extracting Information From Classics Scholarly TextsStructured and Unstructured:Extracting Information From Classics Scholarly Texts
Structured and Unstructured:Extracting Information From Classics Scholarly Texts
 
Romanello tokyo
Romanello tokyoRomanello tokyo
Romanello tokyo
 
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly TextsStuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts
 
[poster] Extracting Information From Classics Scholarly Texts
[poster] Extracting Information From Classics Scholarly Texts[poster] Extracting Information From Classics Scholarly Texts
[poster] Extracting Information From Classics Scholarly Texts
 
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
DIGITAL HUMANITIES   E FILOLOGIA   Un'introduzioneDIGITAL HUMANITIES   E FILOLOGIA   Un'introduzione
DIGITAL HUMANITIES E FILOLOGIA Un'introduzione
 
Ht159 Poster
Ht159 PosterHt159 Poster
Ht159 Poster
 
Rethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by OntologiesRethinking Critical Editions of Fragments by Ontologies
Rethinking Critical Editions of Fragments by Ontologies
 
Presentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, TorontoPresentatio @ ELPUB 2008, Toronto
Presentatio @ ELPUB 2008, Toronto
 
Linking Primary and Secondary by Microformats
Linking Primary and Secondary by MicroformatsLinking Primary and Secondary by Microformats
Linking Primary and Secondary by Microformats
 
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
M. Romanello, E-scholia: scenari digitali per la comunicazione scientifica in...
 
M.Romanello Ecal Presentation
M.Romanello Ecal PresentationM.Romanello Ecal Presentation
M.Romanello Ecal Presentation
 

Recently uploaded

Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Recently uploaded (20)

Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Scaling up the Extraction of Canonical Citations in Classics

  • 1. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Humanitiés Numériques et Antiquité – 4 Sept. 2015
  • 2. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Prologue
  • 3. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Centrality of References Referring as scholarly primitive (Unsworth): Referring, Discovering, Annotating, Comparing, Sampling, Illustrating, Representing References in Classics: canonical texts, fragmentary texts, inscriptions, papyri, manuscripts, coins, etc. Ubiquity of References journal articles, reviews, monographs, indexes, commentaries
  • 4. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Classical Commentaries
  • 5. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Enhanced Reading (1) Line-by-line Bibliographical Database of Wolfram von Eschenbach’s Parzival, http://wolfram.lexcoll.com/txts/index.htm
  • 6. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Enhanced Reading (2) http://labs.jstor.org/shakespeare/
  • 7. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Enhanced Reading (3) Segetes, http://segetes.io/aeneid
  • 8. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Enhanced Reading (4) Hellespont Project http://gapvis.hellespont.dainst.org/#book/1/read/113/
  • 9. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Trends in Text Reception Neville Morley, Number Crunching, http://thesphinxblog.com/2015/06/25/number-crunching/
  • 10. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Citation Networks with applications to: 1 search 2 document clustering 3 formal network analysis
  • 11. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Approach
  • 12. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Rationale beyond string-based search references not quotations scalable approach: language independent applicable to large amounts of documents easily adaptable to different materials and ways of referencing Examples: In Statius’ « Achilleid » (2, 96-102) Achilles describes […] e.g. Vergil, Aen. 12, 101-109 ; Lucan 1, 204-212 ; Statius, Th. 12, 736-740 […]
  • 13. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Named Entity Recognition Computer Science > Information Extraction > NER Question Answering System Q: where did Aaron Swartz die? A: New York Two days after the prosecution rejected a counter-offer by Swartz, he was found dead in his Brooklyn, New York apartment, where he had hanged himself. 3-step process: 1 Named Entity Recognition and Classification 2 Relation Extraction 3 Named Entity Disambiguation
  • 14. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Citation Extraction: Step 1 (NER) Named Entities (= citation components): AAUTHOR = ancient author AWORK = ancient work REFAUWORK = concise reference to author, work or both (“Pliny, nat.”, “Thuc.”) REFSCOPE = indication of the cited passage (“11, 4, 11”)
  • 15. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Citation Extraction: Step 2 (Relation Detection) reference as relation vs. reference as monolithic entity binary scope relation between two entities (arguments) arg1: aauthor | awork | refauwork arg2: refscope examples: Ammianus (15, 8, 7) Trabajos 159–173” Pliny, nat. 11, 4, 11
  • 16. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Citation Extraction: Step 3 (Disambiguation) assign each author/work/canonical reference a unique ID IDs are CTS URNs
  • 17. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Canonical Text Services (CTS) Unique Resource Names (URNs) A machine-readable syntax for canonical references [refs] Pliny urn:cts:latinLit:phi0978 Pliny’s NH urn:cts:latinLit:phi0978.phi001 Pliny, Nat. 11,4,11 urn:cts:latinLit:phi0978.phi001:11.4.11
  • 18. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . The Extraction Pipeline
  • 19. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Evaluation
  • 20. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . L’Année philologique (APh) http://www.annee-philologique.com/
  • 21. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . APh Example APh 75-06697 => S. Braund & G. Gilbert. 2004. “An ABC of epic ira: anger, beasts, and cannibalism” Yale Classical Studies 32:250-285 In Statius ’ « Achilleid » (2, 96-102) Achilles describes his diet of wild animals in infancy, which rendered him fearless and may indicate another aspect of his character - a tendency toward aggression and anger. The portrayal of angry warriors in Roman epic is effected for the most part not by direct descriptions but indirectly, by similes of wild beasts (e.g. Vergil, Aen. 12, 101-109; Lucan 1, 204-212; Statius, Th. 12, 736-740; Silius 5, 306-315). These similes may be compared to two passages from Statius (Th. 1, 395-433 and 8, 383-394) that portray the onset of anger in direct narrative. Analysis of these passages demonstrates that the concept of « ira » in epic takes its moral aspect from the context.
  • 22. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . The Data APh analytical reviews (en, de, fr, es, it) 80 volumes (1924-) autom. processed vol. 75 (2004) 6,694 abstracts (total = 6,946; errors = 252) 350k tokens 3k citations man. corrected ~8 % of vol. 75 366 abstracts 26k tokens 380 citations
  • 23. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Precision, Recall and F1 Score https://commons.wikimedia.org/wiki/File: Precisionrecall.svg By Walber (Own work) [CC BY-SA 4.0]
  • 24. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Evaluation Summary Task Precision Recall F1 Score NER 79.24% 69.62% 73.88% RelEx 93.33% 91.87% 92.60% NED 61.04% 90.94% 73.05% methods: NER: machine learning-based RelEx: rule-based NED: rule-based + knowledge base manually corrected ~8 % of vol. 75 366 abstracts 26k tokens 380 citations
  • 25. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NER: Method 1 Linguistic Features (PoS Tag, neighbouring words) 2 Word-level Features: punctuation final_dot, quotation_mark, has_hyphen, bracket case mixed_caps, all_caps, init_caps, all_lower number roman, year, range, mixed_alphanum patterns “Avien.” –> “Aaaaa-” (expanded) “Avien.” –> “Aa-” (compressed) 3 Semantic Features (matches against dictionaries)
  • 26. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NER: Evaluation Task: extraction of entities aauthor, awork, refauwork, refscope Algorithm Precision Recall F1 Score CRF 79.24% 69.62% 73.88% MaxEnt 75.29% 66.75% 70.43% SVM 74.44% 70.21% 71.93% Aauthor : P = 91.15%, R = 39.67%, F1 = 54.53%
  • 27. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . RelEx: Evaluation rule-based method Precision Recall F1 Score 93.33% 91.87% 92.60% Missed scope relations: du [REFSCOPE chant 4] de l’ [AWORK « Énéide » ] Le [REFSCOPE livre 13 ] de la [AWORK « Chronique » ] les [REFSCOPE v. 9–12 ] des [AWORK « Acharniens » ]
  • 28. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NED: Method Thuc. I 89, 1s. 1 match reference against knowledge base exact and approximate string matching approximate string matching: edit_distance("Virgilio","Virgil") = 3 Thuc. → urn:cts:greekLit:tlg0003.tlg001 2 normalise the reference scope e.g. 1.89.1–1.89.2 1.89.1–2 1, 89, 1–2 I 89, 1s. 3 assign unique ID urn:cts:greekLit:tlg0003.tlg001:1.89.1–1.89.2
  • 29. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NED: Knowledge Base underlying model: HuCit, CIDOC-CRM & FRBRoo usages: extract abbreviations resolve implicit refs:, e.g. “Herod. 4, 5-7” validate citations, e.g. “Thuc. 1.100.9.4”
  • 30. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NED: Evaluation Matching Type Precision Recall F1 Score Exact 58.33% 62.88% 60.52% Approximate (n=4) 61.04% 90.94% 73.05% Approximate (n=7) 58.94% 94.76% 72.67%
  • 31. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NED: Error Types 1 abbreviation is highly ambiguous (without context) But Horace undermines the suggestion that his own poetry will forever represent the Augustan Age. Carm. 4, 15 in fact […] 2 ambiguous author mention Esame dell’ esegesi papiracea ad Aristofane : permanenza del lavoro degli eruditi alessandrini […]
  • 32. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . NED: Error Types (contd.) 3 implied context (title of reviewed publ.) Dans son chap. 5 sur le squelette et la respiration, Lactance utilise des sources disparates et arrive aux limites de son savoir médical. 4 ambiguously expressed reference Analysis of the pederastic poems in the Theocritean corpus (12 ; 23 ; 29 ; 30) reveals that Theocritus reflects on mutuality in a relationship […]
  • 33. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Outlook
  • 34. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Future Plans 1 improve overall accuracy test other methods for each processing step more training data (and more specialised) expand knowledge base 2 make software available to others streamline installation improve documentation offer as web service offer as part of a research infrastructure 3 apply on a larger scale improve performances (optimisation) use of high performance and parallel computing
  • 35. Scaling up the Extraction of Canonical Citations in Classics Matteo Romanello (DAI / KCL) @mr56k Prologue Approach Evaluation Outlook ..... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . .... . .... . ..... . .... . ..... . .... . .... . Thank you for your attention! Links matteo.romanello@gmail.com https://github.com/mromanello/CRefEx https://github.com/mromanello/APh_Corpus