What digital corpora for Ancient History?
Linguistic Annotation of Thucydides 1.98-118




Treebanking in the World of Thucydides
     Linguistic annotation for the Hellespont Project


                             Francesco Mambrini

                             Center For Hellenic Studies

                        Deutsches Archäologisches Institut


                              November 20 2012




                                                Hellespont Project


Outline



  1   What digital corpora for Ancient History?
       The questions at hand
       Data-driven approaches


  2   Linguistic Annotation of Thucydides 1.98-118
         The Hellespont Project
         Examples






A web of knowledge




                               Figure: A simplified model




Interconnectedness: the problem


     The multivalent nature of historical thought [...]
     eludes the keyword-indexed approach to the Web
     today on offer through Google and other search
     engines. Though we can summon up an exhaustive
     list of Web resources that contain the words “Gallipoli”
     and “sources”, today’s Web cannot effectively respond
     to a basic historical question such as, “which sources
     attest the Gallipoli Campaign of World War I?”


                                                                               B. Robertson






CIDOC Conceptual Reference Model

  Objects represented as being part of events




                            Figure: by Doerr and Stead 2009
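
The event-centred idea can be sketched in a few lines of Python. This is only an illustration, not the project's data model: the identifiers on the left (event:..., text:..., person:..., place:...) are hypothetical, the statements are written by hand, and only the class and property names (E5 Event, E31 Document, P7 took place at, P11 had participant, P70 documents) come from CIDOC-CRM. Once sources are linked to events rather than to keywords, a question of the kind "which sources attest event X?" becomes a simple lookup.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hand-written, illustrative statements. All subject/object identifiers
# are hypothetical; only the crm: names follow CIDOC-CRM.
TRIPLES: List[Triple] = [
    ("event:eurymedon", "rdf:type", "crm:E5_Event"),
    ("event:eurymedon", "crm:P7_took_place_at", "place:eurymedon_river"),
    ("event:eurymedon", "crm:P11_had_participant", "person:cimon"),
    ("text:thuc_1.100", "rdf:type", "crm:E31_Document"),
    ("text:thuc_1.100", "crm:P70_documents", "event:eurymedon"),
    ("text:plutarch_cimon", "rdf:type", "crm:E31_Document"),
    ("text:plutarch_cimon", "crm:P70_documents", "event:eurymedon"),
]


def sources_attesting(event: str, triples: List[Triple]) -> List[str]:
    """Return every document that 'P70 documents' the given event."""
    return sorted(s for s, p, o in triples
                  if p == "crm:P70_documents" and o == event)


def events_with_participant(person: str, triples: List[Triple]) -> List[str]:
    """Return every event in which the given person took part."""
    return sorted(s for s, p, o in triples
                  if p == "crm:P11_had_participant" and o == person)


if __name__ == "__main__":
    print(sources_attesting("event:eurymedon", TRIPLES))
    print(events_with_participant("person:cimon", TRIPLES))
```

A real deployment would store such statements in an RDF triple store and query them with SPARQL; plain tuples are used here only to keep the sketch self-contained.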





One more problem!
Know what our sources are!




        Large and complex works, e.g. Thucydides:
               6,126 sentences, 167,512 words
               ca. 30 years of war, plus 50 years in digression, and
               references that go back to before the Trojan War!
        Unstructured natural language
        Written in Ancient Greek
        Controversial (interpretation and textual reconstruction)
        Literary work (= shaped by discursive and ideological
        strategies)






Ontology modelling for research on ritual structures
(Ontologiemodellierung für die Erforschung von Ritualstrukturen, SFB 619, Heidelberg)




                         Figure: Event extraction from texts






NLP Pipeline


           NLP Process                                         Ancient Greek?

           Chunking
           Lemmatization
           POS-tagging
           Syntactic parsing
           Word-sense disambiguation
           Co-reference resolution
           Semantic role annotation
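
How such a pipeline stacks annotation layers on top of each other can be sketched as follows. This is a toy under stated assumptions, not a working tool for Ancient Greek: the three-entry LEXICON, the layer names and the helper functions are all hypothetical, and only two of the stages listed above (lemmatization and POS-tagging) are imitated, by plain dictionary lookup. Forms are normalised to Unicode NFC first, because polytonic Greek can be encoded in more than one way.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def nfc(text: str) -> str:
    """Normalise to NFC so that equivalent Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


@dataclass
class Token:
    form: str
    layers: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[List[Token]], List[Token]]

# Hypothetical mini-lexicon: form -> (lemma, part of speech).
_RAW_LEXICON = {
    "οἱ": ("ὁ", "article"),
    "Ἀθηναῖοι": ("Ἀθηναῖος", "noun"),
    "ἦλθον": ("ἔρχομαι", "verb"),
}
LEXICON = {nfc(form): entry for form, entry in _RAW_LEXICON.items()}


def lemmatize(tokens: List[Token]) -> List[Token]:
    """Add a 'lemma' layer to every token found in the lexicon."""
    for token in tokens:
        if token.form in LEXICON:
            token.layers["lemma"] = LEXICON[token.form][0]
    return tokens


def pos_tag(tokens: List[Token]) -> List[Token]:
    """Add a 'pos' layer to every token found in the lexicon."""
    for token in tokens:
        if token.form in LEXICON:
            token.layers["pos"] = LEXICON[token.form][1]
    return tokens


def run_pipeline(text: str, stages: List[Stage]) -> List[Token]:
    """Split on whitespace, then apply each stage in the given order."""
    tokens = [Token(nfc(form)) for form in text.split()]
    for stage in stages:
        tokens = stage(tokens)
    return tokens


SENTENCE = "οἱ Ἀθηναῖοι ἦλθον"

if __name__ == "__main__":
    for token in run_pipeline(SENTENCE, [lemmatize, pos_tag]):
        print(token.form, token.layers)
```

Later stages (parsing, co-reference, semantic roles) would read the layers written by earlier ones, which is why a gap early in the list blocks everything after it.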





Using and Enhancing the available resources
The Ancient Greek Dependency Treebank




        AGDT: treebank with word-by-word morphological and
              dependency-based syntactic description
        A step forward: semantic information
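
As a rough idea of what this description looks like in practice, the sketch below reads a sentence in the XML layout used by the AGDT (one word element per token, with form, lemma, a nine-character postag, the id of the head and a relation label) and asks for the subject of a verb. The sample is a simplified three-word fragment modelled on the opening of Thuc. 1.89.1; its annotation was written by hand for this illustration and is not copied from the treebank, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-annotated, simplified fragment (not taken from the treebank files).
SAMPLE = """<sentence id="1" subdoc="1.89.1">
  <word id="1" form="οἱ" lemma="ὁ" postag="l-p---mn-" head="2" relation="ATR"/>
  <word id="2" form="Ἀθηναῖοι" lemma="Ἀθηναῖος" postag="n-p---mn-" head="3" relation="SBJ"/>
  <word id="3" form="ἦλθον" lemma="ἔρχομαι" postag="v3paia---" head="0" relation="PRED"/>
</sentence>"""

Word = Dict[str, str]


def load_words(xml_text: str) -> List[Word]:
    """Parse one sentence and return its words as attribute dictionaries."""
    root = ET.fromstring(xml_text)
    return [dict(word.attrib) for word in root.iter("word")]


def children_of(words: List[Word], head_id: str) -> List[Word]:
    """Return the words that depend directly on the word with this id."""
    return [w for w in words if w["head"] == head_id]


def subjects_of(words: List[Word], verb_lemma: str) -> List[str]:
    """Return the forms labelled SBJ under any verb with the given lemma."""
    verbs = [w for w in words
             if w["lemma"] == verb_lemma and w["postag"].startswith("v")]
    return [child["form"]
            for verb in verbs
            for child in children_of(words, verb["id"])
            if child["relation"] == "SBJ"]


if __name__ == "__main__":
    words = load_words(SAMPLE)
    print(subjects_of(words, "ἔρχομαι"))
```

The query follows the dependency links rather than word order, so it would still find the subject if other words stood between it and the verb.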




A syntactic tree
Thuc. 1.89.1






A case study
Athens, 479-431 BCE



   Goal:
      Connecting textual and archaeological sources in the
      Perseus DL and Arachne via CIDOC-CRM
   Steps:
        Enriching the text of one source (Thucydides) with
        linguistic and historical information
        Identifying and marking events in the text
               manually
               with a data-driven approach
        Integrating secondary literature (through data-mining
        algorithms)




Toward a 3-level scenario
Morphology and Syntax






Toward a 3-level scenario
+ semantic and pragmatic information






With tectogrammatical annotation:




  Our text is:
    1   easier to browse for content-related search (easier to use
        in digital environments)
    2   more informative on historically relevant questions
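
A small sketch may show why a tectogrammatical layer makes content-related search easier. It assumes functor labels in the style of the Prague Dependency Treebank (PRED for the predicate, ACT for the actor, PAT for the patient); the nodes, the English glosses standing in for Greek lemmas and the chapter references are hand-made for illustration and are not taken from the project's annotation. The second event imitates a passive surface construction ("the Thasians were besieged"), yet the actor is still labelled ACT, so one query finds both events.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TNode:
    node_id: str
    t_lemma: str            # English gloss standing in for the Greek lemma
    functor: str            # PRED, ACT, PAT, ...
    parent: Optional[str]   # id of the governing node, None for the root
    passage: str


# Hand-made, illustrative nodes (hypothetical ids and glosses).
NODES: List[TNode] = [
    # Active surface form: "the Athenians besieged the Naxians"
    TNode("n1", "besiege", "PRED", None, "Thuc. 1.98"),
    TNode("n2", "Athenians", "ACT", "n1", "Thuc. 1.98"),
    TNode("n3", "Naxians", "PAT", "n1", "Thuc. 1.98"),
    # Passive surface form: "the Thasians were besieged (by the Athenians)"
    TNode("n4", "besiege", "PRED", None, "Thuc. 1.101"),
    TNode("n5", "Thasians", "PAT", "n4", "Thuc. 1.101"),
    TNode("n6", "Athenians", "ACT", "n4", "Thuc. 1.101"),
]

Event = Tuple[str, List[str], str]


def events_by_actor(nodes: List[TNode], actor: str) -> List[Event]:
    """Return (predicate, patients, passage) for events with this actor."""
    by_parent: Dict[Optional[str], List[TNode]] = {}
    for node in nodes:
        by_parent.setdefault(node.parent, []).append(node)

    results: List[Event] = []
    for pred in (n for n in nodes if n.functor == "PRED"):
        args = by_parent.get(pred.node_id, [])
        if any(a.functor == "ACT" and a.t_lemma == actor for a in args):
            patients = [a.t_lemma for a in args if a.functor == "PAT"]
            results.append((pred.t_lemma, patients, pred.passage))
    return results


if __name__ == "__main__":
    for event in events_by_actor(NODES, "Athenians"):
        print(event)
```

A keyword search for the same information would have to anticipate every inflected form and every syntactic construction in which the actor can appear.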






Conclusions



   1   Currently, our literary sources are not structured for
       semantic, event-based queries
   2   NLP processes for event extraction are not yet capable of
       handling raw Ancient Greek texts
   3   NLP tools and techniques are adaptable to the task; they
             provide standards
             support and speed up manual annotation
             (incidentally) add a lot of information on linguistic
             aspects of the documentary sources



