Towards Digital Coptic
Searching and Visualizing
Coptic Manuscript Data
Caroline T. Schroeder,
University of the Pacific

cschroeder@pacific.edu

Amir Zeldes,
Humboldt-Universität zu Berlin

amir.zeldes@rz.hu-berlin.de

Berlin Digital Classicist Seminar, 14.1.2014
Plan
 Introduction
 Coptic data
 Annotations so far: normalizing, tokenizing and tagging

 Search architecture
 Searching through multiple segmentations: ANNIS
 Dealing with corpus formats: TEI, SaltNPepper

 Visualization
 Dedicated visualizations
 A reusable generic approach

 Conclusion and outlook

Schroeder & Zeldes / Towards Digital Coptic

Berlin, 14.1.2014

1/37
Who are these people?
 Prof. Caroline T. Schroeder –
Religious and Classical Studies /
Humanities Center Director
University of the Pacific
 Dr. Amir Zeldes –
Korpuslinguistik /
SFB 632 Information Structure
(from March: eHumanities group KOMeT)
Humboldt-Universität zu Berlin
 Cooperation Coptic SCRIPTORIUM established at the 2012
NEH summer institute on "Text in a Digital Age" (Tufts):
http://coptic.pacific.edu/

Why Coptic?
 Last stage of Ancient Egyptian Language (starting 2nd Century)
  Mediterranean in 1st millennium
 Hellenistic period

 Unique language
 Longest continuous documentation
 Contact language (with Greek)

 Religious significance
 Early Christianity

 Rise of monasticism
 Gnosticism
 ...
[Figure: Coptic dialects]
The data
 Lots of material (thanks to the Egyptian desert)
 Relatively little online, nothing like Greek and Latin
(Perseus)
 Lots of things you may want are not available:
  New Testament (online, not normalized/lemmatized/annotated)
  Old Testament
  The Rule of St. Pachomius
  Works of Shenoute of Atripe
  Apophthegmata patrum
  ...

 But some have been digitized at some point!
A word about the texts in this talk
 So far we've concentrated on Shenoute's sermon Abraham our
Father
 "As for us, brethren, let us live by the truth so that we are upstanding in
all our works, and so that the prophets, apostles and all the saints might
dwell among us, ..."

 Apophthegmata Patrum (sayings of the desert fathers)
 "They said about the blessed Sarah the virgin that she spent sixty years
living at the top of the river and she never set foot outside to see the
river."

 New Testament, esp. Gospel of Mark
see http://coptic.pacific.edu/ for corpora and tools
Getting from raw text to annotated corpora
 Making the data searchable starts
with:
  Encoding manuscripts (EpiDoc TEI)
 Segmentation of "word forms"

 Normalization
 Segmentation of morphemes
 Part-of-speech tagging

 More annotations...

 Brief recap: Detailed talk in Leipzig
last month (slides on my page)
Normalization
 Automatic normalization, manual correction
 handling of known diacritics, abbreviations

 closed, growing list of known variants
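A minimal sketch of the automatic step, assuming that normalization amounts to stripping combining diacritics, lower-casing, and looking the result up in a table of known variants. The function name `normalize` and the single table entry are hypothetical placeholders, not the project's actual tool or data.

```python
import unicodedata

# Hypothetical variant table: known variant spelling -> normalized form.
# The single entry is a placeholder, not project data.
KNOWN_VARIANTS = {"variant": "normalized"}


def normalize(word_form, variants=KNOWN_VARIANTS):
    """Strip combining marks, lower-case, then apply the variant table."""
    decomposed = unicodedata.normalize("NFD", word_form)
    # Unicode category "Mn" covers combining marks such as overlines and dots.
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    bare = bare.lower()
    # Forms missing from the table pass through unchanged,
    # which leaves them for manual correction.
    return variants.get(bare, bare)


# Input: Coptic "ebol" written with a combining dot above the first letter.
print(normalize("\u2c89\u0307\u2c83\u2c9f\u2c97"))
```

Because the table is a plain dictionary, the "closed, growing list" can be extended without touching the function.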

Tokenization
 Identifying morphemes non-trivial (agglutinative language,
different conventions; we follow Layton 2004)
 ϫⲓⲛⲧⲁⲓⲣ̅ⲙⲟⲛⲁⲭⲟⲥ
'Since I became a monk'
since-that-PAST-1sg-do-monk
 ⲉⲛⲧⲁϥⲧⲣⲉⲛⲣⲡϣⲁ
'he who made us keep the ceremony'
REL-PAST-3sgM-CAUS-1pl-do-the-observance

 Word level segmentation: manual (no scriptio continua)
 Morph segmentation: automatic (accuracy: 84% - 94%)
ⲛ̄ⲟⲩϣⲏⲣⲉ` ⲛ̄ⲁⲃⲣⲁϩⲁⲙ` -> ⲛ ⲟⲩ ϣⲏⲣⲉ ⲛ ⲁⲃⲣⲁϩⲁⲙ
of-a-son of-Abraham -> of a son of Abraham
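The slides do not say how the automatic morph segmenter works. The sketch below assumes a plain lexicon lookup with backtracking that prefers longer morphs, purely to illustrate the input and output of this step. The function `segment` and the four-entry lexicon are hypothetical.

```python
# Hypothetical toy lexicon built from the morphs in the example above.
LEXICON = set("ⲛ ⲟⲩ ϣⲏⲣⲉ ⲁⲃⲣⲁϩⲁⲙ".split())


def segment(word, lexicon):
    """Split `word` into lexicon entries, trying longer morphs first.

    Returns a list of morphs, or None if no full segmentation exists.
    """
    if not word:
        return []
    longest = max(len(m) for m in lexicon)
    for end in range(min(len(word), longest), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:  # backtrack when the remainder fails
                return [head] + rest
    return None


print(segment("ⲛ" + "ⲟⲩ" + "ϣⲏⲣⲉ", LEXICON))
```

A real segmenter has to resolve ambiguity that a lookup like this cannot, which is consistent with the reported accuracy range being well below 100%.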
Part-of-speech tagging
 POS tagging using TreeTagger (Schmid 1994) and a lexicon from the
CMCL project (courtesy of Prof. Tito Orlandi)
 Two tag sets:
 fine grained (45 tags) and coarse (22 tags)
(see http://coptic.pacific.edu/ for documentation)
 Interannotator agreement: 94.19% agreement, kappa = 93.67
(considers chance agreement, cf. Artstein & Poesio 2008)

 Accuracy:
 In domain, 10-fold cross-validation: 94.04% (fine)
 Out of domain (test with papyri.info): 79.6% (fine) / 87.7% (coarse)

 Main difficulties: open classes (N/V),
disambiguating homonyms (ⲉ can have 6 different tags!)
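To make the difference between raw agreement and a chance-corrected coefficient concrete, here is a minimal sketch of Cohen's kappa for two annotators. Which kappa variant was used for the figure above is not stated, so the choice of Cohen's kappa is an assumption, and the tag lists are made-up toy data.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two equally long tag sequences."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(tags_a)
    # Observed agreement: share of positions with identical tags.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected agreement: chance that both pick the same tag independently.
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    expected = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same tag throughout
    return (observed - expected) / (1 - expected)


# Toy data: the annotators disagree on the third token only.
print(cohen_kappa(["N", "V", "N", "V"], ["N", "V", "V", "V"]))
```

The function returns kappa on the conventional scale with 1 as perfect agreement, so a value reported as a percentage would correspond to this result multiplied by 100.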

Further annotations
 Many other layers are done manually:
 Translation
 Language of origin
 Coreference

 Entity tagging (people, places...)
 Parallel alignment (with Greek)
 Syntax trees (very preliminary tests)

Representing data – how to look at all this stuff?
 We now have a lot of data to represent:
 Diplomatic transcriptions (including character rendering!)
 Normalization
 Segmentation into words, morphemes, sometimes letters

 Annotations

 How do we encode this data for search and
visualization?

The first challenge: minimal units
 Minimal units, or tokens, are critical for searching:
 Find all words preceding the word "God"
 Give me any mentions of Saint Paphnutius, ±10 words
 Search for the glosses father and son within 20 words

 Two problems:
 The concept of words is complex in Coptic
 Annotations overlap parts of words:
individual letters, line breaks...
 tokens are smaller than words!

ⲡⲉϪⲁϥ ϫⲉ ⲉⲓ̇ⲥ ϣ
ⲙⲟⲩⲛ ⲛ̇ⲣⲟⲙⲡⲉ ⲻ
Ⲡⲉϫⲉ ⲡ̇ϩⲗ̇ⲗⲟ ⲛⲁϥ
he sAid "it's been e
ight years" –
The old man told him
Solution: segmentation layers in ANNIS
 We use the open source ANNIS platform as a search
interface (Zeldes et al. 2009)
 Any annotation layer can be defined as a segmentation
defining alternative views on:
  Adjacency (in words, morphemes, etc.)
  Proximity (in words, morphemes, etc.)
  Context size (in words, morphemes, etc.)

 But which segmentation layer do you want to see?
 Remember, diplomatic and normalized layers don't match
 Any segmentation layer is usable as "base text"
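The idea of counting context in different units can be sketched as spans over shared minimal tokens. This is an illustration of the concept under simple assumptions, not ANNIS's internal data model; `Span`, `LAYERS` and `context` are hypothetical names, and the toy English data imitates a word split by a line break.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # index of the first minimal token covered
    end: int     # index one past the last minimal token covered
    value: str


# Minimal tokens are smaller than words here because a line break
# splits "bandits" into "ban" + "dits".
TOKENS = ["came", "upon", "a", "band", "of", "ban", "dits"]
LAYERS = {
    "tok": [Span(i, i + 1, t) for i, t in enumerate(TOKENS)],
    "norm": [Span(0, 1, "came"), Span(1, 2, "upon"), Span(2, 3, "a"),
             Span(3, 4, "band"), Span(4, 5, "of"), Span(5, 7, "bandits")],
}


def context(layer_name, hit, n):
    """Values of the units within n units of the one covering token `hit`."""
    units = LAYERS[layer_name]
    # Locate the unit of this layer that covers the minimal token index.
    pos = next(i for i, u in enumerate(units) if u.start <= hit < u.end)
    return [u.value for u in units[max(0, pos - n): pos + n + 1]]


print(context("norm", 4, 1))  # context counted in normalized words
print(context("tok", 4, 1))   # same hit, context counted in minimal tokens
```

The same hit yields different neighbours depending on the layer chosen as base text, which is why the user has to pick a segmentation.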
Switching segmentations in ANNIS

Different contexts
 Example search: entity="person"

 Hit: Abba Antonius
 Some options:

Ⲁⲩϭⲱⲗⲡ̇
5 ⲉ̇ⲃⲟⲗ ⲛⲁⲡⲁ ⲁ̇ⲛ
ⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ ⲡ̇ϫⲁⲓ̇ⲉ̇ ·
ϫⲉ ⲟⲩⲛ ⲟⲩⲁ̇ ⲉ̇ϥⲉⲓⲛⲉ̇

 ±5 words, diplomatic: (less than -5 found, since start of text)
Ⲁⲩϭⲱⲗⲡ̇ ⲉ̇ⲃⲟⲗ ⲛⲁⲡⲁ ⲁ̇ⲛⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ⲡ̇ϫⲁⲓ̇ⲉ̇ · ϫⲉⲟⲩⲛⲟⲩⲁ̇ ⲉ̇ϥⲉⲓⲛⲉ̇ ⲙ̇ⲙⲟⲕ

 ±10 morphs, normalized:
ⲁ ⲩ ϭⲱⲗⲡ ⲉⲃⲟⲗ ⲛ ⲁⲡⲁ ⲁⲛⲧⲱⲛⲓⲟⲥ ϩⲓ ⲡ ϫⲁⲓⲉ · ϫⲉ ⲟⲩⲛ ⲟⲩⲁ ⲉ ϥ ⲉⲓⲛⲉ ⲙⲙⲟ ⲕ

 ±5 tokens:
Ⲁ ⲩ ϭⲱⲗⲡ̇ ⲉ̇ⲃⲟⲗ ⲛ ⲁⲡⲁ ⲁ̇ⲛ ⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ ⲡ̇ ϫⲁⲓ̇ⲉ̇ · ϫⲉ

Searching with AQL
(see http://www.sfb632.uni-potsdam.de/annis/ )

 Basic principle of ANNIS Query Language (AQL):
 search for some annotations (#1, #2, #3...)
 stipulate relationships between them (operators)

 Example: verbs of Greek origin
pos="V" &
source_lang="Greek" &
#1 _=_ #2

_=_ is the identical coverage operator
Example hits (in translation): "The head bandit repented"; "I have faith in God"
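What the query above asks for can be sketched outside ANNIS as a join on identical spans. This is only a reading of the `_=_` condition over toy data, not a description of how ANNIS evaluates queries; the tuple layout and the function name are assumptions.

```python
# Each annotation is (start, end, name, value) over minimal token indices.
# The values are toy data.
ANNOTATIONS = [
    (0, 1, "pos", "N"), (0, 1, "source_lang", "Greek"),
    (1, 2, "pos", "V"), (1, 2, "source_lang", "Greek"),
    (2, 3, "pos", "V"),
]


def identical_coverage(annos, first, second):
    """Spans that carry both (name, value) pairs with exactly the same extent."""
    by_span = {}
    for start, end, name, value in annos:
        by_span.setdefault((start, end), set()).add((name, value))
    return sorted(span for span, pairs in by_span.items()
                  if first in pairs and second in pairs)


# Counterpart of: pos="V" & source_lang="Greek" & #1 _=_ #2
print(identical_coverage(ANNOTATIONS, ("pos", "V"), ("source_lang", "Greek")))
```

Only the second token qualifies: the first is Greek but a noun, and the third is a verb without a Greek source annotation.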
Referencing segmentations
 There are many operators
 . (adjacent), _i_ (inclusion), _o_ (overlap), _l_ (left aligned)...
 > (dominance), -> (pointing relation), >@l (left child)...
 ...

 Possible to use segmentations in queries:
 #1 . #2

- one followed by two

 #1 .word #2

- two is the next word after one

 #1 .norm,1,10 #2

- within 1 to 10 norm units

 ...
Adding metadata
 Metadata is like any other constraint, with meta::
prefix
 Can use regular expressions and negation
pos!="V" & source_lang="Greek" &
#1 _=_ #2 & meta::msName=/MONB.*/

 For metadata names and values we use TEI/EpiDoc as
a guideline

 More information on AQL:
http://www.sfb632.uni-potsdam.de/annis/
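The effect of a metadata constraint such as `meta::msName=/MONB.*/` can be sketched as a regular-expression filter over document metadata. The sketch assumes the pattern has to match the whole value; the document names and metadata values are hypothetical placeholders.

```python
import re

# Hypothetical documents with a manuscript-name metadata field.
DOCS = [
    {"name": "doc1", "msName": "MONB.AA"},
    {"name": "doc2", "msName": "OTHER.BB"},
]


def filter_by_meta(docs, key, pattern, negate=False):
    """Names of documents whose metadata value fully matches the pattern.

    With negate=True the test is inverted, like != in a query.
    """
    regex = re.compile(pattern)
    return [d["name"] for d in docs
            if (regex.fullmatch(d.get(key, "")) is not None) != negate]


print(filter_by_meta(DOCS, "msName", r"MONB.*"))
print(filter_by_meta(DOCS, "msName", r"MONB.*", negate=True))
```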

Architecture and formats
 Different formats are suitable for different parts of the
data
 TEI ideal for manuscript structure, metadata
 Linguistic formats for computational corpus linguistics:
tagging, parsing, coreference
 Convert and merge data using SaltNPepper
(Zipser & Romary 2010)
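To illustrate what merging TEI with linguistic formats involves, the sketch below flattens a small TEI fragment into plain text plus stand-off spans that further annotation layers could attach to. It uses only Python's standard library and is not the SaltNPepper API; the function name, the span layout and the sample fragment are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def tei_to_spans(xml_string):
    """Return (plain_text, spans); spans are (start, end, name, value)."""
    root = ET.fromstring(xml_string)
    chars = []   # running plain text, one character per item
    spans = []   # stand-off annotations as character offsets

    def walk(elem):
        start = len(chars)
        if elem.text:
            chars.extend(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                chars.extend(child.tail)
        tag = elem.tag.replace(TEI_NS, "")
        if tag == "hi":      # highlighted text: keep its @rend value
            spans.append((start, len(chars), "hi_rend", elem.get("rend")))
        elif tag == "lb":    # line break: zero-width marker with its number
            spans.append((start, start, "lb", elem.get("n")))

    walk(root)
    return "".join(chars), spans


SAMPLE = ('<ab xmlns="http://www.tei-c.org/ns/1.0"><lb n="1"/>'
          '<hi rend="red tall">A</hi>bc<lb n="2"/>de</ab>')
print(tei_to_spans(SAMPLE))
```

Keeping the text and the spans separate is what allows a tagger's output and the manuscript layout to refer to the same characters.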

SaltNPepper (Zipser & Romary 2010)
 Metamodel Salt for
multiformat conversion
 Work on extending
TEI support: 2014-15

 Salt as internal representation
in ANNIS
How can we view the data?
 Even if we can query everything at once:
 people who are indirect objects of the verb "show" aligned
with Greek neuters...

 Can we also look at everything at once?

 Excerpt from a Salt graph view of two words:

Breaking it down
 Different annotations require different visualizations

 Two conflicting requirements:
 Ideal representation for each layer (syntax -> trees)
  Stay generic and minimize the number of visualizations

 How can we avoid programming new visualizations
with each new annotation layer?

Generic versus dedicated
 For some purposes, dedicated visualizations cannot be
avoided
 Special interactive functionality
  Special layout algorithms

 For other purposes, we can reuse visualizations by
making them flexible and configurable
 Need to take segmentations into account

Some dedicated examples
 Syntax trees

 Coreference view (interactive)

Taking segmentations into account
 Visualizations must be configurable to be aware of different
base texts
 Syntax tree is based on normalized "word"-internal morphs
 Sometimes one syntactic unit has multiple tokens

[Figure: syntax tree over the tokens "band", "of", "ban", "dits"; the normalized text reads "came upon a band of bandits", while the diplomatic text breaks the line inside a word: "band ofban / dits and foundthem / drinking . [...]" (manuscript line 15)]
Reusing dedicated visualizers?
 In some cases, some creative uses can be found for
existing visualizations
 Using the coreference visualizer for parallel alignment:

apophthegmata patrum

Generic visualizations
Two main generic visualizers:

 Annotation grid:
 just mark borders of annotations
 good for flat information

 HTML visualizer:
 generates HTML elements based
on annotations

 defined using two simple stylesheets
 can look like (almost) anything

Multiple grids
 All annotations in one grid can lead to visual overload

 Often better to separate groups of annotations:

The HTML visualizer
 Any specific visualization is configured by two style sheets:
a config file and a CSS file
norm.config
p       p
word    span; style="word"
norm    span; style="norm"        value
trans   t:title; style="trans"    value

norm.css
div.htmlvis {
font-family: Antinoou, sans-serif;
width: 500px;
white-space: normal !important;
}
.trans:hover{color: red}
.word:after{content: " ";}

Result

Abraham our Father

<p>
<t class="trans"
title="Abraham our father wished to
have children with Sarah.">
<span class="word">
<span class="norm">
ⲁⲃⲣⲁϩⲁⲙ
</span>
</span>
<span class="word">
<span class="norm">
ⲡⲉⲛ
</span>
<span class="norm">
ⲉⲓⲱⲧ
</span>
</span>
</t>
...
</p>
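How a small mapping from annotation names to elements can yield nested markup like the result above is sketched below. This is a simplified illustration, not the ANNIS implementation: the rule format, the tie-breaking by rule order for equal spans, and the function `render` are assumptions.

```python
from html import escape

# Annotation name -> (element, attribute or None, CSS class or None, use value).
# Mirrors the idea of norm.config; rule order breaks ties between equal spans.
RULES = {
    "p": ("p", None, None, False),
    "trans": ("t", "title", "trans", True),
    "word": ("span", None, "word", False),
    "norm": ("span", None, "norm", True),
}


def render(spans):
    """Turn properly nested spans (dicts with name/start/end/value) into HTML."""
    order = list(RULES)
    # Outer spans first: earlier start, then longer extent, then rule order.
    spans = sorted(spans, key=lambda s: (s["start"], -s["end"],
                                         order.index(s["name"])))
    out, stack = [], []
    for s in spans:
        # Close every open element that ends before this span begins.
        while stack and stack[-1][0] <= s["start"]:
            out.append(f"</{stack.pop()[1]}>")
        element, attr, css, use_value = RULES[s["name"]]
        tag = f"<{element}"
        if css:
            tag += f' class="{css}"'
        if attr and use_value:   # value goes into an attribute, e.g. title
            tag += f' {attr}="{escape(s["value"])}"'
        out.append(tag + ">")
        if use_value and not attr:   # value becomes the element text
            out.append(escape(s["value"]))
        stack.append((s["end"], element))
    while stack:
        out.append(f"</{stack.pop()[1]}>")
    return "".join(out)


SPANS = [
    {"name": "p", "start": 0, "end": 3, "value": ""},
    {"name": "trans", "start": 0, "end": 3, "value": "Abraham our father"},
    {"name": "word", "start": 0, "end": 1, "value": ""},
    {"name": "norm", "start": 0, "end": 1, "value": "ⲁⲃⲣⲁϩⲁⲙ"},
    {"name": "word", "start": 1, "end": 3, "value": ""},
    {"name": "norm", "start": 1, "end": 2, "value": "ⲡⲉⲛ"},
    {"name": "norm", "start": 2, "end": 3, "value": "ⲉⲓⲱⲧ"},
]
print(render(SPANS))
```

Adding a new visualization then means writing a new rule table and stylesheet rather than new rendering code, which is the point of the generic approach.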

Reusing the HTML visualizer
dipl.config
tok       span                       value
lb        div; style="line"
pb        table:title; style="pb"    value
pb        tr
cb        td; style="cb"
hi_rend   hi_rend:rend               value
Visualizing TEI @rend attributes
dipl.css
div.line{display: block;
height: 22px;
counter-increment: linecount;}
div.line:nth-of-type(5n):before{
content: counter(linecount)" "}
...

.pb{border-style:solid;}
.cb{counter-reset: linecount 0;
width: 160px;
min-width: 160px}

...
hi_rend[rend*=superscript]
{vertical-align: super; font-size: 80%}
hi_rend[rend*=red] {color: red}
hi_rend[rend*=tall] {font-size: 120%}

hi_rend[rend*=extralarge] {font-size: 160%}

Aggregate visualizations
 Latest version of ANNIS offers basic frequency analysis

 Open question: How much more should we build?
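A basic frequency analysis of this kind can be pictured as counting the values of one annotation layer across all hits. The sketch below is a generic illustration with toy data, not the ANNIS feature itself; `frequency_table` is a hypothetical name.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, relative frequency), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, count / total)
            for value, count in counts.most_common()]


# Toy input: language-of-origin values for a handful of hypothetical hits.
rows = frequency_table(["Greek", "Egyptian", "Egyptian", "Greek", "Egyptian"])
for value, count, share in rows:
    print(f"{value}\t{count}\t{share:.0%}")
```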
Aggregate visualizations
 Other visualizations are currently done e.g. in R:

[Figure: plot produced in R of Coptic vocabulary from Gospel of Mark 1 and 11 Apophthegmata Patrum, with regions labelled "Egyptian vocabulary" and "Greek vocabulary"; English glosses shown in the plot include said, Jesus, John, baptism, impure, synagogue, Holy Ghost, Gospel, Abba, old man, wine and eat]
Conclusion
 Annotation projects should not be limited by corpus
architectures:
 annotate whatever you want, however often you want
 link anything to anything

 Why annotate all of these things in the corpus?
(and not just in a separate spreadsheet)

  Plots of just the verbs? Proper names? -> POS tagging
  Highlight, search and link place-names? -> Entity tagging
  Collapse inflected variants? -> Lemmatization
  Collapse prominent referents? -> Coreference annotation
  Dispersion of any of the above, alignment ... and much more

Conclusion
 Anything can be made queryable with more layers:
 typical constructions and objects of verbs?
 Greek vs. native verbs -> add language of origin layer
 Translation behavior -> add alignment layer

 ...

 Fitting visualization facilities
 should be easy to re-use

 optimized to the task, display relevant portions of information
 for many purposes, they must be sensitive to segmentations
Outlook
 This March: BMBF-funded young researcher group on
eHumanities at HU Berlin
 KOMeT:
KOrpuslinguistische Methoden für ePhilologie mit TEI
 Focus on marrying TEI resources with computational linguistics methods
and formats
 Developing NLP tools, search and visualization for ancient world textual
resources
 Pilot phase (2014, approved): Coptic
 Main phase (2015-2019, pending): Other languages as well
 Currently looking for a student assistant (60h/month)

 Stay tuned for more!

Ⲙⲓⲱⲧⲛ ⲧⲱⲛⲟⲩ!
well-being+your.PL greatly
=>
Thanks!
References
 Artstein, Ron & Massimo Poesio (2008), Inter-Coder Agreement for
Computational Linguistics. Computational Linguistics 34(4), 556–596.
 Layton, Bentley (2004), A Coptic Grammar. Second Edition, Revised and
Expanded. (Porta linguarum orientalium 20.) Wiesbaden: Harrassowitz.
 Schmid, Helmut (1994), Probabilistic Part-of-Speech Tagging Using Decision
Trees. In: Proceedings of the Conference on New Methods in Language
Processing. Manchester, UK, 44–49. Available at: http://www.ims.uni-stuttgart.de/ftp/pub/corpora/tree-tagger1.pdf.
 Zeldes, Amir, Julia Ritz, Anke Lüdeling & Christian Chiarcos (2009), ANNIS: A
Search Tool for Multi-Layer Annotated Corpora. In: Proceedings of Corpus
Linguistics 2009. Liverpool, UK.
 Zipser, Florian & Laurent Romary (2010), A Model Oriented Approach to the
Mapping of Annotation Formats using Standards. In: Proceedings of the
Workshop on Language Resource and Language Technology Standards,
LREC-2010. Valletta, Malta, 7–18.
Links
 Coptic SCRIPTORIUM: http://coptic.pacific.edu/
 ANNIS: http://www.sfb632.uni-potsdam.de/annis/

 Search engine for our corpora:
https://korpling.german.hu-berlin.de/annis3/scriptorium

 Papyri.info: http://papyri.info/
 CMCL: http://cmcl.let.uniroma1.it/

More Related Content

Viewers also liked

Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi KayaDigital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi KayaFuture Insights
 
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...SEDIC
 
Digitization of Documentary Heritage Collections in Indic Language Comparativ...
Digitization of Documentary Heritage Collections in Indic LanguageComparativ...Digitization of Documentary Heritage Collections in Indic LanguageComparativ...
Digitization of Documentary Heritage Collections in Indic Language Comparativ...Anup Kumar Das
 
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...Biblioteca Nacional de España
 
Presentation of Europeana Regia at "The Message of the Old Book in the New En...
Presentation of Europeana Regia at "The Message of the Old Book in the New En...Presentation of Europeana Regia at "The Message of the Old Book in the New En...
Presentation of Europeana Regia at "The Message of the Old Book in the New En...Europeana Regia
 
Defragmenting Digitized Manuscripts Sources
Defragmenting Digitized Manuscripts SourcesDefragmenting Digitized Manuscripts Sources
Defragmenting Digitized Manuscripts SourcesDH Benelux
 
Manuscript digitisation
Manuscript digitisationManuscript digitisation
Manuscript digitisationSanjay Goel
 
Biblissima: Medieval Manuscripts and the Semantic Web
Biblissima: Medieval Manuscripts and the Semantic WebBiblissima: Medieval Manuscripts and the Semantic Web
Biblissima: Medieval Manuscripts and the Semantic WebEquipex Biblissima
 
Medieval Music Manuscript Exploration, Baylor Libraries
Medieval Music Manuscript Exploration, Baylor LibrariesMedieval Music Manuscript Exploration, Baylor Libraries
Medieval Music Manuscript Exploration, Baylor Librariesbaylor university
 
MANUSCRIPT ACQUISITION
MANUSCRIPT ACQUISITIONMANUSCRIPT ACQUISITION
MANUSCRIPT ACQUISITIONMaude1
 
Ukad forum 2 march_2011_iams
Ukad forum 2 march_2011_iamsUkad forum 2 march_2011_iams
Ukad forum 2 march_2011_iamsWilliam Stockting
 
Parker Keio 2011: Interoperable Manuscript Framework
Parker Keio 2011: Interoperable Manuscript FrameworkParker Keio 2011: Interoperable Manuscript Framework
Parker Keio 2011: Interoperable Manuscript FrameworkRobert Sanderson
 
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...The Library as a Digital Research infrastructure: Digital Initiatives and Dig...
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...lorna_hughes
 
Treasures of the National Library of Myanmar
Treasures of the National Library of MyanmarTreasures of the National Library of Myanmar
Treasures of the National Library of MyanmarMya OO
 
Fitt Toolbox Best Practice Cluster Collaboration Final
Fitt Toolbox Best Practice Cluster Collaboration FinalFitt Toolbox Best Practice Cluster Collaboration Final
Fitt Toolbox Best Practice Cluster Collaboration FinalFITT
 
2013 RBMS Premodern manuscript application profile presentation
2013 RBMS Premodern manuscript application profile presentation2013 RBMS Premodern manuscript application profile presentation
2013 RBMS Premodern manuscript application profile presentationssteuer
 
Word e PowerPoint per testi di laurea
Word e PowerPoint per testi di laureaWord e PowerPoint per testi di laurea
Word e PowerPoint per testi di laureaAngelo Gino Varrati
 

Viewers also liked (19)

Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi KayaDigital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
Digital Manuscripts Toolkit, using IIIF and JavaScript. Monica Messaggi Kaya
 
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...
XVIII Jornada de Gestión de la Información de SEDIC. Análisis de impacto en r...
 
Digitization of Documentary Heritage Collections in Indic Language Comparativ...
Digitization of Documentary Heritage Collections in Indic LanguageComparativ...Digitization of Documentary Heritage Collections in Indic LanguageComparativ...
Digitization of Documentary Heritage Collections in Indic Language Comparativ...
 
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...
Preservación digital en la BNE: necesidad de un panorama global. Isabel Borde...
 
Presentation of Europeana Regia at "The Message of the Old Book in the New En...
Presentation of Europeana Regia at "The Message of the Old Book in the New En...Presentation of Europeana Regia at "The Message of the Old Book in the New En...
Presentation of Europeana Regia at "The Message of the Old Book in the New En...
 
Defragmenting Digitized Manuscripts Sources
Defragmenting Digitized Manuscripts SourcesDefragmenting Digitized Manuscripts Sources
Defragmenting Digitized Manuscripts Sources
 
Manuscript digitisation
Manuscript digitisationManuscript digitisation
Manuscript digitisation
 
Biblissima: Medieval Manuscripts and the Semantic Web
Biblissima: Medieval Manuscripts and the Semantic WebBiblissima: Medieval Manuscripts and the Semantic Web
Biblissima: Medieval Manuscripts and the Semantic Web
 
Expanding Horizons - Ideas into Practice
Expanding Horizons - Ideas into PracticeExpanding Horizons - Ideas into Practice
Expanding Horizons - Ideas into Practice
 
Medieval Music Manuscript Exploration, Baylor Libraries
Medieval Music Manuscript Exploration, Baylor LibrariesMedieval Music Manuscript Exploration, Baylor Libraries
Medieval Music Manuscript Exploration, Baylor Libraries
 
MANUSCRIPT ACQUISITION
MANUSCRIPT ACQUISITIONMANUSCRIPT ACQUISITION
MANUSCRIPT ACQUISITION
 
Ukad forum 2 march_2011_iams
Ukad forum 2 march_2011_iamsUkad forum 2 march_2011_iams
Ukad forum 2 march_2011_iams
 
Parker Keio 2011: Interoperable Manuscript Framework
Parker Keio 2011: Interoperable Manuscript FrameworkParker Keio 2011: Interoperable Manuscript Framework
Parker Keio 2011: Interoperable Manuscript Framework
 
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...The Library as a Digital Research infrastructure: Digital Initiatives and Dig...
The Library as a Digital Research infrastructure: Digital Initiatives and Dig...
 
Treasures of the National Library of Myanmar
Treasures of the National Library of MyanmarTreasures of the National Library of Myanmar
Treasures of the National Library of Myanmar
 
Buz digital cooperacion
Buz digital   cooperacionBuz digital   cooperacion
Buz digital cooperacion
 
Fitt Toolbox Best Practice Cluster Collaboration Final
Fitt Toolbox Best Practice Cluster Collaboration FinalFitt Toolbox Best Practice Cluster Collaboration Final
Fitt Toolbox Best Practice Cluster Collaboration Final
 
2013 RBMS Premodern manuscript application profile presentation
2013 RBMS Premodern manuscript application profile presentation2013 RBMS Premodern manuscript application profile presentation
2013 RBMS Premodern manuscript application profile presentation
 
Word e PowerPoint per testi di laurea
Word e PowerPoint per testi di laureaWord e PowerPoint per testi di laurea
Word e PowerPoint per testi di laurea
 

Similar to [DCSB] Amiz Zeldes (HU, Berlin) "Towards Digital Coptic: Searching and Visualizing Coptic Manuscript Data"

Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaGiorgia Lodi
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Cornelius Puschmann
 
20110330 bruxelles doc_freedom
20110330 bruxelles doc_freedom20110330 bruxelles doc_freedom
20110330 bruxelles doc_freedomStefan Gradmann
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftSebastian Hellmann
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4semanticsconference
 
Data management for researchers
Data management for researchersData management for researchers
Data management for researchersDirk Roorda
 
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...
SESAM4 - A guide to semantics in the Linked Open Data cloud, Robert HP Engels...The Research Council of Norway, IKTPLUSS
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Antoine Isaac
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Linked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenLinked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenStefan Gradmann
 
Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Dan Brickley
 
20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanities20110324 linked openeuropeanahumanities
20110324 linked openeuropeanahumanitiesStefan Gradmann
 
Fhbib Chronology2
Fhbib Chronology2Fhbib Chronology2
Fhbib Chronology2translit
 
culture victoria lodlam lightningtalk
culture victoria lodlam lightningtalkculture victoria lodlam lightningtalk
culture victoria lodlam lightningtalkDavid F. Flanders
 
APLA OS Session 2008
APLA OS Session 2008APLA OS Session 2008
APLA OS Session 2008Mark Leggott
 
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...
Prof. M. Thaller (Universität Köln) - Toward a reference curriculum in Digita...infoclio.ch
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 

Similar to [DCSB] Amiz Zeldes (HU, Berlin) "Towards Digital Coptic: Searching and Visualizing Coptic Manuscript Data" (20)

Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)
 
20110330 bruxelles doc_freedom
20110330 bruxelles doc_freedom20110330 bruxelles doc_freedom
20110330 bruxelles doc_freedom
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draft
 
WALS and eLanguage (Leipzig)
WALS and eLanguage (Leipzig)WALS and eLanguage (Leipzig)
WALS and eLanguage (Leipzig)
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Aussenac semanticsnl pwebsem2017-v4
Aussenac semanticsnl pwebsem2017-v4Aussenac semanticsnl pwebsem2017-v4
Aussenac semanticsnl pwebsem2017-v4
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 
Data management for researchers
[DCSB] Amir Zeldes (HU, Berlin) "Towards Digital Coptic: Searching and Visualizing Coptic Manuscript Data"

  • 1. Towards Digital Coptic: Searching and Visualizing Coptic Manuscript Data
      Caroline T. Schroeder, University of the Pacific, cschroeder@pacific.edu
      Amir Zeldes, Humboldt-Universität zu Berlin, amir.zeldes@rz.hu-berlin.de
      Berlin Digital Classicist Seminar, 14.1.2014
  • 2. Plan
      - Introduction
          - Coptic data
          - Annotations so far: normalizing, tokenizing and tagging
      - Search architecture
          - Searching through multiple segmentations: ANNIS
          - Dealing with corpus formats: TEI, SaltNPepper
      - Visualization
          - Dedicated visualizations
          - A reusable generic approach
      - Conclusion and outlook
  • 3. Who are these people?
      - Prof. Caroline T. Schroeder: Religious and Classical Studies / Humanities Center Director, University of the Pacific
      - Dr. Amir Zeldes: Korpuslinguistik / SFB 632 Information Structure (from March: eHumanities group KOMeT), Humboldt-Universität zu Berlin
      - The cooperation Coptic SCRIPTORIUM was established at the 2012 NEH summer institute on "Text in a Digital Age" (Tufts): http://coptic.pacific.edu/
  • 4. Why Coptic?
      - Last stage of the Ancient Egyptian language (starting in the 2nd century)
          - Mediterranean in the 1st millennium
          - Hellenistic period
      - Unique language
          - Longest continuous documentation
          - Contact language (with Greek)
      - Religious significance
          - Early Christianity
          - Rise of monasticism
          - Gnosticism
          - ...
      [Figure: Coptic dialects]
  • 5. The data
      - Lots of material (thanks to the Egyptian desert)
      - Relatively little online, nothing like Greek and Latin (Perseus)
      - Lots of things you may want are not available:
          - New Testament (online, but not normalized/lemmatized/annotated)
          - Old Testament
          - The Rule of St. Pachomius
          - Works of Shenoute of Atripe
          - Apophthegmata patrum
          - ...
      - But some have been digitized at some point!
  • 6. A word about the texts in this talk
      - So far we've concentrated on Shenoute's sermon Abraham our Father
          - "As for us, brethren, let us live by the truth so that we are upstanding in all our works, and so that the prophets, apostles and all the saints might dwell among us, ..."
      - Apophthegmata Patrum (sayings of the desert fathers)
          - "They said about the blessed Sarah the virgin that she spent sixty years living at the top of the river and she never set foot outside to see the river."
      - New Testament, esp. Gospel of Mark
      See http://coptic.pacific.edu/ for corpora and tools
  • 7. Getting from raw text to annotated corpora
      - Making the data searchable starts with:
          - Encoding manuscripts (EpiDoc TEI)
          - Segmentation of "word forms"
          - Normalization
          - Segmentation of morphemes
          - Part-of-speech tagging
          - More annotations...
      - Brief recap: detailed talk in Leipzig last month (slides on my page)
  • 8. Normalization
      - Automatic normalization, manual correction
          - handling of known diacritics, abbreviations
          - closed, growing list of known variants
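The normalization step above can be illustrated with a minimal Python sketch. It is not the project's normalizer: the function names are hypothetical, the variant table is passed in by the caller, and the single entry in the usage example is purely illustrative rather than taken from the project's list. The sketch assumes that diacritics are encoded as Unicode combining marks and that normalized forms are lowercase.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (overlines, dots above, ...) and lowercase the rest."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return "".join(kept).lower()


def normalize(form: str, variants: dict[str, str]) -> str:
    """Normalize one diplomatic word form.

    `variants` is a hypothetical lookup table from bare (diacritic-free)
    forms to their normalized spelling, e.g. for abbreviations.
    """
    bare = strip_diacritics(form)
    return variants.get(bare, bare)


if __name__ == "__main__":
    # Illustrative table only: an abbreviated form expanded to a full word.
    table = {"ⲓⲥ": "ⲓⲏⲥⲟⲩⲥ"}
    print(normalize("ⲓ\u0305ⲥ\u0305", table))
    print(normalize("ⲁ\u0307ⲛⲧⲱⲛⲓ\u0307ⲟⲥ", table))
```

Anything the table does not cover falls through with only its diacritics removed, which is where the manual correction mentioned on the slide would come in.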
  • 9. Tokenization
      - Identifying morphemes is non-trivial (agglutinative language, different conventions; we follow Layton 2004)
          - ϫⲓⲛⲧⲁⲓⲣ̅ⲙⲟⲛⲁⲭⲟⲥ 'Since I became a monk': since-that-PAST-1sg-do-monk
          - ⲉⲛⲧⲁϥⲧⲣⲉⲛⲣⲡϣⲁ 'he who made us keep the ceremony': REL-PAST-3sgM-CAUS-1pl-do-the-observance
      - Word level segmentation: manual (no scriptio continua)
      - Morph segmentation: automatic (accuracy: 84% - 94%)
          - ⲛ̄ⲟⲩϣⲏⲣⲉ` ⲛ̄ⲁⲃⲣⲁϩⲁⲙ` (of-a-son of-Abraham) becomes ⲛ ⲟⲩ ϣⲏⲣⲉ ⲛ ⲁⲃⲣⲁϩⲁⲙ (of a son of Abraham)
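The slide does not say how the automatic morph segmenter works. Purely to make the task concrete, here is a toy greedy longest-match segmenter in Python. The technique is a stand-in, not the authors' method, and the function name and the tiny lexicon are assumptions built only from the example on the slide.

```python
def segment(word: str, lexicon: set[str]) -> list[str]:
    """Split a normalized word form into morphs by greedy longest match.

    At each position the longest lexicon entry that fits is taken; material
    not covered by the lexicon is emitted one character at a time.
    """
    morphs = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                morphs.append(word[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: fall back to a single character.
            morphs.append(word[i])
            i += 1
    return morphs


if __name__ == "__main__":
    # Hypothetical mini-lexicon taken from the slide's example only.
    lexicon = {"ⲛ", "ⲟⲩ", "ϣⲏⲣⲉ", "ⲁⲃⲣⲁϩⲁⲙ"}
    print(segment("ⲛⲟⲩϣⲏⲣⲉ", lexicon))
    print(segment("ⲛⲁⲃⲣⲁϩⲁⲙ", lexicon))
```

A greedy strategy like this cannot resolve real ambiguities, which is one plausible reason why segmentation accuracy stays below 100% and needs manual checking.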
  • 10. Part-of-speech tagging
      - POS tagging using TreeTagger (Schmid 1994) and a lexicon from the CMCL project (courtesy of Prof. Tito Orlandi)
      - Two tag sets: fine-grained (45 tags) and coarse (22 tags) (see http://coptic.pacific.edu/ for documentation)
      - Interannotator agreement: 94.19% agreement, kappa = 93.67 (kappa corrects for chance agreement, cf. Artstein & Poesio 2008)
      - Accuracy:
          - In domain, 10-fold cross-validation: 94.04% (fine)
          - Out of domain (test with papyri.info): 79.6% (fine) / 87.7% (coarse)
      - Main difficulties: open classes (N/V), disambiguating homonyms (ⲉ can have 6 different tags!)
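To show what "considers chance agreement" means, here is a small Python sketch of raw agreement versus Cohen's kappa for two annotators. The slide does not state which kappa variant was used, so Cohen's kappa is an assumption, and the tag sequences in the example are invented toy data, not the project's annotations.

```python
from collections import Counter


def cohen_kappa(tags_a: list[str], tags_b: list[str]) -> float:
    """Cohen's kappa for two annotators who tagged the same tokens."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("both annotators must tag the same non-empty sequence")
    n = len(tags_a)
    # Observed agreement: share of tokens with identical tags.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected agreement if each annotator assigned tags independently
    # according to their own tag frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same tag throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = ["N", "V", "N", "PREP"]  # toy data
    annotator_2 = ["N", "V", "V", "PREP"]
    # Observed agreement is 0.75, expected agreement is 5/16, kappa is 7/11.
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.636
```

On the conventional scale kappa is at most 1, so the figure on the slide is presumably 0.9367 expressed as a percentage.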
  • 11. Further annotations
      - Many other layers are done manually:
          - Translation
          - Language of origin
          - Coreference
          - Entity tagging (people, places...)
          - Parallel alignment (with Greek)
          - Syntax trees (very preliminary tests)
  • 12. Representing data: how to look at all this stuff?
      - We now have a lot of data to represent:
          - Diplomatic transcriptions (including character rendering!)
          - Normalization
          - Segmentation into words, morphemes, sometimes letters
          - Annotations
      - How do we encode this data for search and visualization?
  • 13. The first challenge: minimal units
      - Minimal units, or tokens, are critical for searching:
          - Find all words preceding the word "God"
          - Give me any mentions of Saint Paphnutius, ±10 words
          - Search for the glosses father and son within 20 words
      - Two problems:
          - The concept of words is complex in Coptic
          - Annotations overlap parts of words: individual letters, line breaks...
      - So tokens are smaller than words!
      - Example: ⲡⲉϪⲁϥ ϫⲉ ⲉⲓ̇ⲥ ϣ ⲙⲟⲩⲛ ⲛ̇ⲣⲟⲙⲡⲉ ⲻ Ⲡⲉϫⲉ ⲡ̇ϩⲗ̇ⲗⲟ ⲛⲁϥ
        he sAid "it's been e ight years" – The old man told him
  • 14. Solution: segmentation layers in ANNIS
      - We use the open source ANNIS platform as a search interface (Zeldes et al. 2009)
      - Any annotation layer can be defined as a segmentation, defining alternative views on:
          - Adjacency (in words, morphemes, etc.)
          - Proximity (in words, morphemes, etc.)
          - Context size (in words, morphemes, etc.)
      - But which segmentation layer do you want to see?
          - Remember, diplomatic and normalized layers don't match
          - Any segmentation layer is usable as "base text"
  • 15. Switching segmentations in ANNIS
  • 16. Different contexts
      - Example search: entity="person"
      - Hit: Abba Antonius
      - Passage: Ⲁⲩϭⲱⲗⲡ̇ 5 ⲉ̇ⲃⲟⲗ ⲛⲁⲡⲁ ⲁ̇ⲛ ⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ ⲡ̇ϫⲁⲓ̇ⲉ̇ · ϫⲉ ⲟⲩⲛ ⲟⲩⲁ̇ ⲉ̇ϥⲉⲓⲛⲉ̇
      - Some options:
          - ±5 words, diplomatic (fewer than 5 words found to the left, since this is the start of the text):
            Ⲁⲩϭⲱⲗⲡ̇ ⲉ̇ⲃⲟⲗ ⲛⲁⲡⲁ ⲁ̇ⲛⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ⲡ̇ϫⲁⲓ̇ⲉ̇ · ϫⲉⲟⲩⲛⲟⲩⲁ̇ ⲉ̇ϥⲉⲓⲛⲉ̇ ⲙ̇ⲙⲟⲕ
          - ±10 morphs, normalized:
            ⲁ ⲩ ϭⲱⲗⲡ ⲉⲃⲟⲗ ⲛ ⲁⲡⲁ ⲁⲛⲧⲱⲛⲓⲟⲥ ϩⲓ ⲡ ϫⲁⲓⲉ · ϫⲉ ⲟⲩⲛ ⲟⲩⲁ ⲉ ϥ ⲉⲓⲛⲉ ⲙⲙⲟ ⲕ
          - ±5 tokens:
            Ⲁ ⲩ ϭⲱⲗⲡ̇ ⲉ̇ⲃⲟⲗ ⲛ ⲁⲡⲁ ⲁ̇ⲛ ⲧⲱⲛⲓ̇ⲟⲥ ϩⲓ̇ ⲡ̇ ϫⲁⲓ̇ⲉ̇ · ϫⲉ
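The effect of choosing a segmentation for the context window can be sketched in a few lines of Python. This is only a model of the idea, not ANNIS code: minimal tokens are numbered from 0, each segmentation layer is a list of `(start, end)` token ranges (end exclusive), and the function name and the toy layers are assumptions.

```python
Span = tuple[int, int]  # (start, end) in minimal tokens, end exclusive


def context(hit: Span, layer: list[Span], n: int) -> Span:
    """Token range covering the hit plus up to n units of `layer` on each side."""
    overlapping = [i for i, (s, e) in enumerate(layer) if s < hit[1] and e > hit[0]]
    if not overlapping:
        raise ValueError("hit is not covered by this segmentation layer")
    # Clip at the edges of the text: fewer than n units may be available.
    first = max(overlapping[0] - n, 0)
    last = min(overlapping[-1] + n, len(layer) - 1)
    return layer[first][0], layer[last][1]


if __name__ == "__main__":
    # Toy text of 8 minimal tokens, segmented two different ways.
    words = [(0, 2), (2, 3), (3, 6), (6, 8)]
    morphs = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 8)]
    hit = (3, 5)
    print(context(hit, words, 1))   # (2, 8): one word on each side
    print(context(hit, morphs, 1))  # (2, 6): one morph on each side
    print(context((0, 1), words, 5))  # (0, 8): clipped at the start of the text
```

The same hit yields different amounts of surrounding text depending on the layer used for counting, which is the behaviour the three options on the slide illustrate.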
  • 17. Searching with AQL (see http://www.sfb632.uni-potsdam.de/annis/)
      - Basic principle of the ANNIS Query Language (AQL):
          - search for some annotations (#1, #2, #3...)
          - stipulate relationships between them (operators)
      - Example: verbs of Greek origin
          pos="V" & source_lang="Greek" & #1 _=_ #2
          (_=_ is the identical coverage operator)
      - Translations shown with the example: "The head bandit repented", "I have faith in God"
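What the example query asks for can be mimicked in plain Python. The sketch below is not how ANNIS evaluates AQL; it only models annotations as labelled token ranges and keeps the pairs that cover exactly the same range, which is the meaning of `_=_`. The `Anno` type, the function name, and the sample annotations are hypothetical.

```python
from typing import NamedTuple


class Anno(NamedTuple):
    name: str   # annotation layer, e.g. "pos"
    value: str  # annotation value, e.g. "V"
    start: int  # first covered token
    end: int    # one past the last covered token


def identical_coverage(annos, first, second):
    """Pairs (#1, #2) where #1 matches `first`, #2 matches `second`,
    and both cover exactly the same tokens (the `_=_` operator)."""
    lhs = [a for a in annos if (a.name, a.value) == first]
    rhs = [a for a in annos if (a.name, a.value) == second]
    return [(a, b) for a in lhs for b in rhs
            if (a.start, a.end) == (b.start, b.end)]


if __name__ == "__main__":
    corpus = [  # invented toy annotations
        Anno("pos", "V", 0, 1), Anno("source_lang", "Greek", 0, 1),
        Anno("pos", "N", 1, 2), Anno("source_lang", "Greek", 1, 2),
        Anno("pos", "V", 2, 3),
    ]
    # Counterpart of: pos="V" & source_lang="Greek" & #1 _=_ #2
    for verb, origin in identical_coverage(corpus, ("pos", "V"), ("source_lang", "Greek")):
        print(verb.start, verb.end)  # 0 1
```

Only the first verb is returned: the Greek noun fails the `pos` condition and the second verb has no `source_lang` annotation over the same span.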
  • 18. Referencing segmentations
      - There are many operators:
          - . (adjacent), _i_ (inclusion), _o_ (overlap), _l_ (left aligned)...
          - > (dominance), -> (pointing relation), >@l (left child)...
          - ...
      - Segmentations can be used in queries:
          - #1 . #2: #1 is directly followed by #2
          - #1 .word #2: #2 is the next word after #1
          - #1 .norm,1,10 #2: #2 follows #1 within 1 to 10 norm units
          - ...
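The difference between counting distance in one segmentation or another can be sketched as follows. Again this is a simplified Python model, not the ANNIS implementation: layers are lists of `(start, end)` token ranges (end exclusive), and `unit_index` and `precedes` are hypothetical helper names.

```python
Span = tuple[int, int]  # (start, end) in minimal tokens, end exclusive


def unit_index(layer: list[Span], token: int) -> int:
    """Index of the segmentation unit that contains the given token."""
    for i, (start, end) in enumerate(layer):
        if start <= token < end:
            return i
    raise ValueError(f"token {token} is not covered by this layer")


def precedes(a: Span, b: Span, layer: list[Span], lo: int = 1, hi: int = 1) -> bool:
    """True if b starts between lo and hi units of `layer` after the end of a.

    With the defaults this models direct precedence in that layer
    (compare `#1 .word #2`); with lo=1, hi=10 it models `#1 .norm,1,10 #2`.
    """
    distance = unit_index(layer, b[0]) - unit_index(layer, a[1] - 1)
    return lo <= distance <= hi


if __name__ == "__main__":
    words = [(0, 2), (2, 3), (3, 6), (6, 8)]
    morphs = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 8)]
    a, b = (0, 2), (6, 8)
    print(precedes(a, b, words, 1, 3))   # True: 3 words apart
    print(precedes(a, b, morphs, 1, 3))  # False: 4 morphs apart
```

The same pair of annotations satisfies the distance constraint in one layer and fails it in the other, which is why the query has to name the segmentation it counts in.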
  • 19. Adding metadata
      - Metadata is like any other constraint, with the meta:: prefix
      - Can use regular expressions and negation:
          pos!="V" & source_lang="Greek" & #1 _=_ #2 & meta::msName=/MONB.*/
      - For metadata names and values we use TEI/EpiDoc as a guideline
      - More information on AQL: http://www.sfb632.uni-potsdam.de/annis/
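A metadata constraint such as `meta::msName=/MONB.*/` amounts to filtering documents by a regular expression over one metadata field. The Python sketch below models that filter under two assumptions: the pattern has to match the whole value, and documents are plain dictionaries. The manuscript names in the example are made-up placeholders.

```python
import re


def matches_meta(doc_meta: dict[str, str], key: str, pattern: str) -> bool:
    """True if the document has `key` and its value matches `pattern` entirely."""
    value = doc_meta.get(key)
    return value is not None and re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    documents = [  # hypothetical metadata records
        {"title": "doc1", "msName": "MONB.XX"},
        {"title": "doc2", "msName": "OTHER.01"},
        {"title": "doc3"},
    ]
    selected = [d["title"] for d in documents if matches_meta(d, "msName", r"MONB.*")]
    print(selected)  # ['doc1']
```

Documents without the field are simply excluded, so a metadata constraint also acts as a check that the field was filled in.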
  • 20. Architecture and formats
      - Different formats are suitable for different parts of the data
          - TEI is ideal for manuscript structure and metadata
          - Linguistic formats for computational corpus linguistics: tagging, parsing, coreference
      - Convert and merge data using SaltNPepper (Zipser & Romary 2010)
  • 21. SaltNPepper (Zipser & Romary 2010)
      - Metamodel Salt for multiformat conversion
      - Work on extending TEI support: 2014-15
      - Salt as internal representation in ANNIS
  • 22. How can we view the data?
      - Even if we can query everything at once:
          - people who are indirect objects of the verb "show" aligned with Greek neuters...
      - Can we also look at everything at once?
      - Excerpt from a Salt graph view of two words: [figure]
  • 23. Breaking it down
      - Different annotations require different visualizations
      - Two conflicting requirements:
          - Ideal representation for each layer (syntax -> trees)
          - Stay generic and minimize the number of visualizations
      - How can we avoid programming new visualizations with each new annotation layer?
  • 24. Generic versus dedicated
      - For some purposes, dedicated visualizations cannot be avoided
          - Special interactive functionality
          - Special layout algorithms
      - For other purposes, we can reuse visualizations by making them flexible and configurable
          - Need to take segmentations into account
  • 25. Some dedicated examples  Syntax trees  Coreference view (interactive)
  • 26. Taking segmentations into account  Visualizations must be configurable to be aware of different base texts  Syntax tree is based on normalized "word"-internal morphs  Sometimes one syntactic unit has multiple tokens  [Figure: example text "came upon a band of bandits and found them drinking. [...]", shown once with the word "bandits" split into the tokens "ban" and "dits" and once as the syntactic unit "band of bandits"]
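The idea that several segmentations share one base text can be sketched with spans over a layer of minimal tokens, using the "band of bandits" example from this slide. The sketch assumes the diplomatic layer groups the text as band / ofban / dits while the normalized layer has band / of / bandits. The names TOKENS, SEGMENTATIONS, segment_text and overlapping are hypothetical, and the actual Salt/ANNIS data model is richer than this.

```python
# Minimal tokens: every boundary of every segmentation falls between two of them.
TOKENS = ["band", "of", "ban", "dits"]

# Each segmentation is a list of (start, end) spans over TOKENS (end exclusive).
SEGMENTATIONS = {
    "dipl": [(0, 1), (1, 3), (3, 4)],   # assumed diplomatic units: band / ofban / dits
    "norm": [(0, 1), (1, 2), (2, 4)],   # normalized units: band / of / bandits
}


def segment_text(layer):
    """Return the units of one segmentation as strings."""
    return ["".join(TOKENS[start:end]) for start, end in SEGMENTATIONS[layer]]


def overlapping(layer_a, index, layer_b):
    """Return the units of layer_b that share a minimal token with
    unit number `index` of layer_a."""
    start, end = SEGMENTATIONS[layer_a][index]
    return ["".join(TOKENS[b_start:b_end])
            for b_start, b_end in SEGMENTATIONS[layer_b]
            if b_start < end and start < b_end]
```

A visualizer that is "aware of different base texts" can then pick whichever layer it needs: a syntax tree would read segment_text("norm"), while overlapping("norm", 2, "dipl") shows that the single normalized unit "bandits" corresponds to two diplomatic units.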
  • 27. Reusing dedicated visualizers?  In some cases, some creative uses can be found for existing visualizations  Using the coreference visualizer for parallel alignment (example: Apophthegmata patrum)
  • 28. Generic visualizations  Two main generic visualizers:  Annotation grid:  just mark borders of annotations  good for flat information  HTML visualizer:  generates HTML elements based on annotations  defined using two simple stylesheets  can look like (almost) anything
  • 29. Multiple grids  All annotations in one grid can lead to visual overload  Often better to separate groups of annotations
  • 30. The HTML visualizer  Any specific visualization is configured by two style sheets: a config file and a CSS file
      norm.config:
        p      p
        word   span; style="word"
        norm   span; style="norm"       value
        trans  t:title; style="trans"   value
      norm.css:
        div.htmlvis { font-family: Antinoou, sans-serif; width: 500px; white-space: normal !important; }
        .trans:hover{color: red}
        .word:after{content: " ";}
  • 31. Result  Abraham our Father
      <p>
        <t class="translation" title="Abraham our father wished to have children with Sarah.">
          <span class="word"> <span class="norm"> ⲁⲃⲣⲁϩⲁⲙ </span> </span>
          <span class="word"> <span class="norm"> ⲡⲉⲛ </span> <span class="norm"> ⲉⲓⲱⲧ </span> </span>
        </t>
        ...
      </p>
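How a rule table of this kind can turn annotation spans into nested markup is sketched below in Python. The RULES table is a simplified stand-in loosely modelled on norm.config (element name, optional attribute that receives the annotation value, CSS class, and whether the value is printed as text); its exact shape, the span tuples and the function render are assumptions for illustration, not the ANNIS implementation. The sketch also assumes that spans nest properly.

```python
from html import escape

# Hypothetical rule table: annotation name -> (element, attribute, css class, print value?)
# Rule order doubles as nesting priority for spans covering the same range.
RULES = {
    "p":     ("p",    None,    None,    False),
    "trans": ("t",    "title", "trans", False),  # value goes into the title attribute
    "word":  ("span", None,    "word",  False),
    "norm":  ("span", None,    "norm",  True),   # value is printed as text
}


def render(spans):
    """Render properly nested (name, start, end, value) spans as HTML."""
    order = list(RULES)
    # Outer spans first: earlier start, then longer span, then rule priority.
    todo = sorted(spans, key=lambda s: (s[1], -s[2], order.index(s[0])))
    out, open_ends = [], []          # open_ends: stack of (end, closing tag)
    for name, start, end, value in todo:
        # Close every open element that ends at or before this span's start.
        while open_ends and open_ends[-1][0] <= start:
            out.append(open_ends.pop()[1])
        element, attr, style, show_value = RULES[name]
        attrs = f' class="{style}"' if style else ""
        if attr:
            attrs += f' {attr}="{escape(value, quote=True)}"'
        out.append(f"<{element}{attrs}>")
        if show_value:
            out.append(escape(value))
        open_ends.append((end, f"</{element}>"))
    while open_ends:                 # close whatever is still open
        out.append(open_ends.pop()[1])
    return "".join(out)


SPANS = [
    ("p", 0, 3, ""), ("trans", 0, 3, "Abraham our father"),
    ("word", 0, 1, ""), ("word", 1, 3, ""),
    ("norm", 0, 1, "ⲁⲃⲣⲁϩⲁⲙ"), ("norm", 1, 2, "ⲡⲉⲛ"), ("norm", 2, 3, "ⲉⲓⲱⲧ"),
]
```

With these toy spans, render(SPANS) produces a p element containing a t element whose title carries the translation, which in turn wraps two word spans holding one and two norm spans. Changing the look of the output then only requires editing the rule table and the stylesheet, not the rendering code.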
  • 32. Reusing the HTML visualizer
      dipl.config:
        tok      span
        lb       div; style="line"
        pb       table:title; style="pb"
        pb       tr
        cb       td; style="cb"
        hi_rend  hi_rend:rend
      (three of these rules also carry "value" in the third column; the transcript does not preserve which ones)
  • 33. Visualizing TEI @rend attributes
      dipl.css:
        div.line{display: block; height: 22px; counter-increment: linecount;}
        div.line:nth-of-type(5n):before{content: counter(linecount) " "}
        ...
        .pb{border-style:solid;}
        .cb{counter-reset: linecount 0; width: 160px; min-width: 160px}
        ...
        hi_rend[rend*=superscript] {vertical-align: super; font-size: 80%}
        hi_rend[rend*=red] {color: red}
        hi_rend[rend*=tall] {font-size: 120%}
        hi_rend[rend*=extralarge] {font-size: 160%}
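The counter rules in dipl.css number every fifth manuscript line and restart the count in each column. The same numbering logic is written out in Python below purely as an illustration; the function line_labels and the nested-list input are assumptions, and the sketch presupposes that each column's lines are sibling elements so that the 5n selector and the counter stay in step.

```python
def line_labels(columns, step=5):
    """Pair each line with the label it would show: its number within the
    column for every `step`-th line, otherwise an empty string."""
    labelled = []
    for column in columns:
        # The count restarts for each column (the role of counter-reset on .cb).
        for number, line in enumerate(column, start=1):
            label = str(number) if number % step == 0 else ""
            labelled.append((label, line))
    return labelled
```

For a page with a seven-line column followed by a five-line column, only the fifth line of each column receives a label, and both labels read 5.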
  • 34. Aggregate visualizations  Latest version of ANNIS offers basic frequency analysis  Open question: How much more should we build?
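A basic frequency analysis over query results amounts to counting the values of one annotation across all hits. The Python sketch below shows that computation on a toy hit list; the dictionary-based hit format and the function frequency_table are assumptions for illustration and do not describe the ANNIS frequency feature itself.

```python
from collections import Counter


def frequency_table(hits, key):
    """Count the values of annotation `key` across hits and return
    (value, count, relative frequency) rows, most frequent first."""
    counts = Counter(hit[key] for hit in hits)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]
```

The same function works for any annotation layer: passing key="pos" yields a part-of-speech distribution, while a lemma or language-of-origin key would give the corresponding table for those layers.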
  • 35. Aggregate visualizations  Other visualizations are currently done e.g. in R:  [Figure: R plot of Coptic word forms with English glosses, labelled "Gospel of Mark" and "apophthegmata patrum" and "Egyptian vocabulary" and "Greek vocabulary"; plotted items include ⲓⲏⲥⲟⲩⲥ (Jesus), ⲃⲁⲡⲧⲓⲥⲙⲁ (baptism), ⲥⲩⲛⲁⲅⲱⲅⲏ (synagogue) and ⲏⲣⲡ (wine)]
  • 36. Conclusion  Annotation projects should not be limited by corpus architectures:  annotate whatever you want, however often you want  link anything to anything  Why annotate all of these things in the corpus? (and not just in a separate spreadsheet)  Plots of just the verbs? Proper names? -> POS tagging  Highlight, search and link place-names? -> Entity tagging  Collapse inflected variants? -> Lemmatization  Collapse prominent referents? -> Coreference annotation  Dispersion of any of the above, alignment ... and much more
  • 37. Conclusion  Anything can be made queryable with more layers:  typical constructions and objects of verbs?  Greek vs. native verbs -> add language of origin layer  Translation behavior -> add alignment layer  ...  Fitting visualization facilities  should be easy to re-use  optimized to the task, display relevant portions of information  for many purposes, they must be sensitive to segmentations
  • 38. Outlook  This March: BMBF-funded young researcher group on eHumanities at HU Berlin  KOMeT: KOrpuslinguistische Methoden für ePhilologie mit TEI  Focus on marrying TEI resources with computational linguistics methods and formats  Developing NLP tools, search and visualization for ancient world textual resources  Pilot phase (2014, approved): Coptic  Main phase (2015-2019, pending): Other languages as well  Currently looking for a student assistant (60h/month)  Stay tuned for more!
  • 40. References
      Artstein, Ron & Massimo Poesio (2008), Inter-Coder Agreement for Computational Linguistics. Computational Linguistics 34(4), 556–596.
      Layton, Bentley (2004), A Coptic Grammar. Second Edition, Revised and Expanded. (Porta linguarum orientalium 20.) Wiesbaden: Harrassowitz.
      Schmid, Helmut (1994), Probabilistic Part-of-Speech Tagging Using Decision Trees. In: Proceedings of the Conference on New Methods in Language Processing. Manchester, UK, 44–49. Available at: http://www.ims.uni-stuttgart.de/ftp/pub/corpora/tree-tagger1.pdf.
      Zeldes, Amir, Julia Ritz, Anke Lüdeling & Christian Chiarcos (2009), ANNIS: A Search Tool for Multi-Layer Annotated Corpora. In: Proceedings of Corpus Linguistics 2009. Liverpool, UK.
      Zipser, Florian & Laurent Romary (2010), A Model Oriented Approach to the Mapping of Annotation Formats using Standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC-2010. Valletta, Malta, 7–18.
  • 41. Links  Coptic SCRIPTORIUM: http://coptic.pacific.edu/  ANNIS: http://www.sfb632.uni-potsdam.de/annis/  Search engine for our corpora: https://korpling.german.hu-berlin.de/annis3/scriptorium  Papyri.info: http://papyri.info/  CMCL: http://cmcl.let.uniroma1.it/