SlideShare a Scribd company logo
1 of 49
Download to read offline
Data Analysis for Ancient Corpora
Cody Kingham and Dirk Roorda
FAMES, Cambridge, 2019-01-31
0
50
100
150
200
250
conj nmpr subs adjv prep art
Parts of Speech after Atnach in ETCBC Phrase
background
description
mini-study
new horizons
• Put researchers in control of their
data.
• Empower researchers to fully
harness the data available to them.
• Encourage a new paradigm in the
humanities
🤔
"# data
💰
what’s important
limits
researchers
they decide
Text-Fabric and Hebrew Data
• Free, accessible corpus annotation and analysis tool.
• Published the Amsterdam Hebrew data on Github with free,
open-source license.
• Encouraged researchers to step out of their technological
comfort zones.
A Different Vision
• Researchers are in charge of their data and set the agenda for
its use.
• Researchers are empowered with the tools needed for
powerful data analysis.
• Data is made open-source, freely available
Text-Fabric
• Graph model: words, phrases, etc. are “nodes,” relationships
between them are edges.
• We can model complex data structures better than other
methods (e.g. XML).
• All stored in easy-to-understand, plain-text files. No messy
XML, SQL, etc.
&P005381 = MSVO 3, 70
#atf: lang qpc
@tablet
@obverse
@column 1
1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a
1.b. 3(N19) , |GISZ.TE|
2. 1(N14) , NAR NUN~a SIG7
3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a
@column 2
1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a
2. , GU7 AZ SI4~f
@reverse
@column 1
1. 3(N14) , SZE~a
2. 3(N19) 5(N04) ,
3. , GU7
@column 2
1. , AZ SI4~f
CTBA|CTBA#CTBA#CTB###0#0#0#3#1#0#2#0#0#2#0#0#2#0#0#0#0#0 D;L;DOTH|;L;DOT#;L;DOTA#;LD#D#H#0#0#0#3#1#0#3#0#0#2#0#0#2#1#1#3#0#0
D;WOE|;WOE#;WOE#;WOE#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 MW;KA|MW;KA#MW;KA#MWK###0#1#0#3#1#0#2#0#0#0#0#2#0#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DDO;D|DO;D#DO;D#DO;D#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 BRH|
BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DABRHM|ABRHM#ABRHM#ABRHM#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0
ABRHM|ABRHM#ABRHM#ABRHM###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|AOLD#;LD#;LD###0#5#1#0#1#3#2#0#0#0#0#0#0#0#0#0#0#0 LA;SKX|
A;SKX#A;SKX#A;SKX#L##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 A;SKX|A;SKX#A;SKX#A;SKX###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|
Syriac NT (Sedra database)
DEUT33,02 >C- >;71C 1.000 >;71C- >C-
DEUT33,02 DT D.@73T 1.000 D.@73T DT
DEUT33,09 BNW B.@N@73JW 1.000 B.@N@73W BNW
EST 01,16 MWMKN M:MW.K@81N 1.000 M:WM.K@81N MWMKN
EST 03,04 B- K.:- 1.000 B.:- B-
EST 03,04 >MRM >@M:R@70M 1.000 >@M:R@70M >MRM
Hebrew Ketiv-Qere (ETCBC)
Cuneiform Uruk (CDLI)
(1:1:1:1) bi P PREFIX|bi+
(1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN
(1:1:2:1) {ll~ahi PN STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN
(1:1:3:1) {l DET PREFIX|Al+
(1:1:3:2) r~aHoma`ni ADJ STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN
(1:1:4:1) {l DET PREFIX|Al+
(1:1:4:2) r~aHiymi ADJ STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN
(1:2:1:1) {lo DET PREFIX|Al+
(1:2:1:2) Hamodu N STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM
Arabic Quran (Tanzil)
Source data of a corpus
TEI, Markdown, ASCII, Database
Data structure of TF - the IKEA spirit
node
order! order!
stacks of components
uniquely identified
words
phrases
chapters
verses
Conversion to TF
TF does more than half of the work
# Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of
patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good
ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that
really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
@node
@compiler=Dirk Roorda
@description=the letters of a word
@name=Culture quotes from Iain
Banks
@source=Good Reads
@url=https://www.goodreads.com/
work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Everything
about
us
everything
around
us
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s
the
bottom
line
the
final
truth
So letters
@node
@compiler=Dirk Roorda
@description=the punctuation after
a word
@name=Culture quotes from Iain
Banks
@source=Good Reads
@url=https://www.goodreads.com/
work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
3 ,
6 ,
20 ;
24 ,
27 .
38 ,
45 ,
51 ,
55 ?
,
75 ,
78 ,
,
,
83 ,
88 ,
99 .
punc
banks/tf/
author.tf
gap.tf
letters.tf
number.tf
oslots.tf
otext.tf
otype.tf
punc.tf
terminator.tf
title.tf
TF dataset
otype
@node
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
oslots
@edge
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
100 1-99
1-55
56-99
1-3
4-6
7-9,14-20
21-27
28-38
39-51
52-55
56
57-75
76-77,81-83
84-88
89-99
1-27
28-55
56-99
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of
nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really
mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
otext
@config
@compiler=Dirk Roorda
@fmt:text-orig-full={letters}{punc}
@name=Culture quotes from Iain Banks
@sectionFeatures=title,number
@sectionTypes=book,chapter
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Computing - Python - Jupyter notebooks
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb
BHSA
Quran
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/start.ipynb
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/start.ipynb
Syriac NT
Old Babylon'
https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1050 SHEBANQ
Computing - more power!
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/searchFromMQL.ipynb
BHSA
Quran
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/search.ipynb
Quran
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/search.ipynb
Syriac NT
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/syrnt/search.ipynb
Old Babylon'
Uruk
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/uruk/search.ipynb
UrukPower to you! (without the programming)
Uruk
Uruk
Mini-Study:
Atnachs and Phrase Divisions
• How often do atnach accents disagree with the ETCBC phrase
divisions?
• Why?
Sharing and re-using data
Text-Fabric has been developed by a DANS-employee
as a consequence:
Data export is built in ✅
Provenance tracking is built in ✅
Redistribution of newly created data is built in ✅
sharing #1: GitHub & NBviewer
work done in a Jupyter Notebook inside a GitHub repository
is very sharable
https://github.com/Nino-cunei/primers/blob/master/oldbabylonian/OB-primer1.ipynb
sharing #2: Export from TF-browser
sharing #3: Zenodo
sharing #4: Create new features
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/share.ipynb
• etcbc/valence/tf : the results of the verbal valence work of Janet Dyk in the
SYNVAR project;

• etcbc/lingo/heads/tf : head words for phrases, work done by Cody Kingham;

• ch-jensen/Semantic-mapping-of-participants/actor/tf : participant analysis in
progress by Christian Høygaard-Jensen;

• cmerwich/bh-reference-system/tf: participant analysis in progress by
Christiaan Erwich;

• or whatever you have in the making!

• HINT: semantic/fuzzy/plurality for collective nouns (Chip Hardy?)
https://github.com/ETCBC/lingo/tree/master/easter/tf/c
https://github.com/ETCBC/lingo/tree/master/easter/tf/c
Open Science Rocks
thank you
Cody Kingham codykingham@icloud.com
Dirk Roorda dirk.roorda@dans.knaw.nl

More Related Content

Similar to Ancient corpora analysis

Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldaelang
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
The world is y0ur$: Geolocation-based wordlist generation with wordsmith
The world is y0ur$: Geolocation-based wordlist generation with wordsmithThe world is y0ur$: Geolocation-based wordlist generation with wordsmith
The world is y0ur$: Geolocation-based wordlist generation with wordsmithSanjiv Kawa
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webFabien Gandon
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseMax Neunhöffer
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 
A Matching Approach Based on Term Clusters for eRecruitment
A Matching Approach Based on Term Clusters for eRecruitmentA Matching Approach Based on Term Clusters for eRecruitment
A Matching Approach Based on Term Clusters for eRecruitmentKemal Can Kara
 
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps' Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps' Digital History
 
Dark Data In the Long Tail of Science:   Examples in Biology
Dark Data In the Long Tail of Science:  Examples in BiologyDark Data In the Long Tail of Science:  Examples in Biology
Dark Data In the Long Tail of Science:   Examples in BiologyBryan Heidorn
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesBertram Ludäscher
 
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Duncan Hull
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012Enno Meijers
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)Marina Santini
 
Modelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyModelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyAlbert Meroño-Peñuela
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...andimou
 
DB-IR-ranking
DB-IR-rankingDB-IR-ranking
DB-IR-rankingFELIX75
 

Similar to Ancient corpora analysis (20)

Digital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the fieldDigital Humanities: A brief introduction to the field
Digital Humanities: A brief introduction to the field
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
The world is y0ur$: Geolocation-based wordlist generation with wordsmith
The world is y0ur$: Geolocation-based wordlist generation with wordsmithThe world is y0ur$: Geolocation-based wordlist generation with wordsmith
The world is y0ur$: Geolocation-based wordlist generation with wordsmith
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the web
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model database
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
A Matching Approach Based on Term Clusters for eRecruitment
A Matching Approach Based on Term Clusters for eRecruitmentA Matching Approach Based on Term Clusters for eRecruitment
A Matching Approach Based on Term Clusters for eRecruitment
 
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps' Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
Sarah Rees Jones (York) and Helen Petrie: 'Chartex overview and next steps'
 
Dark Data In the Long Tail of Science:   Examples in Biology
Dark Data In the Long Tail of Science:  Examples in BiologyDark Data In the Long Tail of Science:  Examples in Biology
Dark Data In the Long Tail of Science:   Examples in Biology
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
Empirical Semantics
Empirical SemanticsEmpirical Semantics
Empirical Semantics
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012
Presentatie nl.dbpedia.org Datasalon 8 Gent 24 Februari 2012
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Modelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic StudyModelling and Querying Lists in RDF. A Pragmatic Study
Modelling and Querying Lists in RDF. A Pragmatic Study
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
 
Recommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenuRecommandation sociale : filtrage collaboratif et par le contenu
Recommandation sociale : filtrage collaboratif et par le contenu
 
DB-IR-ranking
DB-IR-rankingDB-IR-ranking
DB-IR-ranking
 

More from Dirk Roorda

General Missives
General MissivesGeneral Missives
General MissivesDirk Roorda
 
Text Display (when it gets tricky)
Text Display (when it gets tricky)Text Display (when it gets tricky)
Text Display (when it gets tricky)Dirk Roorda
 
Verbal Valency in Hebrew Verbs
Verbal Valency in Hebrew VerbsVerbal Valency in Hebrew Verbs
Verbal Valency in Hebrew VerbsDirk Roorda
 
Data management for researchers
Data management for researchersData management for researchers
Data management for researchersDirk Roorda
 
Annotating the Hebrew Bible
Annotating the Hebrew BibleAnnotating the Hebrew Bible
Annotating the Hebrew BibleDirk Roorda
 
20151111 utrecht ver theolbibliothecarissen
20151111 utrecht ver theolbibliothecarissen20151111 utrecht ver theolbibliothecarissen
20151111 utrecht ver theolbibliothecarissenDirk Roorda
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleDirk Roorda
 
Datamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDatamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDirk Roorda
 
Datamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDatamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDirk Roorda
 
Hebrew Bible as Data: Laboratory, Sharing, Lessons
Hebrew Bible as Data: Laboratory, Sharing, LessonsHebrew Bible as Data: Laboratory, Sharing, Lessons
Hebrew Bible as Data: Laboratory, Sharing, LessonsDirk Roorda
 
Laf fabric-dh benelux2014
Laf fabric-dh benelux2014Laf fabric-dh benelux2014
Laf fabric-dh benelux2014Dirk Roorda
 
Data Analysis in the Hebrew Bible
Data Analysis in the Hebrew BibleData Analysis in the Hebrew Bible
Data Analysis in the Hebrew BibleDirk Roorda
 
Auto ingest demo-werklunch 2013-11-05
Auto ingest demo-werklunch 2013-11-05Auto ingest demo-werklunch 2013-11-05
Auto ingest demo-werklunch 2013-11-05Dirk Roorda
 

More from Dirk Roorda (20)

TF-FAIR.pdf
TF-FAIR.pdfTF-FAIR.pdf
TF-FAIR.pdf
 
Textpy
TextpyTextpy
Textpy
 
General Missives
General MissivesGeneral Missives
General Missives
 
Text Display (when it gets tricky)
Text Display (when it gets tricky)Text Display (when it gets tricky)
Text Display (when it gets tricky)
 
Tf in-context
Tf in-contextTf in-context
Tf in-context
 
Qdf2tf
Qdf2tfQdf2tf
Qdf2tf
 
Text fabric
Text fabricText fabric
Text fabric
 
Verbal Valency in Hebrew Verbs
Verbal Valency in Hebrew VerbsVerbal Valency in Hebrew Verbs
Verbal Valency in Hebrew Verbs
 
Data management for researchers
Data management for researchersData management for researchers
Data management for researchers
 
Annotating the Hebrew Bible
Annotating the Hebrew BibleAnnotating the Hebrew Bible
Annotating the Hebrew Bible
 
20151111 utrecht ver theolbibliothecarissen
20151111 utrecht ver theolbibliothecarissen20151111 utrecht ver theolbibliothecarissen
20151111 utrecht ver theolbibliothecarissen
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
Datamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDatamanagement for Research: A Case Study
Datamanagement for Research: A Case Study
 
Award
AwardAward
Award
 
Datamanagement for Research: A Case Study
Datamanagement for Research: A Case StudyDatamanagement for Research: A Case Study
Datamanagement for Research: A Case Study
 
Hebrew Bible as Data: Laboratory, Sharing, Lessons
Hebrew Bible as Data: Laboratory, Sharing, LessonsHebrew Bible as Data: Laboratory, Sharing, Lessons
Hebrew Bible as Data: Laboratory, Sharing, Lessons
 
Laf fabric-dh benelux2014
Laf fabric-dh benelux2014Laf fabric-dh benelux2014
Laf fabric-dh benelux2014
 
Data Analysis in the Hebrew Bible
Data Analysis in the Hebrew BibleData Analysis in the Hebrew Bible
Data Analysis in the Hebrew Bible
 
LAF Fabric
LAF FabricLAF Fabric
LAF Fabric
 
Auto ingest demo-werklunch 2013-11-05
Auto ingest demo-werklunch 2013-11-05Auto ingest demo-werklunch 2013-11-05
Auto ingest demo-werklunch 2013-11-05
 

Recently uploaded

Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 

Recently uploaded (20)

Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 

Ancient corpora analysis

  • 1. Data Analysis for Ancient Corpora Cody Kingham and Dirk Roorda FAMES, Cambridge, 2019-01-31 0 50 100 150 200 250 conj nmpr subs adjv prep art Parts of Speech after Atnach in ETCBC Phrase
  • 3. • Put researchers in control of their data. • Empower researchers to fully harness the data available to them. • Encourage a new paradigm in the humanities
  • 4.
  • 5.
  • 6.
  • 8.
  • 9.
  • 10.
  • 12.
  • 13.
  • 14. Text-Fabric and Hebrew Data • Free, accessible corpus annotation and analysis tool. • Published the Amsterdam Hebrew data on Github with free, open-source license. • Encouraged researchers to step out of their technological comfort zones.
  • 15.
  • 16. A Different Vision • Researchers are in charge of their data and set the agenda for its use. • Researchers are empowered with the tools needed for powerful data analysis. • Data is made open-source, freely available
  • 17. Text-Fabric • Graph model: words, phrases, etc. are “nodes,” relationships between them are edges. • We can model complex data structures better than other methods (e.g. XML). • All stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
  • 18. &P005381 = MSVO 3, 70 #atf: lang qpc @tablet @obverse @column 1 1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a 1.b. 3(N19) , |GISZ.TE| 2. 1(N14) , NAR NUN~a SIG7 3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a @column 2 1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a 2. , GU7 AZ SI4~f @reverse @column 1 1. 3(N14) , SZE~a 2. 3(N19) 5(N04) , 3. , GU7 @column 2 1. , AZ SI4~f CTBA|CTBA#CTBA#CTB###0#0#0#3#1#0#2#0#0#2#0#0#2#0#0#0#0#0 D;L;DOTH|;L;DOT#;L;DOTA#;LD#D#H#0#0#0#3#1#0#3#0#0#2#0#0#2#1#1#3#0#0 D;WOE|;WOE#;WOE#;WOE#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 MW;KA|MW;KA#MW;KA#MWK###0#1#0#3#1#0#2#0#0#0#0#2#0#0#0#0#0#0 BRH| BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DDO;D|DO;D#DO;D#DO;D#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 BRH| BR#BRA#BR##H#0#0#0#3#1#0#2#0#0#2#0#0#2#1#1#3#0#0 DABRHM|ABRHM#ABRHM#ABRHM#D##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 ABRHM|ABRHM#ABRHM#ABRHM###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD|AOLD#;LD#;LD###0#5#1#0#1#3#2#0#0#0#0#0#0#0#0#0#0#0 LA;SKX| A;SKX#A;SKX#A;SKX#L##0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 A;SKX|A;SKX#A;SKX#A;SKX###0#0#0#0#0#0#0#0#0#1#0#0#2#0#0#0#0#0 AOLD| Syriac NT (Sedra database) DEUT33,02 >C- >;71C 1.000 >;71C- >C- DEUT33,02 DT D.@73T 1.000 D.@73T DT DEUT33,09 BNW B.@N@73JW 1.000 B.@N@73W BNW EST 01,16 MWMKN M:MW.K@81N 1.000 M:WM.K@81N MWMKN EST 03,04 B- K.:- 1.000 B.:- B- EST 03,04 >MRM >@M:R@70M 1.000 >@M:R@70M >MRM Hebrew Ketiv-Qere (ETCBC) Cuneiform Uruk (CDLI) (1:1:1:1) bi P PREFIX|bi+ (1:1:1:2) somi N STEM|POS:N|LEM:{som|ROOT:smw|M|GEN (1:1:2:1) {ll~ahi PN STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN (1:1:3:1) {l DET PREFIX|Al+ (1:1:3:2) r~aHoma`ni ADJ STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN (1:1:4:1) {l DET PREFIX|Al+ (1:1:4:2) r~aHiymi ADJ STEM|POS:ADJ|LEM:r~aHiym|ROOT:rHm|MS|GEN (1:2:1:1) {lo DET PREFIX|Al+ (1:2:1:2) Hamodu N STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM Arabic Quran (Tanzil) Source data of a corpus TEI, Markdown, ASCII, Database
  • 19. Data structure of TF - the IKEA spirit node order! order! stacks of components uniquely identified words phrases chapters verses
  • 20. Conversion to TF TF does more than half of the work
  • 21. # Consider Phlebas $ author=Iain M. Banks ## 1 Everything about us, everything around us, everything we know [and can know of] is composed ultimately of patterns of nothing; that’s the bottom line, the final truth. So where we find we have any control over those patterns, why not make the most elegant ones, the most enjoyable and good ones, in our own terms? ## 2 Besides, it left the humans in the Culture free to take care of the things that really mattered in life, such as [sports, games, romance,] studying dead languages, barbarian societies and impossible problems, and climbing high mountains without the aid of a safety harness.
  • 22. @node @compiler=Dirk Roorda @description=the letters of a word @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/ work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z Everything about us everything around us everything we know and can know of is composed ultimately of patterns of nothing that’s the bottom line the final truth So letters @node @compiler=Dirk Roorda @description=the punctuation after a word @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/ work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 3 , 6 , 20 ; 24 , 27 . 38 , 45 , 51 , 55 ? , 75 , 78 , , , 83 , 88 , 99 . punc banks/tf/ author.tf gap.tf letters.tf number.tf oslots.tf otext.tf otype.tf punc.tf terminator.tf title.tf TF dataset
  • 23. otype @node @compiler=Dirk Roorda @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 1-99 word 100 book 101-102 chapter 103-114 line 115-117 sentence
  • 24. oslots @edge @compiler=Dirk Roorda @name=Culture quotes from Iain Banks @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @valueType=str @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z 100 1-99 1-55 56-99 1-3 4-6 7-9,14-20 21-27 28-38 39-51 52-55 56 57-75 76-77,81-83 84-88 89-99 1-27 28-55 56-99 1-99 word 100 book 101-102 chapter 103-114 line 115-117 sentence ## 1 Everything about us, everything around us, everything we know [and can know of] is composed ultimately of patterns of nothing; that’s the bottom line, the final truth. So where we find we have any control over those patterns, why not make the most elegant ones, the most enjoyable and good ones, in our own terms? ## 2 Besides, it left the humans in the Culture free to take care of the things that really mattered in life, such as [sports, games, romance,] studying dead languages, barbarian societies and impossible problems, and climbing high mountains without the aid of a safety harness.
  • 25. otext @config @compiler=Dirk Roorda @fmt:text-orig-full={letters}{punc} @name=Culture quotes from Iain Banks @sectionFeatures=title,number @sectionTypes=book,chapter @source=Good Reads @url=https://www.goodreads.com/work/quotes/14366-consider-phlebas @writtenBy=Text-Fabric @dateWritten=2019-01-30T22:20:19Z
  • 26. Computing - Python - Jupyter notebooks https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb BHSA
  • 31. Computing - more power! https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/searchFromMQL.ipynb BHSA
  • 37. UrukPower to you! (without the programming)
  • 38. Uruk
  • 39. Uruk
  • 40. Mini-Study: Atnachs and Phrase Divisions • How often do atnach accents disagree with the ETCBC phrase divisions? • Why?
  • 41. Sharing and re-using data Text-Fabric has been developed by a DANS-employee as a consequence: Data export is built in ✅ Provenance tracking is built in ✅ Redistribution of newly created data is built in ✅
  • 42. sharing #1: GitHub & NBviewer work done in a Jupyter Notebook inside a GitHub repository is very sharable
  • 44. sharing #2: Export from TF-browser
  • 46. sharing #4: Create new features https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/share.ipynb • etcbc/valence/tf : the results of the verbal valence work of Janet Dyk in the SYNVAR project; • etcbc/lingo/heads/tf : head words for phrases, work done by Cody Kingham; • ch-jensen/Semantic-mapping-of-participants/actor/tf : participant analysis in progress by Christian Høygaard-Jensen; • cmerwich/bh-reference-system/tf: participant analysis in progress by Christiaan Erwich; • or whatever you have in the making! • HINT: semantic/fuzzy/plurality for collective nouns (Chip Hardy?)
  • 49. Open Science Rocks thank you Cody Kingham codykingham@icloud.com Dirk Roorda dirk.roorda@dans.knaw.nl