Jack T. Bowers
Melanie Seltmann
Austrian Academy of Sciences - Austrian Center for Digital Humanities
Exploring data models for heterogenous
dialect data:
the case of explore.bread.AT!
Outline of Presentation
Part I: Overview of project & data
Part II: Overview of possible solutions using XML-based
markup standards for representing onomasiological
dialectal language
explore.AT!
Overview:
• DBÖ: collection of Bavarian dialectal speech began 1911
• 2015-2016 converted from TUSTEP to TEI
Goals
• Gain cultural and linguistic insights into Bavarian dialects in former
Austro-Hungarian empire;
• Update and improve the existing body of resources by converting it to
conform with standards and best practice (ISOcat, ISOconcept, etc.);
• Enhance usability and compatibility of data in order to share with
project partners;
• Integration of semantic web/LOD resources;
Project Overview: Datasets
DBÖ@TEI
WBÖ@TEI
BaseX Database
place inventory (TEI-listPlace)
concept inventory (TEI-feature structures)
gram features inventory (TEI-feature structures)
questionnaires (TEI-list)
DBÖ@ema
SQL
BaseX Database
Extracted Topical Datasets
explore.bread
The language of Color
lexicon(location(a))
inventory(lexicalFeature(a))
Possible basis for examination of sub-datasets:
• Domain/Topic-based (exploreBread)
• Location
• Lexical/grammatical features
Visualization
DBÖ Questionnaires
Questionnaires:
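While questionnaires are topical in general, they are a complicated
mixture of semasiological (term-based) and onomasiological
(concept-based) questions,
e.g.
(31B5) bes. Weißgebäcke:
länglich flaches, gerundetes Weißgebäck, z.B. Strutz (l.!),
Strutzen, Strützel, Wecken u.a.; scherzhafte Bez. wie Schendarm
[English: (31B5) particular white-flour baked goods: elongated, flat,
rounded white bread, e.g. Strutz, Strutzen, Strützel, Wecken, etc.;
jocular names such as Schendarm]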
Current means of extracting this information were initially limited to:
• Questionnaires
• String searches in certain data fields
Dataset requires significant manual editing and curation due to nature of
the questionnaires
Desired Enhancements
In most sub-topical studies such as ExploreBread! there would be
potential benefits of having the ability to format data onomasiologically,
for example:
• Domain and/or concept-oriented entries better represent the content of
interest
• Information retrieval
• Ontology mapping
• Etymological &/or Morphosyntactic analysis
• Cross-linguistic (or cross-dialectal) comparison or translation
Problem:
> TEI has no explicitly designated means of
encoding onomasiological data!
Enhancing original data
• Adding domain (onomasiological) and ontology-based sense tags
<sense corresp="concept:Weißgebäck">Weißgebäck</sense>
<usg type="dom" corresp="concept:Brot">Brot</usg>
• Normalization of phonetic notation*
<form type="lautung" n="1">
   <pron notation="tustep">&gt;str-uts</pron>
   <pron notation="ipa" resp="#JB" change="01.2">ʒ̊truːts</pron>
</form>
• Adding Morphological/Compositional Analysis*
<form type="hauptlemma">
   <orth>(S:emmel)zipfel</orth>
</form>
<form type="hauptlemma" resp="#MS">
   <orth>(<seg corresp="concept:Semmel">S:emm<seg ana="#dimin">el</seg></seg>)<seg ana="#stem" corresp="concept:Zipf">zipf<seg ana="#dimin">el</seg></seg></orth>
</form>
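As a minimal sketch of how the segment annotation above could be consumed, the following Python snippet reads the annotated <orth> and lists each top-level segment with its concept link and analysis. The function name top_level_segments is hypothetical, and the snippet ignores TEI namespaces for brevity; it is an illustration, not the project's actual tooling.

```python
import xml.etree.ElementTree as ET

# Annotated headword from the slide, written with straight quotes so it parses.
ORTH = (
    '<orth>(<seg corresp="concept:Semmel">S:emm<seg ana="#dimin">el</seg></seg>)'
    '<seg ana="#stem" corresp="concept:Zipf">zipf<seg ana="#dimin">el</seg></seg></orth>'
)


def top_level_segments(orth_xml):
    """Return (surface text, concept link, analysis) for each direct <seg> child."""
    orth = ET.fromstring(orth_xml)
    rows = []
    for seg in orth.findall("seg"):
        # itertext() joins the segment's own text with that of nested segments.
        surface = "".join(seg.itertext())
        rows.append((surface, seg.get("corresp"), seg.get("ana")))
    return rows


segments = top_level_segments(ORTH)
for row in segments:
    print(row)
```

Nested <seg ana="#dimin"> elements stay part of their parent's surface text here; a fuller analysis could walk them separately.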
Lexical Organization
Semasiological: Starting point is a word form; identifies the
associated meanings and senses
Onomasiological: Starting point is a concept; looks at the forms
used to represent it
Semasiological Lexical Model
Form -> meaning(i), meaning(ii), meaning(iii)
Onomasiological Lexical Model
Concept -> Form(i), Form(ii), Form(iii)
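The relationship between the two models can be sketched as an inversion of a mapping. The Python snippet below is only an illustration under that assumption: the placeholder names mirror the diagram labels, and the function invert is hypothetical.

```python
from collections import defaultdict

# Semasiological view: each form points to its meanings (placeholder labels).
semasiological = {
    "Form(i)": ["meaning(i)", "meaning(ii)"],
    "Form(ii)": ["meaning(i)"],
    "Form(iii)": ["meaning(i)", "meaning(iii)"],
}


def invert(form_to_meanings):
    """Build the onomasiological view: each meaning/concept points to its forms."""
    concept_to_forms = defaultdict(list)
    for form, meanings in form_to_meanings.items():
        for meaning in meanings:
            concept_to_forms[meaning].append(form)
    return dict(concept_to_forms)


onomasiological = invert(semasiological)
for concept, forms in onomasiological.items():
    print(concept, "->", forms)
```

The same data supports both views; what differs is which side serves as the entry point for retrieval.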
Desired Data Structure
Core DBÖ entry datatypes
• Archive record
• Headword (Form)
• POS
• Dialect lemma (Form)
• Gram info
• Meaning (Sense)
• Usage example
• Source
• Place
• Questionnaire
• Etymology
[Diagram: Headword BROT; Lemma(i..n): brot, broet, brɛot; labels: Prôt, Prôt, Prôt]
Desired Onomasiological Model for Extracted
Terminological DBÖ Datasets
TermEntry
Concept(a)
DialectEntry(i), DialectEntry(ii), DialectEntry(n)
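One way to picture the desired structure is as a pair of record types. The Python sketch below is an assumption-laden illustration: the class and field names (TermEntry, DialectEntry, source_id, and so on) are hypothetical, and only the sample values are taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialectEntry:
    headword: str                    # Headword (Form)
    lemma: str                       # Dialect lemma (Form)
    place: Optional[str] = None      # Place
    source_id: Optional[str] = None  # hypothetical: xml:id of the original entry


@dataclass
class TermEntry:
    concept: str                     # Concept(a)
    domain: Optional[str] = None
    entries: List[DialectEntry] = field(default_factory=list)

    def headwords(self):
        """Unique headwords in first-seen order."""
        seen = []
        for entry in self.entries:
            if entry.headword not in seen:
                seen.append(entry.headword)
        return seen


term = TermEntry(concept="concept:Wecken", domain="concept:Brot")
term.entries.append(
    DialectEntry("Wecken", "W.eiggn", "St.Michael/B. Bgl.", "w834_qdb-d1e602b"))
term.entries.append(
    DialectEntry("Strutzen", "Struzn", "Rohrb. OÖ", "s806_qdb-d1e43847b"))
print(term.headwords())
```

Keeping the concept at the top and the dialect entries underneath is the part that matters; the remaining core datatypes (POS, usage example, etymology, etc.) would be further fields on DialectEntry.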
Options using XML-Based Standards
(i) TEI Hacks: Alternate TEI Dictionary format (<entryFree>)
(ii) TEI-TBX Hybrid (Romary, 2014)
OR…. use TEI P4
TEI <entryFree> Model
[Diagram: element hierarchy of the <entryFree> model; the cardinality labels (0…n), (1…n) and (1…1) attached to the individual elements are not recoverable from the extracted text]
<entryFree @xml:id>
   <sense @corresp/>: concept: meaning
      <usg @type="dom">: concept: domain
      <def @xml:lang>
   <superEntry>
      <form type="hauptlemma"> with <orth>: Form (headword(i)), Form (headword(ii))
      <entry @xml:id @xml:lang="bar">: DBÖ entry (headword(i)), DBÖ entry (headword(ii)), with Form (dialect(a)), Form (dialect(b)) and Metadata
         (all other element content copied from the original without alteration)
Also shown in the diagram: a second <form type="hauptlemma"> and a <sense> element
TEI <entryFree> Model
<entryFree>
   <!-- concept: meaning -->
   <sense corresp="concept:Wecken">
      <!-- concept: domain -->
      <usg type="dom" corresp="concept:Brot">Brot</usg>
      <def xml:lang="en" resp="#JB">Oblong loaf of bread</def>
   </sense>
   <!-- one <superEntry> for each unique hauptlemma of the concept entry -->
   <superEntry>
      <!-- Form (headword(i)) -->
      <form type="hauptlemma">
         <orth>Wecken</orth>
      </form>
      <!-- DBÖ entry (headword(i)): Form (dialect(a)) and metadata -->
      <entry xml:id="w834_qdb-d1e602b" xml:lang="bar">
         <!-- hauptlemma removed from here; entry content abbreviated -->
         <form type="lautung" n="1">
            <pron notation="tustep">W.eiggn</pron>
            <pron notation="ipa" resp="#JB" change="01.2">ʋɛiggn̩</pron>
         </form>
         <usg type="geo">
            <placeName>St.Michael/B. Bgl.</placeName>
         </usg>
      </entry>
      <!-- all entries with headword "Wecken" (ii..n) -->
   </superEntry>
   <superEntry>
      <!-- Form (headword(ii)) -->
      <form type="hauptlemma">
         <orth>Strutzen</orth>
      </form>
      <!-- DBÖ entry (headword(ii)): Form (dialect(b)) and metadata -->
      <entry xml:id="s806_qdb-d1e43847b" xml:lang="bar">
         <!-- hauptlemma removed from here; entry content abbreviated -->
         <form type="lautung" n="1">
            <pron notation="tustep">Struzn</pron>
            <pron notation="ipa" resp="#JB" change="01.2">ʃtruzn̩</pron>
         </form>
         <usg type="geo">
            <placeName>Rohrb. OÖ</placeName>
         </usg>
      </entry>
      <!-- all entries with headword "Strutzen" (ii..n) -->
   </superEntry>
</entryFree>
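The regrouping that this model implies can be sketched in a few lines of Python. The snippet below is an illustration only: the function to_entry_free and the wrapping <body> element are assumptions, namespaces are ignored, and the sample entries are abbreviated versions of those on the slide. It hoists each hauptlemma form out of its entry and collects entries with the same headword under one <superEntry>.

```python
import xml.etree.ElementTree as ET

# Two abbreviated semasiological entries, following the slide's sample data.
FLAT = """<body>
  <entry xml:id="w834_qdb-d1e602b" xml:lang="bar">
    <form type="hauptlemma"><orth>Wecken</orth></form>
    <form type="lautung" n="1"><pron notation="tustep">W.eiggn</pron></form>
  </entry>
  <entry xml:id="s806_qdb-d1e43847b" xml:lang="bar">
    <form type="hauptlemma"><orth>Strutzen</orth></form>
    <form type="lautung" n="1"><pron notation="tustep">Struzn</pron></form>
  </entry>
</body>"""


def to_entry_free(flat_xml, concept, domain):
    """Regroup flat entries under one concept-level <entryFree>."""
    body = ET.fromstring(flat_xml)
    entry_free = ET.Element("entryFree")
    sense = ET.SubElement(entry_free, "sense", corresp="concept:" + concept)
    usg = ET.SubElement(sense, "usg", type="dom", corresp="concept:" + domain)
    usg.text = domain
    supers = {}  # headword -> its <superEntry>
    for entry in body.findall("entry"):
        head = entry.find("form[@type='hauptlemma']")
        headword = head.findtext("orth")
        if headword not in supers:
            sup = ET.SubElement(entry_free, "superEntry")
            sup.append(head)  # the headword form moves up to the superEntry
            supers[headword] = sup
        entry.remove(head)    # hauptlemma is removed from the entry itself
        supers[headword].append(entry)
    return entry_free


entry_free = to_entry_free(FLAT, "Wecken", "Brot")
print(ET.tostring(entry_free, encoding="unicode"))
```

Which entries belong to a concept still has to be decided beforehand, for example through the manual curation of questionnaire data described earlier; the sketch only covers the structural step.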
Problems with <entryFree> model
• It is a hack!
• Current TEI guidelines and data model are
inherently and intentionally semasiological, and
this use of the vocabulary is only valid by chance,
not intention.
>Thus using this data model within the TEI will not have
any of the advantages that generally come with its use
TBX-TEI Hybrid
Romary (2014):
Attempts to customize the TEI guidelines to incorporate TBX
(ISO 30042) terminological entries in order to provide TEI with an
onomasiological model
https://github.com/laurentromary/TBXinTEI
TBX-TEI Hybrid
<tbx:termEntry xmlns="http://www.tbx.org" xmlns:tbx="http://www.tbx.org" xmlns:tei="http://www.tei-c.org/ns/1.0"><!-- @xml:id;  -->
   <descrip type="concept" target="concept:Wecken"/> <!-- sense not normally included in TBX! -->
   <descrip type="domain" target="concept:Brot" xml:lang="de">Brot</descrip>
   <descrip type="definition" xml:lang="en">Oblong loaf of bread</descrip>
   <!-- no headword form may occur outside of <langSet> -->
   <langSet xml:id="w834_qdb-d1e602" xml:lang="bar-x-smichael"><!-- language/dialect i) @xml:id;  -->
      <!-- No sense allowed! -->
      <tei:note type="anmerkung" resp="O" corresp="#BD">deren Grundriß ein Oval ist</tei:note>
      <!-- @corresp allowed in TEI <note> but not here -->
      <!-- Most metadata elements are valid using <tei:ref> but are syntactically required to occur before <tig> -->
      <admin type="geo">
         <tei:placeName>St.Michael/B. Bgl.</tei:placeName>
      </admin>
      <tig><!-- <tei:form> would be better -->
         <tei:term type="hauptlemma">Wecken</tei:term>
         <termNote type="transcription">orth</termNote><!-- this is inefficient: need to allow <orth> & <pron> -->
         <termNote type="pos">Subst</termNote><!-- this actually should be applicable to all forms (headword & lemmas) -->
      </tig>
      <tig>
         <tei:term type="lautung" n="1">W.eiggn</tei:term>
         <termNote type="transcription">pron</termNote>
         <termNote type="notation">tustep</termNote><!-- we also need to allow @notation  -->
      </tig>
      <tig><!-- TBX doesn't allow multiple instances of <term> in same <tig> as TEI does with <orth>,<pron> w/in <form> -->
         <tei:term type="lautung" n="1" resp="#JB">ʋɛiggn̩</tei:term>
         <termNote type="transcription" change="1.2">pron</termNote><!-- @change in original not allowed in hybrid schema -->
         <termNote type="notation">ipa</termNote>
      </tig>
   </langSet>
….
Problems with TEI-TBX Hybrid model as
per the ODD Schema from Romary (2014)
• <tig> is verbose and would be better replaced with <form>
• the order of occurrence of elements is too restricted
• the TBX-dominated schema lacks too many attributes (e.g.
@notation) and elements (e.g. <orth>, <pron>) that are key
to the storage and representation of lexical data as used in TEI
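The verbosity point can be made concrete with a small Python sketch that rewrites one TEI <form> into the <tig> pattern used in the hybrid example. This is an illustration under simplifying assumptions: the function form_to_tigs is hypothetical, namespaces and the @n, @resp and @change attributes are left out, and it is not a converter for Romary's actual schema.

```python
import xml.etree.ElementTree as ET

# One TEI <form> holding two transcriptions of the same dialect lemma.
FORM = (
    '<form type="lautung" n="1">'
    '<pron notation="tustep">W.eiggn</pron>'
    '<pron notation="ipa">ʋɛiggn̩</pron>'
    '</form>'
)


def form_to_tigs(form_xml):
    """Expand each <orth>/<pron> child of a <form> into its own <tig>."""
    form = ET.fromstring(form_xml)
    tigs = []
    for child in form:
        tig = ET.Element("tig")
        term = ET.SubElement(tig, "term", type=form.get("type"))
        term.text = child.text
        kind = ET.SubElement(tig, "termNote", type="transcription")
        kind.text = child.tag  # "orth" or "pron" becomes element content
        if child.get("notation"):
            notation = ET.SubElement(tig, "termNote", type="notation")
            notation.text = child.get("notation")
        tigs.append(tig)
    return tigs


tigs = form_to_tigs(FORM)
tei_count = len(list(ET.fromstring(FORM).iter()))
tbx_count = sum(len(list(tig.iter())) for tig in tigs)
print("elements in TEI form:", tei_count)
print("elements in TBX tigs:", tbx_count)
```

Information that TEI carries in element names and attributes has to be restated as <termNote> content, which is where the extra elements come from.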
Conclusion
(i) TEI lacks a legitimate means of encoding terminological/
onomasiological entries;
(ii) Given that we need to include the sense (or a parallel equivalent) and
the headword at the top of an entry, a TBX-TEI hybrid does not work
either without serious modification via ODD, mostly to introduce
elements and features from TEI, which stretches the traditional usage
of the system;
(iii) TEI needs to re-introduce a means of onomasiological data
representation (such as <termEntry>) but with an expanded set of
elements and attributes based on the degree of expressivity in the
Dictionary module
