Etymology Markup in TEI XML



This presentation introduces working recommendations for encoding etymological information in TEI P5 dictionaries. It gives an overview of a reformed package of elements, attributes and structures, proposed as part of an ongoing project at INRIA (France) that seeks a general overhaul of TEI dictionaries. Central to this is the need for a set of LMF-compatible TEI structures, a long-needed step forward in the field of lexical markup.
The presentation demonstrates ways to encode information that is central to linguistics but has not previously been encoded in any known TEI project, such as metaphor, metonymy and phonological change. It also demonstrates structures meant to improve on the existing ways in which lexicographic resources, such as etymological dictionaries, are encoded in TEI.



  1. Recommendations for Encoding Etymological Information Using TEI XML
     Laurent Romary, INRIA, France; Jack T. Bowers, iljackb@gmail.com
     COST ENeL WG2 Meeting, Vienna, 13/02/2015 (revision 06/04/2015)
  2. General Overview of Project
     We are creating a set of structural recommendations for TEI lexical dictionaries, including information relevant to:
       • phonetic and orthographic forms;
       • grammatical information;
       • semantic and meta-linguistic information;
       • variation (on all levels);
       • etymology;
       • mono-, bi- and multilingual dictionaries;
     as well as dictionaries that include encyclopedic information and examples.
     The models involve proposing changes to the TEI P5 Guidelines themselves and defining our constraints on the TEI in an ODD.
  3. Goals for TEI Etymological Markup Recommendations
     (i) address the lack of sufficient digital markup models and standards for representing etymological information;
     (ii) treat exactly the same linguistic information coherently in synchronic and diachronic data structures;
     (iii) provide LMF- and OntoLex-compatible TEI structures;
     (iv) make better use of TEI linking mechanisms for:
       • connecting forms cited in etymologies to their project-internal sources (where possible);
       • making use of existing external resources for lexical and conceptual information not internal to a given project or corpus, e.g. open-source lexical and ontological knowledge bases and linked data resources;
     (v) increase the diversity of etymological information that can be treated, and make more use of concepts from linguistics.
  4. Working TEI Dictionary Metamodel (elements)
     [Figure: diagram of the TEI dictionary elements and their cardinalities (0…1, 0…n, 1…n), including <entry>, <form>, <orth>, <pron>, <gramGrp> (<pos>, <gen>, <number>, <case>, <per>, <tns>, <mood>, <gram>, <colloc>), <sense>, <def>, <usg>, <cit>, <quote>, <etym>, <oRef>, <pRef>, <seg>, <c>, <listChange>/<change>, <lang>, <lbl>, <gloss>, <num>, <note>, <date>, <bibl>, <ref>, <ptr>, <spanGrp>/<span> and <annotationGrp>.]
  5. Two Potential Etymology Structures in TEI
     [Figure: the metamodel with <etym> placed either inside <sense> or directly inside <entry>.]
       • <etym> inside <sense>: if the etymological change has semantic implications;
       • <etym> inside <entry>: if the etymological change has no semantic implications for existing lexical items in the language;
       • both may occur in the same entry to account for unrelated changes that occurred at different stages.
  6. <etym> inside <entry>
     Used if the etymological change has no semantic implications, and/or the semantic change occurred in another language or at a proto-language stage.
     [Figure: metamodel diagram highlighting <etym> inside <entry>.]
     Processes (non-exhaustive):
       • inheritance;
       • phonetic and phonological processes: assimilation (place, manner); epenthesis; metathesis; erosion/deletion (apocope, …); coalescence; tone changes (which have their own internal categories); neutralization;
       • borrowing* (a lexical item imported from another language).
  7. <etym> inside <sense>
     Used when the etymological change has semantic implications.
     [Figure: metamodel diagram highlighting <etym> inside <sense>.]
     Processes:
       • metaphor;
       • metonymy;
       • blending*;
       • compounding;
       • grammaticalization;
       • several of these processes can co-occur.
     *Where multiple etymological processes occur, some semantic in nature and others phonetic, they may all be included in the <etym> inside <sense> if the former enabled the latter.
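The placement rule on slides 5 to 7 can be sketched with Python's standard-library ElementTree. This is a minimal illustration under our own assumptions, not part of the recommendations: the helper attach_etym and the set SEMANTIC_PROCESSES are hypothetical, elements are created without the TEI namespace, and the mixed case marked * on slide 7 is left to the encoder.

```python
import xml.etree.ElementTree as ET

# Hypothetical grouping based on slides 6 and 7: processes with semantic
# implications are attached to <sense>, all others directly to <entry>.
SEMANTIC_PROCESSES = {"metaphor", "metonymy", "blending", "compounding",
                      "grammaticalization"}


def attach_etym(entry: ET.Element, etym_type: str) -> ET.Element:
    """Create <etym type="..."> in the position the process calls for."""
    if etym_type in SEMANTIC_PROCESSES:
        parent = entry.find("sense")
        if parent is None:  # no <sense> yet: create one
            parent = ET.SubElement(entry, "sense")
    else:
        parent = entry
    return ET.SubElement(parent, "etym", {"type": etym_type})


entry = ET.Element("entry")
ET.SubElement(entry, "sense")
attach_etym(entry, "inheritance")  # placed directly under <entry>
attach_etym(entry, "metaphor")     # placed under <sense>
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the decision in one function makes the rule easy to audit, and a project can extend the set of process names without touching the markup code.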
  8. Etymological Processes: Inheritance
     Italian buono < Vulgar Latin bŏnu. The <form> and <sense> elements make up the synchronic entry; <etym> holds the diachronic (etymological) information.

       <entry xml:lang="it" xml:id="buono">
         <form type="lemma">
           <orth>buono</orth>
           <pron notation="ipa">'bwo.no</pron>
           <gramGrp>
             <pos>adj.</pos>
             <gen>masc.</gen>
           </gramGrp>
         </form>
         <sense> .... </sense>
         <etym type="inheritance">
           <cit type="etymon">
             <oRef xml:lang="la">bónŭ</oRef>
             <gramGrp>
               <pos>adj.</pos>
               <gen>masc.</gen>
               <case>nom.</case>
             </gramGrp>
           </cit>
         </etym>
       </entry>

     Note: processes and changes are approximate and meant to demonstrate markup rather than to assert the precise etymological diachrony of individual items.
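To show how the synchronic and diachronic parts of such an entry can be read back, here is a sketch that parses an abridged copy of the slide 8 entry with ElementTree and lists its etymons. The function name etymons is hypothetical; the sketch assumes no TEI namespace and reads xml:lang through the predefined XML namespace URI.

```python
import xml.etree.ElementTree as ET

# Attributes in the xml: namespace are exposed under this URI by ElementTree.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Abridged copy of the slide 8 entry (<pron> and <sense> content omitted).
ENTRY = """<entry xml:lang="it" xml:id="buono">
  <form type="lemma">
    <orth>buono</orth>
    <gramGrp><pos>adj.</pos><gen>masc.</gen></gramGrp>
  </form>
  <sense/>
  <etym type="inheritance">
    <cit type="etymon">
      <oRef xml:lang="la">bónŭ</oRef>
      <gramGrp><pos>adj.</pos><gen>masc.</gen><case>nom.</case></gramGrp>
    </cit>
  </etym>
</entry>"""


def etymons(entry: ET.Element):
    """Yield (process, language, form, grammar) for every etymon in the entry."""
    for etym in entry.iter("etym"):
        for cit in etym.findall("cit[@type='etymon']"):
            oref = cit.find("oRef")
            grammar = {g.tag: g.text for g in cit.findall("gramGrp/*")}
            yield etym.get("type"), oref.get(XML_NS + "lang"), oref.text, grammar


for row in etymons(ET.fromstring(ENTRY)):
    print(row)
```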
  9. Etymological Processes: Inheritance & Phonological Changes
     French bon < Vulgar Latin bónŭ:
       (1) root-level etymological process: inheritance;
       (2) intermediate phonological change: ˈbonu > ˈbon;
       (3) final phonological change: ˈbon > ˈbɔ̃.

       <entry xml:lang="fr" xml:id="bon">
         <form type="lemma">
           <orth>bon</orth>
           <pron notation="ipa">'bɔ̃</pron>
           <gramGrp>
             <pos>adj</pos>
             <gen>masc.</gen>
           </gramGrp>
         </form>
         <sense> .... </sense>
         <etym type="inheritance">
           <cit type="etymon" xml:id="bónŭ" next="#ˈbon">
             <oRef xml:lang="la">bónŭ</oRef>
             <gramGrp>
               <case>nom.</case>
             </gramGrp>
           </cit>
           <cit type="etymon" xml:id="ˈbon" prev="#bónŭ" next="#ˈbɔ̃">
             <pRef xml:lang="fro">ˈbon</pRef>
           </cit>
           <cit type="etymon" xml:id="ˈbɔ̃" prev="#ˈbon">
             <pRef xml:lang="fro">bɔ̃</pRef>
           </cit>
         </etym>
       </entry>

     Note: processes and changes are approximate and meant to demonstrate markup rather than to assert the precise etymological diachrony of individual items.
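Because each etymon points to its neighbours through @prev and @next, the stages can be recovered in order. A minimal sketch, assuming the pointers are written as local "#id" references, as TEI's linking attributes expect; chain is a hypothetical helper, and the guard against revisiting an element only protects against malformed, circular data.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Etymon chain from slide 9, with @prev/@next as "#id" pointers.
ETYM = """<etym type="inheritance">
  <cit type="etymon" xml:id="bónŭ" next="#ˈbon"><oRef xml:lang="la">bónŭ</oRef></cit>
  <cit type="etymon" xml:id="ˈbon" prev="#bónŭ" next="#ˈbɔ̃"><pRef xml:lang="fro">ˈbon</pRef></cit>
  <cit type="etymon" xml:id="ˈbɔ̃" prev="#ˈbon"><pRef xml:lang="fro">bɔ̃</pRef></cit>
</etym>"""


def chain(etym: ET.Element) -> list:
    """Start at the etymon without @prev and follow @next; return the forms in order."""
    cits = {c.get(XML_ID): c for c in etym.findall("cit[@type='etymon']")}
    current = next(c for c in cits.values() if c.get("prev") is None)
    forms, seen = [], set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        forms.append(current[0].text)  # first child: the <oRef> or <pRef>
        target = current.get("next")
        current = cits.get(target.lstrip("#")) if target else None
    return forms


print(" > ".join(chain(ET.fromstring(ETYM))))
```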
 10. Etymological Processes: Borrowing*
     Description of lexical process:
       • a language takes a lexical item from a different language;
       • also known as loaning or importing;
       • there is often a historical and practical explanation for the need.
     Key linguistic concepts:
       • source language;
       • source form(s), phonetic and orthographic;
       • importing language;
       • imported or borrowed form;
       • semantic/meta-linguistic concept.
     [Figure: source language and source form(s) (orth(i..n), pron(i..n)) mapped to the importing language and borrowed form(s) (orth(i..n), pron(i..n)), linked to a meta-linguistic concept.]
 11. Etymological Processes: Borrowing*
     Japanese takushī < English taxi(cab)

       <entry xml:id="taxi" xml:lang="jpn">
         <form type="lemma">
           <orth type="transliterated" notation="romaji">takushī</orth>
           <orth notation="katakana">タクシー</orth>
           <pron notation="ipa">taku'shi:</pron>
           <gramGrp>
             <pos>noun</pos>
           </gramGrp>
         </form>
         <sense corresp="http://dbpedia.org/page/Taxicab">
           <usg type="dom">transportation</usg>
         </sense>
         <etym type="borrowing">
           <lbl>source</lbl>
           <lang>English</lang>
           <cit type="etymon">
             <oRef corresp="http://en.wiktionary.org/wiki/taxi" xml:lang="en">taxi</oRef>
             <pRef notation="ipa" corresp="http://en.wiktionary.org/wiki/taxi#Pronunciation" xml:lang="en-US">'tæksi</pRef>
           </cit>
         </etym>
       </entry>

     Callouts on the slide identify the importing language, the source language, the borrowed form(s), the source form(s) and the meta-linguistic concept.
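The borrowing pattern can also be generated programmatically, for instance when converting tabular loanword data. The sketch below is one possible construction with ElementTree; borrowing_etym and its parameters are hypothetical and cover only the elements used in the 'takushī' example. For brevity it uses one language code for both <oRef> and <pRef>, whereas the slide distinguishes en and en-US.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def borrowing_etym(lang_name, lang_code, orth, pron, orth_ref=None, pron_ref=None):
    """Build <etym type="borrowing"> after the pattern of the 'takushī' entry.

    Hypothetical helper: orth_ref / pron_ref are optional links (@corresp)
    to external lexical and pronunciation resources for the source term.
    """
    etym = ET.Element("etym", {"type": "borrowing"})
    ET.SubElement(etym, "lbl").text = "source"
    ET.SubElement(etym, "lang").text = lang_name
    cit = ET.SubElement(etym, "cit", {"type": "etymon"})
    oref = ET.SubElement(cit, "oRef", {XML_LANG: lang_code})
    oref.text = orth
    if orth_ref:
        oref.set("corresp", orth_ref)
    pref = ET.SubElement(cit, "pRef", {"notation": "ipa", XML_LANG: lang_code})
    pref.text = pron
    if pron_ref:
        pref.set("corresp", pron_ref)
    return etym


etym = borrowing_etym("English", "en", "taxi", "'tæksi",
                      orth_ref="http://en.wiktionary.org/wiki/taxi")
print(ET.tostring(etym, encoding="unicode"))
```

ElementTree serialises the namespaced attribute key back to xml:lang, so the output follows the attribute names used on the slide.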
 12. TEI Model for Japanese 'takushī'; Etymological Process: Borrowing
     [Figure: the lexical entry's elements: <entry @xml:id>; <form type="lemma"> with <orth @type @notation>, <pron @notation> and <gramGrp>/<pos>; <sense @corresp> with <usg type="dom">; <etym type="borrowing"> with <lbl>, <lang> and <cit type="etymon">. <sense @corresp> points to an ontological resource for the entry, <oRef @corresp @xml:lang> to an external lexical entry resource for the source term, and <pRef @notation @corresp @xml:lang> to an external pronunciation resource for the source term.]
 13. Etymological Processes: Metaphor
     Description of process:
       • lexical innovation based in human cognition;
       • describing/understanding one concept (x) in terms of another concept (y);
       • requires a change in semantic domains;
       • the mapping between concepts is limited to certain salient attributes;
       • results in lexical polysemy.
     Key components:
       • domain of concept (y): source domain;
       • domain of concept (x): target domain.
     [Figure: a source concept and a target concept, each with its domain profile, linked by salient attributes; the lexical source form(s) (phonetic, orthographic) become polysemous lexical form(s).]
 14. Etymological Processes: Metaphor
     Mixtepec-Mixtec 'ntuchi' (bean > kidney)
     [Figure: source concept 'bean' (source domain profile: Legumes, Food) mapped to target concept 'kidney' (target domain profile: Body, Internal Organs) through the salient attributes color and shape; the lexical source form ntuchi [ndù.ʧí] becomes a polysemous lexical form.]
 15. Etymological Processes: Metaphor
     The kidney sense points to the DBpedia ontology entry for 'kidney' and its etymon points to the entry for 'bean', whose sense in turn points to the DBpedia entry for 'pinto bean'.

       <entry xml:id="kidney">
         <form type="lemma">
           <orth>ntuchi</orth>
           <pron notation="ipa">ndù.ʧí</pron>
           <!-- gramGrp cluster -->
         </form>
         <sense corresp="http://dbpedia.org/resource/Kidney">
           …..
           <usg type="dom">Body</usg>
           <usg type="dom">InternalOrgans</usg>
           <etym type="metaphor">
             <cit type="etymon">
               <oRef corresp="#bean">ntuchi</oRef>
               <pRef corresp="#bean">ndù.ʧí</pRef>
               <gloss>bean</gloss>
             </cit>
           </etym>
         </sense>
       </entry>

       <entry xml:id="bean">
         <form type="lemma">
           <orth>ntuchi</orth>
           <pron notation="ipa">ndù.ʧí</pron>
           <!-- gramGrp cluster -->
         </form>
         …..
         <sense corresp="http://dbpedia.org/resource/Pinto_bean">
           <usg type="dom">Legume</usg>
           <usg type="dom">Food</usg>
           …….
           <!-- translation info here -->
         </sense>
       </entry>
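The @corresp pointers make the source entry of a metaphor machine-resolvable. A sketch, assuming both entries sit in one document (wrapped here in a hypothetical <body>) and that local pointers take the form "#id"; resolve is a hypothetical helper, and external URIs such as the DBpedia links are deliberately left unresolved.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged 'ntuchi' entries from slide 15 in a hypothetical container.
DOC = """<body>
  <entry xml:id="kidney">
    <form type="lemma"><orth>ntuchi</orth></form>
    <sense corresp="http://dbpedia.org/resource/Kidney">
      <etym type="metaphor">
        <cit type="etymon"><oRef corresp="#bean">ntuchi</oRef><gloss>bean</gloss></cit>
      </etym>
    </sense>
  </entry>
  <entry xml:id="bean">
    <form type="lemma"><orth>ntuchi</orth></form>
    <sense corresp="http://dbpedia.org/resource/Pinto_bean">
      <usg type="dom">Legume</usg>
    </sense>
  </entry>
</body>"""


def resolve(doc: ET.Element, pointer: str):
    """Return the element whose xml:id matches a local '#id' pointer, else None."""
    if not pointer.startswith("#"):
        return None  # external URI (e.g. DBpedia): not resolvable locally
    target = pointer[1:]
    return next((e for e in doc.iter() if e.get(XML_ID) == target), None)


doc = ET.fromstring(DOC)
oref = doc.find(".//etym[@type='metaphor']/cit/oRef")
source = resolve(doc, oref.get("corresp"))
print(source.get(XML_ID), source.find("sense").get("corresp"))
```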
 16. TEI Model for Mixtepec-Mixtec 'ntuchi'; Etymological Process: Metaphor
     [Figure: the lexical entry (kidney) and the source entry (bean), each with <entry @xml:id>, <form type="lemma"> (<orth>, <pron @notation>, <gramGrp>/<pos>), a <sense @corresp> pointing to an ontological resource, <usg type="dom"> and <cit type="translation" @xml:lang>. The kidney sense contains <etym type="metaphor"> with <cit type="etymon">, whose <oRef @corresp> and <pRef @corresp @notation> point to the source entry, plus <gloss>.]
 17. Etymological Processes: Metonymy
     Description of lexical process:
       • concept (y) stands for concept (x);
       • no change in semantic domains;
       • one "vehicle" entity provides mental access to another (i.e. a target) within the same domain;
       • results in (synchronic) polysemy.
     Key linguistic concepts:
       • source concept (cognitive);
       • target concept (cognitive);
       • source form (lexical);
       • target form (lexical).
     [Figure: vehicle concept and target concept within the same domain (X).]
 18. Etymological Processes: Metonymy
     Mixtepec-Mixtec 'kiti' (horse). The first entry encodes the vehicle concept ('animal'), the second the target concept ('horse').

       <entry xml:id="animal">
         <form type="lemma">
           <orth>kiti</orth>
           <pron notation="ipa">kì.tí</pron>
           <!-- gramGrp here -->
         </form>
         <sense corresp="http://dbpedia.org/resource/Animal">
           <usg type="dom">Living Beings</usg>
           <usg type="dom">Animal</usg>
           <cit type="translation" xml:lang="eng">
             <oRef>animal</oRef>
           </cit>
           <!-- other translations here -->
         </sense>
       </entry>

       <entry xml:id="animal-horse">
         <form type="lemma">
           <orth>kiti</orth>
           <pron notation="ipa">kì.t̪í</pron>
           <!-- gramGrp here -->
         </form>
         <sense corresp="http://dbpedia.org/resource/Horse">
           <usg type="dom">Animal</usg>
           <etym type="metonymy">
             <date notBefore="1517"/>
             <cit type="etymon">
               <oRef corresp="#animal">kiti</oRef>
               <pRef notation="ipa" corresp="#animal">kì.t̪í</pRef>
               <gloss>animal</gloss>
             </cit>
             <note>This lexical item reflects history: since there were no horses in Mexico until the arrival of the Spanish, there was no Mixtecan word for 'horse', so the categorical noun for 'animal' was used to describe the previously unnamed animal.</note>
           </etym>
           <cit type="translation" xml:lang="eng">
             <oRef>horse</oRef>
           </cit>
           <!-- other translations here -->
         </sense>
       </entry>
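The same pointers can be followed in the other direction to find every entry derived from a given source entry. A sketch under the same assumptions as before (abridged copies of the slide 18 entries, no TEI namespace); derived_from is a hypothetical helper that also reports the @notBefore date when one is given.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged 'kiti' entries from slide 18 in a hypothetical container.
DOC = """<body>
  <entry xml:id="animal">
    <form type="lemma"><orth>kiti</orth></form>
    <sense corresp="http://dbpedia.org/resource/Animal"/>
  </entry>
  <entry xml:id="animal-horse">
    <form type="lemma"><orth>kiti</orth></form>
    <sense corresp="http://dbpedia.org/resource/Horse">
      <etym type="metonymy">
        <date notBefore="1517"/>
        <cit type="etymon"><oRef corresp="#animal">kiti</oRef></cit>
      </etym>
    </sense>
  </entry>
</body>"""


def derived_from(doc: ET.Element, entry_id: str) -> list:
    """List (entry id, process, notBefore) for entries whose etymon points at entry_id."""
    hits = []
    for entry in doc.iter("entry"):
        for etym in entry.iter("etym"):
            refs = [r.get("corresp") for r in etym.iter("oRef")]
            if "#" + entry_id in refs:
                date = etym.find("date")
                not_before = date.get("notBefore") if date is not None else None
                hits.append((entry.get(XML_ID), etym.get("type"), not_before))
    return hits


print(derived_from(ET.fromstring(DOC), "animal"))
```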
 19. TEI Model for Mixtepec-Mixtec 'kiti' (horse); Etymological Process: Metonymy
     [Figure: the lexical entry (horse) and the source entry (animal), each with <entry @xml:id>, <form type="lemma"> (<orth>, <pron @notation>, <gramGrp>/<pos>), a <sense @corresp> pointing to an ontological resource, <usg type="dom"> and <cit type="translation" @xml:lang> with <oRef>. The horse sense contains <etym type="metonymy"> with <date @notBefore>, <note> and <cit type="etymon">, whose <oRef @corresp> and <pRef @corresp @notation> point to the source entry, plus <gloss>.]
 20. Etymological Processes: Compounding
     Description of lexical process:
       • combines the surface forms of two lexical items to form a new one;
       • the result is the sum of its lexical and semantic parts;
       • can involve metaphor, metonymy and/or grammaticalization.
     [Figure: Etymon(i)* and Etymon(ii)*, each with its own grammatical info, semantic/meta-linguistic info and etymological processes (0..n).]
 21. Etymological Processes: Compounding (with Metonymy)
     Mixtepec-Mixtec Yucha Nchu’u 'Puebla State'. Primary etymological process: compounding. Etymological process of Etymon(2): metonymy, where the salient attribute of the location is "the presence of hummingbirds".

       <entry xml:id="Puebla-state" xml:lang="mix" type="compound">
         <form type="lemma">
           <orth><seg corresp="#lake">Yucha</seg> <seg corresp="#hummingbird">Nchu’u</seg></orth>
           <!-- <gramGrp> here --> …..
         </form>
         <sense corresp="http://dbpedia.org/resource/Puebla_State">
           <etym type="compounding">
             <!-- Etymon(1) -->
             <cit type="etymon">
               <oRef corresp="#lake">Yucha</oRef>
               <gramGrp>
                 <pos>concreteNoun</pos>
               </gramGrp>
               <gloss>lake</gloss>
             </cit>
             <!-- Etymon(2), itself the result of metonymy -->
             <etym type="metonymy">
               <cit type="etymon">
                 <oRef corresp="#hummingbird">Nchu’u</oRef>
                 <gramGrp>
                   <pos>concreteNoun</pos>
                 </gramGrp>
                 <gloss>hummingbird</gloss>
               </cit>
             </etym>
           </etym>
           ….
         </sense>
       </entry>
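For compound entries, the <seg> elements of the lemma carry the link to each component. This sketch, with hypothetical helpers components and surface, pairs each segment with its @corresp target and rebuilds the full written form, including any text between the segments.

```python
import xml.etree.ElementTree as ET

# Lemma of the compound entry from slide 21 (abridged).
FORM = """<form type="lemma">
  <orth><seg corresp="#lake">Yucha</seg> <seg corresp="#hummingbird">Nchu’u</seg></orth>
</form>"""


def components(form: ET.Element) -> list:
    """Pair each <seg> of the lemma with the entry it points to via @corresp."""
    return [(seg.text, seg.get("corresp")) for seg in form.iterfind("orth/seg")]


def surface(form: ET.Element) -> str:
    """Rebuild the written form: itertext() also yields text between and inside children."""
    return "".join(form.find("orth").itertext())


form = ET.fromstring(FORM)
print(surface(form))  # Yucha Nchu’u
print(components(form))
```

The same two functions apply unchanged to the peut-être lemma on slide 29, where the hyphen is encoded in <c>: surface() returns "peut-être" because itertext() also yields the text of <c>.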
 22. Etymological Processes: Compounding
     TEI model for Mixtepec-Mixtec "Yucha Nchu’u"
     [Figure: the lexical entry <entry @xml:id type="compound">, with <form type="lemma"> (<orth> split into <seg @corresp> components, <gramGrp>/<pos>) and a <sense @corresp> pointing to an ontological resource; the sense contains <etym type="compounding"> with one <cit type="etymon"> per component (<oRef @corresp>, <gramGrp>/<pos>, <gloss>), one of them within <etym type="metonymy">.]
 23. Alt (2006) LMF etymology extension proposal, merged with the LMF Core package
     [Figure: diagram of the LMF classes Lexical Resource, Global Information, Lexical DB, Lexical Entry, Form, Form Representation, Text Representation, Statement, Sense, Definition, Etymology, Etymological Link and Etymon, with their cardinalities (0…1, 0…n, 1…1, 1…n).]
 24. Alt (2006) LMF Etymology Extension: Borrowing Stage
     Etymology of French 'pamplemousse', from the Trésor de la Langue Française (TLF):
     Etymological stage "Loan Word" (e.g. borrowing), Dutch pompelmoes → Modern French pamplemousse:
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF"
       /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima"
     Etymological stage "Composition" (e.g. compounding), Dutch pompel + limoes → pompelmoes:
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable"
       /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé"
       /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron"
 25. Alt (2006) LMF Etymology Extension: Borrowing Stage, converted to TEI markup
     LMF (Dutch pompelmoes → Modern French pamplemousse):
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF"
       /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima"
     ≈ converted TEI markup:

       <entry xml:id="LE1" xml:lang="fr">
         <form type="lemma">
           <orth>pamplemousse</orth>
           ....
         </form>
         <sense> .... </sense>
         <etym type="borrowing">
           …..
           <cit type="etymon" xml:id="L2">
             <oRef xml:lang="nl">pompelmoes</oRef>
             <gloss xml:lang="la">Citrus maxima</gloss>
             <gramGrp>
               <pos>commonNoun</pos>
               <gen>feminine</gen>
             </gramGrp>
             <note>probablement de l’origine tamoule, De Vries, Nederl</note>
           </cit>
           <!-- 'compounding' section goes here -->
           …..
           <ref target="#TLF">TLF</ref>
           …..
         </etym>
       </entry>

     Note: our TEI structures do not explicitly use an equivalent of /etymologicalLink/ or /source/=".." /target/="…", as this link is implicitly present in the XML data structure.
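The conversion from LMF feature structures to TEI shown here is largely a renaming exercise. A sketch, assuming the LMF /etymon/ features are available as a Python dict keyed by the feature names on the slide; lmf_etymon_to_tei and its mapping are our own illustration, not a published converter.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lmf_etymon_to_tei(feats: dict) -> ET.Element:
    """Convert an LMF /etymon/ feature set into <cit type="etymon">.

    Hypothetical mapping: orth -> <oRef>, language -> <oRef>/@xml:lang,
    gloss -> <gloss>, pos / gender -> <gramGrp>/<pos> and <gramGrp>/<gen>.
    """
    cit = ET.Element("cit", {"type": "etymon"})
    oref = ET.SubElement(cit, "oRef", {XML_LANG: feats["language"]})
    oref.text = feats["orth"]
    if "gloss" in feats:
        ET.SubElement(cit, "gloss").text = feats["gloss"]
    gram = {tag: value for tag, value in (("pos", feats.get("pos")),
                                          ("gen", feats.get("gender"))) if value}
    if gram:
        grp = ET.SubElement(cit, "gramGrp")
        for tag, value in gram.items():
            ET.SubElement(grp, tag).text = value
    return cit


pompelmoes = {"orth": "pompelmoes", "language": "nl", "pos": "commonNoun",
              "gender": "feminine", "gloss": "Citrus Maxima"}
print(ET.tostring(lmf_etymon_to_tei(pompelmoes), encoding="unicode"))
```

The /etymologicalLink/ itself has no counterpart in the output: as the slide notes, the link is carried by the nesting of the markup.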
 26. Alt (2006) LMF Etymology Extension: Compounding Stage
     Etymology of French 'pamplemousse', from the Trésor de la Langue Française (TLF), with the compounding stage highlighted:
     Etymological stage "Composition" (e.g. compounding), Dutch pompel + limoes → pompelmoes:
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable"
       /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé"
       /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron"
     Etymological stage "Loan Word" (e.g. borrowing), Dutch pompelmoes → Modern French pamplemousse:
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF"
       /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima"
 27. …ation of Alt (2006) LMF Etymology Extension: Compounding Stage
     LMF (historical Dutch pompel + limoes → Modern French pamplemousse):
       /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable"
       /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé"
       /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron"
     ≈ converted TEI markup:

       <entry xml:id="LE1" xml:lang="fr">
         <form type="lemma">
           <orth>pamplemousse</orth>
           ....
         </form>
         <sense> .... </sense>
         <etym type="borrowing">
           <!-- 'borrowing' section goes here -->
           <etym type="compounding">
             <cit type="etymon">
               <oRef xml:lang="nl">pompel</oRef>
               <gramGrp>
                 <pos>adjective</pos>
               </gramGrp>
               <gloss>gros, enflé</gloss>
             </cit>
             <cit type="etymon">
               <oRef xml:lang="nl">limoes</oRef>
               <gramGrp>
                 <pos>commonNoun</pos>
               </gramGrp>
               <gloss>citron</gloss>
             </cit>
             <ref target="#Boulan-König">Boulan, König...</ref>
           </etym>
         </etym>
       </entry>

     Note: our TEI structures do not explicitly use an equivalent of /etymologicalLink/ or /source/=".." /target/="…", as this link is implicitly present in the XML data structure.
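If the compounding stage is encoded as an <etym> nested inside the borrowing stage (one possible arrangement of the fragments on these slides, assumed here), the chain of stages can be read by recursion, from the most recent stage back. stages is a hypothetical helper and the markup is abridged.

```python
import xml.etree.ElementTree as ET

# Abridged 'pamplemousse' etymology: compounding nested inside borrowing.
ETYM = """<etym type="borrowing">
  <cit type="etymon"><oRef xml:lang="nl">pompelmoes</oRef></cit>
  <ref target="#TLF">TLF</ref>
  <etym type="compounding">
    <cit type="etymon"><oRef xml:lang="nl">pompel</oRef></cit>
    <cit type="etymon"><oRef xml:lang="nl">limoes</oRef></cit>
    <ref target="#Boulan-König">Boulan, König...</ref>
  </etym>
</etym>"""


def stages(etym: ET.Element, depth: int = 0):
    """Yield (depth, process, etymon forms, source pointers), outermost stage first."""
    forms = [o.text for o in etym.findall("cit[@type='etymon']/oRef")]
    sources = [r.get("target") for r in etym.findall("ref")]
    yield depth, etym.get("type"), forms, sources
    for inner in etym.findall("etym"):  # only direct child stages
        yield from stages(inner, depth + 1)


for depth, process, forms, sources in stages(ET.fromstring(ETYM)):
    print("  " * depth + f"{process}: {' + '.join(forms)} {sources}")
```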
 28. Etymological Processes: Borrowing & Compounding
     TEI model for 'pamplemousse' as converted from LMF (Alt 2006)
     [Figure: the lexical entry <entry @xml:id type="compound">, with <form type="lemma"> (<orth> split into <seg @corresp> components and <c>, <gramGrp>/<pos>) and <sense>; <etym type="borrowing"> with <lbl>, <lang>, <ref @target> and a <cit type="etymon"> (<oRef @xml:lang>, <gloss @xml:lang>, <gramGrp> with <pos> and <gen>, <note>); <etym type="compounding"> with <ref @target> and one <cit type="etymon"> per component (<oRef @xml:lang>, <gloss @xml:lang>, <gramGrp>/<pos>).]
 29. PEUT-ÊTRE, adv.
     Encoding from existing sources: synchronic portion of entry (Trésor de la Langue Française)
     TLF text: Étymol. et Hist. 1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet estre (Eneas, 9003, ibid.); début xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.); 2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2); 3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2); 4. fin xiies. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1); 5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6). Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être* (i.e. a compound of peut, 3rd person singular present indicative of pouvoir, and être).

       <entry xml:id="peut-être" xml:lang="fr" type="compound">
         <form type="lemma">
           <orth><seg corresp="#pouvoir-3s-pres-ind">peut</seg><c>-</c><seg corresp="#être">être</seg></orth>
           <gramGrp>
             <pos>adv.</pos>
           </gramGrp>
         </form>
         …
       </entry>

     For "compound" entry types, @corresp can optionally be used on the <seg> element to point to the individual subcomponents of the item, within a project or externally.
 30. PEUT-ÊTRE, adv.
     Encoding from existing sources: non-linguistic content portion of diachronic entry (Trésor de la Langue Française; source text: the "Étymol. et Hist." section quoted on slide 29)

       ….
       <etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist">
         <lbl>Étymol. et Hist.</lbl>
         <num>1.</num> ……
         <num>2.</num> …..
         <num>3.</num> ……
         <num>4.</num> …..
         <num>5.</num> ……
         <note>Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*.</note>
       </etym>
       …
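Converting a TLF paragraph into the <num>-delimited structure above first requires splitting it at its section numbers. A sketch with Python's re module; the pattern is an assumption that holds for this entry (a single digit followed by ". " at the start of the text or after whitespace, so that references such as "54, 13" never match) and would need checking against other TLF entries. sections is a hypothetical helper.

```python
import re

TLF = ("Étymol. et Hist. 1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); "
       "ca 1160 puet estre (Eneas, 9003, ibid.); début xves. peut-estre "
       "(Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.); "
       "2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2); "
       "3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie "
       "(Beaumarchais, Barbier de Séville, II, 2); "
       "4. fin xiies. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); "
       "1641 peut-estre que (Corneille, Cinna, III, 1); "
       "5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6).")

# Assumed section marker: one digit + ". " at the start or after whitespace.
SECTION = re.compile(r"(?:^|(?<=\s))(\d)\.\s")


def sections(text: str) -> dict:
    """Map each section number to its body, e.g. for one <num> plus its attestations."""
    parts = SECTION.split(text)  # [label, "1", body1, "2", body2, ...]
    return {int(n): body.strip().rstrip(";")
            for n, body in zip(parts[1::2], parts[2::2])}


for number, body in sections(TLF).items():
    print(number, body[:50])
```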
 31. PEUT-ÊTRE, adv.
     Encoding from existing sources: diachronic portion of entry (Trésor de la Langue Française; source text: the "Étymol. et Hist." section quoted on slide 29)

       ….
       <sense>
         <etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist" type="inheritance">
           <lbl>Étymol. et Hist.</lbl>
           <num>1.</num> ……
           <num>2.</num> …..
           <num>3.</num> ……
           <num>4.</num> …..
           <num>5.</num> ……
           <note>Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*.</note>
         </etym>
       </sense>
       …

     Template for each attestation:

       <cit type="attestation">
         <date> </date>
         <oRef> </oRef>
         <gramGrp>
           <!-- appropriate element here -->
         </gramGrp>
         <bibl> </bibl>
         <note> </note>
       </cit>
       ….
 32. Encoding from existing sources: diachronic portion of entry (Trésor de la Langue Française)
     ISO 639-3 codes: fro = Old French (842 - ca. 1400); frm = Middle French (ca. 1400 - 1600).
     Source text (section 1.): 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); début xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.).

       <cit type="attestation">
         <date notBefore="1100" notAfter="1150">1re moitié du xiies</date>
         <oRef xml:lang="fro">put cel estre</oRef>
         <bibl>(Psautier Oxford, 54, 13 ds T.-L.)</bibl>
       </cit>
       <cit type="attestation">
         <date notBefore="1400" notAfter="1450">début xves</date>
         <oRef xml:lang="frm">peut-estre</oRef>
         <bibl>(Quinze joies mariage, éd. J. Rychner, XII, 12)</bibl>
       </cit>
       <cit type="attestation">
         <date when="1824">1824</date>
         <oRef>peut-être bien</oRef>
         <bibl>(Joubert, loc. cit.)</bibl>
       </cit>
       ….
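The @notBefore/@notAfter values are derived from TLF century expressions. A sketch of that normalisation, assuming the convention that the first half of the n-th century runs from (n-1)*100 to (n-1)*100+50; by this convention "1re moitié du xiie s." maps to 1100-1150, and "début xve s." (taken as the first half) to 1400-1450. roman_to_int and century_half are hypothetical helpers.

```python
# Hypothetical helpers for turning TLF century expressions into year ranges.
ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a lower- or upper-case Roman numeral such as 'xii' to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # subtractive notation: a smaller value before a larger one counts negatively
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def century_half(numeral: str, half: int) -> tuple:
    """(notBefore, notAfter) for the first (1) or second (2) half of a century."""
    start = (roman_to_int(numeral) - 1) * 100
    return (start, start + 50) if half == 1 else (start + 50, start + 100)


print(century_half("xii", 1))  # (1100, 1150)
print(century_half("xv", 1))   # (1400, 1450)
```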
 33. Conclusions and Summary
     Our TEI recommendations can facilitate:
       • linking and integrating corresponding data structures between the synchronic and diachronic levels;
       • the use of open-source lexical resources and ontological information;
       • a more principled and consistent set of TEI guidelines for digitally encoding etymological information;
       • better compatibility between information traditionally kept and formatted separately in etymological dictionaries, lexical dictionaries and linguistic analyses;
       • models for encoding ubiquitous processes of linguistic change at multiple levels of language;
       • theoretically agnostic data structures;
       • a more diverse set of etymological examples for the TEI Guidelines.
