mFiL 2015 1
Linguistic markup and processing
of transclusion in XML documents
Simon Dew BA MISTC
6 November 2015
Copyright © Simon Dew 2015.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Transclusion
• Theodor Holm Nelson, 1981: Literary Machines
• The inclusion of an electronic document, or part of a document, in
the rendering of another document.
• The main document does not contain a copy of the transcluded
text, but only a reference to it.
• The software used to render the document obtains the transcluded
material and incorporates it into the main work.
Ted Nelson photo by Dgies
Licensed under CC BY-SA 3.0
This presentation focuses on transclusion in XML (Extensible Markup
Language) documents, including, but not limited to:
• DocBook
• DITA
• TEI
• XHTML
Transclusion can be large scale / context-free.
Transclusion can be small scale / parametrised:
• General entities
Definition:
<!ENTITY device "Euro 500">
Reference:
<title>Configuring the &device;</title>
Result:
<title>Configuring the Euro 500</title>
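The entity expansion above can be reproduced with a short Python sketch. It assumes the entity is declared in the document's internal DTD subset and uses Python's standard-library parser purely for illustration; the `SOURCE` string is a made-up wrapper around the slide's definition and reference, not part of the original material.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares the general
# entity, and the title refers to it.
SOURCE = """<!DOCTYPE title [
  <!ENTITY device "Euro 500">
]>
<title>Configuring the &device;</title>"""

# The parser replaces &device; with its declared value while reading.
title = ET.fromstring(SOURCE)
print(title.text)
```

The main document holds only the reference; the replacement text is supplied at parse time, which is what makes this a small-scale form of transclusion.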
• XInclude
Definition:
<phrase xml:id="device">Euro 500</phrase>
Reference:
<title>Configuring the <xi:include
xpointer="xpath(id('device')/node())"/></title>
Result:
<title>Configuring the Euro 500</title>
• Specific transclusion mechanisms, e.g. DITA conref
Definition:
<ph id="device">Euro 500</ph>
Reference:
<title>Configuring the <ph conref="device"/></title>
Result:
<title>Configuring the <ph>Euro 500</ph></title>
Transcluded content may vary.
1. Local redefinition
2. Conditional processing:
• Conditional profiling (DocBook):
<xsl:param name="profile.vendor" select="'ACME'"/>
• DITAVAL files (DITA):
<val>
<prop action="include" att="product" val="ACME"/>
<prop action="exclude" att="product" val="Yoyodyne"/>
</val>
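To make the effect of conditional processing concrete, here is a minimal Python sketch of exclusion by attribute value. It is a simplification and not the DocBook profiling stylesheets or a DITA processor: the function name `apply_conditions`, the sample `RESOURCE` document and the use of a `product` attribute on `<phrase>` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def apply_conditions(root, att, excluded):
    """Remove every descendant whose `att` attribute value is in `excluded`.

    Note: removing an element this way also drops any text that follows it
    (its tail), which is acceptable for this whitespace-only example.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(att) in excluded:
                parent.remove(child)
    return root

# Hypothetical resource with one variant of the term per product.
RESOURCE = """<resource>
  <phrase product="ACME">Euro 500</phrase>
  <phrase product="Yoyodyne">Oz 500</phrase>
</resource>"""

root = apply_conditions(ET.fromstring(RESOURCE), "product", {"Yoyodyne"})
names = [p.text for p in root.findall("phrase")]
print(names)
```

Because the surviving variant depends on the build conditions, the text that is eventually transcluded is not known when the surrounding sentence is written.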
Linguistic consequences
A different form of the transcluded word or phrase may be required
depending on the environment into which it is placed:
• Orthography, e.g. writing systems with upper case
• Syntactic case
• Definiteness
• Number
• Others, e.g. initial consonant mutation
Example (English): the term “organisational unit” must appear in title case.
<title>_____ Details</title>
organisational unit [TITLE CASE]
Example (Swedish, “If necessary, select the organisational unit.”): the term “organisationsenhet” must appear in the definite form.
<para>Om nödvändigt, välj _____.</para>
organisationsenhet [+DEFINITE]
If the transcluded word or phrase is the head of a phrase, it may
demand agreement from dependent words.
• Phonetics
• Gender
• Number
• Case
• Definiteness
Example (English): “Oz 500” begins with a vowel sound, so the preceding article must be “an”.
<para>Configuring a _____ Server</para>
Oz 500 [_V]
Example (French): “tablette” begins with a consonant sound and is feminine singular, so “le” and “auquel” must agree with it.
<para>Pour configurer le _____ auquel le modem est connecté :</para>
tablette [_C] [FEM] [SING]
Principles
1. Linguistic markup scheme
Defining transcluded term:
• Mark up all forms of term to be transcluded
• Mark up features which affect dependent words
Where transcluded term required:
• Mark up required form
• Mark up dependent words
2. Linguistic pre-processing
Markup
XML attributes
• Extend markup schema
• Wrapper element:
DocBook <phrase>
DITA <ph>
HTML <span>
• Namespace:
http://stanleysecurity.github.io/PACBook/ns/linguistics
• Prefix:
ling
ling:pron    Phonetic environment (V, C, ...)
ling:num     Grammatical number (sg, pl, ...)
ling:case    Grammatical case (nom, gen, dat, acc, ...)
ling:gen     Grammatical gender (c, m, f, n, ...)
ling:class   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
ling:orth    Orthographic case (upper, lower, title, sentence)
ling:type    head: form of a head word; depend: dependent word
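As an illustration of what the four ling:orth values could mean in practice, the following Python sketch applies each one to a term. The function `set_orth` is hypothetical, and the title-case rule used here (capitalise the first letter of every space-separated word) is a deliberate simplification, not a description of how the PACBook stylesheets behave.

```python
def set_orth(text, orth):
    """Return `text` in the orthographic case named by `orth`."""
    if orth == "upper":
        return text.upper()
    if orth == "lower":
        return text.lower()
    if orth == "title":
        # Naive title case: capitalise every word, including short ones.
        return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))
    if orth == "sentence":
        # Capitalise only the first letter of the whole string.
        return text[:1].upper() + text[1:]
    raise ValueError(f"unknown orthographic case: {orth}")

print(set_orth("organisational unit", "title") + " Details")
```

A real implementation would need language-specific rules, since casing conventions for titles differ between languages.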
Resource — features of head noun that demand agreement
<resource xl:label="Product_Name">
<phrase vendor="ACME" ling:pron="C">Euro 500</phrase>
<phrase vendor="Yoyodyne" ling:pron="V">Oz 500</phrase>
</resource>
Phonetic environment:
⟨Euro⟩ /ˈjʊəɹəʊ/ _C
⟨Oz⟩ /ˈɒz/ _V
Resource — all possible forms of head noun:
<resource xl:label="Org_Unit">
<phrase ling:gen="c" ling:num="sg">
<phrase ling:type="head" ling:case="nom"
ling:class="ind">organisationsenhet</phrase>
<phrase ling:type="head" ling:case="gen"
ling:class="ind">organisationsenhets</phrase>
<phrase ling:type="head" ling:case="nom"
ling:class="def">organisationsenheten</phrase>
<phrase ling:type="head" ling:case="gen"
ling:class="def">organisationsenhetens</phrase>
</phrase>
</resource>
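The way a required form might be picked out of such a resource can be sketched in Python as follows. This is an assumption-laden illustration rather than the LingHead.xsl logic: the function `select_head` and its defaults (nominative, indefinite) are hypothetical, and the `xl:label` attribute is left out because the slides do not give the namespace bound to the `xl` prefix.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given on the Markup slide.
LING = "http://stanleysecurity.github.io/PACBook/ns/linguistics"

RESOURCE = f"""<resource xmlns:ling="{LING}">
  <phrase ling:gen="c" ling:num="sg">
    <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase>
    <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase>
  </phrase>
</resource>"""

def select_head(resource, case="nom", cls="ind"):
    """Return the head form matching the requested case and class, if any."""
    for el in resource.iter("phrase"):
        if (el.get(f"{{{LING}}}type") == "head"
                and el.get(f"{{{LING}}}case") == case
                and el.get(f"{{{LING}}}class") == cls):
            return el.text
    return None

resource = ET.fromstring(RESOURCE)
print(select_head(resource, cls="def"))
```

Requesting only the features that differ from the defaults keeps the markup at the point of use short, which is how the document examples in this deck are written.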
Document — mark up required form of transcluded term
<para>Om nödvändigt, välj <phrase ling:class="def"
content:ref="Org_Unit"/>.</para>
<title><phrase ling:orth="title"
content:ref="Org_Unit"/> Details</title>
Document — mark up dependent words in text
<title>Configuring <wordasword ling:type="depend">a</wordasword>
<phrase content:ref="Product_Name"/> Server</title>
<para>Wenn
<phrase>
<wordasword ling:type="depend">ein</wordasword>
<phrase content:ref="Device"/>
</phrase>
konfiguriert wird, werden die Details
<phrase>
<wordasword ling:type="depend">der</wordasword>
<phrase content:ref="Device" ling:case="gen"/>
</phrase>
auf der Weboberfläche angezeigt.</para>
Dictionary
Complies with dictionaries module of the TEI.
<entry n="a">
<form>
<gramGrp><usg value="C"/></gramGrp>
<orth>a</orth>
</form>
<form>
<gramGrp><usg value="V"/></gramGrp>
<orth>an</orth>
</form>
</entry>
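A lookup against an entry like this one can be sketched in Python. The function `lookup` and its fallback to the entry's `n` attribute are assumptions for illustration; the entry is parsed without a namespace because the slide shows none.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry n="a">
  <form><gramGrp><usg value="C"/></gramGrp><orth>a</orth></form>
  <form><gramGrp><usg value="V"/></gramGrp><orth>an</orth></form>
</entry>"""

def lookup(entry, pron):
    """Return the written form of the entry for a phonetic environment."""
    for form in entry.findall("form"):
        usg = form.find("gramGrp/usg")
        if usg is not None and usg.get("value") == pron:
            return form.findtext("orth")
    # Assumed fallback: use the entry's own key when no form matches.
    return entry.get("n")

entry = ET.fromstring(ENTRY)
# "Oz 500" is marked ling:pron="V" in the resource, so "an" is selected.
print(f"Configuring {lookup(entry, 'V')} Oz 500 Server")
```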
<usg>    Phonetic environment (V, C, ...)
<num>    Grammatical number (sg, pl, ...)
<case>   Grammatical case (nom, gen, dat, acc, ...)
<gen>    Grammatical gender (c, m, f, n, ...)
<oVar>   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
<orth>   Output
Software
Transformational stylesheets
PACBook XSLT transformations:
• LingHead.xsl: selects the required declension of head nouns.
• LingDepend.xsl: inflects dependent words.
• LingCasing.xsl: sets the orthographic case of specified text.
Licence:
GNU Lesser General Public License (LGPL) v3
Repository:
https://github.com/STANLEYSecurity/PACBook
Limitations
• Only noun phrases.
• Only tested with a small handful of languages.
• Linguistic markup different for translated texts.
• Linguistic markup can be complex for authors.
Related work
• Various linguistic markup schemas / ontologies
• Internationalisation markup
• Nothing else?
• What should we call this?
Collaboration
• Dictionary: Wiktionary.
• Testing and improving.
• Integrating with other publication workflows.
Development fork:
https://github.com/janiveer/PACBook
Example
Resource:
<resource xl:label="Doc">
<phrase outputformat="PDF" ling:gen="n" ling:num="sg">
<phrase ling:type="head" ling:case="nom">Dokument</phrase>
<phrase ling:type="head" ling:case="acc">Dokument</phrase>
<phrase ling:type="head" ling:case="gen">Dokuments</phrase>
<phrase ling:type="head" ling:case="dat">Dokument</phrase>
</phrase>
<phrase outputformat="CHM" ling:gen="f" ling:num="sg">
<phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
</phrase>
</resource>
Document:
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase content:ref="Doc" ling:case="dat"/>
nicht enthalten.</para>
After transclusion:
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase ling:case="dat">
<phrase outputformat="PDF" ling:gen="n" ling:num="sg">
<phrase ling:type="head" ling:case="nom">Dokument</phrase>
<phrase ling:type="head" ling:case="acc">Dokument</phrase>
<phrase ling:type="head" ling:case="gen">Dokuments</phrase>
<phrase ling:type="head" ling:case="dat">Dokument</phrase>
</phrase>
<phrase outputformat="CHM" ling:gen="f" ling:num="sg">
<phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
<phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
After head transformation:
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase ling:case="dat">
<phrase outputformat="PDF" ling:gen="n" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Dokument</phrase>
</phrase>
<phrase outputformat="CHM" ling:gen="f" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
After conditional processing:
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase ling:case="dat">
<phrase outputformat="PDF" ling:gen="n" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Dokument</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase ling:case="dat">
<phrase outputformat="CHM" ling:gen="f" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
After dependent transformation:
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">diesem</wordasword>
<phrase ling:case="dat">
<phrase outputformat="PDF" ling:gen="n" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Dokument</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
<para>Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dieser</wordasword>
<phrase ling:case="dat">
<phrase outputformat="CHM" ling:gen="f" ling:num="sg">
<phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
</phrase>
</phrase>
nicht enthalten.</para>
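The final step can be approximated in Python. This is a sketch under stated assumptions, not the LingDepend.xsl implementation: `DICTIONARY` is a hypothetical in-memory stand-in for the TEI dictionary and lists only the two forms shown above, and the function `inflect_dependents` assumes that the head phrase is the element immediately following the dependent word.

```python
import xml.etree.ElementTree as ET

LING = "http://stanleysecurity.github.io/PACBook/ns/linguistics"

def q(name):
    """Qualified attribute name in the ling namespace."""
    return f"{{{LING}}}{name}"

# Hypothetical dictionary: (case, gender, number) -> inflected form.
DICTIONARY = {
    "dies": {("dat", "n", "sg"): "diesem", ("dat", "f", "sg"): "dieser"},
}

# The CHM variant as it stands after conditional processing.
PARA = f"""<para xmlns:ling="{LING}">Die Einstellung der IP-Adresse ist in
<wordasword ling:type="depend">dies</wordasword>
<phrase ling:case="dat">
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase>
nicht enthalten.</para>"""

def inflect_dependents(para):
    """Replace each dependent word with the form its head demands."""
    children = list(para)
    for i, el in enumerate(children[:-1]):
        if el.get(q("type")) != "depend":
            continue
        outer = children[i + 1]          # carries the required case
        features = outer.find("phrase")  # carries gender and number
        if features is None:
            continue
        key = (outer.get(q("case")), features.get(q("gen")),
               features.get(q("num")))
        # Leave the word unchanged when no matching form is listed.
        el.text = DICTIONARY.get(el.text, {}).get(key, el.text)
    return para

para = inflect_dependents(ET.fromstring(PARA))
dependent = para.find("wordasword").text
print(dependent)
```

Running the dependent transformation after conditional processing matters here: only then is a single head, with a single gender, left for the dependent word to agree with.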
Questions?
References
• [Nelson] Theodor Holm Nelson. 1981. Literary Machines. Mindful Press, Sausalito, California.
• [XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. 26 November 2008. Extensible Markup Language (XML) 1.0 (Fifth Edition). World Wide Web Consortium (W3C).
• [DocBook] DocBook Technical Committee. 1 November 2009. The DocBook Schema Version 5.0. Organization for the Advancement of Structured Information Standards (OASIS).
• [DITA] OASIS DITA Technical Committee. 1 December 2010. Darwin Information Typing Architecture (DITA) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
• [TEI] TEI Consortium, eds. 20 January 2014. TEI P5: Guidelines for Electronic Text Encoding and Interchange, 2.6.0. TEI Consortium.
• [HTML] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, Silvia Pfeiffer, editors. 28 October 2014. HTML5. World Wide Web Consortium (W3C).
• [XInclude] Jonathan Marsh, David Orchard, and Daniel Veillard, editors. 15 November 2006. XML Inclusions (XInclude) Version 1.0 (Second Edition). World Wide Web Consortium (W3C).
• [XSLT] James Clark, editor. 16 November 1999. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium (W3C).
• [Ant] Stephane Bailliez, et al. 29 December 2013. Apache Ant™ 1.9.3 Manual. The Apache Software Foundation.
• [XProc] Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. 11 May 2010. XProc: An XML Pipeline Language. World Wide Web Consortium (W3C).
• [XLIFF] OASIS XLIFF Technical Committee. 1 February 2008. XML Localisation Interchange File Format (XLIFF) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
• [GOLD] Scott Farrar and D. Terence Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International, 7(3), pp. 97-100.
• [ISOcat] M. Kemps-Snijders, M.A. Windhouwer, P. Wittenburg, S.E. Wright. November 2009. ISOcat: Remodeling Metadata for Language Resources. International Journal of Metadata, Semantics and Ontologies (IJMSO), 4(4), pp. 261-276.
• [ICU] ICU Project Management Committee. 7 October 2015. ICU 56. ICU - International Components for Unicode.
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Linguistic markup and transclusion processing in XML documents

  • 1. mFiL 2015 1 Linguistic markup and processing of transclusion in XML documents Simon Dew BA MISTC 6 November 2015 Copyright © Simon Dew 2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
  • 3. mFiL 2015 3 Transclusion • Theodor Holm Nelson, 1981: Literary Machines • The inclusion of an electronic document, or part of a document, in the rendering of another document. • The main document does not contain a copy of the transcluded text, but only a reference to it. • The software used to render the document obtains the transcluded material and incorporates it into the main work. Ted Nelson photo by Dgies Licensed under CC BY-SA 3.0
  • 4. mFiL 2015 4 Transclusion This presentation focuses on transclusion in XML (Extensible Markup Language) documents, including, but not limited to: • DocBook • DITA • TEI • XHTML
  • 5. mFiL 2015 5 Transclusion Transclusion can be large scale / context-free:
  • 6. mFiL 2015 6 Transclusion Transclusion can be small scale / parametrised:
  • 7. mFiL 2015 7 Transclusion Transclusion can be small scale / parametrised: • General entities Definition: <!ENTITY device "Euro 500"> Reference: <title>Configuring the &device;</title> Result: <title>Configuring the Euro 500</title>
  • 8. mFiL 2015 8 Transclusion Transclusion can be small scale / parametrised: • General entities • XInclude Definition: <phrase xml:id="device">Euro 500</phrase> Reference: <title>Configuring the <xi:include xpointer="xpath(id('device')/node())"/></title> Result: <title>Configuring the Euro 500</title>
  • 9. mFiL 2015 9 Transclusion Transclusion can be small scale / parametrised: • General entities • XInclude • Specific transclusion mechanisms, e.g. DITA conref Definition: <ph id="device">Euro 500</ph> Reference: <title>Configuring the <ph conref="device"/></title> Result: <title>Configuring the <ph>Euro 500</ph></title>
  • 11. mFiL 2015 11 Transclusion Transcluded content may vary. 1. Local redefinition
  • 12. mFiL 2015 12 Transclusion Transcluded content may vary. 1. Local redefinition 2. Conditional processing: • Conditional profiling — DocBook • DITAVAL files — DITA <xsl:param name="profile.vendor" select="'ACME'"/> <val> <prop action="include" att="product" val="ACME"/> <prop action="exclude" att="product" val="Yoyodyne"/> </val>
  • 13. mFiL 2015 13 Linguistic consequences
  • 14. mFiL 2015 14 Linguistic consequences A different form of the transcluded word or phrase may be required depending on the environment into which it is placed: • Orthography, e.g. writing systems with upper case • Syntactic case • Definiteness • Number • Others, e.g. initial consonant mutation <title>_____ Details</title> organisational unit [TITLE CASE]
  • 15. mFiL 2015 15 Linguistic consequences A different form of the transcluded word or phrase may be required depending on the environment into which it is placed: • Orthography, e.g. writing systems with upper case • Syntactic case • Definiteness • Number • Others, e.g. initial consonant mutation <para>Om nödvändigt, välj _____.</para> organisationsenhet [+DEFINITE]
  • 16. mFiL 2015 16 Linguistic consequences If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words. • Phonetics • Gender • Number • Case • Definiteness <para>Configuring a _____ Server</para> Oz 500 [_V]
  • 17. mFiL 2015 17 Linguistic consequences If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words. • Phonetics • Gender • Number • Case • Definiteness <para>Pour configurer le _____ auquel le modem est connecté : </para> tablette [_C] [FEM] [SING]
  • 19. mFiL 2015 19 Principles 1. Linguistic markup scheme Defining transcluded term: • Mark up all forms of term to be transcluded • Mark up features which affect dependent words Where transcluded term required: • Mark up required form • Mark up dependent words
  • 20. mFiL 2015 20 Principles 2. Linguistic pre-processing
  • 21. mFiL 2015 21 Principles 2. Linguistic pre-processing
  • 23. mFiL 2015 23 Markup XML attributes • Extend markup schema • Wrapper element: DocBook <phrase> DITA <ph> HTML <span> • Namespace: http://stanleysecurity.github.io/PACBook/ns/linguistics • Prefix: ling
  • 24. mFiL 2015 24 Markup ling:pron Phonetic environment. (V, C, ...) ling:num Grammatical number. (sg, pl, ...) ling:case Grammatical case. (nom, gen, dat, acc, ...) ling:gen Grammatical gender. (c, m, f, n, ...) ling:class Definiteness / inflectional class. (strong, weak, mixed, ind, def, ...) ling:orth Orthographic case. (upper, lower, title, sentence) ling:type head — form of a head word; depend — dependent word.
  • 25. mFiL 2015 25 Markup Resource — features of head noun that demand agreement <resource xl:label="Product_Name"> <phrase vendor="ACME" ling:pron="C">Euro 500</phrase> <phrase vendor="Yoyodyne" ling:pron="V">Oz 500</phrase> </resource> Phonetic environment: ⟨Euro⟩ /ˈjʊəɹəʊ/ _C ⟨Oz⟩ /ˈɒz/ _V
  • 26. mFiL 2015 26 Markup Resource — all possible forms of head noun: <resource xl:label="Org_Unit"> <phrase ling:gen="c" ling:num="sg"> <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase> <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase> <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase> <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase> </phrase> </resource>
  • 27. mFiL 2015 27 Markup Document — mark up required form of transcluded term <para>Om nödvändigt, välj <phrase ling:class="def" content:ref="Org_Unit"/>.</para> <title><phrase ling:orth="title" content:ref="Org_Unit"/> Details</title>
  • 28. mFiL 2015 28 Markup Document — mark up dependent words in text <title>Configuring <wordasword ling:type="depend">a</wordasword> <phrase content:ref="Product_Name"/> Server</title> <para>Wenn <phrase> <wordasword ling:type="depend">ein</wordasword> <phrase content:ref="Device"/> </phrase> konfiguriert wird, werden die Details <phrase> <wordasword ling:type="depend">der</wordasword> <phrase content:ref="Device" ling:case="gen"/> </phrase> auf der Weboberfläche angezeigt.</para>
  • 30. mFiL 2015 30 Dictionary Complies with dictionaries module of the TEI. <entry n="a"> <form> <gramGrp><usg value="C"/></gramGrp> <orth>a</orth> </form> <form> <gramGrp><usg value="V"/></gramGrp> <orth>an</orth> </form> </entry>
  • 31. mFiL 2015 31 Dictionary <usg> Phonetic environment. (V, C, ...) <num> Grammatical number. (sg, pl, ...) <case> Grammatical case. (nom, gen, dat, acc, ...) <gen> Grammatical gender. (c, m, f, n, ...) <oVar> Definiteness / inflectional class. (strong, weak, mixed, ind, def, ...) <orth> Output.
  • 33. mFiL 2015 33 Transformational stylesheets PACBook XSLT transformations: • LingHead.xsl — select the required declension of head nouns. • LingDepend.xsl — inflect dependent words. ● LingCasing.xsl — sets the orthographic case of specified text.
  • 34. mFiL 2015 34 Transformational stylesheets PACBook XSLT transformations: • LingHead.xsl — select the required declension of head nouns. • LingDepend.xsl — inflect dependent words. • LingCasing.xsl — sets the orthographic case of specified text. Licence: GNU Lesser General Public License (LGPL) v3 Repository: https://github.com/STANLEYSecurity/PACBook
  • 35. mFiL 2015 35 Limitations ● Only noun phrases. ● Only tested with small handful of languages. ● Linguistic markup different for translated texts. ● Linguistic markup can be complex for authors.
  • 36. mFiL 2015 36 Related work ● Various linguistic markup schemas / ontologies ● Internationalisation markup ● Nothing else? ● What should we call this?
  • 37. mFiL 2015 37 Collaboration ● Dictionary — Wiktionary. ● Testing and improving. ● Integrating with other publication workflows. Development fork: https://github.com/janiveer/PACBook
  • 39. mFiL 2015 39 Example Resource: <resource xl:label="Doc"> <phrase outputformat="PDF" ling:gen="n" ling:num="sg"> <phrase ling:type="head" ling:case="nom">Dokument</phrase> <phrase ling:type="head" ling:case="acc">Dokument</phrase> <phrase ling:type="head" ling:case="gen">Dokuments</phrase> <phrase ling:type="head" ling:case="dat">Dokument</phrase> </phrase> <phrase outputformat="CHM" ling:gen="f" ling:num="sg"> <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase> <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase> <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase> <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase> </phrase> </resource>
  • 40. mFiL 2015 40 Example Document: <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase content:ref="Doc" ling:case="dat"/> nicht enthalten.</para>
  • 41. mFiL 2015 41 Example After transclusion: <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat"> <phrase outputformat="PDF" ling:gen="n" ling:num="sg"> <phrase ling:type="head" ling:case="nom">Dokument</phrase> <phrase ling:type="head" ling:case="acc">Dokument</phrase> <phrase ling:type="head" ling:case="gen">Dokuments</phrase> <phrase ling:type="head" ling:case="dat">Dokument</phrase> </phrase> <phrase outputformat="CHM" ling:gen="f" ling:num="sg"> <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase> <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase> <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase> <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase> </phrase> </phrase> nicht enthalten.</para>
  • 42. mFiL 2015 42 Example After head transformation: <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat"> <phrase outputformat="PDF" ling:gen="n" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Dokument</phrase> </phrase> <phrase outputformat="CHM" ling:gen="f" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase> </phrase> </phrase> nicht enthalten.</para>
  • 43. mFiL 2015 43 Example After conditional processing: <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat"> <phrase outputformat="PDF" ling:gen="n" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Dokument</phrase> </phrase> </phrase> nicht enthalten.</para> <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dies</wordasword> <phrase ling:case="dat"> <phrase outputformat="CHM" ling:gen="f" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase> </phrase> </phrase> nicht enthalten.</para>
  • 44. mFiL 2015 44 Example After dependent transformation: <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">diesem</wordasword> <phrase ling:case="dat"> <phrase outputformat="PDF" ling:gen="n" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Dokument</phrase> </phrase> </phrase> nicht enthalten.</para> <para>Die Einstellung der IP-Adresse ist in <wordasword ling:type="depend">dieser</wordasword> <phrase ling:case="dat"> <phrase outputformat="CHM" ling:gen="f" ling:num="sg"> <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase> </phrase> </phrase> nicht enthalten.</para>
  • 46. mFiL 2015 46 References ● [Nelson] Theodor Holm Nelson. 1981. Literary Machines. Mindful Press, Sausalito, California. ● [XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. 26 November 2008. Extensible Markup Language (XML) 1.0 (Fifth Edition). World Wide Web Consortium (W3C). ● [DocBook] DocBook Technical Committee. 1 November 2009. The DocBook Schema Version 5.0. Organization for the Advancement of Structured Information Standards (OASIS). ● [DITA] OASIS DITA Technical Committee. 1 December 2010. Darwin Information Typing Architecture (DITA) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS). ● [TEI] TEI Consortium, eds. 20 January 2014. TEI P5: Guidelines for Electronic Text Encoding and Interchange, 2.6.0. TEI Consortium. ● [HTML] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, Silvia Pfeiffer, editors. 28 October 2014. HTML5. World Wide Web Consortium (W3C). ● [XInclude] Jonathan Marsh, David Orchard, and Daniel Veillard, editors. 15 November 2006. XML Inclusions (XInclude) Version 1.0 (Second Edition). World Wide Web Consortium (W3C). ● [XSLT] James Clark, editor. 16 November 1999. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium (W3C). ● [Ant] Stephane Bailliez, et al. 29 December 2013. Apache Ant™ 1.9.3 Manual. The Apache Software Foundation. ● [XProc] Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. 11 May 2010. XProc: An XML Pipeline Language. World Wide Web Consortium (W3C). ● [XLIFF] OASIS XLIFF Technical Committee. 1 February 2008. XML Localisation Interchange File Format (XLIFF) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS). ● [GOLD] Scott Farrar and D. Terence Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International. 7 (3), pp. 97-100. ● [ISOcat] M. Kemps-Snijders, M.A. Windhouwer, P. Wittenburg, S.E. Wright. November 2009. ISOcat: Remodeling Metadata for Language Resources. International Journal of Metadata, Semantics and Ontologies (IJMSO), 4(4), pp. 261-276. ● [ICU] ICU Project Management Committee. 7 October 2015. ICU 56. ICU — International Components for Unicode.

Editor's Notes

  1. Hello. My name is Simon Dew. The presentation I'm going to give today grew out of my work at Stanley Black & Decker Innovations from 2006 to 2015, where I was required to deliver single source documentation in multiple languages. I'm going to describe the linguistic problems I encountered with transclusion in XML documentation, and the solutions I developed to help solve them.
  2. So, let's define our terms. What is transclusion?
  3. Transclusion is a term invented by Ted Nelson, the originator of the concept of hypertext, in his 1981 work, Literary Machines [Nelson]. It refers to the inclusion of an electronic document, or part of a document, in the rendering of another document. The main document does not contain a copy of the transcluded text, but only a reference to it. The software used to render the document obtains the transcluded material and incorporates it into the main work. Transclusion is a hugely important concept for the presentation of content on the WWW and in electronic publication workflows.
  4. This presentation focuses on transclusion in XML, the Extensible Markup Language [XML]. I'm assuming that everyone is familiar with XML. It's widely used in the digital humanities. There are several XML standards for documentation. The techniques I'm going to talk about were originally developed for DocBook XML [DocBook], but they are generally applicable to any XML documentation standard, including the Darwin Information Typing Architecture [DITA], the Text Encoding Initiative [TEI] and XHTML [HTML].
  5. Transclusion works on two scales. It can be large scale, or context-free, which is when you build large documents by reusing smaller chunks of content. For example, using a DITAMAP or a DocBook assembly.
  6. Transclusion can also be small scale or parametrised. This is where you inject small snippets of text into a larger text flow. These snippets of text may be proper nouns such as product names or company names, or they may be common nouns such as product categories.
  7. In an XML document, parametrised transclusion might be realised using general entities...
  8. … generic mechanisms such as XInclude [XInclude] …
  9. … or specific transclusion mechanisms such as conref in DITA. The examples within this presentation show a specific transclusion mechanism that I developed for Stanley Black & Decker Innovations, but the linguistic techniques that I'm going to describe should work with any kind of parametrised transclusion in any XML document.
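
The general-entity case from slide 7 can be reproduced with nothing more than a standard XML parser. The following is a minimal sketch in Python, assuming the entity is declared in the document's internal DTD subset; it is an illustration only and not part of the toolchain described in this talk.

```python
import xml.etree.ElementTree as ET

# The entity definition and the reference from slide 7, combined in one document.
DOC = """<!DOCTYPE title [
  <!ENTITY device "Euro 500">
]>
<title>Configuring the &device;</title>"""

# The parser expands the internal entity while reading the document,
# so the element text already contains the transcluded term.
title = ET.fromstring(DOC)
print(title.text)
```

The main document holds only the reference `&device;`; changing the entity declaration changes every place where the term is used.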
  10. The important thing to note about parametrised transclusion — and this is where the term transclusion has changed its meaning since Ted Nelson invented it — is that the referred content may vary. You may not know what the referred content will be until publication time. You might want to re-use one topic in the documentation for several different products. Or you might need to change all the product names in a single document for different brandings.
  11. This variation might be achieved in several ways. You might locally redefine the parametrised terms when you pull a topic into a larger work;
  12. Or you might use conditional processing, where you mark up a range of alternatives within the transcluded text, and then include or exclude them as required in a separate step. If you're familiar with DocBook, you'd achieve this using conditional profiling attributes. If you're familiar with DITA, you'd use a DITAVAL file.
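
The include/exclude step described in note 12 can be sketched as a simple filter. This is a hedged illustration in Python, not the DocBook profiling stylesheets or the DITA-OT filter; the function name `profile` is hypothetical, and the `vendor` attribute and its values are borrowed from the resource example on slide 25.

```python
import xml.etree.ElementTree as ET

def profile(root, attr, keep):
    """Drop every element whose `attr` is present and differs from `keep`.

    Elements without the attribute are unconditional and are always kept.
    Note: removing an element also removes its tail text; that is harmless
    here because the alternatives are separated by whitespace only.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            value = child.get(attr)
            if value is not None and value != keep:
                parent.remove(child)
    return root

resource = ET.fromstring(
    '<resource>'
    '<phrase vendor="ACME">Euro 500</phrase>'
    '<phrase vendor="Yoyodyne">Oz 500</phrase>'
    '</resource>'
)
profile(resource, "vendor", "ACME")
print([phrase.text for phrase in resource])
```

Running the same filter with `"Yoyodyne"` on a fresh copy of the resource would keep the other branding instead, which is why the referred content is not known until publication time.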
  13. Now, small scale, parametrised transclusion can have linguistic consequences.
  14. Firstly, you may require a different form of the transcluded word or phrase, depending on the environment into which it is placed. To give a simple example: for writing systems with orthographic case, you may require the transcluded term to be capitalised at the start of a sentence or in a title.
  15. Or, depending on the language, you may require the transcluded term to take a different form depending on syntactic case, definiteness, number, or other features. So in this Swedish example, we want to transclude a term into the sentence but we want the definite form of the noun.
  16. The second linguistic consequence relates to government and binding. If the transcluded term is the head of a phrase, it may demand agreement from dependent words. To take an obvious example from English: the indefinite article can take one of two forms depending on the phonetic environment. So if you have an indefinite article followed by a transcluded term, the form of the indefinite article will vary depending on whether the transcluded term starts with a vowel or a consonant.
  17. Furthermore, in many languages, dependent words may also vary according to the syntactic environment. For example, the transcluded term may have varying grammatical gender, so dependent articles and adjectives will also have to change in agreement with the transcluded term. These changes can often be avoided by careful wording and translation. In some circumstances, though, they may be unavoidable.
  18. At Stanley Black & Decker Innovations I developed a publication toolchain which attempts to solve these linguistic problems, among other things. Internally, we referred to this publication toolchain as PACBook. I'll keep using this term for convenience. Within PACBook, the linguistic solution that I developed has two facets. I'll give an overview of the principles, and then I'll go into more detail.
  19. First, I developed a linguistic markup scheme for use in XML documentation. There are basically four things to be marked up. When you're defining terms to be transcluded, you must mark up all the possible syntactic forms of the word, and mark up any linguistic features which affect dependent words. When you're marking up the locations where a transcluded term is required, you must mark up the form of the word that's required, and mark up any words or phrases that depend on the transcluded terms.
  20. Secondly, I developed a set of XSLT stylesheets to perform linguistic pre-processing on documentation [XSLT]. (Extensible Stylesheet Language Transformations is a computer language for applying transformations to XML.) This pre-processing toolchain carries out a set of steps, like so: Resolve parametrised transclusion; Select the correct form of head words; Perform conditional profiling; Select the correct form of dependent words with the help of a syntactic dictionary; Finally, select the correct orthographic case. The output is then passed on to the next step in the publication process.
  21. We automated the process using build tools like Apache Ant [Ant] and XProc [XProc]; this is outside the scope of this presentation.
  22. Let's go into more detail on the linguistic markup. I'll propose a set of XML attributes which extend the document's markup schema.
  23. These XML attributes can be added to any element which contains a run of text. To mark up a word or phrase, an author would most likely add linguistic markup attributes to a semantically empty wrapper element such as <phrase> in DocBook, <ph> in DITA or <span> in HTML. The linguistic markup that I'm proposing has its own XML namespace: the URI is given on the slide here. Authors can use any prefix they like to refer to this namespace, but I suggest the prefix ling.
  24. For the linguistic markup I propose these XML attributes: ling:pron, the phonetic environment governed by a term; ling:num, used to mark up grammatical number; ling:case, used to mark up grammatical case; ling:gen, used to mark up grammatical gender; ling:class, used to mark up definiteness or inflectional class; ling:orth, used to mark up orthographic case; ling:type, used to mark up whether a term is a form of a head word or a dependent word. The first five attributes can contain any value, but you must use consistent values within each language; you'll see why later. Obviously this is not an exhaustive list of all possible linguistic features; they're the features I needed for the languages that PACBook supports. But the scheme is designed to be extensible to support further linguistic features if necessary.
  25. So, let's look at some examples of markup. First of all I'm going to show how to mark up the linguistic features that demand agreement when you define a resource for transclusion. This example is in English. You can see that there are two different forms of the product name, depending on the branding. I've used the ling:pron attribute to mark up the phonetic environment that these brand names govern: Euro is pronounced with an initial consonant and Oz is pronounced with an initial vowel.
  26. Here's an example showing how to mark up all the possible grammatical variants of a word when you're defining that word for transclusion. This is an example in Swedish. It's the markup for a term called Org_Unit. You can see that the outer <phrase> wrapper shows that this term has common gender and singular number. The inner <phrase> elements are marked with ling:type="head" to show that these are grammatical variants of a head word. Each variant is marked up for case and definiteness and I've marked up all the possible variants.
  27. So that's how to mark up linguistic features when defining terms for transclusion. Now let's see how to indicate the required form of a transcluded term in the main body of the document. The first example is Swedish again. It shows that we want to transclude the term called Org_Unit and that we want the definite form of the word. The grammatical case isn't specified, which means nominative is assumed. The second example is in English; it shows that we want to transclude the term called Org_Unit, but because this is a title, we want the term to be output in title case, i.e. with initial capital letters.
  28. Finally, here are some examples showing how to mark up dependent terms in the text. The first example is English. I've used the <wordasword> element, which in DocBook is semantically empty, to mark up the word “a” as dependent on the following transcluded term. The next example is German. You can see that I've marked up the two articles as dependent words, but because there are two head words in this run of text, I've had to wrap each head word together with its dependent article in a semantically empty <phrase> element. It's very similar to building a phrase structure diagram!
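
The head-selection rule described in notes 26 and 27 can be sketched as a lookup over the marked-up variants. This is a minimal Python sketch under stated assumptions, not the LingHead.xsl stylesheet itself: the function name `select_head` is hypothetical, the `xl:label` attribute is omitted from the sample resource, and the indefinite default for ling:class is an assumption (only the nominative default is stated in the talk).

```python
import xml.etree.ElementTree as ET

# Namespace URI as given on slide 23.
LING = "http://stanleysecurity.github.io/PACBook/ns/linguistics"

def q(name):
    """Return the namespace-qualified name of a ling: attribute."""
    return f"{{{LING}}}{name}"

# The Swedish Org_Unit resource from slide 26.
RESOURCE = f"""
<resource xmlns:ling="{LING}">
  <phrase ling:gen="c" ling:num="sg">
    <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase>
    <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase>
  </phrase>
</resource>
"""

def select_head(resource, case="nom", cls="ind"):
    """Return the head form matching the requested case and definiteness.

    `case` defaults to nominative, as in the talk; the "ind" default for
    `cls` is an assumption made for this sketch.
    """
    for form in resource.iter("phrase"):
        if (form.get(q("type")) == "head"
                and form.get(q("case")) == case
                and form.get(q("class")) == cls):
            return form.text
    return None  # no matching variant was marked up

org_unit = ET.fromstring(RESOURCE)
print(select_head(org_unit, cls="def"))
```

The request `cls="def"` mirrors the Swedish reference on slide 27, where only ling:class="def" is given and the case falls back to nominative.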
  29. So how does PACBook select the correct form of a dependent word? Well, as I mentioned in the overview, it uses a syntactic dictionary.
  30. The syntactic dictionary must comply with the dictionaries module of the Text Encoding Initiative (TEI). Currently PACBook has ten dictionaries in development, one for each of the languages that PACBook supports. You can see here an entry from the English dictionary. In fact, it's the only entry in the English dictionary. It shows the two different forms of the indefinite article, and the environment in which each form is used.
  31. In the syntactic dictionaries, linguistic features are marked up using these TEI elements. You can see how they match up with the ling attributes: <usg>, the phonetic environment in which this form is used; <num>, the grammatical number; <case>, the grammatical case; <gen>, the grammatical gender; <oVar>, the definiteness or inflectional class; <orth>, used to mark up the output form. The first five elements can contain any values you like, but they must match the values used in the ling attributes in your document. Again, this isn't an exhaustive list of all possible linguistic features; they're the features I needed. The scheme could be extended.
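
The dictionary lookup described in notes 30 and 31 amounts to finding the form whose grammatical features match those of the head word. The sketch below is in Python rather than XSLT and is not the LingDepend.xsl stylesheet; the function name `inflect` is hypothetical, the entry is copied from slide 30 without a TEI namespace, and reading every feature from a `value` attribute is an assumption based on the `<usg value="…"/>` pattern shown there.

```python
import xml.etree.ElementTree as ET

# The English indefinite article entry from slide 30.
ENTRY = """
<entry n="a">
  <form><gramGrp><usg value="C"/></gramGrp><orth>a</orth></form>
  <form><gramGrp><usg value="V"/></gramGrp><orth>an</orth></form>
</entry>
"""

def inflect(entry, features):
    """Return the <orth> of the first form whose <gramGrp> matches `features`.

    `features` maps TEI element names (usg, num, case, gen, oVar) to the
    values taken from the head word's ling: attributes.
    """
    for form in entry.findall("form"):
        gram = form.find("gramGrp")
        if gram is None:
            continue
        matched = True
        for tag, value in features.items():
            element = gram.find(tag)
            if element is None or element.get("value") != value:
                matched = False
                break
        if matched:
            return form.findtext("orth")
    return None  # the dictionary has no form for this feature set

article = ET.fromstring(ENTRY)
# "Oz 500" is marked ling:pron="V" on slide 25, so the article must be "an".
print(inflect(article, {"usg": "V"}), "Oz 500 Server")
```

This is why the attribute values in the document must match the values in the dictionary: the lookup is a plain string comparison.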
  32. So that&amp;apos;s the markup scheme. The other facet of PACBook is the software which carries out the transformations.
  33. As I mentioned in the overview, PACBook contains an entire suite of XSLT stylesheets for document publication. The important ones from a linguistic perspective are: LingHead.xsl — you apply this transformation after parametrised transclusion. It selects the required grammatical form of any transcluded head words based on the markup in the document. LingDepend.xsl — this looks at the grammatical features of transcluded head words and selects the correct form of dependent words using the syntactic dictionary for the current language. LingCasing.xsl — this is very straightforward. It uses functions from the standard XSLT library to set the orthographic case of the specified text to upper case, lower case, sentence case or title case. Title case is English-only.
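To make the four casing modes concrete, here is a minimal Python sketch of what a casing step of this kind might do. It is only an analogy: LingCasing.xsl is an XSLT stylesheet, and the function name `set_case`, the mode names, and the short list of English function words are assumptions for this example rather than anything taken from PACBook.

```python
# Small, illustrative list of English words left in lower case in titles.
MINOR_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for"}


def set_case(text, mode):
    """Return text in the requested orthographic case."""
    if mode == "upper":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "sentence":
        # Capitalise the first character only and leave the rest alone,
        # so that German nouns and product names keep their capitals.
        return text[:1].upper() + text[1:]
    if mode == "title":
        # Naive English title case: capitalise each word except minor
        # words that do not start the title.
        out = []
        for i, word in enumerate(text.split()):
            if i > 0 and word.lower() in MINOR_WORDS:
                out.append(word.lower())
            else:
                out.append(word[:1].upper() + word[1:])
        return " ".join(out)
    raise ValueError(f"unknown casing mode: {mode!r}")
```

The title-case branch shows why that mode is language-specific: the list of words to leave in lower case is an English convention and has no direct equivalent in, say, German.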
  34. These stylesheets are free software. SBD Innovations made the source code available under version 3.0 of the LGPL. The stylesheets are available on GitHub at this URI. This GitHub repository also contains the syntactic dictionaries, schemas which enable you to validate the linguistic markup, and full documentation.
  35. Obviously this solution has limitations. It can only inflect the head nouns and dependent words within a noun phrase. It can't (yet) conjugate verbs, for instance. Secondly, the solution has been developed for and tested with ten European languages. Thirdly, inline markup can be quite different for translated texts. I've worked out a correspondence between the linguistic markup and the XLIFF translation standard [XLIFF]; see the GitHub website for details. Finally, the linguistic markup is simple for authors or translators writing in English, but can become complex in other languages.
  36. The problems I've described are linguistically trivial, and the solution is computationally trivial. That being so, has anyone done any work in this area before? There are lots of linguistic markup schemas and ontologies, e.g. the GOLD ontology [GOLD], or ISOcat [ISOcat]. None of them seemed concise enough to use in an existing document schema. There are also various markup schemas for internationalisation. International Components for Unicode have developed MessageFormat, which enables you to mark up the singular or plural forms of a word for a locale [ICU]. But as far as I can tell, nothing has ever been developed which solves the problem that I've attempted to solve. In fact I'm not even sure what you'd call this problem. Perhaps we need a name for it, in the spirit of Ted Nelson...
  37. I'd be very interested to make contact with people who are working in similar areas and would be interested in collaboration. I need help with: extending the dictionaries (perhaps programmatically, using the huge amount of linguistic data available on Wiktionary); testing and improving the software; integrating with other publication workflows. One final thing to say: Stanley Black and Decker Innovations was wound down at the end of July 2015. I've cloned the GitHub repository at the URI shown here, so that development can continue. If anyone would like to contribute please contact me.
  38. Here’s an example of how the markup and the transformations work together.
  39. Imagine that a set of help topics can be delivered as a print file or as online help. So, when the author wants to refer to the document itself, she marks up the two possible document types with profiling attributes which effectively say, “for the print output format, this is a document; for the online output format, this is a help file”. Then, because our author is German, she marks up all the possible singular forms of the words for “document” and “help file”, and adds further linguistic attributes to signal the number and gender.
  40. In the document, the author marks up all the points where the term will be inserted and specifies which syntactic case is required. Then, she runs the document build process...
  41. After parametrised transclusion, all possible grammatical forms of all possible document types are included in the document, at every point where the document type is required.
  42. After the head transformation, the correct grammatical form of the term is kept at every point where the term is required, and the other grammatical forms are removed.
  43. After conditional processing, only one of the alternative forms is selected at every point where the term is required, and the rest are removed.
  44. Finally, all dependent terms are transformed to match their syntactic and phonetic environment as required.
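The sequence of steps in this example can be sketched in Python as follows. This is a toy model, not the PACBook stylesheets: the data structures, the labels `print` and `online`, the case abbreviations, and all function names are assumptions for illustration, and the dependent step here handles only grammatical agreement, not phonetic environment. The German forms are the singular forms of “Dokument” and “Hilfedatei” with the definite article.

```python
# All forms of the two candidate terms, as they might look after
# parametrised transclusion. Keys and labels are illustrative only.
TERMS = [
    {"output": "print", "gender": "neuter",
     "forms": {"nom": "Dokument", "acc": "Dokument",
               "dat": "Dokument", "gen": "Dokuments"}},
    {"output": "online", "gender": "feminine",
     "forms": {"nom": "Hilfedatei", "acc": "Hilfedatei",
               "dat": "Hilfedatei", "gen": "Hilfedatei"}},
]

# Toy syntactic dictionary: singular definite article by gender and case.
ARTICLE = {
    "neuter": {"nom": "das", "acc": "das", "dat": "dem", "gen": "des"},
    "feminine": {"nom": "die", "acc": "die", "dat": "der", "gen": "der"},
}


def head_step(terms, case):
    """Keep only the form of each candidate that matches the required case."""
    return [{"output": t["output"], "gender": t["gender"],
             "word": t["forms"][case]} for t in terms]


def conditional_step(candidates, output):
    """Keep the single candidate profiled for the chosen output format."""
    matches = [c for c in candidates if c["output"] == output]
    if len(matches) != 1:
        raise ValueError(f"expected one candidate for {output!r}")
    return matches[0]


def dependent_step(head, case):
    """Choose the article that agrees with the surviving head word."""
    return ARTICLE[head["gender"]][case] + " " + head["word"]


def build(case, output):
    """Run the three selection steps in the order described above."""
    heads = head_step(TERMS, case)
    chosen = conditional_step(heads, output)
    return dependent_step(chosen, case)
```

Calling `build("gen", "print")` gives `des Dokuments`, while `build("dat", "online")` gives `der Hilfedatei`. The dependent step runs last because the article cannot be chosen until a single head word, with its gender, has survived the earlier steps.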
  45. Thank you very much for listening. Are there any questions?