XML is brilliant and pervasive. If you are a translator, chances are you have used it in a) files for translation, b) tools for translation, c) life. You do not need a mad scientist hat to get the gist of it! For goodness' sake, you speak another natural language (a whole organic system that not even Chomsky has cracked), and you are afraid of a little bit of markup code? Come see what the fuss is about (it is worth it!) and keep in mind that XML "spell and grammar checkers" are off the charts! (Imagine if English teachers could say the same!)
3. Goals 1
Access markup languages via linguistics:
Lexicon: tags as words
Syntax: combinatorial options
Semantics: what tags do
4. Goals 2
Learn to distinguish
markup (metalanguage)
text (object language)
Work with marked-up files
Use ML-based translation tools
Intervene in ML vocabulary creation
5. What is Markup?
Earmarking text with metadata
USE of object language
Goblins are deceiving creatures.
MENTION with metalanguage
“Goblin” comes from a Greek term.
Note the use of quotation marks!!!
Adds meaning to text for different applications to use
Semiotics
11. Linguistic Features of a ML
SIGNS :: elements in tags
MEANING :: instructions
SYNTAX :: open/close
nested structure
12. ML Signs: Tags
<title>Book</title>
START tag: <title>
END tag: </title>
ELEMENT: start tag + text + end tag
Delimiters: < (less than), / (slash), > (greater than)
Tag = Delimiters + Name
Case sensitive (lower case)
13. ML Meaning: Instructions
Tags organize text: <h1>, <p>, <li>
Tags style text: <b>, <center>, bgcolor (an attribute)
Tags describe text: <lastname>, <ingredient>
14. ML Syntax: order
Tags open and close: <h1>text</h1>
Declared attributes: property="value", e.g. <table width="100%"/>
Nested in order: <b><i>text</i></b>
<h1 /> (deprecated)
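The open/close and nesting rules above are exactly what a parser checks. Here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module as the checker; the helper name `is_well_formed` and the two broken samples are illustrative, not from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


nested_in_order = "<b><i>text</i></b>"  # inner tag closes before outer tag
crossed = "<b><i>text</b></i>"          # tags overlap instead of nesting
unclosed = "<h1>text"                   # start tag without an end tag

print(is_well_formed(nested_in_order))  # True
print(is_well_formed(crossed))          # False
print(is_well_formed(unclosed))         # False
```

This is the "grammar checker" idea in miniature: the parser either accepts the markup or reports where the syntax breaks.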
15. ML Syntax: compound words
<table width="100%"> […] </table>
element: table
property: width
value: "100%"
Attributes: property="value" pairs inside the start tag
19. HTML Graphic File Instruction
Porta Ludovica
<img src="porta.GIF" width="432" height="216"
align="BOTTOM" alt="Porta Ludovica"><br>
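For a translator, the interesting part of the instruction above is the `alt` attribute, because it holds human-readable text. Here is a rough sketch of pulling it out with Python's standard `html.parser` module; the class name `AltTextCollector` is an illustrative assumption, not a tool mentioned in the slides.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the alt text of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alt_texts.append(value)


html = (
    '<img src="porta.GIF" width="432" height="216"\n'
    'align="BOTTOM" alt="Porta Ludovica"><br>'
)

collector = AltTextCollector()
collector.feed(html)
print(collector.alt_texts)  # ['Porta Ludovica']
```

The file name, size, and alignment values stay untouched; only the collected text would go to translation.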
20. HTML Limitations
Mix of style and structure
Bolding, Centering, Color,
Paragraphs, Tables, etc.
W3C decided to separate:
style instructions
structure instructions
*W3C: World Wide Web Consortium
21. HTML Expansion
Structure: <h1>
HTML proper: structural integrity
Style: <font>, color (an attribute)
CSS: associate the same HTML
document with different styles
CSS: Cascading Style Sheets
22. XML: eXtensible Markup Language
Extensible: no predetermined set of tags
Platform & application independent
Applicable to very diverse fields
Data representation for
Storage > meaningful fragments
Interaction > searching / sorting
Control > transformation / display within a
browser, PDF, print publishing, etc.
23. Conveys info on structure and meaning
Tree hierarchy
Validated Syntax
(rules OK?)
XML
25. Declaration
Defines XML version and character encoding
Optional
<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
<type>Paella
</type>
<ingredient>rice
[...]
UTF: Universal Character Set Transformation Format
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
<type>Paella
</type>
<ingredient>rice
[...]
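The declared encoding only helps if it matches how the file was actually saved. Below is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper `parse_ingredient` and the ingredient "azafrán" (picked for its accented letter) are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# "azafrán" is an illustrative ingredient chosen because it contains a non-ASCII letter.
BODY = "<recipe><type>Paella</type><ingredient>azafrán</ingredient></recipe>"


def parse_ingredient(declared: str, saved_as: str) -> str:
    """Declare one encoding, save the bytes in another, and read the ingredient back."""
    declaration = f'<?xml version="1.0" encoding="{declared}"?>'
    data = (declaration + BODY).encode(saved_as)
    return ET.fromstring(data).findtext("ingredient")


# Declaration and saved bytes agree: the accented letter survives.
print(parse_ingredient("ISO-8859-1", "iso-8859-1"))  # azafrán
print(parse_ingredient("UTF-8", "utf-8"))            # azafrán

# Declaration says UTF-8 but the bytes are ISO-8859-1: the parser rejects the file.
try:
    parse_ingredient("UTF-8", "iso-8859-1")
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True
print(mismatch_detected)  # True
```

The opposite mismatch (UTF-8 bytes declared as ISO-8859-1) typically yields garbled characters rather than an error, so it is worth checking the editor's save settings against the declaration.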
26. Tags
Element delimiters
<start tag> :: opens an element
<end tag> :: closes an element
<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
<name> Paella </name>
<ingredient>rice </ingredient>
[...]
27. Elements
Building blocks of XML
Root element: mandatory
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
<name> Paella </name>
<ingredient> rice </ingredient>
[...]
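"Root element: mandatory" means that everything must sit inside one outermost element. Here is a small sketch of that rule, assuming Python's standard `xml.etree.ElementTree`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

one_root = "<recipe><name>Paella</name><ingredient>rice</ingredient></recipe>"
no_root = "<name>Paella</name><ingredient>rice</ingredient>"

# With a single root, the parser returns that root and its children.
root = ET.fromstring(one_root)
print(root.tag)                       # recipe
print([child.tag for child in root])  # ['name', 'ingredient']

# Two top-level elements side by side are not a well-formed document.
try:
    ET.fromstring(no_root)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```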
28. Attributes
Additional information about elements
Syntax: <element property="value">
<tel type="cell">123-456-7890</tel>
<book id="001" lang="Italian">[text]</book>
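Attributes can be read separately from the element's text, which is how a tool can tell a label such as `type` apart from the content. Here is a short sketch, assuming Python's standard `xml.etree.ElementTree`. The last sample uses typographic quotes, which word processors often substitute automatically, to show why attribute values need straight quotes.

```python
import xml.etree.ElementTree as ET

tel = ET.fromstring('<tel type="cell">123-456-7890</tel>')
print(tel.get("type"))  # cell
print(tel.text)         # 123-456-7890

book = ET.fromstring('<book id="001" lang="Italian">[text]</book>')
print(book.attrib)      # {'id': '001', 'lang': 'Italian'}

# Curly quotes are not valid attribute delimiters, so the parser rejects them.
try:
    ET.fromstring("<tel type=\u201ccell\u201d>123-456-7890</tel>")
    curly_quotes_accepted = True
except ET.ParseError:
    curly_quotes_accepted = False
print(curly_quotes_accepted)  # False
```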
29. Comments
Notes or temporary edit-out of code
Signaled by <!-- comment -->
No visual rendition
Not translated
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--this example is based on TEI guidelines-->
<novel>
<title>The Pillars of the Earth</title>
[...]
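"Not translated" can be seen directly in a parser: by default, Python's standard `xml.etree.ElementTree` leaves comments out of the tree, so only element text remains. This is a sketch under that assumption; the second comment in the sample is hypothetical and added only for illustration.

```python
import xml.etree.ElementTree as ET

source = """<!--this example is based on TEI guidelines-->
<novel>
  <!--hypothetical note: check the title against the published edition-->
  <title>The Pillars of the Earth</title>
</novel>"""

root = ET.fromstring(source)

# Gather all non-empty text in the document; comments do not appear in it.
translatable = [text.strip() for text in root.itertext() if text.strip()]
print(translatable)  # ['The Pillars of the Earth']
```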
31.
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book id="bk001" lang="english" trans="">
<author>Nadine Kano</author>
<translator></translator>
<title>Developing International Software</title>
<origtitle></origtitle>
<genre>IT</genre>
<publish_date>1995-12-05</publish_date>
<description>A handbook for international software
design.</description>
</book>
<book id="bk002" lang="italian" trans="">
<author>Enrico Galiano</author>
<translator></translator>
<title>Eppure cadiamo felici</title>
<origtitle></origtitle>
<genre>Fiction</genre>
<publish_date>2017-04-18</publish_date>
<description>La storia di una giovane che non ha paura
di ascoltare il rumore della felicità.</description>
</book>
<book id="bk003" lang="english" trans="spanish">
<author>Ken Follett</author>
<translator>Rosalía Vázquez</translator>
<title>Los Pilares de la Tierra</title>
<origtitle>The Pillars of the Earth</origtitle>
<genre>Historical Fiction</genre>
<publish_date>2008-02-01</publish_date>
<description>Durante la construcción de una
catedral gótica, el amor, la muerte y el poder se
entrecruzan mostrando que las relaciones humanas son
siempre complejas.</description>
</book>
</catalog>
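root element: <catalog>
elements: <book>, <author>, <translator>, <title>, <origtitle>, <genre>, <publish_date>, <description>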
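Once data is marked up like this catalog, searching and sorting become simple queries. Below is a rough sketch, assuming Python's standard `xml.etree.ElementTree` and a shortened copy of the catalog (descriptions and some fields left out for brevity).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the catalog; ids, attributes, titles, and dates follow the slide.
catalog_xml = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
  <book id="bk001" lang="english" trans="">
    <title>Developing International Software</title>
    <publish_date>1995-12-05</publish_date>
  </book>
  <book id="bk002" lang="italian" trans="">
    <title>Eppure cadiamo felici</title>
    <publish_date>2017-04-18</publish_date>
  </book>
  <book id="bk003" lang="english" trans="spanish">
    <title>Los Pilares de la Tierra</title>
    <publish_date>2008-02-01</publish_date>
  </book>
</catalog>"""

catalog = ET.fromstring(catalog_xml.encode("utf-8"))
books = catalog.findall("book")

# Searching: books whose trans attribute is empty have no translation recorded.
untranslated = [book.get("id") for book in books if not book.get("trans")]
print(untranslated)  # ['bk001', 'bk002']

# Sorting: ISO dates (YYYY-MM-DD) sort correctly as plain strings.
by_date = sorted(books, key=lambda book: book.findtext("publish_date"))
print([book.findtext("title") for book in by_date])
```

The same file could feed a different query tomorrow without being edited, which is the point of describing data instead of formatting it.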
44. ML code text using metalanguage
<recipe> Breaded Chicken Breast </recipe>
<li> chicken breast </li>
<li> breadcrumbs </li>
Marked-up Text
<tags>
45. XML: Evolution? Revolution!
Describe not display
Tags are not predefined
Tree structure
Extensible vocabulary of elements
Validated for well-formed syntax
Entities
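Entities matter in practice because characters such as `&` and `<` belong to the markup itself and must be written as `&amp;` and `&lt;` inside text. Here is a minimal sketch, assuming Python's standard `xml.sax.saxutils` and `xml.etree.ElementTree`; the ingredient string is an illustrative assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Salt & pepper < 5 g"  # illustrative text containing markup characters

# escape() replaces &, < and > with the predefined entities.
safe_text = escape(raw_text)
print(safe_text)  # Salt &amp; pepper &lt; 5 g

# The parser turns the entities back into the original characters.
element = ET.fromstring(f"<ingredient>{safe_text}</ingredient>")
print(element.text)  # Salt & pepper < 5 g

# Without escaping, the bare & and < break the syntax.
try:
    ET.fromstring(f"<ingredient>{raw_text}</ingredient>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False
print(unescaped_accepted)  # False
```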