A crash course on markup languages for aspiring technical translators: The explosion of new technologies has turned many translators into mesmerized spectators of a business built on their very shoulders. XML (Extensible Markup Language) is the standard data exchange format for the web and other environments, and translators encounter it both when working on XML files and when using XML-based translation software. This workshop helps them understand the technology they are both manipulating and using, first covering HTML basics as a building block and then introducing XML concepts and translation issues. This first part looks at the evolution of markup languages and delves into HTML as a building block.
2. 02/21/18ATA Conference, Phoenix, 2003
ML: Intro
Markup Language
Data exchange tool
MLs in translation: marked-up files &
ML‑based translation tools
A Little History: SGML
SGML: Standard Generalized Markup
Language
Developed in the 80s
For aerospace & automotive industries
ISO standard in 1986 (ISO 8879)
A Little History: HTML
HTML: Hypertext Markup Language
Developed in the 90s
First markup language specially designed
for the Web, based on SGML
Comparable to Controlled English:
subset of a more comprehensive
language
A Little History: XML
XML: Extensible Markup Language
XML: Development started in 1996
Initiative of the W3C for large-scale
electronic publishing
W3C recommendation in February 1998
W3C: World Wide Web Consortium
(Development of Web technologies)
What is a Markup Language?
Publishing world
Text to be printed ≠ Instructions on how
to print it
Set of instructions (such as font size,
color, etc.): “markup”
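The split between the text to be printed and the instructions on how to print it can be sketched in a few lines of Python. This is only an illustration, using the standard library's `html.parser` and HTML tags as the "markup"; the class name `MarkupSplitter`, the function `split_markup`, and the sample sentence are assumptions made for this example.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects the printable text and the markup instructions separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []    # the text to be printed
        self.instructions = []  # the instructions on how to print it

    def handle_starttag(self, tag, attrs):
        self.instructions.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_markup(source):
    """Return (text, list of tag names) for a marked-up string."""
    splitter = MarkupSplitter()
    splitter.feed(source)
    splitter.close()
    return "".join(splitter.text_parts), splitter.instructions


text, instructions = split_markup("<p>Markup is <b>not</b> content.</p>")
print(text)          # Markup is not content.
print(instructions)  # ['p', 'b']
```

For a translator, the first value is roughly what has to be translated and the second is what has to be left alone.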
Example: Textbook Highlighting
Classifying or "marking up" the
information: original content + relevance
If different meanings > several colors
(main ideas + support ideas)
If different sets of colors > cannot share
the textbook
(main ideas + support ideas)
Conditions for Useful Markup
Standard defining what is a valid markup
(highlighting in pink & yellow)
Standard defining what that markup
means
(pink: main ideas & yellow: support
ideas)
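The two conditions can be written down as a small shared table: its keys say which colors count as valid markup, and its values say what each color means. The sketch below is a toy model of the highlighter example only; the names `COLOR_MEANINGS` and `interpret` and the sample passages are invented for illustration.

```python
# The shared standard: which highlighter colors are valid, and what each one means.
COLOR_MEANINGS = {
    "pink": "main idea",
    "yellow": "support idea",
}


def interpret(color):
    """Return the agreed meaning of a color, or raise if it is not valid markup."""
    if color not in COLOR_MEANINGS:
        raise ValueError(f"{color!r} is not valid markup under this standard")
    return COLOR_MEANINGS[color]


# Hypothetical highlighted passages, as (color, text) pairs.
highlights = [
    ("pink", "Markup separates text from instructions."),
    ("yellow", "Font size and color are examples of markup."),
]
for color, passage in highlights:
    print(f"{interpret(color)}: {passage}")
```

Two readers who share this table can exchange textbooks; a reader who highlights in green falls outside the standard, and `interpret("green")` raises an error.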
Linguistic Conditions
We provide our SIGNS (colors) with
VALUES (messages) derived from their
mutual and exclusive RELATIONS
We are defining a LANGUAGE through a
system of DIFFERENCE
(Saussure)
Semiotics
Theory of the Suppositions (Scholastics)
Statements refer to:
- entities: suppositio formalis
- names of entities: suppositio materialis
>> different logical levels of languages
They relied on context!
Semiotics
USE:
“Goblins are deceiving creatures.”
MENTION:
“‘Goblin’ comes from a Greek term.”
Note the use of quotation marks
HTML
A popular markup system
Defines how a browser presents text,
images, and sounds of web documents
Ability to handle instructions (tags) to
interconnect documents (hyperlinks) and
allow user interaction (forms)
HTML’s Standards
1. Valid Markup: Tags, consisting of
delimiters and names: <Tag Name>
2. Meaning: Each tag communicates a
layout message, associating a structure
or style rule to the marked-up text
<p> = paragraph
HTML Reader
Targeted at HTML processors, usually
Internet browsers
Read the instructions and present the
document according to the associated
rules
HTML Signs
HTML uses a textual language based on
English
We can read it and edit it with a simple
text editor such as Notepad
What we see in the text editor differs
from what the browser presents us with
HTML Interpretation & Editing
Display varies from browser to browser
“This site is optimized for Internet
Explorer 4.0 or higher”
Editors can present “raw” HTML or
“interpreted” HTML, or both
WYSIWYG: what you see is what you get
HTML Tags
<b> instructs the browser to display the
marked-up text in bold
<a href="mailto:EMAIL">TEXT</a>
creates an email link
EMAIL: romina@romina.com
TEXT: Romina
PRESENTATION: Romina
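As a rough sketch, the email link pattern can be filled in programmatically. The helper name `mailto_link` is made up for this example; `html.escape` is the Python standard library function that protects characters such as `&` and `<` so they are not mistaken for markup.

```python
from html import escape


def mailto_link(email, text):
    """Build <a href="mailto:EMAIL">TEXT</a>, escaping both values for HTML."""
    return f'<a href="mailto:{escape(email, quote=True)}">{escape(text)}</a>'


print(mailto_link("romina@romina.com", "Romina"))
# <a href="mailto:romina@romina.com">Romina</a>
```

Only the TEXT part is shown to the reader, so it is usually the only part a translator would touch; the EMAIL value sits inside the tag and stays as it is.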
HTML “words”
Words: separated by spaces
Markup Sequences: distinguished by
opening and closing each tag
Convention:
<tag name>marked-up text</tag name>
HTML “words”
Tags can have attributes defined by a
property and a value
<table width="100%"> creates a table as
wide as the screen
Tag: <table>
Property: “width”
Value: window % or pixels
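To see the tag, property, and value as separate pieces, one option is to let Python's standard `html.parser` do the splitting. This is a minimal sketch; `AttributeLister` and `list_attributes` are names invented here, and the table row and cell are sample content.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Records every (tag, property, value) triple found in the markup."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (property, value) pairs for this tag.
        for prop, value in attrs:
            self.found.append((tag, prop, value))


def list_attributes(source):
    """Return all (tag, property, value) triples in a marked-up string."""
    lister = AttributeLister()
    lister.feed(source)
    lister.close()
    return lister.found


print(list_attributes('<table width="100%"><tr><td>cell</td></tr></table>'))
# [('table', 'width', '100%')]
```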
HTML Syntax
What happens if we want the text to be
both bold and italicised?
Third condition of markup languages:
3. Standard defining how markup signs
can be combined, i.e., a grammar
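One grammar rule for combining tags is nesting: the tag opened last must be closed first. The sketch below checks only that simplified rule, under the assumption that every tag has a closing tag; real HTML also has tags with no closing counterpart (such as `<br>`), which this toy checker would report as unclosed. `NestingChecker` and `is_well_nested` are illustrative names.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Checks that every closing tag matches the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags still waiting for their closing tag
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_well_nested(source):
    """Return True if all tags are closed in the reverse order they were opened."""
    checker = NestingChecker()
    checker.feed(source)
    checker.close()
    return not checker.errors and not checker.open_tags


print(is_well_nested("<b><i>bold and italic</i></b>"))  # True
print(is_well_nested("<b><i>bold and italic</b></i>"))  # False
```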
HTML Limitations
Tags grew more and more complex,
mixing style and content (or structure)
tags within the document
(Bolding, Centering, Paragraphs, Tables,
Links, Lists, Colors, etc.)
W3C decided to separate style
instructions from content instructions
HTML Expansion
Content instructions (such as <h1>,
largest title) would continue to be part of
HTML markup within a document
Style instructions (previously tags such as
<font>) would move to Cascading Style
Sheets (CSS): style rules separate from,
but associated with, the document
HTML + CSS Advantage
Control the style without compromising
the structural integrity of the data
Associate the same HTML document
with different style sheets
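A small sketch of the second point: the same HTML document is generated twice, each time pointing to a different style sheet through a `<link>` element. The file names `screen.css` and `print.css`, the page content, and the helper `render` are hypothetical.

```python
from string import Template

# One document: structure and content only, with a placeholder for the style sheet.
PAGE = Template(
    '<html><head><link rel="stylesheet" href="$stylesheet"></head>'
    "<body><h1>Markup Languages</h1><p>Content stays the same.</p></body></html>"
)


def render(stylesheet):
    """Return the same document associated with the given style sheet file."""
    return PAGE.substitute(stylesheet=stylesheet)


screen_page = render("screen.css")  # hypothetical file name
print_page = render("print.css")    # hypothetical file name

# Only the style sheet reference differs; the marked-up content is identical.
print(screen_page.replace("screen.css", "") == print_page.replace("print.css", ""))  # True
```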
Evolution into a more standardized
language
HTML Summary
How Internet browsers display text,
images and sounds
Valid markup is defined by tags with an
associated meaning and grammar
CSS is a mechanism to separate style
instructions from content instructions for
a web page
XML: Evolution or Revolution?
Simplified SGML dialect: a less complicated subset of SGML
SGML :: Indo-European
HTML :: Controlled English
XML :: English
Extensible: no predetermined set of tags
Applicable to very diverse fields
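Because XML has no predetermined set of tags, a document can define its own vocabulary and a generic parser will still read it. The following sketch uses Python's standard `xml.etree.ElementTree`; the `glossary`, `entry`, `term`, and `definition` tags are invented for this example, as is the helper `translatable_segments`.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: none of these tag names come from a predefined set.
SOURCE = """
<glossary>
  <entry id="1"><term>markup</term><definition>Instructions added to a text.</definition></entry>
  <entry id="2"><term>tag</term><definition>A delimited markup name.</definition></entry>
</glossary>
"""


def translatable_segments(xml_text):
    """Return (tag, text) pairs for every element that directly contains text."""
    root = ET.fromstring(xml_text)
    segments = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            segments.append((element.tag, text))
    return segments


for tag, text in translatable_segments(SOURCE):
    print(f"{tag}: {text}")
```

The parser knows nothing about glossaries; it only relies on the tags being properly opened, closed, and nested, which is what makes the same tooling applicable to very different fields.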
XML Off the Rack
MathML, for mathematics
SMIL, for multimedia (Synchronized
Multimedia Integration Language)
CML, for manipulating molecular
information
XML Flexibility