XML is a markup language that allows users to define their own tags to structure documents. It separates content from presentation, allowing content to be reused in different formats. XML has advantages like encoding data only once, separating content from formatting, enabling interoperability and machine-to-machine communication, and allowing content to be ported and preserved more easily. XML is not a programming language but a meta-language used to build domain-specific markup languages for different applications and industries.
By now, you have heard how important structured content is. But, maybe you poked around with something like DITA and were baffled by the complexity. Or, maybe you still aren’t sure what XSLT stands for. This workshop will take participants back to the basics, to provide a foundation for higher-level concepts that have taken hold of our industry. Topics will include:
- What XML looks like, what it does, and how to create it.
- How to define a structure model, including whether to use a DTD, Schema, etc.
- What XSLT looks like, what it does, and how to make it work.
- What DITA and DocBook really are and whether one is right for you.
Russell Ward is an experienced technical writer and structured technologies developer. He has spent many years working with structured content to maximize efficiency in the techcomm environment, both as an employee and as an independent consultant. He is also an experienced trainer and speaks periodically at conferences and other peer events.
1. XML by example
1.1. Credit card statement (paper)
Cardmember Statement
ACCOUNT NUMBER: 4444888822221111
AVAILABLE CREDIT: 5,000
CLOSING DATE: 11/25/02
PAYMENT DUE DATE: 12/15/02
CARDMEMBER STATEMENT SUMMARY
TRANS DATE  POST DATE  REFERENCE NUMBER  DESCRIPTION OF TRANSACTION   CREDITS   CHARGES
1023        1025       2416QZP           Townhouse Store #3306 DC                 10.65
1027        1027       2422KQ12          Wazuri DC                                55.00
1103        1103       7422120F          Payment – Thank you          1,000.00
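As a rough sketch of how the paper statement above might be carried over into XML, the following Python builds the same data as an element tree. The names Transaction and Description come from this handout; every other element name (Statement, AccountNumber, ClosingDate, TransDate, PostDate, Reference, Charge, Credit) is a hypothetical choice made for illustration, and treating the two purchases as charges and the payment as a credit is a reading of the flattened statement columns.

```python
import xml.etree.ElementTree as ET

# Rows copied from the paper statement: trans date, post date,
# reference number, description, kind (assumed), amount.
ROWS = [
    ("1023", "1025", "2416QZP", "Townhouse Store #3306 DC", "Charge", "10.65"),
    ("1027", "1027", "2422KQ12", "Wazuri DC", "Charge", "55.00"),
    ("1103", "1103", "7422120F", "Payment -- Thank you", "Credit", "1000.00"),
]


def build_statement():
    """Build a <Statement> element; most element names are illustrative."""
    statement = ET.Element("Statement")
    ET.SubElement(statement, "AccountNumber").text = "4444888822221111"
    ET.SubElement(statement, "ClosingDate").text = "11/25/02"
    for trans, post, ref, desc, kind, amount in ROWS:
        transaction = ET.SubElement(statement, "Transaction")
        ET.SubElement(transaction, "TransDate").text = trans
        ET.SubElement(transaction, "PostDate").text = post
        ET.SubElement(transaction, "Reference").text = ref
        ET.SubElement(transaction, "Description").text = desc
        # The element name records whether the amount is a charge or a credit.
        ET.SubElement(transaction, kind).text = amount
    return statement


statement = build_statement()
xml_text = ET.tostring(statement, encoding="unicode")
print(xml_text)
```

Each value is now labeled by what it is rather than by where it sits on the page, which is the point the rest of this handout develops.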
1.3. XML building blocks
XML deals with documents
A document is a basic unit of XML information, composed of elements and other
markup in an orderly package
<Description> Payment -- Thank you </Description>
Start tag Character data End tag
Markup Element Markup
An element is an identifiable, named component of a document
can have content (but doesn’t have to): data, other elements
can be a pointer to information (cross-reference, link)
must have one start and one end tag
elements can nest but cannot overlap
An attribute provides additional information about an element
<Transaction Category="Groceries">
found inside start tag
may be required or implied
an element may have multiple attributes
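The rules for elements and attributes can be tried out with a short Python sketch using the standard library's ElementTree module. The sample strings are made up for illustration; the second one deliberately overlaps two elements to show that a parser rejects it.

```python
import xml.etree.ElementTree as ET

# Well-formed: matching start and end tags, child element properly nested,
# attribute placed inside the start tag.
good = (
    '<Transaction Category="Groceries">'
    "<Description>Townhouse Store #3306 DC</Description>"
    "</Transaction>"
)

# Not well-formed: <b> and <i> overlap instead of nesting.
bad = "<p><b>bold <i>both</b> italic</i></p>"


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


element = ET.fromstring(good)
print(element.tag)                       # the element's name
print(element.attrib)                    # attributes found in the start tag
print(element.find("Description").text)  # character data of the nested element
print(is_well_formed(good), is_well_formed(bad))
```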
1.4. Credit card statement DTD
• What a DTD can provide for (structure, sequence, in-document linking, selected
occurrence indicators) and what it cannot (datatyping, flexible occurrence indicators)
1.5. Document types and their instances
• Invoice
• Sales catalog
• Dictionary
• Journal article
1.6. Validating parser
1.6.1. What a parser does
• Is document well-formed? (for stand-alone docs)
• Does a DTD conform to XML specs?
• Does a document instance conform to the DTD?
1.6.2. What a parser does not do
• Check semantics (“gobbledygook” might be meaningless but valid as far as a
validating parser is concerned)
• Check what a DTD cannot enforce (datatyping, flexible occurrence indicators)
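The difference between well-formed and valid can be illustrated with a small Python sketch. Python's standard-library parser is non-validating: it reads the DTD declarations but checks only well-formedness, so a document that breaks its own DTD still parses. The DTD, the element names, and the conforms_to_model helper below are all hypothetical; the helper is a hand-written stand-in for one rule that a validating parser would derive from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset: a Statement must contain one or more Transaction
# elements. The document body breaks that rule but is still well-formed.
doc = """<!DOCTYPE Statement [
  <!ELEMENT Statement (Transaction+)>
  <!ELEMENT Transaction (#PCDATA)>
]>
<Statement><Gobbledygook>not in the model</Gobbledygook></Statement>"""

# No error is raised here: only well-formedness is checked.
root = ET.fromstring(doc)
children = [child.tag for child in root]
print(children)


def conforms_to_model(root):
    """Hand-coded check for the (Transaction+) content model only."""
    tags = [child.tag for child in root]
    return len(tags) >= 1 and all(tag == "Transaction" for tag in tags)


print(conforms_to_model(root))
```

Even a document that passes such a check can still contain meaningless text inside its Transaction elements, which is the semantic limit described above.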
1.7. Credit card statement in XML environment
2. Components of an XML system
• Document instance
• DTD/Schema
• Validating parser
• Processing system
2.1.1. Document
Two kinds
Well-formed
Valid (has a model)
Usually created
Manually – using XML Editor (Epic, XMetaL)
Programmatically from a database, another XML document, or by conversion
from another format (LaTeX, MSWord)
2.1.2. DTD
The modeling mechanism specified by the XML standard
models one type of information
is a set of rules describing how documents of that type can be marked up
2.2. Processing system
XML DOES NOT DO ANYTHING!
Your software CAN!
Start/stop behavior
Run a script, load a database, create a “form letter” and fill-in contents
Link
Format (start bold, end bold)
Process
Extract selected elements (e.g., metadata)
Rearrange/resequence content
Rename, add content
Count how many
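A few of the processing tasks listed above (extract, count, resequence, rename) might look like this in Python with the standard library's ElementTree. The sample document, the Amount element, the "Dining" category, and the new name Memo are assumptions made up for the sketch.

```python
import xml.etree.ElementTree as ET

doc = """<Statement>
  <Transaction Category="Groceries"><Description>Townhouse Store #3306 DC</Description><Amount>10.65</Amount></Transaction>
  <Transaction Category="Dining"><Description>Wazuri DC</Description><Amount>55.00</Amount></Transaction>
</Statement>"""

root = ET.fromstring(doc)

# Extract selected elements.
descriptions = [d.text for d in root.iter("Description")]

# Count how many.
count = len(root.findall("Transaction"))

# Rearrange/resequence: largest amount first.
ordered = sorted(
    root.findall("Transaction"),
    key=lambda t: float(t.findtext("Amount")),
    reverse=True,
)
root[:] = ordered

# Rename: give every Description element a new name.
for description in list(root.iter("Description")):
    description.tag = "Memo"

print(descriptions, count)
print(ET.tostring(root, encoding="unicode"))
```

The XML itself stays passive throughout; all of the behavior lives in the software that reads it.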
3. XML origins
3.1. What is markup?
Information added to a document that enhances its meaning in certain ways, in
that it identifies the parts and how they relate to each other.
3.2. Pre-electronic (traditional) markup
Set this header in 12-point Helvetica Medium italic on a 14-point text
body, justified on a 22-Pica slug with indents of 1 en on left and none on
the right.
3.3. Markup language
A set of symbols that can be placed in the text of a document to demarcate and
label the parts of that document
3.4. Specific markup languages
Tells formatter what action to take: "carriage return", "center the following lines",
"go to the next page", etc.
3.4.1. RTF, Script, etc.
Script example
.sp (skip one line)
.bf roman 12 (change font size)
.bd .ce Chapter 1. Introduction
(center "Chapter 1. Introduction" and print it in bold)
3.4.2. WYSIWYG Word Processors, DTP, and professional typesetting
systems
• WordPerfect, MSWord, WordStar, MacWrite
• Quark, Ventura
• XYVision, Penta, Miles 33
Proprietary, not interchangeable, structure and presentation inextricably intertwined.
Retrieval, cross-referencing difficult.
3.5. Generic markup languages
Uses descriptive tags rather than formatting codes. Indicates logical structure of
the documents. Separates formatting from structure/content.
3.5.1. Macro-based languages
• LaTeX for TeX
• Syspub for Waterloo Script
• ms for nroff
LaTeX example
\to{Mr. Smith}
stands for 3 commands
\noindent
\settabs 6 \columns
\+TO:&Mr. Smith\cr
3.6. SGML
• 1960s. GCA’s “GenCode” (Graphic Communications Association)
• 1969. IBM’s GML. Generalized Markup Language (Charles Goldfarb, Edward
Mosher, and Raymond Lorie)
• 1978. ANSI working group, headed by Charles Goldfarb, formed to develop a
standard text-description language based on GML as a format for text
interchange
• 1983. SGML developed. DoD and IRS adopt SGML. DoD develops CALS
(Computer-Aided Acquisition and Logistic Support) as an SGML application.
(CALS tables still in use.) AAP develops DTDs for books and journals. SGML
spreads in Europe and North America
• 1986. ISO ratifies SGML as a standard (ISO 8879:1986)
3.7. HTML
• Early 1990s. Tim Berners-Lee and Anders Berglund of the European particle
physics lab CERN develop the HyperText Markup Language (Berglund designed a
publishing system to test SGML in the 1980s)
• HTML is an application of SGML for hypertext documents
• Both a step forward (Web, wide adoption, public interest in markup) and a step
back (generic coding principles compromised: one (!) doc type used for all
purposes, many tags purely presentational)
HTML example
HTML tags format
4. XML
1998. W3C working group under Jon Bosak publishes XML, a simplified version of
SGML: 80% of SGML's power with 20% of its complexity
4.1. What XML can do
XML can be used to tag…
Content (what type of information is this?)
City, state, zip
Part number
Debit, credit, payment
Question, answer
Genus, species
Indications, counter-indications
Structure (what part of document is this?)
Paragraph, sub-section, section, chapter, list
Table, figure, formula, video
Author block, signature block, address block
Pointers (Location, navigation, linking, and other relationships)
Hypertext links
Cross-references
Indexing terms
Metadata (information about data)
Bibliographic/cataloging information (author, title, publication date)
Index terms and keywords (search terms)
Revision, version, edition
Status, tracking information
Data sources
Editor’s and reviewer’s comments
Abstracts, highlights, “teasers”, “blurbs”
Rendering/Processing (if you MUST) – how text should behave, display, or print
normally handled through a stylesheet but…
position of graphic on the page (floating, centered)
line break in titles
tables
author’s whimsy (“I want this word bold just because”)
4.2. XML is…
A subset of SGML. A meta-language that describes the concepts and rules to
build domain-specific markup languages
A family of technologies/standards (mostly W3C Recommendations): XSLT, XSL,
XPointer, XPath, XQuery, XLink, DOM, SAX, etc.
XML can be used:
for document modeling
for data interchange
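Two members of that family, SAX and DOM, can be contrasted in a short Python sketch using the standard library's xml.sax and xml.dom.minidom modules. SAX reports events as the document streams past, while DOM loads the whole document into a tree that can be queried afterwards. The sample document and the TagCounter class are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

doc = (
    '<Statement><Transaction Category="Groceries">'
    "<Description>Wazuri DC</Description>"
    "</Transaction></Statement>"
)


class TagCounter(xml.sax.ContentHandler):
    """SAX handler: startElement is called once for each start tag."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


# SAX: stream the document and react to events.
counter = TagCounter()
xml.sax.parseString(doc.encode("utf-8"), counter)
print(counter.counts)

# DOM: build the whole tree in memory, then navigate it.
dom = minidom.parseString(doc)
transaction = dom.getElementsByTagName("Transaction")[0]
category = transaction.getAttribute("Category")
print(category)
```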
4.3. XML applications (domain-specific markup languages)
Device/media-oriented:
• XHTML - Web
• WML – wireless markup language
• VoxML – spoken word markup language
Discipline-oriented:
• MathML – mathematical markup language
• CML – chemical markup language
Industry-oriented:
• Airlines/aircraft
• Semiconductors
Process-oriented:
• SVG - Scalable Vector Graphics
4.4. XML is not…
• a programming language. Does not replace C++, Java, Perl, etc.
• a user interface
• a presentation format
• a text formatting or processing system
• a standard set of document types
• a standard or recommended set of tags
• UNICODE
• a database
• user-unfriendly
5. XML in a publishing environment
5.1. Uncontrolled inputs, controlled outputs
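[Figure: publishing workflow. Input formats: WordPerfect, MS Word, LaTeX, HTML,
PostScript. These pass through an XML converter to produce the XML document
(XML article). Other components shown: composition engine, low-res PDF,
high-res PDF, XSLT stylesheet, HTML, XML DB, CrossRef, MDDB, A&I services,
TOCs, indices, search interfaces, hand-held computer, cell phone, telephone.]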
5.2. Integrated environment with controlled inputs and outputs
Example: technical manual (aircraft, automobile, etc.)
Conceptual configuration of a database-centered XML-aware system (adapted from The SGML Implementation
Guide by B. Travis and D. Waldt)
Functions surrounding the master database: authoring, editing, reviewing,
copy-editing, converting, imaging, composing, publishing, abstracting and
indexing, searching, archiving, revising, tracking, referencing and linking,
translating, assigning
Master Database
- Text Objects
- Graphics
- Works in Progress
6. XML advantages
• Encode (mark up) data only once. Create a single information repository
• Separates content/structure from presentation/formatting
• Software/hardware independent
• Interoperability: common language for a community to agree on data content;
machine-to-machine communication.
• Portability
• Preservation
• Non-proprietary/open industry standard
• Reuse/re-purposing (many outputs)
• Enables semantically complex searching and retrieval
• Cuts down on the number of required converters (saves software development
costs)
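The "encode once, many outputs" advantage can be sketched in Python: one XML source is rendered as plain text and as an HTML list without touching the source. The sample document, the Amount element, and the two rendering functions are hypothetical; a production system would more likely use a stylesheet for this step.

```python
import xml.etree.ElementTree as ET
from html import escape

# Single source: content and structure only, no formatting.
doc = """<Statement>
  <Transaction><Description>Townhouse Store #3306 DC</Description><Amount>10.65</Amount></Transaction>
  <Transaction><Description>Wazuri DC</Description><Amount>55.00</Amount></Transaction>
</Statement>"""

root = ET.fromstring(doc)


def to_text(root):
    """Render each transaction as one plain-text line."""
    return "\n".join(
        f"{t.findtext('Description')}: {t.findtext('Amount')}"
        for t in root.iter("Transaction")
    )


def to_html(root):
    """Render the same transactions as an HTML bulleted list."""
    items = "".join(
        f"<li>{escape(t.findtext('Description'))} ({escape(t.findtext('Amount'))})</li>"
        for t in root.iter("Transaction")
    )
    return f"<ul>{items}</ul>"


text_output = to_text(root)
html_output = to_html(root)
print(text_output)
print(html_output)
```

Adding a third output format means writing one more rendering function; the encoded data does not change.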