Developing an STM DTD/Schema: Strategic design choices
Alexander (‘Sasha’) Schwarzman, AGU (sschwarzman@agu.org)
Extreme Markup Languages 2006, Montréal, Canada
August 7 – 11, 2006
Requirements
Does an agreed-upon requirements document exist? (If not, get one!)
What is your XML’s role?
Archival copy-of-record (preserving scientific content)?
Means of producing a pretty PDF?
Both?
Much more?
Architecture
When during production is XML created? How is accuracy checked at each stage?
Dummy empty elements for not-yet-assigned metadata, plus a configurable, production-stage-specific business rules checker / validator / QC tool?
Multiple DTDs: a separate one for each production stage?
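One way to picture the "single DTD plus stage-specific checker" option is the minimal Python sketch below. It assumes a hypothetical rule table (REQUIRED_BY_STAGE) and invented stage and element names; none of them come from a real DTD or production system. The DTD would allow the empty placeholders at every stage, and the checker decides which of them must be filled in by a given stage.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: metadata elements that must be non-empty
# at each production stage. Stage and element names are invented.
REQUIRED_BY_STAGE = {
    "copyedit": ["title"],
    "issue-assembly": ["title", "volume", "issue"],
    "publication": ["title", "volume", "issue", "pub-date"],
}

def check_stage(xml_text, stage):
    """Return the required elements that are missing or still empty."""
    root = ET.fromstring(xml_text)
    problems = []
    for name in REQUIRED_BY_STAGE[stage]:
        elem = root.find(f".//{name}")
        if elem is None or not (elem.text or "").strip():
            problems.append(name)
    return problems

# Invented sample: placeholders exist but have not been assigned yet.
SAMPLE = """<article>
  <meta>
    <title>Sample article</title>
    <volume/>
    <issue/>
    <pub-date/>
  </meta>
</article>"""

print(check_stage(SAMPLE, "copyedit"))     # []
print(check_stage(SAMPLE, "publication"))  # ['volume', 'issue', 'pub-date']
```

The same file passes at the copyedit stage and fails at publication, which is the behavior a separate DTD per stage would otherwise have to encode.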
XML “layering”: which “layer” should enforce editorial style and business rules?
DTD / parser?
Validator / Schematron?
Human editors?
Revisable unit (what is the elemental unit?)
Article?
Issue?
Arbitrary / cross-journal article collection?
Volume / year?
Journal?
More than one of these?
Scope
For what material?
Current?
Future-only?
Legacy?
All of the above or some combination?
What is the extent of an article / book?
Does it include supplementary material, like datasets and computable spreadsheets?
Do you model “extra stuff” as just another structured section or is it something different?
Special links (“related links”) section?
Modeling Language Choices
Which constraint language is primary?
DTD?
XSD?
RELAX NG?
How many DTDs / schemas (purpose of each)?
Authoring?
Conversion / Transformation?
Production?
Archiving?
Separate or shared: if your content includes journal articles, newspaper articles, book chapters, books, case studies, lecture notes, etc., should you use:
Distinct DTD / schema for each?
A large shared structure?
A DTD / schema suite with common modules?
“Off-the-shelf, Altered-to-fit, or Bespoke?” (T. Usdin)
If altered, what public model?
“Compatible with” or “informed by” (subset or superset)?
If bespoke, do you use any public models at all (for tables and math, for instance)?
Modeling Design Choices
“Prussian” or “Californian”: prescriptive or descriptive? Flexible or enforcing?
Generated or Explicit text? (depends on XML’s role)
Preserve generation / rendition rules?
Different approach for text and bibliographic references?
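The generated-versus-explicit question can be illustrated with a small Python sketch. The element names, the sample reference, and the house style are all invented for illustration. In the explicit variant the punctuation is stored in the XML as mixed content; in the generated variant only the data is stored and a rendition rule supplies the punctuation.

```python
import xml.etree.ElementTree as ET

# Explicit text: punctuation lives in the XML itself (mixed content).
EXPLICIT = ("<ref><author>Smith, J.</author> (<year>2005</year>), "
            "<title>On markup</title>.</ref>")

# Generated text: only data is stored; punctuation comes from a rule.
GENERATED = ("<ref><author>Smith, J.</author><year>2005</year>"
             "<title>On markup</title></ref>")

def render_explicit(xml_text):
    """Concatenate the stored text exactly as it appears in the XML."""
    return "".join(ET.fromstring(xml_text).itertext())

def render_generated(xml_text):
    """Apply a hypothetical house style to the stored data."""
    ref = ET.fromstring(xml_text)
    author = ref.findtext("author")
    year = ref.findtext("year")
    title = ref.findtext("title")
    return f"{author} ({year}), {title}."

print(render_explicit(EXPLICIT))
print(render_generated(GENERATED))
```

Both calls produce the same string here, but only the explicit form preserves the rendition if the house style later changes, while only the generated form can be restyled without touching the archive. That trade-off is why the answer depends on the XML's role.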
How to model bibliographic references?
Mixed content?
Genre-specific “strict models” (with an escape hatch provided)?
“Tag abuse” tolerance?
How to reference non-XML components, e.g., figures, in XML?
By an ID that maps to a set of multiple images in an archive?
By naming a specific file from the set? Which one is “the mother of all images”?
Which components to store / migrate? Is “storing cheaper than thinking”? (D. Lapeyre)
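The "ID that maps to a set of images" option might look like the following Python sketch. The manifest (FIGURE_ARCHIVE), the file names, the rendition labels, and the `figure` element are assumptions made up for this example. The XML names only a logical figure; a lookup outside the XML chooses the file, falling back to a designated master image.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest: one logical figure ID, several renditions.
FIGURE_ARCHIVE = {
    "fig01": {
        "master": "fig01.tif",  # the designated "mother of all images"
        "web": "fig01.gif",
        "print": "fig01.eps",
    },
}

def resolve_figure(fig_id, purpose="master"):
    """Pick the rendition for a purpose; fall back to the master file."""
    renditions = FIGURE_ARCHIVE[fig_id]
    return renditions.get(purpose, renditions["master"])

def figure_files(xml_text, purpose="master"):
    """Resolve every figure referenced in the XML for one purpose."""
    root = ET.fromstring(xml_text)
    return [resolve_figure(f.get("id"), purpose) for f in root.iter("figure")]

# Invented sample: the XML carries an ID, not a file name.
SAMPLE = '<article><figure id="fig01"/></article>'

print(figure_files(SAMPLE, "web"))        # ['fig01.gif']
print(figure_files(SAMPLE, "thumbnail"))  # ['fig01.tif'] (fallback)
```

Naming a specific file in the XML instead would remove the lookup step, at the cost of editing the XML whenever a rendition is added or replaced.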
How to model math?
MathML presentation versus content (computation)?
How to ensure that the same math symbols render identically in different browsers (the same Unicode code points can look different across browsers, e.g., epsilon and varepsilon)?
LaTeX plus GIFs?
How to ensure that special characters look identical when they occur both in a displayed formula and inline?
Just GIFs?
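Part of the symbol-consistency problem can at least be made checkable. Unicode has two epsilon characters, U+03B5 and U+03F5, which are commonly associated with the two LaTeX epsilon shapes. The Python sketch below uses the standard `unicodedata` module to list the code points in a string, so that an inline formula and a displayed formula can be compared. The helper names (`describe`, `same_symbols`) are hypothetical, and the check says nothing about how a given browser font will draw the glyphs.

```python
import unicodedata

def describe(text):
    """List (code point, Unicode name) for each non-ASCII character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
        for c in text
        if ord(c) > 127
    ]

def same_symbols(a, b):
    """True if both strings use the same non-ASCII code points in order."""
    return describe(a) == describe(b)

inline = "\u03b5 > 0"     # uses U+03B5
displayed = "\u03f5 > 0"  # uses U+03F5

print(describe(inline))     # [('U+03B5', 'GREEK SMALL LETTER EPSILON')]
print(describe(displayed))  # [('U+03F5', 'GREEK LUNATE EPSILON SYMBOL')]
print(same_symbols(inline, displayed))  # False

# Compatibility normalization folds U+03F5 into U+03B5, so running
# NFKC over archived math would silently erase the distinction.
print(unicodedata.normalize("NFKC", "\u03f5") == "\u03b5")  # True
```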
“Just because you can, doesn’t mean you should” (D. Lapeyre)
The lure of modeling for its own sake. Simpler models are easier to maintain over time.