Most university presses are still working toward print. Ideally, a good XML workflow will let you view the printed book (offset or digital) as one product that can come out of the content. But right now the book, mainly printed but increasingly digital, is still the engine that drives scholarly publishing. The good news is that what we’re good at is making books. This expertise is what you want to rely on in establishing an XML workflow. That is, you don’t need to totally reinvent how you work. You want to use what you already know works in order to transition to a workflow that will allow you to take advantage of digital opportunities.
1. XML early: The MS is converted and edited as XML. Editors and authors work in XML-editing software. 2. XML in the middle: XML is extracted from final edited word-processing files and used for typesetting and other purposes (including prepublication display). 3. XML late: Final PDF files are converted to XML.
1. XML early: Requires buy-in: Editors and authors must learn to use new software. Allows repurposing throughout publishing process. 2. XML in the middle: No need to learn new software. Allows repurposing. Relies on standardization to preserve XML after typesetting. 3. XML late: No new software. Requires financial investment to convert each title after composition.
This scheme results in two main parts of the XML workflow: pre- and post-production/typesetting. The print production cycle is going to be a major influence on the XML workflow, but the XML still has to allow both efficient print production and the ability to repurpose for other digital needs, such as archive mining and conversion to EPUB. The moment in the workflow centered around typesetting caused the most difficulties for CUP’s XML workflow, and this is where the most reconciliation and care are needed. It is important to keep this in mind when establishing the DTD. So: what is a DTD?
Part of setting up an XML workflow should involve developing a DTD. There’s a natural connection between the typesetting styles that you’ve always used to set up your books and the XML element tags that a DTD will comprise. The functions of the two things are really the same: to describe how each element in a document relates to all the others and should be presented to a reader. Typesetting styles speak to a designer or compositor, and the XML elements will speak to any application that needs to display the content—from word processing to typesetting to, with some transformation, EPUB readers. Some publishers might find that multiple DTDs are useful for various kinds of books. This really depends on the diversity of the publishing program and the ease of training staff to handle variations in the workflow.
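To make the connection between styles and elements concrete, here is a minimal Python sketch of styled paragraphs becoming XML elements. The style names, element names, and the `paragraphs_to_xml` helper are all hypothetical, invented for illustration; they are not CUP's actual styles or tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical style and element names, for illustration only.
STYLE_TO_ELEMENT = {
    "Chapter Title": "title",
    "Body Text": "p",
    "Block Quote": "blockquote",
}

def paragraphs_to_xml(paragraphs):
    """Turn (style, text) pairs into a <chapter> element."""
    chapter = ET.Element("chapter")
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT[style]  # a KeyError flags an unknown style
        child = ET.SubElement(chapter, tag)
        child.text = text
    return chapter

styled = [
    ("Chapter Title", "Use What You Know"),
    ("Body Text", "Books are still the engine."),
    ("Block Quote", "A quoted passage."),
]
xml_text = ET.tostring(paragraphs_to_xml(styled), encoding="unicode")
print(xml_text)
```

Each style an editor applies corresponds to exactly one element, so the set of styles and the set of elements can be designed together.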
This is the CUP DTD viewed in an XML editor. The elements in the CUP DTD evolved directly from the typesetting styles that MS editors used to communicate to the designers and compositors. As the XML workflow has developed since 2002, these elements and the style template used by the editors have evolved as well. But they had their origin in decades of bookmaking knowledge.
In a major 2008 upgrade, the template of styles available to CUP editors was expanded, with a corresponding expansion of the DTD. At the same time, all XML coming out of MS Editorial had to be valid to the DTD, so no nonstandard styles could be passed through from editing and become part of the XML archive. Since the DTD defines which elements are legal and specifies how they can be arranged, enforcing validity means there will be no surprises in the XML that could cause breakdowns in typesetting or in other uses of the XML.
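The idea of enforcing validity can be sketched in a few lines of Python. This is a simplified allow-list check, not real DTD validation: it assumes a hypothetical content model (`ALLOWED_CHILDREN`) and only tests which elements may appear inside which, whereas a real DTD also governs order, repetition, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element and the children it may contain.
ALLOWED_CHILDREN = {
    "chapter": {"title", "p", "list"},
    "list": {"item"},
    "title": set(),
    "p": set(),
    "item": set(),
}

def find_violations(xml_text):
    """Return a list of messages for unknown or misplaced elements."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown element <{root.tag}>")
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN:
                problems.append(f"unknown element <{child.tag}>")
            elif child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> not allowed inside <{parent.tag}>"
                )
    return problems

good = "<chapter><title>T</title><list><item>a</item></list></chapter>"
bad = "<chapter><sidebar>x</sidebar><item>y</item></chapter>"
print(find_violations(good))
print(find_violations(bad))
```

A file that produces any messages would be sent back to editing rather than passed on to typesetting or the archive.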
As part of this revision to the DTD, we increased the granularity and specificity of the styles and tags to communicate more information to the designer and compositor, such as the position of a paragraph in a series of similar elements (e.g., the last item in a bulleted list). The level of this granularity came out of identifying inefficiencies in the communication between editor and designer/compositor.
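As a rough illustration of positional granularity, the sketch below labels each item in a list by where it falls in the series. The `bullet-first`, `bullet-middle`, `bullet-last`, and `bullet-only` names are hypothetical and stand in for whatever styles a press actually defines.

```python
def positional_styles(items, base="bullet"):
    """Pair each item with a style name that records its position."""
    n = len(items)
    tagged = []
    for i, text in enumerate(items):
        if n == 1:
            pos = "only"
        elif i == 0:
            pos = "first"
        elif i == n - 1:
            pos = "last"
        else:
            pos = "middle"
        tagged.append((f"{base}-{pos}", text))
    return tagged

result = positional_styles(["apples", "pears", "plums"])
print(result)
```

With the position recorded in the style, a compositor can, for example, add extra space after the last item without a separate note from the editor.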
How granular do styles and DTD elements need to be? This is really a matter of individual choice. But in deciding whether to add a certain kind of element to the DTD, use your bookmaking experience to ask some questions. 1. Are the indentation, position, and context important enough to justify a style and a tag? Can it be taken care of with a note to the compositor? 2. Is the element common to many books or kinds of books or unique? 3. Is it a print-only issue? Will digital applications take note of it? Is it something you want to make sure is preserved in digital applications?
CUP currently deals with elements like these through a system of custom styles. In Word editing, the editor can set these styles to look like whatever is needed. In conversion to XML, the description of what the custom style is meant to be (that is, the function of the element) is incorporated directly into the XML tag. Since these explanations become part of the XML, the file remains valid to the DTD and the specific requirements of the element are available for any future use. I do think that there is still room for more specificity in a few matters, such as elements within other elements, multiparagraph quotations within notes, or lists within boxed text.
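One way to picture a custom style carrying its own description is a generic element with the description stored as an attribute. This is only a sketch under that assumption: the `custom` element name and `desc` attribute are hypothetical, and CUP's actual tagging may differ.

```python
import xml.etree.ElementTree as ET

def custom_element(description, text):
    """Build a generic element whose function is recorded in an attribute."""
    el = ET.Element("custom", {"desc": description})
    el.text = text
    return el

el = custom_element("poem extract, hanging indent", "First line of the poem")
custom_xml = ET.tostring(el, encoding="unicode")
print(custom_xml)
```

Because the element name itself is declared once, any number of one-off descriptions can pass through without adding new elements to the DTD.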
There are a number of off-the-shelf DTDs out there specifically meant for book publishing, including DocBook and TEI. In creating an XML workflow, you need to decide if using one of these is the best way to go and, if so, how. You can adapt your workflow to match these standards. This is probably the best course for presses that are not used to working with rigidly defined styles. It might be harder for presses that have a strong legacy of styles and are used to doing a lot of work in-house. If need be, it’s always possible to create or purchase a transform that will convert your final XML to be valid to an off-the-shelf DTD, and this might be the best way to handle matters if you want to create your own EPUBs in-house.
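A transform of this kind can be pictured, in very reduced form, as renaming in-house elements to their off-the-shelf equivalents. The sketch below assumes a hypothetical in-house `p` element and maps it to DocBook's `para`. Real transforms are usually written in XSLT and must also restructure content, so renaming alone does not guarantee a valid result.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-house name on the left, DocBook element on the right.
RENAME = {"p": "para"}

def rename_elements(xml_text, mapping):
    """Rename every element found in the mapping; leave the rest alone."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")

source = "<chapter><title>T</title><p>Body text.</p></chapter>"
converted = rename_elements(source, RENAME)
print(converted)
```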
AAUP 2011: XML Workflows (M. Haskell)
XML, DTDs, OMG <ul><ul><li>Michael Haskell, Publishing Systems Manager, Columbia University Press </li></ul></ul>
XML Workflows and DTD Choices <ul><li>About Columbia University Press </li></ul><ul><li>Models of XML workflows: kinds, challenges, strengths </li></ul><ul><li>DTDs: Use what you know </li></ul>
Columbia University Press <ul><li>Established 1893 </li></ul><ul><li>160 new book titles per year </li></ul><ul><li>Diverse publishing program </li></ul><ul><li>Mixture of in-house, freelance, and packaged composition </li></ul>
XML and Scholarly Publishing: Use What You Know <ul><li>Most scholarly publishing still driven by “the book”—print or, increasingly, digital </li></ul><ul><li>Printed book is one product of the underlying content </li></ul><ul><li>Apply bookmaking expertise to development of XML workflow and digital publishing: use the resources you have </li></ul>
Possible XML Workflows <ul><li>XML early </li></ul><ul><ul><li>Editors and authors work in XML-editing application </li></ul></ul><ul><li>XML in the middle </li></ul><ul><ul><li>XML extracted from word processing and used for typesetting </li></ul></ul><ul><li>XML late </li></ul><ul><ul><li>Final application files or PDFs converted to XML after book is complete </li></ul></ul>
Pros, Cons <ul><li>XML early </li></ul><ul><ul><li>Requires buy-in: Editors and authors must learn to use new software. Allows repurposing throughout publishing process. </li></ul></ul><ul><li>XML in the middle </li></ul><ul><ul><li>No need to learn new software. Allows repurposing. Relies on standardization to preserve XML after typesetting. </li></ul></ul><ul><li>XML late </li></ul><ul><ul><li>No new software. Requires financial investment to convert each title after composition. </li></ul></ul>
CUP: XML in the Middle <ul><li>XML is behind the scenes: neither production editors nor comps need to learn an XML-editing program </li></ul><ul><li>Authors can receive edited MS files in Word and page proofs as PDFs: no need to touch XML </li></ul><ul><li>Styles applied in Word lead directly to tag set in XML—and this influences the creation of the DTD </li></ul>
The Great Divide <ul><li>Two major facets of an XML workflow: pre- and post-production (typesetting) </li></ul><ul><li>The XML must work for both typesetting and other purposes further down the line </li></ul><ul><li>This break can cause headaches—and bridging it is an important part of the XML workflow and a major consideration in establishing the DTD </li></ul>
What Does a DTD Do? <ul><li>Establishes a set of rules that govern the structure of an XML document or set of documents </li></ul><ul><li>Defines how each element relates to all others and to the document or collection </li></ul><ul><li>Natural connection between typesetting styles and DTD elements—use what you know! </li></ul><ul><li>Multiple DTDs: useful? </li></ul>
Granularity: It’s What’s for Breakfast <ul><li>How much do you need to communicate? </li></ul><ul><li>How often does the situation come up? </li></ul><ul><li>What are the ramifications for digital use? </li></ul>