
XML and Content Strategy



Why and how to “future-proof” your content


Publishers and other information providers increasingly use multiple media to display their content for various applications. Books become e-books, online journal articles are published online first and in print later, and figures are aggregated into image databases. Users request chunks of content, or the publisher assembles pieces of content from multiple publications into a new publication. Information users want what they want, when they want it, in the form they want. As publishers work to respond to the changing needs of their constituencies, the challenge is: how can publishers “future-proof” their content?

Even today, content takes many forms and has many uses. Publishers find that they need to adapt their content in various ways (figure 1).

Figure 1: Sampling of data output
• Web-ready HTML on proprietary platforms
• HTML for web previewing
• PDF for printing/viewing/downloading
• Distribution by third-party aggregators (Ovid, EBSCO)
• Abstract & indexing services (Scopus)
• Mobile devices (iPad, smartphones)
• Archival solutions (Portico)

If today’s situation is not complicated enough, the future is likely to be even more complex. How can information providers respond to the changing needs of customers and new technologies with greater facility in terms of time, cost, and effort?

By far, the most practical, most versatile tool for manipulating content for both current formats and those not invented yet is XML. XML (eXtensible Markup Language) is an open standard; its power derives from the fact that XML has been adopted by entire industries, many government agencies, and platform developers. When new standards emerge, such as EPUB for e-book readers, they are derived from generic XML, allowing even files created years earlier to flow readily into the new standard.

Some organizations think of XML as a different set of tags. While XML tags are different from those used in other systems like SGML or HTML, XML is actually a different way of thinking about content. Karen Colson, director of publishing and communications at the Association for Research in Vision and Ophthalmology (ARVO), explains it simply:

XML describes content, not appearance.

An XML tag (actually, a pair of tags, one at the beginning and one at the end of an element) might indicate that a section of copy is a first-level heading inside a book chapter. The actual appearance of the heading, however, is determined by a different style sheet for each application. The typeface and size that appear in the book might be completely different if the book is available on an e-reader, and they might be different still if the book is included on the electronic platform of a third-party aggregator.

SPi Global, 2807 North Parham Road, Suite 350, Richmond, VA 23294, T 1 804 262 4219, www.spi-global.com
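To make “XML describes content, not appearance” concrete, here is a minimal sketch in Python. It tags a hypothetical chapter fragment by meaning only, then applies two stand-in “style sheets.” The element names (`chapter`, `title`, `h1`, `para`) and the typeface values are illustrative assumptions, and the style sheets are plain dictionaries for brevity; production workflows would typically use XSLT or CSS for this step.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter fragment: the tags say what each piece IS, not how it looks.
CHAPTER = """<chapter>
  <title>Future-proofing content</title>
  <h1>Why XML?</h1>
  <para>XML describes content, not appearance.</para>
</chapter>"""

# Two stand-in "style sheets": each maps an element type to a presentation.
PRINT_STYLE = {"title": "24pt Garamond", "h1": "16pt Garamond bold", "para": "11pt Garamond"}
EREADER_STYLE = {"title": "1.6em sans", "h1": "1.2em sans bold", "para": "1em serif"}


def render(xml_text: str, style: dict[str, str]) -> list[str]:
    """Apply one style sheet to the same tagged content, element by element."""
    root = ET.fromstring(xml_text)
    return [f"[{style[el.tag]}] {el.text}" for el in root]


for name, style in [("print", PRINT_STYLE), ("e-reader", EREADER_STYLE)]:
    print(name)
    for line in render(CHAPTER, style):
        print("  ", line)
```

The text of each element is identical in both outputs; only the presentation attached to the `h1` (or any other) tag changes, which is why a new output channel needs a new style sheet rather than new content.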
The tag for a first-level heading also can function as metadata. For instance, a book’s table of contents might be constructed by copying chapter titles and first-level headings. Or, perhaps an aggregator’s general search function could look primarily at first-level headings. In either case, a pair of tags that starts out regulating appearance can have multiple programmatic applications as well.

Organizations that want to get the most out of XML apply it consistently and as early as possible in the content development process. When this happens, editing changes are captured within a single, authoritative XML file, all XML files are built according to the same rules, and the final XML file is the source for all types of output. Creating this capability requires thoughtful planning and technically astute implementation.

Planning for end-to-end XML Workflow
The most reliable and powerful way to apply XML to documents is to do so at the very beginning of the production cycle. In organizations where content is created by employees, the content creator may enter tags, often using shortcuts or templates. For most publishers, however, tags are applied by skilled markup operators based on the list of tags available to them (more on this below). Most markup operators work for compositors, so their function sometimes overlaps with typesetting. But markup is a distinct function in the production process. Once the tags are applied, production can proceed (figure 2).

Figure 2: Production process using XML
XML markup → Copyediting → Typesetting → Page layout → Proofreading → Content Repository → Multiple outputs

When an error occurs, the correction is made in the native XML file so that the error is corrected in every product that flows from the content. Making corrections in the native XML file represents the industry’s best practice, but practical challenges exist even with this approach.

Julia Sawabini, director of e-commerce at Elsevier, explains that to build the web page for a particular product, Elsevier pulls content from a database containing fields variously populated by editorial, production, and marketing people. The information is organized via style sheets, but no content is created at this point. “If there’s something wrong on the website, it’s wrong someplace along the way. I can’t change it.”

Once a correction is made, the change may not appear immediately, as the website is updated in batches at specified intervals. The incorrect product information will appear on the site until the update takes place. Also, the incorrect material will remain on the servers of distributors, e-bookstores, and other outlets for the information unless corrected files are sent and uploaded.

An analogous challenge occurs in publishing printed materials. Sometimes a production person spots an error while processing a PDF for the printer. The temptation, and often the reality, is that the production person corrects the PDF and sends it on to the printer, breathing a sigh of relief. Unless the production manager remembers to go back and make the same correction, the error still exists in the XML file.

Implicit in this discussion is the notion that XML workflow includes an element that is rarely critical in a single-medium product: what Phil Schafer, director of production at Elsevier, describes as “a central content repository with full functionality.” It is not enough to save all content to a particular server. Ideally, the content will flow into a database-like structure that enables the owner or other authorized users to find specific content and manipulate it for specific publishing applications.
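The point that a first-level heading tag doubles as metadata can be sketched in a few lines of Python. This is a hypothetical example, not any publisher’s actual system: the book markup and its element names are assumptions, `table_of_contents` reuses chapter titles and `h1` headings, and `search_headings` is a deliberately crude stand-in for an aggregator search that looks only at first-level headings.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; element names are illustrative, not from any real DTD.
BOOK = """<book>
  <chapter>
    <title>Planning</title>
    <h1>Workflow</h1><para>...</para>
    <h1>Markup</h1><para>...</para>
  </chapter>
  <chapter>
    <title>Implementation</title>
    <h1>Conversion</h1><para>...</para>
  </chapter>
</book>"""


def table_of_contents(xml_text: str) -> list[str]:
    """Build a TOC by reusing chapter titles and first-level headings as metadata."""
    root = ET.fromstring(xml_text)
    toc = []
    for n, chapter in enumerate(root.iter("chapter"), start=1):
        toc.append(f"{n}. {chapter.findtext('title')}")
        for h1 in chapter.iter("h1"):
            toc.append(f"    {h1.text}")
    return toc


def search_headings(xml_text: str, term: str) -> list[str]:
    """A crude search that looks only at first-level headings (case-insensitive)."""
    root = ET.fromstring(xml_text)
    return [h1.text for h1 in root.iter("h1") if term.lower() in h1.text.lower()]


print("\n".join(table_of_contents(BOOK)))
print(search_headings(BOOK, "mark"))
```

Because both functions read the same authoritative markup, a heading corrected once in the native XML would show up corrected in the TOC, the search results, and every styled output.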
Content repositories can be critical in highly regulated areas such as medicine. Larry McGrew, head of content and editorial operations at Aetna, relies on multiple content management systems with carefully approved material to populate the Aetna sites that are central to its members’ experience. McGrew admits that this has been “extremely challenging” to implement.

Data in the content management systems are heavily tagged with metadata so users can get optimal search results despite the multiple original sources of the material.

The DTD
The Document Type Definition (DTD), the very rough equivalent of type specifications for print products, specifies both how an element will look in print, on the web, on e-book readers, etc., and, to some extent, what the element means. DTDs need to code both data and metadata.

To explain how a DTD functions, look at the different tagging possibilities for genus and species, depending on the media and application. For instance, we assume that readers of this white paper belong to the species Homo sapiens. It is probably sufficient therefore to surround Homo sapiens with XML tags that mean “put these words in italics no matter what other appearance specifications you have.” But in a zoology book, you might want to put each genus/species into the index. In that case, you could put tags around the phrase Homo sapiens that indicate “these words are genus and species; put them in italics, and remember to make an index entry for this term.” In an anthropology book, you might want to distinguish between Homo sapiens and other species such as Homo erectus, and treat both species as index sub-entries under the genus Homo. In that case, you’d put a pair of tags around Homo indicating “this is a genus,” and a pair of tags around either sapiens or erectus indicating “this is a species.” Instructions for constructing the index would complete the picture.

The previous paragraph took 186 words to discuss how to treat genus and species in a DTD. Multiply this by the many editorial, functional, design, and marketing considerations in any one publication, and then multiply it again by the range of publications you hope to represent with a single DTD. The considerations become massive, and the temptation might be to skimp on the detail of the DTD (for instance, coding for genus and species together, rather than separately). This might be a false economy, though. Nina Chang, senior publisher for e-journals at Lippincott Williams & Wilkins (LWW), points out:

Richly tagged data allow for more precise searching.

In STM and scholarly publishing, searchers want to retrieve the information that really matters, so the detail of the DTD is important to the perception of quality. It’s helpful to refine the DTD as much as possible before implementation.

As Schafer points out, “If we choose to introduce a new element, we have to take it to a supplier support data team to ensure that it’s implemented across all of our journals.” And Chang of LWW notes that changing the DTD has implications for archival data as well. For instance, do you go back and insert new tags into archival content to keep up with the functionality of new material? This requires a business decision: What are the changes worth to the users, compared with the inevitable costs?
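The anthropology-book case above (genus and species tagged separately, with index sub-entries grouped under the genus) can be sketched as follows. This is an illustration under stated assumptions, not a real DTD: the `taxon`, `genus`, and `species` element names are hypothetical, the DTD itself is not shown, and the “italics” rule is rendered as an HTML `<i>` tag purely for demonstration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup that tags genus and species separately (element names assumed).
TEXT = """<section>
  <para>Fossils of <taxon><genus>Homo</genus> <species>erectus</species></taxon>
  predate <taxon><genus>Homo</genus> <species>sapiens</species></taxon>.</para>
  <para>Compare <taxon><genus>Pan</genus> <species>troglodytes</species></taxon>.</para>
</section>"""


def build_index(xml_text: str) -> dict[str, list[str]]:
    """Group each species as an index sub-entry under its genus."""
    root = ET.fromstring(xml_text)
    index: dict[str, set[str]] = defaultdict(set)
    for taxon in root.iter("taxon"):
        index[taxon.findtext("genus")].add(taxon.findtext("species"))
    return {genus: sorted(species) for genus, species in sorted(index.items())}


def italicize(xml_text: str) -> list[str]:
    """Appearance rule: every tagged taxon name is set in italics (HTML <i> here)."""
    root = ET.fromstring(xml_text)
    return [f"<i>{t.findtext('genus')} {t.findtext('species')}</i>" for t in root.iter("taxon")]


print(build_index(TEXT))
print(italicize(TEXT))
```

If genus and species were coded together as a single element, the italics rule would still work, but grouping sub-entries under Homo would require fragile string splitting. That is the false economy the section warns about.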
At large publishing organizations, developing a sufficiently powerful and flexible DTD is a challenge. As we discussed earlier, it is not enough to catalog all of the type specifications that might be needed. A team building the DTD also needs to consider whether to define specific kinds of information and to what degree of detail, and it also needs to define the metadata required for its own use and for the use of current and future third parties.

One approach is to start with a DTD that is already in the public domain. For instance, Colson of ARVO has twice used the DTD developed by the National Library of Medicine as the basis for an organizational DTD:

[The DTD from the National Library of Medicine] is comprehensive: it works for books, Annual Meeting abstracts, and all of our other publications.

Colson even used this DTD when she worked at the American Geophysical Union (AGU), even though AGU content had little if any relationship to medicine, because the structure worked effectively for other types of scholarly content.

Another approach is to contract with a trusted vendor. Vendors that have developed and worked with DTDs in the past have a pragmatic knowledge of what works well for their customers, and they also have staff with the background to steer skillfully through the complexities. Outside vendors can do the future-oriented work, freeing up in-house staff to manage day-to-day operations. A good outside vendor can also help train staff to understand the new DTD and/or a new, XML-oriented workflow.

A large proportion of scholarly journals, with their tightly structured, relatively brief units of copy, have migrated to XML with reasonable success. Books have been harder because they are more varied, and book authors often don’t face the kind of career-oriented pressures that impel them to comply with constraints that authors of journal articles will accept. Still, over time, publishers for elementary and high schools and for higher education have begun to implement DTDs, which in turn offer them flexibility. Not only can they put content on multiple platforms to meet student and school district needs, but they can also customize the content of publications. This may be one reason why most educational publishers seem fairly confident of their ability to meet the idiosyncratic social science requirements of the single largest school district (i.e., the Texas School Board) while continuing to publish their books for the rest of the country.

Custom publishers are another category that has found XML to be an invaluable asset to their business, as seen in the Case Study.

ONIX: A specialized DTD for book metadata
For people in the publishing industry, ONIX (ONline Information eXchange) is perhaps the most familiar example of a DTD for metadata.

ONIX is used extensively in the book trade as a standardized means of communicating information about books, from author and title to weight per copy, minimum order quantity, subject classification, and so forth. These data then populate everything from the publisher’s own website (for instance, the one maintained by Elsevier’s Sawabini) to industry giants such as Amazon and Barnes & Noble.

Case Study
Triangle Publishing Services, Inc., prepares publications for technology companies like Microsoft, Cisco, and Hewlett-Packard. In some cases, Triangle has prepared all the content in a book so that it can be repurposed.

For example, a book with chapters on applications in a dozen different industries can be disaggregated into a dozen different white papers for distribution online. Or, by searching on XML tags, the book’s case studies can be extracted and used in other settings.

Larry Marion, CEO and Editorial Director at Triangle, says this about taking advantage of the power of XML:

Think about how you want to repurpose content; be as creative and granular as possible. Extra work at the beginning can save you pain down the road.
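The two repurposing moves in the Triangle case study (disaggregating chapters into standalone white papers, and extracting case studies by searching on XML tags) can be sketched as below. This is a hypothetical example: the `chapter`, `case-study`, and `white-paper` element names and the `industry` attribute are assumptions for illustration, not Triangle’s actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; element and attribute names are illustrative only.
BOOK = """<book>
  <chapter industry="retail">
    <title>Retail</title>
    <para>Overview of retail applications.</para>
    <case-study><title>Store A</title><para>Results at Store A.</para></case-study>
  </chapter>
  <chapter industry="banking">
    <title>Banking</title>
    <para>Overview of banking applications.</para>
    <case-study><title>Bank B</title><para>Results at Bank B.</para></case-study>
  </chapter>
</book>"""


def split_into_white_papers(xml_text: str) -> dict[str, str]:
    """Disaggregate: each chapter becomes a standalone document keyed by industry."""
    root = ET.fromstring(xml_text)
    papers = {}
    for chapter in root.iter("chapter"):
        paper = ET.Element("white-paper")
        paper.extend(list(chapter))  # reuse the chapter's child elements as-is
        papers[chapter.get("industry")] = ET.tostring(paper, encoding="unicode")
    return papers


def extract_case_studies(xml_text: str) -> list[str]:
    """Search on the case-study tag to pull every case study out for reuse."""
    root = ET.fromstring(xml_text)
    return [cs.findtext("title") for cs in root.iter("case-study")]


print(sorted(split_into_white_papers(BOOK)))
print(extract_case_studies(BOOK))
```

Neither function would be possible if case studies were only visually distinct (a shaded box, say) rather than explicitly tagged, which is the sense in which extra granularity at the start saves pain later.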
To see how XML refers to types of content and not their appearance, look at the display of any particular title on Amazon, and then on Barnes & Noble. Author, title, publisher’s description, and the like look entirely different, yet they contain precisely the same information. Other industries and disciplines have their own specialized metadata sets as well.

Implementation
In some parallel universe, management might be able to send out a memo one Friday afternoon announcing a new production workflow that starts the following Monday morning. In this world, however, it isn’t that simple. Employees may need to perform different tasks, or they may perform the same tasks in a different sequence. Managers need to assess performance using different metrics. Suppliers need to accept input that looks different and generate different kinds of output, with possible changes in schedules, prices, and quality management. For a publisher, all of this needs to take place while products already in the pipeline move through the previous workflow, or some hybrid.

XML on the fly
Sometimes, an information provider will need to produce XML hastily. For instance, a content provider may be switching publishers, may wish to digitize back file content, or may want to work with a new third-party aggregator.

In these situations, publishers need to convert existing data. With typesetting files in hand, a conversion vendor can read the typesetting codes (for instance, “Heading 1”) and change them to XML tags, for the most part programmatically. The programmatic approach, however, can miss or misinterpret improvised or last-minute changes. For instance, if someone sees at the last minute that a “1” head really should have been a “2” head, that person might not change the typesetting code but might simply alter the type characteristics to look like a “2” head. The XML coding will continue to treat the heading as a “1” head, with potential implications for the quality of applications such as web display, search, and the like. Similarly, links to tables and illustrations might or might not be captured.

Another challenge is that conversions may not capture important metadata (“this is a chapter, not a scholarly paper”) because the metadata simply don’t exist in the original material. Either the original publisher provides the metadata retrospectively, or the new party supplies it using its own best, potentially fallible judgment.

Building capacity for end-to-end XML requires an organization to commit staff resources, time on the calendar, and financial resources. Realistically, not every publisher can muster all three kinds of resources conveniently.

Data conversions are typically done by production vendors, with their in-depth knowledge of publishing workflows and outputs. Another approach is to leave file conversions to the aggregator, e-book platform, or other party that wants to use the data. These companies typically do a good job of ensuring that the XML they generate is effective for their application, but if another vendor approaches the publisher, the process needs to be repeated at the cost of more money and more time.

Time for XML?
For the foreseeable future, information is going to flow into and through multiple platforms, from books, magazines, and newspapers to websites, e-book readers, mobile devices, and inventions that are only sketches on a whiteboard right now. Authorities agree that XML provides the most effective way to cope with these multiple and shifting demands. Colson of ARVO says it well:

Don’t be afraid of XML. Using XML will give you more versatility than any scheme I’m aware of.
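The “1” head versus “2” head problem can be reproduced with a toy conversion in Python. Everything here is a hypothetical assumption for illustration: the typesetting export is modeled as (style code, manual override, text) tuples, `CODE_TO_TAG` is an invented mapping, and `needs_review` is just one possible safeguard, not a description of how any conversion vendor works.

```python
import xml.etree.ElementTree as ET

# Hypothetical typesetting export: (style code, manual formatting override or None, text).
TYPESET = [
    ("Heading 1", None, "Results"),
    ("Body", None, "Text of the results section."),
    # Last-minute fix: made to LOOK like a "2" head via local formatting,
    # but the underlying style code was left as "Heading 1".
    ("Heading 1", "14pt italic (looks like Heading 2)", "Subgroup analysis"),
]

# Hypothetical mapping from typesetting codes to XML element names.
CODE_TO_TAG = {"Heading 1": "h1", "Heading 2": "h2", "Body": "para"}


def convert(paragraphs) -> ET.Element:
    """Map style codes to tags programmatically; local overrides are ignored."""
    root = ET.Element("section")
    for code, _override, text in paragraphs:
        ET.SubElement(root, CODE_TO_TAG[code]).text = text
    return root


def needs_review(paragraphs) -> list[str]:
    """One possible safeguard: flag paragraphs whose appearance was overridden."""
    return [text for _code, override, text in paragraphs if override]


section = convert(TYPESET)
print(ET.tostring(section, encoding="unicode"))
print(needs_review(TYPESET))
```

The converter faithfully follows the codes, so “Subgroup analysis” comes out as an `h1` even though readers of the printed page saw a second-level head; flagging overrides for a human to check is a cheap way to catch such cases.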
The Contributors
Special thanks to the following individual contributors:
• Nina Chang, Senior Publisher, Online Journals, Lippincott Williams & Wilkins
• Karen Colson, Director, Publishing and Communications, Association for Research in Vision and Ophthalmology
• Mark Gaertner, Senior Web Producer, Team Lead, BMStudio at Bristol-Myers Squibb
• Larry Marion, CEO/Editor-in-Chief, Triangle Publishing Services
• Larry McGrew, Head, Content/Editorial Operations, Aetna
• Julia Sawabini, Web Marketing Director, Elsevier
• Phil Schafer, Director, Journal Production, Elsevier

The Authors
• Rich Lampert, The Lampert Consultancy, www.lampert-consultancy.net
Rich Lampert is owner of The Lampert Consultancy, LLC, established in 2004 to provide strategic, editorial, and marketing services to publishers in STM, professional, and scholarly publishing. Rich is also Principal, Publishing Services Division, at Doody Enterprises, Inc., which focuses on not-for-profit publishers.
• Cara Kaufman, Kaufman-Wills Group, www.kaufmanwills.com
Cara Kaufman is co-founder of Kaufman-Wills Group, LLC, which was created in 2000 to offer STM and other scholarly publishers a full range of professional publishing services in the areas of strategic planning, business development, electronic publishing strategy, RFP and self-publishing projects, editorial services, and marketing and market research.

SPi sought the help of Kaufman-Wills Group in developing this white paper.