Building XML Projects for an FDA-Approved Device: 3 ... and Their Success Factors
Dorothy J. Hoskins
Novatek Communications, Inc.
A Quick Tour of Terminology
• DTD or schema: the rules for organizing and
defining/naming XML elements and attributes
• CMS: content management system.
• TM: translation memory, a file that can be used
to find identical, similar or related words among
previously translated content
• Workflow: a method to route documents and
messages from one worker to another
Are We Saving Money Yet?
A Cautionary Tale of 3 Projects
• A global company manufacturing complex
medical equipment (FDA regulated) has been
heavily invested in XML technology for
documentation for over 3 years. In 3 projects,
they may have lost a lot of money overall – or
have they saved millions?
• It’s a story of XML design, infrastructure, and
content control from writing to approval to
translation to customer use
Impact of Complexity
• Customization increases complexity
[Chart: complexity (# elements) plotted against increase in cost to deliver outputs; labeled example: custom PDF + HTML + RTF]
3 Projects: Overview
| Project | # docs (English) | # lang | Output types | Custom or standard | Other |
|---|---|---|---|---|---|
| Large scale: CMS, integrated XML editor | Thousands (hundreds x multiple versions) | 11, then 18 | PDF on CD/web, RTF (user-editable), and HTML (later, if ever) | Custom (complex DTD of 250 elements) | Regulated, audited content (FDA approved) |
| Medium size: open source tools | Thousands | 18 | HTML with cross-linked navigation plus plain text; PDF for internal use | Custom (simple DTD of 50 elements) | Embedded in device; product-specific |
| Small scale: DITA plus SaaS | Hundreds | 18 | HTML with cross-linked navigation; PDF on CD | Standard (50 elements in subset of DITA) | Embedded in device |
[Diagram labels: DOC on server; email DOC; print hard copy; ink date/sig]
• Legacy of Word documents, “wet”
signatures, paper file storage of versions
• Translated the approved Word files, got
Certificate of Translation Accuracy (CTA)
Controlled authoring, editing and versioning of FDA-audited content
[Diagram labels: XML; revisions record + sigs]
• Complex XML DTD requires complex
WYSIWYG template in XML authoring tool
• Version control provided by CMS check out/in
integrated with the authoring tool
Review and approval workflow
[Diagram labels: generate PDF from XML for review; PDF, email; track comments; approval e-signature; revisions record + sigs]
• Custom transformation of XML (XSL-FO) to PDF
for review & approval
• E-mail alerts for tasks to workflow recipients
• E-signature provided as part of CMS workflow
• XML routed for translation, In Country
Review (ICR) as PDF, returned with CTA
• One doc per workflow, multiple langs
[Diagram labels: CMS archives documents of record + sigs + English master; XML to translation vendors; translation order x lang; revisions; ICR x lang as PDF (XSL-FO); CTA; translations]
CD, print, web distribution
to end user (customer)
[Diagram labels: CMS archives documents of record + sigs; XML; RTF and PDF outputs x lang; send outputs to distribution process]
• Control maintained in CMS up to
• Changes to the original DTD meant expensive
ripple effects for PDF and conversion.
• The workflows for review and approval, and
translation workflow, were hard to set up and
had unintended consequences.
– Risks for changes very high
– Documents in workflows are checked out, not editable
• RTF output was buggy and required manual
edits, so it was no longer “single source” (XML
and RTF changes had to be synchronized for every revision).
• Summing it up: high complexity, completely
custom solution, high-cost proprietary
system, few users, not as many
documents as proposed.
• Hard to recoup costs.
• Was it worth it? Costs are not dropping
over time, relative to the benefits. “Buyer’s
remorse” setting in.
• Meanwhile, another project: software
troubleshooting messages. Over 2 million words
of highly repetitive content.
• Required integration as text and HTML files with
software on the equipment.
• Needed to add a lot of content and languages
but in the same time (or less) as the last product
• Legacy: Troubleshooting content was
previously in HTML
– not consistent in either wording or HTML markup
– translated as HTML, but translators messed
up HTML markup.
• Needed: version control and reuse (a lot of
text was redundant).
• Developer based DTD on HTML plus a
small set of custom elements.
• Developer worked out entire process end
to end, proved and documented it.
• Risks: custom code, multi-step process
• Benefits: open source tools, batch processing
Vars -> Chunks -> Collections
[Diagram: chunks A, B, C and D recombined in different orders across many collections]
• Extreme reuse!
• Mix-and-match at ...
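The Vars -> Chunks -> Collections idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the chunk IDs, variable names, sample wording and function names are hypothetical, not the project's actual content or tooling.

```python
from string import Template

# Hypothetical variables: product-specific values substituted into chunks.
VARS = {"part": "pump", "panel": "rear panel"}

# Hypothetical reusable chunks, each written and approved once.
CHUNKS = {
    "A": Template("Turn off the $part."),
    "B": Template("Open the $panel."),
    "C": Template("Restart the $part."),
    "D": Template("Call service."),
}

# Collections (complete messages) are just ordered lists of chunk IDs.
COLLECTIONS = {
    "msg-001": ["A", "B", "C"],
    "msg-002": ["A", "D"],
    "msg-003": ["A", "C"],
}


def resolve(collection_id, variables=VARS):
    """Assemble one message by filling variables into each referenced chunk."""
    return [CHUNKS[c].substitute(variables) for c in COLLECTIONS[collection_id]]


def where_used(chunk_id):
    """List the collections that reference a given chunk."""
    return sorted(cid for cid, ids in COLLECTIONS.items() if chunk_id in ids)


if __name__ == "__main__":
    print(resolve("msg-002"))
    print(where_used("A"))
```

Because each chunk is stored once and only referenced by ID, a wording fix in chunk A reaches every collection that uses it, and each chunk needs to be translated only once per language.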
• CMS provided version control, “where used”
reporting, integration with authoring tool during
• Open-source tools provide fast batch processing
for outputs outside CMS.
• Once chunks were approved in English, open-source
tools were used to create English XML
“translation masters” so the translators could see
the complete text.
• Reuse made translations very consistent.
• As more content was translated, got more
benefit from reuse in TMs.
• Cost per word dropped to below $0.09/word,
versus over $0.22/word in the translation
estimate, saving about $1.7 million.
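As a rough check on these figures, the quoted rates imply a translated word volume. The sketch below assumes the savings are simply (estimated rate minus actual rate) times words translated, which the slides do not state. The rates are the rounded bounds quoted above, so the result is an order-of-magnitude figure only.

```python
# Work in cents to avoid floating-point rounding on currency values.
ESTIMATED_CENTS_PER_WORD = 22      # estimate: over $0.22 per word
ACTUAL_CENTS_PER_WORD = 9          # achieved: below $0.09 per word
SAVINGS_CENTS = 1_700_000 * 100    # savings: about $1.7 million


def implied_word_count(savings_cents, estimated, actual):
    """Words needed for the per-word rate difference to add up to the savings."""
    return savings_cents / (estimated - actual)


words = implied_word_count(
    SAVINGS_CENTS, ESTIMATED_CENTS_PER_WORD, ACTUAL_CENTS_PER_WORD
)
print(f"Implied translated volume: about {words / 1e6:.1f} million words")
```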
• After translation, the XML was batch-converted
with open-source tools to the final HTML and .txt
files in all languages, by product configuration.
• The folder structure of the HTML output was
automatically generated so the links, images
and the look and feel would be correct on the
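A batch step of this kind might look like the following Python sketch. The element names (`message`, `title`, `step`), the `<config>/<lang>/` folder layout and the function names are all assumptions for illustration. The slides do not name the open-source tools or the DTD elements actually used.

```python
import tempfile
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def message_to_html(xml_text):
    """Convert one hypothetical <message> document to a minimal HTML page."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default=""))
    steps = "".join(f"<li>{escape(s.text or '')}</li>" for s in root.iter("step"))
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ol>{steps}</ol></body></html>"
    )


def write_outputs(out_root, config, lang, messages):
    """Write .html and .txt files under <out_root>/<config>/<lang>/."""
    target = Path(out_root) / config / lang
    target.mkdir(parents=True, exist_ok=True)  # folders are generated, not hand-made
    written = []
    for msg_id, xml_text in messages.items():
        html_path = target / f"{msg_id}.html"
        html_path.write_text(message_to_html(xml_text), encoding="utf-8")
        # Plain-text version: the text of every step, one per line.
        steps = [s.text or "" for s in ET.fromstring(xml_text).iter("step")]
        (target / f"{msg_id}.txt").write_text("\n".join(steps), encoding="utf-8")
        written.append(html_path)
    return written


if __name__ == "__main__":
    sample = {
        "E101": "<message><title>Pumpe</title><step>Pumpe ausschalten.</step></message>"
    }
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_outputs(tmp, "model-x", "de", sample):
            print(path.relative_to(tmp))
```

Generating the folder tree from the configuration and language codes keeps relative links identical across all languages, so one set of link and image paths works for every output set.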
• In Summary: Highly successful project, on time,
creating huge savings, due to simple DTD and
extreme reuse of content.
• Open source batch tools combined with CMS
provided best of each system.
• XML is now “content of record”, so can check
out, edit, make PDF for approval process, and
leverage existing translation.
• Customer needed reference documents
and Help files on equipment, about 650
topics x 17 language translations.
• Wanted to use DITA to avoid costs of
custom DTD, benefit from collaboration
and the DITA Open Toolkit.
• Writers already familiar with other XML,
got some DITA training.
• Open source tools made it easier to
publish DITA “maps”.
• Lacked version control system.
• Found a Software as a Service (SaaS) product
that bundled XML editor, CMS,
workflows and translation interface.
• SaaS model provides application via web
browser, users work “in the cloud”.
• Global, 24/7 CMS access, XML content
development and SaaS team support.
• Contract is for user licenses and hosting,
not for software application licenses.
• All content can be extracted from the
servers as backup in standard XML.
• Customer owns the translation memory,
which resides on their server.
• Setup, initial content import, and
training were included for $40K. Additional work
pushed costs up more than 3x.
• Migrating existing DITA required change
from standard DITA to CMS schema.
• Numerous delays due to small SaaS staff.
• The SaaS product’s built-in XML editor was not very
robust or sophisticated.
• Server downtime sometimes caused lost
work and delays.
• Translation vendors complained that the
TM segment interface slowed them down,
estimated 3x the time to translate.
• Emergency solution -- exported the XML
from the SaaS and used XSL to “resolve”
the referenced text.
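The project did this step in XSL. The Python sketch below only illustrates the idea of resolving referenced text so translators see complete sentences. The `ref` and `id` attribute names and the sample markup are hypothetical, since the slides do not describe the SaaS schema.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_refs(root):
    """Fill each element that has a hypothetical 'ref' attribute with a copy of
    the content of the element whose 'id' matches. Nested references are not
    followed in this sketch."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    # Take a snapshot of the elements so the tree can be modified safely.
    for el in list(root.iter()):
        target = by_id.get(el.get("ref"))
        if target is None:
            continue  # no reference, or an unresolved one: leave as-is
        el.text = target.text
        for child in target:
            el.append(copy.deepcopy(child))
        del el.attrib["ref"]
    return root


doc = ET.fromstring(
    "<topic>"
    '<p id="warn">Disconnect power first.</p>'
    '<step><p ref="warn"/></step>'
    "</topic>"
)
resolved = ET.tostring(resolve_refs(doc), encoding="unicode")
print(resolved)
```

The resolved file is a flat, self-contained document, which is what an offline translation tool needs to build translation memory segments.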
• Developing offline processes added some
complexity and costs to the project.
– regression testing of XML in CMS.
– testing output processes that translation
vendors would use.
• Leveraged processes developed for the
medium-scale project to develop
processes for the small-scale project in a
few weeks. Same concept of open-source
batch processes for merging XML to
create translation masters and producing the
HTML for ICR.
• Vendors create translation memory files of
resolved XML offline.
• Import the TMs back into SaaS, then can
review in segment view.
• Translated, approved content used to
generate PDF and HTML from SaaS’ CMS
as originally intended.
• Summing up: SaaS much less expensive
than enterprise CMS plus XML software
• Some problems directly related to SaaS
model or SaaS company.
• Could be an important tool for global
content development when mature.
3 XML projects: Conclusions
• Flexibility of XML is proven (again) –
can engineer various solutions to meet
customer needs as they arise.
• Speed, accuracy and efficiency gains from
translation are very significant.
• Project must be well-designed to achieve
hoped-for results. QA needed for every
Dorothy’s Top Rules for XML
• Think of an XML project more like software
development than content development.
• Keep it simple.
• Pilot the entire project through every stage
• Allow 3x the time and money that the vendor
• Involve your writers, SMEs and translation
vendors from the start in selecting applications.
• Make QA part of every process for initial
development and every time the system changes.
How to Save $$$$$$$$
• Make sure there are sound business reasons for
XML that management supports. Translation +
reuse + infrastructure + experience = succe$$.
• Perform risk analysis as well as cost/benefit
analysis for all parts of the XML project.
• Work with experienced developers.
• Work backwards from the deliverables to the
• Determine workflows for review/approval and
train SMEs in your XML system.
Success Factors: More is More
• Volume: #docs x #langs x #outputs
• Reuse: among same docs, across doc
sets, TM, output formats
• Time: 6 mo. pilot, up to 1.5 years prior to
launch; not tied to quarterly budget;
• Funds: for infrastructure, training,
development, implementation, QA
• Leadership: long-term planning
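The volume factor can be made concrete with the small-scale project's figures. The sketch assumes that each topic counts as one document and that the two outputs are HTML and PDF. The function name is hypothetical.

```python
def deliverables(docs, langs, outputs):
    """Volume as defined above: #docs x #langs x #outputs."""
    return docs * langs * outputs


# About 650 topics, 18 languages (English plus 17 translations),
# 2 output types (HTML and PDF).
small_scale = deliverables(docs=650, langs=18, outputs=2)
print(small_scale)  # 23400
```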
• Dorothy Hoskins can be reached at
• Novatek Communications, Inc., provides
technical documentation, eLearning/LMS
and XML services.
Novatek Communications, Inc.
500 Helendale Road, Suite 280, Rochester, NY 14609-3173 (Tel.)
585.482.4070 (Fax) 585.482.4098