Confidential & Proprietary www.dclab.com
Valuable Content Transformed
• Document Digitization
• XML and HTML Conversion
• eBook Production
• Hosted Solutions
• Big Data Automation
• Conversion Management
• Editorial Services
• Harmonizer
Experience the DCL Difference
DCL blends years of conversion experience with cutting-edge technology and the
infrastructure to make the process easy and efficient.
• World-Class Services
• Leading-Edge Technology
• Unparalleled Infrastructure
• US-Based Management
• Complex-Content Expertise
• 24/7 Online Project Tracking
• Automated Quality Control
• Global Capabilities
. . . Spanning All Industries
• Aerospace
• Associations
• Defense
• Distribution
• Education
• Financial
• Government
• Libraries
• Life Sciences
• Manufacturing
• Medical
• Museums
• Periodicals
• Professional
• Publishing
• Reference
• Research
• Societies
• Software
• STM
• Technology
• Telecommunications
• Universities
• Utilities
What’s in Your Archives?
• Companies focused on delivering new content
• Are they overlooking what they already have and missing
opportunities?
• The digital age has increased potential audience size and
demand for content
• Don’t ignore what you already have – it may be highly
valuable
Which Format Is Your Data in?
• Paper
• Microfilm
• Photographs and/or slides
• Electronic files
• Some combination of the above
• Can you find it easily?
What’s Causing Your Fear and Anxiety?
• Task seems too big
• Who takes ownership?
• Cost in dollars and staff resources
• Will take forever to complete
Perception vs. Reality
• Perception: Converting legacy data will be costly,
labor-intensive, and too complex to manage. The ROI is
questionable, thus it’s not worth the risk
• Reality: Proper analysis and planning, a customized process,
and the expertise of a trusted partner limit the risk and
ensure maximum ROI
Where Do I Begin?
• Identify what you have
• Determine your target audience
• Decide how you’ll distribute the digital content (Web and/or
mobile, subscription model or single purchase, discrete pieces
of content, e.g., images)
• Develop your business case (costs, markets, revenue)
• Start converting!
Case Study – Optical Society of America
Converting a Large Content Repository
Customer Problem
• OSA needed to build a flexible digital repository of its authoritative library of scientific
journals going back to 1917; 750,000 pages spanning almost 100 years
• The materials incorporated extensive math, tables, and images in multiple formats,
which needed to be built into a cohesive database that would facilitate new approaches
to dissemination and the creation of future products not yet conceived
Solution
• Flexibility in execution: the size and breadth of the collection made it impractical to
develop full specifications in advance
• Develop an overall specification, with allowance for change as new scenarios are
discovered
• Software development sprints to incorporate changes, plus frequent review meetings,
allowed the assessment of nuances in new materials as they came up. Close
collaboration to manage new situations
Results
• A three-year project delivered on schedule and on budget, with new products already
developed and out on the market
• The close collaboration and involvement of the client shaved 6-8 months off the project
schedule and created a product that meets all goals
Case Study – Elsevier
Automating Large-Scale Reference Conversion
Customer Problem
• Need to improve the content coverage and link density of their Scopus bibliographic database,
beginning with their backlist of articles published prior to 1996
  • Inventory over 5.5 million Elsevier files against over 3 million Scopus records
  • Convert over 50 million references to a standard XML format. Source content consists of
    multiple variations of a source DTD with differing levels of quality, including totally
    unstructured references
  • Link as many references as possible to the Scopus repository
Solution
• Automated solution to inventory the large archive and provide comprehensive inventory reporting
• Developed a fully automated multi-step solution, running 24x7, to process the source content
and return high-quality, converted, validated, and enriched references, improving the match rate to
Scopus
Results
• Decomposed over 1 million unstructured references based on pattern detection software
• Heuristically repaired and converted the source content to Elsevier CARS XML
• Validated each reference against Scopus, CrossRef, or PubMed and enriched the content
based on the results, as appropriate
• Packaged and delivered the final XML for ingestion into the Elsevier Scopus System
The Value of Structured Content
Increase Revenues
• Improve customer service
• Decrease time to market
• Expand into new markets
• Create data versatility
• Enhance discoverability
Decrease Expenses
• Increase authoring productivity
• Reduce publishing costs
• Increase information reuse
• Reduce translation costs
• Future-proof data
Successful business strategies are driven by content!
Can your content keep up with changing technology?
• Data drives every aspect of a business, from engineering and development
to maintenance, repair and operations, sales, customer service, marketing,
and more
• Documents are often converted in order to comply with law or industry
standards, or to support distribution partners and meet consumers'
expectations
• Data conversion is most desirable for its potential to lower costs by making
data easier to manage, update, reproduce, and syndicate
• Structured formatting enables content to be delivered anywhere at any
time on any device imaginable
Various Uses for Structured Content
• Re-purposing: Creating new versions of data suitable for derivative uses
(e.g. the web, diagnostic equipment, hand-held devices,
voice devices)
• Searching: Ability to find information through text searches and
through more advanced searches that depend on context
and “understanding”
• Component Reuse: Ability to reuse portions of data for different products and
different documentation sets
• Enforce Data Standards: Ability to assure that the information produced is
produced consistently and meets corporate standards
• Interchange with Vendors, Customers, & World: Ability for others to use your information for
communications with others and to incorporate into
products belonging to other organizations
Key Takeaways
• DON’T let your valuable content lie dormant
• Convert it into a structured format that supports the needs of
your business
• Mine that gold!
Good afternoon, everyone! Thanks for joining us for this webinar. Today we’re going to talk about finding the value in your legacy content to create new products and new revenue streams. I’m Greg Fagan, and I’m the Sales Director for the publishing and financial industries at DCL. Because you’re all busy people, I’ve tried to keep this presentation as concise as possible. I’ll talk for about 15-20 minutes and then open the floor to your questions.
Just some quick background information on DCL. We’re content conversion experts. We take content in any format you might have it and convert it to reusable formats for digital output such as XML, SGML, HTML5, DITA, and EPUB. We not only convert your content, but we can enrich it to make it more discoverable, usable, and deliverable to any output format or device. Aside from conversion, we offer a suite of services, including hosting, editorial services, and project management.
Our deep experience, sophisticated infrastructure, and ferocious commitment to quality are what set us apart from the pack.
We serve a broad range of clients. Myriad large, global companies from many different sectors entrust their content to us.
And our clients span a wide array of industries, which speaks to our familiarity and fluency with many different XML schemas. Publishers, societies, pharmaceutical companies, defense contractors, and government agencies are just a few of the types of clients and industries we serve.
Most businesses and organizations, be they in publishing, financial services, pharmaceuticals, aerospace, or most other sectors, are focused on how they’re going to produce and deliver new content or data, and they should be. But many of these same organizations also have decades’ worth of legacy content, and in many cases, it’s sitting there untapped when it could, if properly digitized and structured, be creating value and enhancing their business. Digitization and the ever-expanding list of digital delivery channels have vastly increased potential audience size and demand for content.
Legacy content is often converted in order to comply with legal or industry standards or to support distribution partners and meet consumers' expectations. Generally, however, legacy conversion is most desirable for its potential to lower costs by making data easier to manage, update, reproduce, and syndicate. Given that, it’s simply bad business to ignore your archives.
Legacy content exists in many forms: paper, like hard-copy books, journals, and newspapers; microfilm; photographs and/or slides; electronic files (PDF, Word); or various combinations of the above. In all likelihood, you can’t find it very easily, even if it’s in digital form. It’s sitting in boxes in storerooms or basements, or on shelves, or in 50 different subdirectories on your network.
Think about your own legacy content. Which formats do you have? How is it stored? Is it retrievable? We’ve seen all kinds of legacy material, including mountaineering maps and images, letters and papers from famous people that have been contributed to a university library, specialized image collections, diaries of Civil War officers, scientific journals dating back decades and even centuries, vintage car repair manuals, movie magazines from the golden age of cinema...the list is endless.
The thought of actually organizing and reviewing all this data is daunting and downright scary. But the truth is, it doesn’t have to be.
So what’s behind this fear and anxiety? Well, if your organization has decades’ worth of legacy content in various formats and saved in many different places, the task of compiling, organizing, and converting it seems Herculean. And who will take ownership and drive this huge project from start to finish? Finally, even from a high-level overview, it seems the cost in dollars and staff/management time will be prohibitive, and it will take ages to complete.
These are legitimate and understandable concerns. How do we review and analyze all that content? Do we have the right people on staff to drive it? Can we properly estimate costs and secure the budget to proceed? These are all good questions, but they need not make you flee in fear.
Often perception and reality are very different things. In many cases, the perception represents a distorted view. In most cases, careful planning, a customized process, and the help of a knowledgeable and trusted partner minimize the risk and ensure maximum return on investment.
Converting legacy data is an investment that results in increased revenues and decreased expenses. Not only will having data maintained in a more structured, easily configurable format improve customer service and decrease time to market, it will allow for expansion into new markets and create data versatility as well. Additionally, publishing and translation costs will be reduced, and authoring productivity and information reuse will be increased. If you want to realize these benefits, it’s critical in my view to work with a vendor that has the content expertise and technological sophistication to help you manage your conversion successfully.
You have three options when considering legacy conversion and calculating expected ROI: 1) Convert nothing: This will result in delayed or no ROI. 2) Convert everything: This will result in higher conversion costs and a potentially lower ROI. 3) Convert top-priority content: This is the best option to start with, as there will be some conversion costs but a maximized ROI. It will take some effort on your part to identify your high-priority content, but it’s a worthwhile exercise that will pay real dividends. You can always convert the remaining content later if it stands up to the same cost-benefit analysis as your top-priority content.
Converting nothing is only a sensible option if, in your judgment, your legacy content has no potential value in digital formats. And is there any organization that can say that?
So how do you start? The first step is to identify what you have in terms of volume, formats, completeness, and overall condition (e.g., old paper, incomplete files). Doing this helps determine value and cost. Then you need to think about your target audience and what they’ll need, which might require user surveys and focus groups. Keep in mind that once the content is discoverable, the audience will likely be larger than you might think. Next you’ll need to decide how the content will be distributed. Will it be offered across all platforms and devices? Subscription model or single purchase? Sold to libraries, consortia, and corporations, or to individuals? Then develop your business case. Think about all the available alternatives in creating your digital content and estimate costs. Determine your potential markets and projected revenue. Some of this will be guesswork, but it’s important to set measurable goals. Finally, get off the starting line and run to daylight! (That’s a metaphor; there’s no actual running involved.)
Here’s an example of a large legacy conversion that DCL performed for the Optical Society of America. OSA needed to build a flexible digital repository of its journal content going back to 1917, which comprised 750,000 pages. The content included extensive math, tables, and images in various formats, so we purposely kept the specs fluid to accommodate new content types that arose. This illustrates the point that every legacy content collection is unique and thus requires a customized solution. The new XML repository has already yielded a revenue-generating spinoff image bank. And that highlights one of the real benefits of having a structured content repository – the ability to create new products and revenue streams.
Elsevier wanted to enrich the references in its Scopus bibliographic database, which is their homegrown version of PubMed and CrossRef, beginning with their backlist of articles published prior to 1996. We’re talking about 5 and a half million articles that needed to be inventoried and over 50 million references that needed to be converted to XML. Many of the references were completely unstructured; that doesn’t work well with XML, which is all about structure. We devised an automated inventory and reporting solution, along with an automated process that decomposed the unstructured references into more granular elements and then recomposed them into valid CARS XML. Once the repair process was complete, the references could be validated and linked to Scopus, CrossRef, or PubMed. This increased the value of the database not only to researchers, but also to current and potential institutional subscribers.
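To make the decompose-and-recompose idea a little more concrete, here is a minimal Python sketch. It is an illustration only, not DCL's actual software: the single regular expression, the function names (`decompose`, `recompose`, `convert`), the element names, and the sample reference are all invented for this example, and the output is not the real CARS XML schema. A production process would need many patterns plus heuristic repair and validation against an external database.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for ONE citation shape (an assumption for illustration):
#   Authors. Title. Journal Volume, FirstPage-LastPage (Year).
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+"
    r"(?P<volume>\d+),\s+"
    r"(?P<first_page>\d+)-(?P<last_page>\d+)\s+"
    r"\((?P<year>\d{4})\)\.?$"
)


def decompose(raw):
    """Split an unstructured reference string into named fields, or return None."""
    match = REFERENCE_PATTERN.match(raw.strip())
    return match.groupdict() if match else None


def recompose(fields):
    """Build a simple XML element from the fields (element names are made up)."""
    ref = ET.Element("reference")
    for name, value in fields.items():
        ET.SubElement(ref, name).text = value
    return ET.tostring(ref, encoding="unicode")


def convert(raw):
    """Decompose then recompose; flag anything the pattern cannot handle."""
    fields = decompose(raw)
    if fields is None:
        # Keep the original text and mark it so it can be reviewed or retried.
        ref = ET.Element("reference", {"status": "unparsed"})
        ref.text = raw.strip()
        return ET.tostring(ref, encoding="unicode")
    return recompose(fields)


if __name__ == "__main__":
    # Invented sample reference, not a real citation.
    sample = "Doe J, Roe R. An example title. Journal of Examples 12, 101-110 (1950)."
    print(convert(sample))
    print(convert("text that matches no known pattern"))
```

The point of the design is that references the pattern cannot handle are flagged rather than dropped, so they can be retried with other patterns or reviewed by hand.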
Well-structured content has many benefits, with the most important being that it can increase revenue by decreasing time to market and enabling new product development. It also decreases expenses, such as publishing and translation costs, over time, which makes it a smart investment.
Often legacy content is more complex and difficult to manage than new content. In many cases, it was designed for one specific output and not much thought was given to proper storage, retrieval, or reusability. There are also different document types, formats, and levels of complexity, like heavy math and tabular material that was never meant for digital output. This is where the help of a trusted partner can be invaluable in helping you identify, categorize, and convert your content to a well-structured format. Your content should drive your business strategy.
But you can’t structure your content and think your work is done. It’s an ongoing process to keep up with industry standards, compliance, and constantly evolving outputs. Once the major work is done, however, the changes are much easier to manage, and your content is ready for delivery to any output. Content drives every aspect of your business, so make sure yours is ready to take you in the right direction.
Structured content has many uses, with reuse and repurposing the most important in my mind. Why? Because they generate revenue. The others are important, too. Different industries have differing degrees of importance, but money talks in all of them. When your content is structured at a granular level, you can assemble the different components into new products, as the OSA did with the creation of the image bank that I referred to. That wouldn’t have happened if they hadn’t taken the step to convert their legacy content. That’s just one example; there are many, many possibilities once your content has been converted to a structured format.
[Read bullets.] Once you get started, it’s easier than you might think!
I’d like to thank you for tuning in today. Feel free to contact me directly anytime. Now I’m happy to take your questions.