This document discusses preparing legacy data for automation in S1000D. It outlines the challenges of converting traditional linear documents into the modular structure required by S1000D. These challenges include identifying reusable content, assigning data modules and codes, and structuring information across publications. The document recommends planning thoroughly for a conversion project, including assessing source materials, analyzing content reuse, specifying the conversion, and normalizing data. It describes setting up the conversion project, performing document analysis, and developing a detailed specification to guide the conversion process.
2. Confidential & Proprietary www.dclab.com 2
Valuable Content Transformed
• Document Digitization
• XML and HTML Conversion
• eBook Production
• Hosted Solutions
• Big Data Automation
• Conversion Management
• Editorial Services
• Harmonizer
Experience the DCL Difference
DCL blends years of conversion experience with cutting-edge technology and the
infrastructure to make the process easy and efficient.
• World-Class Services
• Leading-Edge Technology
• Unparalleled Infrastructure
• US-Based Management
• Complex-Content Expertise
• 24/7 Online Project Tracking
• Automated Quality Control
• Global Capabilities
. . . Spanning All Industries
• Aerospace
• Associations
• Defense
• Distribution
• Education
• Financial
• Government
• Libraries
• Life Sciences
• Manufacturing
• Medical
• Museums
• Periodicals
• Professional
• Publishing
• Reference
• Research
• Societies
• Software
• STM
• Technology
• Telecommunications
• Universities
• Utilities
What Makes S1000D Conversion Difficult
• S1000D is a conceptual departure from linear information – and
is difficult for many to get used to
• Turns the traditional book into a collection of DMs
– Introductory material that applies to numerous DMs
– Placement of Warnings, Cautions and Notes
– Writer creativity
• DMC & business rules.
– Assigning DMCs and ICNs
– Hierarchy in Map Files (Publication Module)
– Data can fit more than one information code
• …but your documents probably weren’t designed to do this.
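The DMC assignment problem above can be made concrete with a minimal sketch. It assumes the data module code segments as named in the S1000D Issue 4.x `dmCode` element (check these against the issue and business rules your project uses). All values are hypothetical, including the model identifier and both information codes. It also shows how the same content could plausibly be filed under more than one information code, which is why the business rules have to settle the choice before conversion starts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataModuleCode:
    """Segments of a data module code (names assumed from the dmCode element)."""
    model_ident_code: str
    system_diff_code: str
    system_code: str
    sub_system_code: str
    sub_sub_system_code: str
    assy_code: str
    disassy_code: str
    disassy_code_variant: str
    info_code: str
    info_code_variant: str
    item_location_code: str

    def __str__(self) -> str:
        # Paired segments are written together, the rest are hyphen-separated.
        return "-".join([
            "DMC",
            self.model_ident_code,
            self.system_diff_code,
            self.system_code,
            self.sub_system_code + self.sub_sub_system_code,
            self.assy_code,
            self.disassy_code + self.disassy_code_variant,
            self.info_code + self.info_code_variant,
            self.item_location_code,
        ])


# Hypothetical values, for illustration only.
example = DataModuleCode("DEMOMODEL", "A", "29", "1", "0",
                         "00", "00", "A", "040", "A", "D")

# The same legacy section filed under a second candidate information code.
alternate = replace(example, info_code="520")

print(example)
print(alternate)
```

Keeping the code as structured fields rather than a single string makes it easier to validate each segment against project business rules before any files are named.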
Structuring a Book into Data Modules in S1000D
[Diagram: legacy sources (a PDF book with numbered paragraphs and appendices, a 38784 book, an ATA book made up of pageblocks, tasks, and subtasks, and an early S1000D publication) are broken into Descriptive, Procedural, IPD, Wiring, Crew, Process, Maintenance, and Fault DMs in the S1000D Common Source Database, from which Publication 1, Publication 2, and Publication 3 are assembled.]
Further Complications in S1000D Conversion
• There are the usual conversion issues
– Accuracy of the transferred text
– Tables
– Math or odd-looking text
– Special characters
• There are also the structuring issues
– Identifying DMs
– Identifying reusable content
– Identifying Applicability
• And the people issues
– Getting rugged individualists to collaborate more
– Deciding what needs re-authoring
– Getting used to a new “document” paradigm
Most Importantly – Plan!!!
• Ask the important initial questions
˗ Who are the stakeholders? Who is the final client/user?
˗ What is the estimated volume and deadline?
˗ Source format. Not all source data are created equal.
˗ What version of S1000D?
˗ Do we know what CMS or rendering tools will be used?
˗ Budget?
• Ask around or join discussion groups.
• Get your hands on the source data, business rules, and schemas.
• Begin looking for the right people. You don’t need to be S1000D savvy,
but you do need, at a minimum, to understand the concept.
Inventory & Assessment
• Log the batches received into a production control system.
• By logging and tracking each unit you can gather information
that can be used to:
– Project delivery schedules
– Confirm that processes are working properly
– Track each unit and show which step of the production
process it is in.
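A production control log along these lines can be sketched in a few lines of Python. This is only an illustration: the step names in `STEPS`, the `Unit` class, and the `step_summary` helper are assumptions made for this example, not a description of any particular production control system.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical production steps; a real project defines its own.
STEPS = ["received", "analyzed", "converted", "qc", "delivered"]


@dataclass
class Unit:
    """One tracked unit (for example, one manual) within a received batch."""
    unit_id: str
    batch_id: str
    step: str = "received"
    history: list = field(default_factory=list)

    def advance(self) -> None:
        """Move the unit to the next production step and record the old one."""
        index = STEPS.index(self.step)
        if index + 1 >= len(STEPS):
            raise ValueError(f"{self.unit_id} is already delivered")
        self.history.append(self.step)
        self.step = STEPS[index + 1]


def step_summary(units):
    """Count how many units currently sit in each step."""
    return Counter(unit.step for unit in units)


units = [Unit("U1", "B1"), Unit("U2", "B1"), Unit("U3", "B2")]
units[0].advance()  # U1: received -> analyzed
units[0].advance()  # U1: analyzed -> converted
units[1].advance()  # U2: received -> analyzed

summary = step_summary(units)
print(dict(summary))
```

Even a log this simple answers the basic tracking question of where each unit is. Adding timestamps to `history` would give the raw data needed for projecting delivery schedules.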
Inventory & Assessment: What to Convert, and in What Order
• Categorizing
– Active documents in good shape
– Active documents that need a lot of work
– Somewhat inactive documents that will likely be retired
– Archival materials
• Prioritizing
– Documents that are most used
– Documents that are customer favorites
– Documents with longest product life
– Start with most recent documents and go back
• Identifying the process
– Can be converted as is
– Can be converted with some work
– Needs to be rewritten
– Don’t convert – just keep archival copies
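The triage above could be expressed as a small rule set. The sketch below is one possible reading, not a prescribed method: the `Doc` fields, the mapping from document condition to conversion path, and the sort order (usage first, then product life, then recency) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    active: bool
    condition: str           # assumed values: "good", "needs_work", "poor"
    monthly_uses: int
    product_life_years: int  # expected remaining life of the product
    last_revised_year: int


def conversion_path(doc: Doc) -> str:
    """Pick one of the four process outcomes for a document."""
    if not doc.active:
        return "archive only"
    if doc.condition == "good":
        return "convert as is"
    if doc.condition == "needs_work":
        return "convert with some work"
    return "rewrite"


def conversion_order(docs):
    """Order documents to convert: most used, longest-lived, most recent first."""
    candidates = [d for d in docs if conversion_path(d) != "archive only"]
    return sorted(
        candidates,
        key=lambda d: (-d.monthly_uses, -d.product_life_years, -d.last_revised_year),
    )


docs = [
    Doc("Pump manual", True, "good", 120, 10, 2014),
    Doc("Legacy radio guide", False, "poor", 2, 0, 1998),
    Doc("Wiring handbook", True, "needs_work", 300, 15, 2012),
    Doc("Field notes", True, "poor", 40, 5, 2009),
]

for doc in conversion_order(docs):
    print(doc.title, "->", conversion_path(doc))
```

In practice the ranking would likely be weighted and reviewed by the stakeholders, but writing the rules down in this form makes the triage repeatable across a large inventory.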
Content Reuse Analysis Reports
• Finding exact or similar text will help you when mapping to Data Modules
• It will also help to detect applicability and inconsistencies
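As a rough illustration of what such a report does, the sketch below compares paragraphs pairwise and flags exact and near matches. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever matching engine a real reuse analysis uses. The 0.85 threshold, the paragraph IDs, and the sample text are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def reuse_report(paragraphs: dict, threshold: float = 0.85) -> list:
    """Return (id_a, id_b, kind, ratio) for exact and similar paragraph pairs."""
    norm = {key: normalize(value) for key, value in paragraphs.items()}
    report = []
    for a, b in combinations(sorted(norm), 2):
        if norm[a] == norm[b]:
            report.append((a, b, "exact", 1.0))
            continue
        ratio = SequenceMatcher(None, norm[a], norm[b]).ratio()
        if ratio >= threshold:
            report.append((a, b, "similar", round(ratio, 2)))
    return report


paragraphs = {
    "p1": "Remove the four bolts from the access panel.",
    "p2": "remove  the four bolts from the access panel.",
    "p3": "Remove the six bolts from the access panel.",
    "p4": "Check tire pressure before each flight.",
}

report = reuse_report(paragraphs)
for row in report:
    print(row)
```

Exact matches are candidates for a single shared data module. Near matches such as "four bolts" versus "six bolts" deserve a human look, since they may reflect either an inconsistency between documents or a genuine applicability difference. Pairwise comparison grows quadratically, so a large corpus would need indexing or hashing first.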
Document Analysis & Conversion Specification
• Evaluate document sources to determine the
relative ease & accuracy of content extraction
• Identify metadata sources
• Identify the types of information in the documents
and the appropriate level of tagging
• Identify processes for various materials
• Detailed analysis of documents by type
• Review enough documents to understand the
potential variations
• Develop tagging instructions
• Prepare specification
• Normalize your data
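What "normalize" means depends on the source material, but one common part of it is making text from different sources consistent at the character level before tagging and comparison. The following is a minimal sketch under that assumption. The replacement table is illustrative and incomplete, and a real specification would state exactly which characters to map and which to preserve.

```python
import re
import unicodedata

# Illustrative mapping of typographic variants to plain equivalents.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "-",    # en and em dashes
    "\u02d7": "-",                   # modifier letter minus sign
    "\u00a0": " ",                   # non-breaking space
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_text(text: str) -> str:
    """Apply Unicode NFC, map typographic variants, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_TABLE)
    return re.sub(r"\s+", " ", text).strip()


sample = "It\u2019s  a \u201ctest\u201d \u2013 ok\u00a0"
print(normalize_text(sample))
```

Running the same normalization over every source before comparison keeps purely typographic differences from hiding content that is otherwise identical.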
Q&A
Naveh Greenberg
Director, U.S. Defense Development,
Data Conversion Laboratory
(718) 307-5758
ngreenberg@dclab.com
@dclaboratory
Editor's Notes
-there are a lot more components to getting a conversion project done than most people think
-and there are a lot more things that need to be set up so that there’s no surprise, or rework, later when you’re chunking things out
-I tried to lay out the common tasks that I would expect in a large conversion project – there are of course some variations – but these are the major ones
-traditionally most of this was done by whoever was “in charge of the conversion”, and that was the predominant model until a few years ago
-what we’re finding today is that many times a hybrid model, where different parties handle some of the tasks, might work better, especially when the client company already has significant resources for some of the tasks but needs expertise for others
-later in this talk I will discuss several case studies of how this might work
-but first, I would like to go through what the various steps are, and a little about what gets done in each one
-these two wheels represent the various tasks – the left wheel, read clockwise, represents what gets done to get set up, and the right wheel represents the production tasks.