In topic-based information development environments such as DITA, the perennial question is “How large should a topic be?” While the topic-oriented approach has been shown to be suitable for authoring, only two levels of presentation are common: an entire set of content as a book (which may run into hundreds of pages), or component-by-component as authored.
The first (monolithic) presentation type is deficient because readers are not able to easily locate the information relevant to their situation. The second (fragmented) presentation type is deficient because it is not aligned with information-seeking behavior.
We have implemented a document structure of recursively nested information components that allows a web application to dynamically chunk information according to the reader’s behavior. At publication time, the source components are translated to granular chunks with an index, then further compiled into a recursive format that describes nesting and ancestry.
As with other presentation types, the information can be viewed in a single long document or addressed at the base component level. In addition, intermediate levels of presentation are possible. When the user selects an item from the table of contents, the component and all children components are displayed as a single chunk. Parent chunks are accessible through TOC and breadcrumb navigation that is defined in the document. After moving to a parent chunk, the starting point chunk is included as a child.
The varying levels of granularity allow search results to present a document accurately scoped to the user’s needs.
In addition, dynamic chunking supports well-documented “information foraging” behavior much better than the granular presentation approach, while preserving addressability that is lacking in monolithic presentation.
After exploring the advantages of dynamic chunking, Ben and Owen describe the publishing processes that transform the source DITA content to the dynamic presentation.
This presentation was given at Information Development World on October 1, 2015.
9.
Every page is page one
› Every page is a potential entry point
› Sometimes hierarchy and sequence are relevant
› Often hierarchy and sequence are not relevant
› Multiplicity of navigation options is required
10.
Information foraging behavior
› Information scent: Users estimate a given hunt’s likely
success from … assessing whether their path
exhibits cues related to the desired outcome.
› Informavores will keep clicking as long as they sense that
they're “getting warmer”—the scent must keep getting
stronger and stronger, or people give up.
› Progress must seem rapid enough to be worth the
predicted effort required to reach the destination.
› As users drill down the site, … provide feedback about
the current location and how it relates to users' tasks.
11.
Documentation use cases
1. A new user may want to browse a complete high level
document.
2. A developing user may want an intermediate-sized chunk
that has subject/sequence affinity.
3. An experienced user may want a small chunk with a
particular item of information.
4. A support technician may need to provide a chunk scoped
at an intermediate level to a customer so they are not
overloaded with too much information, but also not given
too little.
13.
DITA gets us halfway there
Authoring and management are done at the
topic level
Chunking exists as an approach
but
Chunking control is manual
Chunks are static
14.
Ben’s magical solution
If I had an infinite number of monkeys, I could
chunk all topics in all possible combinations
15.
Cross-disciplinary thinking to the rescue
› We need a recursive document!
› A document is:
1. A title
2. A globally unique key (document name + sub document ID)
3. A locally unique key (sub document ID)
4. A list of tags
5. A (recursive) list of documents
› DITA is recursive but none of the existing presentation
mechanisms are recursive.
› JSON is a natural way to represent a recursive document.
› XSLT is a natural way to generate such a JSON document.
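Following the slide’s five-part definition, a recursive document is straightforward to express as nested JSON. A minimal Python sketch, assuming illustrative field names (`title`, `key`, `id`, `tags`, `documents`) and document names that are not from the actual deck:

```python
import json

# A recursive document per the definition above: a title, a globally
# unique key (document name + sub-document ID), a locally unique key
# (sub-document ID), a list of tags, and a recursive list of documents.
# All names here are hypothetical examples, not the authors' schema.
doc = {
    "title": "Administration Guide",
    "key": "admin_guide+root",   # document name + sub-document ID
    "id": "root",                # locally unique key
    "tags": ["admin"],
    "documents": [
        {
            "title": "Cluster Setup",
            "key": "admin_guide+cluster_setup",
            "id": "cluster_setup",
            "tags": ["admin"],
            "documents": [],     # recursion continues here
        }
    ],
}

print(json.dumps(doc, indent=2))
```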
22.
DITA to JSON 4: Sub-document
Field       Source
Title       Topic title
ID          Topic filename
Unique key  Top-level document filename + topic filename
Ancestors   List of ancestor topics at all levels
Summary*    Topic shortdesc
Body        Topic body
HREF        Topic path + topic filename
Documents*  List of sub-documents
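The field mapping above can be sketched as a small constructor. This is a hedged illustration only: the key names (`title`, `filename`, `path`, etc.) and the `+` separator in the unique key are assumptions, not the authors’ actual implementation.

```python
def make_subdocument(top_filename, topic, ancestors):
    """Build a sub-document record following the field/source mapping:
    unique key = top-level document filename + topic filename, with
    ancestry carried in for breadcrumbs. Field names are assumed."""
    return {
        "title": topic["title"],                        # topic title
        "id": topic["filename"],                        # topic filename
        "key": top_filename + "+" + topic["filename"],  # globally unique key
        "ancestors": ancestors,                         # ancestors at all levels
        "summary": topic.get("shortdesc"),              # optional shortdesc
        "body": topic["body"],                          # topic body
        "href": topic["path"] + "/" + topic["filename"],
        "documents": [],                                # optional sub-documents
    }

# Hypothetical usage:
topic = {"title": "Cluster Setup", "filename": "cluster_setup.dita",
         "body": "<body/>", "path": "docs/admin"}
sub = make_subdocument("admin_guide", topic, ["root"])
```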
23.
Document Loading Process
1. Flatten each node
2. Create a unique ID
3. Establish ancestry
4. Convert relative image and cross references to absolute links
5. Create a standalone document for each node
6. Load to DB
7. Load to search index
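The flatten/ancestry steps of the loading process can be sketched as a recursive walk. A minimal sketch, assuming the recursive JSON structure shown earlier (a `documents` list on each node and an `id` per node); the real loader also rewrites links and pushes records to the DB and search index:

```python
def flatten(doc, ancestors=None):
    """Walk the recursive document, yielding one flat record per node.
    Each record carries its full ancestry, which the web application
    can use for breadcrumb navigation. Field names are assumptions."""
    ancestors = ancestors or []
    node = {k: v for k, v in doc.items() if k != "documents"}
    node["ancestors"] = ancestors
    yield node
    for child in doc.get("documents", []):
        yield from flatten(child, ancestors + [doc["id"]])

# Hypothetical usage on a three-level document:
guide = {"id": "root", "title": "Guide", "documents": [
    {"id": "a", "title": "A", "documents": [
        {"id": "a1", "title": "A1", "documents": []}]}]}
records = list(flatten(guide))
```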
31.
Conventions
› Images
› All image paths need to be converted to absolute paths. Having all of them in a
flat folder called “images” is one easy way to accomplish this.
› Cross References
› Cross reference links within the JSON are all relative. Like images, they need to
be converted to absolute links.
› JSON Tag Recursion
› It is tedious to add tags to all levels of the JSON Document, so most tags are
programmatically pulled through to all sub documents. Tags can be overridden
in children if desired.
› Permissions – can be set in source
› Anchors not supported
We currently have a single-page app, which makes anchors difficult but also somewhat
irrelevant, since each level is available as an independent link.
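The JSON tag-recursion convention (tags pulled through to all sub-documents, overridable in children) can be sketched as follows. This is an illustrative assumption about the mechanism, not the authors’ code; it assumes a `tags` key on each node of the recursive JSON document:

```python
def inherit_tags(doc, parent_tags=()):
    """Pull tags down to every sub-document. A child that declares its
    own non-empty tag list overrides the inherited tags; a child with
    no tags (or an empty list) inherits its parent's tags."""
    tags = doc.get("tags") or list(parent_tags)
    doc["tags"] = tags
    for child in doc.get("documents", []):
        inherit_tags(child, tags)
    return doc

# Hypothetical usage: one child inherits, one overrides.
tree = inherit_tags({"id": "root", "tags": ["admin"], "documents": [
    {"id": "c", "documents": []},
    {"id": "d", "tags": ["networking"], "documents": []}]})
```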
32.
What’s next?
› More publishing automation
Publishing is currently a two-step process: JSON publication followed by document loading.
It would be better to provide a one-step process controlled by the document publisher.
› Holistic approach
› Search cultivation
› Search analytics
› Chat
› Case Deflection Analysis driving documentation.
› Tag-based navigation
33.
Ben is less dissatisfied
Problems solved
• Apparently dynamic presentation
• Satisfactory context-sensitive help targets
• CMS/search loading
Problems not solved
• Static transformations
Problems created
• Content removal
• Proofing
• Custom software
Editor's Notes
Key Points:
At its core, Nutanix eliminates complexity in the datacenter
One of the root causes of complexity is the data storage architecture, specifically the storage network
The Nutanix Virtual Computing Platform gets rid of the SAN and brings compute and storage together for virtualized environments
This approach eliminates network bottlenecks and simplifies the architecture. This is particularly important with flash storage because the network can become a chokepoint for the system
With a Nutanix solution, customers can easily add additional compute and storage by adding nodes on the go
Software documentation
Feature and task
Text, image, video
Context-sensitive help
Release documentation
Release notes
Upgrade instructions
Hardware documentation
Replacement procedures
System specifications
Text, image, video
We were publishing in PDF, which was bad for findability.
Then also publishing in WebHelp, which created silos per document.
It was difficult to use a web CMS (e.g., Drupal) as a publishing endpoint; import and update were complicated.
High page count
Deep nesting and poor scoping of pages
Mismatch between page (8.5x11) and topic (standalone piece of information, variable length)
Alignment between page and topic
Small pieces without a clear scope of relationships; related only in a TOC with the same deep nesting
From Mark Baker
From Nielsen Norman Group
http://www.nngroup.com/articles/information-scent/
Information foraging uses the analogy of wild animals gathering food to analyze how humans collect information online.
Information foraging's most famous concept is information scent: users estimate a given hunt's likely success from the spoor: assessing whether their path exhibits cues related to the desired outcome. Informavores will keep clicking as long as they sense (to mix metaphors) that they're "getting warmer" -- the scent must keep getting stronger and stronger, or people give up. Progress must seem rapid enough to be worth the predicted effort required to reach the destination.
Secondly, as users drill down the site, each page should clearly indicate that they're still on the path to the food. In other words, provide feedback about the current location and how it relates to users' tasks.
Would like to be able to present a page at any of these levels. With the standard tools, only document (monolithic) and topic (fragmented) levels are possible.
Want to keep the granular authoring and management
Manual chunking (using @chunk) is of limited value
Chunking is static
It’s possible to envision how to have multiple chunk outputs but not how to handle them.
Over to Owen.
Is using XSLT too hard? No, the OT already uses it for all output types. Under 300 lines to read HTML2 output and create a single JSON file.
New XSLT for each doc type? No, processing is generic.
Publish JSON, PDF, ePUB
Analyze into 8 pages
Process all possible chunk combinations
A single JSON document is loaded into a DB and a Search Index.
The recursive list of subdocuments is flattened
A single monolithic document is created for each sub-document.
Each recursive node contains ancestry information to create breadcrumbs
Table of Contents
The table of contents is created only for the top level document, not scoped for each subdocument.
Because siblings are shown in scope, a TOC becomes less relevant.
On mobile devices, we can look at TOC or content, saving space.
Links and Images
The JSON document is published with relative links.
The loading process converts these into absolute links.
Your automated loader is your infinite number of monkeys.
Demo hierarchy.ditamap
CSH: The target linked to isn’t just the obvious topic; it provides more context
Content removal: inconsistency between search results and available docs
Productize?