Dynamic chunking of component-authored information


Automatically chunk topics to provide varied user experiences of documentation.

Presented at Information Development World 2015.


  1. Dynamic Chunking of Component-Authored Information
     Ben Colborn, Manager, Technical Publications
     Owen Richter, Web Application Architect
  2. Web-scale converged infrastructure
     › Converged compute and storage
     › All intelligence in software
     › Distributed everything
     › Self-healing system
     › Automation and rich analytics
  3. Technical publications responsibilities
     › Software documentation
     › Release documentation
     › Hardware documentation
     › Support knowledge base
     › Education collaboration
     › Localization
  4. Problem: Ben didn’t like any of the available options for publishing documentation.
  5. Monolithic documentation
  6. Fragmented documentation
  7. Advantages
     › Monolithic: easy to produce; familiar for audience; portable
     › Fragmented: easy to link; short page load time; familiar for authors
  8. Opportunity: growing company; development of a new support portal
  9. Every page is page one
     › Every page is a potential entry point
     › Sometimes hierarchy and sequence are relevant
     › Often hierarchy and sequence are not relevant
     › A multiplicity of navigation options is required
  10. Information foraging behavior
     › Information scent: users estimate a given hunt’s likely success from … assessing whether their path exhibits cues related to the desired outcome.
     › Informavores will keep clicking as long as they sense that they’re “getting warmer”—the scent must keep getting stronger and stronger, or people give up.
     › Progress must seem rapid enough to be worth the predicted effort required to reach the destination.
     › As users drill down the site, … provide feedback about the current location and how it relates to users’ tasks.
  11. Documentation use cases
     1. A new user may want to browse a complete high-level document.
     2. A developing user may want an intermediate-sized chunk that has subject/sequence affinity.
     3. An experienced user may want a small chunk with a particular item of information.
     4. A support technician may need to provide a chunk scoped at an intermediate level, so that the customer is neither overloaded with too much information nor given too little.
  12. Document levels: Document › Part › Chapter › Section › Topic
  13. DITA gets us halfway there
     › Authoring and management are done at the topic level
     › Chunking exists as an approach, but chunking control is manual and chunks are static
  14. Ben’s magical solution: if I had an infinite number of monkeys, I could chunk all topics in all possible combinations.
  15. Cross-disciplinary thinking to the rescue
     › We need a recursive document! A document is:
       1. A title
       2. A globally unique key (document name + sub-document ID)
       3. A locally unique key (sub-document ID)
       4. A list of tags
       5. A (recursive) list of documents
     › DITA is recursive, but none of the existing presentation mechanisms are.
     › JSON is a natural way to represent a recursive document.
     › XSLT is a natural way to generate such a JSON document.
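A minimal sketch of this recursive structure in Python, serialized to JSON with the standard library. The field names and sample titles are illustrative, not the exact production schema:

```python
import json

def make_doc(title, doc_name, sub_id, tags=None, documents=None):
    """Build one node of the recursive document structure."""
    return {
        "title": title,                         # 1. a title
        "unique_key": f"{doc_name}/{sub_id}",   # 2. globally unique key
        "id": sub_id,                           # 3. locally unique key
        "tags": tags or [],                     # 4. a list of tags
        "documents": documents or [],           # 5. a (recursive) list of documents
    }

# Hypothetical three-level document: guide > chapter > topic
topic = make_doc("Create a VM", "admin-guide", "create_vm")
chapter = make_doc("VM Management", "admin-guide", "vm_mgmt", documents=[topic])
document = make_doc("Administration Guide", "admin-guide", "index",
                    tags=["admin"], documents=[chapter])

print(json.dumps(document, indent=2))
```

Because every node has the same shape, any sub-document can be served as a standalone document at its own unique key.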
  16. JSON generation process: DITA source → HTML → JSON
  17. Theoretical document: complete document
     1. Chapter
        1.1 Section
     2. Chapter
        2.1 Section
           2.1.1 Topic
        2.2 Section
           2.2.1 Topic
     3. Chapter
  18. Theoretical document: chunks
     › The complete document (chapters 1–3 with all sections and topics)
     › Chapter-level chunks, e.g. chapter 2 with sections 2.1 and 2.2 and their topics
     › Section-level chunks: 1.1, 2.1 (with 2.1.1), 2.2 (with 2.2.1)
     › Topic-level chunks: 2.1.1, 2.2.1
  19. DITA to JSON 1: DITAMAP (document properties, topic references)
  20. DITA to JSON 2: HTML index (document properties, topic references)
  21. DITA to JSON 3: JSON (document properties, topics)
  22. DITA to JSON 4: sub-document fields
     › Title: topic title
     › ID: topic filename
     › Unique key: top-level document filename + topic filename
     › Ancestors: list of ancestor topics at all levels
     › Summary*: topic shortdesc
     › Body: topic body
     › HREF: topic path + topic filename
     › Documents*: list of sub-documents
  23. Document loading process
     1. Flatten each node
     2. Create a unique ID
     3. Establish ancestry
     4. Convert relative image and cross references to absolute links
     5. Create a standalone document from each node
     6. Load to DB
     7. Load to search index
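The flattening steps (unique ID, ancestry, one standalone record per node) might be sketched like this in Python. This is a simplified stand-in for the actual loader: the document shape, filenames, and key format are assumptions, and the link conversion and DB/search-index loading are omitted:

```python
def flatten(doc, doc_name, ancestors=None):
    """Walk the recursive document and emit one standalone record per node,
    each with a unique ID and its full ancestry (for breadcrumbs)."""
    ancestors = ancestors or []
    record = {
        "unique_key": f"{doc_name}+{doc['id']}",   # create a unique ID
        "title": doc["title"],
        "ancestors": list(ancestors),              # establish ancestry
    }
    records = [record]
    for child in doc.get("documents", []):         # recurse into sub-documents
        records.extend(flatten(child, doc_name, ancestors + [doc["title"]]))
    return records

# Hypothetical guide with one chapter and one topic
guide = {
    "id": "index", "title": "Admin Guide", "documents": [
        {"id": "vm_mgmt", "title": "VM Management", "documents": [
            {"id": "create_vm", "title": "Create a VM", "documents": []},
        ]},
    ],
}

for rec in flatten(guide, "admin_guide"):
    print(rec["unique_key"], "<-", rec["ancestors"])
```

Every node becomes its own loadable document, which is what makes each level independently addressable in the DB and the search index.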
  24. Search
  25. Task topic
  26. Chapter
  27. Document
  28. TOC
  29. Multi-modality
  30. DITA output targets
     1. PDF: monolithic
     2. ePub: monolithic
     3. HTML: fragmented
     4. JSON: dynamically chunked
  31. Conventions
     › Images: all image paths need to be converted to absolute paths. Keeping them all in a flat folder called “images” is one easy way to accomplish this.
     › Cross references: cross-reference links within the JSON are all relative; like images, they need to be converted to absolute links.
     › JSON tag recursion: it is tedious to add tags to every level of the JSON document, so most tags are programmatically pulled through to all sub-documents. Tags can be overridden in children if desired.
     › Permissions: can be set in source.
     › Anchors: not supported. We currently have a single-page app, which makes anchors difficult, but they are somewhat irrelevant since each level is available as an independent link.
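The relative-to-absolute conversion for images and cross references could be sketched as follows. This is a regex-based illustration assuming simple src/href attributes and a hypothetical base URL, not the production code:

```python
import re

BASE = "https://portal.example.com/docs/admin_guide"  # hypothetical base URL

def absolutize(html, base=BASE):
    """Rewrite relative src/href attributes to absolute URLs.
    Already-absolute links (http/https or rooted paths) are left untouched."""
    def repl(m):
        attr, path = m.group(1), m.group(2)
        if path.startswith(("http://", "https://", "/")):
            return m.group(0)                 # already absolute: keep as-is
        return f'{attr}="{base}/{path}"'      # prefix relative path with base
    return re.sub(r'\b(src|href)="([^"]+)"', repl, html)

html = '<img src="images/cluster.png"/> <a href="create_vm.html">Create a VM</a>'
print(absolutize(html))
```

A flat “images” folder keeps the rewrite trivial: every relative image path resolves the same way regardless of which chunk the content is served from.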
  32. What’s next?
     › More publishing automation: publishing is currently a two-step process (JSON publication followed by document loading). It would be better to provide a one-step process controlled by the document publisher.
     › Holistic approach: search cultivation; search analytics; chat; case-deflection analysis driving documentation
     › Tag-based navigation
  33. Ben is less dissatisfied
     › Problems solved: apparently dynamic presentation; satisfactory context-sensitive help targets; CMS/search loading
     › Problems not solved: static transformations
     › Problems created: content removal; proofing; custom software

Editor's Notes

  • Key Points:
    At its core, Nutanix eliminates complexity in the datacenter
    One of the root causes of complexity is the data storage architecture, specifically the storage network
    The Nutanix Virtual Computing Platform gets rid of the SAN and brings compute and storage together for virtualized environments
    This approach eliminates network bottlenecks and simplifies the architecture. This is particularly important with flash storage because the network can become a chokepoint for the system
    With a Nutanix solution, customers can easily add additional compute and storage by adding nodes on the go

  • Software documentation
    Feature and task
    Text, image, video
    Context-sensitive help
    Release documentation
    Release notes
    Upgrade instructions
    Hardware documentation
    Replacement procedures
    System specifications
    Text, image, video
  • We were publishing in PDF—bad for findability.
    Then also publishing in WebHelp—silos per document.
    Difficult to use a web CMS (e.g. Drupal) as a publishing endpoint—import/update is complicated.
  • High page count
    Deep nesting and poor scoping of pages
    Mismatch between page (8.5x11) and topic (standalone piece of information, variable length)
  • Alignment between page and topic
    Small pieces without a clear scope of relationships; only visible in the TOC, with the same deep nesting
  • From Mark Baker
  • From Nielsen Norman Group
    http://www.nngroup.com/articles/information-scent/

    Information foraging uses the analogy of wild animals gathering food to analyze how humans collect information online.

    Information foraging's most famous concept is information scent: users estimate a given hunt's likely success from the spoor: assessing whether their path exhibits cues related to the desired outcome. Informavores will keep clicking as long as they sense (to mix metaphors) that they're "getting warmer" -- the scent must keep getting stronger and stronger, or people give up. Progress must seem rapid enough to be worth the predicted effort required to reach the destination.
    Secondly, as users drill down the site, each page should clearly indicate that they're still on the path to the food. In other words, provide feedback about the current location and how it relates to users' tasks.
  • Would like to be able to present a page at any of these levels. With the standard tools, only document (monolithic) and topic (fragmented) levels are possible.
  • Want to keep the granular authoring and management

    Manual chunking (using @chunk) is of limited value
  • Chunking is static

    It’s possible to envision how to produce multiple chunk outputs, but not how to handle them.
  • Over to Owen.
  • Is using XSLT too hard? No, the OT already uses it for all output types. Under 300 lines to read HTML2 output and create a single JSON file.

    New XSLT for each doc type? No, processing is generic.

    Publish JSON, PDF, ePUB
  • Analyze into 8 pages
  • Process all possible chunk combinations
  • A single JSON document is loaded into a DB and a Search Index.
    The recursive list of subdocuments is flattened
    A single monolithic document is created for each sub-document.
    Each recursive node contains ancestry information to create breadcrumbs
    Table of Contents
    The table of contents is created only for the top-level document; it is not scoped for each sub-document.
    Because siblings are shown in scope, a TOC becomes less relevant.
    On mobile devices, we can look at TOC or content, saving space.
    Links and Images
    The JSON document is published with relative links.
    The loading process converts these into absolute links.
    Your automated loader is your infinite number of monkeys.
  • Demo hierarchy.ditamap
  • CSH: the target linked to isn’t just the obvious topic; it provides more context.
    Content removal: inconsistency between search results and available docs

    Productize?
