No Ki Magic
Or
Hyperdocument Authoring Link Management
Using Git and XQuery in Service of an Abstract
Hyperdocument Management Model Applied to
DITA Hyperdocuments
8/14/2015 Contrext, LLC 1
Eliot Kimber
Contrext, LLC
Balisage 2015
WHAT AM I TALKING ABOUT?
Link Management and
Configuration Management
DITA
Solution Implementation
LINK AND CONFIGURATION
MANAGEMENT
The Problems
• As an author: What can I link to and how do I
address it?
• As an authoring tool: What does this indirect
address point to?
• As a deliverable producer: What is the set of
resources I require in order to produce a
deliverable from the input publication source?
• As a manager: What is the version-specific
configuration of this publication in a specific
repository access context?
The Essential Issue
• Given a collection of source components with
links among them and managed through
asynchronous revision processes, what is the
time-specific configuration of those
components at any moment in time as viewed
by a given agent for a specific purpose?
• In DITA terms: When I process a map in a
specific access context, what do I see and
what can I see?
Interlude: A (bit of a) Poem
Time present and time past
Are both perhaps present in time future,
And time future contained in time past.
If all time is eternally present
All time is unredeemable.
What might have been is an abstraction
Remaining a perpetual possibility
Only in a world of speculation.
What might have been and what has been
Point to one end, which is always present.
…
- T. S. Eliot, "Four Quartets 1: Burnt Norton"
BACKGROUND
Aikido
• A defensive martial art based on blending with an
attacker's energy, capturing their balance, and
redirecting their energy in order to return them
to harmony
• Goal of Aikido is ultimately universal peace and
harmony
• There is no one true way to do Aikido
– Aikidoka are expected to develop their own
expression and interpretation of Aikido as they
develop their skills
• It's all about connection
DITA
• A standard XML application architecture for human-
consumed documents
• Optimized for interchange and interoperation of
content, processing, and DITA-specific knowledge
• Distinguishing architectural features:
– Specialization: enables controlled extension from base
DITA markup vocabulary
– Use by reference: Content components can be used in
multiple contexts (DITA maps, content reference)
– Indirect addressing: keys and key references
– Designed to work entirely from a file system
• DITA is all about connection
Another Poem
If you have not
Linked yourself
To true emptiness,
You will never understand
The Art of Peace.
—Morihei Ueshiba, The Art of Peace,
translated by John Stevens.
Direct vs. Indirect Addressing
Indirect addressing
• Blend and redirect to appropriate target
• Harder to learn and execute but more
effective
• Many options at time of action
• Death does not result
Direct addressing
• Quick, effective, fragile
• Relatively easy to learn and execute
• Predetermined response to a given attack
• Death results
Indirection Is Necessary For
Survival
• Direct addressing is preferred for delivery
– Fast, uncomplicated, reliable
• Indirect addressing is required for authoring
– Flexible, robust, complicated
– The link must live to link another day
• Allows binding same address to different targets in
different use contexts
• Without indirection many authoring and configuration
use cases cannot be satisfied
• Prefer a standard, interoperable way to do indirect
addressing
Different Use Contexts
• Same component used
multiple times in the same
hyperdocument
• Same component used in
different hyperdocuments
• Same component used in
different versions in time of
a given hyperdocument
[Diagrams: Map 1 using Topic A twice; Map 1 and Map 2 both using Topic A; Map 1 V1 and Map 1 V2 with Topic A V1]
DITA Maps and Topics
• Topics: XML documents that contain content
– All content is contained by topics
– Topics are intended to be more-or-less context
independent
• Maps: XML documents containing nothing but
links
– Links to other maps
– Links to topics
– Links to non-DITA things
[Diagram: Map 1 with Topic A and Topic B]
DITA Keys (No Magic)
• Keys are defined in maps
• Key definition binds a key name to a resource
• Resource can be a topic, a map, or a non-DITA
thing (image, Web site, etc.)
• Same key name can have different bindings in
different maps
• A key reference can be used any place a direct
URI reference is allowed
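A minimal Python sketch of the key idea above: each map carries its own key-name-to-resource bindings, so the same key reference resolves to a different target depending on the map in use. The map names, key name, and file paths are hypothetical, and real DITA key resolution (precedence rules, submaps, key scopes) is considerably more involved than a dictionary lookup.

```python
# Hypothetical key definitions: each map binds key names to resources.
KEY_DEFINITIONS = {
    "map-1.ditamap": {"install-guide": "topics/install-linux.dita"},
    "map-2.ditamap": {"install-guide": "topics/install-windows.dita"},
}


def resolve_key(map_name, key_name):
    """Return the resource bound to key_name in the given map, or None."""
    return KEY_DEFINITIONS.get(map_name, {}).get(key_name)


# The same key reference yields a different target in each map context.
linux_target = resolve_key("map-1.ditamap", "install-guide")
windows_target = resolve_key("map-2.ditamap", "install-guide")
```

The author of a topic only writes the key name; which file it points to is decided by whichever map the topic is processed under.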
ABSTRACT VERSION AND
LINK MANAGEMENT MODEL
Snapshot-Based Configuration
Management (SnapCM)
• First formulated around 1999 by Heintz, Kimber,
et al.
• Combines Heintz' version management insights
with Kimber's hyperdocument representation
and management insights
• Driven in large part by experience with legislative
document management workflows and business
requirements (bill drafting)
– Arguably the hardest set of requirements one could
have
Branches and Snapshots
• A Repository contains Resources
• Resources have Versions
• A Repository has one or more
Branches
• A Branch is a linear sequence of
Snapshots
• A Snapshot points to zero or more
Versions
– Constraint: no two Versions have
the same Resource
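One way to picture the model above is as a handful of Python dataclasses. This is only an illustrative sketch: the class and field names are chosen here for readability and are not taken from any SnapCM implementation, and a Resource is reduced to a plain string identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    resource: str  # identifier of the Resource this is a version of
    number: int    # illustrative version counter


@dataclass
class Snapshot:
    versions: list = field(default_factory=list)

    def add(self, version):
        # Enforce the constraint: no two Versions of the same Resource.
        if any(v.resource == version.resource for v in self.versions):
            raise ValueError(f"snapshot already has a version of {version.resource}")
        self.versions.append(version)


@dataclass
class Branch:
    name: str
    snapshots: list = field(default_factory=list)  # linear sequence, oldest first

    def current(self):
        """Return the most recent Snapshot, or None for an empty Branch."""
        return self.snapshots[-1] if self.snapshots else None


@dataclass
class Repository:
    branches: dict = field(default_factory=dict)  # branch name -> Branch


# Usage: one branch with a single snapshot holding two resources.
snapshot = Snapshot()
snapshot.add(Version("map-1", 1))
snapshot.add(Version("topic-a", 1))
repo = Repository(branches={"main": Branch("main", [snapshot])})
```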
Configuration Management
• By default, can only see versions on Branch
– Current Snapshot
– Earlier Snapshots
• A link to a Resource is resolved using a
"resolution policy"
• Default policy is "on Snapshot"
• Thus, a Snapshot represents a version-specific
configuration of a set of Resources
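A rough Python sketch of "on Snapshot" resolution, assuming a snapshot can be represented as a mapping from resource name to the one version it selects. The function names, resource names, and version numbers are invented for illustration; other policies could be passed in the same way.

```python
def resolve_on_snapshot(snapshot, resource):
    """'On Snapshot' policy: only the version this snapshot selects is visible."""
    return snapshot.get(resource)


def resolve_link(branch, snapshot_index, resource, policy=resolve_on_snapshot):
    """Resolve a link to a resource as seen from one snapshot on a branch."""
    return policy(branch[snapshot_index], resource)


# A branch is a linear sequence of snapshots, oldest first.
# Each snapshot maps a resource name to the single version it points to.
branch = [
    {"map-1": 1, "topic-a": 1},
    {"map-1": 2, "topic-a": 1},
    {"map-1": 2, "topic-a": 2, "topic-b": 1},
]

earliest_view = resolve_link(branch, 0, "topic-a")
latest_view = resolve_link(branch, 2, "topic-a")
```

Because the snapshot fixes exactly one version per resource, resolving every link from the same snapshot yields one consistent configuration.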
SOLUTION: DITA FOR SMALL
TEAMS
DITA for Small Teams (DFST)
• Show how open-source tools can be combined to
create a reasonable DITA authoring and production
support system
• Four main parts:
– Versioned content storage: git, mercurial, etc.
– Authoring: oXygen XML, etc.
– Production and delivery: Continuous integration + DITA
Open Toolkit
– Link Management: Under development
• Link management is the one missing piece
• I'm implementing link management for use in the DFST
context
Git-Based DFST
[Architecture diagram; labels: Authoring Environment, Git Repository, Git Hooks, Link Management Processing, Link Management Repository, Web App, Link Management, Deliverable Production, CI Server, Git Repository, DITA OT, Git push, Deliverable (x3)]
Git As the Repository
• Git's versioning model is a close match to the
abstract model
• Does not, by itself, provide branch-specific
access control
• Can get the effect by having multiple clones
with different branches exposed
• Git hooks feed updates to Link Manager
Link Management
• DITA-specific XQuery application: BaseX, XQuery 3.1
• Maintains where-used index based on links in the
source documents
• Implements DITA key space construction and key
resolution
• Fundamentally just data processing
• Some tricky bits due to DITA features:
– Map trees
– Conditional key definitions and map references
– Key scopes (DITA 1.3)
– Branch filtering (DITA 1.3)
Git For Versioning Model
• Git branch = SnapCM Branch
• Git commit = SnapCM Snapshot
• Link management repository mirrors git
repository/branch organization
• Current implementation only reflects current
commit
– Could reflect any commits, just costs storage
• Git atomic commit of multiple objects allows
consistent link management state
No Key Magic:
Link Management (LM) Database
• XQuery database (BaseX)
– Heavy dependence on XPath 3.1 (maps)
– Would be much less convenient without maps
• One top-level collection per git repo/branch
pair
• Parallel link metadata database with link
management "index"
• Functions to encapsulate the git nature of the
database organization
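As a sketch of what "encapsulating the git nature" of the database might look like, the Python helpers below build a collection name and a document URI from a repository/branch pair. The "^" separator and the owner^repository^branch reading are assumptions inferred from the sample use record in this deck (which shows "dfst^dfst-sample-project^master/docs/topic-01.dita"); the function names are hypothetical, and the actual implementation is in XQuery.

```python
def collection_name(owner, repo, branch):
    """Name of the top-level collection for one git repo/branch pair (assumed scheme)."""
    return "^".join((owner, repo, branch))


def document_uri(owner, repo, branch, path):
    """Database URI of a document at a repository-relative path."""
    return f"{collection_name(owner, repo, branch)}/{path.lstrip('/')}"


uri = document_uri("dfst", "dfst-sample-project", "master", "docs/topic-01.dita")
```

Keeping this naming in one place means the rest of the link management code never has to know how repositories and branches are laid out in the database.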
Where-Used Index
• Each target doc has a directory in the LM
database
• Directory contains one or more use records
recording details of the linking element
• Where-used query:
– Is there a directory for the target doc?
• No: Not used
• Yes: get use records
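The lookup logic above can be sketched in Python with an in-memory dictionary standing in for the per-target directories of the LM database. The class name, method names, and record fields are illustrative assumptions, not the project's actual API.

```python
from collections import defaultdict


class WhereUsedIndex:
    """In-memory stand-in for the LM database: one 'directory' per target doc."""

    def __init__(self):
        self._records = defaultdict(list)

    def record_use(self, target_doc, using_doc, link_type):
        """Store one use record under the target document's directory."""
        self._records[target_doc].append(
            {"usingDoc": using_doc, "linkType": link_type}
        )

    def where_used(self, target_doc):
        """Return the use records for target_doc; an empty list means not used."""
        # Test membership first so the lookup does not create an empty entry.
        if target_doc not in self._records:
            return []
        return list(self._records[target_doc])


index = WhereUsedIndex()
index.record_use("docs/topic-01.dita", "docs/pub-02.ditamap", "topicref")
uses = index.where_used("docs/topic-01.dita")
unused = index.where_used("docs/topic-99.dita")
```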
Direct Links
• Find all links: //*[@href]
• Resolve the addresses
• Record use records
<dfst:useRecord xmlns:dfst="http://dita-for-small-teams.org"
resourceKey="bL1LeEVFr4lAgv77oEaECA==^1.2"
targetDoc="dfst^dfst-sample-project^master/docs/topic-01.dita"
usingDoc="dfst^dfst-sample-project^master/docs/pub-02.ditamap"
linkType="topicref"
linkClass="- map/topicref "
linkContext="navtree"
format="dita"
scope="local">
<title>Publication Two</title>
</dfst:useRecord>
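The three direct-link steps (find, resolve, record) might look roughly like this in Python, using the standard library's ElementTree in place of XQuery. The sample map, the function name, and the reduced record fields are assumptions for illustration; this sketch only handles relative file paths and ignores absolute URLs and the scope and format attributes.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical map document with two direct links.
MAP_XML = """<map>
  <title>Publication Two</title>
  <topicref href="topic-01.dita"/>
  <topicref href="sub/topic-02.dita#topic-02"/>
</map>"""


def direct_link_use_records(using_doc, xml_text):
    """Build one use record per element carrying an href attribute."""
    root = ET.fromstring(xml_text)
    base_dir = posixpath.dirname(using_doc)
    records = []
    # root.iter() visits every element, the equivalent of //* in the XPath above.
    for elem in root.iter():
        href = elem.get("href")
        if href is None:
            continue
        # Drop any fragment identifier, then resolve relative to the using doc.
        path = href.split("#", 1)[0]
        target = posixpath.normpath(posixpath.join(base_dir, path))
        records.append(
            {"usingDoc": using_doc, "targetDoc": target, "linkType": elem.tag}
        )
    return records


records = direct_link_use_records("docs/pub-02.ditamap", MAP_XML)
```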
Indirect Links
• Find all maps: /*[contains(@class, ' map/map ')]
• Generate resolved maps that reflect directly-referenced
submaps—store in LM database.
• Construct key space documents from resolved maps.
• Use generated IDs to correlate key definitions in
resolved maps and key spaces to content key
definitions
• Find all indirect links: //*[@keyref]
• Resolve indirect links to targets
• Record use records
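A much-simplified Python sketch of the indirect-link steps: build a key space from a single, already-resolved map, then resolve each keyref against it. The sample documents and function names are hypothetical. It assumes the first definition of a key in document order is the effective one, and it skips submaps, conditional definitions, key scopes, and the generated-ID correlation described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical resolved map: "overview" is defined twice.
MAP_XML = """<map>
  <keydef keys="product-name overview" href="topics/overview.dita"/>
  <keydef keys="overview" href="topics/other.dita"/>
</map>"""

# Hypothetical topic with two indirect links, one of them unresolvable.
TOPIC_XML = """<topic id="t1">
  <title>Using the product</title>
  <body><p>See <xref keyref="overview"/> and <xref keyref="missing"/>.</p></body>
</topic>"""


def build_key_space(map_xml):
    """Map each key name to the href of its first definition in document order."""
    key_space = {}
    for elem in ET.fromstring(map_xml).iter():
        keys = elem.get("keys")
        if keys is None:
            continue
        # The keys attribute holds one or more space-separated key names.
        for key in keys.split():
            # setdefault keeps the earlier binding, so later definitions lose.
            key_space.setdefault(key, elem.get("href"))
    return key_space


def resolve_keyrefs(topic_xml, key_space):
    """Return (keyref, target) pairs; target is None when the key is undefined."""
    results = []
    for elem in ET.fromstring(topic_xml).iter():
        keyref = elem.get("keyref")
        if keyref is None:
            continue
        # A keyref may be "keyname/elementId"; only the key name is looked up.
        key_name = keyref.split("/", 1)[0]
        results.append((keyref, key_space.get(key_name)))
    return results


key_space = build_key_space(MAP_XML)
resolved = resolve_keyrefs(TOPIC_XML, key_space)
```

Each resolved pair could then be written out as a use record in the same way as a direct link.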
Link Management Web App
• RESTXQ Web app
– Web pages
– REST API
– Quick and easy to implement
• Report on whatever is interesting about the link
nature of the content
– Where is something used?
– What are the links?
– Map structures
– Dependencies emanating from a given object
Demo
• Oops, out of time
CONCLUSIONS AND FUTURE
WORK
What Was Easy?
• Git for versioned hyperdocument source
management: direct match to SnapCM model
• BaseX: Easy to set up and use for DITA content
– Direct support for XML catalogs
– RESTXQ implementation
– Lightweight installation
• Direct address resolution
• DITA map resolution (ignored harder bits for
now)
What Was Hard
• Key space construction
– I struggled to work with XQuery 3.1 maps
– No code authoring support for complex maps
• I miss my Java IDE (I am weak and feeble from my
dependence on strongly-typed language programming)
– Scoped keys add data processing complexity
aggravated by my weak map fu
• XQuery update does not allow naïve
approaches to LM database population
Future Work
• Finish out DITA key space construction (key scopes,
branch filtering, dynamic conditional processing)
• Finish out basic link management reporting features
• Implement basic REST API for accessing link
management information
• Docker container packaging for ease of deployment
• Tighter integration with authoring tools
• Better error reporting for link management data
processing
• Oh, yeah, documentation…
Questions?
Resources
• DITA for Small Teams:
https://github.com/dita-for-small-teams
• Me: ekimber@contrext.com,
http://contrext.com
