3. Annotations
• Annotations in WordprocessingML store many
kinds of document ‘markup’:
– Revisions (insertions, deletions, moves)
– Comments
– Bookmarks
– Range-level Permissions
– Spelling and Grammar Errors (cached)
4. Annotations
• Before we discuss each type…
• Annotations as a group present a unique
challenge to an XML format
– They’re inherently not well formed (if we used a
tag to encapsulate them)
(Figure callout: the annotation starts in paragraph one and ends in paragraph two)
5. Types of Annotations
• This motivates the fact that annotations in
WordprocessingML are stored in three ways:
– Inline
– ‘Cross Structure’
– Property
6. Inline Annotations
• The most common type
– These actually don’t break the wellformedness of
the paragraph structure
8. ‘Cross Structure’ Annotations
• Word processing documents also contain
annotations which can span parts of multiple
paragraphs
– i.e. wouldn’t be well formed if they were one tag
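A minimal sketch of why start/end markers are used for this bucket, assuming a hypothetical two-paragraph body with made-up text and a made-up bookmark name. The markers are empty elements linked by id, so the XML stays well formed even though the marked range crosses a paragraph boundary. The helper name marker_paragraphs is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical body: the text and the bookmark name are invented for illustration.
BODY = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t xml:space="preserve">Paragraph one, </w:t></w:r>
    <w:bookmarkStart w:id="0" w:name="example"/>
    <w:r><w:t>marked text</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t xml:space="preserve">still marked </w:t></w:r>
    <w:bookmarkEnd w:id="0"/>
    <w:r><w:t>and not marked.</w:t></w:r>
  </w:p>
</w:body>"""


def marker_paragraphs(body, start_name, end_name):
    """Map each marker id to (paragraph index of start, paragraph index of end)."""
    spans = {}
    for index, para in enumerate(body.iter(f"{{{W}}}p")):
        for el in para.iter():
            marker_id = el.get(f"{{{W}}}id")
            if el.tag == f"{{{W}}}{start_name}":
                spans[marker_id] = (index, None)
            elif el.tag == f"{{{W}}}{end_name}" and marker_id in spans:
                spans[marker_id] = (spans[marker_id][0], index)
    return spans


spans = marker_paragraphs(ET.fromstring(BODY), "bookmarkStart", "bookmarkEnd")
print(spans)  # {'0': (0, 1)}
```

A single wrapping element could not start inside the first paragraph and close inside the second, which is the well-formedness problem the slide describes.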
10. Property Annotations
• The first two buckets cover annotations to
content in the document
– But WordprocessingML allows for revision
marking on properties as well
11. Property Annotations (cont.)
• The rPrChange stores previous sets of
properties on the object (in this case, the run)
– Stored as another run property
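As a rough illustration, assuming a hypothetical run whose author, date, and formatting are invented: the rPrChange element sits inside the current run properties and wraps the previous set of properties. The helper property_names is not part of any library.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical run: currently bold, previously italic.
RUN = f"""<w:r xmlns:w="{W}">
  <w:rPr>
    <w:b/>
    <w:rPrChange w:id="1" w:author="Example Author" w:date="2006-01-01T00:00:00Z">
      <w:rPr>
        <w:i/>
      </w:rPr>
    </w:rPrChange>
  </w:rPr>
  <w:t>formatted text</w:t>
</w:r>"""


def property_names(rpr):
    """Local names of the properties in an rPr, ignoring the change record itself."""
    return [
        child.tag.split("}", 1)[1]
        for child in rpr
        if child.tag != f"{{{W}}}rPrChange"
    ]


run = ET.fromstring(RUN)
current_rpr = run.find("w:rPr", NS)
previous_rpr = current_rpr.find("w:rPrChange/w:rPr", NS)

current = property_names(current_rpr)
previous = property_names(previous_rpr)
print(current, previous)  # ['b'] ['i']
```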
12. Revisions
• Refers to the explicit storage of all
modifications to a document
– Used to track document’s evolution
(Figure callout: for example, someone changed a couple of words here and the change was tracked)
13. Revisions (cont.)
• Revisions in WordprocessingML contain up to
five pieces of information
– Required
• Type of revision (insertion, deletion, move)
• Unique revision ID
– Optional
• Author information
• Date & time information
• Content of revision (if any)
14. Revisions - Insertions
• Looking at some samples of revisions:
• Insertions are marked via use of the ins
element
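A small sketch of reading the pieces of revision information from inline revisions, assuming a hypothetical paragraph with invented text, authors, and dates. It includes a deletion (the del element, whose runs carry delText) alongside the insertion for comparison. The function list_revisions is illustrative, not a library call.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph with one tracked insertion and one tracked deletion.
PARA = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
  <w:ins w:id="0" w:author="Example Author" w:date="2006-01-01T00:00:00Z">
    <w:r><w:t xml:space="preserve">brown </w:t></w:r>
  </w:ins>
  <w:del w:id="1" w:author="Example Author">
    <w:r><w:delText xml:space="preserve">red </w:delText></w:r>
  </w:del>
  <w:r><w:t>fox</w:t></w:r>
</w:p>"""


def list_revisions(para):
    """Return one dict per tracked insertion or deletion directly inside a paragraph."""
    revisions = []
    for el in para:
        kind = el.tag.split("}", 1)[1]
        if kind not in ("ins", "del"):
            continue
        text = "".join(
            t.text or ""
            for t in el.iter()
            if t.tag in (f"{{{W}}}t", f"{{{W}}}delText")
        )
        revisions.append({
            "type": kind,
            "id": el.get(f"{{{W}}}id"),
            "author": el.get(f"{{{W}}}author"),
            "date": el.get(f"{{{W}}}date"),  # None when the attribute is absent
            "text": text,
        })
    return revisions


revisions = list_revisions(ET.fromstring(PARA))
for rev in revisions:
    print(rev)
```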
16. Revisions - Moves
• WordprocessingML also supports the concept
of moved text
• Moves are tracked in two parts:
– The move start/end location
– The move contents
17. Revisions – Moves (cont.)
• The moveFromRangeStart/End elements
mark the start/end of the move source range
• The moveFrom element contains the content
at the move source
18. Revisions - Moves
• Why two parts?
– The move range is often not the same as the move
contents (because of paragraph marks)
(Figure callout: without this, we would lose the fidelity as to whether the paragraph is part of the move)
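The two-part storage can be sketched as follows, assuming a hypothetical body in which the move range ends in the paragraph after the moved content. The placement of the end marker, the move name, and the text are assumptions made for illustration and may differ from what a given application writes.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical move source: range markers plus a moveFrom content wrapper.
BODY = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:moveFromRangeStart w:id="0" w:name="move1" w:author="Example Author"/>
    <w:moveFrom w:id="1" w:author="Example Author">
      <w:r><w:t>This sentence was moved.</w:t></w:r>
    </w:moveFrom>
  </w:p>
  <w:p>
    <w:moveFromRangeEnd w:id="0"/>
    <w:r><w:t>This paragraph stayed.</w:t></w:r>
  </w:p>
</w:body>"""


def move_source(body):
    """Return (moved text, paragraph index of range start, paragraph index of range end)."""
    start_para = end_para = None
    pieces = []
    for index, para in enumerate(body.iter(f"{{{W}}}p")):
        for el in para.iter():
            if el.tag == f"{{{W}}}moveFromRangeStart":
                start_para = index
            elif el.tag == f"{{{W}}}moveFromRangeEnd":
                end_para = index
            elif el.tag == f"{{{W}}}moveFrom":
                pieces.extend(t.text or "" for t in el.iter(f"{{{W}}}t"))
    return "".join(pieces), start_para, end_para


text, start_para, end_para = move_source(ET.fromstring(BODY))
# In this sketch, a range that ends in a later paragraph than it starts
# covers the paragraph mark, which the moveFrom content alone cannot express.
includes_paragraph_mark = end_para > start_para
print(text, start_para, end_para, includes_paragraph_mark)
```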
19. Revisions – Custom Markup
• Custom markup is unique in that, like
paragraphs, there are physical characters at
display time but no corresponding runs in
WordprocessingML
(Figure callout: only one run; the tag is created at display time)
20. Revisions – Custom Markup (cont.)
• So what if we delete a tag character?
– We can’t just lose that information
• To do this, we again use the start/end marker
syntax
21. Revisions – Custom Markup (cont.)
• The customXmlDelRangeStart/End elements
tell us a tag character is deleted
– They are linked by an id attribute
22. Comments
• Comments are another document story in
WordprocessingML
– The contents of comments can be any block-level
content
– Can be formatted, etc.
23. Comments Part
• Comments are stored in the comments part in
WordprocessingML
– Reached via an implicit relationship from the
document part
– Each comment is then referenced via its id
• Relationship type
http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments
• Content type
application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml
24. Comments
• A WordprocessingML comment can be divided
into two components:
– The comment anchor (the text on which the
comment applies)
– The comment content (the content of the
comment)
(Figure labels: comment range, comment contents)
25. Comment Anchor
• The first part of a comment is the actual
anchor in the document
– Hooks the comment to text
– Creates the reference to the comment in the
comments part
• This is done via two XML
elements
26. Comment Range
• The first part of a comment anchor is
determining the comment range
– Like bookmarks and other annotations, comment
ranges are not required to be well formed
• This is done via the
commentRangeStart/End
elements
27. Comment Reference
• The second part of the comment anchor is an
actual comment reference
– This establishes the reference to the specific
comment in the comments part by referencing its
id attribute
• This is done via the
commentReference
element
(Figure callout: this comment range is linked to comment 0)
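A minimal sketch of the two parts of a comment anchor, assuming a hypothetical paragraph with invented text. The range markers and the reference all carry the same id, which is what ties them to one comment. The helper comment_links is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ID = f"{{{W}}}id"

# Hypothetical paragraph anchoring comment 0 to the word "text".
PARA = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Some </w:t></w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r><w:t>text</w:t></w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r><w:commentReference w:id="0"/></w:r>
</w:p>"""


def comment_links(para):
    """Return (ids with both range markers, ids that have a commentReference)."""
    starts = {el.get(ID) for el in para.iter(f"{{{W}}}commentRangeStart")}
    ends = {el.get(ID) for el in para.iter(f"{{{W}}}commentRangeEnd")}
    refs = {el.get(ID) for el in para.iter(f"{{{W}}}commentReference")}
    return starts & ends, refs


ranged, referenced = comment_links(ET.fromstring(PARA))
print(ranged, referenced)  # {'0'} {'0'}
```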
28. Comment Contents
• Once we have an anchor range, we need an
actual comment
– Stored in the comments part
(Figure callout: everything in the bubble is part of the comment)
29. Comment Contents (cont.)
• Each comment contains up to five things:
– Required
• Comment ID
• At least one block-level element
– Optional
• Comment author
• Comment author initials
• Comment date and time
30. Comment Contents
• Inside the comment, there is block-level
content which contains the actual comment
(Figure label: comment content)
31. Comment Contents (cont.)
• As well, the annotationRef element specifies
the location of the comment information
block
(Figure label: comment information block)
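The comment contents can be sketched like this, assuming a hypothetical comments part with an invented author, initials, date, and text. The function read_comment is illustrative and simply looks a comment up by the id that the anchor references.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical comments part holding a single comment.
COMMENTS = f"""<w:comments xmlns:w="{W}">
  <w:comment w:id="0" w:author="Example Author" w:initials="EA"
             w:date="2006-01-01T00:00:00Z">
    <w:p>
      <w:r><w:annotationRef/></w:r>
      <w:r><w:t>Please check this sentence.</w:t></w:r>
    </w:p>
  </w:comment>
</w:comments>"""


def read_comment(comments, comment_id):
    """Return the stored information for one comment, or None if the id is unknown."""
    for comment in comments.findall("w:comment", NS):
        if comment.get(f"{{{W}}}id") != comment_id:
            continue
        return {
            "id": comment_id,
            "author": comment.get(f"{{{W}}}author"),
            "initials": comment.get(f"{{{W}}}initials"),
            "date": comment.get(f"{{{W}}}date"),
            "text": "".join(t.text or "" for t in comment.iter(f"{{{W}}}t")),
            "has_annotation_ref": comment.find(".//w:annotationRef", NS) is not None,
        }
    return None


comment = read_comment(ET.fromstring(COMMENTS), "0")
print(comment)
```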
33. Bookmarks
• The granddaddy of annotations in
WordprocessingML
– This type of annotation is again a case of
annotations that don’t need to be well formed
(Figure callout: the bookmark starts in paragraph one and ends in paragraph two)
34. Bookmarks (cont.)
• The bookmarkStart/End elements contain the
start and end markers for a bookmark
– The two are linked by the id attribute
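A rough sketch of collecting the text covered by a bookmark, assuming a hypothetical body with invented text and an invented bookmark name. It walks the elements in document order and uses the shared id to find the matching end marker. The function bookmarked_text is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical body: the bookmark starts in paragraph one and ends in paragraph two.
BODY = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t xml:space="preserve">Intro. </w:t></w:r>
    <w:bookmarkStart w:id="0" w:name="example"/>
    <w:r><w:t xml:space="preserve">Bookmarked start, </w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>bookmarked end.</w:t></w:r>
    <w:bookmarkEnd w:id="0"/>
    <w:r><w:t xml:space="preserve"> Not bookmarked.</w:t></w:r>
  </w:p>
</w:body>"""


def bookmarked_text(body, name):
    """Concatenate the text between a named bookmarkStart and its matching bookmarkEnd."""
    active_id = None
    pieces = []
    for el in body.iter():  # iter() visits elements in document order
        if el.tag == f"{{{W}}}bookmarkStart" and el.get(f"{{{W}}}name") == name:
            active_id = el.get(f"{{{W}}}id")
        elif active_id is None:
            continue
        elif el.tag == f"{{{W}}}bookmarkEnd" and el.get(f"{{{W}}}id") == active_id:
            break
        elif el.tag == f"{{{W}}}t":
            pieces.append(el.text or "")
    return "".join(pieces)


covered = bookmarked_text(ET.fromstring(BODY), "example")
print(covered)  # Bookmarked start, bookmarked end.
```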
35. Range Permissions
• WordprocessingML range permissions consist
of three main components:
– Start and end anchors to mark content that has
restricted permissions (again)
– ID for linking the start/end marker
– The user(s) who may edit the range
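The three components can be sketched as follows, assuming a hypothetical paragraph with an invented user address and text. The permStart element carries the id plus either a single editor (ed) or an editor group (edGrp), and permEnd repeats the id. The function editable_ranges is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph with two permission ranges.
PARA = f"""<w:p xmlns:w="{W}">
  <w:permStart w:id="0" w:edGrp="everyone"/>
  <w:r><w:t xml:space="preserve">Anyone may edit this. </w:t></w:r>
  <w:permEnd w:id="0"/>
  <w:permStart w:id="1" w:ed="user@example.com"/>
  <w:r><w:t>Only one user may edit this.</w:t></w:r>
  <w:permEnd w:id="1"/>
</w:p>"""


def editable_ranges(para):
    """Map each permission id that has both markers to who may edit the range."""
    ended = {el.get(f"{{{W}}}id") for el in para.iter(f"{{{W}}}permEnd")}
    ranges = {}
    for start in para.iter(f"{{{W}}}permStart"):
        perm_id = start.get(f"{{{W}}}id")
        if perm_id not in ended:
            continue  # skip a start marker with no matching end
        user = start.get(f"{{{W}}}ed")
        group = start.get(f"{{{W}}}edGrp")
        ranges[perm_id] = f"user:{user}" if user else f"group:{group}"
    return ranges


ranges = editable_ranges(ET.fromstring(PARA))
print(ranges)
```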
36. Spelling and Grammar Errors
• WordprocessingML spelling and grammar
errors are a cache of proofing state, and
consist of two main components
– Start and end anchors to mark spelling and
grammatical errors
– Type of error
• Spelling
• Grammar
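A minimal sketch of reading the cached proofing state, assuming a hypothetical paragraph with invented text. Here the proofErr element's type attribute encodes both the kind of error and whether the marker is a start or an end (spellStart, spellEnd, gramStart, gramEnd). The function proofing_errors is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph with one cached spelling error.
PARA = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This is </w:t></w:r>
  <w:proofErr w:type="spellStart"/>
  <w:r><w:t>mispelled</w:t></w:r>
  <w:proofErr w:type="spellEnd"/>
  <w:r><w:t>.</w:t></w:r>
</w:p>"""


def proofing_errors(para):
    """Return (kind, flagged text) pairs, where kind is 'spell' or 'gram'."""
    errors = []
    kind = None
    pieces = []
    for el in para.iter():
        if el.tag == f"{{{W}}}proofErr":
            marker = el.get(f"{{{W}}}type", "")
            if marker.endswith("Start"):
                kind, pieces = marker[: -len("Start")], []
            elif marker.endswith("End") and kind is not None:
                errors.append((kind, "".join(pieces)))
                kind = None
        elif el.tag == f"{{{W}}}t" and kind is not None:
            pieces.append(el.text or "")
    return errors


errors = proofing_errors(ET.fromstring(PARA))
print(errors)  # [('spell', 'mispelled')]
```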
37. Spelling and Grammar Errors (cont.)
• Why store a spelling and grammar cache?
– Performance (don’t recheck 2000 pages every
open)
– Storing spelling exceptions (ignore this word you
think is misspelled, silly computer)
38. Revision Save IDs
• Switching gears a little bit…
• So far, we have focused on annotations as a
record of changes in a document
– However, many people don’t have the foresight to
actually track changes (or don’t want to)
• WordprocessingML also has a facility for
storing unique IDs on objects for the purpose
of comparing two documents
39. Revision Save IDs (cont.)
• You’ve probably seen these in many of the
examples…
40. Revision Save IDs (cont.)
• So what is an rsid*?
– Unique hex number that identifies an editing
session – the editing between any two saves
– These are stored for almost every object in
WordprocessingML
• E.g. paragraphs, run, sections, run properties,
paragraph properties, etc.
• This information tells you whether two runs
were last edited during the same session.
41. Revision Save IDs Example
• In this example, the runs were created in the
same session, and one of the runs was formatted
during a subsequent session
42. Using Revision Save IDs
• With this information, applications can do
more precise comparisons
– An application knows which things are from the
same set of edits
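A small sketch of how an application might compare rsid values, assuming a hypothetical paragraph whose rsid values and text are invented. It uses the rsidR attribute (session in which the run was added) and the rsidRPr attribute (session in which the run properties were changed). The helper same_session is illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical paragraph: both runs were added in one session,
# and the second run was formatted in a later session.
PARA = f"""<w:p xmlns:w="{W}">
  <w:r w:rsidR="00A1B2C3"><w:t xml:space="preserve">Same session, </w:t></w:r>
  <w:r w:rsidR="00A1B2C3" w:rsidRPr="00D4E5F6">
    <w:rPr><w:b/></w:rPr>
    <w:t>formatted later</w:t>
  </w:r>
</w:p>"""


def same_session(first, second, attr="rsidR"):
    """True when both elements carry the same value for the given rsid attribute."""
    name = f"{{{W}}}{attr}"
    return first.get(name) is not None and first.get(name) == second.get(name)


runs = ET.fromstring(PARA).findall("w:r", NS)
created_together = same_session(runs[0], runs[1])

# A run whose properties carry a different rsid was formatted in another session.
added = runs[1].get(f"{{{W}}}rsidR")
formatted = runs[1].get(f"{{{W}}}rsidRPr")
formatted_later = formatted is not None and formatted != added
print(created_together, formatted_later)  # True True
```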
43. Disclaimer
This presentation is for informational purposes only, and should
not be relied upon as a substitute or replacement for Microsoft
formal file format documentation, which is available at the
following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx.
Any views or opinions
presented in this material are solely those of the author and do
not necessarily represent those of Microsoft. Microsoft
disclaims all liability for mistakes or inaccuracies in this
presentation.