2. The ‘Document’
• A WordprocessingML document file is a
collection of multiple ‘subdocuments’,
formally called stories:
– The main story
– Header(s) / Footer(s)
– Footnote(s) / Endnote(s)
– Subdocuments
– Frame(s)
– Comment(s)
3. Shared Story Properties
• All stories* in a document share a common
set of properties:
– Style information
– Numbering definitions
– Font information
– Document settings
*with one exception, which we’ll discuss later
4. Style Information
• A style defines a specific set of formatting
properties
– For example, the Normal style in Word 2003 is
defined as:
• Font = Times New Roman
• Font Size = 12 point
• Font Language = Language of Word (English (US) for me)
• Justification = Left
• Line Spacing = Single
5. Style Types
• Word supports six different types of styles:
– Paragraph styles
– Character styles
– Linked styles (paragraph + character)
– Table styles
– Numbering styles
– Default paragraph and character properties
6. Style Cascading/Inheritance
• Multiple style ‘types’ can be applied to the same part
of a file, so properties are applied in a specific order.
• The properties set by one type can be removed or
supplemented by following types.
• As well, styles of any given type can inherit from
other styles of that type.
– e.g. The Heading 1 paragraph style inherits properties
from the Normal paragraph style
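The inheritance rule above can be sketched in a few lines of Python. This is only an illustration: the `STYLES` table, its property names, and the Heading 1 values are hypothetical stand-ins, not Word's actual definitions. The point is that a style's effective properties are its ancestors' properties with its own settings layered on top.

```python
# Hypothetical style table: each entry names its parent ("based_on") and
# lists only the properties it sets itself. Values are illustrative.
STYLES = {
    "Normal": {
        "based_on": None,
        "props": {"font": "Times New Roman", "size_pt": 12, "justification": "left"},
    },
    "Heading1": {
        "based_on": "Normal",
        "props": {"size_pt": 16, "bold": True},
    },
}


def resolve_style(style_id, styles=STYLES):
    """Return the effective properties of a style, walking up its parents."""
    chain = []
    seen = set()
    current = style_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic style inheritance at {current!r}")
        seen.add(current)
        chain.append(styles[current])
        current = styles[current]["based_on"]

    effective = {}
    # Apply the most distant ancestor first so nearer styles override it.
    for style in reversed(chain):
        effective.update(style["props"])
    return effective


heading = resolve_style("Heading1")
print(heading)
```

Heading1 keeps the font and justification it inherits from Normal and replaces only the size, while adding bold.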
7. Style Application
[Diagram: application order of the formatting layers, labeled Document Defaults, Table, Numbering, Paragraph, Character, and Direct Formatting, shown against the columns Table, Characters, Paragraph, and List Item]
9. Numbering Definitions
• A numbering definition consists of nine levels,
each of which has formatting properties
– Paragraph properties (e.g. margins)
– Number properties (e.g. number text,
justification, character formatting, etc.)
• A numbered paragraph is specified in two
parts:
– The numbering definition instance
– The numbering level
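As a rough illustration of the two-part reference, the sketch below parses a hand-written paragraph and reads both parts back. It assumes the element names from the published WordprocessingML schema (`w:numPr`, `w:numId`, `w:ilvl`); the sample XML and the helper name `numbering_reference` are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"  # attributes are looked up by their qualified name

# Hand-written sample: a paragraph at level 1 of numbering instance 3.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:numPr>
      <w:ilvl w:val="1"/>
      <w:numId w:val="3"/>
    </w:numPr>
  </w:pPr>
  <w:r><w:t>Second-level item</w:t></w:r>
</w:p>
"""


def numbering_reference(paragraph):
    """Return (instance id, level) for a numbered paragraph, else None."""
    num_pr = paragraph.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    num_id = int(num_pr.find("w:numId", NS).get(VAL))
    level = int(num_pr.find("w:ilvl", NS).get(VAL))
    return num_id, level


reference = numbering_reference(ET.fromstring(PARAGRAPH_XML.strip()))
print(reference)
```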
10. Abstract Numbering Definition
• The abstract numbering definition specifies
the properties for any or all of the nine levels
in the list
• A numbering definition instance specifies the
properties for a specific numbering definition
by inheritance:
– References an abstract numbering definition
– Provides overrides for zero or more levels in the
numbering definition
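The relationship between an abstract definition and its instances can be modeled roughly as follows. The dictionaries, ids, and property names are hypothetical, and the sketch makes one simplifying assumption that should be stated loudly: an override replaces a level's properties as a whole instead of merging them property by property.

```python
# Hypothetical data: abstract definition id -> level -> properties.
ABSTRACT_DEFINITIONS = {
    0: {
        0: {"format": "decimal", "text": "%1."},
        1: {"format": "lowerLetter", "text": "%2)"},
    },
}

# Hypothetical instances: each references an abstract definition and may
# override zero or more of its levels.
INSTANCES = {
    1: {"abstract_id": 0, "overrides": {}},
    2: {"abstract_id": 0, "overrides": {1: {"format": "lowerRoman", "text": "%2."}}},
}


def resolve_level(instance_id, level):
    """Return the properties a numbered paragraph would use at this level."""
    instance = INSTANCES[instance_id]
    inherited = ABSTRACT_DEFINITIONS[instance["abstract_id"]][level]
    # Assumption: an override replaces the inherited level as a whole.
    return instance["overrides"].get(level, inherited)


plain = resolve_level(1, 1)
overridden = resolve_level(2, 1)
print(plain, overridden)
```

Both instances share level 0 from the abstract definition; only instance 2 changes level 1.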
12. Font Information
• The font information stores two distinct
pieces of information:
– Embedded fonts (when the producer chooses to
embed them)
– Font type data
• The latter provides characteristics of the font
which are used to find a suitable replacement
when the specified font is unavailable
13. Document Settings
• All settings pertinent to the document are stored in
separate parts
• These settings can be divided into two groups:
– Those which affect presentation
• Web settings (e.g. HTML <DIV> and <FRAMESET> data)
• Compatibility options
– ‘Pure’ settings
• View, zoom state
• Defaults
• User preferences (i.e. ‘don’t ask me this again’)
14. Story Content
• Within each story is the actual content, which
consists of what are formally called block level
structures:
– Paragraphs
– Tables
– Custom Markup (structured document tags,
custom XML)
– Range Permissions
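A minimal sketch of how a consumer might classify the block level structures in a story, assuming the schema's element names (`w:p`, `w:tbl`, `w:sdt`, `w:customXml`, `w:permStart`, `w:permEnd`). The sample body and the `BLOCK_KINDS` mapping are illustrative only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative mapping from qualified element name to the slide's terms.
BLOCK_KINDS = {
    f"{{{W}}}p": "paragraph",
    f"{{{W}}}tbl": "table",
    f"{{{W}}}sdt": "structured document tag",
    f"{{{W}}}customXml": "custom XML",
    f"{{{W}}}permStart": "range permission start",
    f"{{{W}}}permEnd": "range permission end",
}

# Hand-written sample story content.
BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p><w:r><w:t>Intro</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>
  <w:sdt><w:sdtContent><w:p/></w:sdtContent></w:sdt>
</w:body>
"""


def block_kinds(container):
    """Name each direct child of a story in document order."""
    return [BLOCK_KINDS.get(child.tag, "other") for child in container]


kinds = block_kinds(ET.fromstring(BODY_XML.strip()))
print(kinds)
```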
15. Story Content
• Within each paragraph are what are formally
called inline structures:
– Runs
– Custom Markup (structured document tags,
custom XML)
– Annotations (comments, tracked changes,
bookmarks)
– DrawingML elements
– Fields
– Hyperlinks
16. Basic Structural Rules
• All text in a word processing document is
contained within runs
– A run is a region of text with a common set of
properties
• All runs must be contained within a paragraph
– A paragraph is a collection of one or more runs
that is displayed as a unit (analogous to the HTML
<P> tag)
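To make the run/paragraph relationship concrete, here is a small sketch that parses a hand-written paragraph containing two runs, one plain and one bold. It assumes the schema's element names (`w:p`, `w:r`, `w:rPr`, `w:b`, `w:t`); `describe_runs` is a hypothetical helper, and it treats the mere presence of `w:b` as bold, which is a simplification.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written sample: one paragraph, two runs with different properties.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Hello, </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
</w:p>
"""


def describe_runs(paragraph):
    """Return (text, is_bold) for each run that is a direct child."""
    runs = []
    for run in paragraph.findall("w:r", NS):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        # Simplification: any w:b element counts as bold.
        bold = run.find("w:rPr/w:b", NS) is not None
        runs.append((text, bold))
    return runs


runs = describe_runs(ET.fromstring(PARAGRAPH_XML.strip()))
paragraph_text = "".join(text for text, _ in runs)
print(runs, paragraph_text)
```

The paragraph's visible text is simply the concatenation of its runs; the formatting boundary is what forces the split into two runs.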
18. Basic Structural Rules
• A paragraph may itself be at any location
which allows block level content:
– At the top-most level within a story (e.g. header,
footer, main document)
– Nested within a table cell
– Nested within a structured document tag or
annotation markers
19. Tables
• Similar to HTML tables, a Word table consists
of the table itself, its properties, rows, and cells
[Diagram: a table with its Properties, a Row, and a Cell labeled]
20. Tables
• Individual table cells can themselves contain
any block level content
– This means that tables can be nested to an
arbitrary depth
[Diagram: a table nested inside a table cell]
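Because cells accept any block level content, a consumer generally has to walk tables recursively. The sketch below, which assumes the schema's `w:tbl`, `w:tr`, and `w:tc` element names and uses a hand-written sample, measures how deeply tables are nested.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TBL = f"{{{W}}}tbl"

# Hand-written sample: a table whose only cell contains another table.
TABLE_XML = f"""
<w:tbl xmlns:w="{W}">
  <w:tr>
    <w:tc>
      <w:tbl>
        <w:tr><w:tc><w:p/></w:tc></w:tr>
      </w:tbl>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
"""


def max_table_depth(element, depth=0):
    """Return the deepest table nesting level at or below this element."""
    if element.tag == TBL:
        depth += 1
    return max([depth] + [max_table_depth(child, depth) for child in element])


nesting = max_table_depth(ET.fromstring(TABLE_XML.strip()))
print(nesting)
```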
21. Custom Markup
• Custom markup can be applied within the
contents of any story in a document
• These tags can take one of three forms:
– Smart tags
– Custom XML markup
– Structured document tags
22. Custom Defined XML
• A facility for embedding arbitrary user XML
within the document at either block or inline
levels
23. Structured Document Tags
• Provide granular semantics at either the block
or inline levels
– e.g. region can/cannot be edited; region
can/cannot be deleted; region should show a date
picker/drop-down list/textbox
– Do not affect layout
• Similar to custom XML, but without the XML
schema semantics and with presentation data
and more granular properties
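As a hedged illustration of these semantics, the sketch below reads the properties of a hand-written structured document tag. It assumes the schema's `w:sdt`, `w:sdtPr`, `w:lock`, `w:date`, `w:dropDownList`, and `w:text` elements; the `LOCKS` and `CONTROLS` tables and the `describe_sdt` helper are this example's own simplification and cover only the cases named on the slide.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"

# Hand-written sample: a date picker whose tag cannot be deleted.
SDT_XML = f"""
<w:sdt xmlns:w="{W}">
  <w:sdtPr>
    <w:lock w:val="sdtLocked"/>
    <w:date/>
  </w:sdtPr>
  <w:sdtContent>
    <w:p><w:r><w:t>2008-01-01</w:t></w:r></w:p>
  </w:sdtContent>
</w:sdt>
"""

CONTROLS = {"date": "date picker", "dropDownList": "drop-down list", "text": "text box"}

# Lock value -> (contents can be edited, tag can be deleted).
LOCKS = {
    "unlocked": (True, True),
    "sdtLocked": (True, False),
    "contentLocked": (False, True),
    "sdtContentLocked": (False, False),
}


def describe_sdt(sdt):
    """Summarize the control type and locking of one structured document tag."""
    props = sdt.find("w:sdtPr", NS)
    lock = props.find("w:lock", NS)
    can_edit, can_delete = LOCKS[lock.get(VAL) if lock is not None else "unlocked"]
    control = next(
        (label for name, label in CONTROLS.items()
         if props.find(f"w:{name}", NS) is not None),
        "unspecified",
    )
    return {"control": control, "can_edit": can_edit, "can_delete": can_delete}


summary = describe_sdt(ET.fromstring(SDT_XML.strip()))
print(summary)
```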
24. Sections
• Sections in a word processing document
specify:
– Page properties
• Page size
• Page orientation
• Margins
– Header/footer references
– Footnote/endnote properties
– Column properties
25. Sections
• Sections specify (cont'd):
– Line numbering
– Text direction (RTL vs. LTR; top-to-bottom vs.
bottom-to-top)
26. Sections
• Four types of sections:
– Continuous
– Next page (start on next page)
– Even (start on next even page)
– Odd (start on next odd page)
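A section's properties can be read back with a few lines of Python. This sketch assumes the schema's `w:sectPr`, `w:type`, and `w:pgSz` elements and that page dimensions are stored in twentieths of a point (1440 per inch). The sample XML is hand-written, and `SECTION_TYPES` maps only the four types listed on the slide.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"
TWIPS_PER_INCH = 1440  # twentieths of a point per inch

# Schema value -> name used on the slide (only the four listed there).
SECTION_TYPES = {
    "continuous": "Continuous",
    "nextPage": "Next page",
    "evenPage": "Even",
    "oddPage": "Odd",
}

# Hand-written sample: a US Letter section starting on the next odd page.
SECTION_XML = f"""
<w:sectPr xmlns:w="{W}">
  <w:type w:val="oddPage"/>
  <w:pgSz w:w="12240" w:h="15840"/>
</w:sectPr>
"""


def describe_section(sect_pr):
    """Return the section type and page size (in inches) of one section."""
    section_type = sect_pr.find("w:type", NS)
    page = sect_pr.find("w:pgSz", NS)
    return {
        "type": SECTION_TYPES.get(section_type.get(VAL)) if section_type is not None else None,
        "width_in": int(page.get(f"{{{W}}}w")) / TWIPS_PER_INCH,
        "height_in": int(page.get(f"{{{W}}}h")) / TWIPS_PER_INCH,
    }


section = describe_section(ET.fromstring(SECTION_XML.strip()))
print(section)
```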
27. Annotations
• Annotations in a word processing document
store markup information:
– Tracked revisions (insertion, deletion, move)
– Comments
– Bookmarks
31. Headers/Footers
• There are three types of headers and footers
in Word:
– Odd page header
– Even page header (optional)
– First page header (optional)
• If one of the optional types is not specified,
the odd page header is used
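The fallback rule can be written as a tiny function. This is a simplified sketch: `header_kind` and the `available` set are hypothetical, and the sketch ignores any document or section settings that switch first-page or even-page headers on or off.

```python
def header_kind(page_number, available):
    """Pick which header a page uses, falling back to the odd page header.

    page_number counts from 1 within the section; available is the set of
    header kinds the section specifies ("odd" is always expected).
    """
    if page_number == 1:
        wanted = "first"
    elif page_number % 2 == 0:
        wanted = "even"
    else:
        wanted = "odd"
    return wanted if wanted in available else "odd"


print(header_kind(1, {"odd"}))           # no first page header specified
print(header_kind(1, {"odd", "first"}))  # first page header specified
```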
32. Headers/Footers
• Headers and footers are stored in separate
parts, one per header or footer
• Each section refers to its header(s)/footer(s)
by an explicit relationship reference
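The sketch below shows how such a reference might be resolved. It assumes the schema's `w:headerReference` element with `w:type` and `r:id` attributes, and a package relationships part that maps each id to a target part. The relationship ids and the part names `header1.xml` and `header2.xml` are illustrative.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hand-written section properties referring to two header parts by id.
SECTION_XML = f"""
<w:sectPr xmlns:w="{W}" xmlns:r="{R}">
  <w:headerReference w:type="default" r:id="rId7"/>
  <w:headerReference w:type="first" r:id="rId8"/>
</w:sectPr>
"""

# Hand-written relationships part for the main document.
RELS_XML = f"""
<Relationships xmlns="{PKG_REL}">
  <Relationship Id="rId7" Type="{R}/header" Target="header1.xml"/>
  <Relationship Id="rId8" Type="{R}/header" Target="header2.xml"/>
</Relationships>
"""


def header_targets(sect_pr, rels):
    """Map each header type in a section to the part its relationship targets."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.findall(f"{{{PKG_REL}}}Relationship")
    }
    return {
        ref.get(f"{{{W}}}type"): targets[ref.get(f"{{{R}}}id")]
        for ref in sect_pr.findall(f"{{{W}}}headerReference")
    }


headers = header_targets(
    ET.fromstring(SECTION_XML.strip()), ET.fromstring(RELS_XML.strip())
)
print(headers)
```

In this markup the odd page header is the one whose type is `default`.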
34. Footnotes/Endnotes
• All footnotes are stored in a single part
– Same applies to all endnotes
• Footnote references are positioned by a
special tag in run content, which specifies the
footnote to reference
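A rough sketch of how a reference in run content points into the footnotes part, assuming the schema's `w:footnoteReference`, `w:footnotes`, and `w:footnote` elements and their `w:id` attribute. Both XML fragments are hand-written samples, and `referenced_footnotes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
ID = f"{{{W}}}id"

# Hand-written main story paragraph with a footnote reference in a run.
DOCUMENT_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t>Claim</w:t></w:r>
  <w:r><w:footnoteReference w:id="2"/></w:r>
</w:p>
"""

# Hand-written footnotes part holding every footnote in the document.
FOOTNOTES_XML = f"""
<w:footnotes xmlns:w="{W}">
  <w:footnote w:id="2">
    <w:p><w:r><w:t>Supporting note.</w:t></w:r></w:p>
  </w:footnote>
</w:footnotes>
"""


def referenced_footnotes(paragraph, footnotes):
    """Return the text of each footnote referenced from the paragraph."""
    by_id = {
        note.get(ID): "".join(t.text or "" for t in note.iter(f"{{{W}}}t"))
        for note in footnotes.findall("w:footnote", NS)
    }
    return [by_id[ref.get(ID)] for ref in paragraph.iter(f"{{{W}}}footnoteReference")]


notes = referenced_footnotes(
    ET.fromstring(DOCUMENT_XML.strip()), ET.fromstring(FOOTNOTES_XML.strip())
)
print(notes)
```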
36. Glossary Document
• Remember that exception to the ‘all stories
share the same data’ rule?
• The glossary document is a completely distinct
main story
– Specifies its own styles, lists, fonts, settings
• This story is used to store document
fragments which may be inserted at a later
time
37. File Format Types
• Template (DOTX) – classic “DOT”
• Document (DOCX) – classic “DOC”
• Both utilize the same file format –
differentiation is a function of the main
content type and file extension only
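The content type distinction can be checked without looking at the file extension. The sketch below builds a minimal in-memory package containing only `[Content_Types].xml` and then classifies it. It assumes the main part is declared through an `Override` entry with one of the two content types shown; `make_package` and `package_kind` are hypothetical helpers, and macro-enabled variants are not covered.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
PREFIX = "application/vnd.openxmlformats-officedocument.wordprocessingml."
MAIN_TYPES = {
    PREFIX + "document.main+xml": "document",
    PREFIX + "template.main+xml": "template",
}


def make_package(content_type):
    """Build a minimal in-memory package declaring the main part's type."""
    types_xml = (
        f'<Types xmlns="{CT_NS}">'
        f'<Override PartName="/word/document.xml" ContentType="{content_type}"/>'
        "</Types>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", types_xml)
    buffer.seek(0)
    return buffer


def package_kind(file_obj):
    """Return 'document', 'template', or None based on the main content type."""
    with zipfile.ZipFile(file_obj) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    for override in root.findall(f"{{{CT_NS}}}Override"):
        kind = MAIN_TYPES.get(override.get("ContentType"))
        if kind:
            return kind
    return None


print(package_kind(make_package(PREFIX + "template.main+xml")))
```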
38. Disclaimer
This presentation is for informational purposes only, and should
not be relied upon as a substitute or replacement for Microsoft
formal file format documentation, which is available at the
following website:
https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions
presented in this material are solely those of the author and do
not necessarily represent those of Microsoft. Microsoft
disclaims all liability for mistakes or inaccuracies in this
presentation.