3. Paragraphs
• The most basic unit of a WordprocessingML
document
• Analogous to the HTML <p> tag
• A paragraph contains three pieces of
information:
– Paragraph properties
– Inline content
– (optionally) a set of revision IDs used for
document merge and compare
4. Paragraph Example
• A basic paragraph with three different text
formats:
[Figure: paragraph markup, with callouts labeling
the paragraph properties and the paragraph contents]
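The markup from the original slide is not reproduced in this transcript. As a stand-in, here is a minimal sketch of what such a paragraph might look like and how its two parts can be told apart. The sample XML, the chosen formats (bold, italic, plain), and the helper names `local` and `summarize` are assumptions made for illustration, not the presenter's example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical markup (not the original slide's example): a centered
# paragraph whose three runs are bold, italic, and unformatted.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:pPr><w:jc w:val="center"/></w:pPr>
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Bold, </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t xml:space="preserve">italic, </w:t></w:r>
  <w:r><w:t>and plain.</w:t></w:r>
</w:p>
""".strip()


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def summarize(p):
    """Return (paragraph property names, [(run property names, run text)])."""
    ppr = p.find("w:pPr", NS)
    para_props = [local(c.tag) for c in ppr] if ppr is not None else []
    runs = []
    for r in p.findall("w:r", NS):
        rpr = r.find("w:rPr", NS)
        run_props = [local(c.tag) for c in rpr] if rpr is not None else []
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        runs.append((run_props, text))
    return para_props, runs


para_props, runs = summarize(ET.fromstring(PARAGRAPH))
print(para_props)  # paragraph properties (children of pPr)
print(runs)        # paragraph contents (one entry per run)
```

The `pPr` child holds the paragraph properties; everything after it is the paragraph's inline content.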
5. Paragraph Properties
• The paragraph properties are stored on the
pPr element
• This contains all information on the
formatting applied at the paragraph level, as
well as to the paragraph mark character
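To illustrate the two kinds of information held by `pPr`, the sketch below separates paragraph-level settings from the formatting of the paragraph mark, which is stored in an `rPr` child of `pPr`. The sample markup and the helper name `split_ppr` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
VAL = f"{{{W}}}val"  # fully qualified name of the w:val attribute

# Hypothetical pPr: a style, keep-with-next, centered alignment, and a
# bold paragraph mark (the rPr nested inside pPr).
PPR = f"""
<w:pPr xmlns:w="{W}">
  <w:pStyle w:val="Heading1"/>
  <w:keepNext/>
  <w:jc w:val="center"/>
  <w:rPr><w:b/></w:rPr>
</w:pPr>
""".strip()


def split_ppr(ppr):
    """Return (paragraph-level settings, paragraph mark settings).

    Each dict maps an element's local name to its w:val attribute,
    or None when the element carries no w:val.
    """
    para, mark = {}, {}
    for child in ppr:
        name = child.tag.split("}", 1)[1]
        if name == "rPr":
            for prop in child:
                mark[prop.tag.split("}", 1)[1]] = prop.get(VAL)
        else:
            para[name] = child.get(VAL)
    return para, mark


para, mark = split_ppr(ET.fromstring(PPR))
print(para)
print(mark)
```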
6. Paragraph Properties
• Paragraph Style
• Keep on same page with previous/next paragraph
• Page break before
• Text frame
– Text frame properties
• Widow/Orphan control
– Prevents a single line of a paragraph from
being stranded on a page by itself
• Numbering properties
• Paragraph borders
7. Paragraph Properties (cont'd)
• Suppress line numbering
• Paragraph shading
• Tab stops
• Override hyphenation
• RTL vs. LTR
• East Asian typography settings
• Line spacing
• Document grid settings
– Adjust text to grid
– Snap margins to grid
• Paragraph alignment
8. Paragraph Properties (cont'd)
• Indentation
– Mirror indents?
• Text orientation (vertical vs. horizontal)
• Outline level
• HTML <div> references
• Conditional formatting properties (in tables)
• Formatting properties for the paragraph mark
character
• Section properties
10. Runs
• A run is a region of text with a common set of
properties
• All text in a word processing document is
contained within runs
• A run contains three pieces of information:
– Run properties
– Run content (e.g. text)
– (optionally) A set of revision IDs for document
comparison
11. Runs
• All runs must be contained within a paragraph
• Producers may break runs whenever they
choose, as long as the net property set for
each run is correct
15. Run Example (cont'd)
• The second example may be less efficient, but
it's equally valid.
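The run examples from the original slides are not reproduced in this transcript. The sketch below uses two hypothetical paragraphs to illustrate the rule: one stores bold text in a single run, the other splits the same text into two bold runs. Comparing the formatting character by character shows that both describe the same result. As a simplification, the helper `per_char_props` (an illustrative name) treats the mere presence of a property element as "on" and ignores any `w:val` attribute.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical example 1: one bold run.
ONE_RUN = f"""
<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t>Hello World</w:t></w:r>
</w:p>
""".strip()

# Hypothetical example 2: the same text split into two bold runs.
TWO_RUNS = f"""
<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Hello </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r>
</w:p>
""".strip()


def per_char_props(p_xml):
    """List (character, property names) pairs for every character in runs."""
    out = []
    for r in ET.fromstring(p_xml).findall("w:r", NS):
        rpr = r.find("w:rPr", NS)
        # Simplification: an element's presence counts as "property on".
        props = frozenset(c.tag.split("}", 1)[1] for c in rpr) if rpr is not None else frozenset()
        for t in r.findall("w:t", NS):
            out.extend((ch, props) for ch in (t.text or ""))
    return out


same = per_char_props(ONE_RUN) == per_char_props(TWO_RUNS)
print(same)
```

The two-run form carries redundant markup, but a consumer that resolves properties per character cannot tell the two apart.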
16. Run Properties
• The run properties are stored on the rPr
element
• This contains all information on the
formatting applied to the characters in this
run
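As a rough sketch of how a producer might emit a run, the hypothetical helper `make_run` below builds a `r` element with an optional `rPr` child followed by the text. Only bold and italic are handled; the function name and its parameters are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # serialize with the conventional w: prefix


def qn(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def make_run(text, bold=False, italic=False):
    """Build a run; rPr (if any) comes first, then the text element."""
    r = ET.Element(qn("r"))
    if bold or italic:
        rpr = ET.SubElement(r, qn("rPr"))
        if bold:
            ET.SubElement(rpr, qn("b"))
        if italic:
            ET.SubElement(rpr, qn("i"))
    t = ET.SubElement(r, qn("t"))
    t.text = text
    return r


run = make_run("Hi", bold=True)
print(ET.tostring(run, encoding="unicode"))
```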
17. Run Properties
• Character style
• Font face
• Font size
• Bold
• Italic
• ALL CAPS
• Small caps
• Strikethrough
• Double strikethrough
• Outline
• Shadow
• Emboss
• Engrave
• Hidden text
18. Run Properties (cont'd)
• Run property revisions
• Fit text (for East Asian typography)
• Vertical alignment
• RTL vs. LTR
• Complex script flag
• Emphasis mark
• Language ID of text
• Horizontal in vertical
• Two lines in one
• Math
19. Run Content
• Runs may contain 'run content':
– Text
– Deleted text
– Soft line breaks
– Field codes
– Deleted field codes
– Footnote/endnote reference marks
– Fields
20. Run Content
• Runs may contain 'run content' (cont'd):
– Page numbers
– Tabs
– Ruby text
– DrawingML content
– Embedded objects
– Pictures
21. Text
• Text elements are the only elements in the main
story that can contain a text node(!)
– All other text is in an attribute value
• There are four types of text in
WordprocessingML:
– Text (t)
– Deleted text (delText)
– Field codes (instrText)
– Deleted field codes (delInstrText)
22. Text
• Why do we use a different element for
deleted text?
– Good question!
• This allows simple consumers to get the text
of the document easily by just grabbing the
contents of the t node (text)
• They don't need to check where revisions start
and end, etc. to extract the visible contents
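The point above can be sketched in a few lines: a simple consumer collects every `t` element and never looks at the revision markup. The sample document fragment and the function names `visible_text` and `deleted_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical tracked change: "brown " was deleted and "red " inserted.
BODY = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
    <w:del w:id="1" w:author="A">
      <w:r><w:delText xml:space="preserve">brown </w:delText></w:r>
    </w:del>
    <w:ins w:id="2" w:author="A">
      <w:r><w:t xml:space="preserve">red </w:t></w:r>
    </w:ins>
    <w:r><w:t>fox</w:t></w:r>
  </w:p>
</w:body>
""".strip()


def visible_text(root):
    """Concatenate every t element; delText is skipped automatically."""
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


def deleted_text(root):
    """Concatenate only the deleted text, for comparison."""
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}delText"))


root = ET.fromstring(BODY)
print(visible_text(root))
print(deleted_text(root))
```

Because deleted text lives in its own element, the extraction never has to track where a revision starts or ends.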
23. Disclaimer
This presentation is for informational purposes only, and should
not be relied upon as a substitute or replacement for Microsoft
formal file format documentation, which is available at the
following website:
https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions
presented in this material are solely those of the author and do
not necessarily represent those of Microsoft. Microsoft
disclaims all liability for mistakes or inaccuracies in this
presentation.