Day Of Dot Net Ann Arbor 2007


My presentation on Office 2007/OpenXML file formats

    1. Creating Office Documents with Open XML
       David Truxall, Ph.D., Principal Consultant, NuSoft Solutions
    2. Agenda
         • Overview
         • System.IO.Packaging
         • Building Documents with .NET
    3. Open XML
         • A standard that describes a family of XML schemas (Ecma standard)
         • Defines the XML vocabularies for word-processing, spreadsheet, and presentation documents
         • Defines the packaging of documents that conform to these schemas
    4. Features of Office Open XML
    5. Support for Open XML
         • iPhone
         • iWork
         • Microsoft Office
         • OpenOffice
         • Gnumeric
         • WordPerfect
         • Palm OS
         • NeoOffice
         • PHP
         • Java
         • Monarch v.9.0
         • OpenXML Writer
         • Word Counter 2.2.1
         • Altsoft XML2PDF
         • MindMapping
         • XmlSpy
    6. Open XML Format Architecture
       User view: single Office file
       Developer view: modular file
         • Document Parts
         • Most parts are XML
         • Each XML part is a discrete component
         • Can add, extract and modify individual parts without using Office programs
         • Corruption of any part would not prohibit the file from opening
       Diagram labels (file container): Document Properties, Comments, WordML / SpreadsheetML, Custom XML, Embedded Code, Images / Video / Sound
    7. Open Packaging Organization
         • Package: the container (a ZIP archive)
         • Document Parts: the files inside the container
         • Relationships: every part that references other parts does so via a relationship
       Diagram labels: Document Properties, Application Properties, Custom Properties, Sheet 1, Sheet 2, Sheet 3, Strings, Theme, Workbook
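
The package/part/relationship split above can be explored with nothing more than a ZIP library. The following is a minimal Python sketch, not the System.IO.Packaging API covered later in the deck: it builds a tiny in-memory package and reads it back. The relationship type `http://example.com/demo/main` and the part contents are made up for illustration, and the package omits `[Content_Types].xml`, so it is not a valid Open XML document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by package relationship parts (.rels files).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def build_demo_package():
    """Build a tiny in-memory ZIP with one part and one root relationship."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Root relationships live in the _rels folder at the package root.
        zf.writestr(
            "_rels/.rels",
            f'<Relationships xmlns="{RELS_NS}">'
            # Hypothetical relationship type, for illustration only.
            '<Relationship Id="rId1" Type="http://example.com/demo/main" '
            'Target="xl/workbook.xml"/>'
            "</Relationships>",
        )
        zf.writestr("xl/workbook.xml", "<workbook/>")
    return buf.getvalue()


def list_parts(data):
    """Return the names of the files (parts) stored in the container."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def root_relationships(data):
    """Return (Id, Type, Target) for each relationship of the package root."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
    ]
```

Calling `list_parts(build_demo_package())` lists the two stored files, and `root_relationships` shows how a consumer finds the start part through a relationship instead of a hard-coded file name.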
    8. Exploring the Document Package
    9. Reference Schemas
         • XML Reference Schemas
             • 80+ that make up the standard
                 • Display oriented
                 • Document format
         • Custom Schemas
             • Specific to your business
                 • Data oriented
                 • Business information
    10. Custom XML Content
         • Enables interoperability with other systems
             • Documents can provide a rich view of back-end data sources
             • Documents can update back-end data sources
         • Exposes business data in Open XML documents
             • Heterogeneous systems can easily read data from documents
             • Business-specific semantics can be applied to document data
         • Separates presentation and data
             • Simplified programming model for all of the above
         • Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.
    11. System.IO.Packaging
         • Part of Windows Presentation Foundation
         • Installed with .NET 3.0
         • Requires .NET 2.0 Runtime
         • Enables package manipulation for
             • Office Open XML File Formats
             • XML Paper Specification Files
             • Any Open Packaging Convention files
    12. The Package
         • Package class
         • Provides methods to create, enumerate and delete the following entities:
             • Package
             • Package Properties
             • PackageRelationships
             • PackageParts
        Diagram labels (Common Package Parts): Package Relationships, Core Properties, Digital Signatures
        Diagram labels (Specific Format Parts): Office Document, Part Relationships, XML Part, XML Part, Part Rels, Etc …
    13. The PackagePart
         • A PackagePart is the object of data within the Package
         • It provides support to create, enumerate and delete part relationships
         • Get data as a System.IO.Stream
         • PackagePart properties:
             • CompressionOption
             • ContentType
             • Package
             • Uri
    14. PackageRelationship
         • Required to find parts (part names are not guaranteed)
         • Iterate through a PackageRelationshipCollection by type or ID
         • Relationship properties:
             • Id
             • Package
             • RelationshipType
             • SourceUri
             • TargetMode
             • TargetUri
    15. Package Uri Helper
         • Find a related PackagePart by searching relationships, either by relationship type or relationship ID
             • This returns a list of PackageRelationship objects
         • A PackageRelationship defines two relative URIs
             • Source URI, pointing to the source PackagePart
             • Target URI, pointing to the target PackagePart
         • Retrieve a PackagePart by using a URI relative to the root of the Package
             • Translation of Source and Target URIs is required
             • Use the PackUriHelper class to aid in the translation
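
To make the URI translation step concrete, here is a small Python sketch of the resolution rule, assuming targets behave like relative URLs resolved against the source part's folder. It is an illustration of the idea only; in .NET the equivalent work is done by PackUriHelper (for example its `ResolvePartUri` and `GetRelationshipPartUri` methods). The function names below are hypothetical.

```python
import posixpath


def resolve_part_uri(source_part_uri, target_uri):
    """Turn a relationship target into a URI relative to the package root.

    source_part_uri is the part that owns the relationship ("/" for the
    package itself); target_uri is the Target value from the .rels file.
    """
    if target_uri.startswith("/"):
        # Already rooted at the package: just normalize it.
        return posixpath.normpath(target_uri)
    base_dir = posixpath.dirname(source_part_uri)
    return posixpath.normpath(posixpath.join(base_dir, target_uri))


def relationship_part_uri(source_part_uri):
    """Name of the .rels part holding the relationships of a source part."""
    folder, name = posixpath.split(source_part_uri)
    return posixpath.join(folder, "_rels", name + ".rels")
```

For example, a target of `../tables/table1.xml` owned by `/xl/worksheets/sheet1.xml` resolves to `/xl/tables/table1.xml`, which is the name used to look the part up in the package.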
    16. System.IO.Packaging
    17. SpreadsheetML
         • Workbooks, Worksheets
         • Rows, Columns, Values
         • Formulas
        Diagram labels (Workbook): properties, table, chart, styles, calcChain, sharedStrings, sheet1..N, drawing
    18. The Minimal xlsx
         • Required: workbook.xml, the document “start part”
         • Required: at least one sheet, worksheet.xml
         • Required: one relationship part (.rels)
             • Must be in a _rels folder
         • Required: [Content_Types].xml
             • Required part for all Open XML documents
             • Three content types must be defined:
                 • SpreadsheetML main document (for the start part)
                 • Worksheet
                 • Package relationships (for the required relationships)
         • Everything else is optional
             • Worksheet <sheetData> is required, but may be empty
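
The list of required parts can be turned into a short Python sketch that writes them into a ZIP archive. This is an illustration using the standard `zipfile` module, not System.IO.Packaging. The part names (`xl/workbook.xml`, `xl/worksheets/sheet1.xml`) and the sheet name are my own choices, and the sketch assumes a second relationship part (`xl/_rels/workbook.xml.rels`) so that the workbook's `r:id` reference to its sheet can be resolved.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SML_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"

PARTS = {
    # The three content types named on the slide.
    "[Content_Types].xml": (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        f'<Override PartName="/xl/workbook.xml" ContentType="{SML_CT}.sheet.main+xml"/>'
        f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{SML_CT}.worksheet+xml"/>'
        "</Types>"
    ),
    # Root relationship pointing at the start part.
    "_rels/.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{DOC_REL_NS}/officeDocument" Target="xl/workbook.xml"/>'
        "</Relationships>"
    ),
    "xl/workbook.xml": (
        f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        "</workbook>"
    ),
    # Assumption: the workbook's own relationships, resolving r:id="rId1".
    "xl/_rels/workbook.xml.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>'
        "</Relationships>"
    ),
    # sheetData is present but empty.
    "xl/worksheets/sheet1.xml": f'<worksheet xmlns="{MAIN_NS}"><sheetData/></worksheet>',
}


def build_minimal_xlsx():
    """Return the bytes of a ZIP archive holding the parts above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in PARTS.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def part_names(data):
    """List the parts and check that each one is well-formed XML."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            ET.fromstring(zf.read(name))  # raises ParseError if malformed
        return zf.namelist()
```

Writing the result of `build_minimal_xlsx()` to a file with an `.xlsx` extension should give a workbook with one empty sheet; the sketch only checks well-formedness, not schema validity.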
    19. SpreadsheetML Tables
         • SpreadsheetML tables provide structure and formatting for worksheet information
         • Separation of presentation and data:
             • Data stays in the worksheet
             • Table definition in separate part (implicit relationship)
         • Open XML has different types of tables for each document type, optimized for different scenarios:
             • WordprocessingML has its tbl element
             • SpreadsheetML has its table element
             • PresentationML uses DrawingML tables (tbl inside graphicData)
    20. SpreadsheetML Table
        Headings = shared strings
        Worksheet (sheet1.xml):
          <sheetData>
            <row r="1" spans="1:2">
              <c r="A1" t="s"><v>0</v></c>
              <c r="B1" t="s"><v>1</v></c>
            </row>
            <row r="2" spans="1:2">
              <c r="A2"><v>1</v></c>
              <c r="B2"><v>4</v></c>
            </row>
            <row r="3" spans="1:2">
              <c r="A3"><v>2</v></c>
              <c r="B3"><v>5</v></c>
            </row>
            <row r="4" spans="1:2">
              <c r="A4"><v>3</v></c>
              <c r="B4"><v>6</v></c>
            </row>
          </sheetData>
          ...
          <tableParts count="1">
            <tablePart r:id="rId2"/>
          </tableParts>
        Table definition (table1.xml):
          <table … ref="A1:B4" …>
            <autoFilter ref="A1:B4"/>
            <tableColumns count="2">
              <tableColumn id="1" name="Column1" />
              <tableColumn id="2" name="Column2" />
            </tableColumns>
            <tableStyleInfo …/>
          </table>
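
The "headings = shared strings" point means a cell marked `t="s"` stores an index into the shared strings table rather than the text itself. The Python sketch below shows that lookup on the first two rows of the slide's fragment. The shared strings list is an assumption (the column names from the table definition), and real worksheet parts carry the SpreadsheetML namespace, which this unqualified fragment leaves out.

```python
import xml.etree.ElementTree as ET

# First two rows of the worksheet fragment shown on the slide.
SHEET_DATA = """
<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
</sheetData>
"""

# Assumed contents of the shared strings part, in index order.
SHARED_STRINGS = ["Column1", "Column2"]


def read_cells(sheet_data_xml, shared_strings):
    """Map cell references to values, resolving shared-string indexes."""
    cells = {}
    for cell in ET.fromstring(sheet_data_xml).iter("c"):
        raw = cell.findtext("v")
        if cell.get("t") == "s":
            # The stored value is an index into the shared strings table.
            cells[cell.get("r")] = shared_strings[int(raw)]
        else:
            # No type attribute: treat the value as a number.
            cells[cell.get("r")] = float(raw)
    return cells
```

Here `read_cells(SHEET_DATA, SHARED_STRINGS)` maps `A1` and `B1` to the heading text and `A2` and `B2` to numbers.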
    21. ExcelPackage
         • Open source API on CodePlex
         • Wraps System.IO.Packaging and SpreadsheetML
    22. WordprocessingML Document
         • A WordprocessingML file is a collection of multiple “stories”:
             • The main story
             • Header(s) / Footer(s)
             • Footnote(s) / Endnote(s)
             • Subdocuments
             • Comment(s)
        Diagram labels (Document): body, properties, fontTable, headers/footers, images, numberingDefinitions, styles, customXML, footnotes/endnotes, comments
    23. Main Document Part
         • The top-level element in the start part (e.g., document.xml) is document
         • Document has two optional child elements:
             • The background element, which specifies the settings for the background for the document
             • The body element, which contains the content of the main story
    24. Block-Level Elements
         • The body element contains the main document story, made up of block-level elements:
             • Paragraphs
             • Tables
             • Custom XML markup
             • Alternate format chunks
             • Subdocuments
             • Final section properties
             • Future extensibility containers
         • Nested elements: a table may contain a table which contains a paragraph, etc.
    25. Inline Structures
         • The <w:p> paragraph element contains inline structures:
             • Runs (containing <w:t> text regions)
             • Custom markup (can occur at block or inline level)
             • Annotations (comments, tracked changes, bookmarks)
             • DrawingML elements
             • Fields (date, page number, document creator, etc.)
             • Hyperlinks
    26. Paragraphs <w:p>
         • The most basic unit of a WordprocessingML document
         • Contains three pieces of information:
             • Paragraph properties
             • Inline content
             • Optional revision IDs used for document merge and compare
         • A paragraph may occur at any location which allows block-level content:
             • At the top-most level within a story (e.g. header, footer, main document)
             • Nested within a table cell
             • Nested within a structured document tag or annotation markers
    27. Paragraph Properties
         • Can be set directly on a paragraph (below) or in a paragraph style
         • 24 total property settings
          <w:p>
            <w:pPr>
              <w:widowControl w:val="on" />
              <w:keepNext/>
              <w:keepLines/>
              <w:pageBreakBefore/>
              <w:suppressLineNumbers />
              <w:suppressAutoHyphens />
              <w:textBoxTightWrap />
            </w:pPr>
            … runs, paragraph content …
          </w:p>
    28. Runs <w:r>
         • A run is a region of text with a common set of properties
         • All text must be contained within runs
         • All runs must be contained within paragraphs
         • A run contains three types of information:
             • Run properties
             • Run content (text, fields, soft line breaks, pictures, etc.)
             • Optional revision IDs for document comparison
    29. Run Properties
         • Define formatting for individual characters
         • Font attributes, size/position, etc.
         • 24 total properties
          <w:r>
            <w:rPr>
              <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial" />
              <w:b />
              <w:i />
              <w:sz w:val="11" />
              <w:dstrike w:val="true" />
            </w:rPr>
            …
          </w:r>
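
The paragraph, run and run-properties nesting described in the last few slides can be assembled with any XML library. Below is a minimal Python sketch using `xml.etree.ElementTree`; the helper names `w` and `make_paragraph` are hypothetical, and the sketch produces only a `<w:p>` element, not a complete document part. It assumes `w:sz` is expressed in half-points, so 22 stands for 11 pt.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual w: prefix


def w(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def make_paragraph(text, bold=False, italic=False, half_points=None):
    """Build <w:p> holding one run: run properties first, then the text."""
    p = ET.Element(w("p"))
    r = ET.SubElement(p, w("r"))          # all text lives inside a run
    rpr = ET.SubElement(r, w("rPr"))      # run properties
    if bold:
        ET.SubElement(rpr, w("b"))
    if italic:
        ET.SubElement(rpr, w("i"))
    if half_points is not None:
        ET.SubElement(rpr, w("sz"), {w("val"): str(half_points)})
    t = ET.SubElement(r, w("t"))          # run content
    t.text = text
    return p
```

A call such as `ET.tostring(make_paragraph("Hello", bold=True, half_points=22), encoding="unicode")` yields a paragraph whose single run is bold at 11 pt.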
    30. PresentationML
        Diagram labels (Presentation): View Properties, Presentation Properties, Code, Themes, Fonts, Notes Masters, Slides, Handout Masters, Slide Masters, Notes Slides, Slide Layouts
    31. The Minimal pptx
         • Presentation Element
             • Presentation.xml
                 • Slide Masters
                 • Notes Masters
                 • Handout Masters
                 • Slides
         • Relationships Part
             • Links to slide parts
    32. Slide Parts
          <p:sld xmlns:p="…/presentationml/2006/main" xmlns:a="…/drawingml/2006/main" …>
            <p:cSld>
              <p:spTree>
                <p:sp>
                  <p:nvSpPr>
                    <p:cNvPr id="2" name="7-Point Star 1" />
                    …
                <p:sp>
                  <p:nvSpPr>
                    <p:cNvPr id="3" name="TextBox 2" />
                    …
                <p:graphicFrame>
                  <p:nvGraphicFramePr>
                    <p:cNvPr id="4" name="Chart 3" />
                    …
              </p:spTree>
            </p:cSld>
            <p:clrMapOvr>
              <a:masterClrMapping />
            </p:clrMapOvr>
          </p:sld>
        Callouts: Shape (7-Point Star 1), Textbox (TextBox 2), Chart (Chart 3)
    33. Object Parts – DrawingML
        Diagram labels: Chart Part (chart1.xml), Data source, Shape, Chart, Textbox
    34. DrawingML
         • 5 main types of objects
             • Shape
             • Group Shape
             • Connector
             • Picture
             • Graphic Frame
                 • General-purpose container
                 • Used for Charts, Diagrams, Tables
         • Most widely used elements are Property elements
             • Non-Visible Properties (nvPrs): union of common nvPrs and object-specific nvPrs
             • Visible Properties: object specific
    35. Shapes
         • Preset geometry
             • Pick the preset shape
             • Specify the adjust values for the shape
         • Text geometry
             • Pick the preset text shape
             • Specify the adjust values for the text shape
         • Custom geometry
             • Not covered in this course
    36. Shape Line and Fill Properties
        BLIP (Binary Large Image or Pictures) Fill (r:embed indicates relationship id to image data):
          <a:blipFill>
            <a:blip r:embed="rId2" />
            <a:stretch>
              <a:fillRect />
            </a:stretch>
          </a:blipFill>
        Dash Line and Solid Fill (callouts: Line, Fill, Dashed Line):
          <a:ln>
            <a:solidFill>
              <a:srgbClr val="4F81BD" />
            </a:solidFill>
            <a:prstDash val="sysDash" />
          </a:ln>
        Gradient Fill (callout: gradient stop and color):
          <a:gradFill flip="none" rotWithShape="1">
            <a:gsLst>
              <a:gs pos="0">
                <a:srgbClr val="DDEBCF" />
              </a:gs>
              <a:gs pos="50000">
                <a:srgbClr val="9CB86E" />
              </a:gs>
              ...
            </a:gsLst>
            <a:lin ang="4200000" scaled="0" />
            <a:tileRect />
          </a:gradFill>
    37. Pictures
         • Define a Picture: <p:pic/>
         • Source image rel. id: <a:blip r:embed="rId2"/>
         • Acts similar to a shape: <p:spPr/>
         • Non-Visual picture properties convey picture-specific save properties: <p:nvPicPr/>
         • Similar for Audio & Video
          <p:pic>
            <p:nvPicPr>
              <p:cNvPr id="4" name="lake.jpeg" />
              <p:cNvPicPr>
                <a:picLocks noChangeAspect="1" />
              </p:cNvPicPr>
              <p:nvPr />
            </p:nvPicPr>
            <p:blipFill>
              <a:blip r:embed="rId2" />
              <a:stretch>
                <a:fillRect />
              </a:stretch>
            </p:blipFill>
            <p:spPr>
              <a:xfrm>
                <a:off x="762000" y="571500" />
                <a:ext cx="7620000" cy="5715000" />
              </a:xfrm>
              <a:prstGeom prst="rect">
                <a:avLst />
              </a:prstGeom>
            </p:spPr>
          </p:pic>
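
The large numbers in `<a:off>` and `<a:ext>` are easier to read once converted. The Python sketch below assumes they are English Metric Units (EMUs), with 914,400 EMUs per inch and 12,700 per point; the function names are hypothetical.

```python
EMU_PER_INCH = 914400
EMU_PER_POINT = 12700


def emu_to_inches(emu):
    """Convert a DrawingML length in EMUs to inches."""
    return emu / EMU_PER_INCH


def emu_to_points(emu):
    """Convert a DrawingML length in EMUs to points (1/72 inch)."""
    return emu / EMU_PER_POINT
```

Under that assumption, the picture's extent of `cy="5715000"` is 6.25 inches tall, and its `x="762000"` offset is 60 points from the left edge.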
    38. Pictures vs. Shapes
        Shapes                       | Pictures
        Single fill allowed          | Two overlaid fills allowed
        Borders grow in/outward      | Borders grow outward
        Must be done by app          | Lock aspect ratio flag
        Can have text attached       | Cannot have text attached
        Can have shape properties    | Can have shape properties
        Shape-specific UI enabled    | Picture-specific UI enabled
    39. Graphic Objects
         • Graphic element represents a single graphical object
         • GraphicData element and uri attribute
             • Specifies the namespace for the embedded content
             • Tells the consumer how to interpret the graphicData
             • Ability to render is application specific
             • Office supports a set of specific URI values
        Graphic Object (the URI means a chart follows):
          <graphic>
            <a:graphicData uri="http://schemas.../drawingml/2006/chart">
              <c:chart xmlns:c="http://schemas.../drawingml/2006/chart"
                       xmlns:r="http://schemas.../officeDocument/2006/relationships"
                       r:id="rd123232" />
            </a:graphicData>
          </graphic>
    40. Charts
         • Graphic Object definition
             • References separate XML chart part
             • Defined in DrawingML namespace
         • Chart XML Part
             • Visual representation of data
             • Includes a cache of data for chart
             • Includes formatting using DrawingML
         • Data Relationship
             • External relationship to file, or
             • Internal relationship to embedded spreadsheet
             • Spreadsheets point to their own data
         • Chart Drawing
             • Contains shapes and pictures drawn on chart
        Diagram labels: XML Chart Part, Graphic Object, Data Source, Chart Drawing
    41. Build a Document in Code
    42. Resources
         • OpenXMLSDK
         • Package Explorer
         • Code Snippets
    44. [email_address]