Office OpenXML: a technical approach for OOo.

    Office OpenXML: a technical approach for OOo. - Presentation Transcript

    1. Office Open XML: a technical approach for OOo OOoCon 2007, Barcelona, September 21st, 2007
        • Hubert Figuiere
        • Software Engineer, OpenOffice.org
        • Novell - hfiguiere@novell.com
    2. Getting Started
    3. What is Office Open XML?
      • An office application file format
      • XML based
      • Created by Microsoft...
      • ...for Microsoft Office 2007
      • ECMA standard 376
      • Proposed to ISO
    4. What is Office Open XML not?
      • Office Open XML is not OpenDocument (ISO 26300)
      • ... nor the previous XML formats for Microsoft Office introduced in the last few MS-Office releases
      • ... nor an ISO standard, though it has been proposed
    5. Why support Open XML?
      • Support = importing from (and/or exporting to)
      • For interoperability reasons with Microsoft Office 2007
    6. Overview of the format
    7. The specification
      • Available to anybody as ECMA standard 376
      • 5 PDF documents
        • Fundamentals
        • Open Packaging Conventions
        • Primer
        • Markup Language Reference
        • Markup Compatibility and Extensibility
      • 173 + 129 + 472 + 5129 + 43 = 5946 pages
    8. The specification (cont.)
      • Some have printed it.
      • Photo of the printed OpenXML spec by Pavel Janik: http://blog.janik.cz/archives/2007/05/19/T20_32_07/
    9. “Packaging Conventions”
      • A zip file: “Open Package”
        • Contains the main content...
        • ... and the embedded content
      • Same container used for other Microsoft formats like XPS
      • Replaces the old OLE structured storage
      • In principle similar to OpenDocument, but not really.
    10. Content
      • DrawingML
        • Diagrams, Charts, etc.
      • WordprocessingML
        • Word document
      • SpreadsheetML
        • Excel document
      • PresentationML
        • PowerPoint presentation
        • Heavily relies on DrawingML
    11. Content (cont.)
      • Relationships
        • Map embedded objects
        • Set the relationships between fragments
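
To make the idea of relationships concrete, here is a minimal sketch of how an importer might keep them in memory: each entry ties an id used inside a part's XML to a target part in the package. The `Relationship` struct and `RelationshipTable` class are hypothetical names for illustration only, not the OOX module's actual classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory form of one relationship entry:
// an id referenced from the content XML, a type, and a target part.
struct Relationship
{
    std::string id;      // e.g. "rId2", as referenced from the content
    std::string type;    // names the kind of target (image, chart, ...)
    std::string target;  // path of the target part inside the package
};

// Hypothetical lookup table holding the relationships of one source part.
class RelationshipTable
{
public:
    void add(const Relationship& rel)
    {
        assert(!rel.id.empty()); // an entry without an id cannot be referenced
        maRelations[rel.id] = rel;
    }

    // Returns the entry for the given id, or nullptr if the id is unknown.
    const Relationship* find(const std::string& id) const
    {
        auto it = maRelations.find(id);
        return it == maRelations.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Relationship> maRelations;
};
```

With such a table, resolving an embedded object referenced from a slide becomes a single lookup by id, followed by opening the target part from the package.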
    12. Content (cont.)
      • VML
        • Legacy format from Office 2000
      • Embedded objects
        • Sound files
        • Images
        • Can be anything!
          • I have seen a PowerPoint document with an OpenDocument chart in an OLE container that was referenced from a slide
    13. OpenOffice implementation
    14. Plans
      • Implement a native filter for Office Open XML
      • Import (in progress)
      • Export (Novell is committed to doing it)
      • Split into 2 modules
      • Target is tentatively 2.4
      • Novell's “ooo-build” 2.3 has it:
        • Ships with openSUSE 10.3
        • Will ship with other Linux distros
      • Joint effort between Sun and Novell
      • “[...] a team of 5 developers will implement 25 handlers a week, which means that we'd have all the XML handlers written in 44 weeks. [...] Nevertheless, we’ve taken a little less than a year to get the converters reading the new file format. [...] This is just for Word.”
        • -- Rick Schaut, Mac Office team, about implementing the Office 2007 importer for Word for Mac, December 2006.
        • http://blogs.msdn.com/rick_schaut/archive/2006/12/07/open-xml-converters-for-mac-office.aspx
      • Microsoft released the beta version of the Word 2007 to RTF converter for MacOS in May 2007...
      • ...and PowerPoint support was released July 31st, 2007
    15. Modules
      • Writerfilter
        • Word import
        • Refactoring of the RTF and binary doc filters
        • See Fridrich Strba's presentation for all the details
      • OOX
        • Excel and PowerPoint, but not Word
        • CWS xmlfilter02
        • Implements VML as well
        • Called by the writerfilter if needed
    16. No XSLT
      • OOX is not an XSLT based filter.
      • Processes the XML and feeds it into the OpenOffice.org internal model
      • Written in C++
    17. The fast SAX parser
      • 5568 tokens are listed in our code
      • String comparisons for tokens are slow
      • The fast SAX parser is designed to
        • reduce the number of string comparisons by using a 32-bit hash for string tokens (including the XML namespace)
        • offer that API through UNO
      • It lives in the sax module
      • Of course it is generic and could be used anywhere
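
The benefit of integer tokens can be illustrated with a small sketch: each name is looked up once, and all later dispatching compares integers instead of strings. This is only an illustration under assumptions; `getToken`, `XML_TOKEN_INVALID` and the numeric values are hypothetical, and a `std::unordered_map` stands in for the gperf-generated perfect hash used by the real parser.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical token values; the real list has thousands of entries.
constexpr std::int32_t XML_TOKEN_INVALID = -1;
constexpr std::int32_t XML_lnSpc  = 1;
constexpr std::int32_t XML_spcBef = 2;
constexpr std::int32_t XML_spcAft = 3;

// Stand-in for the generated perfect hash: maps an element or attribute
// name to its integer token. The string is examined once, here; handlers
// downstream only ever compare integers.
inline std::int32_t getToken(const std::string& name)
{
    static const std::unordered_map<std::string, std::int32_t> aTokens = {
        { "lnSpc",  XML_lnSpc  },
        { "spcBef", XML_spcBef },
        { "spcAft", XML_spcAft },
    };
    auto it = aTokens.find(name);
    return it == aTokens.end() ? XML_TOKEN_INVALID : it->second;
}
```

A handler would call something like `getToken` once per element name and pass the resulting integer on, so unknown names can be rejected early with a single comparison against the invalid value.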
    18. Fast parser details
      • Hash tokens are generated by gperf at compile time
        • From a compile time generated list (OOX)
      • Each known string token is referenced by a constant such as XML_token
      • XML namespace in the high-order bits of the token
        • Allows selecting the namespace with a simple bit-mask
    19. Example
        switch( aToken )
        {
        case NMSP_DRAWINGML|XML_lnSpc:   // line spacing
            break;
        case NMSP_DRAWINGML|XML_spcBef:  // space before
            break;
        case NMSP_DRAWINGML|XML_spcAft:  // space after
            break;
        default:                         // token not handled here
            break;
        }
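
For readers who want to try the namespace-in-high-bits scheme, here is a self-contained sketch. The bit layout (namespace id in the upper 16 bits, local token in the lower 16), the mask names, the numeric values and the helper functions are all assumptions made for illustration; the actual constants in the OOX module may differ.

```cpp
#include <cstdint>
#include <string>

// Assumed layout: namespace id in the upper 16 bits, local token in the lower 16.
constexpr std::int32_t NMSP_SHIFT = 16;
constexpr std::int32_t NMSP_MASK  = 0x7FFF0000;
constexpr std::int32_t TOKEN_MASK = 0x0000FFFF;

// Hypothetical namespace ids, already shifted into place.
constexpr std::int32_t NMSP_DRAWINGML = 1 << NMSP_SHIFT;
constexpr std::int32_t NMSP_PPT       = 2 << NMSP_SHIFT;

// Hypothetical local token values.
constexpr std::int32_t XML_lnSpc  = 1;
constexpr std::int32_t XML_spcBef = 2;
constexpr std::int32_t XML_spcAft = 3;

// Select the namespace part of a combined token with a simple bit-mask.
inline std::int32_t getNamespace(std::int32_t nToken) { return nToken & NMSP_MASK; }

// Select the local (namespace-free) part of a combined token.
inline std::int32_t getLocalToken(std::int32_t nToken) { return nToken & TOKEN_MASK; }

// Dispatch on namespace and element at once with a single integer switch.
inline std::string describe(std::int32_t aToken)
{
    switch (aToken)
    {
    case NMSP_DRAWINGML | XML_lnSpc:
        return "line spacing";
    case NMSP_DRAWINGML | XML_spcBef:
        return "space before";
    case NMSP_DRAWINGML | XML_spcAft:
        return "space after";
    default:
        return "unhandled";
    }
}
```

Because the namespace is part of the integer, an element with the same local name in a different namespace (for example `NMSP_PPT | XML_lnSpc` in this sketch) falls through to the default branch without any string comparison.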
    20. API
      • The OOX module only depends on the UNO API
        • Can't always get inspiration from the binary filters, which mostly use the internal APIs
        • Some UNO APIs are incomplete or missing
          • They need to be implemented
    21. The data model
      • The Office Open XML data model is very close to that of the binary format
        • “ [...] XLSX may be ugly, but its concepts were very familiar from XLS. We already had much of the code required to handle it.”
        • -- Jody Goldberg about Gnumeric Excel 2007 support,
        • http://blogs.gnome.org/jody/2007/09/10/odf-vs-oox-asking-the-wrong-questions/
    22. Excel vs Calc
      • Excel 2007 has features that Calc does not have
      • Dealing with missing features in Calc:
        • Find a workaround
        • “Downgrade” the data
          • Problem with round-trip conversions
        • Implement the missing feature
    23. Excel 2007 vs Excel 2003
      • No notable new features in the core
      • Overall structures are very similar
        • Shared string table that contains the cell strings
        • Sheet protection data contains an identical set of options.
        • Autofilter uses internal cell range names (not visible to the user) that are identical both in xlsx and xls.
    24. Excel 2007 vs Excel 2003 (cont.)
      • Overall structures are very similar (cont.)
        • In both xls and xlsx formats, the pivot table record contains cached source data.
        • Excel allows rich text and field objects in the header and footer, and they are encoded. The same encoding scheme is used in both xls and xlsx.
    25. PowerPoint vs Impress
      • Pixel perfect rendering
        • People spend hours in airports refining their “PowerPoint”...
        • ...so the import has to be perfect
      • SmartArt
        • This is a big feature in PowerPoint 2007
      • Animation / transition
        • Both based on SMIL
    26. PowerPoint 2007 vs PowerPoint 2003
      • Not many changes
      • SmartArt
        • Saving in PowerPoint 2007 as binary PPT makes it an embedded OLE
        • Of course this requires having the engine
    27. DrawingML
      • A shared ML
      • Used directly by PresentationML
      • Encountered in WordprocessingML and SpreadsheetML documents.
      • Defines styles, shapes, text, charts, diagrams, audio/video, etc
      • Supposed to be more functional than VML, and therefore to replace it.
    28. VML
      • Legacy Microsoft XML format
      • Still generated by the 2007 versions of MS applications
      • Replaces the binary EMF for OLE
      • Used by annotations in Excel
      • And by a lot of drawing features in Word
      • Supposed to be superseded by DrawingML
    29. Alternative Implementations
    30. odf-converter (Free Software)
      • Microsoft sponsored ODF to Office OpenXML converter
      • XSLT based
      • Written in C# / .Net
        • Also runs with Mono (Free Software platform)
      • Free Software (MIT style license)
      • Currently shipped by Novell for SUSE and Windows
    31. GNOME (Free Software)
      • libgsf
        • Implements OpenPackage reading and writing
      • Gnumeric
        • Import .xlsx files
        • Export .xlsx files (somewhat)
      • AbiWord
        • Import .docx
      • Both run on non-GNOME platforms like Windows
        • “The initial importer was written on the flight to London for the ECMA meeting, and export was added on the flight back. Toss in a few hours of debugging and the sample file [...] was under a week of effort to read and write.”
        • -- Jody Goldberg about Gnumeric Excel 2007 support,
        • http://blogs.gnome.org/jody/2007/09/10/odf-vs-oox-asking-the-wrong-questions/
    32. Apple iWork '08 (non-Free)
      • Pages
        • Import and export .docx
      • Numbers
        • Import and export .xlsx
      • Keynote
        • Import and export .pptx
    33. Questions?
    34.  
      • Unpublished Work of Novell, Inc. All Rights Reserved.
      • This work is an unpublished work and contains confidential, proprietary, and trade secret information of Novell, Inc. Access to this work is restricted to Novell employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of Novell, Inc. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
      • General Disclaimer
      • This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.

    Alexandro Colorado, 3 years ago

    CC Attribution License
