Approaches to document/report generation

Presents approaches for programmatically creating Office files. Targeted at developers.

Presented at http://osdc.com.au/talks/generating-documents-tools-and-techniques

  • People sometimes also try this using RTF or Word HTML as their document format. A plus is that it can also be used for pptx and xlsx.
  • The same approach can be used for OpenOffice documents.
  • Approaches to document/report generation

    1. Document Generation Do’s and Don’ts. Jason Harrop, Plutext Pty Ltd
    2. Where I’m coming from…
        • docx4j is an ASLv2 library for (Microsoft) Open XML office documents (docx, pptx, xlsx)
        • My company Plutext sponsors that project
        • docx4j started in 2007
        www.docx4java.org
    3. Since its introduction in 2007, docx4j has become quite popular.
    4. Comparables

        tool                     Open XML SDK     docx4j              POI             Aspose
        vendor                   Microsoft        Plutext             Apache          Aspose
        language                 .NET (C# etc)    Java                Java            Java
        cost                     free             free                free            expensive
        open source              no               yes (ASL v2)        yes (ASL v2)    no
        marshalling framework    .NET             JAXB (even MOXy)    XML Beans       JAXB
    6. Choose your hub format; import/export from/to others
        [Diagram labels: PDF, XHTML, docx, ?]
        • If you need to replicate the appearance of existing Office documents, using the Microsoft formats as your “hub” will avoid lots of pain
        • If you can, work with the OpenXML formats, not the legacy binary ones, or Word 2003 XML, or Word HTML
        • LibreOffice/OpenOffice is a useful tool for conversion, driven by JODConverter
    7. Open XML
        • standardised via ECMA 376 and ISO/IEC 29500
        • includes XSD
            – can generate strongly typed classes
        [Diagram: Open, Unzip, Alter XML; or Open, Unzip, Unmarshal, Manipulate objects]
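The "open, unzip, alter XML" route can be sketched with nothing but a standard library. This is a minimal illustration in Python, not docx4j's API: `build_minimal_docx` and `read_paragraph_texts` are hypothetical helper names, and the package built here holds only `word/document.xml`, so it is far too bare for Word to open (a real docx also needs `[Content_Types].xml` and relationship parts).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_docx(text):
    """Return the bytes of a stripped-down docx-like zip (illustration only)."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def read_paragraph_texts(docx_bytes):
    """Open the package, unzip the main part, parse it, collect paragraph text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        for p in root.iter(f"{{{W}}}p")
    ]


paragraphs = read_paragraph_texts(build_minimal_docx("Hello Open XML"))
```

The other route on the slide (unmarshal to strongly typed objects generated from the XSD) replaces the generic XML tree here with classes per element.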
    8. Authoring time vs generation time: what skills do authors need?
        [Diagram labels: docx, data, PDF, HTML]
    9. Approach 1: Variable replacement
        • This approach can also be used for pptx, xlsx
    10. What could be simpler?
    11. Ummm… not so fast. Word can split a placeholder across several runs, because of:
        1. spelling/grammar proofing
        2. rsid
        3. run formatting
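Why the split matters can be shown with a small sketch. It assumes a `${name}` placeholder convention and a paragraph that Word has stored as three runs with proofing markers in between; `naive_replace` and `merging_replace` are hypothetical names, and this is plain Python over the XML, not how docx4j does it.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = f"{{{W}}}t"

# "${name}" as typed by an author, but stored as three separate runs.
PARAGRAPH = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Dear ${</w:t></w:r>"
    '<w:proofErr w:type="spellStart"/>'
    "<w:r><w:t>name</w:t></w:r>"
    '<w:proofErr w:type="spellEnd"/>'
    "<w:r><w:t>},</w:t></w:r>"
    "</w:p>"
)


def naive_replace(p, values):
    """Replace run by run; misses any placeholder that spans runs."""
    for t in p.iter(T):
        for key, value in values.items():
            t.text = (t.text or "").replace("${" + key + "}", value)


def merging_replace(p, values):
    """Join the text of all runs, replace, and keep the result in the first run."""
    texts = list(p.iter(T))
    joined = "".join(t.text or "" for t in texts)
    for key, value in values.items():
        joined = joined.replace("${" + key + "}", value)
    if texts:
        texts[0].text = joined
        for t in texts[1:]:
            t.text = ""  # later runs lose their text, and so their formatting


def text_of(p):
    return "".join(t.text or "" for t in p.iter(T))


naive = ET.fromstring(PARAGRAPH)
naive_replace(naive, {"name": "Alice"})

merged = ET.fromstring(PARAGRAPH)
merging_replace(merged, {"name": "Alice"})
```

The merging version finds the placeholder but flattens run-level formatting in the paragraph, which is one reason to prefer a template whose placeholders are kept intact at authoring time.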
    12. Look for a solution which maintains integrity
        • Typically a Word Add-In or macro which ensures integrity
        • This suggestion applies to approaches #2 and #3 as well
    13. Additional requirement: repeating data (list items, table rows)
        • can be done using some convention, for example:
              [#list developers as developer]
              ${developer.name}
              [/#list]
        • many systems invent their own (eg HotDocs)
        • but the FreeMarker or Velocity template language can be used to do this:
            – http://freemarker.sourceforge.net/
            – http://velocity.apache.org/
        • for example:
            – XDocReport (FreeMarker or Velocity; open source)
        • (this templating approach can also be used with OpenOffice documents)
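What a repeat has to do to a table row can be sketched as follows. This is not FreeMarker, Velocity or XDocReport; it is a toy in Python where the template row stands in for the body of a `[#list]` block, and `expand_rows` is a hypothetical helper that clones the row once per value.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = f"{{{W}}}t"

TABLE = (
    f'<w:tbl xmlns:w="{W}">'
    "<w:tr><w:tc><w:p><w:r><w:t>${developer.name}</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl>"
)


def expand_rows(tbl, placeholder, values):
    """Clone the first child holding the placeholder once per value, then drop it."""
    for index, row in enumerate(list(tbl)):
        if any(placeholder in (t.text or "") for t in row.iter(T)):
            for offset, value in enumerate(values):
                clone = copy.deepcopy(row)
                for t in clone.iter(T):
                    t.text = (t.text or "").replace(placeholder, value)
                # each insert pushes the template row one position further down
                tbl.insert(index + offset, clone)
            tbl.remove(row)
            return


tbl = ET.fromstring(TABLE)
expand_rows(tbl, "${developer.name}", ["Ada", "Grace", "Linus"])
names = [t.text for t in tbl.iter(T)]
```

A template engine gets the same effect by treating the document XML as text and emitting the row markup once per loop iteration.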
    14. Additional requirement: conditional content
        • for example, XDocReport uses
            – [#if (FreeMarker)
            – #if( (Velocity)
    15. Additional requirement: images
        • Now it is starting to get a bit trickier, because inserting an image requires:
            – adding an image part to the docx package
            – making a note of its rel id
            – replacing the placeholder with the image XML, including the rel id
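The first two steps (add the image part, note its rel id) can be sketched like this. Assumptions are stated loudly: the package is modelled as a plain dict of part name to bytes, `add_image_part` is a hypothetical helper, the image bytes are a stand-in rather than a real PNG, and the content-type entry plus the third step (the drawing XML that refers to the rel id) are left out.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_RELS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
RELS_NAME = "word/_rels/document.xml.rels"


def add_image_part(parts, image_bytes, part_name="word/media/image1.png"):
    """Store the image in the package, register a relationship, return its id."""
    rels = ET.fromstring(parts[RELS_NAME])
    used = {rel.get("Id") for rel in rels}
    n = 1
    while f"rId{n}" in used:  # pick the first unused id
        n += 1
    rel_id = f"rId{n}"
    ET.SubElement(rels, f"{{{REL_NS}}}Relationship", {
        "Id": rel_id,
        "Type": OFFICE_RELS + "/image",
        "Target": part_name[len("word/"):],  # relative to the main document part
    })
    parts[part_name] = image_bytes
    parts[RELS_NAME] = ET.tostring(rels)
    return rel_id


# Hypothetical in-memory package: part name -> bytes.
parts = {
    RELS_NAME: (
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_RELS}/styles" Target="styles.xml"/>'
        "</Relationships>"
    ).encode("utf-8"),
}
rel_id = add_image_part(parts, b"\x89PNG stand-in bytes")
targets = [rel.get("Target") for rel in ET.fromstring(parts[RELS_NAME])]
```

The returned id is what the image XML that replaces the placeholder has to carry, which is why the three steps on the slide cannot be done independently.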
    16. Approach 2: MERGEFIELD and other fields
        • Fields are a long-standing feature of Word, included in the Open XML specification
        • so lots of documents use this (aka mail merge)
        • Various other useful field types, eg IF
        • A partial solution to the integrity problems of Approach 1
    17. But, two unpleasant XML hybrids (simple and complex)
        Simple field:
            <w:fldSimple w:instr=" MERGEFIELD name ">
              <w:r>
                <w:t>«name»</w:t>
              </w:r>
            </w:fldSimple>
        Complex field:
            <w:r>
              <w:fldChar w:fldCharType="begin"/>
              <w:instrText xml:space="preserve">NAME</w:instrText>
              <w:fldChar w:fldCharType="separate"/>
              <w:r>
                <w:t>«name»</w:t>
              </w:r>
              <w:fldChar w:fldCharType="end"/>
            </w:r>
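Filling the simple form is the easy half, as this sketch suggests. It handles only `w:fldSimple`, ignores quoted field names and switches, and uses a hypothetical helper name `fill_simple_merge_fields`; the complex form would need the begin/separate/end markers to be tracked across runs, which is not attempted here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = f"{{{W}}}t"

PARAGRAPH = (
    f'<w:p xmlns:w="{W}">'
    '<w:fldSimple w:instr=" MERGEFIELD name ">'
    "<w:r><w:t>«name»</w:t></w:r>"
    "</w:fldSimple>"
    "</w:p>"
)


def fill_simple_merge_fields(root, values):
    """Set the result text of each simple MERGEFIELD that has a value."""
    for fld in root.iter(f"{{{W}}}fldSimple"):
        words = fld.get(f"{{{W}}}instr", "").split()
        if len(words) < 2 or words[0].upper() != "MERGEFIELD":
            continue  # some other field type, eg IF
        name = words[1]
        texts = list(fld.iter(T))
        if name in values and texts:
            texts[0].text = values[name]
            for t in texts[1:]:
                t.text = ""


p = ET.fromstring(PARAGRAPH)
fill_simple_merge_fields(p, {"name": "Alice"})
result = "".join(t.text or "" for t in p.iter(T))
```

The field itself stays in the document here; only its displayed result changes.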
    18. Approach 3: Content controls
    19. Much nicer XML, and XPath binding
            <w:sdt>
              <w:sdtPr>
                <w:alias w:val="name"/>
                <w:tag w:val="od:xpath=ribxv"/>
                <w:id w:val="13144269"/>
                <w:dataBinding w:xpath="/oda:answers/oda:answer[@id=name_Wt]" />
              </w:sdtPr>
              <w:sdtContent>
                <w:r>
                  <w:t>«name»</w:t>
                </w:r>
              </w:sdtContent>
            </w:sdt>
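Resolving such a binding at generation time might look like the sketch below. Several things are assumptions: the `oda` namespace URI is a placeholder, the XPath predicate value is quoted (`@id='name_Wt'`), the prefix mappings are passed in by hand, and Python's ElementTree only understands a small XPath subset. `resolve` and `apply_bindings` are hypothetical names, not docx4j calls.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = f"{{{W}}}t"
NS = {"w": W, "oda": "http://example.com/answers"}  # "oda" URI is a placeholder

ANSWERS = (
    '<oda:answers xmlns:oda="http://example.com/answers">'
    '<oda:answer id="name_Wt">Alice</oda:answer>'
    "</oda:answers>"
)

SDT = (
    f'<w:sdt xmlns:w="{W}">'
    "<w:sdtPr>"
    '<w:alias w:val="name"/>'
    "<w:dataBinding w:xpath=\"/oda:answers/oda:answer[@id='name_Wt']\"/>"
    "</w:sdtPr>"
    "<w:sdtContent><w:r><w:t>«name»</w:t></w:r></w:sdtContent>"
    "</w:sdt>"
)


def resolve(xpath, data_root):
    """Evaluate a simple absolute XPath against the data; return text or None."""
    holder = ET.Element("holder")  # synthetic parent, so "/root/..." can be
    holder.append(data_root)       # evaluated as the relative "./root/..."
    node = holder.find("." + xpath, NS)
    return None if node is None else node.text


def apply_bindings(root, data_root):
    """Copy each bound value into the text of its content control."""
    for sdt in root.iter(f"{{{W}}}sdt"):
        binding = sdt.find("w:sdtPr/w:dataBinding", NS)
        if binding is None:
            continue
        value = resolve(binding.get(f"{{{W}}}xpath"), data_root)
        texts = list(sdt.iter(T))
        if value is not None and texts:
            texts[0].text = value
            for t in texts[1:]:
                t.text = ""


sdt = ET.fromstring(SDT)
apply_bindings(sdt, ET.fromstring(ANSWERS))
bound_text = "".join(t.text or "" for t in sdt.iter(T))
```

Because the control keeps its `w:sdtPr`, the binding survives generation and the same document can be refreshed against new data later.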
    20. Content controls are nice
        • Better solution, integrity-wise
        • Can bind via XPath to arbitrary XML
        • handles images
        • since Word 2007
        • can nest, so repeats/conditions work well
            – unlike Approaches 1 & 2
            – table row friendly
        • w:tag supports arbitrary data
        • But unique to Open XML. (Could/should a revised ODF support something similar?)
    21. Repeats/conditions
        • applies to content inside
        • w:dataBinding doesn’t support these
        • so create your own semantics
        • OpenDoPE is one way
        • use w:tag for implementation
        • need an editing tool to insert repeats/conditions
            – for OpenDoPE, there are Word Add-Ins designed for technical and non-technical users
        • at generation time, need code to support them
            – docx4j does this, and other OpenXML libraries could be extended to support
        • can support complex documents (nested repeats etc)
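"Create your own semantics" in `w:tag` can be illustrated with a made-up convention. The tag values `condition=<key>` and `repeat=<key>` and the `«item»` placeholder below are hypothetical and are not OpenDoPE's actual syntax; the sketch is Python over the raw XML, does not process nested repeats, and only shows the kind of code needed at generation time.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
SDT = f"{{{W}}}sdt"
T = f"{{{W}}}t"

BODY = (
    f'<w:body xmlns:w="{W}">'
    '<w:sdt><w:sdtPr><w:tag w:val="condition=show_intro"/></w:sdtPr>'
    "<w:sdtContent><w:p><w:r><w:t>Intro</w:t></w:r></w:p></w:sdtContent></w:sdt>"
    '<w:sdt><w:sdtPr><w:tag w:val="repeat=developers"/></w:sdtPr>'
    "<w:sdtContent><w:p><w:r><w:t>«item»</w:t></w:r></w:p></w:sdtContent></w:sdt>"
    "</w:body>"
)


def tag_of(sdt):
    """Return the w:tag value of a content control, or an empty string."""
    tag = sdt.find("w:sdtPr/w:tag", NS)
    return "" if tag is None else tag.get(f"{{{W}}}val", "")


def process(parent, data):
    """Drop controls whose condition is false; clone repeat controls per item."""
    for child in list(parent):
        if child.tag == SDT:
            kind, _, key = tag_of(child).partition("=")
        else:
            kind, key = "", ""
        if kind == "condition" and not data.get(key):
            parent.remove(child)
        elif kind == "repeat":
            position = list(parent).index(child)
            parent.remove(child)
            for offset, item in enumerate(data.get(key, [])):
                clone = copy.deepcopy(child)
                for t in clone.iter(T):
                    t.text = (t.text or "").replace("«item»", item)
                parent.insert(position + offset, clone)
        else:
            process(child, data)  # look for controls nested further down


body = ET.fromstring(BODY)
process(body, {"show_intro": False, "developers": ["Ada", "Grace"]})
texts = [t.text for t in body.iter(T)]
```

Since the instruction lives on the control that wraps the content, the scope of a repeat or condition is unambiguous, which is what makes nesting workable.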
    22. Choose your poison
        • docx4j supports all three approaches
            – but content controls are strongly recommended
        • other libraries offer more or less support for each approach
    23. Thanks!
