Document Generation
  Do’s and Don’ts
      Jason Harrop
     Plutext Pty Ltd
Where I’m coming from…

• docx4j is an open source (ASLv2) Java library for (Microsoft) Open XML
  office documents (docx, pptx, xlsx)
• My company Plutext sponsors that project
• docx4j started in 2007




                        www.docx4java.org
Since its introduction in 2007, docx4j has become quite popular.




Comparables

tool           Open XML SDK     docx4j              POI            Aspose
vendor         Microsoft        Plutext             Apache         Aspose
language       .NET (C# etc)    Java                Java           Java
cost           free             free                free           expensive
open source    no               yes (ASL v2)        yes (ASL v2)   no
marshalling    .NET             JAXB (even MOXy)    XML Beans      JAXB
framework
Choose your hub format; import/export from/to others


[Diagram: two hub layouts. Left: PDF, XHTML and "?" around a docx hub.
Right: XHTML, docx and "?" around a PDF hub.]



• If you need to replicate the appearance of existing Office documents, using the
  Microsoft formats as your “hub” will avoid lots of pain
• If you can, work with the Open XML formats, not the legacy binary ones, or Word
  2003 XML, or Word HTML
• LibreOffice/OpenOffice, driven by JODConverter, is a useful conversion tool

Open XML

• standardised via ECMA 376 and ISO/IEC 29500
• includes XSD
   – can generate strongly typed classes




[Diagram: Open → Unzip → Unmarshal → Alter XML / Manipulate objects]
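The open, unzip and parse steps can be sketched with JDK classes alone. This is a minimal illustration and not docx4j's API: the class `DocxPeek` and its two helpers are hypothetical names, the sample package holds only `word/document.xml` (so it is not a complete docx), and a DOM tree stands in for the strongly typed classes that would be generated from the XSD.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class DocxPeek {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Builds a tiny stand-in package that contains only word/document.xml. */
    static byte[] samplePackage(String text) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p><w:r><w:t>"
                + text + "</w:t></w:r></w:p></w:body></w:document>";
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.finish(); // flush the zip directory before copying the bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Unzips the package and parses the named part into a namespace-aware DOM tree. */
    static Document readPart(byte[] pkg, String partName) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.getName().equals(partName)) {
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    byte[] part = zip.readAllBytes(); // reads the current entry only
                    return factory.newDocumentBuilder().parse(new ByteArrayInputStream(part));
                }
            }
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Cannot read " + partName, e);
        }
        throw new IllegalArgumentException("No such part: " + partName);
    }

    public static void main(String[] args) {
        Document doc = readPart(samplePackage("Hello"), "word/document.xml");
        System.out.println(doc.getElementsByTagNameNS(W_NS, "t").item(0).getTextContent());
    }
}
```

A library such as docx4j replaces the DOM step with unmarshalling into typed objects, which is what makes the "manipulate objects" path possible.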




Authoring time: what skills do authors need?

Generation time: data in; docx, PDF or HTML out.
Approach 1:- Variable replacement.




 This approach can also be used for pptx, xlsx

What could be simpler?




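Ummm… not so fast.

Word can split a placeholder across several runs, so a plain search for
the placeholder text in the XML can miss it. Three common causes:

 1. spelling/grammar proofing
 2. rsid
 3. run formatting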
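The effect of a split placeholder can be sketched as follows. The paragraph XML is hand-written to resemble what Word might store, and `SplitRunDemo` with its methods is a hypothetical illustration rather than any library's API. The "joined" variant is only one possible workaround: it writes the whole result into the first `w:t`, so any formatting carried by later runs is lost.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SplitRunDemo {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // "Dear ${name}," with proofing marks and an rsid splitting the placeholder into three runs.
    static final String PARAGRAPH = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t>Dear ${</w:t></w:r>"
            + "<w:proofErr w:type=\"spellStart\"/>"
            + "<w:r w:rsidR=\"00A1B2C3\"><w:t>name</w:t></w:r>"
            + "<w:proofErr w:type=\"spellEnd\"/>"
            + "<w:r><w:t>},</w:t></w:r>"
            + "</w:p>";

    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse paragraph", e);
        }
    }

    static String substitute(String text, Map<String, String> values) {
        for (Map.Entry<String, String> e : values.entrySet()) {
            text = text.replace("${" + e.getKey() + "}", e.getValue());
        }
        return text;
    }

    /** Naive: substitutes inside each w:t separately, so a split placeholder is never seen. */
    static String replacePerRun(Element paragraph, Map<String, String> values) {
        NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            texts.item(i).setTextContent(substitute(texts.item(i).getTextContent(), values));
        }
        return paragraph.getTextContent();
    }

    /** Joined: concatenates all w:t text, substitutes once, stores the result in the first w:t. */
    static String replaceJoined(Element paragraph, Map<String, String> values) {
        NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
        StringBuilder all = new StringBuilder();
        for (int i = 0; i < texts.getLength(); i++) {
            all.append(texts.item(i).getTextContent());
        }
        for (int i = 0; i < texts.getLength(); i++) {
            texts.item(i).setTextContent(i == 0 ? substitute(all.toString(), values) : "");
        }
        return paragraph.getTextContent();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("name", "Ada");
        System.out.println(replacePerRun(parse(PARAGRAPH), values)); // placeholder survives
        System.out.println(replaceJoined(parse(PARAGRAPH), values)); // placeholder replaced
    }
}
```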



Look for a solution which maintains integrity

• Typically a Word Add-In or macro which ensures integrity
• This suggestion applies to approaches #2 and #3 as well




Additional requirement: repeating data (list items, table rows)

• can be done using some convention, for example:
   [#list developers as developer]
    ${developer.name}
   [/#list]
• many systems invent their own (eg HotDocs)
• but the FreeMarker or Velocity template language can be used to
  do this:
    – http://freemarker.sourceforge.net/
    – http://velocity.apache.org/
• for example:
    – XDocReport (FreeMarker or Velocity; open source)
• (this templating approach can also be used with OpenOffice
  documents)
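To show what such a repeat convention involves, here is a deliberately tiny expander for the `[#list … as …]` form shown above. It is a hand-rolled sketch on plain strings, not FreeMarker and not XDocReport: the class `ListExpander` is hypothetical, it handles only one level of `${item.property}` lookups, and it ignores nesting.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListExpander {
    // Matches [#list <collection> as <item>] body [/#list]; DOTALL lets the body span lines.
    private static final Pattern LIST =
            Pattern.compile("\\[#list (\\w+) as (\\w+)\\](.*?)\\[/#list\\]", Pattern.DOTALL);

    /** Repeats each list body once per item in the named collection. */
    static String expand(String template, Map<String, List<Map<String, String>>> model) {
        Matcher m = LIST.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(template, last, m.start()); // text before the repeat
            List<Map<String, String>> items = model.getOrDefault(m.group(1), List.of());
            for (Map<String, String> item : items) {
                String body = m.group(3);
                for (Map.Entry<String, String> e : item.entrySet()) {
                    body = body.replace("${" + m.group(2) + "." + e.getKey() + "}", e.getValue());
                }
                out.append(body);
            }
            last = m.end();
        }
        return out.append(template.substring(last)).toString();
    }

    public static void main(String[] args) {
        String template = "Team:[#list developers as developer] ${developer.name};[/#list]";
        System.out.println(expand(template,
                Map.of("developers", List.of(Map.of("name", "Ann"), Map.of("name", "Bob")))));
    }
}
```

In a real docx the directives sit inside paragraphs or table rows, so the repeated unit has to be the enclosing XML element rather than a substring. That is part of why a tested template engine is preferable to an invented convention.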

Additional requirement: conditional content

• for example, XDocReport uses
   – [#if (FreeMarker)
   – #if( (Velocity)




Additional requirement: images

• Now it is starting to get a bit trickier, because inserting an
  image requires:
   – adding an image part to the docx package
   – making a note of its rel id
   – replacing the placeholder with the image XML, including the rel id




Approach 2:- MERGEFIELD and other fields

• Fields are a long-standing feature of Word, included in the
  Open XML specification
• so lots of documents use them (aka mail merge)
• Various other useful field types, eg IF
• A partial solution to the integrity problems of Approach 1




But, two unpleasant XML hybrids (simple and complex)

Simple:

<w:fldSimple w:instr=" MERGEFIELD name ">
  <w:r>
    <w:t>«name»</w:t>
  </w:r>
</w:fldSimple>

Complex:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> MERGEFIELD name </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
  <w:t>«name»</w:t>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>
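Code that processes fields has to recognise both forms. The sketch below collects field instructions from a paragraph using the JDK's DOM API; `FieldScanner` is a hypothetical name, the sample XML is hand-written, and nested fields are deliberately not handled. For the complex form, the instruction may be spread over several `w:instrText` elements between the "begin" and "separate" markers, so they are concatenated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FieldScanner {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One simple field, then one complex field whose instruction is split over two runs.
    static final String PARAGRAPH = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:fldSimple w:instr=\" MERGEFIELD name \"><w:r><w:t>\u00ABname\u00BB</w:t></w:r></w:fldSimple>"
            + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
            + "<w:r><w:instrText xml:space=\"preserve\"> MERGEFIELD </w:instrText></w:r>"
            + "<w:r><w:instrText xml:space=\"preserve\">city </w:instrText></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
            + "<w:r><w:t>\u00ABcity\u00BB</w:t></w:r>"
            + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
            + "</w:p>";

    /** Returns the trimmed instruction of every simple and (non-nested) complex field. */
    static List<String> instructions(String paragraphXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse paragraph", e);
        }
        List<String> found = new ArrayList<>();
        StringBuilder current = null; // non-null while between "begin" and "separate"/"end"
        NodeList all = root.getElementsByTagNameNS(W_NS, "*"); // document order
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            String name = el.getLocalName();
            if (name.equals("fldSimple")) {
                found.add(el.getAttributeNS(W_NS, "instr").trim());
            } else if (name.equals("fldChar")) {
                String type = el.getAttributeNS(W_NS, "fldCharType");
                if (type.equals("begin")) {
                    current = new StringBuilder();
                } else if (current != null) { // "separate", or "end" with no result part
                    found.add(current.toString().trim());
                    current = null;
                }
            } else if (name.equals("instrText") && current != null) {
                current.append(el.getTextContent());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(instructions(PARAGRAPH));
    }
}
```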

Approach 3:- Content controls




Much nicer XML, and XPath binding

<w:sdt>
  <w:sdtPr>
    <w:alias w:val="name"/>
    <w:tag w:val="od:xpath=ribxv"/>
    <w:id w:val="13144269"/>
    <w:dataBinding w:xpath="/oda:answers/oda:answer[@id='name_Wt']"/>
  </w:sdtPr>
  <w:sdtContent>
    <w:r>
      <w:t>«name»</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
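What the binding resolves to can be sketched with the JDK's XPath API. The XPath expression is the one from the `w:dataBinding` above; everything else is an assumption made for illustration: the class `BindingDemo`, the sample answers document, and the namespace URI chosen for the `oda` prefix, which the slide does not show.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BindingDemo {
    // Assumed namespace URI for the "oda" prefix; not given in the source.
    static final String ODA_NS = "http://opendope.org/answers";

    // Hypothetical custom XML part holding the data to bind.
    static final String ANSWERS = "<oda:answers xmlns:oda=\"" + ODA_NS + "\">"
            + "<oda:answer id=\"name_Wt\">Ada</oda:answer>"
            + "</oda:answers>";

    /** Evaluates a binding XPath against the custom XML and returns the string value. */
    static String resolve(String customXml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document data = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(customXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The prefix used in the expression must be mapped explicitly.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "oda".equals(prefix) ? ODA_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return ODA_NS.equals(uri) ? "oda" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    if (prefix == null) {
                        return Collections.<String>emptyIterator();
                    }
                    return Collections.singletonList(prefix).iterator();
                }
            });
            return xpath.evaluate(expression, data);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(ANSWERS, "/oda:answers/oda:answer[@id='name_Wt']"));
    }
}
```

At generation time the resolved value would replace the «name» text inside `w:sdtContent`.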



Content controls are nice

•   A better solution, integrity-wise
•   Can bind via XPath to arbitrary XML
•   Handle images
•   Available since Word 2007
•   Can nest, so repeats/conditions work well
    – unlike Approaches 1 & 2
    – table row friendly
•   w:tag supports arbitrary data

… but unique to Open XML.
(Could/should a revised ODF support similar?)

Repeats/conditions

•   a repeat or condition applies to the content inside the content control
•   w:dataBinding doesn’t support these
•   so create your own semantics
•   OpenDoPE is one way
•   use w:tag for implementation
•   need an editing tool to insert repeats/conditions
    – for OpenDoPE, there are Word Add-Ins designed for technical and
      non-technical users
• at generation time, need code to support them
    – docx4j does this, and other OpenXML libraries could be extended to
      support
• can support complex documents (nested repeats etc)
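Using w:tag for the implementation means generation-time code must read each control's tag to decide what to do with it. A minimal sketch follows, assuming the tag holds `key=value` pairs joined by `&` and that the keys `od:repeat` and `od:condition` exist alongside the `od:xpath` key seen earlier. Those conventions and the class `TagParser` are assumptions made for illustration; check the OpenDoPE documentation for the actual format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TagParser {
    /** Splits a tag such as "od:repeat=r1&od:xpath=x1" into its key/value pairs. */
    static Map<String, String> parse(String tag) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : tag.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip fragments with no key or no '='
                pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return pairs;
    }

    /** Classifies a content control by the (assumed) keys present in its tag. */
    static String kind(String tag) {
        Map<String, String> pairs = parse(tag);
        if (pairs.containsKey("od:repeat")) return "repeat";
        if (pairs.containsKey("od:condition")) return "condition";
        if (pairs.containsKey("od:xpath")) return "bind";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(parse("od:xpath=ribxv"));
        System.out.println(kind("od:repeat=r1"));
    }
}
```

A processor would walk the content controls, call something like `kind` on each tag, then clone the control's content for a repeat or drop it for a false condition before resolving the remaining bindings.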


Choose your poison

• docx4j supports all three approaches
   – but content controls are strongly recommended
• other libraries offer more or less support for each approach




Thanks!






Editor's Notes

  1. People sometimes also try this using RTF or Word HTML as their document format. It is good that the approach can also be used for pptx and xlsx.
  2. The same approach can be used for OpenOffice documents.