This document discusses different approaches for generating Microsoft Word documents (docx) programmatically from external data sources. It recommends using content controls (Approach 3), as they provide cleaner XML and allow binding data via XPath. Content controls can handle repeats, conditions, and images better than other approaches like simple variable replacement (Approach 1) or MERGEFIELDS (Approach 2). It also discusses considerations for maintaining document integrity and handling repeating elements.
2. Where I’m coming from…
• docx4j is an ASLv2 library for (Microsoft) Open XML office
documents (docx, pptx, xlsx)
• My company Plutext sponsors that project
• docx4j started in 2007
www.docx4java.org
6. Choose your hub format; import/export from/to others
[Diagram: conversions between PDF, XHTML and docx around a chosen hub format]
• If you need to replicate the appearance of existing Office documents, using the
Microsoft formats as your “hub” will avoid lots of pain
• If you can, work with the OpenXML formats, not the legacy binary ones, or Word
2003 XML, or Word HTML
• LibreOffice/OpenOffice is a useful tool for conversion, driven by JODConverter
7. Open XML
• standardised via ECMA 376 and ISO/IEC 29500
• includes XSD
– can generate strongly typed classes
[Diagram: two ways to work with a document: open, unzip, alter the XML; or open, unzip, unmarshal, manipulate objects]
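The open, unzip, parse steps can be sketched with a standard library alone. This is an illustration in Python, not docx4j code: the demo package below holds only a main document part (a real docx also needs content types and relationships), and the helper names are made up for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_demo_package() -> bytes:
    """Zip a single main document part (hypothetical helper, demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Unzip the package, parse the main part, join the text of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


texts = paragraph_texts(build_demo_package())
print(texts)  # ['Hello world']
```

Working on the raw XML like this corresponds to the "alter the XML" path; unmarshalling into strongly typed classes generated from the XSD is the alternative.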
8. Authoring time vs. generation time
• What skills do authors need?
[Diagram labels: docx, data, PDF, HTML]
9. Approach 1:- Variable replacement.
This approach can also be used for pptx, xlsx
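A minimal sketch of variable replacement, assuming placeholders are written as ${name} and that each one sits whole inside a single text element. The function name and the sample XML are illustrative only; values are XML-escaped so that data cannot break the markup.

```python
import re
from xml.sax.saxutils import escape

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def replace_variables(document_xml: str, values: dict[str, str]) -> str:
    """Swap each ${key} for its escaped value; unknown keys are left alone."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            return match.group(0)
        return escape(values[key])

    return PLACEHOLDER.sub(substitute, document_xml)


xml_in = "<w:p><w:r><w:t>Dear ${name}, you owe ${amount}.</w:t></w:r></w:p>"
xml_out = replace_variables(xml_in, {"name": "Smith & Co", "amount": "42"})
print(xml_out)
```

Because this is plain string substitution on XML text, the same idea carries over to the parts of a pptx or xlsx package.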
11. Ummm… not so fast.
1. spelling/grammar proofing
2. rsid
3. run formatting
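The three items above all cause Word to split one placeholder across several runs. The sketch below uses a hand-written paragraph (the rsid value is invented) to show a naive search missing the placeholder, then one crude remedy: moving all of the paragraph's text into its first text element, at the cost of losing per-run formatting.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# The author typed ${name}; proofing marks and an rsid split it into three runs.
PARAGRAPH = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>Dear ${na</w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    '<w:r w:rsidR="00A1B2C3"><w:t>me</w:t></w:r>'
    '<w:proofErr w:type="spellEnd"/>'
    '<w:r><w:t>}</w:t></w:r>'
    '</w:p>'
)


def collapse_text(paragraph: ET.Element) -> None:
    """Move all text into the first w:t element (drops per-run formatting)."""
    texts = paragraph.findall(".//w:t", NS)
    if texts:
        texts[0].text = "".join(t.text or "" for t in texts)
        for t in texts[1:]:
            t.text = ""


found_naively = "${name}" in PARAGRAPH
paragraph = ET.fromstring(PARAGRAPH)
collapse_text(paragraph)
first_text = paragraph.find(".//w:t", NS).text
print(found_naively, first_text)  # False Dear ${name}
```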
12. Look for a solution which maintains integrity
• Typically a Word Add-In or macro which ensures integrity
• This suggestion applies to approaches #2 and #3 as well
13. Additional requirement: repeating data (list items, table rows)
• can be done using some convention, for example:
[#list developers as developer]
${developer.name}
[/#list]
• many systems invent their own (eg HotDocs)
• but the FreeMarker or Velocity template language can be used to
do this:
– http://freemarker.sourceforge.net/
– http://velocity.apache.org/
• for example:
– XDocReport (FreeMarker or Velocity; open source)
• (this templating approach can also be used with OpenOffice
documents)
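To make the repeat convention concrete, here is a toy expander for the [#list … as …] block shown above. It is a sketch of the idea only, not FreeMarker or Velocity: it handles one level of nesting, dictionary items, and nothing else, and the function name is invented.

```python
import re

LIST_BLOCK = re.compile(r"\[#list (\w+) as (\w+)\](.*?)\[/#list\]", re.DOTALL)


def expand_lists(template: str, data: dict) -> str:
    """Repeat the body of each list block once per item in the named collection."""

    def expand(match: re.Match) -> str:
        collection, alias, body = match.groups()
        field = re.compile(r"\$\{" + re.escape(alias) + r"\.(\w+)\}")
        return "".join(
            field.sub(lambda m, item=item: str(item[m.group(1)]), body)
            for item in data[collection]
        )

    return LIST_BLOCK.sub(expand, template)


template = "[#list developers as developer]- ${developer.name}\n[/#list]"
result = expand_lists(
    template, {"developers": [{"name": "Ada"}, {"name": "Linus"}]}
)
print(result)
```

In a real docx the opening and closing markers have to enclose whole paragraphs or table rows, which is where a proper template engine earns its keep.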
15. Additional requirement: images
• Now it is starting to get a bit trickier, because inserting an
image requires:
– adding an image part to the docx package
– making a note of its rel id
– replacing the placeholder with the image XML, including the rel id
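The three steps can be sketched as follows. The package is modelled as a plain dictionary of part name to bytes, which is an assumption made for brevity, and only the a:blip element that carries the rel id is shown; the full drawing XML around it is omitted. A real package also needs a content type entry for the image extension.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
)
RELS_PATH = "word/_rels/document.xml.rels"
EMPTY_RELS = f'<Relationships xmlns="{REL_NS}"/>'.encode()

ET.register_namespace("", REL_NS)


def add_image(parts: dict[str, bytes], image_name: str, image_bytes: bytes) -> str:
    """Steps 1 and 2: store the image part, register a relationship, return its id."""
    parts[f"word/media/{image_name}"] = image_bytes
    root = ET.fromstring(parts.get(RELS_PATH, EMPTY_RELS))
    used = {rel.get("Id") for rel in root}
    number = 1
    while f"rId{number}" in used:
        number += 1
    rel_id = f"rId{number}"
    ET.SubElement(
        root,
        f"{{{REL_NS}}}Relationship",
        {"Id": rel_id, "Type": IMAGE_TYPE, "Target": f"media/{image_name}"},
    )
    parts[RELS_PATH] = ET.tostring(root)
    return rel_id


def blip_for(rel_id: str) -> str:
    """Step 3, abbreviated: the element inside the image XML that points at the part."""
    return f'<a:blip r:embed="{rel_id}"/>'


parts: dict[str, bytes] = {}
first = add_image(parts, "logo.png", b"\x89PNG")
second = add_image(parts, "chart.png", b"\x89PNG")
print(first, second, blip_for(second))
```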
16. Approach 2:- MERGEFIELD and other fields
• Fields are a long-standing feature of Word, included in the
Open XML specification
• so lots of documents use them (aka mail merge)
• Various other useful field types eg IF
• A partial solution to the integrity problems of Approach 1
17. But, two unpleasant XML hybrids (simple and complex)
Simple field:
<w:fldSimple w:instr=" MERGEFIELD name ">
  <w:r>
    <w:t>«name»</w:t>
  </w:r>
</w:fldSimple>
Complex field:
<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> MERGEFIELD name </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
  <w:t>«name»</w:t>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>
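Any code that processes merge fields has to cope with both forms. A minimal sketch that collects field names from either one is given below; it assumes unquoted field names and no nested fields, and the sample paragraph is hand-written, with the complex field's instruction deliberately split over two runs as Word sometimes does.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
MERGEFIELD = re.compile(r"\s*MERGEFIELD\s+(\S+)")

SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:fldSimple w:instr=" MERGEFIELD name ">'
    '<w:r><w:t>«name»</w:t></w:r></w:fldSimple>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve"> MERGEFIELD </w:instrText></w:r>'
    '<w:r><w:instrText xml:space="preserve">city </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    '<w:r><w:t>«city»</w:t></w:r>'
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '</w:p>'
)


def merge_field_names(root: ET.Element) -> list[str]:
    """Return MERGEFIELD names in document order, from simple and complex fields."""
    names: list[str] = []
    instruction = None  # text gathered between a "begin" and the next "separate"/"end"

    def record(text: str) -> None:
        match = MERGEFIELD.match(text)
        if match:
            names.append(match.group(1))

    for element in root.iter():
        if element.tag == W + "fldSimple":
            record(element.get(W + "instr", ""))
        elif element.tag == W + "fldChar":
            kind = element.get(W + "fldCharType")
            if kind == "begin":
                instruction = ""
            elif instruction is not None:
                record(instruction)
                instruction = None
        elif element.tag == W + "instrText" and instruction is not None:
            instruction += element.text or ""
    return names


names = merge_field_names(ET.fromstring(SAMPLE))
print(names)  # ['name', 'city']
```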
20. Content controls are nice
• A better solution, integrity-wise
• Can bind via XPath to arbitrary XML
• handles images
• since Word 2007
• can nest, so repeats/conditions work well
– unlike Approaches 1 & 2
– table row friendly
• w:tag supports arbitrary data
… but unique to Open XML.
(Could/should a revised ODF support similar?)
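What XPath binding looks like can be sketched in a few lines. The content control below carries a w:dataBinding whose w:xpath points into a small XML document; the storeItemID is a placeholder GUID, the data is invented, and the resolver only understands simple, namespace-free paths. Word (or a library such as docx4j) does this job properly; the sketch just shows the mechanism.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
NS = {"w": W_NS}

DOCUMENT = (
    f'<w:body xmlns:w="{W_NS}"><w:p><w:sdt><w:sdtPr>'
    '<w:dataBinding w:xpath="/invoice/customer/name"'
    ' w:storeItemID="{00000000-0000-0000-0000-000000000000}"/>'
    '</w:sdtPr><w:sdtContent><w:r><w:t>Click here to enter text.</w:t></w:r>'
    '</w:sdtContent></w:sdt></w:p></w:body>'
)
DATA = "<invoice><customer><name>Plutext</name></customer></invoice>"


def resolve(data_root: ET.Element, xpath: str):
    """Evaluate a simple absolute path against the data; None if nothing matches."""
    steps = xpath.strip("/").split("/")
    if steps[0] != data_root.tag:
        return None
    if len(steps) == 1:
        return data_root.text
    node = data_root.find("/".join(steps[1:]))
    return None if node is None else node.text


def apply_bindings(document_root: ET.Element, data_root: ET.Element) -> None:
    """Copy each bound value into the first text element of its content control."""
    for sdt in document_root.iter(W + "sdt"):
        binding = sdt.find("w:sdtPr/w:dataBinding", NS)
        if binding is None:
            continue
        value = resolve(data_root, binding.get(W + "xpath", ""))
        target = sdt.find("w:sdtContent//w:t", NS)
        if value is not None and target is not None:
            target.text = value


document = ET.fromstring(DOCUMENT)
apply_bindings(document, ET.fromstring(DATA))
bound_text = document.find(".//w:t", NS).text
print(bound_text)  # Plutext
```

Because the control wraps its content as a single element, there is no placeholder text for Word to split across runs.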
21. Repeats/conditions
• a repeat or condition applies to the content inside the content control
• w:dataBinding doesn’t support these
• so create your own semantics
• OpenDoPE is one way
• use w:tag for implementation
• need an editing tool to insert repeats/conditions
– for OpenDoPE, there are Word Add-Ins designed for technical and
non-technical users
• at generation time, you need code to support them
– docx4j does this, and other OpenXML libraries could be extended to
support them
• can support complex documents (nested repeats etc)
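A sketch of "create your own semantics" using w:tag. The tag values repeat:<path> and field:<name> are a made-up convention for this example and are not OpenDoPE's actual syntax. The code clones a repeat control once per matching data node and fills the field controls inside each clone; it does not handle nested repeats.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
NS = {"w": W_NS}

BODY = (
    f'<w:body xmlns:w="{W_NS}">'
    '<w:sdt><w:sdtPr><w:tag w:val="repeat:developer"/></w:sdtPr><w:sdtContent>'
    '<w:p><w:sdt><w:sdtPr><w:tag w:val="field:name"/></w:sdtPr>'
    '<w:sdtContent><w:r><w:t>name</w:t></w:r></w:sdtContent></w:sdt></w:p>'
    '</w:sdtContent></w:sdt></w:body>'
)
DATA = (
    "<developers><developer><name>Ada</name></developer>"
    "<developer><name>Linus</name></developer></developers>"
)


def tag_value(sdt: ET.Element) -> str:
    tag = sdt.find("w:sdtPr/w:tag", NS)
    return "" if tag is None else tag.get(W + "val", "")


def fill_fields(element: ET.Element, item: ET.Element) -> None:
    """Set the text of every field:<name> control from the current data item."""
    for sdt in element.iter(W + "sdt"):
        value = tag_value(sdt)
        if value.startswith("field:"):
            text = sdt.find("w:sdtContent//w:t", NS)
            if text is not None:
                text.text = item.findtext(value[len("field:"):], default="")


def expand_repeats(parent: ET.Element, data_root: ET.Element) -> None:
    """Replace each repeat:<path> control with one filled clone per data node."""
    for child in list(parent):
        value = tag_value(child) if child.tag == W + "sdt" else ""
        if not value.startswith("repeat:"):
            expand_repeats(child, data_root)
            continue
        position = list(parent).index(child)
        parent.remove(child)
        items = data_root.findall(value[len("repeat:"):])
        for offset, item in enumerate(items):
            clone = copy.deepcopy(child)
            fill_fields(clone, item)
            parent.insert(position + offset, clone)


body = ET.fromstring(BODY)
expand_repeats(body, ET.fromstring(DATA))
names = [t.text for t in body.iter(W + "t")]
print(names)  # ['Ada', 'Linus']
```

A repeat that matches no data nodes removes its content entirely, so the same mechanism doubles as a crude condition.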
22. Choose your poison
• docx4j supports all three approaches
– but content controls are strongly recommended
• other libraries offer more or less support for each approach