Slides from the talk given at XML Prague 2015.
The introduction of streaming in XSLT 3.0 opened up new possibilities and applications for XSLT. Streaming stylesheets can process documents with bounded memory consumption, including large documents that would not fit into memory with a non-streaming processor. With memory consumption bounded and disk space de facto unlimited (and SSDs providing fast access to stored data), CPU speed can become the bottleneck in many scenarios. However, contemporary commodity machines have 2, 4 or more CPU cores, of which present-day XSLT processors usually use only one, leaving the CPU underutilized. In this presentation, we discuss scenarios in which optimal CPU utilization can be achieved by processing the input file in parallel. As the experiments show, such an approach can shorten the run time of a transformation by up to ~35% in our experimental setting.
2. Reminder on streaming…
Can now process huge documents in bounded memory
A whole new area where XSLT is now applicable
With trade-offs
stylesheet must follow streamability rules
limited XPath
XSLT 3.0 only, only in commercial products
Large documents take a long time to process
processing time dominated by the time required to parse the input
4. Why so long?
I/O is not a problem (SSDs are fast enough)
We are using streaming, so memory consumption is constant (bounded)
Processor runs at 100%
but just one of the cores…
5. Space for optimization?
Multi-core machines are ubiquitous
XSLT processor should use all cores if possible
Parsing + processing in multiple threads
and then merge the outputs
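The idea on this slide can be sketched in Java roughly as follows. This is only an illustration, not the code of the talk's implementation: the class name `ParallelChunks`, the method `processAll`, and the `transform` function (standing in for "parse and transform one portion") are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: processes independent chunks in parallel, merges in input order.
final class ParallelChunks {

    static String processAll(List<String> chunks, Function<String, String> transform, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per chunk; tasks may finish in any order.
            List<Future<String>> pending = new ArrayList<>();
            for (String chunk : chunks) {
                Callable<String> task = () -> transform.apply(chunk);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order so the merged output
            // follows the order of the chunks in the input.
            StringBuilder merged = new StringBuilder();
            for (Future<String> result : pending) {
                merged.append(result.get());
            }
            return merged.toString();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("a", "b", "c");
        System.out.println(processAll(chunks, String::toUpperCase, 2));
    }
}
```

Waiting on the futures in submission order is what keeps the merge deterministic even though the threads themselves finish in an unpredictable order.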
7. Trade-offs
One processor thread can’t see data processed by other threads
The document has to consist of fairly independent “records”
that can be processed separately
As in streaming, we can’t “go back”
and crutches like accumulators won’t work
And sometimes can’t even “go up” (out of the record)
8. Requirements #1 (input)
The document has a well-defined structure (schema)
A major part of the content is in a sequence of nodes
of certain types (we will call these core types)
Core types and their ancestors are not recursive.
Contents of core types are reasonably independent.
We expect that processing each record takes a similar amount of time
Input can be read by multiple threads from random positions
9. Requirements #2 (stylesheet)
Streamable
Explicitly marked templates for core nodes
Paths in those templates are absolute and use only the child axis and element names
alternatively: provide schema
Only the core node and its subtree can be accessed by XPath
match="/ProteinDatabase/ProteinEntry"
pxsl:core="yes"
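One plausible reason for restricting the paths this way is that a wrapper can then locate core nodes by element name alone, without evaluating XPath. A minimal Java sketch of such a check is shown below; the class `CorePathCheck` and its regular expression are assumptions for illustration (the name pattern is an ASCII-only simplification of XML names), not part of the presented tool.

```java
import java.util.regex.Pattern;

// Hypothetical check: accepts only patterns such as /ProteinDatabase/ProteinEntry,
// i.e. absolute paths built from child steps with plain (optionally prefixed) element names.
final class CorePathCheck {

    // One or more "/name" steps; ASCII-only approximation of an XML QName.
    private static final Pattern SIMPLE_ABSOLUTE =
            Pattern.compile("(/([A-Za-z_][\\w.-]*:)?[A-Za-z_][\\w.-]*)+");

    static boolean isSimpleAbsolutePath(String match) {
        return SIMPLE_ABSOLUTE.matcher(match.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimpleAbsolutePath("/ProteinDatabase/ProteinEntry"));
        System.out.println(isSimpleAbsolutePath("//ProteinEntry"));
    }
}
```

Patterns with predicates, wildcards, attributes, or the descendant axis are rejected, because their match cannot be decided from the chain of element names alone.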
10. Special cases
If we know more about the structure, we can
access more data safely, e.g.
If all core nodes are children of one node
We can read from “intro” in all threads
11. Special cases #2
If the core nodes are not all children of one node
Maybe we could choose a different layer of nodes as core nodes
12. Parsing problems
Possible issues when splitting the document
comments, PIs, CDATA
Solutions
report error
preprocessing
with “fast” XML parser
non XML-aware
?
<ProteinEntry>
...
<!--
</ProteinEntry>
<ProteinEntry>
...
-->
</ProteinEntry>
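The “non XML-aware” preprocessing option could look roughly like the following Java sketch: a linear scan that skips comments, CDATA sections, and processing instructions, and reports only the record end tags found outside them. The class `SafeSplitPoints` is an assumption for illustration; it works on an in-memory string for brevity and ignores DOCTYPE internal subsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: finds offsets where the document may be cut safely.
final class SafeSplitPoints {

    /** Returns the offsets just past each endTag that lies outside comments, CDATA and PIs. */
    static List<Integer> find(String xml, String endTag) {
        List<Integer> points = new ArrayList<>();
        int i = 0;
        while (i < xml.length()) {
            if (xml.startsWith("<!--", i)) {
                i = skipPast(xml, "-->", i + 4);      // comment
            } else if (xml.startsWith("<![CDATA[", i)) {
                i = skipPast(xml, "]]>", i + 9);      // CDATA section
            } else if (xml.startsWith("<?", i)) {
                i = skipPast(xml, "?>", i + 2);       // processing instruction
            } else if (xml.startsWith(endTag, i)) {
                i += endTag.length();
                points.add(i);                        // a real end of a record
            } else {
                i++;
            }
        }
        return points;
    }

    // Returns the index just past the terminator, or the end of input if it is missing.
    private static int skipPast(String xml, String terminator, int from) {
        int end = xml.indexOf(terminator, from);
        return end < 0 ? xml.length() : end + terminator.length();
    }
}
```

Applied to the example on this slide, the end tag inside the comment is skipped, so the two `ProteinEntry` start tags and the comment stay together in one record instead of being split apart.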
13. Side-effect problem
Parallelization can produce unexpected results
Side-effects defined by the language, e.g. xsl:message
Could be buffered/concatenated
Others
Vendor-specific extensions
User extensions
Solutions?
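For side effects that the wrapper can intercept, such as `xsl:message` output, the “buffered/concatenated” idea might be sketched in Java as below. The class `BufferedMessages` and the `worker` callback are hypothetical; the sketch only shows how per-portion buffers restore a deterministic message order and does not address vendor or user extensions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: each portion writes messages to its own buffer,
// and the buffers are concatenated in portion order afterwards.
final class BufferedMessages {

    static List<String> run(int portionCount, BiConsumer<Integer, Consumer<String>> worker) {
        return IntStream.range(0, portionCount)
                .parallel()
                .mapToObj(i -> {
                    List<String> local = new ArrayList<>(); // private to this portion
                    worker.accept(i, local::add);
                    return local;
                })
                .flatMap(List::stream)          // encounter order = portion order
                .collect(Collectors.toList());
    }
}
```

Because no buffer is shared between threads, the messages of one portion are never interleaved with those of another, regardless of how the threads are scheduled.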
14. Experimental implementation
Thin wrapper around Saxon EE 9.6, written in Java
1. Split the document into portions of roughly the same size
2. Turn each portion into a well-formed XML document
(by adding a small prefix/suffix)
3. Run an instance of Saxon on each portion
4. Merge the results when all threads finish
https://github.com/j-maly/pXSLT
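Steps 1 and 2 above can be illustrated with a small Java sketch. It is not taken from the pXSLT repository: the class `PortionSplitter`, its parameters, and the in-memory `String` input are assumptions made for brevity (a real implementation would work on file offsets), and it assumes the end tag of the core element never occurs inside comments or CDATA sections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: cuts a sequence of records into roughly equal portions
// at record boundaries and wraps each portion so that it is well-formed on its own.
final class PortionSplitter {

    static List<String> split(String body, String endTag, int portions,
                              String prefix, String suffix) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int k = 1; k <= portions && start < body.length(); k++) {
            int end;
            if (k == portions) {
                end = body.length();                       // last portion takes the rest
            } else {
                // Ideal cut position for equally sized portions.
                int target = (int) ((long) body.length() * k / portions);
                // First record end tag that finishes at or after the ideal position.
                int from = Math.max(start, target - endTag.length());
                int hit = body.indexOf(endTag, from);
                end = hit < 0 ? body.length() : hit + endTag.length();
            }
            if (end > start) {
                result.add(prefix + body.substring(start, end) + suffix);
            }
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "<e>1</e><e>2</e><e>3</e><e>4</e>";
        split(body, "</e>", 2, "<r>", "</r>").forEach(System.out::println);
    }
}
```

Each wrapped portion can then be handed to a separate processor instance (steps 3 and 4), since the prefix and suffix restore the ancestor elements that the cut removed.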
15. Use Case
RUIAN = DB of geographical, municipal information, XML
Prague = 614 MB of data
Simple format
Records for streets, buildings, …
Task: split the large file into individual records (each in one XML file)
Takes 42 minutes in Saxon EE
16. Conclusion
Processing in multiple threads provides measurable speed-up
Imposes additional limitations on the stylesheet and input
Described approach makes sense only for large documents
(for documents that fit into memory, other solutions are already available, e.g. saxon:threads)
https://github.com/j-maly/pXSLT
Editor's Notes
So we are using streaming mode, but we don’t support “real” streaming scenarios
Remember, we need to avoid parsing the whole document in one thread, because time to do that can dominate the time of the whole transformation.
Some preprocessing XML parser (not really dealing with attributes, namespaces, entities, etc.)
Some other preprocessing – comments, PIs, CDATAs are all linear constructs, so just make sure we don’t end up in the middle of them when splitting the document…