Ot performance webinar

  1. Speeding Up Your DITA-OT Processing
     • Aryeh Sanders, Suite Solutions
  2. Who Are We?
     • Our Mission: To increase our customers’ profitability by significantly improving the efficiency of their information development and delivery processes.
     • Qualitative Advantage: Content Lifecycle Implementation (CLI) is Suite Solutions’ comprehensive approach, from concept to publication, to maximizing the value of your information assets.
     • Our professionals are with you at every phase, determining, recommending and implementing the most cost-effective, flexible and long-term solution for your business.
  3. Clients and Partners
     • Private and Confidential
     • Suite Solutions ©2009
  4. Introduction
     • Performance in the DITA-OT
     • “No Silver Bullet”
     • Design of the DITA-OT puts limits on performance without a redesign
     • Some of which is underway
     • Performance relative to what?
     • Try to examine needs to figure out which performance issues should be tackled and which can be ignored
     • No hard and fast rules
     • Performance can be assessed only with your data, in your environment
     • Measurement
  5. Overview
     • Overview of the webinar
     • Performance Pain Points in the DITA-OT
     • Hardware and Software Changes for Performance
     • Memory Settings for Java
     • Stylesheet Performance and Code Changes
  6. Performance Issues With the DITA-OT
     • The DITA-OT sacrifices speed for simplicity
     • Constructed as a pipeline of transformations, each step of which does one thing
     • Each step must at least reparse the DITA files
     • Each read of a DITA file with a DOCTYPE used to reparse the DTDs
     • Now it doesn’t – Eliot Kimber added a patch to cache the DTDs
     • Best takeaway from this talk: upgrade to a version with this patch (1.5.1)
     • XSLT
     • High-level language, far removed from the practicalities of performance
     • Often, the easiest way to do something in XSLT involves repeated searches through the document
  7. Importance of Measurement
     • A Case Study
     • Since the DITA-OT writes many files repeatedly, we have to wait for the hard disk to complete the write, even to temporary files where long-term integrity isn’t that important. This certainly holds up processing, right?
     • Test: Stop those writes
     • ImBench – ramdisk tool
     • Create a temporary disk in memory and use that as the temp directory
     • Now, no writes have to wait for the disk
     • Run the OT 20 times with the same data
     • I used a slightly complicated map (98 pages on output)
     • 41.1 seconds average with disk vs. 39.1 seconds in memory
     • For most people, not worth it; on the other hand, it saves about 5% of the time
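The measurement procedure from the case study (run the same job many times and compare averages) could be sketched roughly as follows. This is a minimal Python sketch, not the tooling used in the webinar: `time_runs` and `percent_saved` are hypothetical helper names, and the workload is a stand-in that you would replace with a call that launches your own build.

```python
import statistics
import time
from typing import Callable, List


def time_runs(job: Callable[[], None], runs: int = 20) -> List[float]:
    """Run `job` repeatedly and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def percent_saved(baseline: float, candidate: float) -> float:
    """Time saved by `candidate` relative to `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


if __name__ == "__main__":
    # Stand-in workload (an assumption); swap in your own build invocation.
    samples = time_runs(lambda: sum(range(100_000)), runs=20)
    print(f"mean {statistics.mean(samples):.4f}s over {len(samples)} runs")
    # Averages reported on the slide: 41.1 s on disk, 39.1 s on the ramdisk.
    print(f"saved {percent_saved(41.1, 39.1):.1f}%")
```

Averaging over many runs matters here because a two-second difference is small enough to be hidden by run-to-run noise in a single measurement.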
  8. Hardware Issues
     • Anecdotal:
     • I’ve run the same data and stylesheets on my laptop, and on a client’s server
     • 10 minutes on the server vs. 1.5 minutes on the laptop
     • And it’s not a new laptop
     • Since the DITA-OT is doing a lot of processing, it’s worth using a machine that’s capable of reasonable performance
     • Measure!
     • But a modern low-end $250 Dell desktop is about as fast as my laptop
     • Don’t throw it on an old computer and then make people wait
     • Make sure there’s one core free to run the OT so it doesn’t have to compete with other processes
  9. Hardware Issues (2)
     • Make sure there’s enough memory
     • Very workload dependent
     • For very large workloads (roughly > 600 pages, or > 1000 topics), consider a 64-bit machine with a 64-bit JVM
     • Eliot Kimber is working on a patch to pass the right memory parameters to the OT – if this is an issue, check the developer mailing list or contact him
     • If there’s not enough physical memory, you can get thrashing
     • JVM memory on next slide
  10. Memory
     • Once you have enough, it won’t help to have extra
     • Slightly surprising to me, but I tested at least one data set
     • -Xmx tells Java the maximum heap size
     • The reason this is slightly surprising is that before Java gives up, it will try garbage collection
     • Frequent garbage collection can be slow
     • Possibly the OT doesn’t tend to release memory
     • Some datasets run out of memory; the standard advice then is to set reloadstylesheets="true"
     • That slows down processing, since stylesheets are re-read
     • Much better to figure out how to give the OT enough memory if possible
     • One customer solved their memory issues by using JRockit as the JVM
  11. XSLT Performance
     • Stylesheet developers don’t necessarily think about what needs to happen behind the scenes
     • Example:
       <xsl:variable name="example" select="//*[@id=$refid]"/>
     • This searches the whole document – fine if that’s what you want, but not if you mean:
       <xsl:variable name="example" select="..//*[@id=$refid]"/>
     • In the context of a document where @id is unique, both would behave the same, but one would be slower than the other
     • Except: this could theoretically be optimized if the @id attribute was an ID type, and you have a DTD, and the stylesheet processor has that optimization built in, which leads us back to …
     • Measurement is also useful for stylesheets
     • Saxon comes in a free version and commercial versions
     • Not that expensive, with more optimizations, which might matter for your workload – or might not
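To make the difference between the two searches concrete, here is a rough Python analogy using the standard library's ElementTree. It is only an illustration of how many elements each expression has to consider when no ID optimization applies; the miniature document, its element names and its ids are all hypothetical, and a real XSLT processor may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; element names and ids are illustrative.
DOC = """
<map>
  <chapter id="c1"><topic id="t1"/><topic id="t2"/></chapter>
  <chapter id="c2"><topic id="t3"/><topic id="t4"/></chapter>
</map>
"""


def matching(candidates, wanted_id):
    """Return the candidate elements whose id attribute equals wanted_id."""
    return [el for el in candidates if el.get("id") == wanted_id]


root = ET.fromstring(DOC)
parent = root.find("chapter[@id='c2']")  # parent of the context topic

# Roughly //* : every element in the document is a candidate.
whole_document = list(root.iter())
# Roughly ..//* : only the descendants of the context node's parent.
parent_subtree = list(parent.iter())[1:]

print(len(whole_document), len(parent_subtree))
print(matching(whole_document, "t3") == matching(parent_subtree, "t3"))
```

Both searches find the same element, but the relative one examines far fewer candidates, and the gap grows with the size of the document.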
  12. Profiling
     • Good idea, many commercial tools
     • Oxygen, StylusStudio, fancier editions of Visual Studio
     • Essentially another example of measurement to find the real pain points
     • Not always necessary if the pain points are evident
  13. XSLT Performance (2)
     • XPath tends to have one-line requests, but that one line can hide a lot of computation
     • What needs to happen to process this?
       preceding-sibling::*[following-sibling::*[contains(@class, ' topic/ul ')]]
     • preceding-sibling has to check each previous sibling
     • For each one, following-sibling has to check every following sibling
     • And contains() itself can’t be that efficient because it needs to hunt within @class for ' topic/ul '
     • Some numbers: Let’s look at 100 nodes, and let’s pretend that there is no topic/ul, so the test never succeeds. Let’s run this test on all 100 nodes in sequence
     • We could do the math, but it’s easier to write a program
  14. XSLT Performance Example (Calculated in Perl, sorry)
       for $a (1..100) {              # for each of our 100 nodes
           for $b (1..$a-1) {         # look at the preceding siblings
               for $c ($b+1..100) {   # look at the following siblings of each of those
                   $contains++;       # and call contains()
               }
           }
       }
       print $contains, "\n";
     • Running this tells us there are 328350 (!) calls to contains()
     • Of course, with 10 nodes, there are only 285 calls, but the point remains – one line in XSLT might be doing a LOT of computation
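For readers who would rather not run Perl, the same count can be reproduced with a short Python port of the loop above. The function names are hypothetical; the closed form is simply what the triple loop sums to, n(n-1)(2n-1)/6, which makes the cubic growth explicit.

```python
def count_contains_calls(n: int) -> int:
    """Count contains() calls when the XPath test runs on each of n siblings."""
    calls = 0
    for node in range(1, n + 1):                      # each context node
        for preceding in range(1, node):              # its preceding siblings
            for _following in range(preceding + 1, n + 1):  # their following siblings
                calls += 1                            # one contains() call
    return calls


def count_contains_calls_closed_form(n: int) -> int:
    """Same count without looping: n(n-1)(2n-1)/6, i.e. cubic in n."""
    return n * (n - 1) * (2 * n - 1) // 6


if __name__ == "__main__":
    for n in (10, 100):
        print(n, count_contains_calls(n), count_contains_calls_closed_form(n))
```

For 10 and 100 nodes this gives the 285 and 328350 quoted on the slide.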
  15. Tips From Mike Kay
     • Eight tips for how to write efficient XSLT:
     • Avoid repeated use of "//item".
     • Don't evaluate the same node-set more than once; save it in a variable.
     • Avoid <xsl:number> if you can. For example, by using position().
     • Use <xsl:key>, for example to solve grouping problems.
     • Avoid complex patterns in template rules. Instead, use <xsl:choose> within the rule.
     • Be careful when using the preceding[-sibling] or following[-sibling] axes. This often indicates an algorithm with n-squared performance.
     • Don't sort the same node-set more than once. If necessary, save it as a result tree fragment and access it using the node-set() extension function.
     • To output the text value of a simple #PCDATA element, use <xsl:value-of> in preference to <xsl:apply-templates>.
  16. Commentary On Those Tips
     • Use <xsl:number> when appropriate – I’m pretty sure that the cases where his comment applies aren’t found that often in the OT
     • By all means, use xsl:key!
     • This is probably where to find low-hanging fruit in speeding up the built-in stylesheets
     • We can’t realistically avoid complex patterns in template rules, but it’s worth considering why he gave that advice
     • Every <xsl:apply-templates/> runs through each child node
     • For each child node, it has to run the test in the match in every one of the <xsl:template>s
     • Each match test takes some amount of processing, and it runs for every node, so we’d like to minimize that
     • If you can move processing to an xsl:choose or a moded template, then you only need to run those tests on a smaller subset of nodes
  17. What is an XSLT Key?
     • Somewhere on the top level of the stylesheet, you can use something like:
       <xsl:key name="mapTopics" match="//opentopic:map//*" use="@id" />
     • Then, later in your stylesheets, you can look up items with that key:
       select="key('mapTopics', $id)…"
     • This lets you do the search once, instead of searching through opentopic:map elements many times.
     • Note that this is part of the code that had a 40% speedup in generating the TOC in a large book I’ll mention on the next slide, even though
       <xsl:key name="mapTopics" match="/*/opentopic:map//*" use="@id" />
       would have been much more efficient.
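Conceptually, a key behaves like an index that is built in one pass and then consulted for every lookup. The Python sketch below illustrates that idea with a dictionary; it is an analogy only, not how any particular XSLT processor implements xsl:key, and the document and the names `build_id_index` and `lookup` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature map; element names and ids are illustrative.
DOC = """
<map>
  <chapter id="c1"><topic id="t1"/><topic id="t2"/></chapter>
  <chapter id="c2"><topic id="t3"/><topic id="t4"/></chapter>
</map>
"""


def build_id_index(root):
    """Walk the tree once and group elements by their id attribute."""
    index = {}
    for element in root.iter():
        element_id = element.get("id")
        if element_id is not None:
            index.setdefault(element_id, []).append(element)
    return index


def lookup(index, wanted_id):
    """Return the elements with this id without rescanning the tree."""
    return index.get(wanted_id, [])


root = ET.fromstring(DOC)
topics_by_id = build_id_index(root)          # one pass over the document
print([el.tag for el in lookup(topics_by_id, "t3")])
print(lookup(topics_by_id, "missing"))
```

The cost of the single pass is paid once; each later lookup avoids a fresh search of the whole map.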
  18. More On Slow XSLT
     • Consider what’s inside a loop
     • Example:
     • If you have a template, and the template defines a variable:
       <xsl:variable name="topicrefs" select="//*[contains(@class, ' map/topicref ')]"/>
     • (This isn’t a good idea to start with because of //)
     • This variable will have the same value every time the template runs
     • So why not construct it only once?
     • Move it out of the template and make it a global variable
     • One customer sped up TOC generation by around 40% on a huge book
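The effect of moving the variable out of the template can be illustrated with a small Python analogy that counts how often the whole document gets scanned. Everything here is hypothetical (the miniature map, the `ScanCounter` class, the two `toc_*` functions); it models the idea of hoisting an invariant computation, not the actual OT stylesheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature map with DITA-style class attributes.
DOC = """
<map class="- map/map ">
  <topicref class="- map/topicref " href="a.dita"/>
  <topicref class="- map/topicref " href="b.dita"/>
  <topicref class="- map/topicref " href="c.dita"/>
</map>
"""


class ScanCounter:
    """Finds topicrefs by scanning the whole tree and counts the scans."""

    def __init__(self, root):
        self.root = root
        self.scans = 0

    def topicrefs(self):
        self.scans += 1
        return [el for el in self.root.iter()
                if " map/topicref " in el.get("class", "")]


def toc_per_call(counter, hrefs):
    """Like a template-local variable: the scan repeats for every entry."""
    return [len(counter.topicrefs()) for _ in hrefs]


def toc_hoisted(counter, hrefs):
    """Like a global variable: the scan runs once and is reused."""
    topicrefs = counter.topicrefs()
    return [len(topicrefs) for _ in hrefs]


hrefs = ["a.dita", "b.dita", "c.dita"]
slow = ScanCounter(ET.fromstring(DOC))
fast = ScanCounter(ET.fromstring(DOC))
same_result = toc_per_call(slow, hrefs) == toc_hoisted(fast, hrefs)
print(same_result, slow.scans, fast.scans)
```

The output is identical in both cases; only the number of full-document scans changes, from one per entry to one in total.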
  19. PDF Stylesheet Development Tips
     • Not a general performance issue, but a timesaver for stylesheet developers
     • Useful if, like us, you need to repeatedly tweak a stylesheet and test the tweak, but each test is slow
     • First, try directly editing the topic.fo file and viewing the result before you change the stylesheet, so you won’t have to run the OT at all
     • Second, you can configure the toolkit to have another Ant “target” – simply run your DITA once, and after that, let the toolkit start the PDF stylesheets from the files in the temp directory, skipping the earlier processing
     • Contact us for more information – we don’t have a nicely packaged version of this yet, but we can give you the pieces
  20. Questions?
     • Any questions?
     • Be in touch! Aryeh Sanders, aryehs@suite-sol.com