Dita Accelerator Xml2008
 


Putting quality to the test ... How do you define quality in content conversion? Is it only about the output? Or do the performance, speed and reliability of the process matter too? Stilo has put its leading OmniMark content processing solution to the test to see how it stands up against the DITA Open Toolkit. See the results of this technical benchmarking exercise in the following presentation and contact Stilo to learn how you can accelerate the adoption of DITA with OmniMark. www.stilo.com


    Presentation Transcript

    • Accelerating DITA with OmniMark: A Scalable Solution for Demanding Production Environments
      XML-in-Practice 2008, ramodeo@stilo.com
      Copyright © Stilo International 2008
    • Darwin Information Typing Architecture (DITA)
      • An OASIS standard for content: Reduce, Reuse, Repurpose
      • Tools: DITA Open Toolkit, Editors, CMSes
      • DITA features (diagram labels): Topic-Level, Maps, Topic Types, Transclusion, Specialization, Metadata-Based Filtering
      • Established Foundations and Best Practices: HTML, HyTime, CALS Tables, XML, SGML
      • DITA publishing depends on efficient assembly, interpretation, filtering and formatting of content components
    • The DITA Open Toolkit Factor
      • The Open Toolkit has been a big part of DITA's success
        • Open source
        • Active development community
        • Thorough implementation of DITA
        • Out-of-the-box support for multiple output formats
      • Modular architecture
        • Easily customized
        • Components of the Open Toolkit are replaceable
        • Users have a choice of XSLT and FO processor components
      • Many commercial products bundle the Open Toolkit
      • As a result, DITA is closely identified with the Open Toolkit
    • DITA Editors incorporating Open Toolkit
      • Adobe FrameMaker 8
      • Information Mapping Content Mapper
      • Inmedius DITA Storm
      • In.vision DITA Studio
      • JustSystems XMetaL Author Enterprise 5.1
      • PTC Arbortext 5.3
      • SyncRO Soft <oXygen/> 9.1
      • Syntext Serna 3.5
      • XMLmind XML Editor 3.6
      Source: Bob Doyle, "DITA Tools from A to Z", STC Intercom, April 2008 (DITA issue), http://www.ditanews.com/tools/STC_Intercom.pdf
    • DITA CMS Integration with Open Toolkit
      • Astoria On Demand
      • Author-it
      • Bluestream XDocs
      • DITA Exchange
      • DocZone
      • Inmedius Horizon
      • IXIASOFT DITA CMS Framework
      • PTC Arbortext Content Manager
      • SiberLogic SiberSafe
      • Trisoft Infoshare
      • Vasont
      • X-Hive Docato
      • XyEnterprise Content@
      Source: Bob Doyle, "DITA Tools from A to Z", STC Intercom, April 2008 (DITA issue), http://www.ditanews.com/tools/STC_Intercom.pdf
    • Exploiting DITA
      • As DITA evolves, it will be applied to ever more demanding situations
        • Many industries publish huge volumes of data
        • Aerospace, automotive, oil services, legal publishing
      • Aspects of DITA can be used for their own sake
        • DITA specialization may spin off into its own standard
        • Transclusion can allow reuse even among monolithic documents
        • Metadata-based filtering can provide general-purpose effectivity support
        • DITA is a very modular specification
      • Some of these scenarios will have very demanding requirements
        • Very large "topics"
        • Large numbers of topics
    • The DITA Continuum at Stilo
      (Table columns: Content, Details, Drivers. The continuum labels Pure DITA, Semi-DITA and Non-DITA are shown on the rows where they appear on the slide.)
      • Semiconductor Datasheets (Pure DITA)
        • Details: FrameMaker source; PDF publishing
        • Drivers: Authoring costs; Consistency; Customized Pubs
      • Legal Procedures
        • Details: e-Learning; Word and HTML source
        • Drivers: Adaptable; Simplified authoring; Integration with existing XML tools
      • Aerospace Standards, 2 projects (Semi-DITA)
        • Details: Monolithic; SGML, Interleaf, Word source; publish to ATA, S1000D, web services
        • Drivers: Many legacy formats; Multi-target; access to sub-contractors; S1000D new support
      • Aircraft Maint. Manuals
        • Details: Monolithic; ATA source; E-manuals
        • Drivers: Efficient update; Targeting; Costs; Regulatory compliance
      • Automotive
        • Details: Monolithic; Multiple sources; SGML
        • Drivers: Efficient update; Targeting; Costs
      • Software Docs (Non-DITA)
        • Details: Topics; SGML; RDBMS storage
        • Drivers: Authoring costs; Multi-target; Reuse
    • Pushing the Boundaries
      • How well does the Toolkit cope with these situations?
      • The Toolkit has a modular architecture
        • It can be used as a base for partial DITA applications
      • Some coding tricks are required
        • XSLT rules must be implemented carefully to preserve support for specialization
      • Most importantly, XSLT is not known as a fast processing technology
        • Can the Toolkit cope with high volumes of data?
        • We can test this
    • Building a DITA Stress Test
      • Sample input is the DITA language reference
        • 200+ topics
        • 1468 conref references
        • 741 targets referenced by conref
        • 1.06 MB
        • Average file size 5 kB
      • The DITA language reference was inflated in two ways
        • Topic sizes were increased up to a factor of 100 (to 500 kB per file)
        • Number of files was increased up to a factor of 100 (to 20,000 files)
      • To increase topic sizes
        • The body of each topic was replicated
        • A random prefix was added to each word to create unique content
        • The number of links increased proportionately
      • To increase the number of files
        • The whole topic was replicated
        • A random prefix was added to each word, each id, and each idref
        • The number of links and link targets increased proportionately
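The topic-size inflation described on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the script used for the benchmark: the function names, the four-letter prefix, and the tiny sample topic are all assumptions made for the example. It replicates the children of a topic's body and prefixes every word in each copy, leaving attributes such as conref untouched so that the number of links grows with the content.

```python
import copy
import random
import string
import xml.etree.ElementTree as ET

# Hypothetical sample topic; real DITA topics are far richer than this.
SAMPLE_TOPIC = (
    "<topic id='t1'><title>Sample</title><body>"
    "<p>hello world</p>"
    "<p conref='shared.dita#shared/warning'/>"
    "</body></topic>"
)


def random_prefix(rng, length=4):
    """Return a random lowercase prefix (the length is an arbitrary choice)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def prefix_words(text, prefix):
    """Put the prefix in front of every whitespace-separated word."""
    return " ".join(prefix + word for word in text.split())


def inflate_body(topic_xml, factor, seed=0):
    """Replicate the body content until it appears `factor` times in total."""
    rng = random.Random(seed)
    root = ET.fromstring(topic_xml)
    body = root.find("body")
    originals = list(body)
    for _ in range(factor - 1):
        prefix = random_prefix(rng)  # one prefix per replicated copy
        for child in originals:
            clone = copy.deepcopy(child)
            for node in clone.iter():
                if node.text and node.text.strip():
                    node.text = prefix_words(node.text, prefix)
                if node.tail and node.tail.strip():
                    node.tail = prefix_words(node.tail, prefix)
            body.append(clone)  # attributes (ids, conrefs) are copied as-is
    return ET.tostring(root, encoding="unicode")
```

Inflating the number of files would additionally need the same prefix applied to every id and idref, as the slide notes; that step is omitted here.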
    • Open Toolkit performance (1)
      Processing Time vs. Average File Size: from 5 kB to 50 kB
      [Chart: processing time (seconds and hours) vs. average file size (kB); legend: DITA Accelerator, Open Toolkit]

        AVG SIZE (kB)   Open Toolkit TIME (s)
        5               80
        10              156
        21              428
        31              852
        41              1422
        51              2160
    • Open Toolkit performance (2)
      Processing Time vs. Average File Size: from 5 kB to 500 kB
      [Chart: processing time (seconds and hours) vs. average file size (kB); legend: DITA Accelerator, Open Toolkit]

        AVG SIZE (kB)   Open Toolkit TIME (s)   Open Toolkit TIME (hr)
        5               80                      0.02
        10              156                     0.04
        21              428                     0.12
        31              852                     0.24
        41              1422                    0.40
        51              2160                    0.60
        103             8160                    2.27
        206             33660                   9.35
        309             OUT OF MEMORY
        412             OUT OF MEMORY
        515             OUT OF MEMORY
    • Open Toolkit performance (3)
      Processing Time vs. Number of Files: from 200 to 2,000
      [Chart: processing time (s) vs. number of files; legend: DITA Accelerator, Open Toolkit]

        Number of Files   DITA Open Toolkit TIME (s)
        206               80
        412               144
        824               286
        1236              415
        1648              557
        2060              699
    • Open Toolkit performance (4)
      Processing Time vs. Number of Files: from 200 to 20,000
      [Chart: processing time (s) vs. number of files; legend: DITA Accelerator, Open Toolkit]

        Number of Files   Open Toolkit TIME (s)
        206               80
        412               144
        824               286
        1236              415
        1648              557
        2060              699
        4120              1429
        8240              OUT OF MEMORY
        12360             OUT OF MEMORY
        16480             OUT OF MEMORY
        20600             OUT OF MEMORY
    • Accelerating DITA for Production
      • An alternative to the Toolkit is required
      • Production-level quality
        • No limits on large volumes of content
        • Consistently high throughput speed as volume increases
        • Robust and maintainable
      • Rapid development architecture
        • Out-of-the-box rendering for standard DITA schemas/DTDs
        • Easily customized
      • DITA-aware
        • Built-in support for DITA concepts: Transclusion, Specialization, Filtering
        • No programming tricks required
    • OmniMark DITA Accelerator
      • DITA Accelerator implements HTML publishing
        • Implements all functionality required for language reference
        • HTML support still requires completion
        • PDF to be implemented in the future
      • Behavior is modeled on the Toolkit
        • Automated tests were written to ensure that the output is almost identical
      • The output of the DITA Accelerator is nearly identical to the Open Toolkit
        • [Screenshots: index.html from the Open Toolkit; index.html from the DITA Accelerator]
      • Some small differences remain
        • Table cell borders are inconsistent in some cases
        • Some errors in the DITA toolkit are corrected in the Accelerator
      • High performance is achieved with streaming technology
        • Leverages OmniMark's built-in support for streaming
        • Makes heavy use of referents
      • A DITA-aware library has been implemented
        • Programmers do not have to employ coding tricks
    • Gentlemen, start your engines
      • DITA language reference
        • 206 files
        • 1414 elements with ids (potential link or conref targets)
        • 1468 conref references
        • 741 targets referenced by conref
        • 1.06 MB
        • Average file size 5 kB
      • Initial results are promising
        • DITA Open Toolkit: 1 minute, 21 seconds
        • DITA Accelerator: 18 seconds
        • Speed improvement: 4X
      • What about larger input sets?
    • Comparing DITA Accelerator and Open Toolkit (1)
      Processing Time vs. Average File Size: from 5 kB to 50 kB
      [Chart: processing time (seconds and hours) vs. average file size (kB); legend: DITA Accelerator, Open Toolkit]

        AVG SIZE (kB)   Open Toolkit TIME (s)   DITA Accelerator TIME (s)
        5               80                      18
        10              156                     20
        21              428                     35
        31              852                     41
        41              1422                    46
        51              2160                    57
    • Comparing DITA Accelerator and Open Toolkit (2)
      Processing Time vs. Average File Size: from 5 kB to 500 kB
      [Chart: processing time (seconds and hours) vs. average file size (kB); legend: DITA Accelerator, Open Toolkit; annotations: about 9 hours (Open Toolkit), about 6 minutes (DITA Accelerator)]

        AVG SIZE (kB)   Open Toolkit TIME (s)   DITA Accelerator TIME (s)
        5               80                      18
        10              156                     20
        21              428                     35
        31              852                     41
        41              1422                    46
        51              2160                    57
        103             8160                    86
        206             33660                   150
        309             OUT OF MEMORY           217
        412             OUT OF MEMORY           292
        515             OUT OF MEMORY           369
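To make the gap between the two timing columns concrete, the speedup at each file size can be computed directly from the reported figures. The short Python sketch below uses only the rows where both tools completed; the names TIMINGS and speedup are illustrative choices for this example, not part of the presentation.

```python
# (average file size in kB, Open Toolkit seconds, DITA Accelerator seconds),
# copied from the slide; rows where the Toolkit ran out of memory are omitted.
TIMINGS = [
    (5, 80, 18),
    (10, 156, 20),
    (21, 428, 35),
    (31, 852, 41),
    (41, 1422, 46),
    (51, 2160, 57),
    (103, 8160, 86),
    (206, 33660, 150),
]


def speedup(toolkit_s, accelerator_s):
    """Ratio of Open Toolkit time to DITA Accelerator time."""
    return toolkit_s / accelerator_s


if __name__ == "__main__":
    for size_kb, toolkit_s, accel_s in TIMINGS:
        print(f"{size_kb:>4} kB: {speedup(toolkit_s, accel_s):.1f}x")
```

On these figures the ratio grows from a little over 4x at 5 kB to more than 200x at 206 kB, the last size the Toolkit completed.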
    • Comparing throughput rate as sizes increase
      Processing Throughput as File Size Increases
      [Chart: throughput rate (kB/s) vs. average file size (kB); legend: DITA Accelerator, Open Toolkit]

        AVG SIZE (kB)   Open Toolkit THROUGHPUT (kB/s)   DITA Accelerator THROUGHPUT (kB/s)
        5               13.3                             62
        10              13.6                             104
        21              9.9                              120
        31              7.5                              155
        41              6.0                              184
        51              4.9                              186
        103             2.6                              247
        206             1.3                              283
        309             OUT OF MEMORY                    293
        412             OUT OF MEMORY                    290
        515             OUT OF MEMORY                    287
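The throughput figures appear to be total input size divided by processing time. The Python sketch below checks that reading against the reported timings. It assumes the base corpus is 1060 kB (the 1.06 MB quoted earlier) and that the size steps correspond to inflation factors of 1, 2, 4, 6, 8, 10, 20, 40, 60, 80 and 100; those factors are inferred from the average file sizes and are not stated on the slides.

```python
BASE_CORPUS_KB = 1060  # 1.06 MB language reference, per the slides

# (assumed inflation factor, Open Toolkit seconds or None, Accelerator seconds)
ROWS = [
    (1, 80, 18),
    (2, 156, 20),
    (4, 428, 35),
    (6, 852, 41),
    (8, 1422, 46),
    (10, 2160, 57),
    (20, 8160, 86),
    (40, 33660, 150),
    (60, None, 217),   # None marks an out-of-memory failure
    (80, None, 292),
    (100, None, 369),
]


def throughput_kb_per_s(factor, seconds):
    """Total input size (kB) divided by processing time (s)."""
    return BASE_CORPUS_KB * factor / seconds


if __name__ == "__main__":
    for factor, toolkit_s, accel_s in ROWS:
        toolkit = f"{throughput_kb_per_s(factor, toolkit_s):.1f}" if toolkit_s else "n/a"
        accel = f"{throughput_kb_per_s(factor, accel_s):.0f}"
        print(f"x{factor:<3} toolkit={toolkit:>5} kB/s  accelerator={accel:>4} kB/s")
```

Under these assumptions the computed values track the table closely for the larger sizes; the smallest Accelerator rows differ by a few kB/s, presumably because the reported times are rounded to whole seconds.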
    • Comparing DITA Accelerator and Open Toolkit (3)
      Processing Time vs. Number of Files: from 200 to 20,000
      [Chart: processing time (s) vs. number of files; legend: DITA Accelerator, Open Toolkit]

        Number of Files   Open Toolkit TIME (s)   DITA Accelerator TIME (s)
        206               80                      18
        412               144                     49.5
        824               286                     100.2
        1236              415                     142.4
        1648              557                     193.8
        2060              699                     247.1
        4120              1429                    491.2
        8240              OUT OF MEMORY           1055.2
        12360             OUT OF MEMORY           1601.5
        16480             OUT OF MEMORY           2143.4
        20600             OUT OF MEMORY           2788.5
    • Throughput rate as number of files increases
      [Chart: throughput rate (kB/s) vs. number of files; legend: DITA Accelerator, DITA Open Toolkit]

        Number of Files   Open Toolkit THROUGHPUT (kB/s)   DITA Accelerator THROUGHPUT (kB/s)
        206               13.3                             35
        412               14.7                             35
        824               14.8                             42
        1236              15.3                             47
        1648              15.2                             47
        2060              15.2                             45
        4120              14.8                             43
        8240              OUT OF MEMORY                    42
        12360             OUT OF MEMORY                    42
        16480             OUT OF MEMORY                    41
        20600             OUT OF MEMORY                    39
    • Interpretation of timing statistics
      • DITA Open Toolkit is best for light duty
        • Performance degrades rapidly as file sizes increase
        • Performance is fairly flat as the number of files increases
        • In both sets of tests, the toolkit eventually fails when it runs out of memory
        • A great starting point
      • OmniMark DITA Accelerator is robust and scales well
        • Does not run out of memory
        • Throughput rate is fairly flat in both types of testing
      • DITA can play in demanding production environments
        • Because DITA is a standard, technology can be changed without changing the information architecture
    • Ongoing analysis
      • Tests used DITA Toolkit "out-of-the-box"
        • Different XSLT processors may improve performance
      • Forum discussions suggest a workaround for memory exhaustion
        • Reload XSLT stylesheet on every transformation
        • Currently requires toolkit modification (may be configurable in 1.5)
        • Expect slower performance on smaller topics
      • Even with improvements, best performance will still be quadratic for increasing file sizes
        • [Inset chart: Linear vs. Quadratic]
      • There will be room for improvement for the foreseeable future
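One way to see what "quadratic" means in the measurements is to estimate a local scaling exponent from two adjacent data points: if time grows as size raised to the power k, then k is log(t2/t1) / log(s2/s1). The Python sketch below applies this to the last two file sizes that both tools completed; the function name is an illustrative choice, and this is a back-of-envelope check rather than the analysis behind the slides.

```python
import math


def local_exponent(size1, time1, size2, time2):
    """Estimate k in time ~ size**k from two (size, time) measurements."""
    return math.log(time2 / time1) / math.log(size2 / size1)


# Average file size doubling from 103 kB to 206 kB, times taken from the slides.
toolkit_k = local_exponent(103, 8160, 206, 33660)
accelerator_k = local_exponent(103, 86, 206, 150)

if __name__ == "__main__":
    print(f"Open Toolkit exponent:     {toolkit_k:.2f}")
    print(f"DITA Accelerator exponent: {accelerator_k:.2f}")
```

An exponent near 2 means that doubling the file size roughly quadruples the processing time, while an exponent near or below 1 corresponds to linear or better scaling.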
    • Role of OmniMark
      • Most of the performance is due to engineering "behind the scenes"
        • Native efficiency of OmniMark
        • Streaming architecture reduces memory requirements
        • Record shelves can be used to implement high-speed lookup for DITA processing rules
      • OmniMark referents simplify support for transclusion
        • Referents are a streaming mechanism for reordering content
        • Eliminate complex book-keeping
      • OmniMark language is easily extended
        • Macros
        • Modules (functions and data types)
      • Bonus: SGML support included
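The idea of referents as a mechanism for reordering content can be illustrated with a loose Python analogy: the writer emits a named placeholder where transcluded content belongs, continues with the rest of the output, and supplies the value whenever the target is eventually encountered. The ReferentStream class and its method names below are hypothetical and invented for this sketch; they are not OmniMark's API, and unlike a true streaming implementation this toy version buffers the output in memory.

```python
class ReferentStream:
    """Toy output buffer with named placeholders resolved at the end."""

    def __init__(self):
        self._parts = []    # literal strings or ("ref", name) placeholders
        self._values = {}   # placeholder name -> final text

    def write(self, text):
        self._parts.append(text)

    def write_referent(self, name):
        """Reserve a spot in the output for content that is not known yet."""
        self._parts.append(("ref", name))

    def set_referent(self, name, value):
        """Supply the content for a placeholder, before or after it is written."""
        self._values[name] = value

    def resolve(self):
        """Return the final output with every placeholder filled in."""
        return "".join(
            self._values[part[1]] if isinstance(part, tuple) else part
            for part in self._parts
        )


# A conref that points forward to content defined later in the input.
out = ReferentStream()
out.write("<p>Before. ")
out.write_referent("shared/warning")   # conref target not seen yet
out.write(" After.</p>")
out.set_referent("shared/warning", "<b>Handle with care.</b>")
result = out.resolve()
```

The point of the analogy is that the code handling a conref does not need to locate its target first, which is the book-keeping the slide says referents eliminate.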
    • Usability
      • XSLT supports DITA reluctantly
      • XSLT rule selection mechanism is not DITA-aware
        • Two templates that match the element "u":
            <xsl:template match="*[contains(@class,' hi-d/u ')]">
            <xsl:template match="*[contains(@class,' topic/ph ')]">
        • Both have equal priority
        • Programmer must use tricks to ensure that the "hi-d/u" rule takes precedence over the "topic/ph" rule
        • Extra conditions on the "topic/ph" rule can invert the hierarchy!
      • The spaces around the class names are required
        • And no more than one on each side
        • XSLT does not enforce this
        • Programmer must code carefully to avoid inexplicable behavior
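Why the spaces matter can be shown with plain substring tests, which is all that the XSLT contains() function performs. The Python sketch below mirrors that behavior; matches_class is a name chosen for this example, and the class attribute values follow the usual DITA form of a leading "-" or "+" followed by space-separated module/element tokens and a trailing space.

```python
def matches_class(class_attr, class_name):
    """Substring test equivalent to contains(@class, ' class_name ')."""
    return f" {class_name} " in class_attr


U_CLASS = "+ topic/ph hi-d/u "   # class attribute of a <u> element
PH_CLASS = "- topic/ph "         # class attribute of a plain <ph> element

# A <u> element matches both templates, so something else must decide priority.
u_matches_ph = matches_class(U_CLASS, "topic/ph")
u_matches_u = matches_class(U_CLASS, "hi-d/u")

# Without the surrounding spaces, "topic/p" also matches "topic/ph".
unsafe_match = "topic/p" in PH_CLASS
safe_match = matches_class(PH_CLASS, "topic/p")
```

Here u_matches_ph and u_matches_u are both true, which is the equal-priority conflict the slide describes, and unsafe_match is true while safe_match is false, which is the kind of silent mismatch that omitting the spaces can cause.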
    • OmniMark extensions provide DITA support
      • DITA Accelerator augments OmniMark with "DITA rules"
        • Automatically prioritized according to the specialization hierarchy
        • Rule selection is optimized so that performance stays consistent as more rules are added
        • DITA rules can be grouped into sets, like OmniMark rules
      • DITA rules can be supplied as OmniMark modules
        • Local DITA rules take precedence over imported rules for the same DITA class
        • Module supplies support functions that understand DITA class specialization
    • DITA Accelerator specialization support
      • The syntax of DITA rules is based on OmniMark element rules
      • Element rules specify element names
          element "u"
             output "<u>" || "%c" || "</u>"
      • DITA rules specify classes instead of elements
          declare dita-rule hi-d-u-rule
             class "hi-d/u"
             output "<u>" || dita.process-content || "</u>"
        • Callout on class: selection by DITA class; understands specialization
        • Callout on dita.process-content: processes content, like "%c" in element rules
        • Currently implemented by macros
        • Allows access to full OmniMark language
      • DITA rule for "hi-d/u" will take precedence over "topic/ph"
        • Based on the class specialization in the DTD
      • DITA module also provides utility functions
        • DITA class-based queries for current and ancestor elements
        • Mimics the element tests built into OmniMark
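Rule selection by specialization hierarchy can be sketched in a few lines of Python. In a DITA class attribute the tokens run from the most general ancestor to the most specialized class, so scanning them from right to left and taking the first class that has a registered rule gives the most specific match. The function names and the rule table below are assumptions made for this illustration; they do not describe how the DITA Accelerator is actually implemented.

```python
def class_tokens(class_attr):
    """Split a DITA class attribute, dropping the leading '-' or '+' marker."""
    return [token for token in class_attr.split() if token not in ("-", "+")]


def select_rule(class_attr, rules):
    """Return the rule for the most specialized class that has one, else None."""
    for token in reversed(class_tokens(class_attr)):
        if token in rules:   # dictionary lookup per token
            return rules[token]
    return None


# Hypothetical rule table keyed by DITA class.
RULES = {
    "topic/ph": "generic phrase rule",
    "hi-d/u": "underline rule",
}

u_rule = select_rule("+ topic/ph hi-d/u ", RULES)   # specialized rule wins
b_rule = select_rule("+ topic/ph hi-d/b ", RULES)   # falls back to topic/ph
no_rule = select_rule("- topic/title ", RULES)
```

With this ordering the "hi-d/u" rule takes precedence over "topic/ph" without any priority tricks, and an element whose specialized class has no rule of its own falls back to the rule for its ancestor class.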
    • Conclusions
      • The OmniMark-based DITA Accelerator provides scalability
        • Robust
        • Consistent throughput as volumes increase
        • No catastrophic failures
      • The OmniMark language can be easily extended to provide a natural DITA programming environment
        • Programmers can "think in DITA", rather than trying to align a pre-existing programming model with the DITA semantics
      • Standards are about choice of tools
        • DITA Toolkit is a good choice for
          • Learning DITA
          • Prototyping
          • Less demanding production uses
        • OmniMark DITA Accelerator
          • Demanding production environments
      • Most importantly, tool choice must be governed by the unique characteristics of your environment