Automated Testing DITA Content and Customizations
1. Automated Testing DITA Content and Customizations
Steven Anderson
Information Architect
@sanderson_sfdc
steve.anderson@gmail.com
2. What is testing?
Testing is the process of validating and verifying that your
project:
– meets the requirements that guided its design and development
– works as expected
3. What types of testing are there?
Regression testing
Acceptance testing
Alpha and beta testing
4. How can we meet our test objectives?
Manual testing
Automated testing
5. What is manual testing?
Visual inspection
Click that link!
Expand the table of contents
6. What is automated testing?
Use software to run the tests
Compare actual output to expected output
Report on the state of the content or output
7. Let's learn from software development
Use manual testing only when required
Depend on automated testing
8. What are the limitations of manual testing?
Time consuming
Error prone
Is that an error or not?
9. Why is automated testing a good choice?
It gives you confidence that you haven't made that mistake
– again
It scales
It finds unintended side-effects
11. With DITA there are three things to test
Content (the input)
The OT itself
The output
12. How can you test your content?
Schematron
QA plugin
XMLUnit
13. What is schematron?
A rule based validation language for making assertions
about the presence or absence of patterns in XML trees
Many authoring tools use schematron
http://www.schematron.com/
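To make the idea concrete, here is a rough Python analogue of a Schematron rule, using only the standard library. Real Schematron rules are written in XML and evaluated by your authoring tool or a processor; the sample topic and both rules below are invented for illustration.

```python
# Hypothetical sketch: Schematron-style assertions about the presence
# or absence of patterns in an XML tree, expressed in plain Python.
import xml.etree.ElementTree as ET

SAMPLE_TOPIC = """
<topic id="t1">
  <title>Installing the widget</title>
  <shortdesc></shortdesc>
</topic>
"""

def check_topic(xml_text):
    """Return a list of Schematron-like assertion failures."""
    root = ET.fromstring(xml_text)
    failures = []
    # Rule 1: every topic must have a non-empty <title>.
    title = root.find("title")
    if title is None or not (title.text or "").strip():
        failures.append("topic must contain a non-empty <title>")
    # Rule 2: if <shortdesc> is present, it must not be empty.
    shortdesc = root.find("shortdesc")
    if shortdesc is not None and not (shortdesc.text or "").strip():
        failures.append("<shortdesc> is present but empty")
    return failures

print(check_topic(SAMPLE_TOPIC))
# prints: ['<shortdesc> is present but empty']
```

The empty `<shortdesc>` in the sample trips the second rule; a valid topic returns an empty list, which is the "all clear" an editor can check silently at save time.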
15. What’s the QA plugin?
Created by D.P. Clark and Patrick Quinlan
Identifies errors in DITA tagging, element nesting, language
standards, and common syntax errors
Default tests based on styles in the Microsoft Manual of
Style for Technical Publications
An HTML report is created that includes links to each
project topic file
Tests written in XSLT
http://sourceforge.net/p/qa-plugin-dot/wiki/Home/
19. How can you test the OT?
OT regression test suite
JUnit
20. DITA OT Regression Test
A large number of sample DITA maps and topics
representing a wide variety of content, as well as batch files
to run the content through a build
The output of the test is compared to the output of the
previous working set of code to determine if any tests fail or
if any output changes in a negative way
Mostly useful when customizing the OT, or when
upgrading between OT versions
http://dita.xml.org/wiki/regression-testing-in-the-toolkit
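The core of the regression suite's comparison step can be sketched in a few lines: diff the newly built output against the output of the last known-good build. This is an illustrative stand-in, not the suite itself — the real suite drives its sample content through batch files; directory names here are placeholders.

```python
# Minimal regression-comparison sketch: walk the baseline output and
# produce a unified diff for every HTML file that changed (or vanished).
import difflib
from pathlib import Path

def compare_builds(baseline_dir, current_dir):
    """Return a dict mapping relative file paths to unified diffs."""
    baseline, current = Path(baseline_dir), Path(current_dir)
    diffs = {}
    for base_file in baseline.rglob("*.html"):
        rel = base_file.relative_to(baseline)
        new_file = current / rel
        old = base_file.read_text().splitlines()
        new = new_file.read_text().splitlines() if new_file.exists() else []
        delta = list(difflib.unified_diff(old, new, lineterm=""))
        if delta:
            diffs[str(rel)] = delta
    return diffs
```

With two build directories on disk, `compare_builds("baseline/", "current/")` returns an empty dict when nothing changed — exactly the "green" signal a scheduled job can act on, flagging a human only when the dict is non-empty.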
25. Needle and Nose
Unit testing using Python
Nose is the test framework; Needle does the comparison
For testing HTML output
http://needle.readthedocs.org/en/latest/index.html
* Abstract When publishing DITA content, customization is a given. The format you need for your HTML/PDF/epub, etc., is special, and the default output doesn't work for you. But customization is tricky. What happens when you override a template? Are you sure you know all the side-effects? For anything but the simplest customizations, we need help to be sure the latest tweak doesn't cause a nasty side-effect somewhere else. The same is true of content. What happens when you remove some content from a topic that is used in multiple deliverables? Do you know the side-effects? How about upgrading from one version of DITA to another? How do you verify that your process will still work? Automated testing can help with all of these issues. Anderson explains how automated testing is used in programming, and how he uses automated testing in salesforce.com DITA processes.
Making this more specific for us, as DITA users, we test to ensure that our readers and users are getting usable content. No matter what your process is, you are already testing. Every time you review the output of your content, you are visually testing it to make sure it's okay. Yes, that's pretty obvious, right? For me, though, testing is also a way to manage my fear. Did that change cause anything to break? A little fear can be a good thing; it helps you avoid stupid mistakes. But when fear hurts your productivity, or keeps you from upgrading to a new version of the OT that would really help you and your customers, that's not good fear. Testing helps us manage our fear, allowing us to be more agile and more productive.
There are different ways that testing can be categorized. For DITA, the most logical method we've found is to categorize our tests by test objective.

Regression testing asks the question: did my change break something that was working? Ever run a build in the morning and everything works, then you change something simple, like adding a period, and then rebuild and everything blows up? You pull out your hair trying to figure out what changed. Usually it's not what you did; it's what that *other* writer did, or something the tools guy changed, or cosmic rays. Regression testing helps you avoid situations like that. It tells you, "Hey, this change broke it" or "If you make this change, you'll break this part of the workflow." The latter is actually what you want to hear. You want to do your regression testing *before* you add your changes to your team's workflow. When you do that, you ensure that you aren't causing anyone else problems.

Acceptance testing asks the question: does it work the way we want it to? This assumes your build works. But even if the build worked, things could still be wrong. You want to make sure that you've resolved all of these things before letting your reader get hold of it. Are there any broken links? Missing images? Empty elements? Topics with no content except a title? Completely empty topics? Topics that still contain boilerplate text from the template you use?

Alpha and beta tests are where people actually use your project. That's when you really get a chance to verify: does it work the way they expect it to work?
TODO: Is this the best way to introduce automated testing?
This probably seems obvious, but every time you edit a file or review the output of a build, you are doing manual testing. We technical communicators are picky people. I've had bugs filed against me for a half-point difference in a typeface. The writer found that by looking at the output. But we've also found bugs in links, in the functionality of the site (such as a table of contents that did not expand), and so on.
The biggest difference between manual testing and automated testing is that the only time a person needs to get involved is if there’s a test failure. You can schedule your tests, or start them manually, and, if there are failures, communicate them appropriately. Most automated tests are created based on manual tests. Rather than clicking the link, run an automated test to verify it works.
TODO: Beef this section up. What is the history of testing in software development? Sometimes manual testing is required, but it's limited.
Show sample XML, schematron file, then run the test. Explain how schematron, while automated, is scheduled not based on time, but based on user action (say, at time of save).
Show sample content, a sample test XSLT file, and the output of the test
TODO: Add simple info about what Junit is
TODO: Show example XMLUnit, sample XML content, and a test report. Compare and contrast to QA plugin
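XMLUnit itself is a Java library used with JUnit; pending the full example, here is a rough Python analogue of the same idea, using `unittest` plus C14N canonicalization so that two XML documents compare equal regardless of attribute order or insignificant whitespace. The test names and sample markup are invented.

```python
# Hypothetical XMLUnit-style comparison: canonicalize both documents,
# then compare the canonical forms inside ordinary unit tests.
import unittest
import xml.etree.ElementTree as ET

def xml_equal(a, b):
    """Compare two XML strings after C14N canonicalization."""
    canon = lambda s: ET.canonicalize(s, strip_text=True)
    return canon(a) == canon(b)

class TopicOutputTest(unittest.TestCase):
    def test_equivalent_markup_compares_equal(self):
        self.assertTrue(xml_equal(
            '<p id="a" class="x">Hi</p>',
            '<p class="x" id="a">Hi</p>',  # attribute order differs
        ))

    def test_changed_content_is_detected(self):
        self.assertFalse(xml_equal("<p>Hi</p>", "<p>Bye</p>"))

# Run with: python -m unittest <this file>
```

Unlike the QA plugin, which looks for known style problems, this style of test compares actual output against expected output, so it also catches changes you didn't anticipate.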
Show sample content, and an error report
I'm not going to go in-depth about link tests. There are many tools for doing it, both online and command-line based. There is only one PDF link checker that I'm aware of. TODO: Get URL. You can use XMLUnit for your output the same way you do for the input, so I won't cover that again. You can use it for both XHTML and PDF, by validating the FO before the PDF conversion.
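To show how little code a basic internal link check needs, here is a hypothetical stdlib-only sketch: collect every href in a built HTML page, then verify that relative targets exist on disk. A real link checker would also follow external URLs and verify anchors.

```python
# Bare-bones internal link checker for built HTML output.
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def broken_links(html_file):
    """Return relative hrefs in html_file whose targets do not exist."""
    page = Path(html_file)
    parser = HrefCollector()
    parser.feed(page.read_text())
    broken = []
    for href in parser.hrefs:
        parsed = urlparse(href)
        if parsed.scheme or href.startswith("#"):
            continue  # skip external URLs and same-page anchors
        if not (page.parent / parsed.path).exists():
            broken.append(href)
    return broken
```

Run over every page in the output directory, this replaces the manual "click that link!" pass with a report that only surfaces when something is actually broken.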
These tools use web browsers to verify the output. They can do all kinds of interesting things, including checking links and validating behavior (does the search button work?). One of the great things about these tools is that they experience your output the way your users do.
Simple selenium test run. Show the browser actually running, and the result.
TODO: Should I fold these into selenium/silk? It’s a different way of writing the tests, and needle does the comparison in a way that selenium cannot.