Version Control, Writers,
Using development processes and tools to swim with the
current, not against it
Richard Mateosian STC-SV November 16, 2015
Share Your Experience
Mathematician . . . Developer . . . Marketer . . . 20+ years a tech writer.
Writers andDevelopers Use Similar Processes – SharingTools CanHelp
Topics S/W Components
PDF, help, website
SaaS, download, DVD
Iterate indefinitely between the lines
VCS Repositories Help You Control Files
•Version control system (VCS) repositories (Git, Subversion, Perforce, and others) keep
track of current and old versions of your files.
•Projects keep track of related files: generally the sources for building software
packages or documents. Master and other branches help manage complexity.
•You can check files out, check them in, merge, compare versions.
•Repos integrate with other tools. They handle text best (for diffs and merges), but
they don’t understand contents or do CCM (component content management).
1. Maintain a local copy.
2. Keep it in sync with the master.
3. Make your changes and test them locally.
4. Commit them to the master.
Share Your Experience.
Merging Is Not a Big Deal For Docs
•Merging is putting your changes into a changed master copy.
•Most changes (in text files) don’t conflict, so most VCSs default to optimistic locking –
checking out a file doesn’t prevent others from changing it while you are.
•Every VCS has tools to handle conflicts; if it can’t resolve them, it asks you to do it.
1. Check out the file.
2. Make changes.
3. Check it in.
4. If the VCS sees a conflict, use its tools to resolve it.
Share Your Experience.
VCS Repositories Are Central or
•Every repo enables checking files in and out and keeps history.
•A central repo (Perforce, Subversion) uses a master server. You must be connected to
the master to perform repo operations.
•A distributed repo (Git) replicates the origin repo on your machine. Every machine is
potentially the master. You can perform repo operations while disconnected from the
•Git is powerful but not user friendly.
•GitHub supports using Git for your projects. Many open source projects use it.
Easy Branching Makes Git Special
•Branching has low overhead.
•Branching enables all repo ops, but keeps changes separate.
•Pull request provides review mechanism.
•Pull operation merges changes into main branch.
Pull Requests are Git’s Review Process
•Git branches enable you to develop independently.
•A pull request says, “please pull my branch into the master.”
•GitHub manages review comments and responses.
Git Branching Workflow
1. Create a branch & track it on the origin repo.
2. Make and test changes locally.
3. Sync local with origin & issue a pull request.
4. Submitter and reviewers interact until everyone says LGTM (looks good to me).
5. An authorized user (could be you) merges your branch into the master.
6. You delete your branch.
Share Your Experience.
Aside: DITA for Small (Impecunious)
Eliot Kimber, author of DITA for Practitioners, suggests using
•Git for managing content.
•DITA Open Toolkit for generating output.
•Jenkins (was Hudson) for automating processing.
•oXygenXML for editing and producing output locally.
teams.html, www.dita-for-small-teams.org, www.dita-ot.org, jenkins-ci.org,
Share Your Experience.
Tracking Issues Requires an Issue Tracker
•Software systems and their documentation are complex. Keeping track of details is
•Issue trackers (JIRA, Bugzilla, and dozens of others) keep and organize information
about tasks, defects, and the conversations that arise around them.
•Trackers have different slants: general purpose, software development, customer
tickets, and so forth.
•Many trackers integrate with other tools.
Issue Tracking Workflow
1. Someone reports an issue.
2. People comment.
3. Someone (or some process) assigns the issue to you.
4. You make changes and others review and comment.
5. Designated reviewer closes the issue.
Share Your Experience.
Be Part of the Team
•Understand the process you’re part of – even if you don’t seem to be part of it.
• Inhabit the communication spaces: online chat, project wiki, relevant email lists, issue
tracker, even the code if necessary.
•Understand tribal customs: use the communication spaces the way others use them.
•Take charge of doc review: make it efficient. Don’t waste developers’ time.
Share Your Experience.
Atlassian Git Tutorials: www.atlassian.com/git/tutorials
Perforce tutorials: www.perforce.com/resources/tutorials
Subversion docs: svnbook.red-bean.com
Wikipedia on issue tracking: en.wikipedia.org/wiki/Issue_tracking_system
JIRA docs: confluence.atlassian.com/jira/jira-documentation-1556.html
Bugzilla docs: bmo.readthedocs.org/en/latest
Great book on team communication: Scott Berkun's The Year Without Pants
Questions, comments, or corrections: email@example.com
Share your references.
This is a presentation to the Silicon Valley chapter of the Society for Technical Communication (STC).
I’d like to thank Tom Johnson, the chapter’s manager of programs (and famous blogger in his own right) for inviting me. The presentation is designed for interaction, because there are so many different tools and varied workflows. Reading through the slides should take no more than 15 minutes, so if you sit on your hands and keep your mouths shut, it’s going to be a short meeting. Let me hear your different approaches, or just war stories. Many of the slides end with the line “share your experience.”
Like that, only a little smaller and at the bottom.
I write for an audience of developers. I understand that audience because of my background.
By the time I graduated from high school, I knew I wanted to study mathematics. I did so at RPI and then in graduate school at UC Berkeley. My PhD thesis is titled Simple Algebras and Relative Galois Cohomology. I don’t remember what that means.
Because of a dispute with my first thesis adviser, I got a full time job as a computer programmer while finishing my thesis and stayed with it after I finished.
I spent 15 years as a developer and development manager. I then wrote two computer books and spent a decade doing technical and strategic marketing in the semiconductor industry.
For the last 20+ years, I’ve been deeply immersed in the culture of technical writing, so I understand what writers are trying to do and why they have trouble doing it.
This is a highly simplified view showing the parallels between what technical writers and developers do.
The people who conceive the projects we work on may be visionaries, but when it comes down building something that works more or less as the designers envisioned it, developers do most of the work. What writers do is similar enough that we can leverage development tools to reduce the semantic gap, impedance mismatch, or whatever you want to call the problems that writers have working with developers. To the extent possible we must function as members of the development team. That starts with becoming comfortable in the environment the development team lives in.
Maybe it will be otherwise someday, but files are at the heart of everything we do. Developers store their source code in files. Writers store their text and images in files. We have a whole slew of files to keep track of for each development project, and when we try to support separate versions of a product, complexity grows. VCSs help us keep track of all those versions. Most use projects and branches to simplify things a bit.
VCSs provide a basic set of functions – checking files in and out, merging changes, and comparing versions. More about that later. Sometimes they integrate with other tools (for example, an XML editor).
We have writing techniques that require large numbers of small files that we can mix and match. For a while we can get by on file naming conventions and folder hierarchies, but we soon outgrow that approach and need some sort of metadata-based management system. VCSs don’t provide the necessary support, but can help with keeping track of all the files and versions. If things get too complex, you may need a component content management system.
The general idea with all repos is that they keep a master copy and you have a local copy. You want the local copy to stay in sync with the master, but you don’t want your work to mess up the master irretrievably.
Committing a change is a formal process, and the VCS keeps enough information to undo it. You must document why you changed the committed files.
In some development environments you can’t commit a file without additional verification, like a successful build followed by a battery of tests. Generally, writers don’t need to do that, though if you work in the code – on Javadoc for example – you might run into that. You too can feel the peer disapproval when you break the build!
If different people work on the same file, then they might interfere with each other’s work. One way to prevent this is to allow only one person to work on a file at a time. That’s called pessimistic locking. While you’re working on the file, people can still read the master (which doesn’t have your changes yet), but they can’t modify it. Pessimistic locking can be highly inefficient.
In most cases different people can work on the same file, and their changes don’t conflict. So John checks out a file. Then Mary checks out the same file, modifies it, and checks it back in. John makes his changes, and when he wants to check his version in, that’s where merging comes in.
If you’re working with files (for example FrameMaker or Word) that aren’t purely text based, the VCS typically has no way to merge the two sets of changes, but for text-based files (program code or XML-based docs, for example), it’s usually no problem. The VCS can often do it automatically, and when it can’t, it asks John to look for potential conflicts between his changes and Mary’s and resolve them.
So this is John’s workflow. When he gets to step 3, the VCS notices that his changed file is based on an earlier version, because Mary’s is now the latest. If there is no overlap between his changes and Mary’s he’s in the clear. If there is overlap, the VCS can show John’s and Mary’s versions simultaneously and ask John to resolve the differences.
Most well-known VCSs, except for Git, use a central repository. You must be connected to that repository’s server to do anything.
Git, modeled on BitKeeper, was developed by and under the direction of Linus Torvalds to help manage the huge Linux kernel code base. The distributed aspect of it allows easy backup and avoids the politics of who can write to the master. People can have their own copy and do what they please in it and negotiate about merging their changes into the main line later. Torvalds says he does many merges per day, so he wanted that to be fast.
You get a Git-based project by cloning a repo. When you do so, the repo you cloned from becomes “origin,” though you can have other remotes as well.
Git’s conceptual model and basic entities are not easy to understand, so as with Word, people figure out how to do what they need to and try to stay on that narrow path through the mine field.
GitHub is a website that supports using Git in this way. Many open source projects use Git.
Imagine how many different projects contribute to the ongoing development of the Linux kernel. Without an easy way to start, carry out, and integrate those projects, the overhead would be unbearable. Torvalds designed Git to enable low-overhead branching. Because everything is on your local machine and each branch is just a set of pointers, you can switch between branches in moments. You can change your branch without affecting the master and merge it into the master, again in just a few moments.
You can do what you please in your branch, pull ongoing changes from the master as you go long, and when you want to integrate your branch back into the master, you issue a pull request.
Once you issue a pull request, people turn their attention to your branch. If you track your branch on origin – a simple procedure – people can look at your changes and try them out. This applies to docs as well as to program source code. For example, if your Git repo is for a static website, developers and your fellow writers can build their own local versions, so there is no mystery about what you’re trying to integrate into the main branch. If you host your origin repo on GitHub, it keeps track of review comments and notifies team members of them by email.
So that’s how easy it is to use a branch. The overhead is tiny compared to branches on central repos. The workflow is smooth.
If you’re on a DITA-based project, you have many ways to go, but some of them are pretty pricey. Kimber suggests using Git and the DITA Open Toolkit, both of which are free. Jenkins is also free. Hudson was the butler on Upstairs, Downstairs. He gave his name to a Java-based project designed to automate integration and building. But when Larry Ellison bought the Bellamy family house, Hudson stayed on as his butler, and the Bellamys hired Jenkins to do that job for them in their new home. In more technical terms, the Jenkins project forked from the Hudson project in 2010 when Oracle, the owner of Java, decided to trademark the name Hudson.
The only part of Kimber’s package that you have to pay for is oXygenXML, which is a highly regarded DITA authoring tool. We heard about it here from George Bina in October 2014.
The links on the slide take you to Kimber’s posting and DITA for small teams project and to documentation of the DITA open toolkit, Jenkins, and oXygenXML.
Software is complex. Our docs are complex. And software and docs interact in many ways. An issue tracking system keeps track of all the issues that anyone notices. The most common ones are oriented toward software development. Others – Zendesk, for example – are oriented toward support calls, but they all have the same general goal: keep the information together and ensure that the problem remains in view until it’s resolved.
Issue trackers often integrate with other tools, especially VCSs.
When a software problem arises, we may have to document the bug. When someone fixes a bug, the behavior of the product may change, requiring a change in documentation. Typically, someone creates a sub-issue specifically to track the required doc changes, and that sub-issue may get assigned to you.
So you have a doc issue assigned to you. The workflow is simple, but to finish step 4, you may have to navigate through the tracking system to read comments on other issues. The comments may be written by developers and assume knowledge of program internals. The comments may not be easy to understand. It may be hard to determine which parts of the information are relevant to the audience for your docs. The better your facility with the tracking system and the better your relationships with the developers writing comments or fixing the software, the more likely you are to produce useful documentation.
The team you’re working with functions in an environment. If you’re uncomfortable in the environment, you’ll have a harder time producing good docs.
Understand the process. It’s easy for writers to feel left out. Different development processes have different rhythms, different artifacts, different points of focus. Understanding and being sensitive to the team’s needs and priorities gives you the best chance of getting the help you need to produce good, timely documentation.
Inhabit the communication spaces. Different projects have different communication customs. There may be a mix of online chat, email, and wiki pages. The team may have conventions – perhaps unstated -- for which to use when. Learn and follow those conventions. The references slide has a link to Scott Berkun’s book about how they did it at Automattic, the company behind WordPress. They used IRC and blogs and occasional face-to-face team meetings. It’s a fun read.
Doc review can be problematic. There are many formal systems for it, and some of them have good points you can incorporate, but in the end you have to do what works in your environment. Give careful instructions about what you need – and don’t need – and when you need it. Don’t waste developers’ time, but let them know when their slow responses are blocking you.
Does any of this resonate with your experience?
The first three references deal with VCSs. I’ve looked at them but not thoroughly.
The next three deal with issue tracking. The Wikipedia article is an exhaustive survey. I haven’t worked with Bugzilla, but it’s popular and worth looking into. JIRA is an Atlassian product, and the link is to Atlassian docs.
The Scott Berkun book captures the flavor of working at a totally virtual company. It’s an inspiration to those of us who don’t like sitting in a cubicle all day.
And finally, here’s my email address, in case you have comments or corrections.