Dita Metrics in Production: How, When, Where, and Why (and How Much) Redux

An update to an earlier presentation that talks about DITA Metrics looking at my experiences while at AMD, and looking at production metrics as well as ROI.

  1. DITA METRICS IN PRODUCTION: How, When, Where, and Why (and How Much?)
     Keith Schengili-Roberts, Manager, Documentation & Localization
     June 16, 2011
  2. BIOGRAPHICAL INFORMATION | a.k.a. Who is this Guy?
     Keith Schengili-Roberts
       – Manager for AMD’s Documentation and Localization department for the Professional Graphics division (formerly ATI); formerly its Information Architect (migrated team to DITA)
       – Award-winning lecturer at University of Toronto’s Professional Learning Center since 1999, teaching courses on information architecture and content management
       – Author of four titles on Internet technologies; last title was Core CSS, 2nd Edition (2001)
       – Co-Founder of the Semiconductor DITA Implementer’s Group (SDIG) that holds monthly free talks/webinars on using DITA for our domain: http://tech.groups.yahoo.com/group/semidita
       – DITA blog at: www.ditawriter.com
     2 | DITA Metrics in Production: How, When, Where and Why (and How Much)? | June 16, 2011 | Public
  3. WHAT MY GROUP DOES
     Documentation & Localization Group at AMD’s Graphics Product Group (GPG)
       – Formerly ATI, based in Markham, Ontario
       – 4 writers, 1½ process engineers, 2 localization co-ordinators, 1 manager
       – CMS: DITA CMS from Ixiasoft (www.ixiasoft.com)
       – Have been using DITA in production for over four years
     Responsible for:
       – End-user documentation, including online help
       – Engineering documentation for ODM/OEM partners
       – Technical training documentation for partners
       – Localize in up to 25 languages (mostly end-user and UI)
       – Primary outputs are PDF and XHTML
  4. DITA METRICS AND R.O.I.
     Most typical context in which one hears about metrics in relation to DITA:
       – Scenario: documentation team is trying to justify expense of a DITA-based Content Management System to upper management. The two key costing arguments are:
          – Higher re-use rates = more efficient process → faster delivery/lower costs
          – Separation of form from content = more efficient work process, greatly reduced localization costs
       – If estimated $ R.O.I. > $ CMS and attendant costs, get CMS
  5. PROVING THE PRODUCTIVITY R.O.I. ARGUMENT
     How?
       – Take metrics before switching over to CMS in order to set a baseline
       – Here are comparative numbers from 2007, covering two quarters immediately pre- and post-CMS implementation:
          – Demonstrates that using the DITA CMS made us faster, and that we more than doubled output (docs were comparable)
  6. PROVING THE PRODUCTIVITY R.O.I. ARGUMENT (CONT.)
     What’s happened since 2007?
  7. PROVING THE REDUCED LOCALIZATION COST ARGUMENT
     Track localization budget and compare to actual amount spent; any positive difference goes towards R.O.I.
       – [Chart, y-axis in $; annotations: “Bad Old Days”, CMS ROI, Content audit + Single-sourcing]
       – Blue line = localization budget for year, Red line = actual localization spend
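The budget-versus-actual comparison on slide 7 can be sketched as a small calculation. This is only an illustration: the function names and every dollar figure below are hypothetical, not AMD data, and it counts localization savings only (the productivity gains from the earlier slides are left out).

```python
def localization_savings(budget, actual):
    """Positive difference between budget and actual spend; overspend counts as zero."""
    return max(0.0, budget - actual)


def payback_year(yearly_figures, cms_cost):
    """Return the first year in which cumulative savings cover cms_cost, else None."""
    total = 0.0
    for year, budget, actual in yearly_figures:
        total += localization_savings(budget, actual)
        if total >= cms_cost:
            return year
    return None


# Hypothetical (year, budget, actual spend) rows and a hypothetical CMS cost.
figures = [
    (2007, 500000.0, 450000.0),
    (2008, 500000.0, 380000.0),
    (2009, 500000.0, 350000.0),
]
print(payback_year(figures, cms_cost=150000.0))
```

With these made-up numbers the cumulative savings pass the CMS cost in the second year; real figures would come from the localization budget and invoices.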
  8. R.O.I. PROVEN, NOW WHAT?
     In our case the Ixiasoft DITA CMS had demonstrably paid for itself in terms of increased efficiency and reduced localization cost by the end of the second year
       – Any additional savings from that point on are “gravy”
     Metrics now take on a related but different purpose:
       – Focus is now on relative production and quality using the CMS; or in other words:
          – “how do we track what we are doing?”
          – “how do we do what we do better?”
          – “what is the ongoing cost of production?”
  9. DITA + CMS = ABILITY TO MEASURE OUTPUTS
     DITA CMS has capability to process large amounts of data in its repository, so managers can answer the following questions that relate to production in a DITA-based environment:
       – What is the average size of a topic?
       – How much content was created?
       – How much does it cost to create a topic?
       – What is the topic reuse rate?
       – What DITA tags are most/least used? Are we using them efficiently?
       – What is the ratio of topic types used?
       – How much of the same document type have we produced year over year?
  10. WHAT NOT TO MEASURE | DITA Makes these More Problematic
     Technical Writing metrics that are of questionable use:
       – Hours or days to produce a page
          – What exactly is a “page” in XML?
          – Different output formats change outcome (double-spaced output anyone?)
       – Documents released per writer
          – Differences in size (Datasheets vs. Databooks)
          – Documents with high reuse rates vs. low reuse rates
     Measure aggregate values rather than individual effort
       – Otherwise you run the risk of getting more of what you measure
  11. GOOD BACKGROUND MATERIAL ON DOC METRICS
     Pre-DITA, good general overviews:
       – Documentation Metrics: What Do You Really Want to Measure, by Donald S. Le Vie Jr. http://www.stc.org/intercom/PDFs/2000/200012_06-09.pdf
       – Practical and Effective Metrics, by Geoffrey J.S. Hart http://www.stc.org/PDF_Files/myjob/geofferyHart.pdf
     DITA-specific:
       – What is the Best Metric to Measure the Success of Your Reuse of DITA Topics? by Bill Hackos http://www.infomanagementcenter.com/enewsletter/200806/third.htm
       – DITA Metrics: Cost Metrics, by Mark A. Lewis http://dita.xml.org/resource/dita-metrics-cost-metrics (look for PDF link)
       – The Illusive, Writing Productivity Metric: Making Unit Cost a Competitive Advantage, by Mike Eleder (“Best Practices”, Feb 2011)
  12. DITA TOPICS AS “ATOMIC” UNIT OF MEASURE
     Each DITA topic can be thought of as an “atomic” unit of production; while they vary in size, that difference can be measured and averaged in a meaningful way
       – Different topic types have their own characteristics; graph shows average, max, min, and count for concepts, references and tasks in our repository
       – References average = 2.6 pages of lorem ipsum text in MS Word; they are roughly twice the size of concepts and tasks
  13. ONE EXAMPLE: TOPIC PRODUCTION PER QUARTER
     Tracked items:
       – # of topics created per author
       – # of topics modified per author
     Results then collated, showing amount of work being done in DITA CMS
     Sample search:
       – Authoring cycle selected (current work), all document types selected
       – Select a date range for topic creation, then select a person
  14. TOPIC PRODUCTION PER QUARTER | One View of the Resulting Data
     Collated results show all topic production over a quarter
       – [Chart, y-axis: # of topics]
       – Results show ongoing work necessary to meet product release cycle
          – An effective tool with upper management; shows productivity and predicted trends based on previous cycles
  15. TOPIC PRODUCTION PER QUARTER | Another View of the Resulting Data
     Can further break this down to show how much groups are contributing:
       – [Chart, y-axis: # of topics]
       – Same total, but this shows relative contribution per group
          – Uses document type/author to group topic outputs
  16. COST-PER-TOPIC | Identifying the cost of production
     Idea came from “Best Practices” article by Mike Eleder (“The Illusive, Writing Productivity Metric: Making Unit Cost a Competitive Advantage”)
     Basically:
       cost per topic = (monthly tech writer team cost) / (topics produced monthly)
     Is a unit cost measure, producing monthly average cost for producing topics
       – Can estimate the cost of future work based on result
       – Original article suggests using only published topics; I use topics created + modified (i.e. using the same figures used in previous two charts) as those numbers are effectively normalized and better reflect ongoing work
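As a worked example of the unit-cost formula on slide 16, here is a minimal sketch that uses topics created plus topics modified as the denominator, as described there. The function name and the monthly figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cost_per_topic(monthly_team_cost, topics_created, topics_modified):
    """Unit cost: monthly team cost divided by topics created plus modified."""
    topics = topics_created + topics_modified
    if topics == 0:
        raise ValueError("no topics produced this month")
    return monthly_team_cost / topics


# Hypothetical (team cost, created, modified) figures per month; not AMD data.
months = {
    "quiet month": (30000.0, 400, 600),
    "busy month": (30000.0, 700, 800),
}
for label, (cost, created, modified) in months.items():
    print(label, round(cost_per_topic(cost, created, modified), 2))
```

The team cost is the same in both months, so the busier month comes out cheaper per topic, which is the effect the monthly charts that follow describe.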
  17. COST-PER-TOPIC | Example chart
     [Chart, y-axis in $]
     Initial version doesn’t provide a clear picture; cost varies based on production figures (dips in August of both years represent busy times: more topics created in a given timeframe = lower cost per topic)
  18. COST-PER-TOPIC | Example chart with trend line added
     [Chart, y-axis in $]
     The trend line records the average cost-per-topic over the time measured
     This shows an overall downward trend: Avg $28 per topic in 2009, and $25 per topic in 2010
  19. COST-PER-TOPIC | Example chart based on published data
     [Chart, y-axis in $]
     This graph shows what is published (i.e. what goes out the door) each month; the release cycle is more variable than our production schedule, but it does show the costs of publishing content (High = $81.60, Low = $6.61, Avg = $17.79)
     The previous topics created/modified charts provide a more accurate result, as they reflect producing topic content as a running cost
  20. COST-PER-LOCALIZED-TOPIC | Example chart
     [Chart, y-axis in $]
     This looks at the cost per localized topic; compares total localized topics produced against actual localization costs invoiced for that month
     Invoicing cycles don’t always match our localization cycles, so same caveat applies here as with published doc
     Removing outliers, results are interesting: High = $28.15, Min = $0.02, Avg = $5.25
  21. REUSE RATES | What to Measure?
     On the surface, this seems like a simple question to answer: if a topic is used again, that equals 100% reuse, doesn’t it?
       – True, but that’s not the whole picture; topics are often copied (cloned) and then only a small portion of the new topic is changed (a sentence is added, a numerical value is changed, etc.)
       – So to get a true value for reuse, you need to go past the topic level
     Localization tools provide us with a usable option: segments
  22. SEGMENTS AS “QUARKS” OF MEASUREMENT FOR REUSE
     If topics are the “atomic” unit of production measurement, then segments are “sub-atomic”
       – Topic reuse can take place at the whole topic level, but more often it happens at the sub-topic level, as topics are copied and then modified
       – Segmentation is a concept from the localization realm: a segment is a sentence or (more usually) a portion of a sentence
          – Defined by segmentation rules, which ensure that you are always comparing like to like
          – Localization tools look for changes (the “delta”) at this level
       – Data within our CMS enables us to look at this
       – If you don’t have a CMS, try a localization workbench
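To make the segment idea on slide 22 concrete, here is a rough sketch of measuring reuse for a cloned topic at the segment level. It is an assumption-laden stand-in: the sentence-splitting rule is deliberately naive (real localization tools apply proper segmentation rules), and the function names and sample text are hypothetical.

```python
import re


def segments(text):
    """Split text into sentence-level segments using a deliberately naive rule."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(part.split()) for part in parts if part.strip()]


def segment_reuse_rate(new_topic_text, known_segments):
    """Fraction of segments in the new topic that already exist elsewhere."""
    segs = segments(new_topic_text)
    if not segs:
        return 0.0
    reused = sum(1 for seg in segs if seg in known_segments)
    return reused / len(segs)


# A topic is cloned and one numerical value is changed.
original = "Insert the card into the slot. Tighten the screw. Connect the 6-pin power cable."
cloned = "Insert the card into the slot. Tighten the screw. Connect the 8-pin power cable."

known = set(segments(original))
rate = segment_reuse_rate(cloned, known)
print(f"{rate:.0%} of segments reused")
```

At the topic level the clone counts as a new topic with no reuse; at the segment level two of its three segments are reused.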
  23. SAMPLE OF SEGMENTATION AS APPLIED TO REUSE
     [Chart, axes: % and # of topics]
     Average monthly topic reuse rate for this time period = 53.53%
  24. OTHER QUESTIONS RELATED TO PRODUCTIVITY IMPROVEMENTS
     Constraints added to DITA 1.2 spec allow you to limit the number of tags you use
       – This is useful, because it helps enforce a “house style” of DITA usage, where authors can more readily find the tags that they need to use
       – But how do you usefully determine what you should and should not keep?
          – If you have an existing base of content, examine how often tags are used and then set a threshold below which tags are excluded
  25. WORDCLOUD SHOWING RELATIVE TAG USAGE
       – Produced by doing a search on the frequency of use of each DITA tag (e.g. # of times <note> is used in all topics) in our repository, and then using a WordCloud app (wordle.com) to produce the results
       – In this case it is not the “big” words in which we are interested, but the smallest, which ought to be excluded from ongoing use
  26. EXAMINING THE OUTLIERS
     In this case we are looking at the tags that are the least used
       – These are candidates for exclusion in the constraint file
       – This type of search does not reveal tags with a count = 0; these would need to be identified and excluded as well
     Note that at 73 tags total, we are using well under half of all DITA tags available
     Based on final results, an effective DITA 1.2 constraint can be created
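The tag-frequency search behind slides 25 and 26 was run inside the CMS; outside a CMS, something similar can be approximated with a short script. The sketch below is an assumption, not the method used in the talk: it parses topics with Python's standard xml.etree.ElementTree, and the function names, sample topics and threshold are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_tags(topic_xml_strings):
    """Count how often each element name occurs across all topics."""
    counts = Counter()
    for xml_text in topic_xml_strings:
        root = ET.fromstring(xml_text)
        for element in root.iter():
            counts[element.tag] += 1
    return counts


def rarely_used(counts, threshold):
    """Tags used fewer than `threshold` times: candidates for exclusion."""
    return sorted(tag for tag, n in counts.items() if n < threshold)


# Three tiny hypothetical topics standing in for a repository.
topics = [
    "<concept id='c1'><title>Overview</title><conbody><p>Text</p><note>Careful</note></conbody></concept>",
    "<task id='t1'><title>Install</title><taskbody><steps><step><cmd>Do it</cmd></step></steps></taskbody></task>",
    "<concept id='c2'><title>More</title><conbody><p>A</p><p>B</p></conbody></concept>",
]
counts = count_tags(topics)
print(counts.most_common(3))
print(rarely_used(counts, threshold=2))
```

As slide 26 points out, a count like this never lists tags that are used zero times; those have to be found by comparing the result against the full list of tags the DTD allows.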
  27. TRACKING TOPIC TYPE USAGE | Tracking History
     Year-over-year comparisons of topic types used provide an idea as to the composition of a “typical” DITA-based document
       – While I expected a high percentage of reference topics, I wondered whether there were more topics that ought to be tasks which were instead done as references
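One way to get the topic-type breakdown described on slide 27 without a CMS is to look at each topic's root element. This is a hypothetical sketch that assumes unspecialized concept, task and reference topics held as XML strings; the function name and sample data are made up.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def topic_type_shares(topic_xml_strings):
    """Percentage of topics per root element name (concept, task, reference...)."""
    kinds = Counter(ET.fromstring(xml_text).tag for xml_text in topic_xml_strings)
    total = sum(kinds.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in kinds.items()}


# Hypothetical repository contents for one year.
year_topics = [
    "<reference id='r1'><title>Registers</title></reference>",
    "<reference id='r2'><title>Pinout</title></reference>",
    "<task id='t1'><title>Install the driver</title></task>",
    "<concept id='c1'><title>Overview</title></concept>",
]
print(topic_type_shares(year_topics))
```

Running the same count on each year's topics gives the year-over-year comparison.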
  28. TRACKING TOPIC TYPE USAGE | Making Change
     Asked writers to be more diligent about writing task topics where they might be tempted to write them as references instead
       – Result was a measurable increase in the percentage of task topics created over the course of the year
       – A larger number of our documents now include a procedural, step-by-step approach to the subject, which is beneficial to the end-user
  29. COMPARING LIKE PRODUCTION YEAR OVER YEAR
     Comparing the number of datasheets vs. databooks is not useful, but comparing how many of each are made over several years is
       – Provides insight into relative production efficiency
       – [Chart: # of docs produced per year]
  30. LOOKING MORE CLOSELY AT YEARLY PRODUCTION
     This was created by outputting a list of all maps published within the year, and then manually sorting them by type of publication
  31. BREAKING PRODUCTION NUMBERS DOWN FURTHER
  32. PRODUCTION AND LOCALIZATION NUMBERS
     The following was produced by outputting a list of all non-English maps created within the year, and then sorting (and counting) by language
  33. COMPARING ENGLISH TO LOCALIZED PUBLISHED MAPS
     A year-over-year comparison of published map output
  34. OTHER POSSIBILITIES | Looking at Quality
     Another avenue we are looking at is automated quality metrics
       – Are actively investigating the Flesch Reading Ease metric
          – It follows a simple formula, which could be applied to the content of any topic
          – We do a lot of engineering documentation, so low scores (indicating words using many syllables aimed at those with more than a basic education) are expected
          – However, topics containing basic information (such as most concepts) that score especially low may serve as a guide to the writer to re-examine a topic to make it easier to read
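For reference, the standard Flesch Reading Ease formula is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher scores mean easier reading. The sketch below applies it to plain text. The syllable counter is a crude vowel-group heuristic (an assumption; production tools use dictionaries or better rules), and the function names and sample sentences are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


simple = "The card fits in the slot. Push it down."
dense = ("Initialization of the asynchronous peripheral interconnect "
         "necessitates considerable configuration.")

print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

To apply this to a DITA topic, the text would first have to be extracted from the XML (for example by joining the text nodes), which is left out here.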
  35. OTHER POSSIBILITIES | Localization
     There are further gains to be had on the localization side: Ixiasoft (DITA CMS) and XML Int’l (XTM Suite) have integrated the CMS with a translation memory (TM) suite that will allow us to use TM within CMS
       – Currently use segmentation matching, producing 100% matches that still need to be verified; with integrated TM, only new content will be sent to localization vendor
  36. SLIGHT TANGENT #1: CREATING A DITA STYLE GUIDE
     A recommendation for any tech docs group that uses DITA extensively:
       – Helps new writers/contributors come up to speed
       – Usefully narrows the scope of the XSL work that needs to be done
       – Many things are “legal” in DITA but may be poor from a “house style” standpoint, for example:
          – Can have unformatted block content between a header and a table in a section
          – Tables and figures do not have to have a title
          – Can have unlimited nested lists
          – Alpha lists can contain more than 26 items
          – Lists can contain only a single item
  37. SCHEMATRON CAN HELP ENFORCE DITA STYLE
     What is Schematron? “Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees.” (Wikipedia)
     We use Schematron to point out to the writers potential errors/lapses in our DITA House Style:
       – Text between a section and table not wrapped in block tags [screenshot]
       – A list ought to have more than one item [screenshot]
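The actual rules on slide 37 are written in Schematron and are not reproduced in this transcript. As a stand-in, the “a list ought to have more than one item” check can be sketched in Python with the standard xml.etree.ElementTree module; the function name and sample topic are hypothetical, and this is not the group's real rule.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = ("ul", "ol")


def single_item_lists(topic_xml):
    """Return the tag name of every list that contains exactly one <li>."""
    root = ET.fromstring(topic_xml)
    flagged = []
    for element in root.iter():
        if element.tag in LIST_TAGS and len(element.findall("li")) == 1:
            flagged.append(element.tag)
    return flagged


topic = """<concept id="c1"><title>Ports</title><conbody>
<ul><li>Only item</li></ul>
<ol><li>First</li><li>Second</li></ol>
</conbody></concept>"""

print(single_item_lists(topic))
```

A Schematron rule would express the same assertion declaratively and report it to the writer in the editor; the script version is only useful for batch checks over exported topics.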
  38. XSL CAN ALSO HELP ENFORCE DITA HOUSE STYLE
     We have a DITA house style that says nested lists should be no more than two levels deep.
       – Here’s Schematron doing its job [screenshot]
       – And here is the result if you try to output it [screenshot]
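The two-level nesting rule on slide 38 is enforced with Schematron and XSL in the talk; neither is shown in this transcript. The following Python sketch is an assumed equivalent that measures list nesting depth with xml.etree.ElementTree. The function names, the default limit parameter and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = ("ul", "ol")


def max_list_depth(element, depth=0):
    """Return the deepest level of list nesting at or below this element."""
    if element.tag in LIST_TAGS:
        depth += 1
    deepest = depth
    for child in element:
        deepest = max(deepest, max_list_depth(child, depth))
    return deepest


def violates_nesting_rule(topic_xml, limit=2):
    """True when lists are nested more than `limit` levels deep."""
    return max_list_depth(ET.fromstring(topic_xml)) > limit


ok = "<conbody><ul><li>A<ul><li>B</li></ul></li></ul></conbody>"
too_deep = ("<conbody><ul><li>A<ol><li>B<ul><li>C</li></ul></li></ol></li></ul>"
            "</conbody>")

print(violates_nesting_rule(ok), violates_nesting_rule(too_deep))
```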
  39. SLIGHT TANGENT #2: AUTOMATION EFFORTS
     General rule: if a process can be automated, do so.
     We have a dedicated programmer who builds tools and automation processes for our group.
     Sample tools built include:
       – “Table-izer”, app that creates DITA XML table from clipboard
       – Automated T.O.C.
       – DITA Topic Editor, a tag-less editor aimed at SMEs
       – Automated index based on a list of keywords
       – Change-bar mechanism that diffs output
       – Dynamic trademark attribution
  40. QUESTIONS AND ANSWERS
  41. Trademark Attribution
     AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
     ©2011 Advanced Micro Devices, Inc. All rights reserved.