1. I Know What You Wrote Last Summer
Using Cumulative Sum for Voice Unification in Authoring
Confidential SDL Information
2. Jonathan Slaughter – Business Consultant
SDL International
jslaughter@sdl.com
@JRSlaughterSDL
3. Today’s agenda
Overview
• What is it?
• History of Writer Analysis
Cumulative Sum
• Early origins
• Current uses
How it Works
• Creating a Voice
• Analysis of Authors
• Unification
Applicability and ROI
• Impact
• Where does it make sense?
• When is it “overkill?”
Examples
• Charts
• Customers
Q&A
5. What is it?
The Cumulative Sum technique is a
recognition system applied to human
utterance, whether written or spoken. The
application of this system is commonly
called “QSUM.”
Two-stage analysis based on:
1) analyzing sequences of language
units (the normal unit is the sentence), and
2) counting recurrent kinds of language
use within each sentence
Based on “quantitative stylistics” – the
use of mathematical models as a basis for
examining the periodic, or recurrent, nature
of language.
Literary “scholarship” versus “criticism”
6. Brief history
1859 – Augustus de Morgan, professor of mathematics at London
University, first suggests using the number of words and the average word
length of all the Epistles to confirm or refute Paul's authorship of Hebrews.
1938 – Cambridge statistician G. Udny Yule develops the first formal
word-length index format, focusing on word distribution within each
sentence and across the document.
1960s – four major statistical studies around authorship:
1962 – Alvar Ellegard’s examination of the Junius Letters
1964 – Mosteller and Wallace’s study of the Federalist papers
1967 – Louis Milic’s analysis of Jonathan Swift’s prose
1966 – Morton and McLeman’s work on the Pauline Epistles
1988 – Andrew Morton incorporates cumulative sum tests, commonly
used in industrial settings, within the study of human utterance.
1990 – QSUM techniques and graphs used in a court case to
attribute/refute ownership of a confession during appeal. Further
uses in court followed.
2005* – First uses of QSUM techniques to unify multiple authors’
“voices” to a single “voice.”
7. How does this fit into business?
Global organizations are taking significant steps to
improve/reduce the costs of creating and distributing content to
their end-users. Examples include:
Minimalism
Global Authoring Practices/Training
Workforce Globalization
Content Management Systems
Authoring Tools
What none of these tools and processes does is create a truly
“homogeneous” voice for authored content.
Content management systems optimize re-use (consistency) but assume the source content is of acceptable
quality
Global authoring and Minimalism teach “practices” but fail to address the effect of combined
voices in re-used content
Voice Unification is a “next” step for organizations looking to
establish optimal ROI on process and technological investments.
Good investment where “brand image” and “brand communication” are central to company
success
Impact on technical material can vary, based upon target markets
Recommended to clients centralizing source content development in organizations grown
primarily through acquisition (loose integration) or significant shifts in development strategy.
8. How to create/define your voice?
Understanding what your
company “voice” sounds like
is important. There are three
common methods:
Voice Creation
Mean Voice Alteration
Select Voice Modification
Each provides similar benefits,
but the best option depends on
a number of factors, including:
Content types
Number of voices
Audience expectations
Content re-use
9. Factors used to define
Cusum analysis aims to compare two aspects of habitual
language use within a given text, segment of text, or combination
of texts:
Length – the number of words in each sentence written or uttered by the person providing the
sample.
• Cusum is the running sum of the deviations in length (more or less) of the sentences from the
average sentence length. This produces the sld (sentence length distribution) chart.
Habit – a feature of language use within each sentence. Most commonly used are the number of
two- and three-letter words (23lw) and initial vowel words (ivw).
• Cusum of habit takes the average number of these words per sentence and tracks the running
sum of each sentence's deviation from that average.
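The two cusum series described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tooling used in the presentation: the function names (`split_sentences`, `cusum`, `qsum_series`) are hypothetical, the sentence splitter is deliberately naive, and 23lw and ivw are counted as two separate habits, which is an assumption about how the counts are combined.

```python
import re

VOWELS = "aeiou"

def split_sentences(text):
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words(sentence):
    # Alphabetic tokens, allowing one internal apostrophe (e.g. "don't").
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)

def cusum(values):
    # Running sum of each value's deviation from the mean of all values.
    mean = sum(values) / len(values)
    running, series = 0.0, []
    for v in values:
        running += v - mean
        series.append(running)
    return series

def qsum_series(text):
    # One word list per sentence; sentences with no words are skipped.
    sentences = [w for w in (words(s) for s in split_sentences(text)) if w]
    lengths = [len(w) for w in sentences]
    count_23lw = [sum(1 for x in w if len(x) in (2, 3)) for w in sentences]
    count_ivw = [sum(1 for x in w if x[0].lower() in VOWELS) for w in sentences]
    return {
        "sld": cusum(lengths),      # sentence length cusum
        "23lw": cusum(count_23lw),  # two- and three-letter word habit
        "ivw": cusum(count_ivw),    # initial vowel word habit
    }

series = qsum_series("I am here. The cat sat on an old mat. Go.")
print(series["sld"])
print(series["23lw"])
```

Because each series sums deviations from its own mean, it always returns to zero at the last sentence; the shape of the path in between is what the charts compare.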
QSUM charts can then be created combining the graphs of both
aspects in overlaid format.
Provides a visual comparison of the two aspects
The closer the two charts (demonstrated on the next slides) are, the more “homogeneous” the
voices are, and the more likely it is that the text was written by the same person.
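One way to put a number on "how close" two overlaid charts are is sketched below. The slides rely on visual comparison and do not name a distance measure, so both the peak scaling and the mean absolute gap used here are assumptions, and `chart_gap` is a hypothetical helper.

```python
def cusum(values):
    # Running sum of each value's deviation from the mean of all values.
    if not values:
        raise ValueError("need at least one sentence")
    mean = sum(values) / len(values)
    running, series = 0.0, []
    for v in values:
        running += v - mean
        series.append(running)
    return series

def scale(series):
    # Rescale so the largest swing is 1, letting charts in different units overlay.
    peak = max(abs(v) for v in series)
    return [v / peak for v in series] if peak else list(series)

def chart_gap(lengths, habits):
    # Mean absolute distance between the two scaled cusum charts (assumed metric).
    if len(lengths) != len(habits):
        raise ValueError("both series need one value per sentence")
    a, b = scale(cusum(lengths)), scale(cusum(habits))
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Per-sentence word counts and habit counts for a short, made-up sample.
print(chart_gap([10, 20, 30], [1, 2, 3]))  # habit tracks length closely
print(chart_gap([10, 20, 30], [3, 2, 1]))  # habit moves against length
```

A gap near zero means the habit chart follows the sentence length chart closely; larger values mean the two lines separate. Any threshold for "same voice" would have to be calibrated on real samples.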
Voice Unification is a difficult process and requires conscious
content creation.