I Know What You Wrote Last Summer
Using Cumulative Sum for Voice Unification in Authoring

Confidential SDL Information
Jonathan Slaughter – Business Consultant
SDL International
jslaughter@sdl.com
@JRSlaughterSDL
Today’s agenda

  Overview
  • What is it?
  • History of Writer Analysis
  Cumulative Sum
  • Early origins
  • Current uses
  How it Works
  • Creating a Voice
  • Analysis of Authors
  • Unification
  Applicability and ROI
  • Impact
  • Where does it make sense?
  • When is it “overkill?”
  Examples
  • Charts
  • Customers
  Q&A
A “voice” is distinct
What is it?

              The Cumulative Sum technique is a
              recognition system applied to human
              utterance, whether written or spoken. The
              application of this system is commonly
              called “QSUM.”
              Two-stage analysis based on:
                  1) analyzing sequences of language
                  units (normal unit is the sentence) and,
                  2) counts of recurrent kinds of language-
                  use within each sentence
              Based on “quantitative stylistics” – the
              use of mathematical models as a basis for
              examining the periodic, or recurrent, nature
              of language.
              Literary “scholarship” versus “criticism”
Brief history
 1859 – Augustus de Morgan, professor of mathematics at London
 University, first suggests using the number of words and average word
 length of all the Epistles to confirm or deny Paul’s authorship of Hebrews.
 1938 – Cambridge statistician G. Udny Yule develops the first formal
 word-length index format, focusing on word distribution within each
 sentence and across the document.
 1960s – four major statistical studies of authorship:
     1962 – Alvar Ellegard’s examination of the Junius Letters
     1964 – Mosteller and Wallace’s study of the Federalist Papers
     1966 – Morton and McLeman’s work on the Pauline Epistles
     1967 – Louis Milic’s analysis of Jonathan Swift’s prose
 1988 – Andrew Morton incorporates cumulative sum tests, commonly
 used in industrial settings, into the study of human utterance.
 1990 – QSUM techniques and graphs are used in a court case to
 attribute or refute ownership of a confession during an appeal, followed by
 further uses in the courts.
 2005* – First uses of QSUM techniques to unify multiple authors’
 “voices” into a single “voice.”
How does this fit into business?

 Global organizations are taking significant steps to improve, and
 reduce the cost of, creating and distributing content to
 their end users. Examples include:
     Minimalism
     Global Authoring Practices/Training
     Workforce Globalization
     Content Management Systems
     Authoring Tools
 What none of these tools and processes do is create a truly
 “homogeneous” voice for authored content.
     Content management systems optimize re-use (consistency) but assume the source content is of
     acceptable quality
     Global authoring and Minimalism teach “practices” but fail to address the effect of combined
     voices in re-used content
 Voice Unification is a “next” step for organizations looking to
 establish optimal ROI on process and technological investments.
     Good investment where “brand image” and “brand communication” are central to company
     success
     Impact on technical material can vary, based upon target markets
     Recommended to clients centralizing source content development in organizations grown
     primarily through acquisition (loose integration) or significant shifts in development strategy.
How to create/define your voice?

 Understanding what your
 company’s “voice” sounds like
 is important. There are three
 common methods:
     Voice Creation
     Mean Voice Alteration
     Select Voice Modification
 Each provides similar benefits,
 but the best option depends on
 a number of factors, including:
     Content types
     Number of voices
     Audience expectations
     Content re-use
Factors used to define

 Cusum analysis aims to compare two aspects of habitual
 language use within a given text, segment of text, or combination
 of texts:
     Length – the number of words in each sentence written or uttered by the person providing the
     sample.
       • Cusum is the running sum of the deviations in length – more or less – of the sentences from the
         average sentence length. Produces the sld (sentence length distribution) chart.
     Habit – a feature of language use within each sentence. Most commonly used are the number of
     two- and three-letter words (23lw) and initial-vowel words (ivw).
       • Cusum of habit takes the average count of these per sentence and tracks the running
         deviation from this average.
 QSUM charts can then be created by overlaying the graphs of both
 aspects.
     Provides a visual comparison of the two aspects
     The closer the two charts (demonstrated on the next slides) are, the more “homogeneous” the
     voice is – the more likely the text was written by the same person.
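 The two cusum series described above can be sketched in a few lines of Python. This is a
 minimal illustration, not the tooling used in the talk. The sentence splitter is deliberately
 naive, the habit is assumed to be the combined 23lw + ivw count (a word matching both is
 counted once), and chart_gap is a hypothetical closeness measure (mean gap between the two
 series after scaling each to a peak of 1) standing in for the visual overlay.

```python
import re

VOWELS = set("aeiou")

def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(sentence):
    """Extract alphabetic tokens (apostrophes kept) from one sentence."""
    return re.findall(r"[A-Za-z']+", sentence)

def is_habit_word(word):
    """True for a two- or three-letter word (23lw) or an initial-vowel word (ivw)."""
    return len(word) in (2, 3) or word[0].lower() in VOWELS

def cusum(values):
    """Running sum of each value's deviation from the mean of all values."""
    mean = sum(values) / len(values)
    total, series = 0.0, []
    for v in values:
        total += v - mean
        series.append(total)
    return series

def qsum_series(text):
    """Return (sentence-length cusum, habit cusum) for a text sample."""
    sentences = [words(s) for s in split_sentences(text)]
    sentences = [w for w in sentences if w]          # drop empty sentences
    lengths = [len(w) for w in sentences]
    habits = [sum(is_habit_word(x) for x in w) for w in sentences]
    return cusum(lengths), cusum(habits)

def scaled(series):
    """Scale a series so its largest absolute value is 1 (all zeros stay zero)."""
    peak = max(abs(x) for x in series)
    return [x / peak if peak else 0.0 for x in series]

def chart_gap(a, b):
    """Hypothetical closeness measure: mean absolute gap between two scaled series."""
    sa, sb = scaled(a), scaled(b)
    return sum(abs(x - y) for x, y in zip(sa, sb)) / len(sa)

if __name__ == "__main__":
    sample = ("It is an old cat. The dog ran. "
              "We saw a big elephant in the zoo today.")
    sld, habit = qsum_series(sample)
    print("sentence-length cusum:", sld)
    print("habit cusum:", habit)
    print("gap between charts:", chart_gap(sld, habit))
```

 A smaller gap means the two overlaid lines track each other more closely. Real samples need
 far more sentences than this toy text for the charts to be meaningful.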
 Voice Unification is a difficult process and requires conscious
 content creation.
Wrap Up Q&A


SDL LavaCon 2011 – Jonathan Slaughter
