Experimenting with the TextTiling Algorithm
Summary of the work done by master's students at Université Toulouse Le Mirail
Adam C., Andreani V., Bengsston J., Bouchara N., Choucavy L.,
Delpech E., El Maarouf I., Fontan L., Gotlik W.
Experimenting with the TextTiling algorithm
Part I: What is the TextTiling algorithm?
Part II: Experiments with the TextTiling algorithm
Part III: Demo
Part I:
What is the TextTiling algorithm?
 « an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflects the subtopic structure of the texts »
 developed by Marti Hearst (1997): « TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages », Computational Linguistics, March 1997.
http://www.ischool.berkeley.edu/~hearst/tiling-about.html
Why segment a text into multi-paragraph units?
 Computational tasks that use arbitrary windows might benefit from windows with motivated boundaries
 Easier reading of long online texts (reading-assistant tools)
 IR: retrieving relevant passages instead of whole documents
 Summarization: extracting sentences according to their position in the subtopic structure
What is the hypothesis behind TextTiling?
 « TextTiling assumes that a set of lexical items is in use during the course of a given subtopic discussion, and when that subtopic changes, a significant proportion of the vocabulary changes as well »
 TextTiling does not detect subtopics per se, but shifts in topic, by means of changes in vocabulary
 It performs a linear segmentation (no hierarchy)
Detection of topic shift
[Pipeline diagram: raw text → tokenisation → segmentation into pseudo-sentences (20 tokens) → similarity scores, block A vs block B]
 a similarity score is computed at every pseudo-sentence gap, between 2 blocks of 6 pseudo-sentences
 the more vocabulary the two blocks have in common, the higher the score
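As a sketch in Python (the original implementation was in Perl), the pipeline above — tokenise, cut into 20-token pseudo-sentences, score each gap between two 6-sentence blocks — might look like this; the function names and regex tokeniser are illustrative, not the authors' code:

```python
import re
from collections import Counter
from math import sqrt

def pseudo_sentences(text, w=20):
    """Tokenise and cut into fixed-length pseudo-sentences of w tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]

def cosine(a, b):
    """Cosine similarity of two token-frequency Counters."""
    num = sum(f * b[t] for t, f in a.items())
    den = sqrt(sum(f * f for f in a.values())) * sqrt(sum(f * f for f in b.values()))
    return num / den if den else 0.0

def gap_scores(sentences, k=6):
    """Similarity at every gap: the block of k pseudo-sentences before
    the gap vs the block of k pseudo-sentences after it."""
    scores = []
    for gap in range(1, len(sentences)):
        left = Counter(t for s in sentences[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in sentences[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores
```

On a toy text whose vocabulary switches halfway through, the score drops to exactly 0 at the gap where the switch happens, which is the signal the algorithm exploits.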
I. Detection of topic shift
[Plot: similarity score (0–1) against pseudo-sentence number]
 a gap means there is a drop in vocabulary similarity
 topic shifts occur at the deepest gaps (after smoothing)
 tile boundaries will be adjusted to the nearest paragraph break
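The gap-picking step can be sketched as a depth score at each point of the curve. This is a simplification: Hearst additionally smooths the curve and uses a cutoff based on the mean and standard deviation of depth scores, whereas the sketch below simply takes the n deepest gaps, measuring depth against the highest score on each side rather than the nearest peak:

```python
def depth_scores(scores):
    """Depth of each point: distance climbed from the score up to the highest
    score on its left, plus up to the highest on its right.
    Deep valleys correspond to sharp vocabulary shifts."""
    return [
        (max(scores[:i + 1]) - s) + (max(scores[i:]) - s)
        for i, s in enumerate(scores)
    ]

def boundaries(scores, n):
    """Gap indices of the n deepest valleys, i.e. the proposed tile boundaries."""
    d = depth_scores(scores)
    return sorted(sorted(range(len(d)), key=d.__getitem__, reverse=True)[:n])
```

In the full algorithm each selected gap would then be snapped to the nearest paragraph break, as the slide notes.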
Evaluation by Hearst (1997)
 Evaluation on 12 magazine articles annotated by 7 judges
 Judges were asked « to mark the paragraph boundary at which the topic changed »
 In case of disagreement among judges, a boundary is kept if at least 3 judges agree on it
 Agreement among judges (kappa measure): kappa = 0.647
Evaluation by Hearst (1997)

                     Precision   Recall
TextTiler            0.66        0.61
Judges               0.81        0.71
Baseline (random)    0.43        0.42

Works well on long (1800+ words) expository texts with little structural demarcation
Part II: Experiments with the TextTiling algorithm
 Work done by master's students, Université Toulouse Le Mirail
 Implementation in Perl
 Experiments:
 cross annotation of 3 texts
 variation of linguistic parameters and computation parameters
Annotation of topic boundaries
 No clear-cut topic shifts, rather 'regions' of shift; annotators felt a smaller unit (the sentence) would have been more convenient
 Our kappa: 0.56
 Hearst's judges: 0.65
 Kappa should be at least 0.67; above 0.8 is best
 A difficult (unnatural?) task for humans
Variation of linguistic parameters
[Bar chart: precision, recall and F-measure for three configurations: basic, trigrams, and lemmatization (TreeTagger*)]
* http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Variation of computation parameters
 Computation window:
 pseudo-sentence length
 block length
 Smoothing
[Three plots of similarity-score curves under different parameter settings]
Size of computation window
(rows: pseudo-sentence length; columns: block length)

      2    4    6    8    10   12   14   16   18   20
5     ++   +++  ++   ++   ++   ++   ++   ++   ++   ++
10    ++   ++   ++   +    +    ++   +    +    +    +
15    ++   +    +    +    +    +    +    -    -    -
20    +    +    +    -    -    -    -    -    -    --
25    +    +    -    -    -    -    -    --   --   --
30    +    -    -    -    -    --   --   --   --   --
35    +    -    -    -    -    --   --   --   --   --
40    --   --   --   --   --   --   --   --   --   --
Correlation window size / smoothing

window size (number of tokens)   10   20   30   40   50
smoothing iterations              3    3    1    1    1
smoothing width                   2    1    2    2    1

 Correlation between window size and smoothing: the smaller the window, the more smoothing is needed
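The smoothing varied above can be sketched as an iterated moving average, with the two parameters from the table: `width` neighbours averaged on each side, repeated `iterations` times (illustrative code, not the authors' Perl implementation):

```python
def smooth(scores, width=2, iterations=1):
    """Iterated moving average: replace each score by the mean of the window
    of `width` neighbours on each side, `iterations` times over."""
    for _ in range(iterations):
        scores = [
            sum(scores[max(0, i - width):i + width + 1])
            / len(scores[max(0, i - width):i + width + 1])
            for i in range(len(scores))
        ]
    return scores
```

Each pass flattens small local wiggles, so only the deep valleys (real topic shifts) survive; a small window produces a noisier curve, hence the observed need for more smoothing.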
Optimal parameter set

         Nb      Nb     words /  sentences  tokens /   smooth.     smooth.
         parag.  words  parag.   / block    sentence   iterations  width
Text 1   12      2000   167      6          5          3           2
Text 2   22      2400   109      6          10         1           1
Text 3   37      1750   20       8          10         1           1

 One optimal parameter set per text
 Does the optimal set vary according to text/paragraph length?
Final thoughts
 Linguistic processing:
 lemmatization doesn't significantly improve TextTiling
 what about stemming?
 Computation parameters:
 parameters are highly interdependent
 the optimal parameter set varies from text to text
 Proposal: an adaptive TextTiler?
 window size could be adapted to intrinsic qualities of the text
 smoothing could then be adapted to window size
Part III:
Demo
Similarity score – Hearst (1997)

sim(b1, b2) = Σ_t (w_{t,b1} · w_{t,b2}) / √( Σ_t w²_{t,b1} · Σ_t w²_{t,b2} )

b1: block 1
b2: block 2
t: token
w_{t,b}: weight (frequency) of token t in block b
back
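Rendered in code, this is ordinary cosine similarity over term-frequency vectors; a minimal sketch with raw frequency as the weight w, blocks given as {token: frequency} dicts:

```python
from math import sqrt

def sim(b1, b2):
    """Cosine similarity between two blocks, each a {token: frequency} dict."""
    num = sum(w * b2.get(t, 0) for t, w in b1.items())
    den = sqrt(sum(w * w for w in b1.values())) * sqrt(sum(w * w for w in b2.values()))
    return num / den if den else 0.0
```

Identical frequency profiles score 1.0, blocks with no shared vocabulary score 0.0, and the score is insensitive to block length since only the direction of the frequency vector matters.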
Kappa measure
http://www.musc.edu/dc/icrebm/kappa.html

                     Annot 1
                     yes       no        TOTAL
Annot 2   yes        40        35        Y2 = 75
          no         5         20        N2 = 25
          TOTAL      Y1 = 45   N1 = 55   T = 100

Observed agreement: P(A) = (40 + 20) / 100 = 0.6
Expected agreement: P(E) = (Y1·Y2 + N1·N2) / T² = 0.475
Kappa = (P(A) – P(E)) / (1 – P(E)) = 0.24
back
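The worked example above can be checked with a few lines of Python (a sketch; the argument names are ours):

```python
def kappa(both_yes, a2yes_a1no, a2no_a1yes, both_no):
    """Cohen's kappa from the 2x2 agreement table of two annotators."""
    t = both_yes + a2yes_a1no + a2no_a1yes + both_no
    p_a = (both_yes + both_no) / t          # observed agreement P(A)
    y1 = both_yes + a2no_a1yes              # Annot1 'yes' marginal
    y2 = both_yes + a2yes_a1no              # Annot2 'yes' marginal
    n1, n2 = t - y1, t - y2                 # 'no' marginals
    p_e = (y1 * y2 + n1 * n2) / t ** 2      # chance agreement P(E)
    return (p_a - p_e) / (1 - p_e)

# The table above: both yes = 40, Annot2 yes / Annot1 no = 35,
# Annot2 no / Annot1 yes = 5, both no = 20
print(round(kappa(40, 35, 5, 20), 2))  # → 0.24
```

Kappa discounts the agreement two annotators would reach by chance alone, which is why the raw 60% agreement here collapses to a modest 0.24.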
