Centograqs

CENTOGRAQ
Empirical Study of Distribution of Letters in English Words: A
Special Case of Centograqs
Omisile Kehinde Olugbenga

Background
• One of my colleagues posted on WhatsApp that if we assign 1 to A, 2 to B, … 26 to Z, we could weight English words with percent as the unit of measurement.
• He went on to sum the letter values for words like
‘HARDWORK’ = 98 percent
‘KNOWLEDGE’ = 96 percent
‘LOVE’ = 54 percent
‘LUCK’ = 47 percent
and ended up posting that ‘ATTITUDE’ sums to 100 percent!

So…
• How many words in the English language, when weighted, would sum to 100%?

I set to work… and this was my first attempt

Objectives
1. The number of English words whose letters actually sum to a weight of 100 percent, and their weighted-average length.
2. Average length of centograqs.
3. The distribution of initial letters (first letters) and terminal letters (last letters).
4. The distribution of letters used to form long words, that is, words longer than nine letters.
5. The composite distribution of letters in terms of the most common letter used in constructing words.
6. Distribution of words with unique letters (isograms), that is, words whose letters are all unique.
7. Distribution of repeated letters within words and twin bigrams/digraphs.

What is a centograq?
• There was a need for a single name for words whose weighted sum is 100%, so I came up with the colloquial word: centograq.
• Hence, a centograq is a word whose letter-by-letter weighted sum is 100 percent, where the weights are the letters’ positions in the English alphabet.
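The definition can be sketched in a few lines of Python. This is only an illustration of the weighting rule, not code from the study; the function names `word_weight` and `is_centograq` are made up here, and skipping non-letter characters is an assumption.

```python
import string

def word_weight(word: str) -> int:
    """Sum the alphabet positions (a=1 ... z=26) of the letters in `word`."""
    letters = string.ascii_lowercase
    # Assumption: characters outside a-z (hyphens, apostrophes) are skipped.
    return sum(letters.index(ch) + 1 for ch in word.lower() if ch in letters)

def is_centograq(word: str) -> bool:
    """A centograq is a word whose letter weights sum to exactly 100."""
    return word_weight(word) == 100

for w in ["HARDWORK", "KNOWLEDGE", "LOVE", "LUCK", "ATTITUDE"]:
    print(w, word_weight(w), is_centograq(w))
```

The five words are the ones quoted in the Background slide, so the printed weights can be compared against those figures.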

How many words are centograqs?
• On the website www.gist.github.com, one of the participants, Peter Magenheimer, had written Python code to extract text strings from the English dictionary, evaluate their sums, and return only those that summed to 100 percent. Using his Python code, he was able to extract 2302 centograqs.
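The extraction step might look roughly like the sketch below. It is not the gist's actual code, which is not reproduced in these slides; `find_centograqs`, `letter_sum`, and the small `sample` list are hypothetical, and restricting the search to purely alphabetic ASCII entries is an assumption.

```python
from typing import Iterable, List

def letter_sum(word: str) -> int:
    """Sum of alphabet positions, a=1 ... z=26, for an alphabetic ASCII word."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower())

def find_centograqs(words: Iterable[str], target: int = 100) -> List[str]:
    """Return the entries whose letter sum equals `target`."""
    found = []
    for raw in words:
        word = raw.strip()  # drop newlines if the words come from a file
        # Assumption: entries with digits, hyphens or apostrophes are ignored.
        if word.isascii() and word.isalpha() and letter_sum(word) == target:
            found.append(word)
    return found

# Hypothetical stand-in for a dictionary word list.
sample = ["attitude", "love", "luck", "tousy", "hardwork", "Batrachoididae\n", "don't"]
print(find_centograqs(sample))
```

With a real word list, `words` could be an open text file with one word per line; the count obtained would depend on which dictionary is used.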

METHODS
Steps in the analysis of centograqs

Step 1
• The imported data was set up in an Excel Table named ‘Dict’, and the column holding the words was named [Word].
Excel Tables

Length
[Length]
=LEN(Dict[Word])
The average length of centograqs

Letter extraction
[1st]
{=IFERROR(
CHAR(
CODE(
LOWER(
MID([@Word],[@Length]-([@Length]-COLUMNS($E2:E2)),1)
)
)
)
,0)}
The maximum word length was calculated first; hence fifteen contiguous columns were set up to ‘spread’ each word, one letter per column.
An array formula was used for the spread, and positions beyond the end of a word return 0.
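For readers more comfortable outside Excel, the spread can be mimicked as below. This is a sketch, not the author's workbook logic; the function name `spread` is hypothetical, and the default width of 14 is an assumption taken from the [1st]:[14th] column references used in the later formulas.

```python
def spread(word: str, width: int = 14) -> list:
    """Return one lowercase letter per cell, padded with 0 up to `width` cells."""
    letters = list(word.lower())[:width]  # anything past `width` is cut off
    # Like IFERROR(..., 0): cells beyond the end of the word hold 0.
    return letters + [0] * (width - len(letters))

print(spread("Tousy"))
print(spread("Batrachoididae"))
```

Padding with 0 rather than an empty string mirrors the fallback value in the IFERROR formula.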

Confirm Weight
[Weight]
{=SUMPRODUCT(
MATCH(Dict[@[1st]:[14th]],alpha,0)
)-14}
Sum the letter weights to confirm that every word weighs 100%. The same array formula (entered with Ctrl+Shift+Enter, CSE) could be adapted for any word.
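One plausible reading of the "-14" in the formula: if the named range alpha (not shown in the slides) lists the filler 0 first and then a to z, MATCH returns one more than each letter's alphabet position and 1 for each filler cell, so 14 cells overshoot the true weight by exactly 14. The sketch below works under that assumption; `ALPHA`, `match`, and `weight_from_cells` are hypothetical names.

```python
import string

# Assumed layout of the named range 'alpha': filler 0 first, then a..z.
ALPHA = [0] + list(string.ascii_lowercase)

def match(value, lookup):
    """1-based exact-match position, like Excel's MATCH(value, lookup, 0)."""
    return lookup.index(value) + 1

def weight_from_cells(cells):
    """Sum the MATCH positions and remove the +1 offset carried by every cell."""
    return sum(match(c, ALPHA) for c in cells) - len(cells)

cells = list("attitude") + [0] * 6  # 14 cells: 8 letters plus 6 fillers
print(weight_from_cells(cells))
```

Here `len(cells)` plays the role of the constant 14 in the spreadsheet formula.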

Initial and Terminal Letters
Initial letters are already available in the column [1st].
Terminal letters are extracted using the formula below.
[Last]
=RIGHT([Word],1)
Frequency of Letters
[Frequency]
=COUNTIF(Dict[[1st]:[14th]],[@Letter])
I also counted the frequency of occurrence of each letter. This was set up in a separate table, with ‘Dict’ as its precedent.
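A rough Python counterpart of the COUNTIF table is a single counter over all letters of all words. The helper name `letter_frequency` is hypothetical, and the sample below is only the six shortest centograqs from this deck, so the counts do not reproduce the study's totals.

```python
from collections import Counter

def letter_frequency(words):
    """Count every alphabetic character across all words, case-insensitively."""
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if ch.isalpha())
    return counts

sample = ["tousy", "struv", "totty", "buzzy", "nutty", "pussy"]
freq = letter_frequency(sample)
print(freq.most_common(5))
```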

Unique Letters in Each Word
[Unique]
{=SUMPRODUCT(
1/COUNTIF(Dict[@[1st]:[14th]],Dict[@[1st]:[14th]])
)}
How many distinct letters form each word? (CSE)
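A set-based sketch of the same idea follows. The function names are hypothetical, and defining the number of duplications as word length minus distinct letters is an assumption about how the later duplication table was built. Note that a distinct count taken over padded cells would also count the filler 0 once, so this sketch works on the word itself.

```python
def distinct_letters(word: str) -> int:
    """Number of different letters used in the word."""
    return len(set(word.lower()))

def duplications(word: str) -> int:
    """Letters beyond the first occurrence of each distinct letter."""
    return len(word) - distinct_letters(word)

def is_isogram(word: str) -> bool:
    """True when no letter is repeated anywhere in the word."""
    return duplications(word) == 0

for w in ["attitude", "totty", "lubricant"]:
    print(w, distinct_letters(w), duplications(w), is_isogram(w))
```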

Other Analyses
The remaining results were obtained using pivot tables.

Number of centograqs
There are 2302 English words whose weighted sum is 100 percent.

Average length of Centograqs
Mean   SD    Minimum   Maximum   25th Percentile   Median   75th Percentile
8.8    1.4   5         14        8                 9        10
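These summary figures can be cross-checked from the per-length word totals reported in the "Number of Words with Unique Letters" table later in this deck. The sketch below uses Python's standard `statistics` module (`quantiles` needs Python 3.8 or later); using the sample standard deviation is an assumption, though the population version rounds to the same value.

```python
import statistics

# Word counts per length, taken from the Total column of the duplication table.
length_counts = {5: 6, 6: 84, 7: 316, 8: 638, 9: 613,
                 10: 389, 11: 189, 12: 52, 13: 13, 14: 2}

# Expand the counts into one entry per word.
lengths = [length for length, n in length_counts.items() for _ in range(n)]

mean = statistics.mean(lengths)
sd = statistics.stdev(lengths)  # sample standard deviation (assumption)
q1, median, q3 = statistics.quantiles(lengths, n=4)

print(len(lengths), round(mean, 1), round(sd, 1))
print(min(lengths), max(lengths), q1, median, q3)
```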

Distribution of Lengths of Centograqs

Shortest and Longest Centograqs
Shortest
• Tousy
• Struv
• Totty
• Buzzy
• Nutty, and
• Pussy
Longest
•Batrachoididae
•Biddulphiaceae

Most common letters
Showing the top seven and the bottom seven.
Letter   Frequency of Occurrence
E        2225
A        1852
I        1826
R        1462
O        1363
N        1292
L        1267
K        196
W        185
V        168
Z        68
X        55
J        32
Q        28

Comparison of most common letters in English words between centograqs, free texts, and email text
Centograqs   E A I R O N L T S C U D M P H
Text¹        E T A O I N S R H L D C U M F
Email²       E T O A I N S R H L D U C M P

Top 7 Initial and Terminal Letters
Rank   Initial   Count   Terminal   Count
1      s         250     e          489
2      p         224     y          229
3      c         183     n          187
4      a         179     d          185
5      u         166     r          181
6      m         132     s          168
7      t         126     l          156

Number of Words with Unique Letters
Length     Number of Duplications
of Word    0     1     2     3     4     5     6     Total
5          2     3     1     -     -     -     -     6
6          36    36    12    -     -     -     -     84
7          121   154   36    5     -     -     -     316
8          140   298   162   36    2     -     -     638
9          105   228   197   74    9     -     -     613
10         14    103   159   88    23    2     -     389
11         -     21    56    73    32    7     -     189
12         -     -     9     16    17    9     1     52
13         -     -     -     1     4     3     5     13
14         -     -     -     -     -     2     -     2
Total      418   843   632   293   87    23    6     2302

Examples of Centograqs that are isograms
baculiform fatherling lageniform neoblastic
pelargonic plumbagine impugnable
anhydremic unmiracled coislander
Colubrinae conjugated athrogenic
Purbeckian syndicate dysphemia
cystidean cymophane exophasic
sulfamine subrepand muraenoid lubricant
Juncoides guildsman bufotalin steckling
staminode asyndetic trembling tranceful

Comparison between All Centograqs and Unique Centograqs
Position Class         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Initial
  Unique centograqs    S  P  U  T  C  A  D  O  M  B  G  E  V  N  F
  All centograqs       S  P  C  A  U  M  T  D  B  R  G  E  H  F  O
Terminal
  Unique centograqs    Y  E  N  D  R  T  L  M  S  C  G  P  A  H  W
  All centograqs       E  Y  N  D  R  S  L  T  A  C  M  G  H  P  O
Common Letter
  Unique centograqs    I  R  E  A  O  T  N  S  L  U  Y  P  C  H  M
  All centograqs       E  A  I  R  O  N  L  T  S  C  U  D  M  P  H

Twin Bigrams
• We also attempted to study the pattern of twin bigrams, that is, the same letter repeated twice in succession within a word (for example, the TT in ATTITUDE).
• Of the 2302 centograqs, 484 (21%) have at least one twin bigram, and 28 words have exactly two twin bigrams; no word has more than two.
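Detecting twin bigrams amounts to comparing each letter with its successor. The sketch below is illustrative; `twin_bigram_positions` is a hypothetical helper, and reporting the 1-based position of the first letter of each pair is an assumption, since the slides do not state which position convention their chart uses. A run of three identical letters would be reported as two overlapping pairs.

```python
def twin_bigram_positions(word: str) -> list:
    """Return the 1-based position of the first letter of each doubled pair."""
    w = word.lower()
    return [i + 1 for i in range(len(w) - 1) if w[i] == w[i + 1]]

def twin_bigram_count(word: str) -> int:
    """Number of doubled-letter pairs found in the word."""
    return len(twin_bigram_positions(word))

for w in ["attitude", "roommate", "tousy"]:
    print(w, twin_bigram_positions(w), twin_bigram_count(w))
```

Counting the words with `twin_bigram_count(w) >= 1` over a full centograq list would give the kind of proportion quoted above.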

Centograqs with Twin Bigrams
Accumulate, Adiabatically, Adulthood, Annually, Attitude,
Awfully, Blissful, Bootmaker, Boycott, Chimpanzee,
Clerically, Clinically, Coatroom, Congress, Corridor,
Connivance, Conniver, Coyness, Diagonally, Dispeller,
Dooryard, Drizzle, Excellent, Ferryman, Flurry, Forefoot,
Immature, Inapplicable, Inefficient, Innovate, Intellect,
Interbreed, Irritate, Largeness, Likelihood, Outtalk,
Pathless, Plummet, Proofing, Pussy, Reshuffle,
Roommate, Schoolman, Session, Shooter, Spooler,
Stress, Swimmer, Swollen, Unfreeze, Unwilled, Useless

Position of Twin Bigrams in Centograqs
Position   Number of Words
2          72
3          120
4          68
5          62
6          63
7          45
8          36
9          15
10         2
11         1

Comparison between the Twin Bigrams in Centograqs and Google English Corpus
Centograqs: LL SS OO EE RR TT MM NN PP FF GG CC DD BB ZZ AA KK
No twin bigram: H I J Q U V W X Y
Google*: LL SS EE OO TT FF PP RR MM CC NN DD GG II BB AA ZZ XX UU HH
No twin bigram: Q J W K Y V
*Rick Wicklin, 2014

A TOUCH OF
CRYPTOGRAPHY
S WGWRN BN
CWJDKDTPEBT

Vkrhu jhf zry lirasmt sabyk cidwdrkeuk
(‘Thank you for reading about centograqs’).
