This document provides guidelines for segmenting and transcribing audio documents. It outlines rules for segmenting utterances into words according to orthographic conventions. It also provides detailed transcription guidelines, such as transcribing verbatim without correcting errors, following standard capitalization and spelling, and using symbols to annotate noises, partial words, and uncertain transcriptions. The document emphasizes transcribing exactly what is said to avoid changing pronunciations or abbreviating words. It concludes with instructions for saving files with synchronized names and metadata.
1. Segmentation
1.1. Utterances should be segmented into words
following the orthographic conventions of the language.
E.g.
Keessaatti > keessa + isaatti
Irrakeessatti > irra + keessa + isaatti
Gamana > gama + kana (not needed)
Obboleessakeetti > obboleessa + keetti
Oldeeme > ol + deeme
Olkeesuu > ol + kaasuu
Gabbaye > gadi + baye
Gaqqabi > gadi + qabi
Deemeera > deemee + jira (as it is)
Deemuuf > (as it is)
TolaafiBantii > Tolaa + fi + Bantii
Ofjaalata > Of + jaalata
1.2. No utterance can be longer than 15 seconds.
1.3. Independent words should be separated.
1.4. More than one dependent word should not be
affixed.
1.5. Meaningless lexical elements should not stand
alone.
1.6. Whenever possible, choose a segmentation that
maintains the phrase structure of the conversation.
1.7. Treat a stretch of silence that has small-amplitude
noises embedded in it as a silence-only
utterance.
1.8. Do not mark such low-amplitude noises and do not
segment them into separate utterances.
1.9. However, if a noise has a particularly high amplitude,
then segment it into its own utterance.
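Rule 1.2 (no utterance longer than 15 seconds) lends itself to a mechanical check. The following is a minimal sketch, assuming segments are available as start and end times in seconds; the `Segment` class and the `overlong_segments` helper are hypothetical names used only for illustration and are not part of any annotation tool.

```python
from dataclasses import dataclass

MAX_UTTERANCE_SECONDS = 15.0  # limit stated in rule 1.2


@dataclass
class Segment:
    """Hypothetical container for one segmented utterance."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcription of the utterance


def overlong_segments(segments, limit=MAX_UTTERANCE_SECONDS):
    """Return the segments whose duration exceeds the limit."""
    return [s for s in segments if (s.end - s.start) > limit]


# Illustrative data only; the times are made up.
segments = [
    Segment(0.0, 4.2, "ol deeme"),
    Segment(4.2, 21.0, "gadi qabi"),
]
for seg in overlong_segments(segments):
    print(f"too long ({seg.end - seg.start:.1f} s): {seg.text}")
```

Segments reported by such a check would still need to be re-split by hand, following the phrase-structure preference in rule 1.6.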
2. Transcription
2.1. Transcribe “verbatim,” without correcting
grammatical errors: e.g. keep “Inni deemte” as spoken.
2.2. Do not change an individual pronunciation into the
‘standard pronunciation’: e.g. do not change “keeysa” to “keessa”.
2.3. Follow the dictionary on hyphenating compounds in
clear-cut cases. But “when in doubt, leave them out.”
e.g. bu’aa-ba’ii
2.4. Compound words: All compound words should be
transcribed as one word when such a word exists in the
dictionary unless there is an acoustical pause between
the two words. e.g. “gamana”, “garana”, “ofirra”, etc.
2.5. Try to avoid word abbreviations: write “dhibbentaa”, not
“%”.
2.6. Contractions are allowed, e.g. “deemeera”, “kale”.
2.7. Capitalization: Use normal capitalization on
proper nouns. Do not capitalize the beginning of
the sentence.
2.8. No punctuation should be used in the
transcriptions.
2.9. Remember to watch for common spelling
confusions like: its and it’s, they’re, there and their,
by and bye, to and too, etc.
2.10. Numbers: Spell out all number sequences
except in cases such as “123” or “101” where the
numbers have a specific meaning.
2.11. Transcribe years like 1983 as spoken:
“nineteen eighty three.”
2.12. Do not use hyphens (“twenty eight”, not
“twenty-eight”).
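The number rules above can be illustrated with a small helper that spells out numbers without hyphens. This is only a sketch for English number words: the function names are hypothetical, and the year reading assumes the speaker actually said the year as two pairs of digits, so the output must always be checked against the audio.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
]


def two_digits(n):
    """Spell out 0-99 with a space instead of a hyphen (rule 2.12)."""
    if not 0 <= n <= 99:
        raise ValueError("expected a number from 0 to 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def year_as_pairs(year):
    """Spell a four-digit year as two pairs of digits (rule 2.11).

    Assumption: the speaker used the pair reading. Years whose last
    two digits are 00-09 are rejected because their spoken form varies.
    """
    first, second = divmod(year, 100)
    if not 10 <= first <= 99 or second < 10:
        raise ValueError("unsupported year for the pair reading")
    return f"{two_digits(first)} {two_digits(second)}"


print(two_digits(28))
print(year_as_pairs(1983))
```

Because the guideline is to transcribe what was spoken, a helper like this is best used to normalise spelling after the annotator has decided which reading was said, not to decide the reading itself.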
2.12. Letter sequences: Spell out letter sequences such as DFW, USA,
FBI, NASA, ROM.
2.13. Possessives: Use standard grammar rules to denote
possession: the US’s policy, Sally’s book, the drivers’ cars,
the CEO’s decision, the dancers’ shoes.
2.13. If a speaker does not completely pronounce a word and
the word is not a standard reduction, spell out as much
of the word as is pronounced, and spell out inside brackets
the part of the word that was not pronounced.
2.14. To flag that a partial word was spoken, use a
single dash after the brackets if the last part of
the word was not pronounced, and a single dash
before the brackets if the first part of the word
was not pronounced.
2.15. Context should be used to determine what word
was intended to be spoken. If, from context, a
reasonable intended word cannot be determined,
mark it as [vocalized-noise].
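The bracket-and-dash notation for partial words can be sketched as a small formatting function. The name `mark_partial` and its arguments are hypothetical; the sketch assumes the annotator has already decided which word was intended and which part of it was pronounced.

```python
def mark_partial(word, spoken):
    """Format a partially pronounced word in bracket notation.

    word:   the intended full word
    spoken: the part that was actually pronounced, which must be
            either the beginning or the end of the word
    """
    if not spoken or spoken == word:
        raise ValueError("spoken must be a proper, non-empty part of word")
    if word.startswith(spoken):
        # Last part missing: brackets follow the spoken part, dash at the end.
        return f"{spoken}[{word[len(spoken):]}]-"
    if word.endswith(spoken):
        # First part missing: dash at the start, brackets before the spoken part.
        return f"-[{word[:-len(spoken)]}]{spoken}"
    raise ValueError("spoken is neither the start nor the end of word")


print(mark_partial("absolutely", "absolu"))
print(mark_partial("about", "bout"))
```

When no intended word can be recovered from context, no such call applies and the stretch is marked as [vocalized-noise] instead.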
2.16. Restarts of “i”: If a speaker restarts when saying the
word “i”, it should be transcribed as “i-”.
2.17. Mispronunciations: If a speaker mispronounces a
word and the mispronunciation is not an actual word,
transcribe the word as it is spoken followed by the
word that was intended.
2.18. Divide these two words by a forward slash and
enclose both words in brackets.
E.g. i wasn’t sure that they were blaming that
[splace/space] space disaster on one company
2.19. If a speaker uses and gives meaning to a word that is not an actual
word, spell the word out as it sounds and enclose it in braces.
E.g. How are things for you {weatherwise}
2.20. If one of the speakers involved in the conversation talks to someone
in the background and the words can be understood, then transcribe it as
an aside enclosed in the markers, <b_aside> and <e_aside>.
2.21. This only applies if one of the conversation speakers is involved in the
background conversation.
2.22. If only background speakers can be heard, then
this can be treated either as noise or as
background noise, depending on the energy level of the
background speakers compared to the foreground
speakers.
E.g. “yeah i know what you <b_aside> honey i
can’t play with you right now i’m on the phone
<e_aside> sorry you know kids always want
mommy all to themselves”
2.23. Hesitation sounds: Use “uh” or “ah” for hesitations
consisting of a vowel sound, and “um” or “hm” for hesitations
with a nasal sound, depending upon which transcription the
actual sound is closest to.
2.24. Use “huh” for the aspirated version of the hesitation as in:
"huh? <other speaker responds> um ok, I see your point."
2.25. Yes/no sounds: Use “uh-huh” or “um-hum” (yes) and
“huh-uh” or “hum-um” (no) for anything remotely resembling these
sounds of assent or denial.
2.26. You may use “yeah,” “yep,” and “nope” if that is what the
words sound like.
2.27. Non-speech sounds during conversations: transcribe these
using only the following list of expressions in brackets:
[laughter] [noise] [vocalized-noise]
Pick the closest description ([noise] will be adequate in most cases).
2.28. Laughter during speech: If laughter occurs directly before a
word, place the [laughter] tag before the spoken word.
2.30. If laughter occurs after a spoken word, place the [laughter]
tag after the word.
2.31. If the speaker laughs while saying the word, but the word is
still understood, transcribe this as [laughter-word], where "word"
is the word spoken during the laughter.
2.32. If the speech is obliterated by the laughter, transcribe it
strictly as [laughter].
2.33. If a speaker laughs while saying several words and the
words are understood, transcribe each word in the phrase as
[laughter-word].
2.34. Laughter throughout the phrase, “you don’t say,” would be
transcribed as: [laughter-you] [laughter-don’t] [laughter-say].
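The per-word laughter tagging described above can be sketched in a few lines. The function name `laughter_phrase` is hypothetical, and the sketch assumes the phrase is already transcribed as space-separated words.

```python
def laughter_phrase(phrase):
    """Tag every word of a phrase that was spoken while laughing."""
    return " ".join(f"[laughter-{word}]" for word in phrase.split())


print(laughter_phrase("you don't say"))
```

This only covers the case where the words remain intelligible; speech obliterated by laughter is transcribed as a plain [laughter] tag.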
2.35. Pronunciation variants are handled such that
words are transcribed as they are said.
E.g.
about _ b aw t
because _ k ah z
depends _ p eh n d z
them _ eh m
2.37. Consider continuous background noise as part of the
channel.
2.38. For example, if a baby cries at a consistent energy
level throughout the conversation, then treat it as
background noise.
2.39. Only consider it as noise if it grows much
louder than the normal level.
2.40. The baby screaming would warrant considering it as
noise. In this case, mark it as [noise].
2.41. In general, abbreviations should be avoided and words
should be transcribed exactly as spoken.
2.42. The exception is that when abbreviations are used as
part of a personal title, they remain as abbreviations, as in
standard writing:
E.g. Mr. Brown
Mrs. Jones
Dr. Spock
2.43. Acronyms that are normally written as a single
word but pronounced as a sequence of individual
letters should be written in all caps, with each
individual letter surrounded by spaces.
2.44. Similarly, individual letters that are pronounced as
such should be written in caps:
e.g.
I got an A on the test.
How ’bout if his name was spelled M U H R?
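The spacing convention for letter-by-letter acronyms can be sketched as follows. The function name `spell_letters` is hypothetical; the sketch assumes the annotator has already established that the acronym was pronounced as individual letters, since acronyms pronounced as words are not affected.

```python
def spell_letters(acronym):
    """Write an acronym as capital letters separated by spaces.

    Non-letter characters such as dots are dropped.
    """
    letters = [ch.upper() for ch in acronym if ch.isalpha()]
    return " ".join(letters)


print(spell_letters("FBI"))
print(spell_letters("u.s.a."))
```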
2.45. When a speaker breaks off in the middle of the word, annotators
transcribe as much of the word as can be made out.
2.46. A single dash without preceding space is used to indicate the point at which
the word was broken off.
2.47. If transcribers can make a reasonable guess at which word was intended
by the speaker, they should include the full form of the word immediately
after the truncated form, preceded by a plus sign + (without separating
spaces).
E.g.
Yes, absolu- +absolutely absolutely.
Well, I gue- +guess -- I would think this is what they intended.
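The truncation notation shown in these examples can be sketched as a small formatter. The name `mark_truncated` and its parameters are hypothetical; the intended word is optional because the annotator may not be able to guess it.

```python
def mark_truncated(fragment, intended=None):
    """Format a broken-off word.

    fragment: the part of the word that can be made out
    intended: the annotator's guess at the full word, if any
    """
    marked = f"{fragment}-"  # dash directly after the fragment, no space
    if intended:
        marked += f" +{intended}"  # plus sign attached to the full form
    return marked


print(mark_truncated("absolu", "absolutely"))
print(mark_truncated("gue"))
```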
2.48. Speaker restarts are indicated with a double dash (--)
surrounded by spaces.
2.49. Annotators use this convention for cases where a
speaker stops short, cutting himself or herself off before
continuing with or rephrasing the utterance.
E.g.
Did people uh -- did fights ever break out uh over hockey?
Since she -- when she died we moved from across the street.
2.50. An asterisk * is used for obviously mispronounced
words (not regional or nonstandard dialect
pronunciation), or for words that are made up on the
spot by the speaker or idiosyncratic to that speaker’s
usage.
2.51. Annotators should transcribe using the standard
spelling and should not try to represent the
pronunciation.
E.g.
They have as much *knowledgement about things as we’ve got.
He insisted that we ((*teak)) -- talk to him in Italian
2.52. Sometimes an audio file will contain a section of speech that is
difficult or impossible to understand. In these cases, annotators use
double parentheses (( )) to mark the region of difficulty.
2.53. If it is possible to make a guess about the speaker’s words,
annotators should transcribe what they think they hear and surround
the stretch of uncertain transcription with double parentheses:
E.g.
And she told me that ((I should just leave.))
2.54. In addition to the transcription conventions outlined above,
the following symbols are used for the transcription of
other kinds of noises made by either the main speaker or one
of the other participants in the interviews:
{BR} breath (The speaker takes an audible breath.)
{CG} cough (The speaker coughs, or clears his/her throat.)
{LS} lip smack (The speaker smacks his/her lips.)
{LG} laughter (The speaker laughs.)
{NS} noise (Loud background noise, e.g. a door slamming,
cars honking etc.)
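The symbol list above is closed, so a transcript can be checked for codes that are not on it. The following is a minimal sketch, assuming that noise symbols are always upper-case codes in braces; the function name `unknown_noise_symbols` is hypothetical. Lower-case material in braces, such as made-up words marked under rule 2.19, is deliberately ignored.

```python
import re

# Symbols listed in rule 2.54.
NOISE_SYMBOLS = {"BR", "CG", "LS", "LG", "NS"}


def unknown_noise_symbols(transcript):
    """Return upper-case brace codes that are not in the symbol list."""
    codes = re.findall(r"\{([A-Z]+)\}", transcript)
    return [code for code in codes if code not in NOISE_SYMBOLS]


print(unknown_noise_symbols("{BR} yeah {XX} how are things {weatherwise}"))
```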
3. Saving
The file name should be the same as the tier name.
GLOSSA can find the file easily if the file
name and tier name are synchronized.
Add this information to the metadata.