First Experiments Searching Spontaneous Czech Speech
Pavel Ircing
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
ircing@kky.zcu.cz
Douglas W. Oard
College of Information Studies/UMIACS
University of Maryland
College Park, Maryland
oard@glue.umd.edu
Jan Hoidekr
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
hoidekr@kky.zcu.cz
Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing
General Terms
Experimentation
Keywords
Speech retrieval; Spontaneous speech
1. INTRODUCTION
This paper reports on experiments with the first available Czech IR test collection. The collection consists of a continuous stream from automatic transcription of spontaneous speech (see [3] for details), and the task of the IR system is to identify appropriate replay points where the discussion about the queried topic starts. The collection thus lacks clearly defined document boundaries. Moreover, the accuracy of the transcription is limited (around 35% word error rate), mostly due to the nature of the speech: interviews with Holocaust survivors, which are sometimes emotional, accented, and marked by age-related speech impediments. This collection therefore offers an excellent opportunity to explore both effects present in Czech (e.g., morphology) and effects that result from processing spontaneous speech. It was also used in the CL-SR track at the CLEF 2006 evaluation campaign (http://www.clef-campaign.org/).
2. METHODS
Retrieval from a speech stream with unknown topic boundaries is an interesting challenge, but that is not our principal focus in these experiments. We therefore transformed the collection into an artificially defined set of "documents" by removing all recognized pauses between words and then sliding a 3-minute window over the transcripts with a 1-minute step size. This resulted in a collection of 11,377 overlapping passages, each containing an average of 390 recognized words (denoted as the asr field) and a set of automatically produced Czech translations (using techniques described in [2]) for 20 automatically assigned thesaurus keywords (using techniques described in [4]), denoted as the ak field. Each field was indexed separately, and a unified index (asr.ak) was also constructed.

          word     stem     lemma
asr       0.0256   0.0494   0.0506
ak        0.0018   0.0022   0.0023
asr.ak    0.0241   0.0447   0.0467
Table 1: Mean GAP, long queries.

Copyright is held by the author/owner(s).
SIGIR'07, July 23–27, 2007, Amsterdam, The Netherlands.
ACM 978-1-59593-597-7/07/0007.
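The passage construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the experiments: the input format (a time-sorted list of (start time in seconds, token) pairs), the pause marker `<sil>`, and the function name `make_passages` are all hypothetical.

```python
def make_passages(words, window=180.0, step=60.0, pause_token="<sil>"):
    """Cut a time-stamped token stream into overlapping passages.

    words: list of (start_seconds, token) pairs sorted by time (assumed format).
    Returns a list of (passage_start_seconds, [tokens]) pairs.
    """
    # Drop recognized pauses; "<sil>" is a hypothetical marker for them.
    spoken = [(t, w) for t, w in words if w != pause_token]
    if not spoken:
        return []

    passages = []
    last_time = spoken[-1][0]
    start = 0.0
    while start <= last_time:
        # Collect every token whose start time falls inside the current window.
        tokens = [w for t, w in spoken if start <= t < start + window]
        if tokens:
            passages.append((start, tokens))
        start += step  # slide the window by one step
    return passages


# Toy stream: five tokens over roughly 200 seconds, one of them a pause.
stream = [(0.0, "a"), (30.0, "<sil>"), (70.0, "b"), (130.0, "c"), (200.0, "d")]
for passage_start, tokens in make_passages(stream):
    print(passage_start, tokens)
```

Because the step is shorter than the window, each word falls into up to three passages, which is what makes the passages overlap.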
Twenty-nine topics were initially created in English in the usual TREC-style format (<title>, <desc> and <narr> fields), translated into Czech by a native speaker, and then checked for natural expression by a second native speaker. We performed monolingual experiments with "long" queries constructed by concatenating the words from all three topic fields.
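As a rough illustration of how a "long" query can be assembled, the sketch below concatenates the three topic fields. It assumes each field is delimited by matching opening and closing tags, which may differ from the actual topic files; the function name `long_query` and the sample topic text are invented for the example.

```python
import re


def long_query(topic_text, fields=("title", "desc", "narr")):
    """Concatenate the text of the given topic fields into one query string."""
    parts = []
    for field in fields:
        # Assumes <field>...</field> delimiters; real topic files may differ.
        match = re.search(rf"<{field}>(.*?)</{field}>", topic_text, flags=re.DOTALL)
        if match:
            # Collapse internal whitespace, including line breaks.
            parts.append(" ".join(match.group(1).split()))
    return " ".join(parts)


# Invented topic, used only to show the mechanics.
sample_topic = (
    "<top><num>9999</num><title>Sample title</title>"
    "<desc>Short description.</desc>"
    "<narr>Longer\n narrative.</narr></top>"
)
print(long_query(sample_topic))
```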
A morphological analyser was used to obtain the lemma (linguistic root form), stem (an approximation to that root form using truncation alone) and part-of-speech for each Czech word [1]. Three variants of the collection were indexed: one with only words, one with only lemmas and one with only stems. Part-of-speech tags were used as the basis for stopword removal: because we could not find a suitable stoplist for Czech, we simply removed all words tagged as a preposition, conjunction, particle or interjection. In each case, identical processing was applied to the queries. We used Lemur to implement a simple tf.idf model with blind feedback (using Lemur's standard parameters). Length normalization was not performed because the collection preprocessing resulted in documents with nearly identical lengths.
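To make the preprocessing and scoring steps concrete, here is a minimal sketch. It assumes tokens already carry a lemma and a tag whose first character encodes the part of speech; the codes R, J, T and I for preposition, conjunction, particle and interjection are an assumption about the tagset, and the toy tokens and passages are invented. The scoring function is a plain sum of tf times idf without length normalization; it is not Lemur's actual weighting and it omits blind feedback.

```python
import math
from collections import Counter

# Assumed POS codes: preposition, conjunction, particle, interjection.
STOP_POS = {"R", "J", "T", "I"}


def content_terms(tagged_tokens, form="lemma"):
    """Keep the chosen form of each token whose POS is not a stop category."""
    return [tok[form] for tok in tagged_tokens if tok["tag"][:1] not in STOP_POS]


def tfidf_scores(query_terms, docs):
    """Score each document as the sum of tf * idf over distinct query terms."""
    n_docs = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency counts each term once per doc
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        scores[doc_id] = sum(
            tf[t] * math.log(n_docs / df[t]) for t in set(query_terms) if df[t]
        )
    return scores


# Toy input; tags are simplified to the single POS character.
tokens = [
    {"word": "v", "lemma": "v", "tag": "R"},
    {"word": "Praze", "lemma": "Praha", "tag": "N"},
    {"word": "a", "lemma": "a", "tag": "J"},
    {"word": "Terezíně", "lemma": "Terezín", "tag": "N"},
]
query = content_terms(tokens)
passages = {"p1": ["Praha", "Terezín"], "p2": ["Praha", "rodina"]}
print(query, tfidf_scores(query, passages))
```

Switching `form` between "word" and "lemma" (or a "stem" key, if the tokens carry one) mirrors the three index variants described above.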
3. EVALUATION
Relevance assessors identified appropriate start times by interactively searching using manually assigned English thesaurus terms and the same automatically transcribed content, ultimately confirming their decisions by listening to the audio when the automatically produced transcripts were not sufficiently accurate to make a definitive judgment. Table 1 reports the mean Generalized Average Precision (mGAP), which is computed in a manner similar to mean average precision (for details see [3]).
Indexing the ak field, alone or in combination with asr, proved not to be helpful (although the apparent reduction when the two are indexed together is not statistically significant, p > 0.05). Manual examination of a few ak fields indeed indicates a low density of terms that appear to match the content of the passage, but additional analysis will be needed before we can apportion blame among the transcription, classification and translation stages in the cascade that produced those keyword assignments. We therefore focus on results obtained using the asr field alone for the remainder of our analysis.

SIGIR 2007 Proceedings Poster
835

[Figure 1 is a bar chart in two panels showing GAP (y-axis, 0 to 0.3) per topic for the word, stem and lemma runs. Topics in the first panel: 1166, 1181, 1185, 1187, 1225, 1286, 1288, 1310, 1311, 1321, 1508, 1620, 1630, 1663, 1843. Topics in the second panel: 2198, 2253, 3004, 3005, 3009, 3014, 3015, 3017, 3018, 3020, 3025, 3033, 4000, 14312.]
Figure 1: GAP by topic, asr field, long queries.

It is apparent that some form of linguistic preprocessing is indeed crucial for Czech. Both lemmatization and stemming boosted performance by almost a factor of two in comparison with the word runs, and a Wilcoxon signed-rank test shows that difference to be statistically significant (p < 0.005). The slight apparent advantage of the lemma run over the stem run is not statistically significant (p > 0.05).
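The "almost a factor of two" claim can be checked directly against the asr row of Table 1. The short calculation below uses only the values reported there; the variable and function names are illustrative.

```python
# Mean GAP for the asr field, copied from Table 1 (long queries).
ASR_MEAN_GAP = {"word": 0.0256, "stem": 0.0494, "lemma": 0.0506}


def gain_over_word(mean_gap):
    """Return the ratio of each non-word run to the word run."""
    baseline = mean_gap["word"]
    return {run: value / baseline for run, value in mean_gap.items() if run != "word"}


for run, ratio in gain_over_word(ASR_MEAN_GAP).items():
    print(f"{run}: {ratio:.2f}x the word run")
```

This gives roughly 1.93 for stems and 1.98 for lemmas, consistent with the statement in the text.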
As Figure 1 shows, substantial variation in GAP is evident across topics. The four topics with the highest GAP values (1225, 1630, 2198, 3014) each contain highly discriminative terms that were correctly transcribed. Topic 1630 exhibits an enormous difference between word matching and matching either stems or lemmas, a vivid reminder of how the recall-enhancing effect of linguistic analysis can dominate averaged measures (a similar effect is also apparent for topic 1310). While a few cases of adverse effects from linguistic analysis are visible (most notably with topics 1225 and 1181), these effects are generally relatively small. The occasional differences between stems and lemmas suggest that combining evidence from both might help in some cases.
Unsuccessful topics generally either asked about abstract concepts without using many discriminative terms (e.g., topic 1288: "strengthening faith during the Holocaust") or relied on discriminative terms that happened to be missing from the collection. For example, topic 3018 contained a single discriminative term that was simply spelled differently in the ASR lexicon (and consequently in the transcripts). Manually conforming the spelling in the topic to that found in the lexicon would have increased the GAP for that topic (with lemma) from 0.0026 to 0.1175.
Interestingly, every term that we (manually) judged to be highly discriminating in our analysis of successful and unsuccessful topics turned out to be a named entity (NE). This prompted us to perform a more systematic analysis of the vocabulary coverage for the NEs present in all 29 topics. If we leave out the NEs that are widespread in the collection and thus useless for IR (Jew, Holocaust, Hitler, etc.), there are 42 NEs in the topic set; only 13 of them are present in the ASR lexicon, only 11 of those 13 actually appeared anywhere in the transcripts, and only 5 of those 11 substantially contributed to successful IR (or, if we manually conform the spelling in topic 3018, 6 of 12). The overall "query rare named entity error rate" for this collection is therefore (42-5)/42 = 88%, more than double the overall word error rate.
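The error-rate arithmetic in this paragraph is easy to reproduce. The snippet below uses only the counts given in the text and the approximate 35% word error rate from the introduction; the function name is illustrative, and the "conformed" figure simply applies the same formula to the 6 useful NEs mentioned for the corrected spelling of topic 3018.

```python
def ne_error_rate(total_nes, useful_nes):
    """Fraction of query NEs that did not contribute to successful retrieval."""
    return (total_nes - useful_nes) / total_nes


TOTAL_NES = 42          # rare NEs in the topic set
WORD_ERROR_RATE = 0.35  # approximate WER reported in the introduction

as_run = ne_error_rate(TOTAL_NES, 5)     # topics as originally spelled
conformed = ne_error_rate(TOTAL_NES, 6)  # topic 3018 spelling conformed to the lexicon

print(f"as run: {as_run:.0%}, conformed: {conformed:.0%}")
print(f"ratio to word error rate: {as_run / WORD_ERROR_RATE:.1f}")
```

Even with the spelling fix, the rate stays far above the word error rate, which is the point the paragraph makes.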
Rare NEs are quite naturally not well represented in the materials from which ASR systems are trained; integrating phone-lattice term detection with large-vocabulary recognition offers one promising research direction. Inconsistent spelling is probably the more easily rectified problem; annotators of ASR training materials are typically not domain experts, and in some cases valid alternate transliterations (e.g., from Yiddish roots) result in disagreement even among experts. One useful approach would be to adjust the topics to conform to the ASR lexicon, thus simulating a process that an interactive searcher could perform if notified that one of their query terms is outside the known vocabulary.
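One way such a notification could work is sketched below: each query term is checked against the ASR lexicon, and for out-of-vocabulary terms the closest lexicon spellings are suggested. This is an assumption about how the idea might be realized, not something the experiments did. It uses `difflib.get_close_matches` from the Python standard library; the function name, the similarity cutoff and the tiny lexicon are invented for illustration.

```python
import difflib


def check_query_terms(query_terms, lexicon, n=3, cutoff=0.8):
    """Map each out-of-vocabulary query term to close spellings in the lexicon."""
    known = sorted({w.lower() for w in lexicon})
    known_set = set(known)
    report = {}
    for term in query_terms:
        lowered = term.lower()
        if lowered not in known_set:
            # Suggestions may be empty when nothing in the lexicon is similar.
            report[term] = difflib.get_close_matches(lowered, known, n=n, cutoff=cutoff)
    return report


# Invented lexicon and query, used only to show the mechanics.
toy_lexicon = {"osvětim", "terezín", "varšava"}
toy_query = ["Terezin", "Varšava", "Mengele"]
print(check_query_terms(toy_query, toy_lexicon))
```

A term with a suggestion is a candidate for spelling conformance, while a term with no suggestion is more likely to be missing from the lexicon altogether.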
4. NEXT STEPS
In addition to the ideas above for dealing with rare terms, another obvious next step would be to optimize our system design to better reflect the task characteristics that motivated the design of the mean GAP measure. We have shown that passage retrieval can indeed sometimes get us in the right neighborhood, but overlapping passages may not be the best way of identifying optimal replay start times. Another question that we need to explore is whether some other retrieval model might be more effective. Finally, extending our work to include the far larger CLEF 2007 Czech news test collection will allow us to enrich our comparison between lemmas and stems for Czech indexing.
5. ACKNOWLEDGMENTS
This work was supported in part by projects MSMT LC536,
GACR 1ET101470416 and NSF IIS-0122466.
6. REFERENCES
[1] J. Hajič. Disambiguation of Rich Inflection (Computational Morphology of Czech). Karolinum, Prague, 2004.
[2] C. Murray et al. Leveraging Reusability: Cost-effective Lexical Acquisition for Large-scale Ontology Translation. In Proceedings of ACL 2006, pages 945–952, Sydney, Australia, 2006.
[3] D. Oard et al. Overview of the CLEF-2006 Cross-Language Speech Retrieval Track. In CLEF 2006, Revised Selected Papers, Springer LNCS, 2007.
[4] S. Olsson, D. Oard, and J. Hajič. Cross-Language Text Classification. In Proceedings of SIGIR 2005, pages 645–646, Salvador, Brazil, 2005.
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
Ā 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
Ā 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
Ā 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
Ā 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
Ā 

Recently uploaded (20)

How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Ā 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
Ā 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Ā 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Ā 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Ā 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
Ā 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Ā 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Ā 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Ā 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
Ā 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
Ā 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Ā 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Ā 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
Ā 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Ā 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Ā 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Ā 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
Ā 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Ā 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
Ā 

Analysis And Indexing General Terms Experimentation

First Experiments Searching Spontaneous Czech Speech

Pavel Ircing
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
ircing@kky.zcu.cz

Douglas W. Oard
College of Information Studies/UMIACS
University of Maryland
College Park, Maryland
oard@glue.umd.edu

Jan Hoidekr
Department of Cybernetics
University of West Bohemia
Plzen, Czech Republic
hoidekr@kky.zcu.cz

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing

General Terms
Experimentation

Keywords
Speech retrieval; Spontaneous speech

1. INTRODUCTION
This paper reports on experiments with the first available Czech IR test collection. The collection consists of a continuous stream from automatic transcription of spontaneous speech (see [3] for details), and the task of the IR system is to identify appropriate replay points where the discussion about the queried topic starts. The collection thus lacks clearly defined document boundaries. Moreover, the accuracy of the transcription is limited (around 35% word error rate), mostly due to the nature of the speech: interviews with Holocaust survivors, which are sometimes emotional, accented, and marked by age-related speech impediments. This collection therefore offers an excellent opportunity to explore both effects present in Czech (e.g., morphology) and effects that result from processing spontaneous speech. It was also used in the CL-SR track at the CLEF 2006 evaluation campaign (http://www.clef-campaign.org/).

2. METHODS
Retrieval from a speech stream with unknown topic boundaries is an interesting challenge, but that is not our principal focus in these experiments. We therefore transformed the collection into an artificially defined set of "documents" by removing all recognized pauses between words and then sliding a 3-minute window over the transcripts with a 1-minute step size.
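The windowing step just described can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that each recognized word carries a start time in seconds and that pause tokens have already been dropped. The names `Word`, `make_passages`, `window_s` and `step_s` are hypothetical. The paper does not say how the shorter windows at the end of an interview were treated, so the sketch simply keeps them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # start time in seconds (assumed to come with the ASR output)


def make_passages(words: List[Word], window_s: float = 180.0,
                  step_s: float = 60.0) -> List[dict]:
    """Cut a time-stamped word stream into overlapping passages."""
    passages: List[dict] = []
    if not words:
        return passages
    last_start = words[-1].start
    t = 0.0
    while t <= last_start:
        # Words whose start time falls inside the window [t, t + window_s).
        in_window = [w.text for w in words if t <= w.start < t + window_s]
        if in_window:
            passages.append({"start": t, "asr": " ".join(in_window)})
        t += step_s
    return passages


# Toy stream (hypothetical data): one word every 30 seconds for 5 minutes.
words = [Word(f"w{i}", 30.0 * i) for i in range(10)]
passages = make_passages(words)
for p in passages:
    print(p["start"], p["asr"])
```

Because the step is one third of the window, each word away from the edges of an interview ends up in three passages. Each passage keeps its start time, which is the candidate replay point returned to the searcher.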
This resulted in a collection of 11,377 overlapping passages, each containing an average of 390 recognized words (denoted as the asr field) and a set of automatically produced Czech translations (using techniques described in [2]) for 20 automatically assigned thesaurus keywords (using techniques described in [4]) (the ak field). Each field was indexed separately, and a unified index (asr.ak) was also constructed.

          word     stem     lemma
asr       0.0256   0.0494   0.0506
ak        0.0018   0.0022   0.0023
asr.ak    0.0241   0.0447   0.0467
Table 1: Mean GAP, long queries.

Twenty-nine topics were initially created in English in the usual TREC-style format (<title>, <desc> and <narr> fields), translated into Czech by a native speaker, and then checked for natural expression by a second native speaker. We performed monolingual experiments with "long" queries constructed by concatenating the words from all three topic fields.

A morphological analyser was used to obtain the lemma (linguistic root form), stem (approximation to that root form using truncation alone) and part-of-speech for each Czech word [1]. Three variants of the collection were indexed: one with only words, one with only lemmas and one with only stems. Part-of-speech tags were used as a basis for stopword removal: as we could not find any decent stoplist for Czech, we simply removed all words that were tagged as preposition, conjunction, particle or interjection. In each case, identical processing was done for the queries. We used Lemur to implement a simple tf.idf model with blind feedback (using Lemur's standard parameters). Length normalization was not performed because the collection preprocessing resulted in documents with nearly identical lengths.

Copyright is held by the author/owner(s). SIGIR'07, July 23-27, 2007, Amsterdam, The Netherlands. ACM 978-1-59593-597-7/07/0007.
SIGIR 2007 Proceedings, Poster (pp. 835-836)

3. EVALUATION
Relevance assessors identified appropriate start times by interactively searching using manually assigned English thesaurus terms and the same automatically transcribed content, ultimately confirming their decisions by listening to the audio when the automatically produced transcripts were not sufficiently accurate to make a definitive judgment. Table 1 reports the mean Generalized Average Precision (mGAP), which is computed in a manner similar to mean average precision (for details see [3]).

Indexing the ak field, alone or in combination with asr, proved not to be helpful, although the apparent reduction when the two are indexed together is not statistically significant (p > 0.05). Manual examination of a few ak fields indeed indicates a low density of terms that appear as if they match the content of the passage, but additional analysis will be needed before we can ascribe blame between the transcription, classification and translation stages in the cascade that produced those keyword assignments. We therefore focus on results obtained using the asr field alone for the remainder of our analysis.

[Figure 1: GAP by topic, asr field, long queries. Two bar-chart panels covering the 29 topics (IDs 1166 through 14312); x-axis: topic ID, y-axis: GAP from 0 to 0.3; series: word, stem, lemma.]

It is apparent that some form of linguistic preprocessing is indeed crucial for Czech. Both lemmatization and stemming boosted performance by almost a factor of two in comparison with the word runs, and a Wilcoxon signed-rank test shows that difference to be statistically significant (p < 0.005). The slight apparent advantage of the lemma run over the stem run is not statistically significant (p > 0.05).

As Figure 1 shows, substantial variation in GAP is evident across topics. The four topics with the highest GAP values (1225, 1630, 2198, 3014) each contain highly discriminative terms that were correctly transcribed. Topic 1630 exhibits an enormous difference between word matching and matching either stems or lemmas, a vivid reminder of how the recall-enhancing effect of linguistic analysis can dominate averaged measures (a similar effect is also apparent for topic 1310). While a few cases of adverse effects from linguistic analysis are visible (most notably with topics 1225 and 1181), these effects are generally relatively small. The occasional differences between stems and lemmas suggest that combining evidence from both might help in some cases.
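The "almost a factor of two" gain reported above can be recomputed directly from the Table 1 values. The snippet below only restates that arithmetic. The numbers are copied from Table 1, while the dictionary layout and the helper name `gain_over_words` are illustrative choices rather than anything from the paper.

```python
# Mean GAP values copied from Table 1 (long queries).
mgap = {
    "asr":    {"word": 0.0256, "stem": 0.0494, "lemma": 0.0506},
    "ak":     {"word": 0.0018, "stem": 0.0022, "lemma": 0.0023},
    "asr.ak": {"word": 0.0241, "stem": 0.0447, "lemma": 0.0467},
}


def gain_over_words(field: str, variant: str) -> float:
    """Ratio of the mGAP of a stem or lemma run to the word run for one field."""
    return mgap[field][variant] / mgap[field]["word"]


for field in mgap:
    for variant in ("stem", "lemma"):
        print(f"{field:7s} {variant:5s} {gain_over_words(field, variant):.2f}")
```

For the asr field the ratios come out at roughly 1.93 for stems and 1.98 for lemmas, which is consistent with the wording in the text. These are ratios of means over 29 topics, so they say nothing about per-topic behaviour, which is what the signed-rank test and Figure 1 address.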
Unsuccessful topics generally either asked about abstract concepts without using many discriminative terms (e.g., topic 1288: "strengthening faith during the Holocaust"), or the discriminative terms for the topic happened to be missing from the collection. For example, topic 3018 contained a single discriminative term that was simply spelled differently in the ASR lexicon (and consequently in the transcripts). Manually conforming the spelling in the topic to that found in the lexicon would have increased the GAP for that topic (with lemma) from 0.0026 to 0.1175.

Interestingly, it turned out that every term that we (manually) judged to be highly discriminating in our analysis of successful and unsuccessful topics was a named entity (NE). This prompted us to perform a more systematic analysis of the vocabulary coverage for the NEs present in all 29 topics. If we leave out the NEs that are widespread in the collection and thus useless for IR (Jew, Holocaust, Hitler, etc.), there are 42 NEs in the topic set; only 13 of them are present in the ASR lexicon, only 11 of those 13 actually appeared anywhere in the transcripts, and only 5 of those 11 substantially contributed to successful IR (or, if we manually conform the spelling in topic 3018, 6 of 12). The overall "query rare named entity error rate" for this collection is therefore (42-5)/42 = 88%, more than double the overall word error rate.

Rare NEs are quite naturally not well represented in the materials from which ASR systems are trained; integrating phone-lattice term detection with large-vocabulary recognition offers one promising research direction. Inconsistent spelling is probably the more easily rectified problem; annotators of ASR training materials are typically not domain experts, and in some cases valid alternate transliterations (e.g., from Yiddish roots) result in disagreement even among experts.
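The named-entity counts reported above form a simple funnel, and the quoted 88% follows directly from them. The sketch below restates that arithmetic and nothing more. The variable names are illustrative, and the 35% word error rate is the approximate figure given in the introduction.

```python
# Counts reported in the text for rare named entities (NEs) in the 29 topics.
ne_in_topics = 42       # rare NEs in the topic set
ne_in_lexicon = 13      # of those, present in the ASR lexicon
ne_in_transcripts = 11  # of those, appearing somewhere in the transcripts
ne_useful = 5           # of those, substantially contributing to retrieval


def error_rate(total: int, successful: int) -> float:
    """Fraction of items that did not end up contributing to retrieval."""
    return (total - successful) / total


rare_ne_error = error_rate(ne_in_topics, ne_useful)
approx_wer = 0.35  # "around 35% word error rate" from the introduction

print(f"lexicon coverage:         {ne_in_lexicon / ne_in_topics:.0%}")
print(f"query rare NE error rate: {rare_ne_error:.0%}")
print(f"ratio to word error rate: {rare_ne_error / approx_wer:.1f}")
```

Most of the loss occurs at the first stage: 29 of the 42 rare NEs never reach the ASR lexicon at all, which is why the text singles out vocabulary coverage and spelling conventions as the places to intervene.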
One useful approach would be to adjust the topics to conform to the ASR lexicon, thus simulating a process similar to one that an interactive searcher could perform if notified that one of their query terms is outside the known vocabulary.

4. NEXT STEPS
In addition to the ideas above for dealing with rare terms, another obvious next step would be to optimize our system design to better reflect the task characteristics that motivated the design of the mean GAP measure. We have shown that passage retrieval can indeed sometimes get us in the right neighborhood, but overlapping passages may not be the best way of identifying optimal replay start times. Another question that we need to explore is whether some other retrieval model might be more effective. Finally, extending our work to include the far larger CLEF 2007 Czech news test collection will allow us to enrich our comparison between lemmas and stems for Czech indexing.

5. ACKNOWLEDGMENTS
This work was supported in part by projects MSMT LC536, GACR 1ET101470416 and NSF IIS-0122466.

6. REFERENCES
[1] J. Hajič. Disambiguation of Rich Inflection (Computational Morphology of Czech). Karolinum, Prague, 2004.
[2] C. Murray et al. Leveraging Reusability: Cost-effective Lexical Acquisition for Large-scale Ontology Translation. In Proceedings of ACL 2006, pages 945-952, Sydney, Australia, 2006.
[3] D. Oard et al. Overview of the CLEF-2006 Cross-Language Speech Retrieval Track. In CLEF 2006 - revised selected papers - Springer LNCS, 2007.
[4] S. Olsson, D. Oard, and J. Hajič. Cross-Language Text Classification. In Proceedings of SIGIR 2005, pages 645-646, Salvador, Brazil, 2005.