Using Machine Learning Methods
CSI 5386 Project Presentation
• An important step in QA
– Classify the question into the anticipated
semantic type of its answer.
– More challenging than common search tasks.
• Q: What Canadian city has the largest population?
Answer type: city
The Ambiguity Problem
• What is bipolar disorder?
• What do bats eat?
• What is the pH scale?
• Hard to categorize such questions into a single class.
– Multiple class labels may be needed for a single question.
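One way to represent this ambiguity is to attach a set of labels to each question rather than a single one; a minimal sketch in Python (the label sets below are illustrative, not taken from any dataset):

```python
# Toy multi-label representation for ambiguous questions.
# The label sets are illustrative, not from the UIUC dataset.
questions = {
    "What is bipolar disorder?": {"DESCRIPTION:definition", "ENTITY:disease"},
    "What do bats eat?": {"ENTITY:food", "ENTITY:animal"},
    "What is the pH scale?": {"DESCRIPTION:definition", "NUMERIC:other"},
}

def is_ambiguous(q):
    """A question is ambiguous here if it admits more than one class label."""
    return len(questions[q]) > 1

ambiguous = [q for q in questions if is_ambiguous(q)]
```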
Why Machine Learning?
• Manually constructing sets of rules to map a
question to its type is not efficient.
– Requires the analysis of a large number of questions.
– Mapping questions into fine classes requires
the use of lexical items (specific words).
• A learned classifier enables one to define only a
small number of “type” features.
• Can be trained on a new taxonomy.
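As a sketch of the learned-classifier idea, a tiny bag-of-words Naive Bayes question classifier in plain Python; the training examples and coarse labels are toy data, and Naive Bayes here stands in for the learners used in the papers discussed next:

```python
from collections import Counter, defaultdict
import math

# Toy training set: (question, coarse answer type). Illustrative only.
TRAIN = [
    ("what canadian city has the largest population", "LOCATION"),
    ("where is the eiffel tower", "LOCATION"),
    ("who wrote hamlet", "HUMAN"),
    ("who invented the telephone", "HUMAN"),
    ("when did world war two end", "NUMERIC"),
    ("how many people live in canada", "NUMERIC"),
]

def train(data):
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in data:
        label_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(word|label)."""
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, n in label_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

wc, lc, vocab = train(TRAIN)
```

Unseen questions are then routed by the wh-word and content-word statistics learned from the toy data, e.g. `classify("who discovered penicillin", wc, lc, vocab)`.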
Li and Roth (2002):
Learning Question Classifier
• Uses the SNoW learning architecture.
– Hierarchical classifiers
– 6 coarse classes: ABBREVIATION, ENTITY,
DESCRIPTION, HUMAN, LOCATION, NUMERIC.
– 50 fine classes.
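The coarse-then-fine routing can be sketched as below; the keyword rules are hypothetical stand-ins for Li and Roth's learned SNoW classifiers:

```python
# Sketch of a hierarchical (coarse -> fine) classifier in the spirit of
# Li and Roth (2002). The keyword tables stand in for learned models.
COARSE_RULES = {"who": "HUMAN", "where": "LOCATION", "when": "NUMERIC"}
FINE_RULES = {
    "HUMAN": {"wrote": "HUMAN:individual", "company": "HUMAN:group"},
    "LOCATION": {"city": "LOCATION:city", "country": "LOCATION:country"},
    "NUMERIC": {"when": "NUMERIC:date", "many": "NUMERIC:count"},
}

def classify_hierarchical(question):
    words = question.lower().replace("?", "").split()
    # Stage 1: pick a coarse class; fall back to ENTITY.
    coarse = next((COARSE_RULES[w] for w in words if w in COARSE_RULES), "ENTITY")
    # Stage 2: refine within that coarse class only.
    fine_map = FINE_RULES.get(coarse, {})
    fine = next((fine_map[w] for w in words if w in fine_map), coarse + ":other")
    return coarse, fine
```

The key design point is that the fine classifier only chooses among subclasses of the coarse decision, shrinking its output space.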
Li and Roth (cont)
• UIUC question classification dataset
– 5,500 training questions (including TREC 8 and 9
questions and 500 manually constructed ones).
– 500 test questions from TREC 10.
• Six primitive feature types:
– Words, POS tags, chunks, named entities, head
chunks, and semantically related words.
• A semantically related word list is built for each question class.
– e.g., “away” belongs to the class Rel(distance).
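A rough sketch of how a related-word list adds class-level features on top of plain word features; the `REL_WORDS` table is an illustrative stand-in for the lists built in the paper:

```python
# Illustrative related-word lists, not the ones from Li and Roth (2002).
REL_WORDS = {
    "Rel(distance)": {"away", "far", "near", "distance"},
    "Rel(food)": {"eat", "drink", "cook", "food"},
}

def extract_features(question):
    """Return word features plus a Rel(...) feature per matched word list."""
    words = question.lower().replace("?", "").split()
    feats = {"word=" + w for w in words}
    for cls, vocab in REL_WORDS.items():
        if any(w in vocab for w in words):
            feats.add(cls)
    return feats
```

The class-level feature fires for any word in the list, so the classifier can generalize from "far" to "away" without seeing both in training.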
Zhang and Lee (2003):
Question Classification using SVM
• Two kinds of features:
– Bag-of-words and bag-of-n-grams.
• SVM with a tree kernel
– Uses LIBSVM (Chang and Lin, 2001).
– Takes advantage of the syntactic structure of questions.
– Compared with Nearest Neighbors, Naïve
Bayes, Decision Tree, and SNoW.
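The bag-of-n-grams features can be sketched as below (simple whitespace tokenization; the paper's actual preprocessing may differ):

```python
# Bag-of-n-grams features as in Zhang and Lee (2003): all contiguous
# word n-grams up to length n.
def ngrams(question, n):
    words = question.lower().replace("?", "").split()
    feats = []
    for size in range(1, n + 1):
        for i in range(len(words) - size + 1):
            feats.append(" ".join(words[i:i + size]))
    return feats
```

With n=2, a four-word question yields four unigrams and three bigrams; each n-gram becomes one dimension of the SVM's input vector.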
Zhang and Lee (cont)
• Uses the same dataset as Li and Roth.
• Same two-layered question taxonomy.
• Same assumption:
– One question resides in only one category.
• Uses automatically constructed features
– No semantically related word lists.
Huang et al. (2008):
QC using Head Words and their Hypernyms
• In contrast to Li and Roth's, a compact feature set:
– Head word
– Use WordNet hypernyms to augment the semantic features.
– Adopt Lesk's word sense disambiguation algorithm.
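The hypernym-augmentation idea can be sketched as below; a tiny hand-made hypernym table stands in for WordNet so the example stays self-contained:

```python
# Illustrative hypernym chains (a stand-in for WordNet lookups).
HYPERNYMS = {
    "city": ["municipality", "region", "location"],
    "disorder": ["condition", "state"],
}

def head_word_features(head, depth=2):
    """Return the head word plus up to `depth` hypernyms above it."""
    feats = ["head=" + head]
    for h in HYPERNYMS.get(head, [])[:depth]:
        feats.append("hyper=" + h)
    return feats
```

Limiting the depth matters: climbing too far up the hierarchy reaches overly general concepts that no longer discriminate between classes.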
Huang et al. (cont)
• Again, use the same dataset.
• Other features:
– Question wh-word, word n-grams, word shape.
– Maximum Entropy model.
– Support Vector Machine – also uses LIBSVM.
– Obtained higher accuracy (89.0% and 89.2%).
Datasets for the project
• Same dataset as Li and Roth's:
• Additional datasets:
– TREC QA: http://trec.nist.gov/data/qa.html
Plan for the project
• Experiment with different feature types:
– Head chunks, semantic features for the head
chunk, named entities, word n-grams, and word shape.
• Use WordNet to automate the generation of semantic features:
– Find hypernyms.
– Apply Lesk's WSD to the head chunk.
• Java interface to WordNet:
• A syntactic parser for extracting the head chunks:
– Berkeley parser (Petrov and Klein, 2007).
• Use the Ngram Statistics Package
• Named entity recognizer and a relational
feature extraction language (FEX):
• Mallet Machine Learning Library:
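As a rough illustration of head-word extraction, a crude heuristic (first content word after the wh-word) that a real pipeline would replace with head rules over the Berkeley parser's output:

```python
# Heuristic stand-in for parser-based head extraction: take the first
# word that is neither a wh-word nor a function word. A real pipeline
# would instead apply head-percolation rules to the parse tree.
WH = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
STOP = {"is", "are", "was", "were", "do", "does", "did", "the", "a", "an"}

def guess_head(question):
    words = question.lower().replace("?", "").split()
    for w in words:
        if w not in WH and w not in STOP:
            return w
    return words[-1] if words else ""
```

This heuristic fails on questions whose head follows a modifier (e.g. "What Canadian city ..."), which is exactly why a syntactic parser is planned for this step.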
• Li, X. and D. Roth. 2002. Learning Question
Classifiers. The 19th International Conference on
Computational Linguistics (COLING), vol. 1, pp. 1–7.
• Zhang, D. and W. S. Lee. 2003. Question
Classification using Support Vector Machines.
The ACM SIGIR Conference on Research and
Development in Information Retrieval, pp. 26–32.
• Huang, Z., M. Thint, and Z. Qin. 2008.
Question Classification using Head Words and
their Hypernyms. In EMNLP 2008.
• Roth, D., G. Kao, X. Li, R. Nagarajan, V.
Punyakanok, N. Rizzolo, W. Yih, C. O. Alm, and
L. G. Moran. 2002. Learning Components for a
Question Answering System. In TREC 2001.
• Brown, J. (IR Lab). Entity-Tagged Language
Models for Question Classification in a
• Metzler, D. and W. B. Croft. Analysis of
Statistical Question Classification for Fact-based
Questions.