Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative Learning (CSCL) Data
Question Classiﬁcation & Sentiment
Dept. of Computer Science,
The Univ. of Hong Kong
March 5, 2010
The Knowledge Forum
A forum for students to discuss interesting issues, so that they can
learn during the discussion process.
Monitor the progress of students participating in the forum.
Forum articles can be categorized into four different types:
Argument, Statement, Information, and Question.
Examples of Articles
(Information) Alcohol is an other kind of energy that would not produce
air-pollution and easy to use. In Brazil, alcohol energy is very popular and
successful. The Brazil government co-operate with a bank and produce alcohol for
(Argument) but producing fossil fuel need a few million years or maybe more than
it. So it will too late if we have to wait for a long time until its produced.
(Question) is it the one using Changjiang River?
(Statement) we are doing wind energy.
The progress of a student is reﬂected by the diﬀerent types of
articles the student posted on the forum.
We would like to use machine learning techniques to solve this problem.
Two pieces of work are related to this problem:
Question Classiﬁcation — Classify questions into diﬀerent categories.
Sentiment Analysis (Opinion Mining) — aims to determine the
attitude of a writer with respect to some topic. The attitude may be
their judgment or evaluation, their aﬀective state (the emotional state
of the author when writing) or the intended emotional communication
(the emotional eﬀect the author wishes to have on the reader). (from
Wikipedia) This includes
determining the polarity of a given text — positive, negative or neutral.
determining the opinions expressed on diﬀerent aspects of entities
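The polarity-determination task can be sketched with a tiny lexicon-based scorer. This is an illustrative baseline only, not the approach of any system discussed here, and the word lists are invented for the example:

```python
# Minimal lexicon-based polarity scorer (illustrative sketch;
# the POSITIVE/NEGATIVE word lists are invented for this example).
POSITIVE = {"good", "great", "successful", "popular", "easy"}
NEGATIVE = {"bad", "poor", "pollution", "late", "difficult"}

def polarity(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Real systems learn these indicator words from data rather than hand-listing them, as the later slides discuss.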
We have used a locally-aligned tree kernel to do question classification.
Application: Question Answering systems.
Based on the UIUC TREC database.
5 training sets, containing 1,000 to 5,500 training questions, and a
test set containing 500 questions (Li & Roth).
The questions are divided into 6 coarse classes and 50 fine classes.
We achieved 92.5% accuracy.
ABBREVIATION – abbreviation and expression.
DESCRIPTION – deﬁnition, description, manner, reason.
ENTITY – animal, body, color, creative, currency,
disease/medicine, event, food, instrument, lang, letter, other,
plant, product, religion, sport, substance, symbol, technique, term,
vehicle, word
HUMAN – description, group, individual, title
LOCATION – city, country, mountain, state, other
NUMERIC VALUE – code, count, date, distance, money, order,
period, speed, percent, temp, vol/size, weight, other
The following is the ﬁrst question extracted from the training dataset
for each broad class:
(ABBR, exp) What is the full form of .com ?
(DESC, manner) How did serfdom develop in and then leave Russia ?
(ENTY, animal) What fowl grabs the spotlight after the Chinese
Year of the Monkey ?
(HUM, title) What is the oldest profession ?
(LOC, state) What sprawling U.S. state boasts the most airports ?
(NUM, date) When was Ozzy Osbourne born ?
words – words appearing in the question.
POS tags – their corresponding POS tags.
Chunks – non-overlapping phrases in the question.
Head chunks – the ﬁrst noun/verb chunk in the question.
Examples: (from Li & Roth)
(Question) : Who was the ﬁrst woman killed in the Vietnam War?
(POS Tagging) : [Who WP] [was VBD] [the DT] [ﬁrst JJ]
[woman NN] [killed VBN] [in IN] [the DT] [Vietnam NNP] [War
NNP] [? .]
(Chunking) : [NP Who] [VP was] [NP the ﬁrst woman] [VP
killed] [PP in] [NP the Vietnam War] ?
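The chunking step above can be approximated with a toy rule-based NP chunker over already-tagged tokens. This is only a sketch of the idea; real systems use trained taggers and chunkers:

```python
# Toy NP chunker over pre-tagged tokens: group maximal runs of
# noun-phrase-like tags into chunks (illustrative rule set only).
NP_TAGS = {"DT", "JJ", "NN", "NNP", "NNS", "WP"}

def np_chunks(tagged):
    """Return the noun-phrase chunks in a list of (word, POS-tag) pairs."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NP_TAGS:
            current.append(word)
        else:
            if current:
                chunks.append(" ".join(current))
                current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

# The tagged question from the slide:
tagged = [("Who", "WP"), ("was", "VBD"), ("the", "DT"), ("first", "JJ"),
          ("woman", "NN"), ("killed", "VBN"), ("in", "IN"), ("the", "DT"),
          ("Vietnam", "NNP"), ("War", "NNP"), ("?", ".")]
```

On this input the chunker recovers the three NP chunks shown above: "Who", "the first woman", and "the Vietnam War".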
Named Entities – noun phrases are categorized into different
semantic categories of varying specificity.
e.g. for the question on the previous slide, we get the named entities
[Num first] and [Event Vietnam War].
WordNet Senses – words are organized into senses in WordNet,
which are organized in a hierarchy. All senses of a word are used as features.
We use the Wu & Palmer metric to measure the similarity between senses.
Class-specific related words – some words are related to a specific
question class, e.g. alcohol, lunch, orange etc. are related to the food class.
Distributionally similar words – words occurring in similar syntactic
structures are similar to each other;
words can be grouped into semantic categories accordingly.
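The Wu & Palmer metric can be illustrated on a hand-made toy taxonomy; the real feature uses the full WordNet hierarchy. The metric is 2·depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common subsumer and the root has depth 1:

```python
# Toy is-a taxonomy (invented for illustration; WordNet is far larger).
PARENT = {
    "entity": None,
    "food": "entity", "animal": "entity",
    "fruit": "food", "lunch": "food",
    "orange": "fruit", "apple": "fruit",
    "dog": "animal",
}

def path_to_root(node):
    """Nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def wu_palmer(a, b):
    """Wu & Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    depth = lambda n: len(path_to_root(n))   # root has depth 1
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

Two fruits score higher than a fruit and an animal, since their lowest common subsumer sits deeper in the hierarchy.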
Li & Roth used a hierarchical classiﬁer.
It uses a two-level classifier.
Coarse classiﬁer – divide into the coarse classes.
Fine classiﬁer – for the ﬁne classes.
Both levels use the Winnow algorithm.
Zhang & Chan
They use convolution tree kernels with local alignment.
The tree kernel is semantically enriched: the semantic similarity
of two parse trees is measured based on WordNet and the Wu & Palmer metric.
Classification was done by a Support Vector Machine (SVM).
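A bare-bones convolution (subset-tree) kernel in the style of Collins & Duffy illustrates the core idea; the actual method adds local alignment and WordNet-based semantic matching on top, which this sketch omits. Trees are nested tuples of the form (label, child1, child2, ...):

```python
# Simplified convolution tree kernel (Collins & Duffy style sketch).
def label(t):
    return t[0] if isinstance(t, tuple) else t

def children(t):
    return t[1:] if isinstance(t, tuple) else ()

def nodes(t):
    yield t
    for c in children(t):
        yield from nodes(c)

def common_fragments(n1, n2, lam=0.5):
    """Decayed count of common subtree fragments rooted at n1 and n2."""
    c1, c2 = children(n1), children(n2)
    # Nodes match only if their labels and productions agree.
    if label(n1) != label(n2) or [label(c) for c in c1] != [label(c) for c in c2]:
        return 0.0
    if not c1:                       # both are leaves
        return lam
    prod = lam
    for a, b in zip(c1, c2):
        prod *= 1.0 + common_fragments(a, b, lam)
    return prod

def tree_kernel(t1, t2, lam=0.5):
    """Sum fragment matches over all node pairs of the two parse trees."""
    return sum(common_fragments(a, b, lam)
               for a in nodes(t1) for b in nodes(t2))

# Two tiny example parse trees differing only in the verb:
t1 = ("S", ("NP", "we"), ("VP", "run"))
t2 = ("S", ("NP", "we"), ("VP", "walk"))
```

The kernel value feeds directly into an SVM, since it is a valid similarity function between parse trees.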
We believe article classiﬁcation can be done similarly, using both
general features (for example, all POS tags and WordNet senses)
and expert features (Class-speciﬁc related words).
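One way such a combined feature extractor could look is sketched below; the feature names and the food word list are invented for illustration, echoing the class-specific example from the earlier slide:

```python
# Hypothetical feature extractor mixing general and expert features.
FOOD_WORDS = {"alcohol", "lunch", "orange"}   # expert: class-specific lexicon

def article_features(tokens, pos_tags):
    """Build a sparse feature dict from tokens and their POS tags."""
    feats = {}
    for w in tokens:                           # general: word presence
        feats["word=" + w.lower()] = 1
    for t in pos_tags:                         # general: POS-tag presence
        feats["pos=" + t] = 1
    if any(w.lower() in FOOD_WORDS for w in tokens):
        feats["class_related=food"] = 1        # expert: class-related words
    return feats
```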
Sentiment Analysis & Opinion Mining
It involves the following problems (Pang & Lee):
Sentiment polarity and degree of positivity
classify the position of the opinion on a continuum between two polarities,
for example, in the context of reviews or political speech.
determine whether a piece of objective information is good or bad.
more difficult tasks: rating inference, and finding “pros and cons” instead
of a positive/negative label.
Subjectivity detection and opinion identiﬁcation
whether an article contains subjective/objective information.
determining opinion strength (different from rating).
for example, using adjectives in the sentences.
The following features can be used for sentiment analysis:
Term presence & frequency
Although term frequency is commonly used in information retrieval,
term presence was found to give better performance
(binary features vs. numerical features).
A topic is emphasized by frequent occurrences of its keywords, but
overall sentiment may not be.
Sometimes a single occurrence of a word already indicates subjectivity.
Position – the position of a term within a textual unit.
n-grams – use of unigrams, bigrams or trigrams.
High-contrast word pairs, such as “delicious” and “dirty”.
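The presence-vs-frequency distinction amounts to one line each (a minimal sketch):

```python
from collections import Counter

def tf_features(tokens):
    """Numerical features: raw term counts (term frequency)."""
    return dict(Counter(tokens))

def presence_features(tokens):
    """Binary features: did the term occur at all?"""
    return {w: 1 for w in set(tokens)}

tokens = "good movie good cast bad ending good".split()
```

Both produce the same vocabulary; only the values differ, and for sentiment the binarised version has been reported to work better.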
Parts of Speech
Adjectives are particularly important in sentiment analysis.
for example, certain adjectives are good indicators.
Use selected phrases, chosen via pre-specified POS
patterns, most including an adjective or an adverb.
Nouns and verbs can also be strong indicators (e.g. “gem”).
Sub-tree syntactic structures have been used;
collocations and other complex syntactic patterns have also been found useful.
Positive and negative opinions sometimes differ only in one negation
word (such as “not”, “don’t”).
Negation can be expressed in subtle ways that are difficult to detect
(such as sarcasm and irony).
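A common way to make a learner see the negation difference is the negation-marking trick, in the spirit of Das & Chen: prefix tokens after a negation word with NOT_ so that “not good” and “good” become distinct features. This simplified sketch ends the negation scope at clause punctuation:

```python
# Negation marking: a simplified sketch (word lists are illustrative).
NEGATORS = {"not", "don't", "no", "never"}
CLAUSE_END = {".", ",", "!", "?", ";"}

def mark_negation(tokens):
    """Prefix NOT_ to tokens between a negator and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False
            out.append(tok)
        elif tok.lower() in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out
```

The subtle cases mentioned above (sarcasm, irony) are exactly what this surface-level trick cannot capture.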
Topic information should be incorporated into the features.
for example, a piece of good news about a rival can be bad news for the
entity concerned.
may need to include indicators (“this work”) or party names so that
the features can be attached to different entities.
To apply machine learning, we need a labeled corpus with
sufficient training data.
Many different features are used. Some systems use more than
200,000 features! (generated by computers, of course)
Can group terms together to form concepts, to reduce the number of features.
If we have enough training data, we can find the grouping most
tailored to the topic involved.
Features can also be the results of another machine learning program,
such as sentiment analysis or topic-related keywords.
Supervised classiﬁcation can be employed, such as Support Vector
Machines or Decision Trees with Adaboost.
If sufficient data is available, the entire process can be data-driven.
Expert knowledge can be used to reduce the amount of training data needed.
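To make the supervised option concrete, here is a pure-Python AdaBoost over one-feature decision stumps applied to binary feature dicts. This is a toy sketch of the “Decision Trees with Adaboost” idea only; real systems use deeper trees, SVMs, and far larger feature sets:

```python
import math

def stump_predict(feature, x):
    """A one-feature decision stump: +1 if the feature is present, else -1."""
    return 1 if x.get(feature, 0) else -1

def train_adaboost(X, y, features, rounds=5):
    """AdaBoost over one-feature stumps. X: feature dicts, y: labels in {+1,-1}."""
    n = len(X)
    w = [1.0 / n] * n                       # example weights
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for f in features:
            err = sum(wi for wi, x, yi in zip(w, X, y)
                      if stump_predict(f, x) != yi)
            if best is None or err < best[1]:
                best = (f, err)
        f, err = best
        err = min(max(err, 1e-9), 1 - 1e-9)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, alpha))
        # Re-weight: boost the examples this stump got wrong.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, x))
             for wi, x, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def predict(model, x):
    """Weighted vote of the learned stumps."""
    score = sum(alpha * stump_predict(f, x) for f, alpha in model)
    return 1 if score >= 0 else -1
```

With labeled feature dicts for the article types, the same loop would apply unchanged; the expert knowledge mentioned above enters through which features are made available to the stumps.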