
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative Learning (CSCL) Data


Published on

5 March 2010 (Friday) | 09:00 - 12:30 | Dr. Kwok Ping CHAN, Associate Professor, Department of Computer Science, HKU

Published in: Education, Technology

  1. 1. Question Classification & Sentiment Analysis Kwok-Ping Chan Dept. of Computer Science, The Univ. of Hong Kong March 5, 2010
  2. 2. The Knowledge Forum A forum for students to discuss interesting issues, so that they can learn during the discussion process. We monitor the progress of students participating in the forum. Forum articles can be categorized into four types: Argument, Statement, Information, and Question. Examples of articles (quoted verbatim from the forum): (Information) Alcohol is an other kind of energy that would not produce air-pollution and easy to use. In Brazil, alcohol energy is very popular and successful. The Brazil government co-operate with a bank and produce alcohol for drivers (Argument) but producing fossil fuel need a few million years or maybe more than it. So it will too late if we have to wait for a long time until its produced. (Question) is it the one using Changjiang River? (Statement) we are doing wind energy. 2 of 14
  3. 3. Article Classification The progress of a student is reflected by the different types of articles the student posts on the forum. We would like to use machine-learning techniques to solve this problem. Two pieces of work are related to this problem: Question Classification — classify questions into different categories. Sentiment Analysis (Opinion Mining) — aims to determine the attitude of a writer with respect to some topic. The attitude may be their judgment or evaluation, their affective state (the emotional state of the author when writing) or the intended emotional communication (the emotional effect the author wishes to have on the reader). (from Wikipedia) This includes determining the polarity of a given text — positive, negative or neutral; subjectivity/objectivity identification; and determining the opinions expressed on different aspects of entities. 3 of 14
  4. 4. Question Classification We have used a local-aligned tree kernel to do Question Classification. Application: Question Answering systems. Based on the UIUC TREC database: five training sets, containing 1,000 to 5,500 training questions, and a test set containing 500 questions (Li & Roth). The questions are divided into 6 coarse classes and 50 fine classes. We achieved 92.5% accuracy. 4 of 14
  5. 5. Question hierarchy ABBREVIATION – abbreviation and expression. DESCRIPTION – definition, description, manner, reason. ENTITY – animal, body, color, creative, currency, disease/medicine, event, food, instrument, lang, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word. HUMAN – description, group, individual, title LOCATION – city, country, mountain, state, other NUMERIC VALUE – code, count, date, distance, money, order, period, speed, percent, temp, vol/size, weight, other 5 of 14
  6. 6. Example Questions The following is the first question extracted from the training dataset for each broad class: (ABBR, exp) What is the full form of .com ? (DESC, manner) How did serfdom develop in and then leave Russia ? (ENTY, animal) What fowl grabs the spotlight after the Chinese Year of the Monkey ? (HUM, title) What is the oldest profession ? (LOC, state) What sprawling U.S. state boasts the most airports ? (NUM, date) When was Ozzy Osbourne born ? 6 of 14
  7. 7. Syntactic Features words – words appearing in the question. POS tags – their corresponding POS tags. Chunks – non-overlapping phrases in the question. Head chunks – the first noun/verb chunk in the question. Examples: (from Li & Roth) (Question) : Who was the first woman killed in the Vietnam War? (POS Tagging) : [Who WP] [was VBD] [the DT] [first JJ] [woman NN] [killed VBN] [in IN] [the DT] [Vietnam NNP] [War NNP] [? .] (Chunking) : [NP Who] [VP was] [NP the first woman] [VP killed] [PP in] [NP the Vietnam War] ? 7 of 14
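The word, POS-tag, chunk, and head-chunk features above can be sketched as follows. This is a minimal illustration, not the Li & Roth system: the tags and chunks are hard-coded from the slide's example, whereas in practice they would come from a POS tagger and a chunker.

```python
# Tagged question from the slide: "Who was the first woman killed in the Vietnam War?"
tagged = [("Who", "WP"), ("was", "VBD"), ("the", "DT"), ("first", "JJ"),
          ("woman", "NN"), ("killed", "VBN"), ("in", "IN"),
          ("the", "DT"), ("Vietnam", "NNP"), ("War", "NNP")]

chunks = [("NP", ["Who"]), ("VP", ["was"]), ("NP", ["the", "first", "woman"]),
          ("VP", ["killed"]), ("PP", ["in"]), ("NP", ["the", "Vietnam", "War"])]

def syntactic_features(tagged, chunks):
    feats = set()
    for word, pos in tagged:
        feats.add("w=" + word.lower())   # word feature
        feats.add("pos=" + pos)          # POS-tag feature
    for label, words in chunks:          # chunk features
        feats.add("chunk=%s_%s" % (label, "_".join(w.lower() for w in words)))
    for label, words in chunks:          # head chunk: first noun chunk
        if label == "NP":
            feats.add("head=" + "_".join(w.lower() for w in words))
            break
    return feats

feats = syntactic_features(tagged, chunks)
```

Each feature is a simple string, so the whole set can feed directly into a sparse linear classifier.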
  8. 8. Semantic Features Named Entities – noun phrases are categorized into semantic categories of varying specificity; e.g., from the question on the previous slide, we get the named entities [Num first] and [Event Vietnam War]. WordNet Senses – words are organized into senses in WordNet, and the senses are organized in a hierarchy. All senses of a word are used as features. We use the Wu & Palmer metric to measure the similarity between words. Class-specific related words – some words are related to a specific question class, e.g. alcohol, lunch, orange etc. are related to the food class. Distributional similarity – words occurring in similar syntactic structures are similar to each other, and can be grouped into semantic categories accordingly. 8 of 14
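The Wu & Palmer metric mentioned above scores two concepts by how deep their lowest common subsumer (LCS) sits in the hierarchy: sim(a, b) = 2 · depth(LCS) / (depth(a) + depth(b)). A toy sketch over a tiny hand-made taxonomy (the real computation runs over the WordNet graph):

```python
# Hand-made toy hierarchy; child -> parent, root has parent None.
parent = {"entity": None, "food": "entity", "drink": "entity",
          "fruit": "food", "orange": "fruit", "lunch": "food",
          "alcohol": "drink"}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path                          # node ... root

def depth(node):
    return len(path_to_root(node))       # root has depth 1

def wu_palmer(a, b):
    ancestors_a = set(path_to_root(a))
    # first node on b's path to the root that is also an ancestor of a = LCS
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))
```

With this hierarchy, "orange" is closer to "lunch" (shared subsumer "food") than to "alcohol" (shared subsumer only at the root).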
  9. 9. Classifiers Li & Roth used a hierarchical, two-level classifier: a coarse classifier divides questions into the coarse classes, and a fine classifier assigns the fine classes; both use the Winnow algorithm. Zhang & Chan used convolution tree kernels with local alignment. The tree kernel is semantically enriched by measuring the semantic similarity of two parse trees, based on WordNet, the Wu & Palmer metric, and distributional similarity. Classification was done by a Support Vector Machine (SVM). We believe article classification can be done similarly, using both general features (for example, all POS tags and WordNet senses) and expert features (class-specific related words). 9 of 14
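The two-level routing in Li & Roth's hierarchical classifier can be sketched structurally as follows. The keyword rules here are toy stand-ins for the learned Winnow (or SVM) classifiers; the point is only that the fine classifier is restricted to the subcategories of the coarse class chosen first.

```python
# Fine labels available under each coarse class (a small subset for illustration).
FINE = {"LOC": ["city", "country", "state"],
        "NUM": ["date", "count", "money"],
        "HUM": ["individual", "group", "title"]}

def coarse_classify(question):
    # Toy rules standing in for the learned coarse classifier.
    q = question.lower()
    if q.startswith("where"):
        return "LOC"
    if q.startswith("when") or q.startswith("how many"):
        return "NUM"
    if q.startswith("who"):
        return "HUM"
    return "LOC"

def fine_classify(question, coarse):
    # Only fine labels under the chosen coarse class are considered.
    q = question.lower()
    for label in FINE[coarse]:
        if label in q:
            return label
    return FINE[coarse][0]

def classify(question):
    coarse = coarse_classify(question)
    return coarse, fine_classify(question, coarse)
```

For example, `classify("When was Ozzy Osbourne born ?")` routes through the NUM branch and never considers LOC or HUM fine labels.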
  10. 10. Sentiment Analysis & Opinion Mining It involves the following problems (Pang & Lee): Sentiment polarity and degree of positivity classify the position of the opinion in a continuum between two polarities, for example in the context of reviews or political speech, and determine whether a piece of objective information is good or bad. More difficult tasks: rating inference, and “pro and con” lists instead of positive or negative. Subjectivity detection and opinion identification determine whether an article contains subjective or objective information, and determine opinion strength (different from rating), for example using the adjectives in the sentences. 10 of 14
  11. 11. Features The following features can be used for sentiment analysis: Term presence & frequency Although term frequency is commonly used in information retrieval, term presence was found to give better performance: binary features vs. numerical features. A topic is emphasized by frequent occurrences of keywords; overall sentiment may not be. Sometimes a single occurrence of a word already indicates subjectivity. Term-based features position of a term within a textual unit. Use of unigrams, bigrams or trigrams. High-contrast pairs of words, such as “delicious” and “dirty”. 11 of 14
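The presence-vs-frequency distinction above can be sketched directly: the same tokens mapped to a count vector and to a binary presence vector, plus bigram features. A minimal illustration, not tied to any particular system:

```python
from collections import Counter

def term_counts(tokens):
    return Counter(tokens)              # frequency features (numerical)

def term_presence(tokens):
    return {t: 1 for t in set(tokens)}  # presence features (binary)

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

tokens = "the movie was good really good".split()
```

Here "good" gets count 2 in the frequency vector but only 1 in the presence vector, which is exactly the information the binary representation discards.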
  12. 12. Features Parts of Speech Adjectives are particularly important in sentiment analysis; for example, certain adjectives are good indicators. Use selected phrases, chosen via pre-specified POS patterns, most including an adjective or an adverb. Nouns and verbs can also be strong indicators (e.g. “gem” and “love”). Syntax Sub-tree syntactic structures have been used; collocations and other complex syntactic patterns have also been found useful. Negation Positive and negative opinions sometimes differ in only one negation word (such as “not”, “don’t”). Negation can be expressed in subtle ways that are difficult to discover (such as sarcasm and irony). 12 of 14
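One common, simple treatment of the negation problem above (one of several schemes, not the only one) is to mark every token after a negation word with a `_NEG` suffix until the next punctuation, so that "good" and "good_NEG" become distinct features:

```python
NEGATORS = {"not", "no", "never", "don't", "didn't", "isn't"}
PUNCT = {".", ",", "!", "?", ";"}

def mark_negation(tokens):
    out, negated = [], False
    for tok in tokens:
        if tok.lower() in NEGATORS:
            negated = True               # start of negated scope
            out.append(tok)
        elif tok in PUNCT:
            negated = False              # punctuation closes the scope
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out
```

This keeps the feature space simple while letting a classifier separate "not a good movie" from "a good movie"; it does nothing, of course, for sarcasm or irony.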
  13. 13. Features Topic-oriented features Topic information should be incorporated into the features; for example, good news about a rival can be bad news. We may need to include indicators (“this work”) or party names so that the features can be attached to different entities. 13 of 14
  14. 14. Suggested Framework To apply machine learning, we need a labeled corpus with sufficient training data. Many different features are used; some systems use more than 200,000 features (generated by computers, of course). We can group terms together to form concepts to reduce the number of features, and if we have enough training data, we can find the grouping best tailored to the topic involved. Features can also be the results of another machine-learning program, such as sentiment analysis or topic-related keywords. Supervised classification can be employed, such as Support Vector Machines or Decision Trees with AdaBoost. Given sufficient data, the entire process can be data-driven. Expert knowledge can be used to reduce the amount of training data needed. 14 of 14
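The term-to-concept grouping suggested above can be sketched as a simple mapping step before feature extraction. The concept table here is hand-made and hypothetical; in practice, as the slide notes, the grouping would be learned from training data or taken from a resource such as WordNet:

```python
# Hypothetical word -> concept table (illustration only).
CONCEPT = {"alcohol": "FUEL", "fossil": "FUEL", "wind": "ENERGY",
           "solar": "ENERGY", "pollution": "ENVIRONMENT"}

def concept_features(tokens):
    # Replace each known word with its concept id; unknown words pass through.
    feats = set()
    for tok in tokens:
        feats.add(CONCEPT.get(tok.lower(), tok.lower()))
    return feats

feats = concept_features("Alcohol and fossil fuels cause pollution".split())
```

Because "alcohol" and "fossil" collapse into the single concept FUEL, the classifier sees one feature where a plain bag-of-words model would see two, which is exactly how grouping shrinks the feature space.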