This is a presentation we made for the Spring 2012 Data Mining class at Tsinghua University. It covers the paper "Extracting and Ranking Product Features in Opinion Documents" by Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O’Brien-Strain.
2. • Retina Display
• 3-axis gyro & accelerometer
• A4 CPU
• Multitasking
• FaceTime
• iBooks
• Antennagate
A Story
3. • Knowing clearly how consumers respond to a
product helps a company win more
market share.
• Consumers can also make better
choices when shopping.
Why mine product
features?
4. • In recent years, opinion mining has been an active research
area in NLP. One of its most important problems is extracting
features from a corpus.
• HMM, ME, PMI, and CRF methods have been applied.
• Double Propagation is a state-of-the-art unsupervised technique
for solving this problem, though it has its own significant
limitations.
Recent Research
5. • Proposed by researchers from the
University of Illinois at Chicago and
Zhejiang University.
• Mainly extracts noun
features; works well for medium-
size corpora.
• Needs no additional resources beyond an
initial seed opinion lexicon.
Double Propagation
6. Basic assumption: features are nouns/noun phrases and opinion words are
adjectives.
Dependency grammar: describes the dependency relations between words in a
sentence, including direct relations (a)(b) and indirect relations (c)(d). (A minimal extraction sketch follows this slide.)
Example: “The camera has a good lens.” (“camera” = class, “good” = opinion word, “lens” = feature)
DP Mechanism
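
Below is a minimal sketch of one direct dependency rule from this mechanism (a seed opinion adjective modifying a noun marks that noun as a feature candidate), assuming spaCy with its small English model is installed; the seed lexicon and the single rule are illustrative, not the paper's full rule set.

# Sketch of one direct Double Propagation rule: an adjective from the seed
# opinion lexicon that modifies a noun (amod relation) marks that noun as a
# feature candidate. Assumes spaCy + en_core_web_sm; the seed lexicon is toy data.
import spacy

nlp = spacy.load("en_core_web_sm")
seed_opinion_words = {"good", "great", "bad", "poor"}

def extract_noun_features(text):
    candidates = set()
    for token in nlp(text):
        if (token.dep_ == "amod" and token.pos_ == "ADJ"
                and token.text.lower() in seed_opinion_words
                and token.head.pos_ == "NOUN"):
            candidates.add(token.head.text.lower())
    return candidates

print(extract_noun_features("The camera has a good lens."))  # {'lens'}

In full Double Propagation the same dependency links are also walked in the other direction, so newly found features can in turn discover new opinion words across iterations.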
7. Non-opinion adjectives (e.g. “current” + noun, “entire” + noun) may be extracted as
opinion words, which introduces more and more noise during the extraction process.
Some important features do not have opinion words modifying them.
“There is a valley on my mattress.”
No opinion word modifies the feature “valley”.
DP Limitations
8. Two-step feature mining method:
Feature Extraction
• Double Propagation
• Part-whole patterns
• “No” pattern (see the pattern sketch after this slide)
Feature Ranking
• A new angle on the noise problem.
• Uses relevance & frequency to rank features.
Proposed Methods
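
A rough sketch of the two extra extraction patterns, assuming the class concept word (here “mattress”) is already known; the pattern list and phrasing are simplified stand-ins for the paper's part-whole and “no” patterns.

# Rough regex sketch of the part-whole and "no" patterns used to boost recall.
# The class concept word ("mattress") and the pattern set are illustrative.
import re

CLASS_WORD = "mattress"

PART_WHOLE = [
    # "NP of/on/in CP", e.g. "valley on my mattress"
    re.compile(r"\b([a-z]+) (?:of|on|in) (?:the |my |this )?" + CLASS_WORD + r"\b"),
    # "CP has NP", e.g. "the mattress has a valley"
    re.compile(r"\b" + CLASS_WORD + r" (?:has|have) (?:a |an |the )?([a-z]+)\b"),
]
NO_PATTERN = re.compile(r"\bno ([a-z]+)\b")  # e.g. "no noise", "no indentation"

def extract_candidates(sentence):
    sentence = sentence.lower()
    candidates = set()
    for pattern in PART_WHOLE:
        candidates.update(pattern.findall(sentence))
    candidates.update(NO_PATTERN.findall(sentence))
    return candidates

print(extract_candidates("There is a valley on my mattress."))    # {'valley'}
print(extract_candidates("No noise and no indentation so far."))  # {'noise', 'indentation'}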
9. • Three strong clues indicate a correct feature (see the tally sketch after this slide):
• It is modified by multiple opinion words.
• It can be extracted by multiple part-whole patterns.
• It is found by a combination of part-whole patterns, the “no” pattern, and opinion-word
modification.
• Frequent appearance indicates an important feature.
Ranking Principles
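
To make the three clues concrete, here is a toy tally over made-up extraction records; the record format and the simple counts are illustrative, not the paper's actual scoring.

# Toy tally of the three ranking clues per feature candidate.
# Each record is (candidate, indicator, indicator_type); the data is made up.
from collections import defaultdict

records = [
    ("lens", "good", "opinion"), ("lens", "great", "opinion"),
    ("battery", "has", "part-whole"), ("battery", "of", "part-whole"),
    ("noise", "no", "no-pattern"), ("noise", "loud", "opinion"),
]

clues = defaultdict(lambda: {"opinion": set(), "part-whole": set(), "no-pattern": set()})
for cand, indicator, kind in records:
    clues[cand][kind].add(indicator)

for cand, c in clues.items():
    n_opinion = len(c["opinion"])                    # clue 1: multiple opinion words
    n_part_whole = len(c["part-whole"])              # clue 2: multiple part-whole patterns
    n_clue_types = sum(bool(v) for v in c.values())  # clue 3: combination of clue types
    print(cand, n_opinion, n_part_whole, n_clue_types)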
10. • Feature extraction
• part-whole relation
• “no” pattern
• Feature ranking
• HITS algorithm (see the ranking sketch after this slide)
• consider frequency
Process
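
Here is a toy sketch of the ranking step, assuming networkx is available: feature candidates act as authorities and feature indicators (opinion words, patterns) as hubs in a bipartite graph, and the HITS authority score is combined with candidate frequency. The graph, the frequencies, and the authority times log-frequency combination are illustrative stand-ins for the paper's exact formulation.

# Toy sketch: rank feature candidates by HITS authority combined with frequency.
# Edges go from feature indicators (hubs) to candidates (authorities); data is made up.
import math
import networkx as nx

edges = [
    ("good", "lens"), ("great", "lens"), ("good", "battery"),
    ("part-whole", "battery"), ("no-pattern", "noise"),
]
freq = {"lens": 40, "battery": 25, "noise": 3}  # candidate occurrence counts (toy)

G = nx.DiGraph(edges)
hubs, authorities = nx.hits(G, normalized=True)

ranking = sorted(
    ((cand, authorities[cand] * math.log(1 + freq[cand])) for cand in freq),
    key=lambda item: item[1],
    reverse=True,
)
for cand, score in ranking:
    print(f"{cand:10s} {score:.4f}")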
15. Data Sets     Cars    Mattress   Phone    LCD
# of Sent.    2223    13233      15168    1783
“Cars” and “Mattress”: product review sites.
“Phone” and “LCD”: forum sites.
Precision@N metric:
percentage of the top N feature candidates in a
ranked list that are correct features (see the sketch after this slide).
Data Sets & Evaluation Metrics
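
A minimal sketch of the precision@N computation; the ranked list and gold feature set below are made-up examples.

# Minimal sketch of precision@N: the fraction of the top-N ranked candidates
# that are correct features. The ranked list and gold set are made up.
def precision_at_n(ranked_candidates, gold_features, n):
    top_n = ranked_candidates[:n]
    correct = sum(1 for cand in top_n if cand in gold_features)
    return correct / n

ranked = ["lens", "battery", "noise", "thing", "screen"]
gold = {"lens", "battery", "screen"}
print(precision_at_n(ranked, gold, 5))  # 0.6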
22. • Use part-whole and “no” patterns to increase recall
• Rank extracted feature candidates by feature
importance, determined by two factors:
• Feature relevance (the HITS algorithm was applied)
• Feature frequency
Conclusion