MSc Thesis defense
For datasets and further information, contact the author at skarabasakis@gmail.com
Abstract: The web offers vast quantities of user-generated content, including reviews. These reviews, be they about products, services, books, music or movies, constitute a primary target for the application of opinion analysis techniques. We present Aspect Miner, an integrated opinion mining system tailored to user reviews published on the web. By leveraging the user ratings that typically accompany these reviews, Aspect Miner can be trained to distinguish not only positive from negative sentiment, but also between multiple sentiment intensity levels. Moreover, Aspect Miner is able to classify opinions on the sentence level as well as on the level of individual ratable aspects that are present in a sentence, and is adaptable to texts of any domain.
The system is built around three core subtasks: (i) classification of subjective terms, (ii) aspect identification and (iii) sentence sentiment analysis. For the first subtask, we propose a classification scheme that employs the user ratings in a training corpus. For the second, we look into the LDA topic model as a means to identify and extract the features of the reviewed items in the corpus, and we attempt to address its inherent limitations with an additional post-processing step that aggregates multiple disparate feature models into a single concise one. Finally, to perform analysis on the sentence level, we combine the results of the aforementioned subtasks with a syntax-tree-based linguistic method powered by a set of predefined typed dependency rules. Our experiments show that the accuracy of our approach on these specific tasks is at least comparable to, and under certain circumstances surpasses, a number of other popular sentiment analysis techniques.
Full thesis text (in Greek): http://j.mp/AspectMiner
Aspect Miner: Fine-grained, feature-level opinion mining from rated review corpora
1. Aspect Miner
Fine-grained feature-level opinion mining
from rated review corpora
MSc Thesis Defense | February 2012
Stelios Karabasakis
Dept. of Informatics and Telecommunications
National and Kapodistrian University of Athens
in association with the Knowledge Discovery in Databases Laboratory
kddlab.di.uoa.gr
2. INTRODUCTION
Opinion Mining: an overview
What is it? The task of recognizing and classifying the
opinions and sentiments expressed in unstructured text.
Use cases
• product comparison
• opinion summarization
• opinion-aware recommendation systems
• opinion-aware online advertising
• reputation management
• business intelligence
• government intelligence
Opinion sources
• news
• blogs
• reviews (our focus in this work)
• user comments
• social networks
• forums
• discussion groups
Stelios Karabasakis Aspect Miner: Fine-grained feature-level opinion mining from rated review corpora Feb 2012 2
3. INTRODUCTION
Reviews
• Popular form of user-generated content
» consumers use them to make informed choices
» businesses use them to gauge and monitor consumer sentiment
• Covering many distinct domains, such as movies, books, hotels, restaurants, goods and services
4. INTRODUCTION
Ratings
• Every online review typically
carries a rating
» picked by the review author
» summarizes the sentiment of
the text
• Corpora of rated reviews are
» abundant on the web
» potentially useful for
supervised opinion mining
» largely ignored in the literature!
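As a rough illustration of why rated corpora are useful for supervised opinion mining, the sketch below scores each term by the mean rating of the reviews that contain it. This is a generic baseline and not necessarily the scheme used in Aspect Miner; the toy reviews, the 1-5 rating scale and the function name `mean_rating_per_term` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy rated corpus: (review text, star rating on an assumed 1-5 scale).
REVIEWS = [
    ("brilliant cast and a gripping plot", 5),
    ("gripping start but a dull ending", 3),
    ("dull plot and wooden acting", 1),
]


def mean_rating_per_term(reviews):
    """Map each term to the mean rating of the reviews it occurs in."""
    ratings = defaultdict(list)
    for text, rating in reviews:
        # Count each term once per review so long reviews do not dominate.
        for term in set(text.lower().split()):
            ratings[term].append(rating)
    return {term: sum(rs) / len(rs) for term, rs in ratings.items()}


scores = mean_rating_per_term(REVIEWS)
for term in ("brilliant", "gripping", "dull"):
    print(term, scores[term])
```

Terms that mostly appear in highly rated reviews end up with a high score and terms from poorly rated reviews with a low one, so the ratings act as free training labels.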
5. INTRODUCTION
Opinion Mining is challenging
Not as simple as counting positive vs. negative words
It is pointless to discuss why Hitchcock was a genius.
Distinct opinions about different topics in the same sentence
The top-notch production values are not enough to distract from a
clichéd story that lacks heart and soul.
Semantics of subjective expressions are domain-dependent
unpredictable plot twist, gloomy atmosphere (movies)
unpredictable service quality, gloomy room (hotels)
6. INTRODUCTION
Opinion Mining is a text classification problem
classification dimensions
• subjectivity: factual vs. subjective statements
• polarity: positive vs. negative sentiment
• intensity: weak vs. strong sentiment
classification granularity
• binary
• multiclass
Motivating question: How can we train a system to distinguish among multiple degrees of sentiment?
7. INTRODUCTION
Classification levels
document level
In “Game of Thrones” (2011), the transition from book to screen is remarkably successful. The carefully chosen location and cast, the top-notch cinematography and the seamlessness of its narrative come together brilliantly. The new HBO show offers compelling drama, even when rehashing old fantasy themes.
→ positive
8. INTRODUCTION
Classification levels
sentence level
In “Game of Thrones” (2011), the transition from book to screen is remarkably successful. → positive
The carefully chosen location and cast, the top-notch cinematography and the seamlessness of its narrative come together brilliantly. → positive
The new HBO show offers compelling drama, even when rehashing old fantasy themes. → positive
9. INTRODUCTION
Classification levels
feature level
features = domain-specific ratable properties
In “Game of Thrones” (2011), the transition from book to screen is remarkably successful.
→ adaptation: positive
The carefully chosen location and cast, the top-notch cinematography and the seamlessness of its narrative come together brilliantly.
→ production: positive, cast: positive, direction: positive, plot: positive
The new HBO show offers compelling drama, even when rehashing old fantasy themes.
→ serialization: positive, subject: negative
Motivating question: How can we identify feature terms and the features they refer to?
10. INTRODUCTION
Problem description
Produce rich, fine-grained, feature-oriented review summaries
by analyzing reviews at the sentence level and aggregating the results
Sample summary
“Avatar” (2009): aggregated summary of 90 reviews
aspect      mentions   sentiment mean           sentiment dispersion
direction   217        9/10 STRONGLY POSITIVE   17% UNANIMOUS AGREEMENT
story       152        8/10 POSITIVE            32% GENERAL AGREEMENT
acting      177        4/10 WEAKLY NEGATIVE     56% MIXED REACTION
11. INTRODUCTION
Solution components
a sentiment lexicon, multiclass and adapted to the target domain
term          prior sentiment
masterpiece   10 (very strongly positive)
good          8 (positive)
mediocre      5 (very weakly negative)
terrible      2 (strongly negative)
a feature lexicon for the target domain
feature term     feature
protagonist      CAST
performance      CAST
deliver          CAST
camera           DIRECTION
cinematography   DIRECTION
dialogue         WRITING
script           WRITING
and a set of linguistic rules for sentence classification
12. INTRODUCTION
The Aspect Miner system
(a proof-of-concept implementation of our approach)
[Diagram: In the training subsystem, the training corpus (rated reviews) passes through the Lexical Analyzer into an index of terms; the Feature identifier produces the feature lexicon and the Term classifier produces the sentiment lexicon. The Sentence classifier uses both lexicons to turn the text to classify into the result: feature-level sentiments.]
Key features: modular architecture, unsupervised, domain agnostic, configurable granularity
13. INTRODUCTION
Aspect Miner implementation*
• Implemented in Java with
» NekoHTML for scraping
» JDBC/MySQL for dataset storage
» Lucene as a lexical analysis API and for indexing
» Wordnet & JWNL for lemmatization
» Stanford Parser for POS-tagging & typed dependency parsing
» Mallet’s LDA implementation for topic modeling
» GraphViz for visualizations
* source code (MIT-licensed) available from
github.com/skarabasakis/ImdbAspectMiner
14. INTRODUCTION
Training dataset*
107,646 movie reviews from IMDB.com, rated 1-10 stars
*available as an SQL dump from http://db.tt/vAthzJRL
[Histogram: # reviews by review length (words); mean = 291 words, median = 228 words]
16. SENTIMENT LEXICON
Terms
A term is a (base form, part of speech) tuple
» part of speech ∈ {VERB, NOUN, ADJECTIVE, ADVERB}
» a term represents all inflected forms and spellings of a word
e.g. {choose, chooses, chose, chosen, …} → [choose VERB]
{localise, localize, …} → [localize VERB]
» terms can be compound
e.g. [work out VERB] [common sense NOUN]
[meet up with VERB] [as a matter of fact ADVERB]
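The term abstraction can be sketched in a few lines of Python. This is only an illustration: `Term`, `BASE_FORMS` and `to_term` are hypothetical names, and the small lookup table stands in for the WordNet-based base form lookup that the real system uses.

```python
from typing import NamedTuple


class Term(NamedTuple):
    base_form: str
    pos: str  # one of VERB, NOUN, ADJECTIVE, ADVERB


# Hypothetical lookup table standing in for a WordNet-backed lemmatizer.
BASE_FORMS = {
    ("chooses", "VERB"): "choose",
    ("chose", "VERB"): "choose",
    ("chosen", "VERB"): "choose",
    ("localise", "VERB"): "localize",
}


def to_term(word: str, pos: str) -> Term:
    """Map an inflected form or spelling variant to its term."""
    key = (word.lower(), pos)
    return Term(BASE_FORMS.get(key, word.lower()), pos)
```

Because compound terms are plain strings here, `to_term("work out", "VERB")` yields a single term rather than two.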
17. SENTIMENT LEXICON
Lexical analyzer
Purpose: to extract terms from texts
Pipeline: Training corpus (rated reviews) → Tokenization → POS tagging → Named Entity identification → Lemmatization → Comparatives annotation → Negation scope resolution → Stop word removal → Open-class word filtering → Bags of terms (one per document)
» Identifies the base form of words & compounds
• Uses Wordnet to look up base forms
» Eliminates non-subjective words
• Stop words including very common terms (be, have, …)
• Named Entities (i.e. proper nouns)
• all articles, pronouns, prepositions etc.
» Eliminates words that would be misleading for sentiment classification
• Comparatives & superlatives
• Words within a negation scope
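The elimination steps of the lexical analyzer can be sketched as a filter over tokens that have already been tagged and lemmatized. The `Token` fields, the tiny stop word set and the example annotations below are assumptions made for illustration; the real system derives them with the Stanford Parser, WordNet and Lucene.

```python
from typing import NamedTuple

OPEN_CLASSES = {"VERB", "NOUN", "ADJECTIVE", "ADVERB"}
STOP_WORDS = {"be", "have"}  # illustrative subset only


class Token(NamedTuple):
    lemma: str
    pos: str
    named_entity: bool = False
    comparative: bool = False  # comparative or superlative form
    negated: bool = False      # lies inside a negation scope


def indexable_terms(tokens):
    """Return the bag of (lemma, pos) terms that survive all filters."""
    bag = []
    for t in tokens:
        if t.pos not in OPEN_CLASSES:  # articles, pronouns, prepositions etc.
            continue
        if t.lemma in STOP_WORDS or t.named_entity:
            continue
        if t.comparative or t.negated:  # misleading for sentiment
            continue
        bag.append((t.lemma, t.pos))
    return bag


# Hand-annotated tokens for "Shyamalan builds a clever twist, better than most."
tokens = [
    Token("shyamalan", "NOUN", named_entity=True),
    Token("build", "VERB"),
    Token("a", "DETERMINER"),
    Token("clever", "ADJECTIVE"),
    Token("twist", "NOUN"),
    Token("good", "ADJECTIVE", comparative=True),
]
print(indexable_terms(tokens))
```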
18. SENTIMENT LEXICON
Lexical analysis example
The most dramatic moment in the Sixth Sense does not occur until the
final minutes and the jaw dropping twist Shyamalan has been building up to.
Lemmatize
Eliminate
Get indexable terms
19. SENTIMENT LEXICON
Previous approaches to term classification
Lexicon-based approach
• Prior sentiment inferred from lexical associations
(synonyms, antonyms, hypernyms etc.) in a dictionary
• High accuracy, limited coverage
• Notable example: Sentiwordnet (Esuli & Sebastiani 2006)
Corpus-based approach
• Prior sentiment inferred from correlation patterns
(and, or, either…or, but etc.) in a training corpus
• Extended coverage, lower accuracy
• Notable examples: Hatzivassiloglou & McKeown 1997, Turney & Littman 2003,
Popescu & Etzioni 2005, Ding Liu & Yu 2008
20. SENTIMENT LEXICON
Ratings-based term classification
Our proposal: a ratings-based approach
• Requires a training set of rated reviews
• Prior sentiment inferred from the distribution of ratings among all the reviews where a term occurs, i.e. the rating histogram of the term
[Figure: sample rating histograms of a positive term, a negative term, a neutral term and a polysemous term]
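A rating histogram per term can be collected in a single pass over the corpus. The sketch below assumes the corpus is available as (rating, bag of terms) pairs and that every occurrence of a term counts once; whether the system counts occurrences or reviews is not stated on the slide, and `rating_histograms` is a hypothetical name.

```python
from collections import Counter, defaultdict


def rating_histograms(reviews):
    """reviews: iterable of (rating, bag_of_terms) pairs.

    Returns {term: Counter({rating: number of occurrences})}.
    """
    histograms = defaultdict(Counter)
    for rating, terms in reviews:
        for term in terms:
            histograms[term][rating] += 1
    return histograms


reviews = [
    (9, ["brilliant", "plot"]),
    (10, ["brilliant"]),
    (2, ["plot", "dull"]),
]
print(dict(rating_histograms(reviews)["brilliant"]))
```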
21. SENTIMENT LEXICON
IMDB dataset: Ratings distribution
[Chart: # reviews and # terms per rating]
Caution: Ratings are not evenly distributed across the training corpus.
22. SENTIMENT LEXICON
Rating frequency weighting
Why? Weighting is necessary to
» eliminate training set biases
» make rating frequencies comparable to each other
How? Multiply every rating frequency in a histogram with that rating’s weight $w_r$, calculated as follows:
» $c_r$ := cumulative term count of all reviews with rating $r$
» We pick $w_r$ in such a way that the products $w_r \cdot c_r$ are equal for all $r$
• Most predominant rating in the dataset has $w_r = 1$
• The less frequent the rating, the higher its weight
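One weighting that satisfies these conditions divides the largest per-rating term count by each rating's own term count, so the most predominant rating gets weight 1 and every weighted count comes out equal. The sketch below assumes that reading; the function names and the toy counts are hypothetical.

```python
def rating_weights(term_counts):
    """term_counts: {rating: cumulative term count of all reviews with that rating}.

    Returns weights such that weight * count is the same for every rating.
    """
    largest = max(term_counts.values())
    return {rating: largest / count for rating, count in term_counts.items()}


def weight_histogram(histogram, weights):
    """Multiply every rating frequency by that rating's weight."""
    return {rating: freq * weights[rating] for rating, freq in histogram.items()}


weights = rating_weights({1: 500, 5: 1000, 10: 2000})
print(weights)
print(weight_histogram({1: 3, 10: 3}, weights))
```

In the toy data, rating 1 is four times rarer than rating 10, so three occurrences under rating 1 weigh as much as twelve under rating 10.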
23. SENTIMENT LEXICON
Some sample histograms
extracted from the IMDB dataset
24. SENTIMENT LEXICON
Designing a term classifier
input: weighted rating histogram for term
output: one or more* sets of significant ratings
* if term is polysemous
A weighted mean function can condense each set of significant ratings into a single rating.
This rating indicates the term’s sentiment.
[Figure: worked example of condensing sets of significant ratings into single ratings]
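A minimal sketch of the condensing step, assuming that each significant rating is weighted by its (weighted) frequency in the term's histogram. The slide does not spell out the weights, so this choice and the name `weighted_mean_rating` are assumptions.

```python
def weighted_mean_rating(histogram, significant_ratings):
    """Frequency-weighted mean of the significant ratings of a term."""
    total = sum(histogram[r] for r in significant_ratings)
    return sum(r * histogram[r] for r in significant_ratings) / total


histogram = {7: 1.0, 9: 2.0, 10: 1.0}
print(weighted_mean_rating(histogram, [7, 9, 10]))
```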
25. SENTIMENT LEXICON
Neutrality criterion
For a term to be neutral, its rating histogram must
approximate a uniform distribution
[Formula: uniformity criterion on the rating histogram, with a parameter in the range (0, 1]]
26. SENTIMENT LEXICON
Term classification schemes
Scheme 1: Peak Classifier
Picks the histogram’s
peak rating as the only
significant rating
Pros Simplest classifier possible. Useful as a comparison baseline.
Surprisingly capable at classifying polarity (almost 2/3 accurate)
Cons Can’t detect polysemy
Poor at classifying intensity
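The peak classifier reduces to an argmax over the weighted histogram. A minimal sketch, with `peak_classifier` as a hypothetical name; ties go to the rating that appears first in the histogram, which is an arbitrary choice.

```python
def peak_classifier(histogram):
    """Return the single rating with the highest weighted frequency."""
    return max(histogram, key=histogram.get)


print(peak_classifier({1: 0.5, 8: 3.0, 9: 2.5}))
```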
27. SENTIMENT LEXICON
Term classification schemes
Scheme 2: Positive/Negative Area Classifier (PN)
All ratings above a cutoff
frequency are significant
Cutoff frequency should be set a little bit above the frequency average.
Returns separate sets for
positive and negative
ratings
Pros Better at classifying intensity
Makes an attempt at detecting polysemy
Cons Weak terms can be mistaken for polysemous
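The PN scheme can be sketched as follows. Two values are assumptions: the `margin` of 1.1 stands for "a little bit above" the average, and the positive/negative boundary at rating 6 is inferred from the lexicon examples in this deck, where 5 is labeled very weakly negative. The histogram is expected to contain every rating from 1 to 10, with 0 for ratings that never occur.

```python
def pn_classifier(histogram, margin=1.1, first_positive=6):
    """Return (positive_ratings, negative_ratings) above the cutoff frequency."""
    mean_frequency = sum(histogram.values()) / len(histogram)
    cutoff = margin * mean_frequency  # hypothetical margin above the average
    significant = [r for r, freq in histogram.items() if freq > cutoff]
    positive = sorted(r for r in significant if r >= first_positive)
    negative = sorted(r for r in significant if r < first_positive)
    return positive, negative


clearly_positive = {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 2, 8: 5, 9: 6, 10: 3}
polysemous = {1: 4, 2: 5, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 5, 10: 4}
print(pn_classifier(clearly_positive))
print(pn_classifier(polysemous))
```

A term that returns two non-empty sets, like the second histogram, is the case the slide describes as an attempt at detecting polysemy.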
28. SENTIMENT LEXICON
Term classification schemes
Scheme 3: Widest Window Classifier (WW)
Looks for windows of consecutive significant ratings
Ratings are added to windows from most to least frequent
Significant rating windows must satisfy 2 constraints
• minimum coverage: windows must contain at least a minimum share of the samples
• maximum width: windows must be as wide as possible
Returns as many rating classes as the windows it detects
Pros Avoids detecting false polysemy
Avoids biases exhibited by the other classification schemes
29. SENTIMENT LEXICON
Classifier evaluation: Ratings Distribution
We classified 33,000 terms that appear ≥5 times in the IMDB dataset.
Conclusion: WW classifier distributes rating classes more evenly
[Charts: distribution of primary rating classes for each classifier (PEAK, PN, WW)]
30. SENTIMENT LEXICON
Classifier evaluation: Polarity
We evaluate against a reference lexicon of 5272 terms
based on the MPQA and General Inquirer subjectivity lexicons.
Classifier     Accuracy   Class      Precision   Recall   F1-Score
PEAK           63.6%      POSITIVE   55.5%       44.2%    49.2%
                          NEGATIVE   67.3%       65.3%    66.3%
PN             66.2%      POSITIVE   62.4%       58.4%    60.4%
                          NEGATIVE   68.4%       72.3%    70.3%
WW             70.1%      POSITIVE   70.4%       86.2%    77.5%
                          NEGATIVE   69.6%       60.5%    64.8%
SentiWordnet   73.2%      POSITIVE   63.6%       61.3%    62.4%
                          NEGATIVE   83.6%       48.3%    61.3%
WW is the most accurate of the 3 proposed classifiers, but not as accurate as SentiWordnet.
However, WW is more accurate for domain-specific terms.
31. SENTIMENT LEXICON
Classifier evaluation: Intensity
We evaluate against a test set of 443 strong + 323 weak terms
based on the General Inquirer subjectivity lexicon.
Using the WW classifier to classify intensity:
• 78% of strong terms are classified 3 and above
• 83% of weak terms are classified 3 and below
[Chart: % of terms in WW lexicon (0% to 40%) per intensity class in WW lexicon (1 to 5), for WEAK and STRONG terms]
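The intensity classes 1 to 5 relate to the 1-10 rating classes of the lexicon. The sketch below assumes the mapping suggested by the examples in this deck (10 = very strongly positive, 8 = positive, 5 = very weakly negative, 2 = strongly negative); the function names and the label table are hypothetical.

```python
LABELS = {1: "very weakly", 2: "weakly", 3: "", 4: "strongly", 5: "very strongly"}


def polarity_and_intensity(rating):
    """Split a 1-10 rating class into (polarity, intensity class 1-5)."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating >= 6:
        return "positive", rating - 5
    return "negative", 6 - rating


def describe(rating):
    """Render a rating class as a label such as 'strongly negative'."""
    polarity, intensity = polarity_and_intensity(rating)
    return f"{LABELS[intensity]} {polarity}".strip()


for rating in (10, 8, 5, 2):
    print(rating, describe(rating))
```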
32. SENTIMENT LEXICON
The Aspect Miner sentiment lexicon*
A reusable sentiment lexicon for the movie review domain
* downloadable from
github.com/skarabasakis/ImdbAspectMiner/blob/master/imdb_sentiment_lexicon.xls
34. FEATURE IDENTIFICATION
Approaches to feature identification
The traditional approach: discovery through heuristics
• frequency: commonly occurring noun phrases are often features
(Hu & Liu 2004)
• co-occurrence: terms commonly found near subjective expressions
may be features (Kim & Hovy 2006, Qiu et al. 2011)
• language patterns: in phrases such as 'F of P' or 'P has F', P is a
product and F is a feature (Popescu & Etzioni 2005)
• background knowledge: user annotations, ontologies, search
engine results, Wikipedia data…
An up-and-coming approach: topic modeling
35. FEATURE IDENTIFICATION
Topic Modeling
Probabilistic Topic Models can model the
abstract topics that occur in a set of documents
documents are
mixtures of topics
topics are
distributions over words
36. FEATURE IDENTIFICATION
Topic Modeling
Probabilistic topic models
• require that the user specifies a number of topics
» Topics are just numbers – their semantic interpretation is not the model’s concern
• make an assumption about the probability distribution of topics
• define a probabilistic procedure for generating documents from topics
» by inverting this procedure, we can infer topics from documents
A popular topic model: Latent Dirichlet Allocation (LDA)
• assumes that each document’s topic mixture follows a Dirichlet prior distribution
» with a sparse prior, each document is associated with just a small number of topics
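The generative procedure that LDA inverts can be sketched with the standard library alone. The two toy topics, the value of `alpha` and the function names are made up for illustration; the system itself relies on Mallet's LDA implementation for inference.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric k-dimensional Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """topics: list of {word: probability} distributions over words."""
    theta = sample_dirichlet(alpha, len(topics), rng)  # the document's topic mixture
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        vocabulary, probabilities = zip(*topics[z].items())
        words.append(rng.choices(vocabulary, weights=probabilities)[0])  # pick a word
    return theta, words


rng = random.Random(0)
topics = [{"actor": 0.6, "role": 0.4}, {"script": 0.7, "dialogue": 0.3}]
theta, words = generate_document(topics, alpha=0.5, length=10, rng=rng)
print(theta, words)
```

A small `alpha` pushes most of the mixture onto few topics, which is the sparsity property mentioned above.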
37. FEATURE IDENTIFICATION
Topics vs. Features
Motivating question: Features are a form of topics. Can we use topic models to discover features?
Here are a few sample topics we got from running LDA on the IMDB dataset (one topic per column)
ROLE          SCRIPT       WAR        POLICE      CAR
ACTOR         IDEA         HERO       CASE        CHASE
PERFORMANCE   DIALOGUE     ATTACK     MYSTERY     SHOOT
PLAY          WRITE        GROUP      VICTIM      VEHICLE
LEAD          PLOT         AIRPLANE   SOLVE       COP
CAST          SCREENPLAY   BUNCH      MURDER      DRIVE
SUPPORT       COME UP      SOLDIER    OFFICER     KILL
ACTRESS       CRAFT        KILL       SUSPECT     STREET
SHINE         EXPLAIN      BOMB       DETECTIVE   BULLET
STAR          HOLE         ENEMY      CRIME       ROBBERY
The first two topics are features. They are useful to us.
The last three topics are themes. We are not interested in them.
38. FEATURE IDENTIFICATION
Feature identification with LDA
Problem. Topics are global, features are local
Solution. Train topic model on shorter segments (e.g. sentences) rather
than full documents.
Problem. Running LDA on such short segments produces noisy topics
Solution. Implement a bootstrap aggregation scheme to filter the noise:
1. Train N topic models from different subsets of dataset
2. Merge similar topics across models to produce a single meta-model
» Intuition: Valid feature-topics should occur in >1 models and share many common top
terms. Noisy topics should be isolated to specific models
39. FEATURE IDENTIFICATION
Merging topics
Topic A: COMEDY 0.200, JOKE 0.099, LAUGH 0.096, FUN 0.088, FORMULA 0.025
+ Topic B: COMEDY 0.180, PARODY 0.168, SATIRE 0.099, JOKE 0.061, RIDICULE 0.054
= Merged topic: COMEDY 0.380, PARODY 0.168, JOKE 0.160, LAUGH 0.096, SATIRE 0.099, FUN 0.088, RIDICULE 0.054, FORMULA 0.025
» The lowest-ranked terms of the merged topic are discarded
Topic Similarity for topics Tm, Tn
» More common terms with higher probabilities → higher similarity
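The merge in this example adds up the probabilities of terms that occur in both topics. The sketch below reproduces that step; the similarity measure (shared probability mass of the common terms) is only one assumed way to satisfy the stated property, since the slide's own formula did not survive in the text. Both function names are hypothetical.

```python
def merge_topics(topic_a, topic_b):
    """Sum term probabilities of two topics, most probable terms first."""
    merged = dict(topic_a)
    for term, probability in topic_b.items():
        merged[term] = merged.get(term, 0.0) + probability
    return dict(sorted(merged.items(), key=lambda item: -item[1]))


def topic_similarity(topic_a, topic_b):
    """Assumed measure: more common terms with higher probabilities score higher."""
    shared = topic_a.keys() & topic_b.keys()
    return sum(min(topic_a[t], topic_b[t]) for t in shared)


topic_a = {"COMEDY": 0.200, "JOKE": 0.099, "LAUGH": 0.096, "FUN": 0.088, "FORMULA": 0.025}
topic_b = {"COMEDY": 0.180, "PARODY": 0.168, "SATIRE": 0.099, "JOKE": 0.061, "RIDICULE": 0.054}
print(merge_topics(topic_a, topic_b))
print(topic_similarity(topic_a, topic_b))
```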
40. FEATURE IDENTIFICATION
Merging topic models
To merge 2 topic sets
• Merge every topic of set A to most similar topic from set B
» but only if that similarity is above average similarity
To merge N topic sets
• Merge first two, then merge the result with the third etc.
• At the end
» discard topics with a low merging degree
» If same term ends up in >1 topics, only keep it in the topic where it
has the highest probability
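The merging procedure for N topic sets can be sketched as a fold over the models. Several details are assumptions: the average similarity is taken over all topic pairs of the two sets, the merging degree of a topic is the number of original topics merged into it, `min_degree=2` stands for "low merging degree", and `overlap` is a placeholder similarity. All names are hypothetical.

```python
from functools import reduce


def overlap(a, b):
    """Placeholder similarity: shared probability mass of two topics."""
    return sum(min(a[t], b[t]) for t in a.keys() & b.keys())


def merge_two_sets(set_a, set_b, similarity):
    """Merge every topic of set A into its most similar topic of set B.

    Topics are (term_probabilities, merging_degree) pairs.
    """
    scores = [[similarity(a, b) for b, _ in set_b] for a, _ in set_a]
    average = sum(map(sum, scores)) / (len(set_a) * len(set_b))
    merged = [(dict(terms), degree) for terms, degree in set_b]
    for (terms_a, degree_a), row in zip(set_a, scores):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] > average:  # merge only above average similarity
            terms_b, degree_b = merged[best]
            for term, p in terms_a.items():
                terms_b[term] = terms_b.get(term, 0.0) + p
            merged[best] = (terms_b, degree_b + degree_a)
        else:
            merged.append((dict(terms_a), degree_a))  # stays unmerged
    return merged


def merge_models(models, similarity=overlap, min_degree=2):
    """Fold N topic models into one meta-model, then clean it up."""
    topic_sets = [[(topic, 1) for topic in model] for model in models]
    merged = reduce(lambda acc, nxt: merge_two_sets(nxt, acc, similarity), topic_sets)
    kept = [terms for terms, degree in merged if degree >= min_degree]
    owner = {}  # term -> index of the topic where it is most probable
    for i, terms in enumerate(kept):
        for term, p in terms.items():
            if term not in owner or p > kept[owner[term]][term]:
                owner[term] = i
    return [{t: p for t, p in terms.items() if owner[t] == i}
            for i, terms in enumerate(kept)]


models = [
    [{"comedy": 0.2, "joke": 0.1}, {"actor": 0.3, "role": 0.2}],
    [{"comedy": 0.18, "parody": 0.17}, {"actor": 0.25, "cast": 0.2}],
    [{"joke": 0.1, "comedy": 0.2, "actor": 0.01}, {"war": 0.3, "bomb": 0.2}],
]
print(merge_models(models))
```

In this toy run the war topic appears in only one model and is dropped, while `actor` is kept only in the cast-like topic where it is most probable.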
41. FEATURE IDENTIFICATION
Movie feature lexicon
56 topics, manually labeled with 18 labels
44. SENTENCE-LEVEL ANALYSIS
Typed Dependencies
Typed dependencies are binary grammatical relations between word pairs in a sentence (de Marneffe et al., 2006)
Example sentence: Natalie Portman comes off as very believable, gaining empathy from the audience.
Notation: amod(relations, binary) = type(governor, dependent)
Typed dependency trees are
• semantically richer than syntax trees
• easier to process, because content words are connected directly rather than through function words
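A typed dependency in the textual notation type(governor, dependent) maps naturally onto a small record type. The sketch below parses that notation with a regular expression; `Dependency` and `parse_dependency` are hypothetical names and not part of the Stanford Parser API.

```python
import re
from typing import NamedTuple


class Dependency(NamedTuple):
    type: str
    governor: str
    dependent: str


_PATTERN = re.compile(r"^\s*(\w+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$")


def parse_dependency(text):
    """Parse a string such as 'amod(relations, binary)'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a typed dependency: {text!r}")
    return Dependency(*match.groups())


print(parse_dependency("amod(relations, binary)"))
```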
45. SENTENCE-LEVEL ANALYSIS
Dependency types
Stanford Parser’s representation defines a
hierarchy of 48 dependency types
46. SENTENCE-LEVEL ANALYSIS
Contextual sentiment estimation
Motivating question: What is the contextual sentiment of a dependency, given the prior sentiment of its constituents?
Example: It is best to avoid watching any of the increasingly disappointing sequels.
infmod(best/+2, avoid/−4) → −4
xcomp(avoid/−4, watching/+2) → −2
advmod(disappointing/−2, increasingly/+3) → −3
Our model. We empirically developed and formally defined
• 6 outcome functions that model types of word interactions
• 42 dependency rules that cover all possible dependency patterns
47. SENTENCE-LEVEL ANALYSIS
Outcome functions
Each function models an interaction where
UNCHANGED: base term imposes the sentiment
e.g. It seems that they ran out of budget.
STRONGER: stronger term imposes the sentiment
e.g. a mighty talent wasted in mass produced rom-coms
AVG: both terms contribute equally to the sentiment
e.g. intelligent and ambitious
48. SENTENCE-LEVEL ANALYSIS
Outcome functions
Each function models an interaction where
INTENSIFY: modifier increases the intensity of the base
e.g. increasingly disappointing sequels
REFLECT: modifier overrides polarity, increases or decreases intensity of base
e.g. impossible to enjoy unless you lower your expectations
NEG: modifier diminishes or negates the base
e.g. not a masterpiece, but not bad either
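The outcome functions can be sketched on a signed intensity scale from −5 to +5. The definitions below are one reading that happens to reproduce the three contextual sentiment examples shown earlier; they are not the formal definitions from the thesis. The fixed step of 1 in `intensify` is an assumption, `reflect` ignores the intensity adjustment, and NEG is left out because the slides do not say when it diminishes rather than negates.

```python
def sign(x):
    return (x > 0) - (x < 0)


def clamp(x, limit=5):
    return max(-limit, min(limit, x))


def unchanged(base, modifier):
    """Base term imposes the sentiment."""
    return base


def stronger(base, modifier):
    """The term with the larger absolute sentiment imposes it."""
    return base if abs(base) >= abs(modifier) else modifier


def avg(base, modifier):
    """Both terms contribute equally."""
    return (base + modifier) / 2


def intensify(base, modifier, step=1):
    """Modifier pushes the base further from neutral (assumed step of 1)."""
    return clamp(base + sign(base) * step)


def reflect(base, modifier):
    """Modifier overrides the polarity; the intensity of the base is kept."""
    return sign(modifier) * abs(base)


print(stronger(2, -4))    # infmod(best/+2, avoid/-4)
print(reflect(2, -4))     # xcomp(avoid/-4, watching/+2), base = watching
print(intensify(-2, 3))   # advmod(disappointing/-2, increasingly/+3)
```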
49. SENTENCE-LEVEL ANALYSIS
Dependency Rules: General form
td(pgov, pdep) outcome_base
• td: type label
• pgov, pdep: term patterns. A pattern may specify
» a list of allowed parts of speech
» a white list of specific terms
• outcome: outcome function, one of the following: UNCHANGED, NEGATED, STRONGER, AVG, INTENSIFY, REFLECT, POSITIVE, NEGATIVE
• base: base specifier, GOV or DEP
Examples
conj(*,*) AVG_DEP
advmod({n,a,r},*) INTENSIFY_GOV
amod(*,{too}) NEGATIVE_GOV
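Rules in this textual form can be parsed into records and matched against a dependency. In the sketch below, `*` matches anything and a braced list matches either the lemma or the part-of-speech code of a word. Treating list entries as interchangeable POS codes or terms is an assumption, as are the names `Rule`, `parse_rule` and `matches`.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    dep_type: str
    gov_pattern: Optional[frozenset]  # None means "*"
    dep_pattern: Optional[frozenset]
    outcome: str
    base: str  # "GOV" or "DEP"


_RULE = re.compile(
    r"^(\w+)\((\*|\{[^}]*\}),\s*(\*|\{[^}]*\})\)\s+([A-Z]+)_(GOV|DEP)$"
)


def _parse_pattern(text):
    if text == "*":
        return None
    return frozenset(item.strip() for item in text[1:-1].split(","))


def parse_rule(text):
    """Parse a rule such as 'advmod({n,a,r},*) INTENSIFY_GOV'."""
    match = _RULE.match(text.strip())
    if match is None:
        raise ValueError(f"not a dependency rule: {text!r}")
    dep_type, gov, dep, outcome, base = match.groups()
    return Rule(dep_type, _parse_pattern(gov), _parse_pattern(dep), outcome, base)


def matches(rule, dep_type, gov, dep):
    """gov and dep are (lemma, pos_code) pairs."""
    def ok(pattern, word):
        return pattern is None or word[0] in pattern or word[1] in pattern
    return rule.dep_type == dep_type and ok(rule.gov_pattern, gov) and ok(rule.dep_pattern, dep)


rule = parse_rule("advmod({n,a,r},*) INTENSIFY_GOV")
print(rule)
print(matches(rule, "advmod", ("disappointing", "a"), ("increasingly", "r")))
```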
51. SENTENCE-LEVEL ANALYSIS
Sentence classification algorithm
Initialization
• Generate dependency tree from sentence
• Annotate subjective terms with prior polarities from sentiment lexicon
• Annotate feature terms with labels from feature lexicon
Sentiment estimation
• Apply closest matching rule to every dependency relation in the tree
» The sentiment of the dependency replaces previous sentiment of the governor node
» Dependencies are processed in reverse postfix order (bottom to top and right to left)
Feature targeting
• The scope of a feature term is a subtree that contains the term and goes
» all the way down to the leaves
» all the way up to the closest clausal dependency
• the sentiment at the root of the subtree gets assigned to the feature
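The sentiment estimation pass can be sketched as a recursive walk over the dependency tree. This reflects one reading of the processing order: the children of a node are visited right to left, and each child's subtree is resolved before its dependency updates the governor. The tree encoding, the `combine` callback and the toy priors are assumptions; rule matching and feature targeting are left out.

```python
def estimate(tree, sentiment, node, combine):
    """Propagate sentiment bottom-up and return the sentiment at `node`.

    tree: {governor: [(dep_type, dependent), ...]} in sentence order.
    sentiment: {word: prior sentiment}, updated in place.
    combine(dep_type, gov_sentiment, dep_sentiment) -> new governor sentiment.
    """
    for dep_type, dependent in reversed(tree.get(node, [])):  # right to left
        estimate(tree, sentiment, dependent, combine)          # subtree first
        sentiment[node] = combine(
            dep_type, sentiment.get(node, 0), sentiment.get(dependent, 0)
        )
    return sentiment.get(node, 0)


order = []


def stronger_wins(dep_type, gov, dep):
    """Toy rule set: every dependency uses a STRONGER-like outcome."""
    order.append(dep_type)
    return gov if abs(gov) >= abs(dep) else dep


# Fragment of "Natalie Portman comes off as very believable"
tree = {
    "comes": [("nsubj", "Portman"), ("prep_as", "believable")],
    "believable": [("advmod", "very")],
}
priors = {"believable": 3, "very": 1}
print(estimate(tree, priors, "comes", stronger_wins), order)
```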
52. SENTENCE-LEVEL ANALYSIS
Sentence classification example
53. SENTENCE-LEVEL ANALYSIS
Sentence polarity evaluation
Test set: Sentence polarity dataset by Pang & Lee, 2002
(5331 positive + 5331 negative sentences from movie reviews)
Results
Polarity classification is accurate for
71.5% of positive sentences
76.9% of negative sentences
74.2% of all sentences
Analysis of error causes
39.0% inaccurate dependency rule
28.5% misclassified term (or we picked the wrong sense)
21.5% erroneous sentence parsing
8.5% ambiguous sentence
2.5% dependency rules applied in the wrong order
54. SENTENCE-LEVEL ANALYSIS
Comparative evaluation
Reference                          Method                                                         Accuracy
Linguistic methods
Nakagawa, Inui & Kurohashi, 2010   majority voting                                                62.9%
Ikeda & Takamura, 2008             majority voting with negations                                 65.8%
Aspect Miner                       dependency rules                                               74.2%
Learning based methods
Andreevskaia & Bergler, 2008       naïve bayes                                                    69.0%
Nakagawa, Inui & Kurohashi, 2010   SVM (bag-of-features)                                          76.4%
Arora, Mayfield et al., 2010       genetic programming                                            76.9%
Ikeda & Takamura, 2008             SVM (sentence-wise learning with polarity shifting + ngrams)   77.0%
Nakagawa, Inui & Kurohashi, 2010   dependency tree CRFs                                           77.3%
Conclusion: Our method fares well among linguistic techniques, but does not match the accuracy of learning based methods
56. CONCLUSIONS
[Diagram: complete Aspect Miner architecture.
Training subsystem
• Lexical Analyzer: training corpus (rated reviews) → tokenization → POS tagging → Named Entity identification → lemmatization → comparatives annotation → negation scope resolution → stop word removal → open-class word filtering → bags of terms (one per document)
• Term classifier: corpus statistics collection → indexing → index of terms → term histogram generation → PEAK, PN or WW classifier → sentiment lexicon
• Feature identifier: training set partitioning (partitions 1 to N) → LDA → topic models TM1 to TMN → aggregation → assisted labeling → feature lexicon
Sentence classifier: text to classify → dependency parsing → dependency tree(s) → sentence & feature classification, using the dependency rule set and both lexicons → result: feature-sentiment pairs]
57. CONCLUSIONS
Summary of contributions
• We showed the feasibility of granular prior polarity classification using review ratings
» and developed a classifier that achieved at least 70% accuracy on the training dataset
• We suggested a bagging-inspired meta-algorithm for discovering feature topics with LDA
• We developed a reusable sentiment lexicon and feature lexicon for the movie review domain
• We created a set of linguistic rules and developed a methodology that is capable of fine-grained feature-level classification of sentences
» and achieved 74.2% accuracy for polarity classification on our test dataset.
58. CONCLUSIONS
Suggested Improvements
Term classification
• Assigning a special class to intensifier terms
• Per-feature polysemy resolution
Feature identification
• Named entities as features
• Applying multi-grain topic models for
discovery of local topics, e.g. MG-LDA (Titov & McDonald, 2008)
Sentence-level classification
• Supervised learning of rules.
Replace manually-made set of rules with a set of rules inferred from
frequent dependency patterns.
59. CONCLUSIONS
References
For a complete list of references, see the full report (in Greek)
http://j.mp/AspectMiner
B. Liu, “Sentiment analysis and subjectivity,” Handbook of Natural Language Processing, ISBN 978-1420085921, 2010.
B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1–135, 2008.
A. Esuli and F. Sebastiani, “Sentiwordnet: A publicly available lexical resource for opinion mining,” in Proceedings of LREC, 2006, vol. 6, pp. 417–422.
V. Hatzivassiloglou and K. R. McKeown, “Predicting the semantic orientation of adjectives,” in Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, 1997, pp. 174–181.
P. Turney, M. L. Littman, and others, “Measuring praise and criticism: Inference of semantic orientation from association,” in ACM Transactions on Information Systems (TOIS), 2003.
A. M. Popescu and O. Etzioni, “Extracting product features and opinions from reviews,” in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 339–346.
M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 168–177.
X. Ding, B. Liu, and P. S. Yu, “A holistic lexicon-based approach to opinion mining,” in Proceedings of the international conference on Web search and web data mining, 2008, pp. 231–240.
I. Titov and R. McDonald, “Modeling online reviews with multi-grain topic models,” in Proceeding of the 17th international conference on World Wide Web, 2008, pp. 111–120.
T. Nakagawa, K. Inui, and S. Kurohashi, “Dependency tree-based sentiment classification using CRFs with hidden variables,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 786–794.
A. Andreevskaia and S. Bergler, “When specialists and generalists work together: Overcoming domain dependence in sentiment tagging,” ACL-08: HLT, 2008.
D. Ikeda and H. Takamura, “Learning to shift the polarity of words for sentiment classification,” Comp. Intelligence, vol. 25, no. 1, pp. 296–303, 2008.
Editor's Notes
experimental opinion mining system for user reviews
Identifying compound terms brings us some of the benefits of n-grams, without the increased costs and noise
If polysemous, the RC set with the highest frequency sum indicates the term’s primary sentiment.