NE7012 SOCIAL
NETWORK ANALYSIS
PREPARED BY: A.RATHNADEVI A.V.C COLLEGE OF
ENGINEERING
UNIT 5-TEXT AND OPINION MINING
UNIT V TEXT AND OPINION MINING
Text Mining in Social Networks - Opinion extraction - Sentiment classification and clustering -
Temporal sentiment analysis - Irony detection in opinion mining - Wish analysis - Product
review mining - Review Classification - Tracking sentiments towards topics over time
5.1 Text Mining in Social Networks
5.1.1 Text mining definition
 The objective of Text Mining is to exploit information contained in textual documents in
various ways, including discovery of patterns and trends in data, associations among
entities, predictive rules, etc.
 The results can be important both for:
 the analysis of the collection, and
 providing intelligent navigation and browsing methods
5.1.2 Text mining pipeline
5.1.3 Motivation for Text Mining
 Approximately 90% of the world’s data is held in unstructured formats (source:
Oracle Corporation)
 Information-intensive business processes demand that we move beyond simple
document retrieval to “knowledge” discovery.
 The justification for the interest in text mining is the same as for the interest in
knowledge retrieval (search and categorization).
 The sheer amount of unstructured data (mostly textual) calls for more than just
document retrieval. Tools and techniques exist to mine this data and realize value in the
same way that data mining taps structured data for business intelligence and knowledge
discovery.
5.1.4 Text mining process
 Text preprocessing
- Syntactic/Semantic text analysis
 Features Generation
- Bag of words
 Features Selection
- Simple counting
- Statistics
 Text/Data Mining
- Classification- Supervised learning
- Clustering- Unsupervised learning
 Analyzing results
- Mapping/Visualization
- Result interpretation
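The feature generation and feature selection steps above can be sketched in a few lines. This is a minimal illustration only: the tokenizer, the `min_df` threshold and the sample documents are assumptions made for the example, not part of the course material.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    """Feature generation: one term-frequency Counter per document."""
    return [Counter(tokenize(d)) for d in docs]

def select_features(bags, min_df=2):
    """Feature selection by simple counting: keep terms that occur
    in at least `min_df` documents (threshold is an assumption)."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    return sorted(t for t, n in df.items() if n >= min_df)

def vectorize(bags, vocab):
    """Turn each bag into a fixed-length count vector over `vocab`."""
    return [[bag[t] for t in vocab] for bag in bags]

# Hypothetical sample documents.
docs = [
    "The battery life is great",
    "Great screen, poor battery",
    "The screen is great",
]
bags = bag_of_words(docs)
vocab = select_features(bags, min_df=2)
vectors = vectorize(bags, vocab)
print(vocab)
print(vectors)
```

The resulting vectors are what a classifier or a clustering algorithm would consume in the text/data mining step.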
5.1.5 Challenges in text mining
 The data collected is “free text” and is not well organized (semi-structured or unstructured)
 There is no uniform access over all sources; each source has separate storage and algebra.
Examples: email, databases, applications, web
 A quintuple heterogeneity: semantic, linguistic, structure, format, size of unit information
 Learning techniques for processing text typically need annotated training data
 XML as the common model; it allows:
o Manipulating data with standards
o Text mining becomes more like data mining
o RDF is emerging as a complementary model
 The more structure you can explore, the better you can do mining
5.1.6 Text mining actors
5.1.7 Text mining tasks
5.1.8 Applications of Text Mining
 Keyword Search
 Classification
 Clustering
 Linkage-based Cross Domain Learning
5.1.8.1 Keyword Search
 A simple but user-friendly interface for information retrieval on the Web.
 It also proves to be an effective method for accessing structured data.
 The challenges lie in three aspects:
o Query semantics
o Ranking strategy
o Query efficiency
Keyword Search Algorithms
 Query Semantics and Answer Ranking
 Keyword search over XML and relational data
 Keyword search over graph data
5.1.8.2 Classification Algorithms
 Content-based text classification
o Naive Bayes classifier, TFIDF classifier and Probabilistic Indexing classifier
 Challenges in the context of text classification:
o Social networks contain a much larger and non-standard vocabulary
o The labels in social networks may often be quite sparse
o The use of content can greatly improve the effectiveness of the link-based
classification process
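As an illustration of content-based text classification, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The class name, the tiny training set and the choice to ignore unseen words are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log likelihoods
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                if t not in self.vocab:
                    continue  # assumption: words never seen in training are ignored
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labelled posts.
train_docs = [["great", "phone"], ["love", "it"], ["bad", "battery"], ["worst", "phone"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "battery", "love"]))
print(model.predict(["worst", "bad"]))
```

With sparse labels and a large non-standard vocabulary, as noted above for social networks, the smoothing term matters more because many words are seen only a few times per class.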
5.1.8.3 Clustering Algorithms
 Related to the traditional problem of graph partitioning
 The problem of graph partitioning is NP-hard and often does not scale very well to large
networks.
 Methods:
o The Kernighan-Lin algorithm
o link-based clustering
o clustering graph streams
 These methods use only the structure of the network for the clustering process.
 The quality of clustering can be improved by using the text content in the nodes of the social
network.
 One approach uses a number of variants of traditional clustering algorithms for multi-dimensional data.
 Most of these methods are variants of the k-means method:
o Start off with a set of k seeds and build the clusters iteratively around these seeds.
o The seeds and cluster membership are iteratively defined with respect to each
other, until we converge to an effective solution.
 Another approach performs the clustering with the use of both content and structure information.
 It constructs a new graph which takes into account both the structure and attribute
information.
 Such a graph has two kinds of edges:
 structure edges from the original graph, and
 attribute edges, which are based on the nature of the attributes in the different nodes.
 A random walk approach is used over this graph in order to define the underlying
clusters.
 Each edge is associated with a weight, which is used in order to control the probability of
the random walk across the different nodes.
 These weights are updated during an iterative process, and the clusters and the weights
are successively used in order to refine each other.
 The weights and the clusters naturally converge as the clustering process progresses.
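The graph with structure edges and attribute edges can be sketched as follows. This is only an illustration under stated assumptions: the node names, the attribute values and the two weights `w_structure` and `w_attribute` are hypothetical, and the iterative weight update and the final cluster assignment are left out.

```python
def build_augmented_graph(edges, attributes, w_structure=1.0, w_attribute=1.0):
    """Return an undirected weighted graph as {node: {neighbour: weight}}.
    Structure edges come from `edges`; each attribute value becomes an extra
    node linked by attribute edges to every node that carries it."""
    graph = {}

    def add(u, v, w):
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w

    for u, v in edges:
        add(u, v, w_structure)
    for node, values in attributes.items():
        for value in values:
            add(node, ("attr", value), w_attribute)
    return graph

def transition_probabilities(graph):
    """Edge weights control the probability of each random-walk step."""
    probs = {}
    for u, nbrs in graph.items():
        total = sum(nbrs.values())
        probs[u] = {v: w / total for v, w in nbrs.items()}
    return probs

def walk(probs, start, steps):
    """Distribution over nodes after `steps` random-walk steps from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for u, p in dist.items():
            for v, q in probs[u].items():
                nxt[v] = nxt.get(v, 0.0) + p * q
        dist = nxt
    return dist

# Hypothetical network: a-b and c-d are linked; a, b and c share an attribute.
edges = [("a", "b"), ("c", "d")]
attributes = {"a": ["python"], "b": ["python"], "c": ["python"], "d": ["java"]}
probs = transition_probabilities(build_augmented_graph(edges, attributes))
dist = walk(probs, "a", steps=2)
print(dist)
```

In this toy network, node c is reachable from a in two steps only through the shared attribute node, which is how content pulls structurally distant nodes closer together. Raising `w_attribute` relative to `w_structure` would make the walk favour content similarity over links.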
5.2 Sentiment analysis
5.2.1 Introduction
 Sentiment analysis (opinion mining): Computational and automatic study of people’s
opinions expressed in written language or text.
 Two types of information are in text data:
 Objective information: facts.
 Subjective information: opinions.
 The focus of sentiment analysis:
 subjective part of text -> identify opinionated information rather than mining and retrieval
of factual information.
 Sentiment analysis brings together various fields of research: text mining, Natural
Language Processing, Data mining.
5.2.2 APPLICATIONS
 Review summarizations.
- Review-oriented search engines.
- Search for people’s opinions: What do people think about the iPhone 5s?
 Recommendation systems.
- If you can do sentiment analysis, then the recommendation system can recommend
items with positive feedback and not recommend items with negative feedback.
 Information extraction systems.
- These systems focus on objective parts to extract factual information.
- They can discard subjective sentences.
 Question-answering systems.
- Different types of questions: definitional and opinion oriented questions.
- Both individuals and organizations can take advantage of sentiment analysis.
5.2.3 Levels Of Sentiment Analysis
 Document level
- Identify the opinion orientation of the whole document.
 Sentence level
- Identify whether the sentence is subjective or objective.
- Identify the opinion orientation of subjective sentences.
 Aspect level
- Identify the aspects that the users are commenting on.
- Identify the opinion orientation about each aspect.
5.2.4 System process
5.2.5 ASPECT IDENTIFICATION
 Using clustering to find similar sentences.
 It is likely that similar sentences are about similar aspects.
 For sentence clustering the method that we use for representing each sentence is
important.
 The major reason that regular clustering algorithms did not work (Gamon et al. [2005]) is
the lack of a proper method to represent each sentence.
 Sentence representation
 BOW (bag of words) representation: considers all terms in the sentence.
 BON (bag of nouns) representation: considers only the nouns of the sentence.
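The difference between the two representations can be shown on two short sentences. This is a minimal sketch: the part-of-speech tags are written by hand (Penn Treebank style, an assumption), and Jaccard overlap is used here only as one convenient similarity measure for clustering.

```python
def bow(tagged):
    """Bag of words: every term in the sentence."""
    return [word for word, _ in tagged]

def bon(tagged):
    """Bag of nouns: only terms whose tag starts with NN."""
    return [word for word, tag in tagged if tag.startswith("NN")]

def jaccard(a, b):
    """Set overlap between two term lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hand-tagged hypothetical review sentences.
s1 = [("the", "DT"), ("battery", "NN"), ("lasts", "VBZ"), ("two", "CD"), ("days", "NNS")]
s2 = [("i", "PRP"), ("love", "VBP"), ("the", "DT"), ("battery", "NN")]

print(jaccard(bow(s1), bow(s2)))  # overlap over all terms
print(jaccard(bon(s1), bon(s2)))  # overlap over nouns only
```

Both sentences are about the battery; restricting the representation to nouns removes function words and verbs that dilute the overlap.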
5.2.6 Sentiment Identification
 Machine learning approach sees the sentiment identification problem as a classification
problem. Make use of manually labeled training data.
 Two major tasks in designing a classifier
 Feature extraction: come up with a set of features that represents your problem properly.
 Classifier selection: choose a classifier among KNN, Naïve Bayes, SVM, Maximum
Entropy.
 Our approaches are related to feature extraction steps.
 Support Vector Machines are widely used in text classification. We use SVM as well.
5.2.7 Sentiment classification
 Classify sentences/documents (e.g. reviews)/features based on the overall sentiments
expressed by authors
o positive, negative and (possibly) neutral
 Similar to topic-based text classification
o Topic-based classification: topic words are important
o Sentiment classification: sentiment words are more important (e.g: great,
excellent, horrible, bad, worst)
 In summary, approaches used in sentiment classification
o Unsupervised – e.g. NLP patterns, or NLP patterns with a lexicon
o Supervised – e.g. SVM, Naive Bayes, etc. (with varying features such as POS tags and
word phrases)
o Semi-supervised – e.g. lexicon + classifier
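A lexicon-based (unsupervised) classifier can be sketched as below. The word lists are a tiny hypothetical lexicon built from the example sentiment words above plus a few additions, and the negation rule (a negation word flips the next sentiment word) is a simplifying assumption.

```python
# Hypothetical mini-lexicon; a real system would use a much larger one.
POSITIVE = {"great", "excellent", "good"}
NEGATIVE = {"horrible", "bad", "worst"}
NEGATIONS = {"not", "never", "no"}

def classify(sentence):
    """Return 'positive', 'negative' or 'neutral' from lexicon hits."""
    score = 0
    negate = False
    for raw in sentence.lower().split():
        token = raw.strip(".,!?;:")
        if token in NEGATIONS:
            negate = True          # flips the next sentiment word
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1 or 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("The screen is great!"))
print(classify("This is not good at all."))
print(classify("The box arrived on Monday."))
```

A semi-supervised variant could use these lexicon labels as noisy training data for a classifier such as SVM or Naive Bayes.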
1) Supervised Learning
 Supervised learning (also called classification) is one of the major tasks in research
areas such as machine learning, artificial intelligence, data mining, and so forth.
 A supervised learning algorithm commonly first trains a classifier (or inferred function)
by analyzing the given training data and then classifies (assigns class labels to) the test
data.
 One typical example of supervised learning in web mining: given many
already known web pages with labels (i.e., topics in Yahoo!), automatically assign
labels to new web pages.
 In this section, we briefly introduce some of the most commonly used techniques for supervised
learning. Further strategies and algorithms can be found in the literature.
 Nearest Neighbor Classifiers
 Decision Tree
 Bayesian Classifiers
 Neural Networks Classifier
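As one example from the list above, a nearest neighbor classifier over bag-of-words vectors can be sketched as follows. The training pages, their topic labels and the use of cosine similarity with k = 3 are assumptions made for the illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k most similar training texts."""
    q = Counter(query.split())
    ranked = sorted(train,
                    key=lambda item: cosine(Counter(item[0].split()), q),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled pages (text, topic).
train = [
    ("football match score", "sports"),
    ("team wins match", "sports"),
    ("stock market falls", "finance"),
    ("market shares rise", "finance"),
    ("bank interest rate", "finance"),
]
print(knn_predict(train, "match score today"))
print(knn_predict(train, "stock market today"))
```

The classifier has no training phase beyond storing the labelled examples; all the work happens at prediction time.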
2) Unsupervised Learning
 In this section, we will introduce major techniques of unsupervised learning (or
clustering).
 Among the large number of approaches that have been proposed, there are three
representative unsupervised learning strategies, i.e., k-means, hierarchical clustering and
density-based clustering.
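Of the three strategies, k-means is the simplest to sketch: start from k seeds, assign each point to its nearest centroid, recompute the centroids, and repeat until the assignment stops changing. The points and seeds below are hypothetical, and the seeds are passed in explicitly so that the run is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, seeds, max_iter=100):
    """Plain k-means; returns (cluster index per point, final centroids)."""
    centroids = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: membership no longer changes
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
assignment, centroids = kmeans(points, seeds=[(1, 1), (8, 8)])
print(assignment)
print(centroids)
```

For text, the points would be document vectors (for example bag-of-words counts) rather than 2-D coordinates.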
3) Semi-supervised Learning
 In the previous two sections, we have introduced the learning issues on the labeled data
(supervised learning or classification), and the unlabeled data (unsupervised learning or
clustering).
 In this section, we present the basic learning techniques when both kinds
of data are given.
 The intuition is that a large amount of unlabeled data is easier to obtain (e.g., pages crawled
by Google), yet only a small part of it can be labeled due to resource limitations.
 This line of research is called semi-supervised learning (or semi-supervised classification),
which aims to address the problem by using a large amount of unlabeled data, together
with the labeled data, to build better classifiers.
 Many approaches have been proposed for semi-supervised classification; the
representative ones are self-training, co-training, generative models and graph-based methods.
5.3 Temporal sentiment analysis
5.3.1 Overview
 The method produces topic graph and sentiment graph by using sentiment phrases which
are patterns of sentiment expression such as “happy” or “delighted at”.
 We extracted 383 sentiment phrases from Japanese news articles manually, and classified
them into eight categories: anxiety, sorrow, anger, happiness, suffering, fatigue,
complaint, and shock.
5.3.2 Procedure for Making a Topic Graph
Following is the procedure for making a topic graph.
Given: a sentiment category S specified by the user, and a period of time D = (d1, d2, …, dl).
Step 1: For each day di in D, retrieve articles containing sentiment phrases of category S.
Step 2: Extract keywords from the retrieved articles by using a keyword extraction system called
GENSEN-Web, which can extract compound nouns as keywords.
Step 3: For each extracted keyword wj (j = 1, 2, …, N), calculate an average correlation c between
wj and the sentiment phrases contained in S. We use the Dice coefficient for calculating correlation.
Step 4: Extract the top n keywords according to the score defined by the product of (1) the number of
days on which the keyword appears, (2) the inverse frequency of the number of days, and (3) the score
provided by GENSEN-Web.
Step 4’ (optional): Put keywords into clusters based on the correlation
coefficient over the timeline and the Dice coefficient in an article.
Step 5: Generate a temporal graph for each of the n keywords (or clusters). For viewability of the
graph, we apply a moving average.
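The correlation in Step 3 and the smoothing in Step 5 can be sketched as below. The Dice coefficient of two sets is 2|A ∩ B| / (|A| + |B|); treating A and B as sets of article IDs is an assumption about how co-occurrence is counted, and the trailing window used for the moving average is likewise an assumption, since the window type is not specified.

```python
def dice(articles_with_a, articles_with_b):
    """Dice coefficient between two sets of article IDs."""
    a, b = set(articles_with_a), set(articles_with_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def average_correlation(keyword_articles, phrase_articles):
    """Average Dice between one keyword and every sentiment phrase of a category."""
    scores = [dice(keyword_articles, ids) for ids in phrase_articles.values()]
    return sum(scores) / len(scores) if scores else 0.0

def moving_average(values, window=3):
    """Trailing moving average; early positions use the values seen so far."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

# Hypothetical article IDs containing the keyword and each sentiment phrase.
keyword_articles = {1, 2, 3, 4}
phrase_articles = {"happy": {2, 3}, "delighted at": {4, 5, 6, 7}}
c = average_correlation(keyword_articles, phrase_articles)
print(c)
print(moving_average([3, 0, 6, 3], window=3))
```

The smoothed daily series is what would be plotted as the temporal graph for each keyword.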
5.3.3 Procedure for Making a Sentiment Graph
Following is the procedure for making a sentiment graph.
Given: a keyword w specified by the user, and a period of time D = (d1, d2, …, dl).
Step 1: Retrieve articles containing keyword w for each day di (i = 1, 2, …, l).
Step 2: For each article, calculate the sum of the frequencies of sentiment phrases for each sentiment
category.
Step 3: Generate a temporal graph of the frequency of sentiment phrases for each sentiment
category. A moving average is then applied to the graph.
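The three steps can be sketched as a per-day count of sentiment phrases by category. The phrase-to-category table, the dates and the article texts below are hypothetical, phrases are matched by naive substring search, and the trailing window of the moving average is an assumption.

```python
from collections import Counter

def sentiment_series(articles_by_day, keyword, phrases):
    """For each category, the daily count of its sentiment phrases in
    articles that contain `keyword` (Steps 1 and 2)."""
    categories = sorted(set(phrases.values()))
    series = {c: [] for c in categories}
    for day in sorted(articles_by_day):
        counts = Counter()
        for text in articles_by_day[day]:
            text = text.lower()
            if keyword not in text:
                continue  # Step 1: keep only articles containing the keyword
            for phrase, category in phrases.items():
                counts[category] += text.count(phrase)  # naive substring match
        for c in categories:
            series[c].append(counts[c])
    return series

def moving_average(values, window=2):
    """Trailing moving average used to smooth the graph (Step 3)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

# Hypothetical phrase table and articles.
phrases = {"happy": "happiness", "delighted at": "happiness", "worried about": "anxiety"}
articles_by_day = {
    "2024-01-01": ["Fans are happy about the election", "Weather report"],
    "2024-01-02": ["Voters worried about the election, worried about costs"],
    "2024-01-03": ["Election result: many delighted at the outcome and happy"],
}
series = sentiment_series(articles_by_day, "election", phrases)
smoothed = {c: moving_average(v) for c, v in series.items()}
print(series)
print(smoothed)
```

Substring matching would also count "happy" inside "unhappy"; a real system would match on token or phrase boundaries.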
5.4 Irony detection in opinion mining
 In video/spoken discourse, especially in a conversational context, we are usually able to
detect a variety of external clues (e.g. facial expression, intonation, pause duration) that
enable the perception of irony. In written text, a set of more or less explicit linguistic
strategies is also used to express irony. In the next subsections, we describe eight
linguistic patterns that we have previously identified to be related to the expression of
irony (Table 1). Some are specific to Portuguese (e.g. morphological patterns) while
others seem to be language independent (e.g. emoticons).
1. P𝑑𝑖𝑚: Diminutive Forms
Diminutives are commonly used in Portuguese, often with the purpose of expressing
positive sentiments, like affect, tenderness and intimacy. However, they can also be
sarcastically and ironically used for expressing an insult or depreciation towards the
entity they represent. This is especially so when diminutives are found in named entities (NE) mentioning
well-known personalities, such as political entities (e.g. “Socratezinho” for the current
Portuguese prime minister, José Sócrates).
2. P𝑑𝑒𝑚: Demonstrative Determiners
In Portuguese, the occurrence of any demonstrative form – namely, “este” (this), “esse”
and “aquele” (that) – before a human NE usually indicates that such entity is being
negatively or pejoratively mentioned. In some cases, demonstratives (DEM) are the
only explicit clue that signals the presence of irony (e.g. “Este Sócrates é muito
amigo do Sr. Jack” / “This Sócrates is a very good friend of Mr. Jack”).
3. P𝑖𝑡𝑗 : Interjections
Interjections abound in subjective texts, particularly in user-generated content (UGC), carrying valuable
information concerning authors’ emotions, feelings and attitudes. We believe that some
interjections can be used as potential clues for irony detection, when they appear in
specific contexts, such as the ones represented in the Pattern P𝑖𝑡𝑗. Since we are especially
interested in recognizing irony in prior positive text, we confined our analysis to a small
set of interjections that are commonly used to express positive sentiments, namely:
“bravo”, “força”, “muito obrigado/a”, “obrigado/a”, “obrigadinho/a”, “parabéns”,
“muitos parabéns” and “viva”.
4. P𝑣𝑒𝑟𝑏: Verb Morphology
The type of pronoun used for addressing people can also be an important clue for irony
detection in UGC, especially in languages like Portuguese, where the choice of a specific
pronoun or way of expression (e.g. “tu” vs. “você”, both translatable by “you”) may
depend on the degree of proximity/familiarity between the speaker and the NE it refers
to. The pronoun “tu” is used in a familiar context (e.g. with friends and family). In our
experiments, we analyze to what extent the use of the pronoun “tu” for addressing a
well-known named entity can be used as a clue for irony detection in UGC. As represented
in P𝑣𝑒𝑟𝑏, the pronoun can be either explicitly mentioned in the text or it can be embedded
in the morphology of the verb (which is in the second-person singular). We confined the
analysis to the verb “ser” (to be).
5. P𝑐𝑟𝑜𝑠𝑠: Cross-constructions
In Portuguese, evaluative adjectives with a prior positive or neutral polarity usually take a
negative or ironic interpretation whenever they appear in cross-constructions, where
adjectives relate to the noun they modify through the preposition “de” (e.g. “O comunista
do ministro” / “The communist of the minister”) [2]. Pattern P𝑐𝑟𝑜𝑠𝑠 recognizes
cross-constructions headed by a positive or neutral adjective (ADJ𝑝𝑜𝑠 or ADJ𝑛𝑒𝑢𝑡,
respectively), which modify a human NE. Adjectives are preceded by a demonstrative
(DEM) or an article (ART) determiner.
6. P𝑝𝑢𝑛𝑐𝑡: Heavy Punctuation
In UGC, punctuation is frequently used both for verbalizing users’ immediate emotions and
feelings and for intentionally signaling humorous or ironic text. We assume that the
presence in a sentence of a sequence composed of more than one exclamation point
and/or question mark can be used as a clue for irony detection.
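The heavy punctuation clue translates directly into a regular expression: a run of two or more characters drawn from "!" and "?". This is a minimal sketch; the function name and the sample sentences are hypothetical.

```python
import re

# Two or more consecutive exclamation points and/or question marks.
HEAVY_PUNCT = re.compile(r"[!?]{2,}")

def has_heavy_punctuation(sentence):
    """True if the sentence contains a sequence like '!!', '?!' or '???'."""
    return bool(HEAVY_PUNCT.search(sentence))

for s in ["Great job!!!", "Really?!", "Well done!", "Why? Fine!"]:
    print(s, has_heavy_punctuation(s))
```

Single marks, and marks separated by other text, are not treated as a clue.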
7. P𝑞𝑢𝑜𝑡𝑒: Quotation Marks
Quotation marks are also frequently used to express and emphasize an ironic content,
especially if the content has a prior positive polarity (e.g. positive adjective qualifying an
entity). In our experiments, we tried to find possible ironic sentences by searching for quoted
sequences composed of one or two words, at least one of which is a
positive adjective or noun.
8. P𝑙𝑎𝑢𝑔ℎ: Laughter Expressions
Internet slang contains a variety of widespread expressions and symbols that typically
represent a sensory expression, suggesting different attitudes or emotions. In our
experiments, we considered (i) the acronyms “lol” and corresponding variations (LOL),
(ii) onomatopoeic expressions such as “ah”, “eh” and “hi” (AH) and (iii) the prior
positive emoticons “:)”, “;-)” and “:P” (EMO+). In this particular case, we did not
constrain the polarity of elements contained in the sentence. We assume that laugh
expressions are intrinsically positive or ironic.
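The three kinds of laughter clue can be sketched with regular expressions. The exact variations accepted for each class are assumptions: "lol" with repeated letters, one or more repetitions of "ah", "eh" or "hi" forming a whole word, and the three emoticons listed above.

```python
import re

LOL = re.compile(r"\bl+o+l+\b", re.IGNORECASE)        # lol, LOL, loool
AH = re.compile(r"\b(?:ah|eh|hi)+\b", re.IGNORECASE)  # ah, ahah, eheh, hihi
EMO_POS = re.compile(r":\)|;-\)|:P")                  # :)  ;-)  :P

def laughter_clues(sentence):
    """Return the laughter classes (LOL, AH, EMO+) found in the sentence."""
    clues = []
    if LOL.search(sentence):
        clues.append("LOL")
    if AH.search(sentence):
        clues.append("AH")
    if EMO_POS.search(sentence):
        clues.append("EMO+")
    return clues

# Hypothetical sentences.
print(laughter_clues("Grande ministro, lol :)"))
print(laughter_clues("ahahah que bom"))
print(laughter_clues("O ministro chegou."))
```

On English text the AH pattern would also match the greeting "hi", so the pattern set would need adapting per language.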
5.5 Product review mining
5.5.1 Motivation
 A rapid expansion of e-commerce, where more and more products are sold via online
portals (Amazon, eBay, …)
 Online product reviews thus become an important resource:
o Customers can share and find opinions about products easily
o Producers can get a certain degree of feedback
5.5.2 Related works
 Single-document summarization
o Extractive-based approach
 Sentence score + ranking
 Machine learning technique
o Abstractive-based approach
 Template
 Concept hierarchy
 Multi-document summarization
o Extractive-based approach
 Sentence score + ranking + MMR + Ordering
o Abstractive-based approach
 Template
 Concept hierarchy
 Sentence fusion with paraphrasing rules
 Sentiment analysis
o Reviews polarity classification
o PROS/ CONS identification
o Mining review opinions
 Identify product facets
 Identify opinion orientation on the facet
5.5.3 Process
5.5.4 Product facets identification
o Association rule mining
 Each transaction consists of the nouns/noun phrases from a single sentence
 The frequent itemsets are the candidate product facets
o Redundancy pruning
 Removing redundant single-word facets that are covered by a longer facet (e.g. life ->
battery life)
o Compactness pruning
 Removing meaningless multi-word facets (those whose words rarely occur close together in a sentence)
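The candidate generation and redundancy pruning steps can be sketched as follows. This is an illustration under stated assumptions: itemsets are limited to one or two nouns, the support thresholds are arbitrary, and the pruning criterion (a single-word facet is dropped when it rarely occurs outside sentences that contain a frequent longer facet) is one possible reading of the rule above. Compactness pruning is not implemented here.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Count noun itemsets of size 1..max_size and keep the frequent ones."""
    counts = Counter()
    for nouns in transactions:
        items = sorted(set(nouns))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {iset: n for iset, n in counts.items() if n >= min_support}

def prune_redundant(itemsets, transactions, min_pure_support=2):
    """Drop single-word facets that seldom appear without a longer facet."""
    kept = dict(itemsets)
    multi = [set(i) for i in itemsets if len(i) > 1]
    for iset in itemsets:
        if len(iset) != 1:
            continue
        supersets = [m for m in multi if iset[0] in m]
        if not supersets:
            continue
        # Sentences where the word occurs on its own, outside any longer facet.
        pure = sum(
            1 for nouns in transactions
            if iset[0] in nouns and not any(m <= set(nouns) for m in supersets)
        )
        if pure < min_pure_support:
            del kept[iset]
    return kept

# Hypothetical transactions: the nouns of each review sentence.
transactions = [
    ["battery", "life"], ["battery", "life", "screen"], ["screen"],
    ["battery", "life"], ["screen", "price"], ["battery"], ["battery", "charger"],
]
candidates = frequent_itemsets(transactions)
facets = prune_redundant(candidates, transactions)
print(sorted(facets))
```

Here "life" is pruned because it never occurs without "battery", while "battery" is kept because it also appears on its own.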
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
A Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie ReviewsA Survey on Sentiment Categorization of Movie Reviews
A Survey on Sentiment Categorization of Movie Reviews
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 

Recently uploaded

ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 

Recently uploaded (20)

ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
