UNIT V TEXT AND OPINION MINING
Text Mining in Social Networks - Opinion extraction - Sentiment classification and clustering - Temporal sentiment analysis - Irony detection in opinion mining - Wish analysis - Product review mining - Review Classification - Tracking sentiments towards topics over time
5.1 Text Mining in Social Networks
5.1.1 Text mining definition
The objective of Text Mining is to exploit information contained in textual documents in
various ways, including discovery of patterns and trends in data, associations among
entities, predictive rules, etc.
The results can be important both for:
the analysis of the collection, and
providing intelligent navigation and browsing methods
5.1.2 Text mining pipeline
5.1.3 Motivation for Text Mining
Approximately 90% of the world’s data is held in unstructured formats (source:
Oracle Corporation)
Information-intensive business processes demand that we move beyond simple
document retrieval to “knowledge” discovery.
The justification for the interest in text mining is the same as for the interest in
knowledge retrieval (search and categorization).
The sheer amount of unstructured data (mostly textual) out there calls for more than just
document retrieval. Tools and techniques exist to mine this data and realize value in the
same way that data mining taps structured data for business intelligence and knowledge
discovery.
5.1.4 Text mining process
Text preprocessing
- Syntactic/Semantic text analysis
Features Generation
- Bag of words
Features Selection
- Simple counting
- Statistics
Text/Data Mining
- Classification- Supervised learning
- Clustering- Unsupervised learning
Analyzing results
- Mapping/Visualization
- Result interpretation
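The feature generation and feature selection steps above can be sketched in a few lines of Python. This is only an illustration: the tokenizer, the sample sentences, and the `min_docs` threshold are assumptions and do not come from the notes.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs):
    """Feature generation: one term-count Counter per document."""
    return [Counter(tokenize(d)) for d in docs]

def select_features(bags, min_docs=2):
    """Feature selection by simple counting: keep terms seen in >= min_docs documents."""
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    return sorted(t for t, n in doc_freq.items() if n >= min_docs)

# Hypothetical mini-collection, used only for illustration.
docs = [
    "The battery life is great",
    "Great screen, poor battery",
    "The screen is sharp",
]
bags = bag_of_words(docs)
vocab = select_features(bags)
# Each document becomes a count vector over the selected vocabulary.
vectors = [[bag[t] for t in vocab] for bag in bags]
print(vocab)
print(vectors)
```

The resulting vectors are what a classifier or clustering algorithm in the Text/Data Mining step would consume.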
5.1.5 Challenges in text mining
Data collection is “free text”, is not well-organized (Semi-structured or unstructured)
No uniform access over all sources, each source has separate storage and algebra,
examples: email, databases, applications, web
A quintuple heterogeneity: semantic, linguistic, structure, format, size of unit information
Learning techniques for processing text typically need annotated training data
XML as the common model; it allows:
o Manipulating data with standards
o Mining to become more like data mining
o RDF is emerging as a complementary model
The more structure you can explore, the better you can do mining
5.1.6 Text mining actors
5.1.7 Text mining tasks
5.1.8 Applications of Text Mining
Keyword Search
Classification
Clustering
Linkage-based Cross Domain Learning
5.1.8.1 Keyword Search
A simple but user-friendly interface for information retrieval on the Web.
Proves to be an effective method for accessing structured data.
The challenges lie in three aspects:
o Query semantics
o Ranking strategy
o Query efficiency
Keyword Search Algorithms
Query Semantics and Answer Ranking
Keyword search over XML and relational data
Keyword search over graph data
5.1.8.2 Classification Algorithms
Content-based text classification
o Naive Bayes classifier, TFIDF classifier and Probabilistic Indexing classifier
Challenges in the context of text classification:
o Social networks contain a much larger and non-standard vocabulary
o The labels in social networks may often be quite sparse
o use of content can greatly improve the effectiveness of the link-based
classification process
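As an illustration of content-based text classification, here is a minimal multinomial Naive Bayes sketch with add-one smoothing. The training documents and labels are made up for the example, and the smoothing choice is an assumption rather than something the notes specify.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled posts, used only for illustration.
training = [
    ("match goal team win".split(), "sports"),
    ("team coach player".split(), "sports"),
    ("election vote party".split(), "politics"),
    ("party minister vote law".split(), "politics"),
]
model = train_naive_bayes(training)
print(classify("team goal".split(), model))
print(classify("vote law".split(), model))
```

With sparse labels and a non-standard vocabulary, as noted above for social networks, many test words fall outside `vocab`, which is one reason link information is often combined with content.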
5.1.8.3 Clustering Algorithms
Related to the traditional problem of graph partitioning
The problem of graph partitioning is NP-hard and often does not scale very well to large
networks.
Methods:
o The Kernighan-Lin algorithm
o link-based clustering
o clustering graph streams
These methods use only the structure of the network for the clustering process.
The quality of clustering can be improved by using the text content in the nodes of the social
network.
One option is to use a number of variants of traditional clustering algorithms for multi-dimensional data.
Most of these methods are variants of the k-means method
o start off with a set of k seeds and build the clusters iteratively around these seeds.
o The seeds and cluster membership are iteratively defined with respect to each
other, until we converge to an effective solution.
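The seed-and-refine loop described above can be sketched as follows. This is a plain k-means on small numeric vectors; the sample points and the choice of initial seeds are assumptions made for the example, and in practice the vectors would be text features of the nodes.

```python
def kmeans(points, seeds, max_iter=100):
    """Alternate cluster assignment and seed update until membership is stable."""
    centers = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest seed (squared Euclidean distance).
        new_assignment = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        if new_assignment == assignment:
            break  # membership stopped changing: converged
        assignment = new_assignment
        # Recompute each seed as the mean of its current members.
        for c in range(len(centers)):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old seed
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centers

# Hypothetical 2-D feature vectors, used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, centers = kmeans(points, seeds=[points[0], points[3]])
print(assignment)
print(centers)
```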
Alternatively, perform the clustering using both content and structure information.
This approach constructs a new graph which takes into account both the structure and attribute
information.
Such a graph has two kinds of edges:
structure edges from the original graph, and
attribute edges, which are based on the nature of the attributes in the different nodes.
A random walk approach is used over this graph in order to define the underlying
clusters.
Each edge is associated with a weight, which is used in order to control the probability of
the random walk across the different nodes.
These weights are updated during an iterative process, and the clusters and the weights
are successively used in order to refine each other.
The weights and the clusters naturally converge as the clustering process progresses.
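A rough sketch of the augmented graph and the weighted random walk is given below. It is an illustration under assumptions: attribute values are modelled as extra nodes, the walk is a short restart-damped walk, and the iterative weight update is left out. The names `build_augmented_graph` and `walk_proximity`, the parameters, and the toy data are all hypothetical.

```python
def build_augmented_graph(structure_edges, attributes, w_struct=1.0, w_attr=1.0):
    """Return {node: {neighbor: weight}} holding structure and attribute edges.

    Every distinct (attribute, value) pair becomes an extra node that is
    linked to each original node carrying that value.
    """
    graph = {}

    def add_edge(u, v, w):
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w

    for u, v in structure_edges:
        add_edge(u, v, w_struct)
    for node, attrs in attributes.items():
        for name, value in attrs.items():
            add_edge(node, ("attr", name, value), w_attr)
    return graph

def walk_proximity(graph, source, steps=4, restart=0.2):
    """Accumulate the damped probability mass a walk from `source` puts on each node."""
    dist = {source: 1.0}
    proximity = {}
    for step in range(1, steps + 1):
        nxt = {}
        for u, mass in dist.items():
            total = sum(graph[u].values())
            for v, w in graph[u].items():
                # Edge weights control the transition probabilities.
                nxt[v] = nxt.get(v, 0.0) + mass * w / total
        dist = nxt
        for v, mass in dist.items():
            proximity[v] = proximity.get(v, 0.0) + restart * (1 - restart) ** step * mass
    return proximity

# Hypothetical network: a path a-b-c-d where a, b share one topic and c, d another.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
attrs = {"a": {"topic": "music"}, "b": {"topic": "music"},
         "c": {"topic": "sports"}, "d": {"topic": "sports"}}
graph = build_augmented_graph(edges, attrs)
prox = walk_proximity(graph, "a")
print(prox["b"], prox["c"], prox["d"])
```

Nodes that are close in structure and share attribute values end up with higher proximity, so a clustering step can group them together; raising `w_attr` relative to `w_struct` shifts the balance toward content.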
5.2 Sentiment analysis
5.2.1 Introduction
Sentiment analysis (opinion mining): Computational and automatic study of people’s
opinions expressed in written language or text.
Two types of information are in text data:
Objective information: facts.
Subjective information: opinions.
The focus of sentiment analysis:
the subjective part of text → identify opinionated information rather than mining and retrieving
factual information.
Sentiment analysis brings together various fields of research: text mining, Natural
Language Processing, Data mining.
5.2.2 APPLICATIONS
Review summarizations.
- Review-oriented search engines.
- Search for people’s opinions: What do people think about the iPhone 5s?
Recommendation systems.
- If you can do sentiment analysis, then the recommendation system can recommend
items with positive feedback and not recommend items with negative feedback.
Information extraction systems.
- These systems focus on objective parts to extract factual information.
- They can discard subjective sentences.
Question-answering systems.
- Different types of questions: definitional and opinion oriented questions.
- Both individuals and organizations can take advantage of sentiment analysis.
5.2.3 Levels Of Sentiment Analysis
Document level
- Identify the opinion orientation of the whole document.
Sentence level
- Identify whether the sentence is subjective or objective.
- Identify the opinion orientation of subjective sentences.
Aspect level
- Identify the aspects that the users are commenting on.
- Identify the opinion orientation about each aspect.
5.2.4 System process
5.2.5 ASPECT IDENTIFICATION
Using clustering to find similar sentences.
It is likely that similar sentences are about similar aspects.
For sentence clustering, the method used to represent each sentence is
important.
The major reason that regular clustering algorithms did not work (Gamon et al [2005]) is
the lack of a proper method to represent each sentence.
Sentence representation
BOW representation: considers all terms in the sentence.
BON representation: considers only nouns of the sentence.
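The difference between the two representations can be seen in a small sketch. It assumes the sentences are already part-of-speech tagged (the Penn Treebank style tags below are written by hand for illustration) and uses cosine similarity as a stand-in for whatever similarity the clustering step relies on.

```python
import math
from collections import Counter

def bow(tagged_sentence):
    """Bag of words: count every term in the sentence."""
    return Counter(word.lower() for word, _ in tagged_sentence)

def bon(tagged_sentence):
    """Bag of nouns: count only terms whose POS tag starts with 'NN'."""
    return Counter(word.lower() for word, tag in tagged_sentence if tag.startswith("NN"))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hand-tagged example sentences (hypothetical).
s1 = [("The", "DT"), ("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")]
s2 = [("Terrible", "JJ"), ("battery", "NN")]

print(cosine(bow(s1), bow(s2)))  # diluted by non-aspect words
print(cosine(bon(s1), bon(s2)))  # both sentences reduce to the noun "battery"
```

Under BON the two sentences look identical because only the aspect noun survives, which is the intuition behind using it for aspect clustering.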
5.2.6 Sentiment Identification
Machine learning approach sees the sentiment identification problem as a classification
problem. Make use of manually labeled training data.
Two major tasks in designing a classifier
Feature extraction: come up with a set of features that represents your problem properly.
Classifier selection: choose a classifier among KNN, Naïve Bayes, SVM, Maximum
Entropy.
Our approaches are related to feature extraction steps.
Support Vector Machines are widely used in text classification. We use SVM as well.
5.2.7 Sentiment classification
Classify sentences/documents (e.g. reviews)/features based on the overall sentiments
expressed by authors
o positive, negative and (possibly) neutral
Similar to topic-based text classification
o Topic-based classification: topic words are important
o Sentiment classification: sentiment words are more important (e.g: great,
excellent, horrible, bad, worst)
In summary, approaches used in sentiment classification
o Unsupervised – e.g. NLP patterns, or NLP patterns with a lexicon
o Supervised – e.g. SVM, Naive Bayes, etc. (with varying features like POS tags,
word phrases)
o Semi-supervised – e.g. lexicon + classifier
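A minimal sketch of the unsupervised, lexicon-based approach follows. The sentiment words "great", "excellent", "horrible", "bad" and "worst" come from the list above; the word "good", the negator list, and the one-token negation rule are assumptions added for illustration.

```python
import re

POSITIVE = {"great", "excellent", "good"}   # "good" is an added assumption
NEGATIVE = {"horrible", "bad", "worst"}
NEGATORS = {"not", "never", "no"}           # hypothetical negation cues

def classify_sentiment(text):
    """Score sentiment words, flipping polarity after a negator; return a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("The screen is great"))
print(classify_sentiment("Not good, worst purchase"))
print(classify_sentiment("It arrived on Monday"))
```

A semi-supervised variant could use labels produced this way as noisy training data for a classifier such as SVM or Naive Bayes.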
1) Supervised Learning
Supervised learning (also called classification) is one of the major tasks in research
areas such as machine learning, artificial intelligence, data mining, and so forth.
A supervised learning algorithm commonly first trains a classifier (or inferred function)
by analyzing the given training data and then classifies (or gives class labels to) the test
data.
One typical example of supervised learning in web mining: given many
already known web pages with labels (i.e., topics in Yahoo!), automatically assign
labels to new web pages.
In this section, we briefly introduce some of the most commonly used techniques for supervised
learning. More kinds of strategies and algorithms can be found.
Nearest Neighbor Classifiers
Decision Tree
Bayesian Classifiers
Neural Networks Classifier
2) Unsupervised Learning
In this section, we will introduce major techniques of unsupervised learning (or
clustering).
Among the large number of approaches that have been proposed, there are three
representative unsupervised learning strategies, i.e., k-means, hierarchical clustering and
density-based clustering.
3) Semi-supervised Learning
In the previous two sections, we have introduced the learning issues on the labeled data
(supervised learning or classification), and the unlabeled data (unsupervised learning or
clustering).
In this section, we present the basic learning techniques used when both kinds
of data are given.
The intuition is that a large amount of unlabeled data is easy to obtain (e.g., pages crawled
by Google), yet only a small part of it can be labeled due to resource limitations.
This line of research is called semi-supervised learning (or semi-supervised classification),
which aims to address the problem by using a large amount of unlabeled data, together
with the labeled data, to build better classifiers.
There are many approaches proposed for semi-supervised classification, in which the
representatives are self-training, co-training, generative models, graph-based methods.
5.3 Temporal sentiment analysis
5.3.1 Overview
The method produces a topic graph and a sentiment graph by using sentiment phrases, which
are patterns of sentiment expression such as “happy” or “delighted at”.
We extracted 383 sentiment phrases from Japanese news articles manually, and classified
them into eight categories: anxiety, sorrow, anger, happiness, suffering, fatigue,
complaint, and shock.
5.3.2 Procedure for Making a Topic Graph
Following is the procedure for making a topic graph. Given: a sentiment category S specified
by the user, and a period of time D=(d1, d2, …, dl).
Step 1: For each day di in D, retrieve articles containing sentiment phrases of category S.
Step 2: Extract keywords from the retrieved articles by using a keyword extraction system called
GENSEN-Web, which can extract compound nouns as keywords.
Step 3: For each extracted keyword wj (j=1,2,…,N), calculate the average correlation c between
wj and the sentiment phrases contained in S. We use the Dice coefficient for calculating correlation.
Step 4: Extract the top n keywords according to the score defined by the product of (1) the number of
days on which the keyword appears, (2) the inverse frequency of the number of days, and (3) the score
provided by GENSEN-Web.
Step 4’ (optional): Put keywords into clusters based on the correlation
coefficient over the timeline and the Dice coefficient within an article.
Step 5: Generate a temporal graph for each of the n keywords (or clusters). For viewability of the
graph, we apply a moving average.
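Step 3 of the procedure can be illustrated with a short sketch. It assumes the usual set-based form of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), where A and B are the sets of articles containing the keyword and the sentiment phrase; the notes do not spell out the exact formulation, and the article ids below are invented.

```python
def dice(a, b):
    """Dice coefficient between two sets of article ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def average_correlation(keyword_articles, phrase_articles):
    """Average Dice coefficient between one keyword and every sentiment phrase in S."""
    scores = [dice(keyword_articles, arts) for arts in phrase_articles.values()]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical data: ids of articles containing the keyword and each phrase.
keyword_articles = {1, 2, 3, 4}
phrase_articles = {
    "happy": {1, 2},
    "delighted at": {3, 4, 5, 6},
}
c = average_correlation(keyword_articles, phrase_articles)
print(c)
```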
5.3.4 Procedure for Making a Sentiment Graph
Following is the procedure for making a sentiment graph. Given: a keyword w specified
by the user, and a period of time D=(d1, d2, …, dl).
Step 1: Retrieve articles containing keyword w for each day di (i=1,2,…,l).
Step 2: For each article, calculate the sum of the frequencies of sentiment phrases for all sentiment
categories.
Step 3: Generate a temporal graph of the frequency of sentiment phrases for each sentiment
category. Then a moving average is applied to the graph.
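Steps 2 and 3 can be sketched as below, assuming the articles for keyword w have already been retrieved and grouped by day. The phrase lists, the sample articles, the substring matching, and the trailing window are assumptions for illustration; the notes do not state the window size.

```python
def daily_counts(articles_by_day, phrases_by_category):
    """Per category, sum sentiment-phrase occurrences over each day's articles."""
    counts = {cat: [] for cat in phrases_by_category}
    for articles in articles_by_day:
        for cat, phrases in phrases_by_category.items():
            # Plain substring counting: a simplification of real phrase matching.
            total = sum(a.lower().count(p) for a in articles for p in phrases)
            counts[cat].append(total)
    return counts

def moving_average(series, window=3):
    """Trailing moving average; the first points average over the days seen so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical articles for three consecutive days.
articles_by_day = [
    ["Fans were happy with the result", "Residents are angry about the delay"],
    ["Everyone was delighted at the news and happy to celebrate"],
    ["Commuters are angry"],
]
phrases_by_category = {"happiness": ["happy", "delighted at"], "anger": ["angry"]}

counts = daily_counts(articles_by_day, phrases_by_category)
graph = {cat: moving_average(series, window=2) for cat, series in counts.items()}
print(counts)
print(graph)
```

Each smoothed series in `graph` corresponds to one line of the sentiment graph for the keyword.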
5.4 Irony detection in opinion mining
In video/spoken discourse, especially in a conversational context, we are usually able to
detect a variety of external clues (e.g. facial expression, intonation, pause duration) that
enable the perception of irony. In written text, a set of more or less explicit linguistic
strategies is also used to express irony. In the next subsections, we describe eight
linguistic patterns that we have previously identified to be related to the expression of
irony (Table 1). Some are specific to Portuguese (e.g. morphological patterns) while
others seem to be language independent (e.g. emoticons).
1. P𝑑𝑖𝑚: Diminutive Forms
Diminutives are commonly used in Portuguese, often with the purpose of expressing
positive sentiments, like affect, tenderness and intimacy. However, they can also be
sarcastically and ironically used for expressing an insult or depreciation towards the
entity they represent. This is especially so when diminutives are found in NEs mentioning
well-known personalities, such as political entities (e.g. “Socratezinho” for the current
Portuguese prime minister, José Sócrates).
2. P𝑑𝑒𝑚: Demonstrative Determiners
In Portuguese, the occurrence of any demonstrative form – namely, “este” (this), “esse”
and “aquele” (that) – before a human NE usually indicates that such entity is being
negatively or pejoratively mentioned. In some cases, demonstratives (DEM) are the
only explicit clue that signals the presence of irony (e.g. “Este Sócrates é muito
amigo do Sr. Jack” / “This Sócrates is a very good friend of Mr. Jack”).
3. P𝑖𝑡𝑗: Interjections
Interjections abound in subjective texts, particularly in UGC, carrying valuable
information concerning authors’ emotions, feelings and attitudes. We believe that some
interjections can be used as potential clues for irony detection, when they appear in
specific contexts, such as the ones represented in the pattern P𝑖𝑡𝑗. Since we are especially
interested in recognizing irony in prior positive text, we confined our analysis to a small
set of interjections that are commonly used to express positive sentiments, namely:
“bravo”, “força”, “muito obrigado/a”, “obrigado/a”, “obrigadinho/a”, “parabéns”,
“muitos parabéns” and “viva”.
4. P𝑣𝑒𝑟𝑏: Verb Morphology
The type of pronoun used for addressing people can also be an important clue for irony
detection in UGC, especially in languages like Portuguese, where the choice of a specific
pronoun or way of expression (e.g. “tu” vs. “você”, both translatable by “you”) may
depend on the degree of proximity/familiarity between the speaker and the NE it refers
to. The pronoun “tu” is used in a familiar context (e.g. with friends and family). In our
experiments, we analyze to what extent the use of the pronoun “tu” for addressing a
well-known named entity can be used as a clue for irony detection in UGC. As represented
in P𝑣𝑒𝑟𝑏, the pronoun can be either explicitly referred in the text or it can be embedded
in the morphology of the verb (which is in the second-person singular). We confined the
analysis to the verb “ser” (to be).
5. P𝑐𝑟𝑜𝑠𝑠: Cross-constructions
In Portuguese, evaluative adjectives with a prior positive or neutral polarity usually take a
negative or ironic interpretation whenever they appear in cross-constructions, where
adjectives relate to the noun they modify through the preposition “de” (e.g. “O comunista
do ministro” / “The communist of the minister”) [2]. Pattern P𝑐𝑟𝑜𝑠𝑠 recognizes cross-
constructions headed by a positive or neutral adjective (ADJ𝑝𝑜𝑠 or ADJ𝑛𝑒𝑢𝑡,
respectively), which modify a human NE. Adjectives are preceded by a demonstrative
(DEM) or an article (ART) determiner.
6. P𝑝𝑢𝑛𝑐𝑡: Heavy Punctuation
In UGC, punctuation is frequently used both for verbalizing users’ immediate emotions and
feelings and for intentionally signaling humorous or ironic text. We assume that the
presence in a sentence of a sequence composed of more than one exclamation point
and/or question mark can be used as a clue for irony detection.
7. P𝑞𝑢𝑜𝑡𝑒: Quotation Marks
Quotation marks are also frequently used to express and emphasize an ironic content,
especially if the content has a prior positive polarity (e.g. positive adjective qualifying an
entity). In our experiments, we tried to find possible ironic sentences by searching quoted
sequences composed of one or two words, corresponding, at least one of them, to a
positive adjective or noun.
8. P𝑙𝑎𝑢𝑔ℎ: Laughter Expressions
Internet slang contains a variety of widespread expressions and symbols that typically
represent a sensory expression, suggesting different attitudes or emotions. In our
experiments, we considered (i) the acronym “lol” and corresponding variations (LOL),
(ii) onomatopoeic expressions such as “ah”, “eh” and “hi” (AH) and (iii) the prior
positive emoticons “:)”, “;-)” and “:P” (EMO+). In this particular case, we did not
constrain the polarity of elements contained in the sentence. We assume that laugh
expressions are intrinsically positive or ironic.
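Three of the language-independent patterns (heavy punctuation, quotation marks, laughter expressions) lend themselves to a regular-expression sketch. The expressions below are an approximation written for illustration, not the authors' actual patterns: the positive word list is hypothetical, and treating onomatopoeic laughter as two or more repeated syllables is an assumption.

```python
import re

# More than one exclamation point and/or question mark in a row.
HEAVY_PUNCT = re.compile(r"[!?]{2,}")
# A quoted sequence of one or two words (straight or curly quotes).
QUOTED = re.compile(r'["“](\w+(?:\s+\w+)?)["”]')
# "lol" variations, or repeated "ah"/"eh"/"hi" syllables (assumed form).
LAUGH = re.compile(r"\b(?:l+o+l+|(?:ah|eh|hi){2,})\b", re.IGNORECASE)
EMOTICONS = (":)", ";-)", ":P")
POSITIVE_WORDS = {"great", "brilliant", "genius"}  # hypothetical lexicon

def irony_clues(sentence):
    """Return the names of the surface clues found in the sentence."""
    clues = []
    if HEAVY_PUNCT.search(sentence):
        clues.append("punct")
    for quoted in QUOTED.findall(sentence):
        # At least one quoted word must be a prior-positive term.
        if any(w.lower() in POSITIVE_WORDS for w in quoted.split()):
            clues.append("quote")
            break
    if LAUGH.search(sentence) or any(e in sentence for e in EMOTICONS):
        clues.append("laugh")
    return clues

print(irony_clues('What a "brilliant" plan!!! lol'))
print(irony_clues("The meeting starts at noon."))
```

These clues only flag candidate sentences; as the text notes, a clue such as laughter may signal either a positive or an ironic reading.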
5.5 Product review mining
5.5.1 Motivation
A rapid expansion of e-commerce, where more and more products are sold via online
portals (Amazon, eBay … )
Online product reviews thus become an important resource:
o Customers to share and find opinions about products easily
o Producers to get certain degrees of feedback
5.5.2 Related works
Single-document summarization
o Extractive-based approach
Sentence score + ranking
Machine learning technique
o Abstractive-based approach
Template
Concept hierarchy
Multi-document summarization
o Extractive-based approach
Sentence score + ranking + MMR + Ordering
o Abstractive-based approach
Template
Concept hierarchy
Sentence fusion with paraphrasing rules
Sentiment analysis
o Reviews polarity classification
o PROS/ CONS identification
o Mining review opinions
Identify product facets
Identify opinion orientation on the facet
5.5.3 Process
5.5.4 Product facets identification
o Association rule mining
Each transaction consists of nouns/noun phrases from single sentence
The frequent itemsets are the candidate product facets
o Redundancy pruning
Removing redundant facets that contain only single words. (e.g. life ->
battery life)
o Compactness pruning
Removing meaningless facets that contain multiple words
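The facet identification steps can be sketched as follows, assuming the nouns of each review sentence have already been extracted. Frequent itemsets are found by brute-force counting rather than a full association rule miner, and redundancy pruning is approximated by counting how often a single word occurs without a longer facet that contains it. The thresholds and the sample transactions are assumptions, and compactness pruning is not covered.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support=2, max_size=2):
    """Count noun sets of up to max_size per sentence; keep those meeting min_support."""
    counts = Counter()
    for nouns in transactions:
        items = sorted(set(nouns))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

def redundancy_pruning(facets, transactions, min_pure_support=2):
    """Drop a single-word facet that rarely occurs without a longer facet containing it."""
    multi = [set(f) for f in facets if len(f) > 1]
    kept = {}
    for facet, support in facets.items():
        if len(facet) == 1:
            supersets = [m for m in multi if facet[0] in m]
            # Sentences where the word appears without any containing facet.
            pure = sum(
                1 for nouns in transactions
                if facet[0] in nouns and not any(m <= set(nouns) for m in supersets)
            )
            if supersets and pure < min_pure_support:
                continue
        kept[facet] = support
    return kept

# Hypothetical transactions: the nouns found in each review sentence.
transactions = [
    ["battery", "life"],
    ["battery", "life", "screen"],
    ["screen"],
    ["battery", "life"],
    ["screen", "price"],
    ["battery"],
    ["battery"],
]
candidates = frequent_itemsets(transactions)
facets = redundancy_pruning(candidates, transactions)
print(sorted(facets))
```

Here "life" is dropped because it never appears apart from "battery life", while "battery" survives because it also occurs on its own.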