IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis
Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann
IIT Patna, India
TU Darmstadt, Germany
Presented by: Alexander Panchenko, TU Darmstadt, Germany
Can Deep Learning Solve the Sentiment Analysis Problem? - Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text like a tweet or product review, decide whether it contains positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approx. 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
LEPOR: an augmented machine translation evaluation metric - Thesis PPT - Lifeng (Aaron) Han
Machine translation (MT) has become one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation system has improved or not. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes suffer from low agreement. On the other hand, the popular automatic MT evaluation methods have some weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but poorly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metric hard to replicate and apply to other language pairs. Thirdly, some popular metrics rely on an incomplete set of factors, which results in low performance on some practical tasks.
In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of each language. Thirdly, in an enhanced version of our methods, we design concise linguistic features based on POS tags to show that our methods can yield even higher performance when external linguistic resources are used. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages.
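The kind of factor combination the thesis describes can be illustrated with a toy metric. The sketch below multiplies a length penalty with a harmonic mean of word-level precision and recall; it is an illustration of the general idea only, not the actual LEPOR formula, and all names in it are made up.

```python
# Toy MT-evaluation sketch: length penalty * harmonic mean of word-level
# precision and recall. Illustrative only, NOT the real LEPOR metric.
import math
from collections import Counter

def length_penalty(cand_len, ref_len):
    """Penalize candidates whose length deviates from the reference."""
    if cand_len == ref_len:
        return 1.0
    shorter, longer = min(cand_len, ref_len), max(cand_len, ref_len)
    return math.exp(1 - longer / shorter)

def score(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    # Clipped word overlap between candidate and reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    f_mean = 2 * precision * recall / (precision + recall)
    return length_penalty(len(cand), len(ref)) * f_mean
```

A perfect match scores 1.0 and a disjoint candidate scores 0.0; the thesis's tunable model would additionally weight such factors per language pair.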
RuleML2015: Rule Generalization Strategies in Incremental Learning of Disjunc... - RuleML
Symbolic machine learning systems and applications, especially when applied to real-world domains, must face the problem of concepts that cannot be captured by a single definition but require several alternate definitions, each of which covers part of the full concept extension. This problem is particularly relevant for incremental systems, where progressive covering approaches are not applicable and the learning and refinement of the various definitions are interleaved during the learning phase. In these systems, the learned model depends not only on the order in which the examples are provided, but also on the choice of the specific definition to be refined. This paper proposes different strategies for determining the order in which the alternate definitions of a concept should be considered in a generalization step, and evaluates their performance on a real-world domain dataset.
Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor requires deep analyses of the syntactic structure of sentences in tweets.
We evaluate our approach on tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets, by 2.19% in average F-measure at the tweet level and 7.5% at the entity level.
Natural language processing for requirements engineering: ICSE 2021 Technical... - alessio_ferrari
These are the slides for the technical briefing given at ICSE 2021 by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
Meta-evaluation of machine translation evaluation methods - Lifeng (Aaron) Han
Cite: Lifeng Han. 2021. Meta-evaluation of machine translation evaluation methods. In Metrics2021 Tutorial Track/type: Workshop on Informetric and Scientometric Research (SIG-MET), ASIS&T. October 23–24.
Datalog+- Track Introduction & Reasoning on UML Class Diagrams via Datalog+/- - RuleML
UML class diagrams (UCDs) are a widely adopted formalism for modeling the intensional structure of a software system. Although UCDs typically guide the implementation of a system, in practice developers commonly need to recover the class diagram from an implemented system; this process is known as reverse engineering. A fundamental property of reverse-engineered (or simply re-engineered) UCDs is consistency, which shows that the system is realizable in practice. In this work, we investigate the consistency of re-engineered UCDs and show that the problem is PSPACE-complete. The upper bound is obtained by exploiting algorithmic techniques developed for conjunctive query answering under guarded Datalog+/-, a key member of the Datalog+/- family of KR languages, while the lower bound is obtained by simulating the behavior of a polynomial-space Turing machine.
Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words is marked with fixed sentiment polarities. However, a word's sentiment orientation (positive, neutral, negative) and/or sentiment strength can change depending on context and targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when compared to the state-of-the-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure.
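The fixed-polarity lexicon baseline that SentiCircle improves on can be sketched in a few lines: each word carries a static score and the tweet's polarity is the sign of the sum. The tiny lexicon below is made up purely for illustration.

```python
# Minimal fixed-polarity lexicon baseline (illustrative toy lexicon).
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "hate": -2}

def polarity(tweet):
    """Sum the static scores of known words; the sign gives the label."""
    score = sum(LEXICON.get(word, 0) for word in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The limitation is visible immediately: the fixed scores cannot react to context (e.g. negation in "not good"), which is precisely what context-aware approaches such as SentiCircle address.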
This presentation talks about Natural Language Processing using Java. At Museaic, a music intelligence platform, we spent time figuring out how to extract central themes from song lyrics. In this talk, I will cover some of the tasks involved in natural language processing, such as named entity recognition, word sense disambiguation, and concept/theme extraction. I will also cover libraries available in Java, such as stanford-nlp and dbpedia-spotlight, as well as graph approaches using WordNet and semantic databases. This talk will help people understand text processing beyond simple keyword approaches and provide them with some of the best techniques and libraries for it in the Java world.
The following are the questions I tried to answer in this presentation:
What is text summarization?
What is automatic text summarization?
How has it evolved over time?
What are the different methods?
How is deep learning used for text summarization?
What are the business applications?
The first few slides explain extractive summarization, with its pros and cons; the next section explains abstractive summarization.
The last section highlights the business applications of each.
An ongoing project on Natural Language Processing (using Python and the NLTK toolkit), which focuses on extracting sentiment from a question and its title on www.stackoverflow.com and determining its polarity. Based on these findings, it is verified whether the rules and guidelines imposed by the SO community on its users are strictly followed.
Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets has been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at the target (entity) level, is the lack of distinct sentiment annotations for the tweets and the entities contained in them. For example, the tweet ``I love iPhone, but I hate iPad'' can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may carry different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions, including total number of tweets, vocabulary size, and sparsity. We also investigate the pair-wise correlations among these dimensions as well as their correlations to sentiment classification performance on the different datasets.
Rulelog is in the process of industry standardization via RuleML and W3C:
RIF-Rulelog specification, version of May 24, 2013, Michael Kifer, ed. RIF-Rulelog is a powerful dialect of the W3C Rule Interchange Format (RIF) that is in draft as a submission from RuleML to W3C.
Several industry standards in these areas are based heavily on our team's contributions to authoring/editing the specifications and to conducting the underlying research and earlier-phase standards design. These include, most notably, the two most important industry standards on rules knowledge:
W3C Rule Interchange Format (RIF), which is primarily based on the RuleML standards design (semantic web rules)
W3C OWL 2 RL Profile (rule-based web ontologies)
The team has also contributed to the development of W3C SPARQL and ISO Common Logic, and has been strongly involved in other related standardization efforts at OMG and OASIS.
Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces a data sparsity problem, partly due to the large variety of short and irregular forms introduced into tweets by the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparseness problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result, and using sentiment-topic features achieves 86.3% sentiment classification accuracy, which outperforms existing approaches.
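The intuition behind the semantic feature set above can be sketched simply: sparse entity mentions are mapped to broader concepts so that rare words share evidence, and the concept counts are interpolated into the unigram features with a weight. The concept mapping and the weight below are made-up illustrations, not the paper's actual resources.

```python
# Sketch of semantic feature interpolation for sparse tweet data:
# unigram counts are augmented with down-weighted concept counts.
from collections import Counter

# Toy concept mapping; a real system would use an entity/concept extractor.
CONCEPTS = {"iphone": "PRODUCT", "ipad": "PRODUCT", "obama": "PERSON"}

def featurize(tokens, alpha=0.5):
    """Return unigram counts interpolated with concept counts (weight alpha)."""
    feats = Counter(tokens)
    for token in tokens:
        concept = CONCEPTS.get(token)
        if concept is not None:
            feats[f"CONCEPT={concept}"] += alpha
    return feats

feats = featurize(["love", "my", "iphone"])
```

With this representation, tweets mentioning "iphone" and "ipad" share the `CONCEPT=PRODUCT` feature even though their unigrams differ, which is how the hidden concepts mitigate sparsity.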
Methodological study of opinion mining and sentiment analysis techniques - ijsc
Decision making, at both the individual and organizational level, is always accompanied by a search for others' opinions on the matter. Opinion-rich resources such as reviews, forum discussions, blogs, micro-blogs, and Twitter provide a rich anthology of sentiments. This user-generated content can serve as a benefit to the market if its semantic orientations are analyzed. Opinion mining and sentiment analysis are the formalizations for studying and construing opinions and sentiments. The digital ecosystem has paved the way for the use of the huge volume of opinionated data recorded. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
Creating a dataset of peer review in computer science conferences published b... - Aliaksandr Birukou
Computer science (CS) as a field is characterised by higher publication numbers and by the prestige of conference proceedings as opposed to scholarly journal articles. In this presentation we report preliminary results of extracting and analysing peer review information from computer science conferences published by Springer in almost 10,000 proceedings volumes. The results will be uploaded to lod.springer.com, with the goal of creating the largest dataset of peer review processes in CS conferences.
Brains, Data, and Machine Intelligence (2014 04 14 London Meetup) - Numenta
Jeff will discuss brains, data, and machine intelligence; the Cortical Learning Algorithm he developed; and the Numenta Platform for Intelligent Computing (NuPIC).
Predicting Contradiction Intensity: Low, Strong or Very Strong? - Ismail BADACHE
Reviews of web resources (e.g. courses, movies) are increasingly exploited in text analysis tasks (e.g. opinion detection, controversy detection). This paper investigates contradiction intensity in reviews by exploiting features such as the variation of ratings and the variation of polarities around specific entities (e.g. aspects, topics). Firstly, aspects are identified according to the distributions of emotional terms in the vicinity of the most frequent nouns in the review collection. Secondly, the polarity of each review segment containing an aspect is estimated; only resources containing these aspects with opposite polarities are considered. Finally, features are evaluated using feature selection algorithms to determine their impact on the effectiveness of contradiction intensity detection, and the selected features are used to train several state-of-the-art learning approaches. The experiments are conducted on a Massive Open Online Courses dataset containing 2244 courses and their 73,873 reviews, collected from coursera.org. Results show that the variation of ratings, the variation of polarities, and review quantity are the best predictors of contradiction intensity, and that J48 was the most effective learning approach for this type of classification.
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) WITH CONDITIONAL RANDOM FIELDS (... - kevig
This study investigates the effectiveness of knowledge named entity recognition in Online Judges (OJs). OJs lack topic classification and are limited to IDs only, so a lot of time is consumed in finding programming problems, and more specifically knowledge entities. A Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Fields (CRF) model is applied for the recognition of knowledge named entities in the solution reports. For the test run, more than 2000 solution reports are crawled from the Online Judges and processed for the model output. The stability of the model is also assessed via the higher F1 value. The results obtained through the proposed BiLSTM-CRF model are more effective (F1: 98.96%) and efficient in lead time.
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
code4lib 2011 preconference, presented by Erik Hatcher of Lucid Imagination.
Abstract: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
Trilinos progress, challenges and future plansM Reza Rahmati
Similar to IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain Dependency and Distributional Semantics Features for Aspect Based Sentiment Analysis (20)
Graph's not dead: from unsupervised induction of linguistic structures from t...Alexander Panchenko
In this invited talk, presented at the Dialogue'2018 conference, I argue for the usefulness of graph representations for NLP in the deep learning era. The lecture describes how to extract symbolic linguistic structures, such as word senses and semantic frames, in an unsupervised way from text corpora using graph-based algorithms and distributional semantics.
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlAlexander Panchenko
We present DepCC, the largest linguistically analyzed corpus of English to date, comprising 365 million documents, 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl of the Common Crawl project. The sentences are processed with a dependency parser and a named entity tagger and contain provenance information, enabling various applications ranging from training syntax-based word embeddings to open information extraction and question answering. We built an index of all sentences and their linguistic meta-data, enabling quick search across the corpus. We demonstrate the utility of this corpus on the verb similarity task by showing that a distributional model trained on our corpus yields better results than models trained on smaller corpora, like Wikipedia. This distributional model outperforms the state-of-the-art models of verb similarity trained on smaller corpora on the SimVerb3500 dataset.
http://www.lrec-conf.org/proceedings/lrec2018/summaries/215.html
Improving Hypernymy Extraction with Distributional Semantic ClassesAlexander Panchenko
http://www.lrec-conf.org/proceedings/lrec2018/pdf/234.pdf
In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics and using these induced semantic classes for filtering noisy hypernymy relations. Denoising of hypernyms is performed by labeling each semantic class with its hypernyms. On the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses. On the other hand, we infer missing hypernyms via label propagation to cluster terms. We conduct a large-scale crowdsourcing study showing that processing of automatically extracted hypernyms using our approach improves the quality of the hypernymy extraction in terms of both precision and recall. Furthermore, we show the utility of our method in the domain taxonomy induction task, achieving the state-of-the-art results on a SemEval'16 task on taxonomy induction.
The paper was presented at the LREC'2018 conference in Miyazaki, Japan.
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical ResourcesAlexander Panchenko
In this talk, we will discuss induction of sparse and dense word sense representations using graph-based approaches and distributional models. Induced senses are represented by a vector, but also by a set of hypernyms, images, and usage examples, derived in an unsupervised and knowledge-free manner, which ensures interpretability of the discovered senses by humans. We showcase the usage of the induced representations for the tasks of word sense disambiguation and enrichment of lexical resources, such as WordNet.
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...Alexander Panchenko
Presentation at the AIST'17 conference by Dmitry Ustalov. Authors of the original paper: Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko.
Using Linked Disambiguated Distributional Networks for Word Sense DisambiguationAlexander Panchenko
We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed. The combination of the two networks reduces the sparsity of the sense representations used for WSD. We evaluate these enriched representations within two lexical sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource; (2) our method achieves results comparable to or better than four state-of-the-art unsupervised knowledge-based WSD systems, including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.
See the full paper at: http://www.aclweb.org/anthology/W/W17/W17-1909.pdf
Panchenko A., Faralli S., Ponzetto S. P., and Biemann C. (2017): Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation. In Proceedings of the Workshop on Sense, Concept and Entity Representations and their Applications (SENSE) co-located with the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'2017). Valencia, Spain. Association for Computational Linguistics
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...Alexander Panchenko
The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...Alexander Panchenko
We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based second-order similarity networks. We then add features for disambiguation from heterogeneous sources, such as window-based and sentence-wide co-occurrences, and explore various schemes to combine these context clues. Our method reaches a performance comparable to state-of-the-art unsupervised word sense disambiguation systems, including the top participants of the SemEval 2013 word sense induction task and two more recent state-of-the-art neural word sense induction systems.
Full paper:
https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/konvens2016panchenko.pdf
A sentiment index measures the average emotional level in a corpus. We introduce four such indexes and use them to gauge the average “positiveness” of a population during some period, based on posts in a social network. This article presents, for the first time, a text-based rather than word-based sentiment index. Furthermore, this study presents the first large-scale study of the sentiment index of the Russian-speaking Facebook. Our results are consistent with prior experiments for the English language.
Semantic relations, such as synonyms, hypernyms and co-hyponyms, have proved to be useful for text processing applications, including text similarity, query expansion, question answering and word sense disambiguation. Such relations are practical because of the gap between the lexical surface of a text and its meaning. Indeed, the same concept is often represented by different terms. However, existing resources often do not cover the vocabulary required by a given system, and manual resource construction is prohibitively expensive for many projects.
On the other hand, the precision of existing extractors still does not match the quality of handcrafted resources. All these factors motivate the development of novel extraction methods. In this work we develop several similarity measures for semantic relation extraction. The main research question we address is how to improve the precision and coverage of such measures. First, we perform a large-scale study of the baseline techniques. Second, we propose four novel measures. One of them significantly outperforms the baselines; the others perform comparably to the state-of-the-art techniques. Finally, we successfully apply one of the novel measures in two text processing systems.
Detecting Gender by Full Name: Experiments with the Russian LanguageAlexander Panchenko
This paper describes a method that detects the gender of a person by his/her full name. While some approaches have been proposed for the English language, little has been done so far for Russian. We fill this gap and present a large-scale experiment on a dataset of 100,000 Russian full names from Facebook. Our method is based on three types of features (word endings, character n-grams and a dictionary of names) combined within a linear supervised model. Experiments show that the proposed simple and computationally efficient approach yields excellent results, achieving accuracy of up to 96%.
Computational Lexical Semantics: Semantic Similarity Measures and Their ApplicationsAlexander Panchenko
Computational lexical semantics: semantic similarity measures and their applications.
A lecture series at the NRU HSE, Faculty of Business Informatics and Applied Mathematics (Nizhny Novgorod).
Phenomics assisted breeding in crop improvementIshaGoswami9
The global population is increasing and will reach about 9 billion by 2050; due to climate change, it will be difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics controlled by multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomic information at all growth stages have become as important as genotyping; thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation, makes them the most convenient, least labor-intensive live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for the cultivation of fish, crustacean, and shellfish larvae. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not cause any harm to human health, quality assessment of the food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The assessment of the antioxidant capability of liquid food and beverages is mainly performed by the spin trapping technique.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occurring metallic chemical elements that have a relatively high density and are toxic at even low concentrations. All toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
The Thematic Appreciation Test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills MN
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
1. IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon:
Combining Domain Dependency and Distributional Semantics
Features for Aspect Based Sentiment Analysis
Ayush Kumar1, Sarah Kohail2, Amit Kumar1, Asif Ekbal1, Chris Biemann2
1IIT Patna, India 2TU Darmstadt, Germany
Presented by:
Alexander Panchenko, TU Darmstadt, Germany
2. Motivation
People write blog posts, comments, reviews, tweets, etc.
Attitudes, feelings, emotions, opinions, etc.
Mining and summarizing opinions/sentiment from text about
specific entities and their aspects can help:
Organizations to monitor their reputation and products.
Customers to make a decision or choose among multiple options.
2
4. SemEval-Task 5: Aspect-Based Sentiment analysis (ABSA)
The Aspect Based Sentiment Analysis (ABSA) task performs a
fine-grained sentiment analysis by addressing three slots:
1. Aspect Category Detection: Identifying the entity#attribute that is referred
to by the aspect. E and A should be chosen from predefined inventories of
entity types (e.g. LAPTOP, MOUSE, RESTAURANT, FOOD) and attribute
labels (e.g. DESIGN, PRICE, QUALITY).
2. Opinion Target (OT) Extraction: Given a set of sentences with
pre-identified entities (e.g., restaurants), identify the aspect terms
(“opinion targets”) on which opinions are expressed in the review text.
3. Sentiment Polarity Classification: Each identified Entity#Attribute, OT
tuple has to be assigned one of the following polarity labels: positive,
negative, or neutral.
4
5. Our Submission
We participated in Slot 1 (aspect category detection) and Slot 3
(sentiment polarity classification) for 7 languages and 4 different
domains.
We also conducted experiments for Slot 2 (opinion target
extraction) for 4 languages in restaurants domain.
Overall, we submitted 29 runs, covering 7 languages (English,
Spanish, Dutch, French, Turkish, Russian and Arabic) and 4
different domains (laptop, restaurants, phones, hotels).
5
6. Experimental Setup: Supervised Models
For Slot 1 and Slot 3, we use supervised classification using
Support Vector Machine (SVM) with the linear kernel.
For Slot 2, we use linear-chain Conditional Random Field
(CRF) with default parameters.
We perform 5-fold cross-validation on the training set to
evaluate the performance.
6
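The 5-fold cross-validation protocol above can be sketched in plain Python; `train_and_score` is a hypothetical stand-in for fitting the classifier (SVM or CRF) on the training folds and scoring it on the held-out fold, not part of the original system:

```python
# Minimal sketch of 5-fold cross-validation on the training set.

def k_fold_indices(n_examples, k=5):
    """Partition example indices 0..n-1 into k contiguous folds."""
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n_examples, train_and_score, k=5):
    """Average the score over k train/test splits."""
    folds = k_fold_indices(n_examples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on all folds except the i-th, test on the i-th.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```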
7. Feature Extraction: Preprocessing
Normalize digits to ‘num’ and remove stop words for tf-idf
computation.
For English, we use Stanford tools to tokenize, parse and
extract lemma, Part-of-Speech (PoS) and named entity (NE)
information.
For the other languages, we use taggers and dependency
parsers based on Universal Dependencies (UD).
7
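The digit normalization and stop-word removal can be sketched as follows; the whitespace tokenizer and the toy stop-word list are simplifying assumptions (the actual system uses Stanford tools for English and UD-based taggers and parsers for the other languages):

```python
import re

# Toy English stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "or", "to"}

def preprocess_for_tfidf(text):
    tokens = text.lower().split()
    # Normalize every digit sequence to the placeholder token 'num'.
    tokens = [re.sub(r"\d+", "num", t) for t in tokens]
    # Remove stop words before computing tf-idf statistics.
    return [t for t in tokens if t not in STOP_WORDS]
```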
8. Contribution I: Lexicon Expansion based on DT
8
1. Based on the notion of a distributional thesaurus (DT), we expand
existing lexical resources to reach a higher coverage of
sentiment lexicons and improve the extraction of rare/unseen
aspect words.
Examples of DT expansions
Token DT Expansion
good bad, excellent, decent, great
powerful potential, influential, strong, sophisticated
small tiny, large, sized, huge, sizable
efficient reliable, effective, energy-efficient, flexible
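A minimal sketch of how such DT expansions could induce sentiment scores for unseen words; the averaging rule and the toy scores below are illustrative assumptions (the slide does not give the actual scoring equation):

```python
# Toy distributional thesaurus mirroring the table above.
DT = {
    "good": ["bad", "excellent", "decent", "great"],
    "powerful": ["potential", "influential", "strong", "sophisticated"],
}

# Toy seed sentiment lexicon (scores are illustrative).
SEED_LEXICON = {"bad": -0.8, "excellent": 0.9, "decent": 0.5, "great": 0.8}

def expand_lexicon(seed, thesaurus):
    """Score unseen words from their DT neighbours found in the seed lexicon."""
    expanded = dict(seed)
    for word, neighbours in thesaurus.items():
        if word in seed:
            continue
        known = [seed[n] for n in neighbours if n in seed]
        if known:  # average the neighbours' seed scores (assumed rule)
            expanded[word] = sum(known) / len(known)
    return expanded
```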
11. Contribution I: Lexicon Expansion based on DT
11
Expansion statistics for induced lexicons.
Common entries denote the number of words which are present
in both the seed lexicon and the induced lexicon.
12. Contribution II: DDGs for Aspect Category Detection
12
[Figure: dependency relations extracted per document (d1 … dn), e.g. amod(processor, fast), amod(processor, good), conj(good, fast), are aggregated into corpus-level counts — amod(processor, fast) 24, amod(processor, good) 13, conj(good, fast) 19 — yielding a graph that links ‘processor’ to ‘fast’ (amod, 24) and ‘good’ (amod, 13), and ‘good’ to ‘fast’ (conj_and, 19).]
1. Detect the topics underlying a mixed-domain dataset using topic modeling.
2. Aggregate individual dependency relations between domain-specific content words, weigh them with tf-idf, and select the highest-ranked words and their dependency relations.
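The aggregation step above can be sketched as follows; the per-document dependency triples and the relation filter are illustrative, and the tf-idf weighting over domains is omitted for brevity:

```python
from collections import Counter

def build_ddg(parsed_docs, keep=("amod", "nsubj")):
    """Aggregate (relation, head, dependent) triples into edge counts.

    parsed_docs: list of documents, each a list of dependency triples.
    Only the relation types in `keep` contribute edges to the graph.
    """
    edges = Counter()
    for doc in parsed_docs:
        for rel, head, dep in doc:
            if rel in keep:
                edges[(rel, head, dep)] += 1
    return edges
```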
13. Contribution II: DDGs for Aspect Category Detection
13
[Figure: a filtered domain dependency graph linking ‘processor’ to ‘fast’ (amod, 2) and ‘good’ (amod, 1), derived from the relation counts amod(processor, fast) 2, amod(processor, good) 1, conj(good, fast) 1.]
3. The resulting graphs were filtered so that only ‘amod’ (adjective modifying a noun) and ‘nsubj’ (nominal subject of a predicate) relations were kept.
4. For each aspect extracted from the opinion-aspect pairs, we encode the presence or absence of this aspect as a binary feature.
14. Aspect Category Detection: Slot 1
Features:
Aspect list produced by Domain Dependency Graphs (DDG). (0/1)
Top 10 DT expansions for each of the top five words by tf-idf score in
each aspect category (for example: ‘overpriced’, ‘$’, ‘pricey’, ‘cheap’,
‘expensive’ are the most significant terms in the ‘food#price’ category).
(0/1)
Bag of Words. (freq)
14
15. Opinion Target “OT” Extraction Features: Slot 2
15
Features:
PoS context [-2..2]
Word and Local Context [-5..5]
5 DT expansions of current token
Expansion Score
Prefix and Suffix up to 4 characters
Noun phrase head word and its PoS
Character N-grams
Presence of adjective modifier dependency relations
Orthographic features (starts with capital letter)
Is frequent aspect?
Additional features for English:
WordNet (4 noun synsets of current token)
NE information
Chunk information
Lemma
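A sketch of per-token feature extraction for the CRF, covering a subset of the features listed above (local context window, affixes, capitalization); the feature names are illustrative assumptions, and the DT, WordNet, NE and chunk features are omitted:

```python
def token_features(tokens, i, window=2):
    """Build a feature dict for token i of a tokenized sentence."""
    tok = tokens[i]
    feats = {
        "word": tok.lower(),
        "prefix4": tok[:4].lower(),        # prefix up to 4 characters
        "suffix4": tok[-4:].lower(),       # suffix up to 4 characters
        "is_capitalised": tok[0].isupper(),  # orthographic feature
    }
    # Word context in a [-window..window] span around the token.
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(tokens):
            feats["word[%+d]" % off] = tokens[i + off].lower()
    return feats
```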
16. Sentiment Polarity Classification: Slot 3
Features:
N-Gram (unigram and bigram)
The sum of sentiment scores (including our DT-expanded lexicons)
Entity#Attribute pair given in the training set.
16
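Two of the features above, n-gram extraction and the sentiment score sum, can be sketched as follows; the toy lexicon and its scores are illustrative assumptions, not the actual induced lexicon:

```python
# Toy sentiment lexicon for illustration only.
LEXICON = {"great": 0.8, "dry": -0.4, "overpriced": -0.7}

def ngrams(tokens, n):
    """Extract n-grams (n=1 unigrams, n=2 bigrams) from a token list."""
    return list(zip(*[tokens[i:] for i in range(n)]))

def sentiment_score_sum(tokens, lexicon=LEXICON):
    """Sum the lexicon scores of all tokens found in the sentence."""
    return sum(lexicon.get(t.lower(), 0.0) for t in tokens)
```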
18. Impact of the Induced Lexicon
18
Feature Ablation Experiment for Sentiment Polarity Classification
(Slot 3)
19. Impact of the Induced Lexicon
19
Feature Ablation Experiment for Sentiment Polarity Classification
(Slot 3)
20. Future Work
20
Apply the Aspect-based Sentiment Analysis approach for German
Analysis of the Deutsche Bahn (DB) passenger user feedback
texts
http://lt.informatik.tu-darmstadt.de/de/research/absa-db-aspect-based-sentiment-analysis-for-db-products-and-services
22. Opinion Target “OT” Extraction: Slot 2
Since we treat opinion target (OT) extraction as a sequence labeling
problem, we identify the boundaries of OTs using the standard BIO
notation.
We follow the standard BIO notation, where ‘B-ASP’, ‘I-ASP’ and
‘O’ represent the beginning, intermediate and outside tokens of a
multi-word OT, respectively.
22
The (O) Beef (B-ASP) Chow (I-ASP) Fun (I-ASP) was (O) very
(O) dry (O) . (O)
‘Beef Chow Fun’ is the OT.
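The BIO encoding of the example above can be sketched as follows; the `(start, end)` token-span representation of an OT is an illustrative assumption:

```python
def bio_encode(tokens, target_spans):
    """Label tokens with B-ASP/I-ASP/O given OT spans (end exclusive)."""
    labels = ["O"] * len(tokens)
    for start, end in target_spans:
        labels[start] = "B-ASP"           # beginning of a (multi-word) OT
        for j in range(start + 1, end):
            labels[j] = "I-ASP"           # intermediate OT tokens
    return labels

# 'Beef Chow Fun' (tokens 1-3) is the OT in the example sentence.
sentence = ["The", "Beef", "Chow", "Fun", "was", "very", "dry", "."]
labels = bio_encode(sentence, [(1, 4)])
```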
Editor's Notes
In the next 12 min I am going to talk about our submission to the SemEval Aspect Based Sentiment Analysis task. This work is done in cooperation between TU Darmstadt and IIT Patna.
Social media allow online users to share and explain their views and opinions about products and events. Mining and summarizing customer opinions from text can help organizations monitor their products and customers, as well as help customers make decisions about their purchases.
1) The first task is to identify the entity#attribute that is referred to by the aspect. E and A should be chosen from predefined classes.
2) Extract the aspect terms (“opinion targets”) toward which the opinion is expressed in the review text.
3) And finally for identified Entity#Attribute and OT, assign one of three polarity labels: positive, negative, or neutral.
Aspect level analysis is a fine grained type of sentiment analysis which identifies the sentiment orientation towards each aspect.
SemEval Task 5 offers the opportunity to experiment with aspect-based sentiment analysis on benchmark datasets and across various domains and languages through three subtasks.
Add the name of slot
Our submission is mainly based on two contributions:
We use a distributional thesaurus to expand the existing lexical resources and sentiment lexicons. This allows us to reach a higher coverage of rare/unseen sentiment or aspect words.
The idea is to expand all the words in the existing seed lexicon. E.g., for English ………..
Then for the words which are not present in the original seed lexicon we assign a new sentiment score by the following equation.
Here are the expansion statistics for the induced lexicons.
7 languages .. 7 seed lexicons.
Induced lexicons after expansion using DTs..
And common entries are the words which are present in both the seed and the induced lexicons.
Our second contribution is using Domain Dependency Graphs to extract a list of features.
The idea is to detect the topics underlying a mixed-domain dataset, aggregate individual dependency relations between domain-specific content words, weigh them with tf-idf, and produce a DDG by selecting the highest-ranked words and their dependency relations. Since the domains are already given, no topic modeling is required.