Sentiment Analysis in Twitter with Lightweight Discourse Analysis, Subhabrata Mukherjee and Pushpak Bhattacharyya, In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), IIT Bombay, Mumbai, Dec 8 - Dec 15, 2012 (http://www.cse.iitb.ac.in/~pb/papers/coling12-discourse-sa.pdf)
Machine Translation (MT) refers to the use of computers for the task of translating
automatically from one language to another. The differences between languages and
especially the inherent ambiguity of language make MT a very difficult problem. Traditional
approaches to MT have relied on humans supplying linguistic knowledge in the form of rules
to transform text in one language to another. Given the vastness of language, this is a highly
knowledge intensive task. Statistical MT is a radically different approach that automatically
acquires knowledge from large amounts of training data. This knowledge, which is typically
in the form of probabilities of various language features, is used to guide the translation
process. This report provides an overview of MT techniques, and looks in detail at the basic
statistical model.
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE, kevig
A value-based approach to Natural Language Understanding, in particular, the disambiguation of
pronouns, is illustrated with a solution to a typical example from the Winograd Schema Challenge. The
worked example uses a language engine, Enguage, to support the articulation of the advocation and
fearing of violence. The example illustrates the indexical nature of pronouns, and how their values, their
referent objects, change because they are set by contextual data. It must be noted that Enguage is not a
suitable candidate for addressing the Winograd Schema Challenge as it is an interactive tool, whereas
the Challenge requires a preconfigured, unattended program.
This document summarizes Anabela Barreiro's PhD defense on using automated paraphrasing to improve machine translation, specifically for support verb constructions. It discusses how paraphrasing support verb constructions into semantically related verbs can simplify language and reduce ambiguity, improving machine translation quality. The thesis presents work formalizing support verb constructions and generating paraphrases, and experiments showing paraphrasing improved machine translation results by 21-31%. It suggests areas for future work expanding linguistic knowledge and paraphrasing capabilities.
This paper performs a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed, and a decision was made as to whether each contraction should be maintained or decomposed in the alignment. Decomposition was required in cases where the two concatenated words, i.e., the preposition and the determiner or pronoun, go in two separate translation alignment pairs (PT- [no seio de] [a União Europeia] EN- [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT- [no que diz respeito a] EN- [with regard to] or PT- [além disso] EN- [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.
This document presents the thesis proposal of Erna Olofiana Girsang, which examines non-equivalence at the word level when translating the Discovery of North Sumatera Guidebook from Bahasa Indonesia to English. The study aims to identify what types of non-equivalence are found, which types dominate, and why. It discusses relevant translation theories, strategies for dealing with non-equivalence, and previous related research. The study will use content analysis to examine the translations between the source and target languages in the guidebook.
In this presentation we cover the paragraphs we chose, each member's reflections on the experience of translating them, and examples of the methods, strategies, and techniques used.
This document discusses statistical machine translation (SMT). It provides an overview of key concepts in SMT including word and sentence alignment, probability, language models, and translation models. The fundamental equation of SMT is presented as using a translation model to find the best target language words for the input and a language model to build up a sentence from those words. Basic SMT architecture and the Bayes' theorem and chain rule for calculating probabilities are also covered.
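The fundamental equation described above, ê = argmax_e P(e)·P(f|e), can be sketched for a single word. This is a toy illustration of the noisy-channel scoring, with invented probabilities; a real SMT system estimates these models from large parallel and monolingual corpora.

```python
import math

# Hypothetical toy models (all probabilities invented for illustration).
translation_model = {  # P(f | e): source word given target word
    ("maison", "house"): 0.8, ("maison", "home"): 0.2,
}
language_model = {  # P(e): unigram probability of the target word
    "house": 0.001, "home": 0.0005,
}

def score(f_word, e_word):
    """Noisy-channel score log P(e) + log P(f|e)."""
    tm = translation_model.get((f_word, e_word), 1e-9)
    lm = language_model.get(e_word, 1e-9)
    return math.log(lm) + math.log(tm)

def best_translation(f_word, candidates):
    """argmax_e P(e) * P(f|e) over candidate target words."""
    return max(candidates, key=lambda e: score(f_word, e))

print(best_translation("maison", ["house", "home"]))  # -> house
```

Even though "home" has the higher translation probability in neither direction here, the language model and translation model jointly decide; in full systems the argmax runs over whole sentences via a decoder rather than single words.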
- The document discusses various translation techniques and procedures such as direct translation, oblique translation, borrowing, calque, literal translation, transposition, modulation, adaptation, etc.
- It provides examples to illustrate techniques like shift/transposition, modulation, equivalence, adaptation, and combined procedures.
- Cultural words or expressions from one language may be translated using techniques like transference, naturalization, descriptive equivalent, or functional equivalent.
The document discusses various translation strategies, techniques and methods. It defines translation strategy and discusses three global strategies employed by translators. It then discusses translation methods and procedures, and defines word-for-word, literal, faithful, semantic and idiomatic translation. Direct and oblique translation techniques are also explained, including borrowing, calque, literal translation, transposition, modulation, reformulation, adaptation and compensation.
Translation Techniques from English into Romanian and Russian, Elena Shapa
The document discusses various translation techniques used to translate a text from one language to another. It describes techniques such as addition, where the translator adds words to specify meaning; compensation, where something lost in translation is expressed elsewhere; transposition, changing word order; and modulation, using a different phrase to convey the same idea. Specific examples are provided to illustrate each technique.
This document discusses methods for evaluating language models, including intrinsic and extrinsic evaluation. Intrinsic evaluation involves measuring a model's performance on a test set using metrics like perplexity, which is based on how well the model predicts the test set. Extrinsic evaluation embeds the model in an application and measures the application's performance. The document also covers techniques for dealing with unknown words like replacing low-frequency words with <UNK> and estimating its probability from training data.
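The intrinsic evaluation described above can be made concrete: perplexity is the exponential of the average negative log-probability the model assigns to the test tokens, with unknown words mapped to `<UNK>`. This is a minimal sketch with an invented unigram model; lower perplexity means the model predicts the test set better.

```python
import math

def perplexity(model, test_tokens):
    """Perplexity = exp(-(1/N) * sum_i log P(w_i)); lower is better."""
    n = len(test_tokens)
    log_prob = sum(math.log(model.get(w, model["<UNK>"])) for w in test_tokens)
    return math.exp(-log_prob / n)

# Toy unigram model; the <UNK> probability would normally be estimated by
# replacing low-frequency training words with <UNK> and counting.
model = {"the": 0.5, "cat": 0.25, "sat": 0.2, "<UNK>": 0.05}
print(perplexity(model, ["the", "cat", "sat", "zebra"]))  # "zebra" -> <UNK>
```

The same function works for n-gram models if `model.get` is replaced by a conditional probability lookup over histories.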
This document discusses principles and methods of translation. It begins with Catford's definition of translation as replacing textual material in one language with equivalent material in another. It discusses issues like translation equivalence, types of translation like semantic and communicative translation, and structural elements like situational features, semantic structure, and levels of word meaning. It also examines sentence structure in English and implications for translation, including theme, subject types, and notions of structure across languages. Finally, it briefly discusses language varieties such as dialects, registers, styles, and modes that impact translation.
Fidelity refers to accurately rendering the meaning of the source text without adding or subtracting from it, while transparency pertains to making the translation appear as if it was originally written in the target language. These two ideals are often at odds. Machine translation aims for fidelity but can fail to convey the message properly, while adaptation prioritizes transparency but may sacrifice parts of the intended meaning. Translators must make choices between strategies like transposition, loan words, and adaptation to balance both fidelity and transparency.
The document outlines the 5 phases of natural language processing (NLP):
1. Morphological analysis breaks text into paragraphs, sentences, words and assigns parts of speech.
2. Syntactic analysis checks grammar and parses sentences.
3. Semantic analysis focuses on literal word and phrase meanings.
4. Discourse integration considers the effect of previous sentences on current ones.
5. Pragmatic analysis discovers intended effects by applying cooperative dialogue rules.
This document discusses translation procedures, which are methods applied by translators to formulate equivalences when transferring meaning from the source text to the target text for sentences and smaller language units. It outlines 22 translation procedures proposed by translation scholars, including loan, calque, literal translation, transposition, modulation, equivalence, adaptation, and omission. The procedures involve operations such as transferring terms, changing grammar categories, using synonyms, and modifying word order between the source and target languages.
This document provides an overview of the Minimalist Program (MP) proposed by Chomsky in 1993. It discusses the redundant and necessary levels of representation, including Logical Form and Phonetic Form. Principles like economy of derivation and economy of representation are explained. The document also covers topics like phrase structure, movements, feature checking, and the Full Interpretation Principle in MP. The conclusion states that MP aims to minimize theoretical concepts in syntax to achieve universality of grammar.
The document discusses various technical components of the translation process. It describes translation as involving interpreting the source text, applying skills to render the meaning in the target language, and re-expressing that meaning. The document outlines different options for translation, including direct/literal translation and oblique translation. It also distinguishes between factual knowledge of languages and procedural knowledge of translation techniques.
Phrase structure grammar models the internal structure of sentences in a hierarchical organization. It represents sentences as consisting of phrases, which are made up of words, which are made up of morphemes and phonemes. Phrase structure grammars use rewrite rules to break down syntactic structures into their constituent parts in a step-by-step manner. Deep structure represents the underlying meaning of a sentence, while surface structure is the actual form used. Transformational rules derive surface structure from deep structure.
Qun Liu (DCU), Hybrid Solutions for Translation, RIILP
The document provides an overview of hybrid machine translation approaches. It discusses selective machine translation which selects the best translation from multiple systems. Pipelined machine translation uses one system for pre-processing or post-processing of another system. Statistical post-editing uses statistical machine translation as a post-editor for rule-based machine translation outputs to improve the translation quality.
The document discusses various theories and models of translation shifts. It describes Vinay and Darbelnet's model which identifies two translation strategies - direct translation and oblique translation. It also discusses Catford's theory of level and category shifts. Additionally, it summarizes Van Leuven-Zwart's comparative and descriptive model of translation shifts which examines shifts at the micro and macro levels. The document provides details on different translation techniques like transposition, modulation, equivalence, and adaptation.
The document discusses context-free grammars for modeling English syntax. It introduces key concepts like constituency, grammatical relations, and subcategorization. Context-free grammars use rules and symbols to generate sentences. They consist of terminal symbols (words), non-terminal symbols (phrases), and rules to expand non-terminals. Context-free grammars can model syntactic knowledge and generate sentences in both a top-down and bottom-up manner through parsing.
Natural Language Processing: Parts of speech tagging, its classes, and how to ..., Rajnish Raj
Part of speech (POS) tagging is the process of assigning a part of speech tag like noun, verb, adjective to each word in a sentence. It involves determining the most likely tag sequence given the probabilities of tags occurring before or after other tags, and words occurring with certain tags. POS tagging is the first step in many NLP applications and helps determine the grammatical role of words. It involves calculating bigram and lexical probabilities from annotated corpora to find the tag sequence with the highest joint probability.
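The search for the highest-probability tag sequence described above is usually done with the Viterbi algorithm over bigram (tag-to-tag) and lexical (word-given-tag) probabilities. The sketch below uses a two-tag toy model with invented probabilities; real taggers estimate them from an annotated corpus.

```python
# Hypothetical toy probabilities (invented for illustration).
trans = {("<s>", "DT"): 0.6, ("<s>", "NN"): 0.4,   # bigram tag probabilities
         ("DT", "NN"): 0.9, ("DT", "DT"): 0.1,
         ("NN", "NN"): 0.3, ("NN", "DT"): 0.7}
emit = {("the", "DT"): 0.7, ("the", "NN"): 0.001,  # lexical probabilities
        ("dog", "NN"): 0.4, ("dog", "DT"): 0.001}
tags = ["DT", "NN"]

def viterbi(words):
    """Return the tag sequence maximizing the joint probability
    prod_i trans(t_{i-1}, t_i) * emit(w_i, t_i)."""
    best = {t: (trans[("<s>", t)] * emit.get((words[0], t), 1e-9), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            p, path = max(
                (best[s][0] * trans[(s, t)] * emit.get((w, t), 1e-9), best[s][1])
                for s in tags)
            new[t] = (p, path + [t])
        best = new
    return max(best.values())[1]

print(viterbi(["the", "dog"]))  # -> ['DT', 'NN']
```

Dynamic programming keeps only the best path into each tag at each position, so the cost is linear in sentence length rather than exponential in the number of tag sequences.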
Some problems of ambiguity in translation with reference to English and Arabic, falah_hasan77
1. Ambiguity in translation refers to words, terms or concepts that have more than one possible meaning. This can cause unclear or misleading interpretations when translating between languages.
2. Some common causes of ambiguity include pronouns without clear referents, words with multiple meanings, and syntactic structures that can have more than one interpretation.
3. There are two main types of ambiguity - lexical, which occurs with individual words, and structural, which occurs with phrases or sentences that can have multiple syntactic structures. Identifying and addressing ambiguity is an important part of accurate translation.
This document discusses parsing with context-free grammars. It begins by introducing context-free grammars and their use in parsing sentences. It then discusses parsing as a search problem, and presents top-down and bottom-up parsing algorithms. Top-down parsing builds trees from the root node down, while bottom-up parsing builds trees from the leaves up. Both approaches have advantages and disadvantages related to efficiency. The document also introduces probabilistic context-free grammars, which augment grammars with rule probabilities, and discusses how these can be used for disambiguation.
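The "parsing as search" framing above can be sketched with a tiny top-down recognizer: expand non-terminals from the root, backtrack when a rule fails. This is an illustrative toy grammar, not an efficient chart parser (top-down backtracking can be exponential, which is one of the efficiency trade-offs the document mentions).

```python
# Toy CFG and lexicon (assumed for illustration).
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
lexicon = {"DT": {"the"}, "N": {"dog", "cat"}, "V": {"saw", "sleeps"}}

def parse(symbols, words):
    """Top-down recognition: can `words` be derived from `symbols`?"""
    if not symbols:
        return not words  # success only if all words were consumed
    head, rest = symbols[0], symbols[1:]
    if head in lexicon:  # pre-terminal: must match the next word
        return bool(words) and words[0] in lexicon[head] and parse(rest, words[1:])
    # non-terminal: try each expansion in turn (search with backtracking)
    return any(parse(list(rhs) + rest, words) for rhs in grammar[head])

print(parse(["S"], "the dog saw the cat".split()))  # -> True
print(parse(["S"], "dog the saw".split()))          # -> False
```

A probabilistic CFG attaches a probability to each rule, so instead of a yes/no answer the parser can rank the alternative trees and pick the most probable one for disambiguation.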
This document describes a sentiment analysis tool developed by Ravindra Chaudhary and Sachin Singh under the guidance of Mrs. Smita Tiwari. It uses a Naive Bayes classifier to analyze tweets and classify their sentiment as positive, negative, or neutral. The methodology involves collecting tweets with the Twitter API and preprocessing the text by removing URLs, hashtags, numbers, and other unnecessary words. Features such as capitalized words and emoticons are then extracted, and the preprocessed text and features are fed into the Naive Bayes classifier to predict the sentiment. The tool was implemented using technologies such as the NetBeans IDE, WAMP Server, MySQL, HTML5, and CSS. Directions for future work are also suggested.
Sentiment analysis on Twitter
The document discusses sentiment analysis on tweets. It introduces sentiment analysis and why it is needed, particularly for promotion, products, politics and prediction. It describes Twitter terminology and presents a system architecture for sentiment analysis on tweets that includes preprocessing steps like removing URLs and tags, spell correction, emoticon tagging, part-of-speech tagging, and a scoring module using corpus-based and dictionary-based approaches to determine sentiment scores and classify tweets as positive, negative or neutral. Examples are provided to illustrate the sentiment analysis process.
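The preprocessing steps listed above (URL removal, tag stripping, emoticon tagging) can be sketched as a small pipeline. The regexes and emoticon table here are assumptions for illustration, not the authors' exact rules; tokens are lowercased at the end.

```python
import re

# Hypothetical emoticon table; a real system would use a fuller list.
EMOTICONS = {":)": "EMO_POS", ":(": "EMO_NEG"}

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)       # remove URLs
    tweet = re.sub(r"[@#](\w+)", r"\1", tweet)       # strip @/# but keep the word
    for emo, tag in EMOTICONS.items():
        tweet = tweet.replace(emo, " " + tag + " ")  # tag emoticons
    tweet = re.sub(r"\d+", "", tweet)                # drop numbers
    return tweet.lower().split()

print(preprocess("Loving the new #phone :) http://t.co/xyz @bob"))
# -> ['loving', 'the', 'new', 'phone', 'emo_pos', 'bob']
```

After this stage the token list would go to spell correction, POS tagging, and the scoring module described above.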
Sentiment analysis using Naive Bayes classifier, Dev Sahu
This presentation gives a short description of the Naive Bayes classifier algorithm, a machine learning approach to sentiment detection and text classification.
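The classifier named above can be sketched in a few lines: a multinomial Naive Bayes with add-one smoothing, scoring each label by its log prior plus the summed log likelihoods of the words. This is a generic sketch on a three-document toy corpus, not the exact classifier from the slides.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label). Returns counts needed at test time."""
    labels = Counter(lbl for _, lbl in docs)
    word_counts = defaultdict(Counter)
    for tokens, lbl in docs:
        word_counts[lbl].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return labels, word_counts, vocab

def classify(tokens, labels, word_counts, vocab):
    def log_score(lbl):
        total = sum(word_counts[lbl].values())
        prior = math.log(labels[lbl] / sum(labels.values()))
        # add-one (Laplace) smoothing over the vocabulary
        return prior + sum(
            math.log((word_counts[lbl][w] + 1) / (total + len(vocab)))
            for w in tokens if w in vocab)
    return max(labels, key=log_score)

docs = [(["good", "great"], "pos"),
        (["bad", "awful"], "neg"),
        (["great", "fun"], "pos")]
model = train(docs)
print(classify(["great", "good"], *model))  # -> pos
```

Working in log space avoids underflow when many word probabilities are multiplied, which matters once documents are longer than a few tokens.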
A Fuzzy Approach For Multi-Domain Sentiment Analysis, Mauro Dragoni
An emerging field within Sentiment Analysis concerns the investigation about how sentiment polarities towards concepts have to be adapted with respect to the different domains in which they are used. In this paper, we explore the use of fuzzy logic for modeling concept polarities, and the uncertainty associated with them, with respect to different domains. The approach is based on the use of a knowledge graph built by combining two linguistic resources, namely WordNet and SenticNet. Such a knowledge graph is then exploited by a graph-propagation algorithm that propagates sentiment information learned from labeled datasets. The system implementing the proposed approach has been evaluated on the Blitzer dataset by demonstrating its viability in real-world cases.
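The graph-propagation idea can be sketched as follows: polarity learned for a few seed concepts spreads to neighboring concepts with decreasing strength. The graph, seeds, and decay factor below are illustrative; the paper's actual knowledge graph combines WordNet and SenticNet.

```python
# Toy concept graph: node -> neighbors (illustrative, not WordNet/SenticNet)
graph = {"good": ["decent", "fine"], "decent": ["okay"],
         "bad": ["poor"], "fine": [], "okay": [], "poor": []}
seeds = {"good": 1.0, "bad": -1.0}   # polarities learned from labeled data

def propagate(graph, seeds, decay=0.5, steps=2):
    """Spread seed polarities to unlabeled neighbors, attenuated per hop."""
    polarity = dict(seeds)
    for _ in range(steps):
        updates = {}
        for node, value in polarity.items():
            for nbr in graph.get(node, []):
                if nbr not in polarity:   # only fill in unlabeled concepts
                    updates[nbr] = updates.get(nbr, 0.0) + decay * value
        polarity.update(updates)
    return polarity

pol = propagate(graph, seeds)
```

After two steps, one-hop neighbors carry half the seed polarity and two-hop neighbors a quarter, mirroring the intuition that confidence in a concept's polarity weakens with distance from labeled evidence.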
The document presents a group project on analyzing the relationship between tweet sentiments and stock prices. It discusses extracting data from Twitter, applying sentiment analysis techniques to label tweets as positive, negative or neutral, and extracting stock price data from Yahoo Finance. It then analyzes the correlation between sentiment scores and stock prices, and evaluates several classification models for predicting stock market movement. Key findings include sentiment being more predictive of market highs than lows, and correlation increasing when tweets are highly relevant to specific company events. Limitations and opportunities for future improvement are also outlined.
Sentiment mining: The Design and Implementation of an Internet Public Opinion... (Prateek Singh)
Sentiment mining paper presentation, database mining and business intelligence.
The Design and Implementation of an Internet Public Opinion Monitoring and Analysing System
Mike Davies sentiment_analysis_presentation_backup (m1ked)
The document discusses using sentiment analysis on tweets to predict time series data like stock markets or box office success. It involves 3 parts: 1) classifying tweet sentiment, 2) building a network of Twitter users, and 3) finding a time series of sentiment for each user. Methods discussed include classifying tweets as spam/not spam and objective/subjective/positive/negative, using an algorithm to extract opinion words and targets, and using community detection and LDA to analyze the Twitter user network. The end goal is to use the sentiment time series to predict real-world time series data.
"Naive Bayes Classifier" @ Papers We Love Bucharest (Stefan Adam)
The document summarizes the Naive Bayes classifier machine learning algorithm. It discusses how the Naive Bayes classifier was inspired by the "library problem" of the 1960s, where documents needed to be probabilistically ranked based on tags. The key points of the Naive Bayes classifier are:
- It makes the naive assumption that attributes are independent given the class.
- It uses Bayes' theorem to calculate the probability of a class given attribute values in order to classify new examples.
- It can handle both discrete and continuous-valued attributes using categorical and normal distributions.
- It has been shown to be very effective for problems like spam filtering and sentiment analysis despite the independence assumption often being violated.
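The continuous-attribute case mentioned above can be illustrated with a normal-density likelihood; the class-conditional means and variances below are toy numbers, not from the talk.

```python
import math

def gaussian_likelihood(x, mean, var):
    """P(x | class) for a continuous attribute under a normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy class-conditional statistics for a single continuous attribute:
# observing x = 5.0 is far more likely under the class whose mean is 4.0
likelihood_pos = gaussian_likelihood(5.0, mean=4.0, var=1.0)
likelihood_neg = gaussian_likelihood(5.0, mean=8.0, var=1.0)
```

Multiplying such per-attribute likelihoods with the class prior, exactly as for discrete attributes, is what lets Naive Bayes mix categorical and continuous features.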
This document discusses sentiment analysis on Twitter data using machine learning techniques. It begins by introducing sentiment analysis and its goals for Twitter data, including determining whether tweets convey positive, negative, or neutral sentiment. It then outlines the challenges of analyzing Twitter data and its approach, which includes downloading tweets, preprocessing, feature extraction, and an SVM classifier. It finds that its feature-based model outperforms the baseline, with an accuracy of 57.85% and an F1 score of 61.17% for sentence-level sentiment classification. The tools used include Python, Java, LIBSVM, NLTK, and the Twitter API.
This document discusses sentiment analysis on Twitter data using machine learning classifiers. It describes Twitter sentiment analysis as determining if a tweet is positive, negative, or neutral. Some challenges are that people express opinions complexly using sarcasm, irony, and slang. The document tests different classifiers like Naive Bayes and SVM on Twitter data preprocessed by tokenizing, extracting sentiment features, and part-of-speech tagging. It finds that extracting more features like sentiment and part-of-speech tags along with an SVM classifier achieves the best accuracy of 68% at determining tweet sentiment.
1. The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem that assumes independence between features.
2. It has various applications including email spam detection, language detection, and document categorization.
3. The Naive Bayes approach involves computing the class prior probabilities, feature likelihoods, and applying Bayes' theorem to calculate the posterior probabilities to classify new instances. Laplace smoothing is often used to handle cases with insufficient training data.
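The three steps above (class priors, feature likelihoods with Laplace smoothing, posterior comparison) can be sketched for text classification; the tiny training set is illustrative only.

```python
import math
from collections import Counter

# Toy labeled corpus (illustrative)
train = [("good great phone", "pos"), ("great battery", "pos"),
         ("bad screen", "neg"), ("bad awful battery", "neg")]

priors, word_counts, totals, vocab = {}, {}, {}, set()
for text, label in train:
    priors[label] = priors.get(label, 0) + 1          # class counts
    counts = word_counts.setdefault(label, Counter())
    for w in text.split():
        counts[w] += 1                                # per-class word counts
        vocab.add(w)
for label, counts in word_counts.items():
    totals[label] = sum(counts.values())

def log_posterior(text, label):
    score = math.log(priors[label] / len(train))      # log prior
    for w in text.split():
        # Laplace smoothing: add 1 so unseen words keep a small probability
        p = (word_counts[label][w] + 1) / (totals[label] + len(vocab))
        score += math.log(p)
    return score

def classify(text):
    return max(priors, key=lambda label: log_posterior(text, label))
```

Working in log space avoids underflow when many word likelihoods are multiplied, which is standard practice for Naive Bayes text classifiers.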
Project: Sentiment Analysis of Twitter Data Using Machine Learning Approach... (Geetika Gautam)
This document outlines a research project on classifying user reviews for electronic gadgets using sentiment analysis. The project used Twitter data labeled as positive or negative and preprocessed, extracted features from, and trained classifiers on this data. Naive Bayes, maximum entropy, and support vector machines were evaluated, with Naive Bayes achieving the best accuracy of 88.2%. Adding semantic analysis using WordNet further improved accuracy to 89.9%. The results were analyzed and future work proposed to expand the training data and use WordNet for summarization.
This document outlines an approach for using web content mining techniques for Arabic text classification. It begins with introductions to web mining and its subfields like web content mining. It discusses related work on text classification in different languages, including a few prior studies on Arabic text classification. The document then describes building an Arabic text corpus from online newspapers and preprocessing steps. It proposes using machine learning algorithms like Naive Bayes and K-Nearest Neighbor for Arabic text classification and evaluating accuracy through cross-validation. The full document provides details on the proposed method and evaluation plan to classify Arabic texts using web content mining techniques.
This document discusses Arabic tokenization and stemming. It presents research on tokenizing and stemming Arabic text. For tokenization, the researchers used a bigram approach that achieved 98.83% accuracy on a dataset of 29,092 tokens. For stemming, they evaluated root-based, light, n-gram, and hybrid approaches. Their hybrid method, which incorporates roots and light stemming with n-grams, performed best, achieving 82.33% accuracy for Arabic text categorization. The document concludes the hybrid approach is effective for Arabic processing tasks.
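The n-gram component of the hybrid approach can be sketched as a generic overlapping character n-gram extractor; the example token and default n are illustrative, not from the study.

```python
def char_ngrams(word, n=2):
    """All overlapping character n-grams of a token (bigrams by default)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]
```

Comparing tokens by shared character n-grams is what lets such stemmers group morphological variants without a full root dictionary.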
This document summarizes a survey of sentiment analysis in Arabic. It outlines that sentiment analysis analyzes opinions and emotions from text. While most systems are for English, there are few Arabic resources. Arabic is spoken by over 300 million people and is growing online. The document discusses subjectivity and sentiment processes in Arabic, including tokenization, stemming, stop words removal and sentiment classifications. It also outlines experiments on Arabic text corpora, the importance of sentiment analysis for financial markets, and challenges of unbalanced sentiment data in Arabic. Classification algorithms like Naive Bayes and SVMs are evaluated on sentiment categories with validation using confusion matrices.
Scalable sentiment classification for big data analysis using naive bayes cla... (Tien-Yang (Aiden) Wu)
The document discusses evaluating the scalability of the Naive Bayes classifier for sentiment analysis on large datasets. It presents the Naive Bayes classification method, which uses Bayes' theorem with independence assumptions between features. It then describes implementing Naive Bayes in Hadoop for sentiment classification of movie reviews at scale, including preprocessing data, calculating word frequencies, and predicting sentiment. An experimental study tested the implementation on a Hadoop cluster with over 1,000 positive and 1,000 negative reviews for training.
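The word-frequency step of such a Hadoop implementation can be sketched as a plain map/reduce pair in local Python; the labels and reviews below are illustrative stand-ins for the actual dataset.

```python
from collections import defaultdict

def mapper(label, review):
    """Map phase: emit ((label, word), 1) for every word in a review."""
    for word in review.lower().split():
        yield (label, word), 1

def reducer(pairs):
    """Reduce phase: sum the counts per (label, word) key."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return dict(counts)

reviews = [("pos", "great movie great cast"), ("neg", "dull movie")]
pairs = [kv for label, text in reviews for kv in mapper(label, text)]
counts = reducer(pairs)
```

In an actual Hadoop job the framework shuffles the mapper output by key before the reducers run; the per-class word counts it produces are exactly the sufficient statistics Naive Bayes needs.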
The document discusses a proposed study exploring the role of collaborative meta-talk in developing argument skills. It summarizes previous research showing that practice in dialogic argumentation helps adolescents develop argument skills and recognition of strategies over time. The study would analyze discussion transcripts from student pairs to examine how meta-talk functions and whether its role is to provide a bi-directional zone of proximal development. It hypothesizes that student pairs who work together over multiple sessions (Stay condition) will engage in more meta-talk focused on jointly regulating their argumentation compared to pairs who change partners each session (Switch condition).
The document discusses language production and summarizes key points in 3 sentences:
Language production involves conceptualizing thoughts, formulating linguistic plans by selecting words and structures, and implementing plans through articulation. Evidence from eye movements, slips of the tongue, and self-repairs suggests language production involves parallel planning at multiple linguistic levels from meaning to sounds. Models of speech production propose different views on whether planning proceeds incrementally from smaller units or begins with larger syntactic structures.
Natural language processing (NLP) aims to help computers understand human language. Ambiguity is a major challenge for NLP as words and sentences can have multiple meanings depending on context. There are different types of ambiguity including lexical ambiguity where a word has multiple meanings, syntactic ambiguity where sentence structure is unclear, and semantic ambiguity where meaning depends on broader context. NLP techniques like part-of-speech tagging and word sense disambiguation aim to resolve ambiguity by analyzing context.
Sentiment Analysis in Twitter with Lightweight Discourse Analysis (Naveen Kumar)
This document presents an approach to sentiment analysis in Twitter that incorporates lightweight discourse analysis. It introduces discourse relations and semantic operators that are important for sentiment analysis. An algorithm is proposed that uses these features to create vectors for classification. The approach is evaluated on three datasets, showing improved accuracy over baseline methods. Specifically, the approach uses discourse conjunctions, modals, conditionals and negation to modify feature weights and polarity. Classification is done both lexicon-based and with supervised learning. Evaluation demonstrates accuracy gains of 2-4% over baselines by incorporating discourse information.
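The weight-and-polarity modification idea can be roughly sketched for the lexicon-based case; the word lists, window size, and boost factor below are illustrative, not the paper's actual resources or values.

```python
# Illustrative resources (not the paper's lexicons)
LEXICON = {"good": 1, "bad": -1, "great": 1, "terrible": -1}
NEGATORS = {"not", "never", "no"}
CONJ_FOLLOWING = {"but", "however"}   # the clause after these gets more weight

def score(tokens, window=3):
    total, boost, negate_left = 0.0, 1.0, 0
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = window              # flip polarity for the next few words
            continue
        if tok in CONJ_FOLLOWING:
            boost, negate_left = 2.0, 0       # emphasize clause after "but", end negation scope
            continue
        polarity = LEXICON.get(tok, 0)
        if negate_left > 0:
            polarity, negate_left = -polarity, negate_left - 1
        total += boost * polarity
    return total
```

On "not good but great" the negation flips "good" while the conjunction up-weights "great", so the overall score comes out positive, which is the kind of case a bag-of-words baseline gets wrong.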
This document provides an overview of natural language processing (NLP). It discusses topics like natural language understanding, text categorization, syntactic analysis including parsing and part-of-speech tagging, semantic analysis, and pragmatic analysis. It also covers corpus-based statistical approaches to NLP, measuring performance, and supervised learning methods. The document outlines challenges in NLP like ambiguity and knowledge representation.
Discourse analysis examines language use in context. It looks at both spoken and written language beyond the sentence level. Discourse is a social interaction between speakers, while a text is simply a written message. Discourse analysis uses tools like cohesion, coherence, background knowledge, and conversation analysis to interpret meaning. It considers aspects like anaphoric references, speech events, and Grice's cooperative principle to understand what a speaker intends to communicate. Discourse analysis is useful for interpreting meaning from both grammatical and ungrammatical language use.
This document discusses discourse coherence and the strategies used for interpreting discourse. It defines key terms like inference, background knowledge, explicature, and implicature. Inference refers to information not explicitly stated but implied in a discourse. Background knowledge is what someone already knows about a topic to help them understand new information. Explicature is the explicit information in a text, while implicature is the implied meaning derived from context. Coherence in discourse relies on inferences made using background knowledge. Discourse markers and punctuation help connect ideas and establish relationships between parts of a text.
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative... (CITE)
5 March 2010 (Friday) | 09:00 - 12:30 | http://citers2010.cite.hku.hk/abstract/69 | Dr. Kwok Ping CHAN, Associate Professor, Department of Computer Science, HKU
This document provides an overview of the different levels of linguistic analysis, including phonetics, phonology, morphology, lexicon, syntax, semantics, pragmatics, and discourse.
It defines each level and discusses their basic units and organizing structures. For example, it states that syntax analyzes sentence structure and has phrases as its basic units, organized by syntactic operations. Semantics examines meaning within language and has no set units, but aspects like denotation and reference. Pragmatics studies meaning beyond language to speech acts, using components like implicatures.
The document also gives examples to illustrate each level, such as speech acts for pragmatics, and deixis, politeness, and enunciation for
This document provides an overview of natural language processing (NLP) and the use of deep learning for NLP tasks. It discusses how deep learning models can learn representations and patterns from large amounts of unlabeled text data. Deep learning approaches are now achieving superior results to traditional NLP methods on many tasks, such as named entity recognition, machine translation, and question answering. However, deep learning models do not explicitly model linguistic knowledge. The document outlines common NLP tasks and how deep learning algorithms like LSTMs, CNNs, and encoder-decoder models are applied to problems involving text classification, sequence labeling, and language generation.
Discourse analysis is the study of language use in context. It focuses on how spoken and written language is structured and how meaning is derived based on context. Key aspects of discourse analysis include cohesion, coherence, speech events, background knowledge, conversational interaction, and the cooperation principle. Cohesion refers to linguistic ties like anaphora that link parts of discourse. Coherence relies on background knowledge to interpret meaning. Speech events and interactions provide social context. The cooperation principle proposes conversational maxims like being relevant and clear that aid understanding. Overall, discourse analysis examines language patterns and pragmatics to interpret intended meanings in context.
The document discusses how collaborative metacognitive talk can help develop argument skills. It proposes studying student pairs' metacognitive conversations during argument tasks to explore if longer partnering leads to more regulatory talk. Two frameworks are considered: Piagetian cognitive conflict through disagreement talk; and Vygotskyan perspective of partners forming a bidirectional learning zone through regulatory language to scaffold understanding. Hypotheses predict longer partnered students will have more metacognitive versus topic talk, and their talk will become more reciprocal and focused on argument norms over time.
This paper presents a model for Chinese word segmentation that integrates it as part of sentence analysis using a parser. The model achieves high accuracy by resolving most ambiguities at the lexical level using dictionary information, but handles cases requiring syntactic context in the parsing process. The complexity usually associated with parsing is reduced by pruning implausible segmentations prior to parsing. The approach is implemented in a natural language understanding system developed at Microsoft Research.
DDH 2021-03-03: Text Processing and Searching in the Medical Domain (LuukBoulogne)
This document summarizes Gianmaria Silvello's presentation on text processing and searching in the medical domain. It includes an introduction to text processing and outlines the typical text processing pipeline which includes steps like tokenization, stopword removal, stemming, part-of-speech tagging, and named entity recognition. It then provides an example of applying this pipeline to a short medical report about a colon biopsy. Finally, it discusses term representations and how distributed representations are used to define similarity between terms in order to represent their meanings.
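A toy version of the pipeline described above — tokenization, stopword removal, and stemming — can be sketched as follows. The stopword list and crude suffix-stripping stemmer are stand-ins for real components such as a Porter stemmer.

```python
# Illustrative stopword list (a real system would use a fuller one)
STOPWORDS = {"the", "a", "of", "was", "is", "and", "in"}

def tokenize(text):
    """Lowercase and split on any non-alphanumeric character."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return cleaned.split()

def stem(token):
    """Crude suffix stripping; a stand-in for a proper stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def pipeline(text):
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]

print(pipeline("The biopsy of the colon was performed."))
```

The surviving stems are the terms that would feed later stages such as part-of-speech tagging, named entity recognition, or indexing.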
The document defines discourse and discusses various aspects of discourse analysis. It defines discourse as language beyond the sentence level, linked to social practices, and as a system of thought. It discusses constructions of discourse through linguistic, cognitive, and interactional processes. Discourse analysis focuses on patterns of sentences and units in texts constructed through social interactions. The goals, types, and structures of spoken and written discourse are examined, including sequential/distributional analysis, repair/recipient design, adjacency pairs, narratives, and registers.
The document discusses stepwise methodologies for building ontologies. It outlines common steps such as identifying the purpose and scope, capturing concepts and relationships, coding the ontology formally, integrating existing ontologies, evaluation, and documentation. It emphasizes starting with a middle-out approach to capture definitions and discusses reaching consensus among those involved in building the ontology. Modularization of ontologies into reusable components is also presented as an important aspect of the methodology.
This document presents a model for Chinese word segmentation that integrates it as part of sentence analysis using a parser. The model uses a parser to resolve ambiguities that require syntactic information from the full sentence. Most ambiguities are resolved at the lexical level using dictionary information, reducing complexity for the parser. The model prioritizes parsing efficiency by only presenting unambiguous words and postponing ambiguous words to the parsing stage when needed. It is implemented in a natural language understanding system.
PRONOUN DISAMBIGUATION: WITH APPLICATION TO THE WINOGRAD SCHEMA CHALLENGE (ijnlc)
A value-based approach to Natural Language Understanding, in particular, the disambiguation of pronouns, is illustrated with a solution to a typical example from the Winograd Schema Challenge. The worked example uses a language engine, Enguage, to support the articulation of the advocation and fearing of violence. The example illustrates the indexical nature of pronouns, and how their values, their referent objects, change because they are set by contextual data. It must be noted that Enguage is not a suitable candidate for addressing the Winograd Schema Challenge as it is an interactive tool, whereas the Challenge requires a preconfigured, unattended program.
The paper deals with the scope of linguistic description. It thus highlights the idea of idealization and how models of linguistic descriptions rely thoroughly on abstracting linguistic data.
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models (Subhabrata Mukherjee)
Massive distillation of pre-trained language models like multilingual BERT, with 35x compression and 51x speedup (98% smaller and faster) while retaining 95% of the F1-score over 41 languages.
OpenTag: Open Attribute Value Extraction From Product Profiles (Subhabrata Mukherjee)
Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li
KDD 2018, London, UK
OpenTag brings deep learning and active learning together in a state-of-the-art imputation and open entity extraction system.
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Co... (Subhabrata Mukherjee)
One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. We propose probabilistic graphical models that can leverage the joint interplay between multiple factors --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online information, expertise of users and their evolution with user-interpretable explanation. We devise new models based on Conditional Random Fields that enable applications such as extracting reliable side-effects of drugs from user-contributed posts in health forums, and identifying credible news articles in news forums.
Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This enables applications such as identifying useful product reviews, and detecting fake and anomalous reviews with limited information.
Continuous Experience-aware Language Model
Subhabrata Mukherjee, Stephan Günnemann and Gerhard Weikum
Proc. of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 2016
Experience aware Item Recommendation in Evolving Review Communities (Subhabrata Mukherjee)
Experience aware Item Recommendation in Evolving Review Communities
Subhabrata Mukherjee, Hemank Lamba and Gerhard Weikum
IEEE International Conference in Data Mining (ICDM) 2015
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc... (Subhabrata Mukherjee)
Subhabrata Mukherjee, Jitendra Ajmera and Sachindra Joshi.
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus
Proc. of the 23rd ACM International Conference on Information and Knowledge Management (CIKM). 2014.
Leveraging Joint Interactions for Credibility Analysis in News Communities (Subhabrata Mukherjee)
Leveraging Joint Interactions for Credibility Analysis in News Communities,
Subhabrata Mukherjee and Gerhard Weikum,
Max Planck Institute for Informatics,
CIKM 2015
People on Drugs: Credibility of User Statements in Health Forums (Subhabrata Mukherjee)
People on Drugs: Credibility of User Statements in Health Communities. Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil. Proc. of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 2014
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R... (Subhabrata Mukherjee)
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of Reviews, Subhabrata Mukherjee and Sachindra Joshi, In Proc. of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, May 26-31, 2014
Joint Author Sentiment Topic Model, Subhabrata Mukherjee, Gaurab Basu and Sachindra Joshi, In Proc. of the SIAM International Conference in Data Mining (SDM 2014), Pennsylvania, USA, Apr 24-26, 2014 [http://people.mpi-inf.mpg.de/~smukherjee/jast.pdf]
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter (Subhabrata Mukherjee)
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter, Subhabrata Mukherjee, Akshat Malu, Balamurali A.R. and Pushpak Bhattacharyya, In Proceedings of The 21st ACM Conference on Information and Knowledge Management (CIKM 2012), Hawaii, Oct 29 - Nov 2, 2012 (http://www.cse.iitb.ac.in/~pb/papers/cikm2012-twisent.pdf)
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag... (Subhabrata Mukherjee)
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Language Form and World Knowledge, Subhabrata Mukherjee and Pushpak Bhattacharyya, Master's Thesis, IIT Bombay, Dept. of Computer Science and Engineering
Leveraging Sentiment to Compute Word Similarity, Balamurali A.R., Subhabrata Mukherjee, Akshat Malu and Pushpak Bhattacharyya, In Proceedings of the 6th International Global Wordnet Conference (GWC 2011), Matsue, Japan, Jan, 2012 (http://www.cse.iitb.ac.in/~pb/papers/gwc12-sense-sa.pdf)
WikiSent: Weakly Supervised Sentiment Analysis Through Extractive Summarizat... (Subhabrata Mukherjee)
WikiSent: Weakly Supervised Sentiment Analysis Through Extractive Summarization With Wikipedia, Subhabrata Mukherjee and Pushpak Bhattacharyya, In Proceedings of the European Conference on Machine Learning (ECML PKDD 2012), Bristol, U.K., 24-28 Sept, 2012 (http://www.cs.bris.ac.uk/~flach/ECMLPKDD2012papers/1125567.pdf)
Feature Specific Sentiment Analysis for Product Reviews, Subhabrata Mukherjee and Pushpak Bhattacharyya, In Proceedings of the 13th International Conference on Intelligent Text Processing and Computational Intelligence (CICLING 2012), New Delhi, India, March, 2012 (http://www.cse.iitb.ac.in/~pb/papers/cicling12-feature-specific-sa.pdf)
YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data... (Subhabrata Mukherjee)
YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia, Subhabrata Mukherjee and Pushpak Bhattacharyya, In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), IIT Bombay, Mumbai, Dec 8 - Dec 15, 2012 (Long Paper)
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Leveraging the Graph for Clinical Trials and Standards
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
1. Sentiment Analysis in Twitter with
Lightweight Discourse Analysis
Subhabrata Mukherjee, Pushpak Bhattacharyya
IBM Research Lab, India
Dept. of Computer Science and Engineering,
Indian Institute of Technology, Bombay
24th International Conference on Computational Linguistics
COLING 2012,
IIT Bombay, Mumbai, Dec 8 - Dec 15, 2012
5. Discourse
An important component of language comprehension in most natural language contexts involves connecting clauses and phrases together in order to establish a coherent discourse (Wolf et al., 2004).
The presence of a discourse marker can alter the overall sentiment of a sentence.
In most bag-of-words models, discourse markers are ignored as stop words during feature vector creation.
6. Motivation
i'm quite excited about Tintin, despite not really liking original comics - probably because Joe Cornish had a hand in
Think i'll stay with the whole 'sci-fi' shit but this time...a classic movie.
7. Motivation Contd…
Traditional work in discourse analysis uses some form of parsing, such as a discourse parser or a dependency parser.
Most of these theories are well-founded for structured text, and structured discourse-annotated corpora are available to train the models.
However, using these methods for micro-blog discourse analysis poses some fundamental difficulties.
8. Motivation Contd…
Micro-blogs, like Twitter, do not place any restriction on the form and content of user posts.
Users do not use formal language to communicate in micro-blogs. As a result, spelling mistakes, abbreviations, slang, discontinuities and grammatical errors abound.
These errors cause natural language processing tools like parsers and taggers to fail frequently.
Increased processing time adds an overhead to real-time applications.
9. Discourse Relation
A coherently structured discourse is a collection of sentences having some relation with each other.
A coherence relation reflects how different discourse segments interact.
Discourse segments are non-overlapping spans of text.
10. Discourse Coherence Relations
Contentful conjunctions used to illustrate coherence relations (Wolf et al., 2005):
- Cause-effect: because; and so
- Violated Expectations: although; but; while
- Condition: if…(then); as long as; while
- Similarity: and; (and) similarly
- Contrast: by contrast; but
- Temporal Sequence: (and) then; first, second, …; before; after; while
- Attribution: according to …; …said; claim that …; maintain that …; stated
- Example: for example; for instance
- Elaboration: also; furthermore; in addition; note (furthermore) that; (for, in, with) which; who; (for, in, on, against, with) whom
- Generalization: in general
11. Discourse Coherence Relations: Examples
1. Cause-effect: (YES! I hope she goes with Chris) so (I can freak out like I did with Emmy Awards.)
2. Violated Expectations: (i'm quite excited about Tintin), despite (not really liking original comics.)
3. Condition: If (MicroMax improved its battery life), (it wud hv been a gr8 product).
4. Similarity: (I lyk Nokia) and (Samsung as well).
5. Contrast: (my daughter is off school very poorly), but (brightened up when we saw you on gmtv today).
6. Temporal Sequence: (The film got boring) after a while.
7. Attribution: (Parliament is a sausage-machine: the world) according to (Kenneth Clarke).
8. Example: (Dhoni made so many mistakes…) for instance, (he shud've let Ishant bowl wn he was peaking).
9. Elaboration: In addition (to the worthless direction), (the story lacked depth too).
10. Generalization: In general, (movies made under the RGV banner) (are not worth a penny).
12. Discourse Relations and Sentiment Analysis
Not all discourse relations are significant for sentiment analysis.
Discourse relations essential for Sentiment Analysis:
- Violated Expectations: connect segments having contrasting information
- Inferential Conjunctions: place higher importance on certain discourse segments
- Conditionals: incorporate a hypothetical situation into the context
Semantic operators influencing discourse relations in Sentiment Analysis:
- Modals: incorporate a hypothetical situation into the context
- Negation: negates the information in the discourse segment
14. Violated Expectations and Contrast
Violated-expectation conjunctions oppose or refute the neighboring discourse segment.
We categorize them into Conj_Fol and Conj_Prev:
- Conj_Fol is the set of conjunctions that give more importance to the discourse segment that follows them.
- Conj_Prev is the set of conjunctions that give more importance to the previous discourse segment.
(i'm quite excited about Tintin), despite (not really liking original comics.)
(my daughter is off school very poorly), but (brightened up when we saw you on gmtv today).
16. Conclusive or Inferential Conjunctions
These are the set of conjunctions that tend to draw a conclusion or inference.
Hence, the discourse segment following them should be given more weight.
@User I was not much satisfied with ur so-called good phone and subsequently decided to reject it.
17. Conditionals
Conditionals introduce a hypothetical situation into the context.
The if…then…else constructs depict situations which may or may not happen, subject to certain conditions.
In our work, the polarity of the discourse segment in a conditional statement is toned down in lexicon-based classification.
In supervised classifiers, the conditionals are marked as features.
19. Modals
Events that have happened, are happening, or are certain to occur are called realis events. Events that have possibly occurred or have some probability of occurring in the distant future are called irrealis events. Modals depict irrealis events.
We divide the modals into two sub-categories: Strong_Mod and Weak_Mod.
- Strong_Mod is the set of modals that express a higher degree of uncertainty in any situation.
- Weak_Mod is the set of modals that express a lesser degree of uncertainty and more emphasis on certain events or situations.
In our work, the polarity of the discourse segment neighboring a strong modal is toned down in lexicon-based classification.
In supervised classifiers, the modals are marked as features.
(Strong Modal): Unless I missed the announcement their God is now featured on postage stamps, it might be a hard sell.
(Weak Modal): G.E 12 must be the most deadly General Election for politicians ever.
20. Negation
The negation operator inverts the sentiment of the word following it.
The usual way of handling negation in SA is to consider a window of size n (typically 3-5) and reverse the polarity of all the words in the window.
(Negation): I do not like Nokia but I like Samsung
We consider a negation window of size 5 and reverse the polarity of all the words in the window, until either the window is exhausted or a violated-expectation (contrast) conjunction is encountered.
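This windowed negation rule can be sketched as follows (a minimal illustration, not the authors' code; the CONJ_FOL set holds the contrast conjunctions from the feature table):

```python
# Sketch of the negation rule: flip the polarity of up to 5 words
# following a negation operator, stopping early when a contrast /
# violated-expectation conjunction (e.g. "but") is encountered.
NEGATORS = {"not", "neither", "never", "no", "nor"}
CONJ_FOL = {"but", "however", "nevertheless", "otherwise", "yet",
            "still", "nonetheless"}

def negation_flips(tokens, window=5):
    """Return a list of booleans: True where a word's polarity is flipped."""
    flips = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in NEGATORS:
            j = i + 1
            while j < len(tokens) and j <= i + window:
                if tokens[j].lower() in CONJ_FOL:
                    break          # contrast conjunction ends the scope
                flips[j] = True
                j += 1
            i = j
        else:
            i += 1
    return flips

tokens = "I do not like Nokia but I like Samsung".split()
flipped = [t for t, f in zip(tokens, negation_flips(tokens)) if f]
# "like" and "Nokia" are flipped; the "like" after "but" is not
```

This reproduces the slide's example: "do not like-1 Nokia but I like+2 Samsung", where the reversal stops at "but".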
21. Features
Discourse relations and their attributes:
- Conj_Fol: but, however, nevertheless, otherwise, yet, still, nonetheless
- Conj_Prev: till, until, despite, in spite, though, although
- Conj_Infer: therefore, furthermore, consequently, thus, as a result, subsequently, eventually, hence
- Conditionals: if
- Strong_Mod: might, could, can, would, may
- Weak_Mod: should, ought to, need not, shall, will, must
- Neg: not, neither, never, no, nor
32. Algorithm
Conj_Fol, Conj_Infer (but, however, nevertheless, otherwise, yet, still, nonetheless, therefore, furthermore, consequently, thus, as a result, subsequently, eventually, hence):
• Words after them are given more weightage: the frequency count of those words is incremented by 1
• The movie looked promising+1, but it failed-2 to make an impact in the box-office
Conj_Prev (till, until, despite, in spite, though, although):
• Words before them are given more weightage: the frequency count of those words is incremented by 1
• India staged a marvelous victory+2 down under despite all odds-1.
Conditionals: all sentences containing if are marked, in supervised classifiers; in lexicon-based classifiers, their weights are decreased
Strong modals: all sentences containing strong modals are marked, in supervised classifiers; in lexicon-based classifiers, their weights are decreased
Negation:
• A window of 5 is considered
• The polarity of all words in the window is reversed until another violated-expectation conjunction is encountered
• The polarity reversals are specially marked
• I do not like-1 Nokia but I like+2 Samsung.
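The lexicon-based use of these weights can be sketched as a weighted polarity sum (a minimal illustration, not the authors' code; the 0.5 tone-down factor for hypothetical segments and the toy lexicon are assumptions):

```python
def lexicon_score(tokens, f, flips, hyp, lexicon, damp=0.5):
    """Weighted polarity sum: each word's lexicon polarity is scaled by
    its discourse weight f_ij, reversed if it lies in a negation window,
    and the whole sentence is toned down if it is hypothetical."""
    score = 0.0
    for w, weight, flipped in zip(tokens, f, flips):
        pol = lexicon.get(w.lower(), 0)
        if flipped:
            pol = -pol          # negation reverses polarity
        score += weight * pol
    if hyp:
        score *= damp           # conditionals / strong modals are toned down
    return score

lex = {"promising": 1, "failed": -1}   # toy polarity lexicon
tokens = "The movie looked promising but it failed".split()
score = lexicon_score(tokens, [1, 1, 1, 1, 1, 2, 2], [False] * 7, False, lex)
# promising contributes +1, failed contributes 2 * -1: overall score -1.0
```

The segment after "but" dominates because its words carry weight 2, matching the slide's intent that the contrasted clause decides the overall polarity.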
33. Algorithm Contd…
Let a user post R consist of m sentences si (i=1…m), where each si consists of ni words wij (i=1…m, j=1…ni)
Let fij be the weight of the word wij in sentence si, initialized to 1
The weight of a word wij is adjusted according to the presence of a discourse
marker or a semantic operator
Let flipij be a variable which indicates whether the polarity of wij should be
flipped or not
Let hypij be a variable which indicates the presence of a conditional or a strong
modal in si.
Input: Review R
Output: wij, fij , flipij , hypij
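Under the definitions above, the weighting pass can be sketched as follows (a minimal illustration, not the authors' implementation): it computes the per-word weights f_ij and the hypothetical flag hyp for one sentence, using the single-word markers from the feature table (multiword markers such as "as a result" are omitted for brevity).

```python
# Marker sets from the features table (single-word entries only).
CONJ_FOL_INFER = {"but", "however", "nevertheless", "otherwise", "yet",
                  "still", "nonetheless", "therefore", "furthermore",
                  "consequently", "thus", "subsequently", "eventually",
                  "hence"}
CONJ_PREV = {"till", "until", "despite", "though", "although"}
CONDITIONALS = {"if"}
STRONG_MOD = {"might", "could", "can", "would", "may"}

def discourse_weights(tokens):
    """Per-word discourse weights f_ij (initialized to 1) and the
    hypothetical flag for one tokenized sentence."""
    n = len(tokens)
    f = [1] * n
    hyp = False
    for j, w in enumerate(tokens):
        lw = w.lower()
        if lw in CONJ_FOL_INFER:
            for k in range(j + 1, n):   # boost words that follow
                f[k] += 1
        elif lw in CONJ_PREV:
            for k in range(j):          # boost words that precede
                f[k] += 1
        elif lw in CONDITIONALS or lw in STRONG_MOD:
            hyp = True                  # conditional / strong modal present
    return f, hyp

f, hyp = discourse_weights("The movie looked promising but it failed".split())
# f == [1, 1, 1, 1, 1, 2, 2]: "it" and "failed" are boosted after "but"
```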
35. Supervised Classification
Support Vector Machines are used with the following features:
N-grams (N=1,2)
Stop Word Removal (except discourse markers)
Discourse Weight of Features - fij
Modal and Conditional Indicators - hypij
Stemming
Negation - flipij
Emoticons
Part-of-Speech Information
Feature Space
Lexeme - wij
Sense-Space – Synset-id(wij)
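A minimal sketch of how one such feature vector might be assembled from the quantities above (the NOT_ prefix for flipped words and the __HYP__ marker feature are our illustrative naming, not the paper's):

```python
from collections import Counter

def feature_vector(tokens, f, flips, hyp):
    """Bag-of-words vector for an SVM: each word contributes its
    discourse weight f_ij; words inside a negation window are marked
    with a NOT_ prefix, and hypothetical sentences add a marker feature."""
    vec = Counter()
    for w, weight, flipped in zip(tokens, f, flips):
        key = w.lower()
        if flipped:
            key = "NOT_" + key          # specially mark polarity reversals
        vec[key] += weight
    if hyp:
        vec["__HYP__"] = 1              # conditional / strong-modal indicator
    return dict(vec)

vec = feature_vector("I do not like Nokia".split(),
                     [1, 1, 1, 1, 1],
                     [False, False, False, True, True],
                     False)
# {'i': 1, 'do': 1, 'not': 1, 'NOT_like': 1, 'NOT_nokia': 1}
```

In the sense space, the same construction would be applied to synset identifiers instead of lexemes.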
38. Datasets
Dataset 1 (Twitter – Manually Annotated)
8507 tweets over 2000 entities from 20 domains
Annotated by 4 annotators into positive, negative and objective classes
Dataset 2 (Twitter – Auto Annotated)
15,214 tweets collected and annotated based on hashtags
Positive hashtags - #positive, #joy, #excited, #happy
Negative hashtags - #negative, #sad, #depressed, #gloomy, #disappointed
39. Datasets Contd…
Manually Annotated Dataset:
#Positive 2548 | #Negative 1209 | #Objective (Not Spam) 2757 | #Objective (Spam) 1993 | Total 8507
Auto Annotated Dataset:
#Positive 7348 | #Negative 7866 | Total 15214
40. Datasets Contd…
Dataset 3 (Travel Domain - Balamurali et al., EMNLP 2011)
Each word is manually tagged with its disambiguated WordNet sense
Contains 595 polarity-tagged documents of each polarity
42. Baselines
Twitter: C-Feel-It (Joshi et al., 2011, ACL)
Travel Reviews: Balamurali et al., 2011, EMNLP
The Iterative Word-Sense Disambiguation algorithm (Khapra et al., 2010, GWC) is used to auto sense-annotate the words
44. Classification Results in Twitter (Datasets 1 and 2)
Comparison with C-Feel-It (Joshi et al., ACL 2011)
Lexicon-based Classification
Supervised Classification
53. Classification Results in Travel Reviews (Dataset 3)
Comparison with Balamurali et al., EMNLP 2011

Lexicon-based Classification:
Sentiment Evaluation Criterion | Accuracy
Baseline Bag-of-Words Model | 69.62
Bag-of-Words Model + Discourse | 71.78

Supervised Classification:
Systems | Accuracy (%)
Baseline Accuracy (Only Unigrams) | 84.90
Balamurali et al., 2011 (Only IWSD Sense of Unigrams) | 85.48
Balamurali et al., 2011 (Unigrams + IWSD Sense of Unigrams) | 86.08
Unigrams + IWSD Sense of Unigrams + Discourse Features | 88.13
54. Drawbacks
Usage of a generic lexicon in lexeme feature space
Lexicons do not have entries for interjections like wow, duh etc., which are strong indicators of sentiment
Noisy text (luv, gr8, spams, …)
Sparse feature space (140 chars) for supervised classification
70% accuracy of IWSD in sense space for travel review classification
55. Drawbacks Contd…
I wanted+2 to follow my dreams and ambitions+2 despite all the obstacles-1, but I did not succeed-2.
want and ambition each get polarity +2, as they appear before despite; obstacle gets polarity -1, and not succeed gets polarity -2, as they appear after but.
The overall polarity is +1, whereas the overall sentiment should be negative.
We do not consider the positional importance of a discourse marker in the sentence and consider all markers equally important.
It would be better to rank the discourse markers based on their positional importance.