This document summarizes a presentation on sentiment classification using supervised machine learning approaches and RapidMiner. It discusses how sentiment analysis can be used for search, recommendations, market research and ad placement. A case study is described that uses RapidMiner to classify movie reviews from IMDB as positive or negative based on word vectors. Additional features like part-of-speech tags, sentiment lexicons, and document statistics are shown to improve accuracy from 85% to 86%.
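The word-vector representation mentioned above can be sketched in a few lines: each review becomes a vector of term counts over a shared vocabulary. The reviews and labels below are invented placeholders, not the IMDB data from the case study, and this is a minimal illustration rather than the RapidMiner pipeline itself.

```python
from collections import Counter

# Toy corpus standing in for labeled movie reviews.
reviews = [
    ("a wonderful and moving film", "pos"),
    ("boring plot and terrible acting", "neg"),
]

# Build the shared vocabulary from all reviews.
vocab = sorted({w for text, _ in reviews for w in text.split()})

def word_vector(text):
    """Map a review to a count vector aligned with `vocab`."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

vectors = [word_vector(text) for text, _ in reviews]
```

Additional feature columns (part-of-speech counts, lexicon scores, document statistics) would simply be appended to each vector before training a classifier.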
Can Deep Learning Solve the Sentiment Analysis Problem? - Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text like a tweet or product review, decide whether it contains positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approx. 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
Twitter has attracted much attention recently as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers on tweet data often faces a data sparsity problem, partly due to the large variety of short and irregular word forms introduced into tweets by the 140-character limit. In this work we propose using two different sets of features to alleviate the data sparseness problem. One is the semantic feature set, where we extract semantically hidden concepts from tweets and then incorporate them into classifier training through interpolation. The other is the sentiment-topic feature set, where we extract latent topics and the associated topic sentiment from tweets, then augment the original feature space with these sentiment-topics. Experimental results on the Stanford Twitter Sentiment Dataset show that both feature sets outperform the baseline model using unigrams only. Moreover, using semantic features rivals the previously reported best result, while using sentiment-topic features achieves 86.3% sentiment classification accuracy, which outperforms existing approaches.
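The feature-augmentation idea above, adding sentiment-topic features on top of unigrams, can be sketched as follows. The topic assignments here are hand-written stand-ins for what a trained sentiment-topic model would infer; they are not the paper's actual model.

```python
# Toy topic table: word -> latent topic label (an assumption for illustration).
topic_of = {"phone": "product", "battery": "product",
            "love": "praise", "hate": "complaint"}

def features(tweet):
    """Unigram features augmented with topic features, as a sparse dict."""
    words = tweet.lower().split()
    feats = {f"uni:{w}": 1 for w in words}          # original unigram space
    for w in words:                                  # augmented topic features
        if w in topic_of:
            key = f"topic:{topic_of[w]}"
            feats[key] = feats.get(key, 0) + 1
    return feats

f = features("love this phone")
```

Because topic features are shared by many surface forms, they give the classifier signal even for rare or misspelled words, which is how they help with sparsity.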
Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small number of evaluation datasets have been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinct sentiment annotations for the tweets and the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and may therefore receive different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions, including total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlations among these dimensions as well as their correlations to sentiment classification performance on the different datasets.
Seminar presentation I made on the topic of 'Resources for Sentiment Analysis' at IIT Bombay. Includes a set of bonus slides with additional information that was not actually presented.
In recent times, research activities in the areas of opinion and sentiment analysis in natural language texts and other media have been gaining ground under the umbrella of subjectivity analysis. The reason may be the huge amount of text data available on the Social Web in the form of news, reviews, blogs, chats and even Twitter. Though sentiment analysis of natural language text is a multifaceted and multidisciplinary problem, in general the term "sentiment" is used in reference to the automatic analysis of evaluative text.
Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor requires deep analyses of the syntactic structure of sentences in tweets.
We evaluate our approach with tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets by 2.19% at the tweet-level and 7.5% at the entity-level in average F-measure.
Sentiment analysis using Naive Bayes classifier - Dev Sahu
This ppt contains a short description of the Naive Bayes classifier algorithm, a machine learning approach to sentiment detection and text classification.
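A minimal multinomial Naive Bayes sentiment classifier, of the kind the slides describe, can be written from scratch in a few lines. The training texts below are toy placeholders, and add-one (Laplace) smoothing is used for unseen word counts.

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus.
train = [("good great fun", "pos"), ("happy good", "pos"),
         ("bad awful boring", "neg"), ("sad bad", "neg")]

# Pool the words of each class and precompute log priors.
class_docs = defaultdict(list)
for text, label in train:
    class_docs[label].extend(text.split())
vocab = {w for text, _ in train for w in text.split()}
priors = {c: math.log(sum(1 for _, l in train if l == c) / len(train))
          for c in class_docs}

def predict(text):
    """Pick the class maximizing log P(c) + sum log P(w|c), add-one smoothed."""
    scores = {}
    for c, words in class_docs.items():
        counts = Counter(words)
        total = len(words) + len(vocab)
        scores[c] = priors[c] + sum(
            math.log((counts[w] + 1) / total)
            for w in text.split() if w in vocab)
    return max(scores, key=scores.get)
```

The log-space sum avoids floating-point underflow when multiplying many small word probabilities.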
Sentiment analysis, or opinion mining, is the process of identifying and categorizing the sentiment expressed in a given text. The need for automatic sentiment retrieval is high, because the number of reviews obtained from Internet sources such as Twitter is huge. These reviews or opinions on popular products or events help determine public opinion on an issue. An averaged-histogram model is proposed that treats text classification as a continuous-variable problem. After data cleaning and feature extraction from the reviews, average histograms are constructed for each class, positive and negative, containing a generalized feature representation of that class. The histogram of each test element is then classified using k-NN, a Bayesian classifier and an LSTM network. This work is implemented in an Android application integrated with Twitter: the user provides a topic for analysis, and the application reports, using the Bayesian classifier, the percentage of tweets in favor of the topic.
Lexicon-based approaches to Twitter sentiment analysis are gaining popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words is marked with fixed sentiment polarities. However, a word's sentiment orientation (positive, neutral, negative) and/or sentiment strength can change depending on context and the targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when compared to the state-of-the-art SentiStrength, and vary from one dataset to another: SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure.
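The fixed-polarity lexicon baseline that SentiCircle improves on can be sketched as below. The tiny lexicon and its scores are illustrative stand-ins for a real resource; SentiCircle's contextual adjustment of word polarities is not reproduced here.

```python
# Toy fixed-polarity lexicon: word -> sentiment strength.
lexicon = {"love": 2, "great": 1, "ok": 0, "bad": -1, "hate": -2}

def lexicon_sentiment(tweet):
    """Sum fixed word polarities; the sign gives the tweet-level label."""
    score = sum(lexicon.get(w, 0) for w in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The weakness the paper targets is visible here: because each word's polarity is fixed, a tweet like "I love iPhone, but I hate iPad" cancels to neutral, with no way to assign entity-level sentiment.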
Sentiment Analysis/Opinion Mining of Twitter Data on Unigram/Bigram/Unigram+Bigram Model using:
1. Machine Learning
2. Lexical Scores
3. Emoticon Scores
YouTube Video: https://youtu.be/VuR16P87yPE
Link to the WebPage: http://akirato.github.io/Twitter-Sentiment-Analysis-Tool
Github Page: https://github.com/Akirato/Twitter-Sentiment-Analysis-Tool
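The unigram/bigram feature models listed above can be sketched with a simple extractor; tokenization here is naive whitespace splitting, for illustration only.

```python
def unigrams_bigrams(tweet):
    """Extract unigram + bigram features from a tweet."""
    toks = tweet.lower().split()
    unigrams = toks
    # Bigrams capture local context that unigrams miss, e.g. negation.
    bigrams = [f"{a}_{b}" for a, b in zip(toks, toks[1:])]
    return unigrams + bigrams

feats = unigrams_bigrams("not good at all")
```

Bigrams like "not_good" let a classifier distinguish negated sentiment that the unigram "good" alone would misread, which is why the combined unigram+bigram model is listed as a separate configuration.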
Michael Manukyan and Hrayr Harutyunyan gave a talk on sentence representations in the context of deep learning during Armenian NLP Meetup. They also reviewed a recent paper on machine comprehension (Wang, Jiang, 2016)
Ayush Kumar, Sarah Kohail, Amit Kumar, Asif Ekbal, Chris Biemann
IIT Patna, India
TU Darmstadt, Germany
Presented by: Alexander Panchenko, TU Darmstadt, Germany
Sentiment Analysis Using Hybrid Structure of Machine Learning Algorithms - Sangeeth Nagarajan
Sentiment analysis is the process used to determine the attitude, opinion or emotion expressed by a person about a particular topic. The presentation dealt with the general approach and different machine-learning-based classification algorithms. The slides are based on the work "Sentiment analysis using Neuro-Fuzzy and Hidden Markov models of text" by Rustamov S., Mustafayev E. and Clements M. A.
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
Machine translation (MT) has developed into one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation system makes an improvement or not. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes show low agreement. On the other hand, the popular automatic MT evaluation methods have some weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but weakly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metric hard to replicate and apply to other language pairs. Thirdly, some popular metrics use incomprehensive factors, which results in low performance on some practical tasks.
In this thesis, to address the existing problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of languages. Thirdly, in the enhanced version of our methods, we design concise linguistic features using POS to show that our methods can yield even higher performance when using some external linguistic resources. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages.
Question Answering System using machine learning approach - Garima Nanda
This is a compact presentation showing how a machine learning approach, using classification techniques, can support effective and efficient question answering.
This presentation was part of my graduation project, which we did during the spring semester of 2013. It describes how we extracted sentiment from tweets using data-mining and NLP methods.
Make a query on a topic of interest and see the sentiment for the day as a pie chart, or for the week as a line chart, for tweets gathered from twitter.com.
Integrating Structure and Analytics with Unstructured Data - DATAVERSITY
How can you make sense of messy data? How do you wrap structure around non-relational, flexibly structured data? With the growth in cloud technologies, how do you balance the need for flexibility and scale with the need for structure and analytics? Join us for an overview of the marketplace today and a review of the tools needed to get the job done.
During this hour, we'll cover:
- How big data is challenging the limits of traditional data management tools
- How to recognize when tools like MongoDB, Hadoop, IBM Cloudant, R Studio, IBM dashDB, CouchDB, and others are the right tools for the job
Netbase AMA Sentiment Analysis Presentation - NetBase
Marketers can’t stop talking about social media, but how many understand how it can help them meet critical business objectives? Or what tools are available to analyze social media, how they compare, and which one is best suited for market research and brand managers?
This NetBase presentation will teach you:
How social media impacts your sales funnel.
Why understanding specific customer themes is important.
How to quantify conversations and get actionable insights that strengthen your brand.
How to use social analytics tools to efficiently get valuable competitive insights.
isMOOD: Listening to the customers' voice through social network analytics - isMOOD
Using isMOOD social network analytics tool to listen to your customers' needs and react to them timely and effectively.
Supervised Sentiment Classification using DTDP algorithm - IJSRD
Sentiment analysis is widely used across many fields and relies on statistical machine learning for text modeling. The primary approach is bag-of-words (BOW); however, this technique has limitations with the polarity-shift problem. Here we propose a new method, dual sentiment analysis (DSA), that resolves the polarity-shift problem. The proposed method involves two components: dual training and dual prediction. First, we propose a data-expansion technique that creates a reversed review for each training review. Second, a dual training and dual prediction algorithm is developed for analyzing sentiment data: dual training learns a sentiment classifier, and dual prediction classifies a review by considering both sides of the review.
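The data-expansion step described above can be sketched as follows: a reversed review is created by flipping sentiment words to antonyms, and its label is flipped too. The antonym table is a toy stand-in for a real antonym lexicon, and this is only the expansion step, not the full DSA classifier.

```python
# Toy antonym lexicon (an assumption for illustration).
antonym = {"good": "bad", "bad": "good", "like": "dislike", "dislike": "like"}

def reverse_review(text, label):
    """Create the reversed training sample: flip sentiment words and the label."""
    flipped = " ".join(antonym.get(w, w) for w in text.split())
    return flipped, ("negative" if label == "positive" else "positive")

rev, lab = reverse_review("i like this good phone", "positive")
```

Training on both the original and the reversed samples gives the classifier explicit evidence about polarity shifts, which is the point of the dual-training idea.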
Sentiment classification is an ongoing and interesting area of research because of its applications in various fields that collect people's reviews of products and of social and political events through the web. Current sentiment analysis concentrates on subjective statements and overlooks objective statements that also carry sentiment. During sentiment classification, challenging problems arise from the ambiguous senses of words, negation words and intensifiers. Because of its importance, the correct sense of a target word is extracted and determined using the similarity of WordNet glosses. This paper presents a survey covering the techniques and methods of sentiment analysis and the challenges that appear in the field.
Presentation of the Classification workshop given during the CNRS-INRAE training course in November 2021 (https://anf-tdm-2021.sciencesconf.org).
Code and data: https://github.com/pbellot/ANFTDM2021
DEEP LEARNING SENTIMENT ANALYSIS OF AMAZON.COM REVIEWS AND RATINGS - ijscai
Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed in a piece of text as positive or negative. On e-commerce websites such as Amazon.com, consumers can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between the review and the rating. To identify reviews with mismatched ratings, we performed sentiment analysis using deep learning on Amazon.com product review data. Product reviews were converted to vectors using paragraph vectors, which were then used to train a recurrent neural network with gated recurrent units. Our model incorporated both the semantic relationships of the review text and product information. We also developed a web service application that predicts the rating score for a submitted review using the trained model; if there is a mismatch between the predicted rating score and the submitted rating score, it provides feedback to the reviewer.
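The mismatch check at the heart of the web service above can be sketched as follows. Here `predict_rating` is a toy word-counting stand-in for the paragraph-vector + GRU model, which is not reproduced; the function names and the tolerance parameter are assumptions for illustration.

```python
def predict_rating(review):
    """Toy stand-in for the trained model: score a review by sentiment words."""
    pos = sum(review.lower().count(w) for w in ("great", "love", "excellent"))
    neg = sum(review.lower().count(w) for w in ("broken", "terrible", "waste"))
    return 5 if pos > neg else 1 if neg > pos else 3

def mismatch(review, submitted_rating, tolerance=1):
    """Flag the review if predicted and submitted ratings differ too much."""
    return abs(predict_rating(review) - submitted_rating) > tolerance

flagged = mismatch("terrible product, broken on arrival", 5)
```

In the real system, a flagged review would trigger feedback asking the reviewer to confirm or correct the rating before submission.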
From Linked Data to Semantic Applications - Andre Freitas
In this talk we will discuss how to build (today) semantically intelligent systems, i.e. systems with the ability to process and interpret information by its meaning. We will take a multidisciplinary perspective, showing how recent advances in other computer science areas such as Information Retrieval and Natural Language Processing can enable, together with Linked Data and Semantic Web resources, the construction of the next generation of information systems. A summary of the core principles and available resources from these areas will give a concrete understanding of how to jump-start your own semantic system.
Big Data analytics, social media analytics, text analytics, unstructured data analytics... call it what you may, we see ourselves as experts in text mining and have products and services that provide insights from various kinds of unstructured data. Already recognized by Gartner for our expertise, we are passionate about what we do and have also filed patents for some innovative approaches we have used.
Sentiment Analysis Using Hybrid Approach: A Survey - IJERA Editor
Sentiment analysis is the process of identifying people's attitudes and emotional states from language. The main objective is realized by identifying a set of potential features in a review and extracting opinion expressions about those features by exploiting their associations. Opinion mining, also known as sentiment analysis, plays an important role in this process: it is the study of emotions, i.e. sentiments and expressions, stated in natural language. Natural language techniques are applied to extract emotions from such unstructured data, and several techniques can be used to analyze it. Here, we categorize these techniques broadly as supervised learning, unsupervised learning and hybrid techniques. The objective of this paper is to provide an overview of sentiment analysis, its challenges, and a comparative analysis of its techniques in the field of Natural Language Processing.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. Our Talk
Introduction to Sentiment Analysis
Supervised Learning Approaches
Case Study with RapidMiner
3. Motivation
“81% of US internet users (60% of population) have used the internet to perform research on a product they intended to purchase, as of 2007.”
“Over 30% of US internet users have at one time posted a comment or online review about a product or service they’ve purchased.”
(Horrigan, 2008)
4. Motivation
A lot of online content is subjective in nature.
User Generated Content: product reviews, blog posts, Twitter, etc.
Examples: epinions.com, Amazon, RottenTomatoes.com.
Sheer volume of opinion data calls for automated analytical methods.
5. Why Are Automated Methods Relevant?
Search and Recommendation Engines.
Show me only positive/negative/neutral.
Market Research.
What is being said about brand X on Twitter?
Contextual Ad Placement.
Mediation of online communities.
6. A Growing Industry
Opinion Mining offerings
Voice of Customer analytics
Social Media Monitoring
SaaS or embedded in data mining packages
7. Opinion Mining – Sentiment Classification
For a given text document, determine sentiment orientation:
Positive or Negative, Favorable or Unfavorable, etc.
Binary or along a scale (e.g. 1-5 stars).
Data is in unstructured text format, from sentence to document level.
Ex: Positive or Negative?
“This is by far the worst hotel experience i've ever had. the owner overbooked while i was staying there (even though i booked the room two months in advance) and made me move to another room, but that room wasn't even a hotel room!”
8. Supervised Learning for Text
Train a classifier algorithm based on a training data set.
Raw data will be text.
Approach: use term presence information as features.
A plain text document becomes a word vector.
9. Supervised Learning for Text
A word vector can be used to train a classifier.
Building a Word Vector:
Unit of tokenization: uni/bi/n-gram.
Term presence metric: binary, tf-idf, frequency.
Stemming.
Stop Words Removal.
Pipeline: IMDB Data Set (Plain Text) → Tokenize → Stemming → Word Vector → Train Classifier.
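As a concrete illustration of this pipeline, here is a minimal sketch in Python. The stop-word list and the suffix-stripping "stemmer" are deliberately naive stand-ins for RapidMiner's Text Processing operators, and the vocabulary is a toy example:

```python
import re

# Toy stop-word list; RapidMiner ships a full English list.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "by", "far"}

def tokenize(text):
    """Lowercase unigram tokenization."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Very naive stemming: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def word_vector(text, vocabulary):
    """Binary term-presence vector over a fixed vocabulary."""
    terms = {stem(t) for t in tokenize(text) if t not in STOP_WORDS}
    return [1 if term in terms else 0 for term in vocabulary]

vocab = ["bad", "enjoy", "hotel", "worst"]
print(word_vector("This is by far the worst hotel experience", vocab))
# -> [0, 0, 1, 1]
```

Swapping the binary indicator for a count or tf-idf weight changes only the last line of `word_vector`; the tokenize/stem/stop-word steps stay the same.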
10. Opinion Mining – Sentiment Classification
Challenges of Data Driven Approaches
Domain dependence.
“chuck norris” might be a good sentiment predictor, but on movies only.
We lose discourse information.
Ex: negation detection
“This comedy is not really funny.”
NLP techniques might help.
11. RapidMiner Case Study
Sentiment Classification based on Word Vectors.
Convert Text data to Word Vectors
Using RapidMiner’s Text Processing Extension.
Use it to Train/Test a Learner Model.
Using Cross-Validation.
Using Correlation and Parameter Testing to pick better features.
Our data set is a collection of film reviews from IMDB, presented in (Pang et al, 2004).
12. RapidMiner Case Study
Select document collection from a directory.
From text to list of tokens.
Convert word variations to their stem.
13. RapidMiner Case Study
Parameter Testing:
- Filter “top K” most correlated attributes.
- K is a macro iterated using Parameter Testing.
14. RapidMiner Case Study
Cross Validation - Training Step.
Calculate Attribute Weights and Normalize.
Pass models on “through port” to Testing.
Select “top k” attributes by weight and train SVM.
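A hedged sketch of this training step in Python: weight the attributes on the training fold, then keep only the top k before fitting a learner. The difference-of-class-means weighting below is an illustrative stand-in for RapidMiner's attribute-weighting operator, and the SVM itself is omitted to keep the example self-contained:

```python
# Toy version of "weight attributes, keep top k, train" on one CV fold.

def attribute_weights(rows, labels):
    """Weight each column by |mean over positive class - mean over negative class|."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    mean = lambda data, c: sum(r[c] for r in data) / len(data)
    return [abs(mean(pos, c) - mean(neg, c)) for c in range(len(rows[0]))]

def select_top_k(weights, k):
    """Indices of the k highest-weighted attributes."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])[:k]

# Toy training fold: column 0 separates the classes, column 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
keep = select_top_k(attribute_weights(X, y), k=1)
print(keep)  # -> [0]
```

Computing the weights inside each training fold (rather than once on the whole set) is what keeps the cross-validation estimate honest: the test fold never influences feature selection.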
16. Case Study – Adding More Features
Pre-computed features based on text statistics:
Document, Word and Sentence Sizes, Part-of-speech Presence, Stop Words Ratio, Syllable Count.
Features based on scoring using a sentiment lexicon (Ohana & Tierney ’09).
Used SentiWordNet as the Lexicon (Esuli et al, ’09).
In RapidMiner we can merge those data sets using a
known unique ID (File name in our case).
17. Opinion Lexicons
An opinion lexicon is a database of terms and the opinion information they carry.
Some terms and expressions carry “a priori” opinion bias, relatively independent of context.
Ex: good, excellent, bad, poor.
To build the data set:
Score document based on terms found.
Total positive/negative scores.
Per part-of-speech.
Per document section.
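A minimal sketch of lexicon-based scoring, using a hypothetical four-entry lexicon in place of SentiWordNet (real SentiWordNet scores are attached to synsets per part of speech, which this toy version ignores):

```python
# Hypothetical mini-lexicon: term -> (positive score, negative score).
# The numeric values are invented for illustration only.
LEXICON = {
    "good": (0.75, 0.0),
    "excellent": (1.0, 0.0),
    "bad": (0.0, 0.625),
    "poor": (0.0, 0.5),
}

def lexicon_scores(tokens):
    """Total positive and negative score over the terms found in the lexicon."""
    pos = sum(LEXICON.get(t, (0, 0))[0] for t in tokens)
    neg = sum(LEXICON.get(t, (0, 0))[1] for t in tokens)
    return pos, neg

tokens = "the food was good but the service was poor".split()
print(lexicon_scores(tokens))  # -> (0.75, 0.5)
```

The per-part-of-speech and per-document-section totals listed above are just this same sum restricted to subsets of the tokens.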
18. Lexicon Based Approach
Pipeline: IMDB Data Set (Plain Text) → POS Tagger → Negation Detection → SWN Scoring (with SentiWordNet as input) → Document Score Features.
19. Part of Speech Tagging
The computer-animated comedy " shrek " is designed to be enjoyed on different levels by different groups . for children , it offers imaginative visuals , appealing new characters mixed with a host of familiar faces , loads of action and a barrage of big laughs
The/DT computer-animated/JJ comedy/NN ''/'' shrek/NN ''/'' is/VBZ
designed/VBN to/TO be/VB enjoyed/VBN on/IN different/JJ levels/NNS by/IN
different/JJ groups/NNS ./. for/IN children/NNS ,/, it/PRP offers/VBZ
imaginative/JJ visuals/NNS ,/, appealing/VBG new/JJ characters/NNS
mixed/VBN with/IN a/DT host/NN of/IN familiar/JJ faces/NNS ,/, loads/NNS of/IN
action/NN and/CC a/DT barrage/NN of/IN big/JJ laughs/NNS
20. Negation Detection
NegEx (Chapman et al ’01).
Look for negating expressions: “don’t”, “not”, “without”, “unlikely to”, etc.
Pseudo-negations: “no wonder”, “no change”, “not only”.
Forward and Backward Scope.
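A hedged sketch of forward-scope negation marking in the spirit of NegEx: after a negation trigger, mark the following tokens until the scope runs out, skipping pseudo-negations. The trigger and pseudo-negation lists here are tiny illustrative subsets, and the fixed 3-token scope is an assumption:

```python
NEGATIONS = {"not", "no", "don't", "without", "never"}
PSEUDO = {("no", "wonder"), ("no", "change"), ("not", "only")}
SCOPE = 3  # mark up to 3 tokens after a trigger (illustrative choice)

def mark_negation(tokens):
    """Prefix tokens in a negation's forward scope with NOT_."""
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if t in NEGATIONS and (t, nxt) not in PSEUDO:
            out.append(t)
            span = min(SCOPE, len(tokens) - i - 1)
            for j in range(i + 1, i + 1 + span):
                out.append("NOT_" + tokens[j])
            i += 1 + span
        else:
            out.append(t)
            i += 1
    return out

print(mark_negation("this comedy is not really funny".split()))
# -> ['this', 'comedy', 'is', 'not', 'NOT_really', 'NOT_funny']
```

The marked tokens then become distinct word-vector features, so “funny” and “NOT_funny” no longer count as the same evidence.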
21. Case Study – Adding More Features
Data Set Merging
22. Results - Accuracy
Average Accuracy using 10-fold Cross-validation.
Method | Accuracy % | Feature Count
Baseline word vector | 85.39 | 6739
Baseline less uncorrelated attributes | 85.49 | 1800
Document Stats (S) | 68.73 | 22
SentiWordNet features (SWN) | 67.40 | 39
Merging (S) + (SWN) | 72.79 | 61
Merging Baseline + (S) + (SWN) and removing uncorrelated attributes | 86.39 | 1800
23. Opinion Mining – Sentiment Classification
Some results from the field (IMDB data set).
Method | Accuracy | Source
Support Vector Machines and bigrams word vector | 77.10% | (Pang et al, 2002)
Naïve Bayes word vector + Parts of Speech | 77.50% | (Salvetti et al, 2004)
Support Vector Machines and unigrams word vector | 82.90% | (Pang et al, 2002)
Unigrams + Subjectivity Detection | 87.15% | (Pang et al, 2004)
SVM + stylistic features | 87.95% | (Abbasi et al, 2008)
SVM + GA feature selection | 95.55% | (Abbasi et al, 2008)