On Using Deep Learning for Sentiment Analysis
Conor Brady
School of Computer Science and Statistics
University of Dublin
This dissertation is submitted for the degree of
BA(mod) in Computer Science
Trinity College April 2015
I dedicate this project to my parents, Jim and Méabh Brady, without whose unwavering support
during my constant meandering over the last few years, I surely would have not had the
opportunity to submit this for consideration for my undergraduate degree. I owe you more than
I can ever repay.
Declaration
I hereby declare that this thesis is, except where otherwise stated, entirely my own work and
that it has not been submitted as an exercise for a degree at any other university.
Conor Brady
April 2015
Acknowledgements
Foremost I would like to acknowledge my supervisor Rozenn Dahyot, who was helpful,
involved and supportive all the way through the last few months. I would like to thank Conor
Murphy and Micheal Barry for reading drafts of this report and providing invaluable feedback.
A special thanks to Paddy Corr, whose supply of cigarettes kept me going through the hard
times. Last but not least, Sam Green. Thanks for the speakers. They’re great speakers.
Abstract
This project concerns the application of deep learning to sentiment analysis. A technology
developed at Stanford University relating to this field is described. This technology
achieves state-of-the-art results in sentiment analysis of sentences of varying length. It is then
applied in two novel implementations: firstly, a Firefox extension that parses text on a webpage,
then colours it according to sentiment; secondly, a Javascript visualisation of the sentiment
of users as they tweet, during the Dublin Marathon 2014 and live over San Francisco. These
technologies are then evaluated along with the Stanford sentiment analysis engine.
Table of contents
List of figures
1 Introduction
1.1 Overview
1.2 Technologies
1.2.1 HTTP Sentiment Endpoint
1.2.2 Sentiment Analysis Firefox Extension
1.2.3 Geolocated Visualization of Tweets
2 Background
2.1 Distributed Semantic Vector Representations
2.2 Deep Learning
2.2.1 Feature Engineering
2.2.2 Artificial Feed-Forward Neural Network
2.3 Learning Semantic Vectors
2.4 Composition of Semantic Vectors
2.5 Transformation to a Sentiment Space
2.6 Training
3 Implementation
3.1 The Stanford CoreNLP Java Library
3.1.1 Configuration
3.1.2 Deployment
3.2 Sentiment Firefox Extension
3.2.1 Selectors
3.2.2 Sentence Boundaries
3.2.3 Server Interaction
3.2.4 Colouring
3.2.5 HTTPS
3.2.6 Overview Chart
3.3 Sentiment Rain
3.3.1 The TCD GRAISearch Dataset
3.3.2 Server Stack
3.3.3 Map
3.3.4 Scenario-Tweets Resource
3.3.5 Javascript Visualisation
3.3.6 Sound
3.3.7 Live
4 Experimental Results
4.1 Stanford Sentiment Server
4.2 Firefox Extension
4.3 Sentiment Rain
5 Future Work
5.1 Sentiment Analysis and Semantic Vectors
5.2 Stanford Sentiment Server
5.3 Firefox Extension
5.4 Sentiment Rain
6 Conclusions
References
List of figures
2.1 Example of feature engineering to classify binary digits into their respective ⊕ output; introduction of the ∧ operator makes previously inseparable classes easily separable.
2.2 Example of a feed-forward artificial neural network with two inputs, a hidden layer and one output
2.3 Graph of a common activation function (Equation 2.2) used in the functional units of Artificial Neural Networks.
2.4 An example of a Recursive Neural Network, where the output is equal in length to each of its two child inputs
2.5 Example of the transformation of a semantic vector into the sentiment space
3.1 Stanford CoreNLP Maven dependency listing
3.2 Server library imports
3.3 Server initialisation code
3.4 Required API endpoint
3.5 Sentiment Server Response
3.6 The sentiment server endpoint’s code
3.7 Firefox extension that highlights sentiment in paragraphs of text
3.8 Firefox extension content selectors
3.9 Sentence boundary detection regexes
3.10 Headers required to enable Cross-Origin Resource Sharing
3.11 Function for calculating the colour from a sentiment value
3.12 Plugin running on Twitter over HTTPS
3.13 Sentiment Rain scenario showing sentiment of Tweets during the Dublin Marathon in 2014
3.14 GRAISearch database endpoint
3.15 Sample required /scenario_tweets resource
3.16 Sentiment Rain architecture
3.17 Tweet selected by clicking a circle
3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location
3.19 An example of an ADSR envelope
3.20 Live view of tweets over San Francisco
4.1 Random sample of websites to test effectiveness of content selection from extension
4.2 Short movie review [5], colour coded based on sentiment
4.3 Short movie review [5], colour coded based on sentiment
4.4 Short movie review [5], colour coded based on sentiment
4.5 Short movie review [5], colour coded based on sentiment
4.6 Excerpt from the Wikipedia page of Dublin, Ireland
4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014
Chapter 1
Introduction
1.1 Overview
Deep Learning is a broad field of research and this project has focused on its application
to sentiment analysis, as set forth in [31]. Sentiment analysis is the determination of how
positively or negatively the author of a piece of text feels about the subject of the text. This has
applications in fields such as the prediction of stock market prices and corporate brand evaluation.
[31] has led to state-of-the-art results in the analysis of the sentiment of sentences of variable length,
with 85% successful classification of entire sentences into classes of positive or negative.
Chapter 2 discusses the technologies and ideas that lead to [31], with an emphasis on an
abstract understanding of the mathematics behind it. It explains, stage by stage, the
technologies that lead to the Recursive Neural Tensor Network, a technology that
semantically composes a sentence up its syntax tree; a transformation of the output then reveals
the sentiment information inherent in the sentence.
Using the library resulting from the efforts of the Stanford department, made available
under the GPLv3 license, this report chronicles the creation of novel uses of the sentiment
analysis engine, with regard to various sources of text on the Internet. The process of building
these technologies is described in Chapter 3.
Chapter 4 investigates the results from the implementations and evaluates their effectiveness.
Possibilities for future development based on these results are discussed in Chapter 5.
Finally, concluding remarks are documented in Chapter 6. These relate to evaluating the
application of deep learning technologies to sentiment analysis and the effectiveness of the use
of these technologies in the implementations described within.
1.2 Technologies
Three technologies have been developed as part of this final year project. While the first two
applications (Sections 1.2.1 and 1.2.2) have been designed and developed independently with
use of the Stanford CoreNLP library, the third has benefited from interacting with researchers
on an ongoing European project coordinated by the supervisor Prof. Rozenn Dahyot [13].
More specifically, Cyril Bourges, Marie Curie Research Fellow on this GRAISearch project,
has provided access to the dataset ’Dublin Marathon 2014’, hosted on a server in the School of
Computer Science and Statistics in Trinity College.
1.2.1 HTTP Sentiment Endpoint
This is a server consisting of a wrapper around the library, as supplied by the Stanford NLP
Group, that allows for sentiment analysis to be carried out by any other service able to make
HTTP requests across the Internet. It returns a number between 0 for negative and 1 for
positive, allowing further processing to be carried out. This is the foundation of the other two
technologies, and runs on its own server hosted by Amazon Web Services.
1.2.2 Sentiment Analysis Firefox Extension
An extension that, when installed in the Firefox web browser, automatically searches for
passages of text in webpages. Once the passages are identified, it colour codes sentences
depending on their sentiment: red for negative, black for neutral and green for positive, with a
gradient between.
1.2.3 Geolocated Visualization of Tweets
Another novel application of sentiment analysis, this takes a dataset of tweets over a period, in
this case the Dublin City Marathon 2014, and displays them on a map according to where they
were tweeted over the course of the scenario. Using colour and music to signify the sentiment
of each tweet, it creates an interactive audiovisual experience in which tweets can be intuitively
viewed and navigated.
Chapter 2
Background
Deep learning has led to state-of-the-art innovation in the areas of Automatic Speech Recognition
[35], Image Recognition [17] and Natural Language Processing [18]. This project specifically
deals with its applications in sentiment analysis through the creation of semantic vector
representations. A brief introduction is provided in this chapter to the technologies that have
led to the state-of-the-art results in sentiment analysis of sentences of varying length [31].
Section 2.1 provides an introduction to the idea of a semantic vector representation of an
input and contrasts it with an atomic representation, using English words as the example.
Section 2.2 provides an introduction to the area of deep learning, the factors that lead to its
success and how it is implemented in artificial neural networks.
Section 2.3 returns to semantic vectors and discusses how these can be learned by artificial
neural networks.
Section 2.4 discusses how these semantic vectors, when learned to describe words in the
English language, can be composed up a sentence’s syntax tree resulting in a single vector that
describes the semantic meaning of the sentence.
Section 2.5 describes how vectors in the semantic space may be transformed into a sentiment
space to distil the sentiment information from the semantic vectors.
Finally, Section 2.6 investigates how the Stanford NLP department trained these technolo-
gies with crowdsourced labelled data.
2.1 Distributed Semantic Vector Representations
Words exist as a means of communication between people, but they are just labels people who
speak the same language have agreed upon to communicate meaning. They offer nothing in
so far as denoting what they describe. Words are atomic representations, in that each symbol
of the word offers nothing in isolation. In a distributed representation, by contrast, some of the
dimensions may be lost and information about the object is still present.
For example "cat" and "lion" trigger commonalities in the mind as they trigger internal
representations, but are useless to computers for describing the objects they denote. What these
systems need are its own distributed semantic vector representations. These semantic vectors
describe, with each dimension, some feature of the input that is useful for solving meaningful
problems.
2.2 Deep Learning
Deep learning is the process of learning successive semantic vector representations of data in
a hierarchical manner. Each vector feeds into the next to produce successively more abstract
representations. For example, given an image as input to a network tasked to
find faces, the first layer could detect edges, the next layer eyes, ears and noses and then a third
could detect faces. Presented with raw image data at the input, it generates successively more
abstract vectors to eventually carry out a task at the output.
To return to the cat analogy, one feature of an abstract layer would be ’is it a cat?’, followed
on an even more abstract layer by ’is it a lion?’, and these would take their cue from lower-layer
features such as ’has it four legs?’ or ’has it got fur?’. This is useful for simplifying the system,
as at each layer information is concentrated, abstracted and discarded.
Learning works by determining how much error the network produces at the output. The
functional units that produce the output are modified and optimised to reduce this error
and eventually generalise to produce the correct output for unseen inputs.
2.2.1 Feature Engineering
Traditional machine learning depends on feature engineering. Feature engineering is the
practice of hand-producing a set of functions that transform an input into a semantic or feature
vector. The goal is to make the feature vector a good representation of the input, whether
by lowering the dimensionality and increasing the signal-to-noise ratio, or by performing
non-linear operations that transform a dimension of the output vector so that it becomes
linearly separable from the vectors of a different class. The engineer tries to define a semantic
distributed representation by hand.
A toy example of feature engineering can be seen in Figure 2.1. Here the class of inputs to
the exclusive disjunction operator that output a 1 (in red) is linearly inseparable, using a
two-dimensional line over the inputs alone, from the class that outputs a 0 (in blue). Once the
conjunction of the inputs is introduced as a new, third dimension, the classes can easily be
separated by a plane in the three-dimensional space.
Fig. 2.1 Example of feature engineering to classify binary digits into their respective ⊕ output;
introduction of the ∧ operator makes previously inseparable classes easily separable.
This feature engineering is time consuming, depends heavily on the expertise of the engineer
and may miss important features of the data that would be useful in carrying out the task at
hand. Moreover, the complexity of feeding the output of one layer of features into the input
of another grows with each successive layer of abstraction. This means truly powerful
hierarchical representations of the input are difficult to manually engineer.
Instead of defining these feature vectors by hand, they can be learned. These feature vectors
may then reveal information about the data useful to carrying out the task that is less apparent,
but still contributes to the robustness of the system. As the outputs of one layer of features are
used as the inputs into the next layer of features, these features become more robust as each
would be required to influence multiple features in the next layer. The forced generalisation
helps stop the features from overfitting. This can be used through multiple layers of features,
creating complex hierarchical representations of the input. Further analysis can be found in [3].
2.2.2 Artificial Feed-Forward Neural Network
A common implementation of deep learning is an artificial feed-forward neural network, as
seen in Figure 2.2. This comprises layers of non-linear functional units, commonly referred
to as "neurons"; a layer of these applies a linear transformation to the input vector followed
by an offset (Equation 2.1) and a non-linear activation function. The activation function is
usually a sigmoid (Figure 2.3 and Equation 2.2), due to its asymptotic nature and its
convenient derivative, which can be expressed in terms of the function’s own output
(Equation 2.3). More on activation functions follows below.
Fig. 2.2 Example of a feed-forward artificial neural network with two inputs, a hidden layer
and one output
Fig. 2.3 Graph of a common activation function (Equation 2.2) used in the functional units of
Artificial Neural Networks.
z = x^T W + b (2.1)
f(z) = 1/(1 + e^(-z)) (2.2)
f'(z) = f(z)(1 − f(z)) (2.3)
These functional units are parametrised by a weight matrix W and a bias vector b on each
layer. The weights are parameters of a transformation on the input vector and the biases are an
offset that is applied after the transformation but before the non-linear activation function.
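As a minimal, purely illustrative sketch of Equations 2.1 and 2.2 (the dimensions and numerical values are invented, and this is not code from the Stanford library), the following computes the output of one layer: a linear transformation of the input, an offset, then the sigmoid applied element-wise.

public class ForwardPass {
    // Sigmoid activation, Equation 2.2.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One layer: z = x^T W + b, then f(z) applied element-wise (Equations 2.1 and 2.2).
    static double[] layer(double[] x, double[][] W, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double z = b[j];
            for (int i = 0; i < x.length; i++) {
                z += x[i] * W[i][j];
            }
            out[j] = sigmoid(z);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = { 0.5, -1.0 };                      // two inputs
        double[][] W = { { 0.4, -0.6 }, { 0.1, 0.8 } };  // 2x2 weight matrix
        double[] b = { 0.0, 0.1 };                       // bias vector
        System.out.println(java.util.Arrays.toString(layer(x, W, b)));
    }
}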
Due to the non-linear nature of each transformation, complex transformations of the input
data can be achieved. Given multiple layers, transformations that would have been impossible
with a single layer of hand-tuned features can be described.
Networks are commonly trained via supervised learning. This is a method of training where
the network has a dataset of inputs x, be they text, image, video or some other raw data, and
corresponding labelled outputs y which describe the correct answer for each input on the task
at hand. The dataset is then split randomly into a training set and a testing set; the network
is trained on the training set and its ability is assessed on the testing set.
These networks are trained with the method of back-propagation [27]. This is a form of
gradient descent that has been designed specifically for artificial neural networks. It works
by defining a cost function that describes the error of the network; the error is then propagated
backwards from the output through the layers, all the way to the input, altering the parameters
along the way. Generally for supervised learning, where an output y for a given input x needs
to be learned, the squared error given in Equation 2.4 is used as the cost function.
C_x = (1/2)||y − f(x)||^2 (2.4)
Back-propagation takes advantage of a non-linear activation function with a convenient
derivative, such as the sigmoid (Equation 2.2), which simplifies the system’s partial derivatives
(the rate of change of the error with respect to each parameter in the system). By computing
the rate of change of the error with respect to each weight and bias, each parameter is altered
by a small amount, scaled by a learning rate, in the direction opposite to its gradient in order
to minimise the cost function. If this is repeated over a diverse set of inputs, a good generalised
representation is learned, one that can produce useful classifications or representations of
unseen inputs.
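The following sketch (again illustrative, with made-up data and a single functional unit rather than a full network) performs one gradient-descent step under the cost of Equation 2.4, using the convenient derivative of Equation 2.3; full back-propagation repeats this chain rule layer by layer.

public class GradientStep {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    public static void main(String[] args) {
        double[] x = { 0.2, 0.7 };   // input
        double y = 1.0;              // labelled target
        double[] w = { 0.1, -0.3 };  // weights
        double b = 0.0;              // bias
        double learningRate = 0.5;

        // Forward pass: z = x^T w + b, output f(z).
        double z = b;
        for (int i = 0; i < x.length; i++) z += x[i] * w[i];
        double out = sigmoid(z);

        // Cost C = 0.5 * (y - f(z))^2 (Equation 2.4).
        // dC/dz = (f(z) - y) * f(z) * (1 - f(z)), using Equation 2.3.
        double delta = (out - y) * out * (1 - out);

        // Move each parameter a small step against its gradient.
        for (int i = 0; i < w.length; i++) w[i] -= learningRate * delta * x[i];
        b -= learningRate * delta;

        System.out.printf("output %.4f, updated weights %s%n",
                out, java.util.Arrays.toString(w));
    }
}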
One disadvantage of neural networks is that they don’t benefit from the head start most machine
learning techniques receive from feature engineering. When engineering features, the engineer
can use intuition to rapidly develop meaningful features; an artificial neural network may take
time to develop these features, as it has to slowly move down a gradient in a high dimensional
space. Given enough training data and computational resources, however, they are capable of
surpassing even the most complex hand-made features. Without this foresight they can also get
stuck in local minima of the gradient space, which can be mitigated by using other optimisation
strategies such as simulated annealing.
2.3 Learning Semantic Vectors
Semantic vectors can be learned by exposing a neural network to problems it can only solve
by building the layered, hierarchical vector representations of the input that are necessary for
solving them. The input can be any raw data, and the network can learn these vectors
and transformations by back-propagating the error at the output and adjusting the parameters
to minimise the output of a cost function.
These vectors hold enough semantic meaning to attempt to solve any problem the network
has seen in the past, so if the question “is it a cat?” is posed to a network trained on words in
the English language, inputs such as "lion" and "cat" will probably have some dimension
close to one, whereas the input “chair” will have a number closer to zero in that dimension. It
is worth noting, however, that unless this information has been useful to the network in solving a
problem it has seen in the past, it would not learn this information, so a diverse input
is essential for complete, robust learning. For example the network could have learned that a
sufficient commonality to identify cats is that they have four legs, but then it could confuse a
chair for a cat given that it has four legs. This is not a failure of the network, but rather a failure
of the training data. In practice vector dimensions are rarely this quantifiable; this example is
contrived and for illustrative purposes. Getting this kind of information from the generated
vectors requires learning a linear transformation of the vector space, as seen in Section 2.5.
The idea of learning semantic vectors to represent words in the English language is in-
troduced in [4], and developed further in [7], with the unsupervised generation of vector
representations of words. The methods set out in the latter paper generate a language matrix
over the words found on Wikipedia. This matrix is initialised to Gaussian noise and after
training represents relations between words as approximately linear relationships in this high
dimensional space, with each column vector representing a word across the language. Each
word in the language indexes a vector from the matrix and provides it as input to the system.
The network is trained to come up with meaningful representations of the input so it can carry
out the task at hand, which in [7], is to give correct English sentences a higher score than
incorrect English sentences.
The use of this method purely for mapping relationships to a linear space has been improved
upon, with the current state of the art set out in [21]. The evaluation of this technology found
that if the vector representing the word "Man" is subtracted from the vector representing the
word "King", and this resultant vector is added to the vector for "Woman", we end up with a
vector that is close to "Queen".
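A hedged sketch of the kind of arithmetic described in [21] follows; the four-dimensional vectors below are invented purely for illustration, whereas real embeddings have hundreds of dimensions. Subtracting "Man" from "King", adding "Woman" and searching for the nearest stored vector by cosine similarity returns "Queen".

import java.util.HashMap;
import java.util.Map;

public class VectorArithmetic {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Toy embeddings: dimensions loosely meaning (royalty, male, female, person).
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("King",  new double[] { 0.9, 0.8, 0.1, 0.7 });
        vectors.put("Man",   new double[] { 0.1, 0.9, 0.1, 0.8 });
        vectors.put("Woman", new double[] { 0.1, 0.1, 0.9, 0.8 });
        vectors.put("Queen", new double[] { 0.9, 0.1, 0.8, 0.7 });
        vectors.put("Chair", new double[] { 0.0, 0.1, 0.1, 0.2 });

        // King - Man + Woman
        double[] query = new double[4];
        for (int i = 0; i < 4; i++) {
            query[i] = vectors.get("King")[i] - vectors.get("Man")[i] + vectors.get("Woman")[i];
        }

        // Nearest word by cosine similarity comes out as "Queen" with these toy vectors.
        String best = null;
        double bestScore = -2;
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) { bestScore = score; best = e.getKey(); }
        }
        System.out.println("Nearest to King - Man + Woman: " + best);
    }
}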
Fig. 2.4 An example of a Recursive Neural Network, where the output is equal in length to
each of its two child inputs
2.4 Composition of Semantic Vectors
The idea of composing semantic vectors is proposed in [33]. Composing word vectors together
to build representations of sentences of varying length is useful because it allows the semantic
vectors of sentences of any length to be compared.
This paper re-introduces the Recursive Neural Network (RNN) set out in [23]. An RNN
(Figure 2.4) is a network whose output vector is the same length as each of its two input
vectors. This means the output can be provided to the same network as an input at the next
stage, and each input/output mapping uses the same weight matrix. This is useful for applying
the network over a structure such as a syntax tree, where the child nodes can be composed into
an output vector representation at the parent node. This can then act as a child for its own parent,
until a single vector representing the tree has been composed from the bottom up. This works
under the assumption that the composition function combining children into a parent is constant
for all child-parent relations.
A problem with the basic RNN is the limitation on how one word can transform the other.
The vectors are concatenated before being multiplied by the weight matrix, so they cannot
multiplicatively affect one another; they can only additively affect the output vector, as shown
in Equation 2.5, where p is the resultant parent vector and c1 and c2 are the child vectors
transformed by W, all of which are the same length. Note the similarity to Equation 2.1, with
f(z) coming from Equation 2.2; the bias is omitted, as it sometimes is in artificial neural networks
to simplify the system.
p = f(W[c1 : c2]) (2.5)
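A minimal sketch of Equation 2.5 follows (with a random weight matrix and tiny two-dimensional word vectors, none of which come from the trained models): the same composition function is applied at every node of a small binary tree, so a parent vector always has the same length as its children.

import java.util.Random;

public class RnnComposition {
    static final int D = 2;                                        // toy vector length
    static final double[][] W = randomMatrix(D, 2 * D, new Random(42));

    static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                m[i][j] = rng.nextGaussian() * 0.1;
        return m;
    }

    // p = f(W [c1 : c2]) -- Equation 2.5, bias omitted as in the text.
    static double[] compose(double[] c1, double[] c2) {
        double[] p = new double[D];
        for (int i = 0; i < D; i++) {
            double z = 0;
            for (int j = 0; j < D; j++) {
                z += W[i][j] * c1[j] + W[i][D + j] * c2[j];
            }
            p[i] = 1.0 / (1.0 + Math.exp(-z));   // sigmoid f from Equation 2.2
        }
        return p;
    }

    public static void main(String[] args) {
        // Toy word vectors for "not", "very", "good".
        double[] not = { 0.3, -0.7 }, very = { 0.5, 0.2 }, good = { -0.4, 0.9 };

        // Compose up a right-branching tree: (not (very good)).
        double[] veryGood = compose(very, good);
        double[] sentence = compose(not, veryGood);
        System.out.println(java.util.Arrays.toString(sentence));
    }
}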
Words like ‘not’ are inherently unary operators and have no real stand-alone meaning. They
could be viewed more as matrices than vectors, operators that should transform the semantic
vector of another word. For example, ‘excited’ would have a specific meaning, and ‘not excited’
would then be linearly transformed across the vector space by the word ‘not’. This can conceptually be
achieved through the use of Matrix-Vector Recursive Neural Networks (MV-RNNs).
MV-RNNs [32] seek to solve this issue by assigning each word a matrix, as well as a
vector, that can transform the sibling vector. These matrices can be learned in the same way as
the vectors and enable words like "not" to transform other words into a different area of the
space before the two are combined, allowing more flexible representations to be learned. However,
the introduction of the matrices squares the number of parameters per word, and with it the
dimensionality of the gradient space, so these matrices are difficult to learn. A lower-parameter
version of this approach is required.
Recursive Neural Tensor Networks (RNTNs) [31] are a development of the MV-RNNs.
They are RNNs that add an extra term to the composition function: the concatenation of the two
child vectors, transposed, [c1 : c2]^T, is multiplied by a square slice V_i of a parameterised rank 3
tensor, then multiplied by another concatenation of the child vectors [c1 : c2] (Equation 2.6).
This results in a scalar value, and it is repeated for each slice of the tensor until a vector as long
as a child is produced. This is added to the product of the weight matrix W and the concatenated
children, as in the RNN. Most importantly, this gives the children multiplicative effects on
one another without squaring the parameter set. The effect each position of one vector has on
the other is controlled by the value at a given position in the tensor. This results in certain
dimensions of the semantic word vectors relating to effects on siblings and others relating to
pure semantic content.
p_i = f([c1 : c2]^T V_i [c1 : c2] + W_i [c1 : c2]) (2.6)
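The sketch below, again with invented two-dimensional vectors and random parameters, spells out Equation 2.6: each slice V_i of the tensor contributes the scalar [c1 : c2]^T V_i [c1 : c2], which is added to the i-th component of the ordinary W[c1 : c2] term before the activation is applied.

import java.util.Random;

public class RntnComposition {
    static final int D = 2;        // toy vector length, so [c1 : c2] has length 2D
    static double[][] W;           // D x 2D weight matrix
    static double[][][] V;         // D slices, each 2D x 2D

    // p_i = f([c1 : c2]^T V_i [c1 : c2] + W_i [c1 : c2]) -- Equation 2.6
    static double[] compose(double[] c1, double[] c2) {
        double[] c = new double[2 * D];
        System.arraycopy(c1, 0, c, 0, D);
        System.arraycopy(c2, 0, c, D, D);

        double[] p = new double[D];
        for (int i = 0; i < D; i++) {
            double tensorTerm = 0;
            for (int a = 0; a < 2 * D; a++)
                for (int b = 0; b < 2 * D; b++)
                    tensorTerm += c[a] * V[i][a][b] * c[b];

            double linearTerm = 0;
            for (int j = 0; j < 2 * D; j++)
                linearTerm += W[i][j] * c[j];

            p[i] = 1.0 / (1.0 + Math.exp(-(tensorTerm + linearTerm)));
        }
        return p;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        W = new double[D][2 * D];
        V = new double[D][2 * D][2 * D];
        for (int i = 0; i < D; i++) {
            for (int j = 0; j < 2 * D; j++) {
                W[i][j] = rng.nextGaussian() * 0.1;
                for (int k = 0; k < 2 * D; k++) V[i][j][k] = rng.nextGaussian() * 0.1;
            }
        }
        double[] left = { 0.2, -0.5 }, right = { 0.7, 0.1 };
        System.out.println(java.util.Arrays.toString(compose(left, right)));
    }
}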
2.5 Transformation to a Sentiment Space
Given the proposition that the semantic space contains all the semantic information relevant
to the network during its training, it follows that, where sentiment was relevant during
training, the sentiment information must be contained in the semantic space. A projection into
a sentiment space can then be performed to reveal that sentiment information.
A sentiment space with five interdependent dimensions is proposed: very negative, negative,
neutral, positive and very positive. A vector can be transformed from the semantic space to
this space by a matrix that is learned in the same way as the vectors and trained on labelled data.
So given a vector representing “this movie is terrible” in the semantic vector space, the matrix
would be trained to output a value close to 1 in the very negative dimension and zeros in all the
other dimensions.
Fig. 2.5 Example of the transformation of a semantic vector into the sentiment space
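As a sketch of this projection (the matrix values below are invented; in [31] the projection is trained jointly with the rest of the network and applied at every node of the tree), a learned 5 x d matrix maps a d-dimensional semantic vector to five scores, and a softmax turns those scores into a distribution over the classes from very negative to very positive.

public class SentimentProjection {
    // Softmax over the five sentiment scores.
    static double[] softmax(double[] scores) {
        double sum = 0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) { out[i] = Math.exp(scores[i]); sum += out[i]; }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional semantic vector for a phrase.
        double[] semantic = { 0.8, -0.2, 0.4 };

        // Hypothetical learned 5 x 3 projection matrix (rows: very negative ... very positive).
        double[][] projection = {
            { -1.2,  0.3, -0.5 },
            { -0.4,  0.1, -0.2 },
            {  0.0,  0.0,  0.1 },
            {  0.5, -0.1,  0.3 },
            {  1.1, -0.3,  0.6 }
        };

        double[] scores = new double[5];
        for (int i = 0; i < 5; i++)
            for (int j = 0; j < semantic.length; j++)
                scores[i] += projection[i][j] * semantic[j];

        double[] distribution = softmax(scores);
        String[] labels = { "very negative", "negative", "neutral", "positive", "very positive" };
        for (int i = 0; i < 5; i++)
            System.out.printf("%-13s %.3f%n", labels[i], distribution[i]);
    }
}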
2.6 Training
In order to train all the parameters (the semantic vectors, the composition function and the
semantic-to-sentiment projection matrix), a large amount of labelled data is required. The
Stanford NLP department collected this data by utilising Amazon’s Mechanical Turk, a service
for crowdsourcing tasks that are difficult for a computer to carry out but take moments for a
human to perform.
They use a common dataset acquired from the website Rotten Tomatoes in the form of
movie reviews. The dataset consists of 10,000 sentences which are parsed into their syntax
trees by technologies developed elsewhere in the Stanford NLP department, resulting in 200,000
subtrees or phrases. Each phrase is presented without context to a person on Mechanical Turk,
who gives it a score from very negative to very positive; these scores are then translated into
sentiment vectors that can be used to train all the parameters of the network.
Chapter 3
Implementation
This chapter outlines the processes involved in creating three technologies. Firstly, a Java server
to host the Stanford CoreNLP library, described in Section 3.1. Second, a Firefox extension
that parses any website for content and then, using the technology outlined in Section 3.1, performs
coloured sentiment highlighting of the text, described in detail in Section 3.2. Finally, the
location-based audio-visual technology titled "Sentiment Rain" is presented in Section 3.3.
3.1 The Stanford CoreNLP Java Library
The technologies described in the previous chapter are available as part of the Stanford CoreNLP
library [18]; specifically, the technologies described in [31] are available as the sentiment
analysis package.
3.1.1 Configuration
The Stanford sentiment library is available in Java and hosted in the Maven Central repository
[20]. This allows a project to be created with the library listed as a dependency (Fig 3.1).
This keeps the code easy to manage and allows updates to be introduced to the project by
changing the version number and rebuilding. This is particularly useful as the models are likely
to be upgraded regularly by the Stanford NLP department.
The goal is to create a network-accessible HTTP endpoint that allows the building of
platform-agnostic, novel implementations with Stanford CoreNLP’s sentiment technologies.
The first task is to choose a server stack that facilitates such an endpoint across the Internet.
As the library is hosted in the Maven Central Repository, a Java server with a Maven
build process is the clear choice. A Spring MVC web application and RESTful web
service framework [34] satisfies this requirement. It has a Maven build process and manages
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.5.1</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.5.1</version>
<classifier>models</classifier>
</dependency>
Fig. 3.1 Stanford CoreNLP Maven dependency listing
dependencies while also providing much of the boilerplate code involved in creating a web
server.
As shown in Fig 3.2, Spring Java annotations are imported to inject the boilerplate code
involved in creating a web server into the code base, along with some standard Java imports
and the Stanford CoreNLP libraries, with one dependency on the Efficient Java Matrix Library’s
SimpleMatrix.
As shown in Fig 3.3, the use of the CoreNLP sentiment library requires a tokenizer and a
sentiment pipeline. The tokenizer splits any input string into lexical tokens with the ’tokenize’
annotator and then into sentences with the ’ssplit’ annotator. These can then be passed into the
sentiment pipeline: the text is first annotated by the ’parse’ annotator to produce the syntax
tree, followed by the ’sentiment’ annotator, which uses the technologies described in the earlier
chapter to generate a five-dimensional sentiment analysis for the tree. Fig 3.3 also shows how
the server initialises these annotators for use in responding to requests.
The server is required to respond to the request in Fig 3.4 with a "Content-type:
application/json" response as shown in Fig 3.5.
As shown in Fig 3.6, to provide this response the endpoint receives the ’lines’ parameter
from the request via the ’@RequestParam’ annotation, bound to ’lines’. Each line is passed
through each of the CoreNLP annotators and a weighted average of the resultant sentiment
vectors is calculated over the sentences in the line. Only one sentence should normally be
present, but in the implementation in Section 3.3 this averaging is useful, as the client requires
an overall sentiment for multiple sentences. However, the more sentences that are averaged, the
less the sentiment can be relied on, so this functionality is used sparingly and only where
appropriate. The server responds as shown in Fig 3.5, i.e. each line followed by a number
between 0 and 1, where 1 means positive and 0 means negative.
package server;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;
import java.util.HashMap;
import java.util.List;
import java.util.Properties;
import java.io.*;
import org.ejml.simple.SimpleMatrix;
import edu.stanford.nlp.sentiment.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;
Fig. 3.2 Server library imports
@Controller
public class SentimentController {
StanfordCoreNLP tokenizer;
StanfordCoreNLP pipeline;
public SentimentController() {
Properties tokenizerProps = new Properties();
tokenizerProps.setProperty("annotators", "tokenize, ssplit");
this.tokenizer = new StanfordCoreNLP(tokenizerProps);
Properties pipelineProps = new Properties();
pipelineProps.setProperty("annotators", "parse, sentiment");
pipelineProps.setProperty("ssplit.isOneSentence", "true");
pipelineProps.setProperty("enforceRequirements", "false");
this.pipeline = new StanfordCoreNLP(pipelineProps);
}
Fig. 3.3 Server initialisation code
GET /sentiment?lines=what+a+bad+film&lines=actually+I+really+like+it
Fig. 3.4 Required API endpoint
{
"0":{ "sentiment":0.2580948040808713,
"line":"what a bad film" },
"1":{ "sentiment":0.6893179757900284,
"line":"actually I really like it" }
}
Fig. 3.5 Sentiment Server Response
@RequestMapping("/sentiment")
public @ResponseBody HashMap<Integer,HashMap<String,Object>>
sentiment(
@RequestParam(value="lines", required=true) List<String> lines,
Model model ) {
HashMap<Integer,HashMap<String,Object>> response =
new HashMap<Integer,HashMap<String,Object>>();
for(int i = 0; i < lines.size(); i++) {
response.put(i,new HashMap<String,Object>());
Annotation annotation = tokenizer.process(lines.get(i));
pipeline.annotate(annotation);
double sentiment = 0;
int sentenceCount =
annotation.get(CoreAnnotations.SentencesAnnotation.class)
.size();
for( int j = 0; j < sentenceCount; j++ ) {
CoreMap sentence =
annotation.get(CoreAnnotations.SentencesAnnotation.class)
.get(j);
Tree tree =
sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
SimpleMatrix vector =
RNNCoreAnnotations.getPredictions(tree);
sentiment += (vector.get(1)*0.25+
vector.get(2)*0.5 +
vector.get(3)*0.75+
vector.get(4))/
(double)sentenceCount;
}
response.get(i).put("line",lines.get(i));
response.get(i).put("sentiment",sentiment);
}
return response;
}
}
Fig. 3.6 The sentiment server endpoint’s code
Fig. 3.7 Firefox extension that highlights sentiment in paragraphs of text.
3.1.2 Deployment
A first attempt at deployment was to use a free instance on Heroku [14]. Heroku is a cloud
Platform as a Service that supports a wide variety of languages and configurations, allows
easy deployment via git [11] from the command line, and is free for limited use. This option is
unavailable, however, as the Stanford CoreNLP models are over 300MB in size and 300MB is
the maximum allowed size on Heroku.
An alternative is to deploy on an Amazon EC2 server [1]. An Ubuntu server runs on AWS
with the code pulled via git, and is made available at a static IP address for responding to
requests. The domain conorbrady.com is used, whose DNS records are managed by Cloudflare
[6], to create the domain name stanford-nlp.conorbrady.com; using a DNS A record, it
points to the EC2 server. An endpoint as described in Fig 3.4 is made available on this server.
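To illustrate how another service might consume the endpoint, the sketch below uses only the Java standard library to issue the request of Fig 3.4 and print the JSON of Fig 3.5; error handling and JSON parsing are omitted, and the deployed server is of course not guaranteed to remain available.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SentimentClient {
    public static void main(String[] args) throws Exception {
        // Build the query string expected by the /sentiment endpoint (Fig 3.4).
        String query = "lines=" + URLEncoder.encode("what a bad film", "UTF-8")
                     + "&lines=" + URLEncoder.encode("actually I really like it", "UTF-8");
        URL url = new URL("http://stanford-nlp.conorbrady.com/sentiment?" + query);

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        // Print the raw JSON response (Fig 3.5); a real client would parse it.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}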
3.2 Sentiment Firefox Extension
The first build using the previously mentioned server is a Firefox extension [10]. This extension
runs in the Firefox browser and highlights positive sentences in green and negative sentences in
red, as shown in Fig. 3.7.
It uses content scripts [8], which are provided to the browser by the extension on page load
and modify the content of the page.
/* <p> elements inside <article> elements */
article p
/* <p> elements inside elements whose class or id
contain article or content as a sub string */
[class*=content] p
[id*=content] p
[class*=article] p
[id*=article] p
/* <p class="tweet-text"> elements, for twitter */
p.tweet-text
/* <p> elements inside elements with class="body",
for the webpage used in the results section */
.body p
Fig. 3.8 Firefox extension content selectors
3.2.1 Selectors
Looking for consistency in websites is impossible; every website has its own structure and
nomenclature, at its own discretion. So there is no better way to pick out relevant text than to
compile a list of selectors [16]. Selectors are the means by which Javascript can reference
objects in the HTML of a webpage. The comments above the selectors in Fig 3.8 denote which
elements they will select in the HTML.
This is an incomplete list, but there is no complete list. If it is too general, unnecessary
information will be sent to the server for analysis, wasting resources and slowing down an
already slow, processing-intensive operation. jQuery wildcard selectors are employed to increase
the system’s robustness; the benefits are documented in the results chapter. Elements that
contain other paragraphs, textareas or scripts are then removed, as these cause issues and are not
required by the extension to provide its functionality.
3.2.2 Sentence Boundaries
Once the elements are selected, the sentences are extracted from the HTML. A record is kept
of where sentences start and end so the paragraph can be reconstructed on reinsertion. This is
achieved by gathering a list of sentence markers, acquired by testing the paragraph against the
two regular expressions [25] shown in Fig 3.9.
The first regex finds possible sentence boundaries. As regex look-behinds are unsupported
in Javascript, a second pass has to be performed once a candidate has been identified: the
second pass tests for strings such as ’i.e.’, ’e.g.’, ’Mr’, ’Dr’ and so forth, and upon a match of
the second regex the candidate is rejected.
/([.?!])\s*(?=[A-Z]|<)/g
/(?:\w\.\w)|(?:[A-Z][a-z])/g
Fig. 3.9 Sentence boundary detection regexes
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3600
Access-Control-Allow-Headers: x-requested-with
Fig. 3.10 Headers required to enable Cross-Origin Resource Sharing
The start and end of the paragraph (index 0 and its length) are also added to the sentence
markers; they will fail to match the first regex but are nonetheless sentence boundaries.
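The listing below is a hypothetical re-implementation of this two-pass procedure, written in Java purely for illustration (the extension itself performs it in Javascript); the exact window of characters inspected around each candidate is an assumption.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceMarkers {
    // First pass: candidate boundaries (Fig 3.9, first regex).
    static final Pattern BOUNDARY = Pattern.compile("([.?!])\\s*(?=[A-Z]|<)");
    // Second pass: reject candidates preceded by abbreviations such as "i.e." or "Dr".
    static final Pattern ABBREVIATION = Pattern.compile("(?:\\w\\.\\w)|(?:[A-Z][a-z])");

    static List<Integer> markers(String paragraph) {
        List<Integer> marks = new ArrayList<>();
        marks.add(0);                                   // paragraph start is always a marker
        Matcher m = BOUNDARY.matcher(paragraph);
        while (m.find()) {
            // Emulate a look-behind: inspect the few characters before the candidate.
            String before = paragraph.substring(Math.max(0, m.start() - 3), m.start());
            if (!ABBREVIATION.matcher(before).find()) {
                marks.add(m.end());
            }
        }
        marks.add(paragraph.length());                  // paragraph end is always a marker
        return marks;
    }

    public static void main(String[] args) {
        String p = "This film is great. I mean it. Would watch again!";
        System.out.println(markers(p));
    }
}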
3.2.3 Server Interaction
Once these markers are acquired, the paragraph is split into substrings at the sentence markers
and all HTML tags are removed, leaving only the text carrying the sentiment information.
The sentences are then sent to the server via an HTTP GET request, with each sentence passed
as a ’lines’ GET parameter.
When an HTTP request is made from a browser via Javascript, either the domain of the
server responding to the request must be the same as the current website’s domain, or
the server receiving the request must support Cross-Origin Resource Sharing, or CORS [9].
The server has to be modified to send the appropriate headers, as it would be impossible to
guarantee the domain of the source of the request: the plugin runs in the browser and
could be applied to virtually any site. The response headers in Fig 3.10 are added to the server’s
response and the browser then allows the responses from the server to pass its security checks.
3.2.4 Colouring
The sentence markers are used to delimit <span> elements, elements which have no impact
on the webpage apart from providing the facility to apply custom CSS to a precise section.
These are given inline styling with a colour value computed by the function in Fig 3.11.
This results in a black colour for 0.5, indicating a neutral sentiment, becoming green at
1.0 and red at 0.0, indicating these extrema, with a gradient between the three values. It is
important to leave the functionality of the page unaffected, as changing any <a> tags could render
function getRGB(sentiment) {
var red = Math.floor(
Math.max(0, -510 * sentiment + 255)).toString(16);
var green = Math.floor(
Math.max(0, 510 * sentiment - 255)).toString(16);
return '#' + ( red.length == 2 ? red : "0" + red )
+ ( green.length == 2 ? green : "0" + green ) + '00';
}
Fig. 3.11 Function for calculating the colour from a sentiment value
the site unusable. The insertion of the <span> elements ensures the safety of the site’s functionality,
as all other content remains unchanged.
3.2.5 HTTPS
This will not work for sites using HTTPS [26]. The rules regarding CORS state that any request
must be made over the same protocol as the current browser location. This means websites like
Twitter, which exclusively use HTTPS, require HTTPS to be set up on the endpoint for the
extension to work on those sites.
Cloudflare offers a service that allows traffic to be proxied through their servers, presenting an
HTTPS connection to the client while connecting to the sentiment server via HTTP. This means
configuring HTTPS or purchasing a certificate from a Certificate Authority is unnecessary.
Cloudflare is configured to offer the service and the extension is modified to make the request
via HTTPS when on a secure website, such as Twitter (Fig. 3.12).
3.2.6 Overview Chart
To provide an overview of the sentiment of the page, Highcharts [15] is used. Highcharts is a
Javascript library that allows for easy creation of simple charts. As each sentence on the page is
processed, the histogram bucket corresponding to its sentiment value is incremented. This
creates a histogram reflecting the number of sentences of each sentiment.
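A sketch of the bucketing step follows; the five-bucket split and the variable names are assumptions, as the actual chart configuration lives in the extension’s Javascript and Highcharts.

public class SentimentHistogram {
    public static void main(String[] args) {
        int[] buckets = new int[5];   // e.g. very negative ... very positive
        double[] pageSentiments = { 0.12, 0.48, 0.51, 0.73, 0.95, 0.33 };

        for (double s : pageSentiments) {
            // Map [0, 1] onto bucket indices 0..4; clamp 1.0 into the last bucket.
            int index = Math.min((int) (s * buckets.length), buckets.length - 1);
            buckets[index]++;
        }
        System.out.println(java.util.Arrays.toString(buckets));
    }
}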
3.3 Sentiment Rain
This section describes a separate use of the sentiment analysis engine: mapping Tweets over the
course of a time frame, specifically the Dublin Marathon 2014 (Fig. 3.13), and then live over
San Francisco.
Fig. 3.12 Plugin running on Twitter over HTTPS
Fig. 3.13 Sentiment Rain scenario showing sentiment of Tweets during the Dublin Marathon in
2014
http://graisearch.scss.tcd.ie/query/Graisearch/sql/:querystring
Fig. 3.14 GRAISearch database endpoint
3.3.1 The TCD GRAISearch Dataset
A database is made available on a read-only basis from the Department of Statistics in Trinity
College. This database is exposed on the endpoint in Fig 3.14, once supplied with valid
credentials using HTTP Basic Auth. HTTP Basic Auth is a simple form of authorisation where
a username and password combination is encoded into an HTTP header. As this is encoded
rather than encrypted, it is potentially insecure; an HTTPS connection would resolve this problem
but at the time of writing is unavailable on the GRAISearch server, which returns a ’503 - Service
Unavailable’ when requested over HTTPS.
The dataset contains numerous geocoded, timestamped Tweets collected during the Dublin
Marathon. By leveraging this dataset with the sentiment analysis server, an interactive experi-
ence of Twitter activity during the day of the Marathon is created.
3.3.2 Server Stack
The requirements of the server are minimal. The majority of the processing is done client side,
with minimal business logic occurring on the server; it acts as a go-between for the client, the
GRAISearch dataset and the sentiment server. No database is required as the data is streamed
from the GRAISearch database. All that is required is a thin layer to serve the page and to
provide some endpoints from which the Javascript can get the information it needs for the
visualisation (the Tweet objects).
The Sinatra Ruby micro-framework [30] is an appropriate choice: it is lightweight and quick
to get up and running. It also defaults to productive technologies such as Coffeescript
(a terse, expressive language that compiles to Javascript for the browser), SASS and HAML
(similar technologies relating to CSS and HTML respectively). By comparison, a large
traditional MVC framework such as Ruby on Rails would cost considerable set-up time
for unneeded features.
Heroku is used for hosting, as this time the server has a sufficiently small footprint and is
more typical of what Heroku is generally used for. This allows a git push to Heroku from the
command line to update the server easily. Heroku also provides the option to create a subdomain of
herokuapp.com that resolves to the server’s IP address. sentiment-rain.herokuapp.com is
chosen and a CNAME DNS record is set up on Cloudflare to point from
{
"id": "526661075383894016",
"lat": 53.72362511,
"lon": -6.33728538,
"text": "Best of luck to our online editor @JaneLundon running the
@dublinmarathon today. We’re so proud of you Jane! @Elverys ",
"link": "http://t.co/RfAKcz8FsF",
"created_at": "1414400759000",
"sentiment": 0.6815987306407396
}
Fig. 3.15 Sample required /scenario_tweets resource
sentiment-rain.conorbrady.com to the Heroku-supplied domain. The server is now
available at http://sentiment-rain.conorbrady.com/
3.3.3 Map
A powerful Javascript browser mapping library with high customisability is needed. Google
Maps was the original path of research, but Mapbox [19] offers more options for customising
the map view and drawing shapes over it. Mapbox is built on top of Leaflet, an expressive map-
drawing library; it is the choice of Foursquare, Pinterest and Evernote, to name a few, and it is
free to use for the purposes of this project.
3.3.4 Scenario-Tweets Resource
A resource is provided at /scenario_tweets?since=:since&limit=:limit. This returns a
number (:limit) of Tweets since a certain timestamp (:since) in a JSON format; an example
is shown in Fig 3.15.
To implement this endpoint, the GRAISearch dataset must be queried on the URL described
in Fig 3.14, requesting:
• The ID as defined by Twitter
• The coordinates of the tweet
• The text of the tweet
• The time the tweet was created at
Filtered by:
• Only English tweets
• Tweets that have no null fields among those requested
• Tweets created since the time given in the request
• Limited to the limit given in the request
Fig. 3.16 Sentiment Rain architecture
Upon the GRAISearch server’s response, URLs have to be stripped out of all the text fields
to get the raw tweet text. The cleaned lines are then encoded into an HTTP GET request and
sent to the sentiment server for analysis, and the results are merged back into the tweet objects
for returning to the client. The architecture of this system is shown in Fig. 3.16.
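A sketch of the URL-stripping step is shown below; the regular expression is an assumption rather than the code used, but Twitter shortens links to t.co URLs as in Fig 3.15, so a pattern of this shape removes them before the text is sent for analysis.

public class TweetCleaner {
    // Remove http/https links and collapse the leftover whitespace.
    static String stripUrls(String text) {
        return text.replaceAll("https?://\\S+", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String tweet = "Best of luck to our online editor running today! http://t.co/RfAKcz8FsF";
        System.out.println(stripUrls(tweet));
    }
}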
3.3.5 Javascript Visualisation
To visualise the data, a clock is initialised at 8:00am on the morning of the Dublin Marathon
2014. The clock progresses at 5 minutes of simulated time per second of real time and ticks 12
times per second of real time.
Fig. 3.17 Tweet selected by clicking a circle
On each tick it messages the view and the data source separately and allows them to carry
out their appropriate actions, depending on their state.
The data source works simply: if it is not already waiting on a network request to return
tweets, it checks the current time against the latest tweet it has already received; if this reveals
that the visualisation will run out of new tweets within 6 seconds of real time, it asynchronously
requests the next batch of tweets from the server and feeds it into the view.
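A sketch of the timing arithmetic follows (the numbers come from the text above, the structure and timestamps are hypothetical): at 5 simulated minutes per real second and 12 ticks per second, each tick advances simulated time by 25 seconds, and more tweets are requested whenever fewer than 6 real seconds’ worth of simulated time remain before the newest tweet already held.

public class ScenarioClock {
    static final long SIM_SECONDS_PER_TICK = 300 / 12;   // 5 simulated minutes per real second, 12 ticks per second
    static final long PREFETCH_LEAD = 6 * 300;           // 6 real seconds of lead, in simulated seconds

    long simulatedTime;      // unix seconds, starting around 8:00am on marathon morning
    long latestTweetTime;    // created_at of the newest tweet already fetched
    boolean requestInFlight;

    ScenarioClock(long start, long latestTweetTime) {
        this.simulatedTime = start;
        this.latestTweetTime = latestTweetTime;
    }

    // Called 12 times per real second.
    void tick() {
        simulatedTime += SIM_SECONDS_PER_TICK;
        if (!requestInFlight && latestTweetTime - simulatedTime < PREFETCH_LEAD) {
            requestInFlight = true;
            // Asynchronously request the next batch: /scenario_tweets?since=latestTweetTime&limit=...
            System.out.println("prefetching more tweets at simulated time " + simulatedTime);
        }
    }

    public static void main(String[] args) {
        ScenarioClock clock = new ScenarioClock(1414396800L, 1414398600L);  // illustrative timestamps
        for (int i = 0; i < 60; i++) clock.tick();
    }
}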
The view visualises the tweets it is aware of at the time. It is the data source’s job to feed
it with tweets to visualise. When the view receives a tweet it wraps it in a TweetView object.
It is these objects that represent the circles on the map and any accompanying interactivity.
Upon creation, a click listener is attached to the circle that produces the tweet overlaid on the
map when a mouse click event occurs on it, as shown in Fig. 3.17. This is achieved by using
Twitter’s Javascript SDK and passing the Tweet ID to a call along with a webpage element
in which to render it. This element has the id ’frame’ and exists for the sole purpose of
containing these tweets.
On each clock tick, the view passes the timestamp to each TweetView; it also removes any
TweetViews from its set that have destroyed themselves. This destruction occurs when they
detect that they should no longer be visible and will remain invisible in the future.
The actual TweetView objects shrink their circle’s size and modulate their border radius
with respect to the supplied timestamp and the time they were created at on Twitter. After a
certain time after they are created, they will have no size and call a destroy method on
themselves, which instructs the parent view to stop messaging them and let the garbage collector
clean them up as described previously.
Fig. 3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location
3.3.6 Sound
Given the visual nature of this project it is appropriate to furnish it with a complementary
soundscape. For this the Audiolet Javascript library [2] is appropriate; it allows sample-
altering and frequency-producing nodes to be connected in a network to produce sound effects
and music synthesis.
After some experimentation and research, the configuration shown in Fig. 3.18 was produced.
This generates a low tone for negative sentiment and a high tone for positive sentiment. The
tones are selected from a C minor scale with the 2nd, 4th and 7th removed: these are not
dominant notes of the scale and, as the notes are selected with uniform randomness, it is better
that the selection favour the more dominant notes to reinforce the scale.
Fig. 3.19 An example of an ADSR envelope
From there the signal is connected to a low pass filter that is modulated by a square wave. A low
pass filter removes high frequencies and allows lower frequencies to pass; when the cut-off
frequency is varied, or modulated, this produces a rhythmic effect. The modulating square
wave’s frequency is controlled by the sentiment, giving a slow modulation for negative and a
fast modulation for positive, between 1Hz and 16Hz; this modulates the cutoff of the low pass
filter between 5kHz and 9kHz, resulting in a rhythmic effect.
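The mapping from sentiment to these two parameters can be pictured as below; the linear interpolation and the square-wave arithmetic are assumptions for illustration, while the 1Hz to 16Hz and 5kHz to 9kHz ranges come from the text.

public class SentimentSound {
    public static void main(String[] args) {
        double sentiment = 0.68;   // e.g. the tweet from Fig 3.15

        // Square wave frequency: 1 Hz for very negative up to 16 Hz for very positive.
        double modulatorHz = 1 + 15 * sentiment;

        // The square wave toggles the low pass cutoff between 5 kHz and 9 kHz at that rate.
        double t = 0.3;   // seconds into the note
        boolean high = Math.floor(t * modulatorHz * 2) % 2 == 0;
        double cutoffHz = high ? 9000 : 5000;

        System.out.printf("modulator %.1f Hz, cutoff now %.0f Hz%n", modulatorHz, cutoffHz);
    }
}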
A device known as an ADSR (Attack, Decay, Sustain, Release) envelope produces the
signal shown in Fig 3.19. This is triggered the moment the tweet is tweeted, then summed
with another modulating square wave whose frequency is offset from that of the low pass
filter’s modulator by enough to create a rhythm between the two. The summation of the ADSR
and this second square wave produces a signal that begins with the transient of the ADSR
combined with the choppiness of the square wave, but after the release continues to hold with
just the square wave.
This summed signal is combined in a multiplier with the previous signal from the low
pass filter, acting on the output of the low pass filter as an amplitude control, to create an
enveloped sound with a transient followed by a rhythmic amplitude at odds with the already
present modulated low pass filter.
The final two stages give spatial awareness to the sound by first attenuating it with respect to
its distance from the map centre, and then panning it with respect to its x position on the map.
Fig. 3.20 Live view of tweets over San Francisco
3.3.7 Live
As a final experiment, a live stream of tweets from Twitter is connected, to showcase tweets
over San Francisco in real time. This is fed directly from Twitter’s API as opposed to the
dataset made available from TCD GRAISearch, which means Twitter’s API policies, such as rate
limiting and requesting based on location, have to be adhered to. Beyond that the approach and
technologies are similar, and it is a success [29].
Chapter 4
Experimental Results
4.1 Stanford Sentiment Server
This server performs as expected, with roughly 200ms round-trip time per sentence submitted in
the request. Measuring the quality of the sentiment analysis, the server performs as well as the
Stanford NLP department’s live demonstration [24] in all comparisons. In future it is expected
that Stanford’s demonstration will perform better, as it has access to the most up-to-date training.
No replication or load balancing is in place and this implementation will not scale to large
numbers of users requesting simultaneously, but as no shared state is held on the server, the task
of replication and load balancing is straightforward when required.
4.2 Firefox Extension
The Firefox extension has 40% recall of relevant content with the basic Javascript selector
article p. By augmenting the selectors with jQuery wildcard selectors (as explained in
Section 3.2.1) the recall rises to 88% of relevant content. These selectors search anywhere in
the class or id attribute of elements in the webpage for the string "article" or "content",
instead of trying to match the whole name. Given the unpredictable ways in which websites
are constructed, this works exceptionally well, as demonstrated in Fig. 4.1.
The sentence boundary splitting appears robust, with 95% precision in detecting sentence
boundaries across Fig 4.2, Fig 4.3 and Fig 4.4. The approach presented has independently been
estimated at 95% effective [22].
Each paragraph of content requires around three seconds to process on an idle server. This
processing is carried out serially; therefore the processing time for the page is a linear
function of the number of paragraphs, with paragraph length taken into account. As mentioned
Website Selector
http://www.vulture.com/ article p
http://www.theglobeandmail.com/ article p
http://www.detroitnews.com/ article p
http://theadvocate.com/ article p
http://wegotthiscovered.com/ article p
http://www.newyorker.com/ article p
http://qctimes.com/ article p
http://www.rollingstone.com/ article p
http://www.dailynews.com/ article p
http://www.rogerebert.com/ article p
http://www.thestar.com/ [class*=article] p
http://www.abc.net.au/ [class*=article] p
http://www.forbes.com/ [class*=article] p
http://www.reviewjournal.com/ [class*=content] p
http://www.nj.com/ [class*=content] p
http://thepopcornjunkie.com/ [class*=content] p
http://baretnewswire.org/ [class*=content] p
http://www.tvinsider.com/ [class*=content] p
http://filmink.com.au/ [id*=article] p
http://www.vox.com/ [id*=article] p
http://theyoungfolks.com/ [id*=content] p
http://screenrant.com/ [id*=content] p
http://www.cityweekly.net/ fail
http://www.reelingreviews.com/ fail
http://www.ericdsnider.com/ fail
Fig. 4.1 Random sample of websites to test effectiveness of content selection from extension
Fig. 4.2 Short movie review [5], colour coded based on sentiment
As mentioned previously, due to limitations on the server this processing does not scale well,
and with more than a few simultaneous users performance will become sluggish.
Fig. 4.2 to Fig. 4.5 show the effect of the extension on a series of short, opinionated
movie reviews. Because the library is trained on movie reviews, this is an ideal demonstration of
its abilities, and the reader is encouraged to evaluate the effectiveness for themselves. The figures
show clear parallels between sentiment and colour coding. It can also be seen in these excerpts, and
in that of Fig 4.6, that no functionality of the page has been affected: all links remain clickable.
The chart overview offers little insight into the page's sentiment. Simply counting the
number of positive and negative sentences reveals little: if one sentence carries much more
weight than the others, it should contribute more to the histogram, and without such weights
the representation is of limited value. Hidden text on the page also contributes to the histogram's
state; this is not a problem when colouring the text, as the user cannot see it to begin
with, but it skews the chart. The problem this chart attempts to solve, deciphering the importance
and weight of a sentence relative to its surrounding content, is complex and likely an ongoing
research effort in Stanford's NLP department.
Fig. 4.3 Short movie review [5], colour coded based on sentiment
Fig. 4.4 Short movie review [5], colour coded based on sentiment
Fig. 4.5 Short movie review [5], colour coded based on sentiment
Fig. 4.6 Excerpt from the Wikipedia page of Dublin, Ireland
Fig. 4.6 shows that an article on Wikipedia, a website with little to no opinion, apparently
carries sentiment information. It is the author's conviction that this is due to two forms of
sentiment: the first being the sentiment of the writer, which is evident in the movie reviews,
and the second being the sentiment perceived by the reader, which is apparent in this article.
Take the sentence "In response to Strongbow's successful invasion ... pronounced himself Lord of Ireland".
Its sentiment is subjective: some readers would consider it positive and others negative, yet it
does not express an opinion held by the writer. This is likely a failure in the training dataset,
with bias and personal outlook creeping into the model. It could instead be due to an under-trained
network, but the evidence suggests otherwise, as this theme is consistent across Wikipedia.
The evaluation in which this library achieved 85% correct classification ignored the neutral
class [31]. As evidenced by Fig 4.6, it could be argued that a neutral class is as important as
the positive and negative classes and should be included in future state-of-the-art benchmarks.
A packaged extension can be found on the CD accompanying this report in the folder
entitled "demonstration".
4.3 Sentiment Rain
Sentiment Rain is a success on the desktop browsers Safari and Chrome on OS X: the frame
rate is smooth and the colours and tones reflect the sentiment. It struggles on Firefox on OS X
due to the manner in which that browser allocates its threads. In Firefox only one thread is
available to the Javascript engine per tab, so when a network call is made it disrupts the
processing of the user interface, which manifests as stuttering in the browser. Safari and Chrome
handle this better, with the networking not interfering with the user interface. Internet Explorer
remains untested for compatibility.
As a proof of concept this is a sufficient result. A screenshot is provided in Fig 4.7. As
a screenshot is a poor demonstration of this implementation, the reader is encouraged to
try the live demos in the Chrome browser [12], available at [28] and [29]. A video
demonstration is also included on the CD accompanying this report in the folder entitled
"demonstration".
Fig. 4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014
Chapter 5
Future Work
5.1 Sentiment Analysis and Semantic Vectors
The research in this project concerns learning semantic vector representations and how to extract
useful information from them. As the semantic vectors' axes are undefined, a transformation
onto known axes must be performed to gain insight into the information contained within the
space. Sentiment is a poorly defined concept, as it is not always objectively clear what the
sentiment of a sentence is; for example, "I love when people die in war" receives a sentiment rating
of 0.54. This sentence does contain sentiment, but in multiple, separate dimensions, and when the
space is projected onto a single dimension that sentiment is lost.
If clear, objective axes were defined and a projection onto them learnt, the network could
yield more useful insight into the semantic meaning of the vector representations. If a
multidimensional sentiment model could be proposed, such as the sentiment of the writer on
one axis and the perceived sentiment of the reader on another, the network might be trained to
produce a more robust, context-free sentiment analyser than those seen so far. A model such as
this is hard to define; the proposal above may not be valid, as the sentiment of a reader is
subjective and thus may require a different transformation depending on the political and moral
viewpoint of the reader.
It would be interesting to investigate what other meaningful projections could be achieved
from the semantic vectors. A model in which sentiment is decomposed into two dimensions
may be just the beginning: the field of sentiment analysis could be generalised into one
of semantic analysis, with multiple dimensions each revealing their own information alongside
sentiment. Multiple transformation matrices could encode different viewpoints and personal
biases, allowing insight into the personal semantic information a particular reader would take
from a piece of text.
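To make this concrete, one hedged sketch of such a decomposition (the notation is illustrative, not a worked-out model): given a sentence's semantic vector v, two projections could be learned in the same manner as the single sentiment projection of Section 2.5,

s_writer = g(Ww v)        s_reader = g(Wr(k) v)

where g normalises the output over sentiment classes, Ww captures the sentiment expressed by the writer, and Wr(k) captures the sentiment perceived by a reader of viewpoint k, with a separate matrix learned per viewpoint. Labelled data for each axis would be required, gathered in the same crowdsourced fashion as in Section 2.6.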
5.2 Stanford Sentiment Server
The sentiment server could be spread across multiple machines with caches and load balancers
to enable scalability. This system could be scaled as there would be no interdependence of
stored data and therefore could load balance with ease. As this was merely a proof of concept,
the non scalable server was sufficient for these purposes.
The biggest improvement in scalability would come from removing the server entirely and
replacing it with a similar library written in Javascript. This would allow much cheaper
scaling, as no dedicated server would be required for sentiment processing and each user
would perform the sentiment analysis on their own machine. Such a library could be trained on
Stanford's sentiment treebank [31] to achieve similar results.
5.3 Firefox Extension
Moving sentence splitting to the server would prove much more reliable as the CoreNLP
contains functionality to split sentences more robustly than anything Javascript libraries can
offer. This is too significant an architecture redesign to undertake at this time but should be
considered early on.
The other issue concerns reliable extraction of pertinent text. This problem has no clear
solution, as each webpage potentially has a structure different from any seen before. One possible
approach would be to provide a facility for the user to click paragraphs of interest, derive a
selector that captures them, and update the selector list for future use.
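A minimal sketch of how such a selector might be derived from a clicked paragraph follows; the attribute hints and the fallback behaviour are assumptions, not a specification of the proposed feature.

function deriveSelector(clickedParagraph) {
  // Walk up from the clicked <p> looking for an ancestor whose class or id
  // hints at article content, and build a wildcard selector from that hint.
  var node = clickedParagraph.parentElement;
  while (node && node !== document.body) {
    var hint = (node.className + ' ' + node.id).toLowerCase();
    var match = hint.match(/article|content/);
    if (match) {
      var attr = node.id.toLowerCase().indexOf(match[0]) !== -1 ? 'id' : 'class';
      return '[' + attr + '*=' + match[0] + '] p';
    }
    node = node.parentElement;
  }
  return null; // no obvious hint; the paragraph would need a bespoke selector
}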
Increasing interactivity in the extension by allowing users to correct mislabelled sentences,
thereby augmenting the training set, would prove invaluable. As this is a deep learning technology,
the more data the network has to learn and generalise from, the more powerful it becomes.
5.4 Sentiment Rain
Caching on the backend, both in the live and scenario models would allow for large performance
gains and scalability. In the live model, each request that comes into the server requests new
tweets from Twitter. Given a modest number of users this would quickly exceed twitter’s rate
limits and render the system useless, until the API reopens again 15 minutes later, only to
be rapidly exceeded again. As each user is looking for the same tweets only one request is
necessary and should be cached for successive requests from clients.
The other need for caching involves both the live and scenario models. The sentiment for each
tweet is currently calculated at the time of request, which wastes resources: if two users request
the same tweets there is no need to analyse them twice. Online analysis is unavoidable in
the live model, but each tweet should be analysed only once and the result cached for future
requests from other users. In the Dublin Marathon scenario, tweets can be pre-analysed, removing
the need for the sentiment server altogether; this would improve responsiveness and scalability
massively.
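A sketch of the proposed memoisation, written in Javascript for illustration (the scenario backend is in fact the Ruby/Sinatra stack, and fetchSentiment is an assumed helper that calls the HTTP sentiment endpoint once and returns a promise):

var sentimentCache = {}; // tweet id -> promise of a sentiment value

function cachedSentiment(tweet, fetchSentiment) {
  // Only the first request for a given tweet triggers analysis; concurrent
  // and later requests share the same pending or resolved promise.
  if (!sentimentCache[tweet.id_str]) {
    sentimentCache[tweet.id_str] = fetchSentiment(tweet.text);
  }
  return sentimentCache[tweet.id_str];
}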
Tweets could also be preprocessed. Emojis and hashtags are unlikely to provide meaningful
input to the system; if they were substituted with clearer words or removed altogether, an
improvement in the quality of the analysis might be observed.
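A hedged sketch of such preprocessing is given below; the regular expressions are illustrative and far from exhaustive.

function preprocessTweet(text) {
  return text
    .replace(/#\w+/g, '')                           // strip hashtags
    .replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, '') // strip surrogate pairs, covering most emoji
    .replace(/\s+/g, ' ')                           // collapse whitespace
    .trim();
}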
Chapter 6
Conclusions
In conclusion, deep learning is an effective means of performing sentiment analysis, as sentiment
is one projection of semantic meaning. The technology set out in [31] is the only technology
that takes the entire semantic meaning of a sentence into account before applying this projection.
Words alone cannot determine sentiment without first taking into account what they mean in the
context of the sentence and the order in which they appear. Without this composition of semantic
vectors, further progress in sentiment analysis is unlikely.
By leveraging the flexibility of the HTTP protocol, a platform-agnostic endpoint contained
within a replicable server has been created. This allows for a scalable sentiment analysis server
infrastructure.
This powers two novel implementations of sentiment analysis: one operating on any piece of text
visited in the browser, the other temporally mapping tweets across a map during a given
scenario. Implementations of this type are effective uses of this technology and warrant further
investigation in the future.
References
[1] Amazon Web Services Elastic Cloud 2. http://aws.amazon.com/ec2/. [Online; accessed
17-April-2015]. 2015.
[2] Audiolet - JavaScript library for audio synthesis and composition. http://oampo.github.
io/Audiolet/. [Online; accessed 17-April-2015]. 2015.
[3] Yoshua Bengio. “Learning deep architectures for AI”. In: Foundations and trends® in
Machine Learning 2.1 (2009), pp. 1–127.
[4] Yoshua Bengio et al. “A Neural Probabilistic Language Model”. In: J. Mach. Learn. Res.
3 (Mar. 2003), pp. 1137–1155. ISSN: 1532-4435. URL: http://dl.acm.org/citation.cfm?
id=944919.944966.
[5] Don Chartier. Short and Sweet Movie Reviews. http://shortandsweet.blogspot.ie/.
[Online; accessed 17-April-2015]. 2015.
[6] Cloudflare. https://www.cloudflare.com/. [Online; accessed 17-April-2015]. 2015.
[7] Ronan Collobert and Jason Weston. “A unified architecture for natural language pro-
cessing: Deep neural networks with multitask learning”. In: Proceedings of the 25th
international conference on Machine learning. ACM. 2008, pp. 160–167.
[8] Content Scripts | Mozilla Developer Network. https://developer.mozilla.org/en-US/Add-
ons/SDK/Guides/Content_Scripts. [Online; accessed 17-April-2015]. 2015.
[9] Cross-Origin Resource Sharing | w3.org. http://www.w3.org/TR/cors/. [Online; accessed
17-April-2015]. 2015.
[10] Firefox Extensions. https://addons.mozilla.org/en-US/firefox/extensions/. [Online;
accessed 17-April-2015]. 2015.
[11] Git –fast-version-control. http://git-scm.com/. [Online; accessed 17-April-2015]. 2015.
[12] Google Chrome Browser. https://www.google.ie/chrome/browser/desktop/. [Online;
accessed 18-April-2015]. 2015.
[13] GRAISearch. Use of Graphics Rendering and Artificial Intelligence for Improved Mobile
Search Capabilities. FP7-PEOPLE-2013-IAPP (612334) 2015-18.
[14] Heroku. https://www.heroku.com/. [Online; accessed 17-April-2015]. 2015.
[15] Highcharts - Interactive JavaScript charts for your webpage. http://www.highcharts.
com/. [Online; accessed 17-April-2015]. 2015.
[16] jQuery Selectors. https://api.jquery.com/category/selectors/. [Online; accessed 17-April-
2015]. 2015.
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with
Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing
Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097–1105. URL:
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-
neural-networks.pdf.
[18] Christopher D. Manning et al. “The Stanford CoreNLP Natural Language Processing
Toolkit”. In: Proceedings of 52nd Annual Meeting of the Association for Computational
Linguistics: System Demonstrations. 2014, pp. 55–60. URL: http://www.aclweb.org/
anthology/P/P14/P14-5010.
[19] Mapbox | Design and publish beautiful maps. http://www.mapbox.com/. [Online;
accessed 17-April-2015]. 2015.
[20] Maven. http://search.maven.org/. [Online; accessed 17-April-2015]. 2015.
[21] Tomas Mikolov et al. “Efficient estimation of word representations in vector space”. In:
arXiv preprint arXiv:1301.3781 (2013).
[22] John O’Neil. Doing Things with Words, Part Two: Sentence Boundary Detection. http:
//web.archive.org/web/20131103201401/http://www.attivio.com/blog/57-unified-
information-access/263-doing-things-with-words-part-two-sentence-boundary-
detection.html. [Online; accessed 17-April-2015]. 2008.
[23] Jordan B Pollack. “Recursive distributed representations”. In: Artificial Intelligence 46.1
(1990), pp. 77–105.
[24] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank -
Live Demo. http://nlp.stanford.edu:8080/sentiment/rntnDemo.html. [Online; accessed
17-April-2015]. 2015.
[25] Regular Expressions - Javascript | MDN. https://developer.mozilla.org/en/docs/Web/
JavaScript/Guide/Regular_Expressions. [Online; accessed 17-April-2015]. 2015.
[26] Eric Rescorla. HTTP Over TLS. RFC 2818. RFC Editor, May 2000, pp. 1–7. URL:
http://tools.ietf.org/html/rfc2818.
[27] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. “Learning internal
representations by error-propagation”. In: Parallel Distributed Processing: Explorations
in the Microstructure of Cognition. Volume 1. Vol. 1. 6088. MIT Press, Cambridge, MA,
1986, pp. 318–362.
[28] Sentiment Rain | Dublin Marathon 2014. http://sentiment-rain.conorbrady.com/scenario.
[Online; accessed 17-April-2015]. 2015.
[29] Sentiment Rain | Live Over San Fransisco. http://sentiment-rain.conorbrady.com/live.
[Online; accessed 17-April-2015]. 2015.
[30] Sinatra - A Ruby Server Micro-framework. http://www.sinatrarb.com/. [Online; accessed
17-April-2015]. 2015.
[31] Richard Socher et al. “Recursive deep models for semantic compositionality over a
sentiment treebank”. In: Proceedings of the conference on empirical methods in natural
language processing (EMNLP). Vol. 1631. Citeseer. 2013, p. 1642.
[32] Richard Socher et al. “Semantic Compositionality Through Recursive Matrix-vector
Spaces”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning. EMNLP-CoNLL
’12. Jeju Island, Korea: Association for Computational Linguistics, 2012, pp. 1201–1211.
URL: http://dl.acm.org/citation.cfm?id=2390948.2391084.
[33] Richard Socher et al. “Semi-supervised Recursive Autoencoders for Predicting Sen-
timent Distributions”. In: Proceedings of the Conference on Empirical Methods in
Natural Language Processing. EMNLP ’11. Edinburgh, United Kingdom: Association
for Computational Linguistics, 2011, pp. 151–161. ISBN: 978-1-937284-11-4. URL:
http://dl.acm.org/citation.cfm?id=2145432.2145450.
[34] Spring Web MVC Framework. http://docs.spring.io/spring/docs/current/spring-
framework-reference/html/mvc.html. [Online; accessed 17-April-2015]. 2015.
[35] Dong Yu and Li Deng. Automatic Speech Recognition - A Deep Learning Approach.
Springer, Oct. 2014. URL: http://research.microsoft.com/apps/pubs/default.aspx?id=
230891.
hardback

hardback

  • 1.
    On Using DeepLearning for Sentiment Analysis Conor Brady School of Computer Science and Statistics University of Dublin This dissertation is submitted for the degree of BA(mod) in Computer Science Trinity College April 2015
  • 3.
    I dedicate thisproject to my parents, Jim and Méabh Brady, without whose unwavering support during my constant meandering over the last few years, I surely would have not had the opportunity to submit this for consideration for my undergraduate degree. I owe you more than I can ever repay.
  • 5.
    Declaration I hereby declarethat this thesis is, except where otherwise stated, entirely my own work and that it has not been submitted as an exercise for a degree at any other university Conor Brady April 2015
  • 7.
    Acknowledgements Foremost I wouldlike to acknowledge my supervisor Rozenn Dayhot, who was helpful, involved and supportive all the way through the last few months. I would like to thank Conor Murphy and Micheal Barry for reading drafts of this report and providing invaluable feedback. A special thanks to Paddy Corr, whose supply of cigarettes kept me going through the hard times. Last but not least, Sam Green. Thanks for the speakers. They’re great speakers.
  • 9.
    Abstract This project relatesto the analysis of applications of deep learning to sentiment analysis. A technology developed at Stanford University relating to this field is described. This technology achieves state the art results in sentiment analysis of sentences of varying length. This is then applied in two novel implementations. Firstly, a Firefox extension that parses text on a webpage, then colours it according to sentiment. Secondly, a Javascript visualisation of the sentiment of users as they tweet during the Dublin Marathon 2014 and live over San Fransisco. These technologies are then evaluated along with the Stanford sentiment analysis engine.
  • 11.
    Table of contents Listof figures xiii 1 Introduction 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 HTTP Sentiment Endpoint . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.2 Sentiment Analysis Firefox Extension . . . . . . . . . . . . . . . . . 2 1.2.3 Geolocated Visualization of Tweets . . . . . . . . . . . . . . . . . . 2 2 Background 3 2.1 Distributed Semantic Vector Representations . . . . . . . . . . . . . . . . . . 3 2.2 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.1 Feature Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2.2 Artificial Feed-Forward Neural Network . . . . . . . . . . . . . . . . 5 2.3 Learning Semantic Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.4 Composition of Semantic Vectors . . . . . . . . . . . . . . . . . . . . . . . 9 2.5 Transformation to a Sentiment Space . . . . . . . . . . . . . . . . . . . . . . 10 2.6 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3 Implementation 13 3.1 The Stanford CoreNLP Java Library . . . . . . . . . . . . . . . . . . . . . . 13 3.1.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.1.2 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2 Sentiment Firefox Extension . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2.1 Selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.2 Sentence Boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.3 Server Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2.4 Colouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
  • 12.
    xii Table ofcontents 3.2.5 HTTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.6 Overview Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Sentiment Rain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3.1 The TCD GRAISearch Dataset . . . . . . . . . . . . . . . . . . . . . 23 3.3.2 Server Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3.3 Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3.4 Scenario-Tweets Resource . . . . . . . . . . . . . . . . . . . . . . . 24 3.3.5 Javascript Visualisation . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.3.6 Sound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3.7 Live . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4 Experimental Results 31 4.1 Stanford Sentiment Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.2 Firefox Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3 Sentiment Rain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5 Future Work 39 5.1 Sentiment Analysis and Semantic Vectors . . . . . . . . . . . . . . . . . . . 39 5.2 Stanford Sentiment Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.3 Firefox Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.4 Sentiment Rain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 6 Conclusions 43 References 45
  • 13.
    List of figures 2.1Example of feature engineering to classify binary digits into their respective ⊕ output, introduction of the ∧ operator makes previously inseparable classes easily separable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Example of a feed-forward artificial neural network with two inputs, a hidden layer and one output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 Graph of a common activation function (Equation 2.2) used in the functional units of Artificial Neural Networks. . . . . . . . . . . . . . . . . . . . . . . 6 2.4 An example of a Recursive Neural Network, where the output is equal in length to each of its two child inputs . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.5 Example of the transformation of a semantic vector into the sentiment space . 11 3.1 Stanford CoreNLP Maven dependancy listing . . . . . . . . . . . . . . . . . 14 3.2 Server library imports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.3 Server initialisation code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.4 Required API endpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.5 Sentiment Server Response . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.6 The sentiment server endpoint’s code . . . . . . . . . . . . . . . . . . . . . . 17 3.7 Firefox extension that highlights sentiment out of paragraphs of text. . . . . . 18 3.8 Firefox extension content selectors . . . . . . . . . . . . . . . . . . . . . . . 19 3.9 Sentence boundary detection regexes . . . . . . . . . . . . . . . . . . . . . . 20 3.10 Headers required to enable Cross-Origin Resource Sharing . . . . . . . . . . 20 3.11 Function for calculating the colour from a sentiment value . . . . . . . . . . 21 3.12 Plugin running on Twitter over HTTPS . . . . . . . . . . . . . . . . . . . . . 22 3.13 Sentiment Rain scenario showing sentiment of Tweets during the Dublin Marathon in 2014 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.14 GRAISearch database endpoint . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.15 Sample required /scenario_tweets resource . . . . . . . . . . . . . . . . 24 3.16 Sentiment Rain architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
  • 14.
    xiv List offigures 3.17 Tweet selected by clicking a circle . . . . . . . . . . . . . . . . . . . . . . . 26 3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.19 An example of a ASDR envelope . . . . . . . . . . . . . . . . . . . . . . . . 28 3.20 Live view of tweets over San Francisco . . . . . . . . . . . . . . . . . . . . 29 4.1 Random sample of websites to test effectiveness of content selection from extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2 Short movie review [5], colour coded based on sentiment . . . . . . . . . . . 33 4.3 Short movie review [5], colour coded based on sentiment . . . . . . . . . . . 34 4.4 Short movie review [5], colour coded based on sentiment . . . . . . . . . . . 34 4.5 Short movie review [5], colour coded based on sentiment . . . . . . . . . . . 34 4.6 Excerpt from the Wikipedia page of Dublin, Ireland . . . . . . . . . . . . . . 35 4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014 . . . . . . . . 37
  • 15.
    Chapter 1 Introduction 1.1 Overview DeepLearning is a broad field of research and this project has focused on its application to sentiment analysis, as set forth in [31]. Sentiment analysis is the determination of how positively or negatively an author of a piece of text feels about the subject of the text. This has applications in fields such as predictions of stock market prices and corporate brand evaluation. [31] has led to state of the art results in analysis of the sentiment sentences of variable length, with 85% successful classification of entire sentences into classes of positive or negative. Chapter 2 relates to a discussion of the technologies and ideas that leads to [31], with an emphasis on an abstract understanding of the mathematics behind it. It explains stage by stage the technologies that lead to the Recursive Neural Tensor Network, a technology that semantically composes a sentence up its syntax tree, then a transformation of the output reveals the sentiment information inherent in the sentence. Using the library resulting from the efforts of the Stanford department, made available under the GPLv3 license, this report chronicles the creation of novel uses of the sentiment analysis engine, with regard to various sources of text on the Internet. The process in building these technologies is described in Chapter 3. Chapter 4 investigates the results from the implementations and evaluates their effectiveness. Consideration for future development based on the results is considered in Chapter 5. Finally, concluding remarks are documented in Chapter 6. These relate to evaluating the application of deep learning technologies to sentiment analysis and the effectiveness of the use of these technologies in the implementations described within.
  • 16.
    2 Introduction 1.2 Technologies Threetechnologies have been developed as part of this final year project. While the first two applications (Section 1.2.1 and 1.2.2) have been designed and developed independently with use of the Stanford CoreNLP library, the third has benefited from interacting with researchers on an ongoing European project coordinated by the supervisor Prof. Rozenn Dahyot [13]. More specifically, Cyril Bourges, Marie Curie Research Fellow on this GRAISearch project, has provided access to the dataset ’Dublin Marathon 2014’ hosted on a server in the school of computer science and Statistics in Trinity College. 1.2.1 HTTP Sentiment Endpoint This is a server consisting of a wrapper around the library, as supplied by the Stanford NLP Group, that allows for sentiment analysis to be carried out by any other service able to make HTTP requests across the Internet. It will return a number between 0 for negative and 1 for positive allowing further processing to be carried out. This is the foundation of the other two technologies, and will run on its own server hosted by Amazon Web Services. 1.2.2 Sentiment Analysis Firefox Extension An extension that when installed in the Firefox web browser will automatically search for passages of text in webpages. Once the passages are identified it colour codes sentences depending on their sentiment: red for negative, black for neutral and green for positive, with a gradient between. 1.2.3 Geolocated Visualization of Tweets Another novel application of sentiment analysis, this takes a dataset of tweets over a period, in this case the Dublin City Marathon 2014, and displays them on a map relating to where they were tweeted over the course of the scenario. Using colour and music to signify the sentiment of the tweet it creates an interactive audiovisual experience, where tweets can be intuitively viewed and navigated over the course of the scenario.
  • 17.
    Chapter 2 Background Deep learninghas led to state of the art innovation in the areas of Automatic Speech Recognition [35], Image Recognition [17] and Natural Language Processing [18]. This project specifically deals with its applications in sentiment analysis through the creation of semantic vector representations. A brief introduction is provided in this chapter to the technologies that have led to the state of the art results in sentiment analysis of sentences of varying length [31]. Section 2.1 provides an introduction to the idea of a semantic vector representation of an input and contrasts it to an atomic representation, the example given of English words. Section 2.2 provides an introduction to the area of deep learning, the factors that lead to its success and how it is implemented in artificial neural networks. Section 2.3 returns to semantic vectors and discusses how these can be learned by artificial neural networks. Section 2.4 discusses how these semantic vectors, when learned to describe words in the English language, can be composed up a sentence’s syntax tree resulting in a single vector that describes the semantic meaning of the sentence. Section 2.5 describes how vectors in the semantic space may be transformed into a sentiment space to distill sentiment information from the sentiment vectors. Finally, Section 2.6 investigates how the Stanford NLP department trained these technolo- gies with crowdsourcing labelled data. 2.1 Distributed Semantic Vector Representations Words exist as a means of communication between people, but they are just labels people who speak the same language have agreed upon to communicate meaning. They offer nothing in so far as denoting what they describe. Words are dense representations in that each symbol of the word offers nothing in isolation, they are atomic units. In the alternative, a distributed
  • 18.
    4 Background representation, someof the dimensions may be lost and still information about the object is present. For example "cat" and "lion" trigger commonalities in the mind as they trigger internal representations, but are useless to computers for describing the objects they denote. What these systems need are its own distributed semantic vector representations. These semantic vectors describe, with each dimension, some feature of the input that is useful for solving meaningful problems. 2.2 Deep Learning Deep learning is the process of learning successive semantic vector representations of data in a hierarchical manner. Each vector feeds into the next, more abstract vector to get more and more abstract representations. For example, given an input of an image on a network tasked to find faces, the first layer could detect edges, the next layer eyes, ears and noses and then a third could detect faces. Presented with raw image data at the input, it generates successively more abstract vectors to eventually carry out a task at the output. To return to the cat analogy, one feature of an abstract layer would be ’is it a cat?’, followed on an even more abstract layer with ’is it a lion?’ and these would take cue from the lower layers of ’has it four legs?’ or ’has is it got fur?’. This is useful for simplifying the system as at each layer information is concentrated, abstracted and discarded. The process of learning is by determining how much of an error is in the network at the output. Functional units that produce the output are modified and optimised to reduce the error and eventually generalise to produce the correct output for unseen inputs. 2.2.1 Feature Engineering Traditional machine learning depends on feature engineering. Feature engineering is the practice of hand producing a set of functions that transform an input into a semantic or feature vector. The goal is to make the feature vector a good representation of the input. Whether that be by lowering the dimensionality and increasing the signal to noise ratio, or to perform non linear operations to transform a dimension of the output vector making it linearly separable from the other vectors of a different class. The engineer tries to define a semantic distributed representation by hand. A toy example of feature engineering can be seen in Figure 2.1. Here the class of inputs to the exclusive disjunction operator that output a 1 (in red) are linearly inseperable from those that output a 0 (in blue), from the input alone with two dimensional line. Once the conjunction
  • 19.
    2.2 Deep Learning5 Fig. 2.1 Example of feature engineering to classify binary digits into their respective ⊕ output, introduction of the ∧ operator makes previously inseparable classes easily separable. of the inputs is introduced in the three dimensional model as the new dimension, the classes can easily be separated by a plane in the three dimensional space. This feature engineering is time consuming, depends heavily on the expertise of the engineer and may miss important features of the data that would be useful in carrying out the task at hand. Also the degree of complexity to feed the output of a layer of features into the input of another layer grows with each successive layer of abstraction. This means truly powerful hierarchical representations of the input are difficult to manually engineer. Instead of defining these feature vectors by hand, they can be learned. These feature vectors may then reveal information about the data useful to carrying out the task that is less apparent, but still contributes to the robustness of the system. As the outputs of one layer of features are used as the inputs into the next layer of features, these features become more robust as each would be required to influence multiple features in the next layer. The forced generalisation helps stop the features from overfitting. This can be used through multiple layers of features, creating complex hierarchical representations of the input. Further analysis can be found in [3]. 2.2.2 Artificial Feed-Forward Neural Network A common implementation of deep learning is an artificial feed-forward neural network, as seen in Figure 2.2. This comprises of layers of non-linear functional units, commonly referred to as "neurons", a layer of these produces a linear transformation of the input vector followed by an offset (Equation 2.1) and a non-linear activation function. The activation function is usually a sigmoid function (Figure 2.3 and Equation 2.2), due to its asymptotic nature and
  • 20.
    6 Background Fig. 2.2Example of a feed-forward artificial neural network with two inputs, a hidden layer and one output Fig. 2.3 Graph of a common activation function (Equation 2.2) used in the functional units of Artificial Neural Networks. convenient derivative that is in terms of its input (Equation 2.3). More on activation functions to follow. z = xT W +b (2.1) f(z) = 1 1+e−z (2.2) f (z) = f(z)(1− f(z)) (2.3) These functional units are parametrised by a weight matrix W and a bias vector b on each layer. The weights are parameters of a transformation on the input vector and the biases are an offset that is applied after the transformation but before the non-linear activation function.
  • 21.
    2.2 Deep Learning7 Due to the non-linear nature of each transformation, complex transformations of the input data can be achieved. Given multiple layers, transformations that would have been impossible with a single layer of hand tuned features can be described. Networks are commonly trained via supervised learning, this is a method of training where the network has a dataset of inputs x, be they text, image, video or some other raw data, and corresponding labelled outputs y which describe correct answers for the each input on the task at hand. The dataset is then split into the training set and the testing set randomly. The network is then trained on the training set and its ability is assessed on the testing set. These networks are trained with the method of back propagation [27]. This is a form of gradient descent that has been designed specifically for artificial neural networks. It works by defining a cost function that describes the error of the network and the error is propagated backwards from the output through the layers, all the way to the input, altering the parameters along the way. Generally for supervised learning, where an output y for a given input x needs to be learned, the squared-mean error given in Equation 2.4, is used as a cost function. Cx = 1 2 ||y− f(x)||2 (2.4) Back-propagation is achieved by taking advantage of a non-linear activation function with convenient derivatives that simplifies the system’s partial derivatives (the rate of change of each parameter in the system with respect to the error) such as the sigmoid function (Equation 2.2). Then by getting the rate of change of the error with respect to each weight and bias in the system, each weight is altered by a small amount (known as a learning rate). The parameters are adjusted in the direction of the gradient of each rate of change in order to minimise the cost function. If this is repeated for a diversified input, a good generalised representation that can produce useful classifications or representations of an unseen input can be learned. One disadvantage of neural networks is they don’t benefit from the head start most machine learning techniques receive from feature engineering. When engineering features the engineer can use intuition to rapidly develop meaningful features. An artificial neural network may take time to develop these features. It has to slowly move down a gradient in high dimensional space, but given enough training data and computational resources they are capable of surpassing even the most complex hand made features. Without this foresight they can also get stuck in local minima in the gradient space. This can be helped by using other optimisation strategies such as simulated annealing.
  • 22.
    8 Background 2.3 LearningSemantic Vectors Semantic vectors can be learned by a neural network being exposed to problems it can only solve by building layered hierarchical vector representations of the input that are necessary in solving the problem. The input can be any raw data and the network can learn these vectors and transformations via back-propagating the error on the output and adjusting the parameters to minimise the output of a cost function. These vectors hold enough semantic meaning to attempt to solve any problem the network has seen in the past, so if the question “is it a cat?” is proposed to a network trained on words in the English language, inputs such as "lion" and "cat" will have probably have some dimension close to one, whereas the input “chair” will have a number closer to zero in that dimension. It is worth noting however that unless this information has been useful to the network in solving a problem it has seen in the past, it would not learn this information and thereby a diverse input is essential for complete, robust learning. For example the network could have learned that a sufficient commonality to identify cats is that they have four legs, but then it could confuse a chair for a cat given it has four legs. This is not a failure of the network, but more a failure of the training data. In practice vector dimensions are rarely this quantifiable, this example is contrived and for illustrative purposes. Getting this kind of information from the generated vectors requires learning a linear transformation of the vector space, as seen in Section 2.5. The idea of learning semantic vectors to represent words in the English language is in- troduced in [4], and developed further in [7], with the unsupervised generation of vector representations of words. The methods set out in the latter paper generate a language matrix over the words found on Wikipedia. This matrix in initialised to gaussian noise and after training represents relations between words as approximately linear relationships in this high dimensional space, with each column vector representing a word across the language. Each word in the language indexes a vector from the matrix and provides it as input to the system. The network is trained to come up with meaningful representations of the input so it can carry out the task at hand, which in [7], is to give correct English sentences a higher score than incorrect English sentences. The use of this method purely for mapping relationships to a linear space has been improved upon, with the current state of the art set out in [21]. The evaluation of this technology found that if the vector representing the word "Man" is subtracted from the vector representing the word "King", and this resultant vector is added to the vector for "Woman", we end up with a vector that is close to "Queen".
  • 23.
    2.4 Composition ofSemantic Vectors 9 Fig. 2.4 An example of a Recursive Neural Network, where the output is equal in length to each of its two child inputs 2.4 Composition of Semantic Vectors The idea of composing semantic vectors is proposed in [33]. The idea of composing word vectors together to enable representations of varying length sentences is useful as it allows the semantic vectors of any sentence length to be compared. This paper re-introduces the Recursive Neural Network (RNN) set out in [23]. A RNN (Figure 2.4) is a network that the output vector is the same in length as each of the two input vectors. This means the output can then be provided to the same network as an input at the next stage. Each input/output mapping uses the same weight matrix. This is useful for applying over a structure, such as an abstract syntax tree. Where the child nodes can be composed into an output vector representation at the parent node. This can then act as a child for its parent, until a single vector representing the tree is composed from bottom-up. This works under the assumption that the composition function for the children into a parent is constant for all child parent relations. A problem with the basic RNN is the limitation on how one word can transform the other. The vectors are concatenated before being multiplied by the weight matrix and they cannot multiplicatively affect one another, they can only additively affect the output vector, as shown in Equation 2.5. p is the resultant parent vector with c1 and c2 as child vectors transformed by W, all of which the same length. Note the similarity to Equation 2.1, with f(z) coming from Equation 2.2, the bias is omitted, as it sometimes is in artificial neural networks to simplify the system. p = f(W[c1 : c2]) (2.5)
  • 24.
    10 Background Words like‘not’ are inherently unary operators and have no real stand alone meaning. They could be viewed more as matrices than vectors that should transform the semantic vector of another word. For example, ‘excited’ would have a specific meaning, ‘not excited’ would then be linearly transformed across the vector space by the word ‘not’. This can conceptually be achieved through the use of Matrix-Vector Recursive Neural Networks (MV-RNNs). MV-RNNs [32] seek to solve this issue with assigning each word a matrix, as well as a vector, that can transform the sibling vector. These matrices can be learned in the same way as the vectors and can enable words like "not" transform other words into a different area of the space before combining the two, allowing more flexible representations to be learned. However, the number of parameters gets squared by the introduction of the matrices and so does the dimensionality of the gradient space, so these matrices are difficult to learn. A lower parameter version of this approach is required. Recursive Neural Tensor Networks (RNTNs) [31] are a development on the MV-RNNs, they are RNNs that add a term that consists of a concatenation of two child vectors, transposed [c1 : c2]T then multiplied by a square slice of a parameterised rank 3 tensor Vi, then multiplied by another concatenation of the child vectors [c1 : c2] (Equation 2.6) to the composition function. This results in a scalar value and this is repeated for each slice of the tensor until a vector as long as a child is produced. This is added to product of the weight matrix W and the concatenated children as in the RNN. Most importantly, this gives the children multiplicative effects on one another without squaring the parameter set. The effect each position of the vector has on the other is controlled by the value at a given position in the tensor. This results in certain dimensions of the semantic word vectors relating to affects on siblings and some relating to pure semantic content. pi = f([c1 : c2]T Vi[c1 : c2]+Wi[c1 : c2]) (2.6) 2.5 Transformation to a Sentiment Space Carrying the proposition that the semantic space contains all the semantic information relevant to the network during its training, then it follows that with relevancy of sentiment during training, the sentiment information must be contained in the semantic space. A projection can be performed into a sentiment space to reveal the sentiment information. A sentiment space with five interdependent dimensions is proposed, very negative, negative, neutral, positive and very positive. A vector can be transformed from the semantic space to this space by a matrix learned in the same way as the vectors and is trained on labeled data. So given a vector representing “this movie is terrible” in the semantic vector space, a matrix
  • 25.
    2.6 Training 11 Fig.2.5 Example of the transformation of a semantic vector into the sentiment space would be trained to output close to a 1 in the very negative dimension and zeros in all the other dimensions. 2.6 Training In order to train all the parameters (the semantic vectors, the composition function and the semantic to sentiment projection matrix), a large amount of labeled data is required. The Stanford NLP department collected this data by utilising Amazon’s mechanical turk, a service for crowd sourcing tasks that are difficult for a computer to carry out but take moments for a human to perform. They use a common data set acquired from the website Rotten Tomatoes in the form of movie reviews. The dataset consists of 10,000 sentences which are parsed into their syntax trees by technologies sourced elsewhere in the Stanford NLP department, resulting in 200,000 subtrees or phrases. Each phrase is presented without context to a person on Mechanical Turk where they give it a score from very negative to very positive, these scores are then translated to sentiment vectors that can be used to train all the parameters of the network.
  • 27.
    Chapter 3 Implementation This chapteroutlines the processes involved in creating three technologies. Firstly, a Java server to host the Stanford CoreNLP library described in Section 3.1. Second, a Firefox extension that parses any website for content, then using the technology outlined in Section 3.1 performs coloured sentiment highlighting of the text, described in detail in Section 3.2. Finally, the location-based audio-visual technology titled "Sentiment Rain" is presented in Section 3.3. 3.1 The Stanford CoreNLP Java Library The technologies described in the previous chapter are available as part of the Stanford CoreNLP library [18], specifically the technologies described in [31] are available as the sentiment analysis package. 3.1.1 Configuration The Stanford sentiment library is available in Java and hosted in the Maven Central repository [20]. This allows a project to be created with the library listed as a dependency (Fig 3.1). This keeps the code easier to manage and allows the introduction of updates to the project by changing the version number and rebuilding. This is particularly useful as the models are likely to be upgraded regularly by the Stanford NLP department. The goal is to create a network accessible HTTP endpoint that allows the building of platform-agnostic, novel implementations with Stanford CoreNLP’s sentiment technologies. The first task is to choose a server stack that facilitate such an endpoint across the Internet. As the library is hosted in the Maven Central Repository, a Java server with Maven building process is the clear choice. A Spring MVC web application and RESTful web service framework [34] suffices this requirement. It has a Maven build process and manages
  • 28.
    14 Implementation <dependency> <groupId>edu.stanford.nlp</groupId> <artifactId>stanford-corenlp</artifactId> <version>3.5.1</version> </dependency> <dependency> <groupId>edu.stanford.nlp</groupId> <artifactId>stanford-corenlp</artifactId> <version>3.5.1</version> <classifier>models</classifier> </dependency> Fig. 3.1Stanford CoreNLP Maven dependancy listing dependencies while also providing much of the boilerplate code involved in creating a web server. As shown in Fig 3.2, Spring Java Annotators are imported to inject the boilerplate code involved in creating a web-server into the code-base. Also some Java imports, and the Stanford CoreNLP libraries with one dependency of the Efficient Java Matrix Library’s SimpleMatrix. As shown in Fig 3.3, the use of the CoreNLP sentiment library requires a tokenizer and sentiment annotator. The tokenizer splits any input strings into sentences with the ’ssplit’ annotator followed by splitting into lexical tokens by the ’tokenize’ annotator. They can then be passed into the sentiment annotator. This first gets annotated by the ’parse’ annotator to produces the syntax tree, followed by the ’sentiment’ annotator which uses the technologies described in the earlier chapter to generate a five dimensional sentiment analysis for the tree. The listing below describes the process in which the server initialises these annotators for the use in response to a request. The server is required to respond to the request in Fig 3.4 with a "Content-type: applica- tion/json" response in Fig 3.5. As shown in Fig 3.6, to provide this response, the endpoint receives the ’lines’ parameter from the request via the ’@RequestParam’ Annotation, bound to ’lines’. Each line is taken and passed through each of the CoreNLP annotators, calculating a weighted average of the resultant sentiment vector for each sentence in the line and the average is calculated. Only one sentence should be present, but in the implementation in Section 3.3 it is useful, as the client requires an overall sentiment for a multiple sentences. However the more sentences that are averaged, the less the sentiment can be relied on so this functionality will be used sparingly and only where appropriate. The server responds as defined in the following snippit, i.e. the line followed by a number between 0 and 1. 1 meaning positive and 0 meaning negative.
  • 29.
    3.1 The StanfordCoreNLP Java Library 15 package server; import org.springframework.stereotype.Controller; import org.springframework.web.bind.annotation.RequestMapping; import org.springframework.web.bind.annotation.RequestParam; import org.springframework.web.bind.annotation.ResponseBody; import java.util.HashMap; import java.util.List; import java.util.Properties; import java.io.*; import org.ejml.simple.SimpleMatrix; import edu.stanford.nlp.sentiment.*; import edu.stanford.nlp.ling.CoreAnnotations; import edu.stanford.nlp.ling.Sentence; import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations; import edu.stanford.nlp.pipeline.Annotation; import edu.stanford.nlp.pipeline.StanfordCoreNLP; import edu.stanford.nlp.trees.Tree; import edu.stanford.nlp.util.CoreMap; Fig. 3.2 Server library imports
  • 30.
    16 Implementation @Controller public classSentimentController { StanfordCoreNLP tokenizer; StanfordCoreNLP pipeline; public SentimentController() { Properties tokenizerProps = new Properties(); tokenizerProps.setProperty("annotators", "tokenize, ssplit"); this.tokenizer = new StanfordCoreNLP(tokenizerProps); Properties pipelineProps = new Properties(); pipelineProps.setProperty("annotators", "parse, sentiment"); pipelineProps.setProperty("ssplit.isOneSentence", "true"); pipelineProps.setProperty("enforceRequirements", "false"); this.pipeline = new StanfordCoreNLP(pipelineProps); } Fig. 3.3 Server initialisation code GET /sentiment?lines=what+a+bad+film&lines=actually+I+really+like+it Fig. 3.4 Required API endpoint { "0":{ "sentiment":0.2580948040808713, "line":"what a bad film" }, "1":{ "sentiment":0.6893179757900284, "line":"actually I really like it" } } Fig. 3.5 Sentiment Server Response
  • 31.
    3.1 The StanfordCoreNLP Java Library 17 @RequestMapping("/sentiment") public @ResponseBody HashMap<Integer,HashMap<String,Object>> sentiment( @RequestParam(value="lines", required=true) List<String> lines, Model model ) { HashMap<Integer,HashMap<String,Object>> response = new HashMap<Integer,HashMap<String,Object>>(); for(int i = 0; i < lines.size(); i++) { response.put(i,new HashMap<String,Object>()); Annotation annotation = tokenizer.process(lines.get(i)); pipeline.annotate(annotation); double sentiment = 0; int sentenceCount = annotation.get(CoreAnnotations.SentencesAnnotation.class) .size(); for( int j = 0; j < sentenceCount; j++ ) { CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class) .get(j); Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class); SimpleMatrix vector = RNNCoreAnnotations.getPredictions(tree); sentiment += (vector.get(1)*0.25+ vector.get(2)*0.5 + vector.get(3)*0.75+ vector.get(4))/ (double)sentenceCount; } response.get(i).put("line",lines.get(i)); response.get(i).put("sentiment",sentiment); } return response; } } Fig. 3.6 The sentiment server endpoint’s code
  • 32.
    18 Implementation Fig. 3.7Firefox extension that highlights sentiment out of paragraphs of text. 3.1.2 Deployment A first attempt at deployment was to use a free instance on Heroku [14]. Heroku is a Cloud Platform as a Service that supports a wide variety of languages and configurations, it also allows for easy deploys via a git [11] from the command line and it is also free for limited use. This option is unavailable as the Stanford CoreNLP models are over 300MB in size and 300MB is the maximum allowed size on Heroku. An alternative is to deploy on an Amazon EC2 Server [1]. An Ubuntu server is operating on AWS and with code pulled via git. The server is made available at a static IP address for responding to requests. The domain conorbrady.com is used - whose DNS records are managed by Cloundflare [6] - to create a domain name at stanford-nlp.conorbrady.com and using a DNS A record, it points to the EC2 server. An endpoint as described in Fig 3.4 is made available on this server. 3.2 Sentiment Firefox Extension The first build using the previously mentioned server is a Firefox extension, [10] this extension runs in a Firefox browser and highlights positive sentences as green and negative sentences as red, as shown in Fig. 3.7. This uses content scripts, [8] these are provided to the browser from an extension on pageload, modifying the content of the page.
  • 33.
    3.2 Sentiment FirefoxExtension 19 /* <p> elements inside <article> elements */ article p /* <p> elements inside elements whose class or id contain article or content as a sub string */ [class*=content] p [id*=content] p [class*=article] p [id*=article] p /* <p class="tweet-text"> elements, for twitter */ p.tweet-text /* <p> elements inside elements with class="body", for the webpage used in the results section */ .body p Fig. 3.8 Firefox extension content selectors 3.2.1 Selectors Looking for consistency in websites is impossible, every website has their structure and nomenclature, and it is at their discretion. So there is no way to pick off relevant text than to compile a list of selectors [16]. Selectors are the means with which Javascript can reference objects in the HTML of a webpage. The comments above the selectors in Fig 3.8 denote what elements they will select in the HTML. This is an incomplete list, but there is no complete list. If it is too general, unnecessary information will be sent to the server for analysis, wasting resources and slowing down an already slow, processing intensive process. jQuery wildcard selectors are employed to increase the system’s robustness, the benefits are documented in the results chapter. Elements are then removed that contain other: paragraphs, textareas and scripts, as these cause issues and are not required by the extension to provide its functionality. 3.2.2 Sentence Boundaries Once the elements are selected, the sentences are extracted from the HTML. A record is kept of where the sentences stars and end so it can be reconstructed on reinsertion. This is achieved by gathering a list of sentence markers. These are markers that are acquired by testing the paragraph for two regular expressions [25] shown in Fig 3.9. The first regex finds possible sentence boundaries. As regex look-behinds are unsupported in Javascript, a second pass has to be performed once a candidate has been identified. The second pass tests for strings such as ’i.e.’, ’e.g.’, ’Mr’, ’Dr’ and so forth. Upon a match of the
/([.?!])\s*(?=[A-Z]|<)/g
/(?:\w\.\w)|(?:[A-Z][a-z])/g

Fig. 3.9 Sentence boundary detection regexes

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Max-Age: 3600
Access-Control-Allow-Headers: x-requested-with

Fig. 3.10 Headers required to enable Cross-Origin Resource Sharing

The start and end of the paragraph (positions 0 and length) are also added to the sentence markers; they will never match the first regex but are sentence boundaries nonetheless.

3.2.3 Server Interaction

With the markers acquired, the paragraph is substringed at the sentence markers and all HTML tags are removed, leaving only the plain text that carries the sentiment information. The sentences are then sent to the server via an HTTP GET request, with each sentence passed as a 'lines' GET parameter.

When an HTTP request is made from a browser via Javascript, either the domain of the server responding to the request must be the same as the current website's domain, or the server receiving the request must support Cross-Origin Resource Sharing (CORS) [9]. The server therefore has to be modified to send the appropriate headers, as it is impossible to guarantee the domain of the source of the request: the plugin runs in the browser and could be applied to virtually any site. The response headers in Fig 3.10 are added to the server's response, after which the browser allows the responses from the server to pass its security checks.

3.2.4 Colouring

The sentence markers are used to delimit <span> elements, which have no impact on the webpage other than providing the facility to apply custom CSS to a precise section. These are given inline styling with a colour value computed by the function in Fig 3.11. This results in black for 0.5, indicating a neutral sentiment, becoming green at 1.0 and red at 0.0 to indicate the extrema, with a gradient between the three values. It is important to leave the functionality of the page unaffected, as changing any <a> tags could render the site unusable.
function getRGB(sentiment) {
  var red = Math.floor(
      Math.max(0, -510 * sentiment + 255)).toString(16);
  var green = Math.floor(
      Math.max(0, 510 * sentiment - 255)).toString(16);
  return '#' +
      ( red.length == 2 ? red : "0" + red ) +
      ( green.length == 2 ? green : "0" + green ) +
      '00';
}

Fig. 3.11 Function for calculating the colour from a sentiment value

The insertion of the <span> elements ensures the safety of the site's functionality, as all other content remains unchanged.

3.2.5 HTTPS

This approach will not work for sites using HTTPS [26]. The rules regarding CORS state that any request must be made over the same protocol as the current browser location. This means websites like Twitter, which exclusively use HTTPS, require HTTPS to be set up on the endpoint before the extension can work on them. Cloudflare offer a service that proxies traffic through their servers, presenting an HTTPS connection to the client while connecting to the sentiment server via HTTP. This means that configuring HTTPS on the server or purchasing a certificate from a Certificate Authority is unnecessary. Cloudflare is configured to offer the service and the extension is modified to make the request via HTTPS when on a secure website such as Twitter (Fig. 3.12).

3.2.6 Overview Chart

To provide an overview of the sentiment of the page, Highcharts [15] is used. Highcharts is a Javascript library that allows easy creation of simple charts. As each sentence on the page is processed, it increments the bucket of the histogram determined by its sentiment value. This produces a histogram reflecting the number of sentences of each sentiment.

3.3 Sentiment Rain

This section describes a separate use of the sentiment analysis engine: mapping Tweets over the course of a time frame, specifically the Dublin Marathon 2014 (Fig. 3.13), and then live over San Francisco.
Fig. 3.12 The plugin running on Twitter over HTTPS

Fig. 3.13 Sentiment Rain scenario showing the sentiment of Tweets during the Dublin Marathon 2014
http://graisearch.scss.tcd.ie/query/Graisearch/sql/:querystring

Fig. 3.14 GRAISearch database endpoint

3.3.1 The TCD GRAISearch Dataset

A database is made available on a read-only basis by the Department of Statistics in Trinity College. The database is exposed on the endpoint in Fig 3.14 once supplied with valid credentials using HTTP Basic Auth. HTTP Basic Auth is a simple form of authorisation in which a username and password combination is encoded into an HTTP header. As this is encoded rather than encrypted, it is potentially insecure. An HTTPS connection would resolve this problem but is, at the time of writing, unavailable on the GRAISearch server, which returns a '503 - Service Unavailable' when requested over HTTPS.

The dataset contains numerous geocoded, timestamped Tweets collected during the Dublin Marathon. By combining this dataset with the sentiment analysis server, an interactive experience of Twitter activity during the day of the Marathon is created.

3.3.2 Server Stack

The requirements of the server are minimal. The majority of the processing is done client side, with minimal business logic occurring on the server, which acts as a go-between for the client, the GRAISearch dataset and the sentiment server. No database is required as the data is streamed from the GRAISearch database. All that is needed is a thin layer to serve the page and to provide some endpoints from which the Javascript can fetch the information needed for the visualisation (the Tweet objects).

The Sinatra Ruby micro-framework [30] is an appropriate choice: it is lightweight and quick to get up and running. It also defaults to productive technologies such as Coffeescript (a terse, expressive language that compiles to Javascript for the browser), SASS and HAML (similar technologies relating to CSS and HTML respectively). A comparison with a large, traditional MVC framework such as Ruby on Rails reveals a cost of considerable set-up time for unneeded features.

Heroku is used for hosting, as this time the server has a sufficiently small footprint and is more typical of what Heroku is generally used for. This allows the server to be updated easily with a git push to Heroku from the command line. Heroku also provides the option to create a subdomain of herokuapp.com that resolves to the server's IP address. sentiment-rain.herokuapp.com is chosen, and a CNAME DNS record is set up on Cloudflare to point sentiment-rain.conorbrady.com to the Heroku-supplied domain. The server is then available at http://sentiment-rain.conorbrady.com/.
{
  "id": "526661075383894016",
  "lat": 53.72362511,
  "lon": -6.33728538,
  "text": "Best of luck to our online editor @JaneLundon running the @dublinmarathon today. We're so proud of you Jane! @Elverys ",
  "link": "http://t.co/RfAKcz8FsF",
  "created_at": "1414400759000",
  "sentiment": 0.6815987306407396
}

Fig. 3.15 Sample response from the /scenario_tweets resource

3.3.3 Map

A powerful Javascript browser mapping library with high customisability is needed. Google Maps was the original path of research, but Mapbox [19] offers more options for customising the map view and drawing shapes over it. Mapbox is built on top of Leaflet, an expressive map-drawing library; it is the choice of Foursquare, Pinterest and Evernote, to name a few, and it is free to use for the purposes of this project.

3.3.4 Scenario-Tweets Resource

A resource is provided at

/scenario_tweets?since=:since&limit=:limit

This returns a number (:limit) of Tweets since a certain timestamp (:since) in a JSON format; an example is shown in Fig 3.15. To implement this endpoint, the GRAISearch dataset must be queried on the URL described in Fig 3.14, requesting:

• The ID as defined by Twitter
• The coordinates of the tweet
• The text of the tweet
• The time the tweet was created at

Filtered by:

• Only English tweets
• Tweets with no null fields among those requested
• Tweets created since the time defined in the request
• Limited by the limit given in the request

Upon the GRAISearch server's response, URLs are parsed out of the text of each tweet to obtain the raw tweet text. The tweets are then encoded into an HTTP GET request and sent to the sentiment server for analysis, and the results are merged back into the tweet objects for returning to the client. The architecture of this system is shown in Fig. 3.16.

Fig. 3.16 Sentiment Rain architecture

3.3.5 Javascript Visualisation

To visualise the data, a clock is initialised at 8:00am on the morning of the Dublin Marathon 2014. The clock progresses at 5 minutes of simulated time per second of real time and ticks 12 times per second of real time.
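The clock arithmetic is simple: five simulated minutes per real second at twelve ticks per real second means each tick advances simulated time by 300/12 = 25 seconds. A minimal sketch of such a clock, where the observers array (the view and the data source) and their onTick method are assumed interfaces rather than the project's exact ones:

    // Simulation clock: 5 minutes of simulated time per real second, 12 ticks per second.
    var TICKS_PER_SECOND = 12;
    var SIMULATED_SECONDS_PER_TICK = (5 * 60) / TICKS_PER_SECOND; // 25 seconds

    function startClock(startTime, observers) {
      var simulatedTime = startTime; // e.g. 8:00am on the morning of the marathon
      setInterval(function () {
        simulatedTime = new Date(simulatedTime.getTime() +
                                 SIMULATED_SECONDS_PER_TICK * 1000);
        // On each tick, message the view and the data source separately.
        observers.forEach(function (observer) {
          observer.onTick(simulatedTime);
        });
      }, 1000 / TICKS_PER_SECOND);
    }

    // Usage: startClock(new Date('2014-10-27T08:00:00'), [view, dataSource]);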
Fig. 3.17 Tweet selected by clicking a circle

On each tick the clock messages the view and the data source separately and allows them to carry out the appropriate action, depending on their state.

The data source works simply: if it is not already waiting on a network request to return tweets, it checks the current time against the latest tweet it has already received. If this reveals that the visualisation will run out of new tweets within 6 seconds of real time, it asynchronously requests the next batch of tweets from the server and feeds it into the view.

The view visualises the tweets it is aware of at the time; it is the data source's job to feed it with tweets to visualise. When the view receives a tweet it wraps it in a TweetView object. These objects represent the circles on the map and any accompanying interactivity. Upon creation, a click listener is attached to the circle that overlays the tweet on the map when a mouse click event occurs on it, as shown in Fig. 3.17. This is achieved using Twitter's Javascript SDK, passing the Tweet ID to a call along with a webpage element in which to render it. This element has the id 'frame' and exists for the sole purpose of containing these tweets.

On each clock tick, the view passes the timestamp to each TweetView; it also removes from its set any TweetViews that have destroyed themselves. This destruction occurs when they detect that they should no longer be visible and never will be again. The TweetView objects shrink their circle's size and modulate its border radius with respect to the supplied timestamp and the time they were created at on Twitter. A certain time after they are created, they have no size and call a destroy method on themselves that instructs the parent view to stop messaging them and to let the garbage collector clean them up, as described previously.
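The shrink-and-destroy behaviour can be sketched as follows. The circle is assumed to expose a Leaflet-style setRadius method, and LIFETIME and MAX_RADIUS are illustrative values rather than those used in the project; the border-radius modulation and the Twitter SDK rendering are omitted.

    // Sketch of a TweetView shrinking its circle over simulated time and
    // destroying itself once it can never be visible again.
    var LIFETIME = 10 * 60 * 1000; // visible for ten simulated minutes (illustrative)
    var MAX_RADIUS = 300;          // metres (illustrative)

    function TweetView(tweet, circle, parentView) {
      this.tweet = tweet;           // tweet object as returned by /scenario_tweets
      this.circle = circle;         // circle drawn on the Mapbox map
      this.parentView = parentView; // view that messages this object on each tick
    }

    TweetView.prototype.onTick = function (simulatedTime) {
      var age = simulatedTime.getTime() - Number(this.tweet.created_at);
      var remaining = 1 - age / LIFETIME;
      if (remaining <= 0) {
        // No longer visible, and never will be again: ask the parent view to
        // stop messaging this object so the garbage collector can reclaim it.
        this.parentView.remove(this);
      } else {
        this.circle.setRadius(MAX_RADIUS * remaining);
      }
    };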
Fig. 3.18 Sound synthesis schematic for audio creation parametrised by sentiment and location

3.3.6 Sound

Given the visual nature of this project, it is appropriate to furnish it with a complementary soundscape. The Audiolet Javascript library [2] is well suited to this: it allows sample-altering and frequency-producing nodes to be connected in a network to produce sound effects and music synthesis. After some experimentation and research, the configuration shown in Fig. 3.18 was produced.

This generates a low tone for negative sentiment and a high tone for positive sentiment. The tones are selected from a C minor scale with the 2nd, 4th and 7th degrees removed, as these are not dominant notes of the scale; since the notes are selected with uniform randomness, it is better that the selection favours the more dominant notes to reinforce the scale.
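Removing the 2nd, 4th and 7th degrees of C minor leaves C, Eb, G and Ab. The sketch below picks one of those degrees uniformly at random and uses the sentiment to shift the register, low for negative and high for positive; the octave mapping and base frequency are illustrative choices, not the values used in the Audiolet network.

    // C minor with the 2nd, 4th and 7th removed (C, Eb, G, Ab),
    // expressed as semitone offsets above C.
    var DEGREES = [0, 3, 7, 8];
    var BASE_FREQUENCY = 130.81; // C3, in Hz

    // Map a sentiment in [0, 1] to a tone frequency: low register for
    // negative sentiment, high register for positive sentiment.
    function noteFrequency(sentiment) {
      var octave = Math.round(sentiment * 2); // 0, 1 or 2 octaves above C3 (illustrative)
      var degree = DEGREES[Math.floor(Math.random() * DEGREES.length)];
      return BASE_FREQUENCY * Math.pow(2, (octave * 12 + degree) / 12); // equal temperament
    }

    // noteFrequency(0.0) stays around C3; noteFrequency(1.0) lands around C5.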
Fig. 3.19 An example of an ADSR envelope

From there the tone is connected to a low pass filter that is modulated by a square wave. A low pass filter removes high frequencies and allows lower frequencies to pass; when its cut-off frequency is varied, or modulated, a rhythmic effect is produced. The modulating square wave's frequency is controlled by the sentiment, giving a slow modulation for negative sentiment and a fast modulation for positive sentiment, between 1Hz and 16Hz. This modulates the cutoff of the low pass filter between 5kHz and 9kHz, resulting in a rhythmic effect.

A device known as an ADSR (Attack, Decay, Sustain, Release) envelope produces the signal shown in Fig 3.19. This is triggered the moment the tweet is tweeted and then summed with another modulating square wave whose frequency is offset from that of the low pass filter's modulator by enough to create a rhythm between the two. The summation of the ADSR and this square wave produces a signal that begins with the transient of the ADSR combined with the choppiness of the square wave, but after the release continues to hold with just the square wave. This summed signal is combined in a multiplier with the output of the low pass filter, acting as an amplitude control, to create an enveloped sound with a transient followed by a rhythmic amplitude at odds with the already present modulation of the low pass filter.

The final two stages give spatial awareness to the sound by first attenuating it with respect to its distance from the map centre, and then panning it with respect to its x position on the map.
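The parameter mappings described above reduce to linear interpolation. The sketch below derives the modulator rate and cutoff range from the sentiment, and the gain and pan from the tweet's position; the normalisation of distance and x position to [0, 1] and the attenuation curve are assumptions for illustration only.

    // Linear interpolation helper.
    function lerp(low, high, t) {
      return low + (high - low) * t;
    }

    // Sentiment in [0, 1] sets the square-wave modulator of the low pass filter:
    // slow for negative sentiment, fast for positive, sweeping the cutoff
    // between 5kHz and 9kHz.
    function modulationParameters(sentiment) {
      return {
        lfoFrequency: lerp(1, 16, sentiment), // Hz
        cutoffLow: 5000,                      // Hz
        cutoffHigh: 9000                      // Hz
      };
    }

    // Spatial stage: attenuate with distance from the map centre (0 = centre,
    // 1 = edge) and pan with the normalised x position (0 = left, 1 = right).
    function spatialParameters(normalisedDistance, normalisedX) {
      return {
        gain: 1 - normalisedDistance,  // quieter towards the edges (illustrative curve)
        pan: lerp(-1, 1, normalisedX)  // -1 = hard left, +1 = hard right
      };
    }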
Fig. 3.20 Live view of tweets over San Francisco

3.3.7 Live

As a final experiment, a live stream of tweets from Twitter is connected to showcase tweets over San Francisco in real time. This is fed directly from Twitter's API rather than from the dataset made available by TCD GRAISearch, which means Twitter's API policies such as rate limiting and location-based requests have to be adhered to. Beyond that the approach and technologies are similar, and the result is a success [29].
Chapter 4

Experimental Results

4.1 Stanford Sentiment Server

The server performs as expected, with roughly 200ms round-trip time per sentence submitted in the request. Measuring the quality of the sentiment analysis, the server performs as well as the Stanford NLP department's live demonstration [24] in all comparisons. In future, Stanford's demonstration is expected to perform better as it has access to the most up-to-date training.

No replication or load balancing is in play, so this implementation will not scale to large numbers of users requesting simultaneously; however, as no shared state is held on the server, replication and load balancing would be straightforward to add when required.

4.2 Firefox Extension

The Firefox extension has 40% recall of relevant content with the basic Javascript selector article p. By augmenting the selectors with jQuery wildcard selectors (as explained in Section 3.2.1) the recall rises to 88% of relevant content. These selectors search anywhere in the class or id attribute of elements in the webpage for the string "article" or "content", instead of trying to match the whole name. Given the unpredictable way in which websites are constructed, this works exceptionally well, as demonstrated in Fig. 4.1.

The sentence boundary splitting appears robust, with 95% precision of sentence boundary detection across Fig 4.2, Fig 4.3 and Fig 4.4. The approach presented has independently been estimated at 95% effective [22].

Each paragraph of content requires around three seconds to process on an idle server. This processing is carried out serially, so the processing time of a page is a linear function of the number of paragraphs, with paragraph length taken into account.
Website                              Selector
http://www.vulture.com/              article p
http://www.theglobeandmail.com/      article p
http://www.detroitnews.com/          article p
http://theadvocate.com/              article p
http://wegotthiscovered.com/         article p
http://www.newyorker.com/            article p
http://qctimes.com/                  article p
http://www.rollingstone.com/         article p
http://www.dailynews.com/            article p
http://www.rogerebert.com/           article p
http://www.thestar.com/              [class*=article] p
http://www.abc.net.au/               [class*=article] p
http://www.forbes.com/               [class*=article] p
http://www.reviewjournal.com/        [class*=content] p
http://www.nj.com/                   [class*=content] p
http://thepopcornjunkie.com/         [class*=content] p
http://baretnewswire.org/            [class*=content] p
http://www.tvinsider.com/            [class*=content] p
http://filmink.com.au/               [id*=article] p
http://www.vox.com/                  [id*=article] p
http://theyoungfolks.com/            [id*=content] p
http://screenrant.com/               [id*=content] p
http://www.cityweekly.net/           fail
http://www.reelingreviews.com/       fail
http://www.ericdsnider.com/          fail

Fig. 4.1 Random sample of websites used to test the effectiveness of content selection by the extension
Fig. 4.2 Short movie review [5], colour coded based on sentiment

As mentioned previously, due to limitations on the server this does not scale well, and beyond a certain number of users performance will become sluggish.

Fig. 4.2 to Fig. 4.5 show the effect of the extension on a series of short, opinionated movie reviews. Because the library is trained on movie reviews, this is an ideal demonstration of its abilities; the reader is encouraged to evaluate the effectiveness for themselves. The figures show clear parallels between sentiment and colour coding. It can also be seen in these excerpts, and in that of Fig 4.6, that no functionality of the page has been affected: all links remain clickable.

The chart overview offers little insight into the page's sentiment. Simply counting the number of positive and negative sentences reveals little: if one sentence carries much more weight than the others, it should contribute more to the histogram, and without such weights that representation is impossible. Hidden text on the page will also contribute to the histogram's state; while colouring the text this does not present a problem, as the user cannot see it to begin with. The problem this chart attempts to solve, deciphering the importance and weight of a sentence relative to its surrounding content, is complex and likely an ongoing research effort in Stanford's NLP department.

Fig 4.6 shows that an article on Wikipedia, a website with little to no opinion, apparently has sentiment information. It is the author's conviction that this is due to two forms of sentiment: the first being the sentiment opinion of the writer, which is evident in the movie reviews, and the second being the sentiment opinion of the reader, which is apparent in this article.
Fig. 4.3 Short movie review [5], colour coded based on sentiment

Fig. 4.4 Short movie review [5], colour coded based on sentiment

Fig. 4.5 Short movie review [5], colour coded based on sentiment
Fig. 4.6 Excerpt from the Wikipedia page of Dublin, Ireland
Take the sentence "In response to Strongbow's successful invasion ... pronounced himself Lord of Ireland". This carries a subjective sentiment: some people would consider it positive and some negative, but it does not express an opinion. This is likely a failure in the training of the dataset, with bias and personal outlook creeping into the model. It could be due to an under-trained network, but the evidence suggests the contrary, as this theme is consistent across Wikipedia. The measure by which this library achieved 85% correct classification ignored the neutral class [31]. As evidenced by Fig 4.6, it could be argued that a neutral class is as important as positive and negative and should be included in future state-of-the-art benchmarks.

A packaged extension can be found on the CD accompanying this report, in the folder entitled "demonstration".

4.3 Sentiment Rain

Sentiment Rain is a success on the desktop browsers Safari and Chrome on OSX. The frame rate is smooth and the colours and tones reflect the sentiment. It struggles on Firefox on OSX due to the manner in which that browser allocates its threads: in Firefox there is only one thread available to the Javascript engine per tab, so when a network call is made it disrupts the user interface's processing, which manifests as stuttering in the browser. Safari and Chrome handle this better, with the networking not interfering with the user interface. Internet Explorer remains untested for compatibility. As a proof of concept this is a sufficient result.

A screenshot is provided in Fig 4.7. As a screenshot is a poor demonstration of this implementation, the reader is encouraged to investigate the live demos in the Chrome browser [12], available at [28] and [29]. A video demonstration is also included on the CD accompanying this report, in the folder entitled "demonstration".
Fig. 4.7 Screenshot of Sentiment Rain over the Dublin Marathon 2014
Chapter 5

Future Work

5.1 Sentiment Analysis and Semantic Vectors

The research in this project concerns learning semantic vector representations and how to gain useful information from them. As the semantic vectors' axes are undefined, a transformation must be performed onto known axes to gain insight into the information contained within the space. Sentiment is a poorly defined concept, as it is not always objectively clear what the sentiment of a sentence is; for example, "I love when people die in war" gets a sentiment rating of 0.54. This sentence contains sentiment, but in multiple, separate dimensions, and when the space is projected onto a one-dimensional axis all the sentiment inherent in the sentence is lost. If clear, objective axes are defined and a projection onto them is learnt, the network could yield more useful insight into the semantic meaning of the vector representations.

If a multidimensional sentiment model could be proposed, such as the sentiment of the writer on one axis and the perceived sentiment of the reader on the other, the network might be trained to produce a more robust, context-free sentiment analyser than the ones seen so far. A model such as this is hard to define, and the proposal above may not be a valid model, as the sentiment of a reader is a subjective concept and thus may require a different transformation based on the political and moral viewpoint of the reader.

It would be interesting to investigate what other meaningful projections could be achieved from the semantic vectors. A model where sentiment is decomposed into two dimensions may just be the beginning. The field of sentiment analysis could be generalised into one of semantic analysis, with multiple dimensions revealing their own information along with sentiment. Multiple transformation matrices could encode different viewpoints and personal biases into the vectors, allowing insight into the personal semantic information a reader would take from a piece of text.
5.2 Stanford Sentiment Server

The sentiment server could be spread across multiple machines with caches and load balancers to enable scalability. This system could be scaled easily, as there is no interdependence of stored data and it could therefore be load balanced with ease. As this was merely a proof of concept, the non-scalable server was sufficient for these purposes.

The biggest improvement in scalability would come from removing the server entirely and replacing it with a similar library written in Javascript. This would allow much cheaper scaling, as no dedicated server would be required for sentiment processing and each user would perform the sentiment analysis on their own machine. Such a library could be trained on Stanford's sentiment treebank [31] to achieve similar results.

5.3 Firefox Extension

Moving sentence splitting to the server would prove much more reliable, as CoreNLP contains functionality to split sentences more robustly than anything Javascript libraries can offer. This is too significant an architectural redesign to undertake at this time, but should be considered early in any future work.

The other issue concerns reliable extraction of pertinent text. This problem has no clear solution, as each webpage potentially has a structure different to any seen before. One possible solution would be to provide a facility to click paragraphs of interest, derive a selector that captures them, and update the selector list for the future.

Increasing interactivity in the extension, allowing users to correct mislabelled sentences to augment the training set, would also prove invaluable. As this is a deep learning technology, the more data the network has to learn and generalise from, the more powerful it becomes.

5.4 Sentiment Rain

Caching on the backend, in both the live and scenario models, would allow for large performance gains and better scalability. In the live model, each request that comes into the server requests new tweets from Twitter. Given even a modest number of users this would quickly exceed Twitter's rate limits and render the system useless until the API window reopens 15 minutes later, only for the limit to be rapidly exceeded again. As each user is looking for the same tweets, only one request to Twitter is necessary and its result should be cached for successive requests from clients.

The other need for caching involves both the live and scenario models. The sentiment for each tweet is calculated at the time of the request. This is wasteful of resources: if two users request the same tweets, there is no need to analyse them twice.
Online analysis is unavoidable in the live model, but each tweet should only be analysed once, with the result cached for future requests from other users. In the Dublin Marathon scenario, tweets could be pre-analysed, removing the need for the sentiment server altogether; this would improve responsiveness and scalability massively.

Tweets could also be preprocessed. Emojis and hashtags are unlikely to provide meaningful inputs to the system; if they could be substituted with clearer words, or removed altogether, a performance boost could be achieved.
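The per-tweet cache proposed above could be as simple as an in-memory map keyed by Tweet ID, so that a tweet is only sent to the sentiment server once regardless of how many clients request it. The sketch below is written in Javascript for consistency with this report's other examples, although the Sentiment Rain backend is Ruby; analyseTweet is a hypothetical stand-in for whatever function calls the sentiment endpoint.

    // In-memory sentiment cache keyed by Tweet ID.
    var sentimentCache = {};

    function cachedSentiment(tweet, analyseTweet, callback) {
      if (sentimentCache.hasOwnProperty(tweet.id)) {
        callback(sentimentCache[tweet.id]);   // cache hit: no request to the sentiment server
        return;
      }
      analyseTweet(tweet, function (sentiment) {
        sentimentCache[tweet.id] = sentiment; // cache for future requests from other users
        callback(sentiment);
      });
    }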
Chapter 6

Conclusions

In conclusion, deep learning is an effective means of performing sentiment analysis, as sentiment is one projection of semantic meaning. The technology set out in [31] is the only technology that takes the entire semantic meaning of a sentence into account before applying this projection. Words alone cannot determine sentiment without first taking into account what they mean in the context of the sentence and the order in which they appear. Without this composition of semantic vectors, further progress in sentiment analysis is unlikely.

By leveraging the flexibility of the HTTP protocol, a platform-agnostic endpoint contained within a replicable server is created, allowing for a scalable sentiment analysis server infrastructure. This powers two novel implementations of sentiment analysis: one on any piece of text visited in the browser, the other temporally mapping tweets across a map during a given scenario. Implementations of this type are effective uses of this technology and warrant further investigation in the future.
References

[1] Amazon Web Services Elastic Compute Cloud (EC2). http://aws.amazon.com/ec2/. [Online; accessed 17-April-2015]. 2015.
[2] Audiolet - JavaScript library for audio synthesis and composition. http://oampo.github.io/Audiolet/. [Online; accessed 17-April-2015]. 2015.
[3] Yoshua Bengio. "Learning deep architectures for AI". In: Foundations and Trends in Machine Learning 2.1 (2009), pp. 1-127.
[4] Yoshua Bengio et al. "A Neural Probabilistic Language Model". In: J. Mach. Learn. Res. 3 (Mar. 2003), pp. 1137-1155. ISSN: 1532-4435. URL: http://dl.acm.org/citation.cfm?id=944919.944966.
[5] Don Chartier. Short and Sweet Movie Reviews. http://shortandsweet.blogspot.ie/. [Online; accessed 17-April-2015]. 2015.
[6] Cloudflare. https://www.cloudflare.com/. [Online; accessed 17-April-2015]. 2015.
[7] Ronan Collobert and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning". In: Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 160-167.
[8] Content Scripts | Mozilla Developer Network. https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts. [Online; accessed 17-April-2015]. 2015.
[9] Cross-Origin Resource Sharing | w3.org. http://www.w3.org/TR/cors/. [Online; accessed 17-April-2015]. 2015.
[10] Firefox Extensions. https://addons.mozilla.org/en-US/firefox/extensions/. [Online; accessed 17-April-2015]. 2015.
[11] Git --fast-version-control. http://git-scm.com/. [Online; accessed 17-April-2015]. 2015.
[12] Google Chrome Browser. https://www.google.ie/chrome/browser/desktop/. [Online; accessed 18-April-2015]. 2015.
[13] GRAISearch. Use of Graphics Rendering and Artificial Intelligence for Improved Mobile Search Capabilities. FP7-PEOPLE-2013-IAPP (612334), 2015-18.
[14] Heroku. https://www.heroku.com/. [Online; accessed 17-April-2015]. 2015.
[15] Highcharts - Interactive JavaScript charts for your webpage. http://www.highcharts.com/. [Online; accessed 17-April-2015]. 2015.
[16] jQuery Selectors. https://api.jquery.com/category/selectors/. [Online; accessed 17-April-2015]. 2015.
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1097-1105. URL: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
[18] Christopher D. Manning et al. "The Stanford CoreNLP Natural Language Processing Toolkit". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2014, pp. 55-60. URL: http://www.aclweb.org/anthology/P/P14/P14-5010.
[19] Mapbox | Design and publish beautiful maps. http://www.mapbox.com/. [Online; accessed 17-April-2015]. 2015.
[20] Maven. http://search.maven.org/. [Online; accessed 17-April-2015]. 2015.
[21] Tomas Mikolov et al. "Efficient estimation of word representations in vector space". In: arXiv preprint arXiv:1301.3781 (2013).
[22] John O'Neil. Doing Things with Words, Part Two: Sentence Boundary Detection. http://web.archive.org/web/20131103201401/http://www.attivio.com/blog/57-unified-information-access/263-doing-things-with-words-part-two-sentence-boundary-detection.html. [Online; accessed 17-April-2015]. 2008.
[23] Jordan B. Pollack. "Recursive distributed representations". In: Artificial Intelligence 46.1 (1990), pp. 77-105.
[24] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank - Live Demo. http://nlp.stanford.edu:8080/sentiment/rntnDemo.html. [Online; accessed 17-April-2015]. 2015.
[25] Regular Expressions - Javascript | MDN. https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions. [Online; accessed 17-April-2015]. 2015.
[26] Eric Rescorla. HTTP Over TLS. RFC 2818. RFC Editor, May 2000, pp. 1-7. URL: http://tools.ietf.org/html/rfc2818.
[27] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. "Learning internal representations by error propagation". In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1. MIT Press, Cambridge, MA, 1986, pp. 318-362.
[28] Sentiment Rain | Dublin Marathon 2014. http://sentiment-rain.conorbrady.com/scenario. [Online; accessed 17-April-2015]. 2015.
[29] Sentiment Rain | Live Over San Francisco. http://sentiment-rain.conorbrady.com/live. [Online; accessed 17-April-2015]. 2015.
[30] Sinatra - A Ruby Server Micro-framework. http://www.sinatrarb.com/. [Online; accessed 17-April-2015]. 2015.
[31] Richard Socher et al. "Recursive deep models for semantic compositionality over a sentiment treebank". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2013, pp. 1631-1642.
[32] Richard Socher et al. "Semantic Compositionality Through Recursive Matrix-Vector Spaces". In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. EMNLP-CoNLL '12. Jeju Island, Korea: Association for Computational Linguistics, 2012, pp. 1201-1211. URL: http://dl.acm.org/citation.cfm?id=2390948.2391084.
[33] Richard Socher et al. "Semi-supervised Recursive Autoencoders for Predicting Sentiment Distributions". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP '11. Edinburgh, United Kingdom: Association for Computational Linguistics, 2011, pp. 151-161. ISBN: 978-1-937284-11-4. URL: http://dl.acm.org/citation.cfm?id=2145432.2145450.
[34] Spring Web MVC Framework. http://docs.spring.io/spring/docs/current/spring-framework-reference/html/mvc.html. [Online; accessed 17-April-2015]. 2015.
[35] Dong Yu and Li Deng. Automatic Speech Recognition - A Deep Learning Approach. Springer, Oct. 2014. URL: http://research.microsoft.com/apps/pubs/default.aspx?id=230891.