ANALYSIS OF TWITTER MESSAGES FOR
SENTIMENT AND INSIGHT FOR USE IN STOCK
MARKET DECISION MAKING
ERIC D. BROWN
DOCTORAL DISSERTATION FINAL DEFENSE
AGENDA
• Introduction
• Previous Research
• Research Summary
• Research Model
• Research Methodology
• Data Analysis
• Research Findings
• Conclusions & Future Research
INTRODUCTION
• Sentiment has been an underlying factor in the investing world for many
years.
• Many companies create and track various types of sentiment
• Consumer Confidence Index
• Investors Intelligence Sentiment Index
• American Association of Individual Investors Sentiment Survey
• “Market Sentiment”
• Rather than waiting days, weeks or months like current sentiment
measures, can we use sentiment generated in real-time to
improve trading performance and investment decisions?
• Can we create a “sentiment of now” using social media or other
user-generated content?
• Can Twitter be used to determine the ‘sentiment of now’?
INTRODUCTION
• The goal of this study was to gain a more thorough
understanding of Twitter content and the users that create it.
• Can a Tweet convey sentiment with only 140 characters
available?
• If Tweets do convey some form of sentiment can this sentiment
be used in a predictive manner?
• Can this Twitter content and users be ‘tapped’ to build
methodology that identifies and evaluates likely investment
opportunities?
PREVIOUS RESEARCH
• Wysocki (1998) – Found a strong positive correlation between
volume of messages posted on message boards overnight and
next day’s trading volume and stock returns.
• Tumarkin and Whitelaw (2001) – Concluded that there are no
predictive capabilities found within message board activity.
• Antweiler and Frank (2004) – Used sentiment analysis to
show strong positive correlation between message board posts
and next day trading volume and volatility. Showed minor
correlation between message board posts and next day price
activity.
PREVIOUS RESEARCH
• Gu et al. (2006) – Found that aggregation of individual
recommendations on stock message boards has no
predictive power on future stock returns.
• Das and Chen (2007) – Using sentiment analysis of
messages on message boards, found no correlation between
sentiment and individual stock price movement but did find
positive correlation between the aggregate sentiment of a set of
stocks and movement in the stock market.
• Zhang (2009) – Studied the reputation of message board
posters and showed that posts from users with a ‘better’ reputation
were shared more widely and had a larger effect on sentiment.
PREVIOUS RESEARCH
• Bollen, Mao & Zeng (2010) – Used sentiment analysis to
determine the ‘mood’ of the Twitter universe and then predict
the next day movement of the Dow Jones Industrial Average
with 87.6% accuracy.
• Accuracy isn’t everything. A hedge fund attempted to run
its fund with this research and closed shop within a year.
• Sprenger and Welpe (2010) – Focused on the S&P 100
stocks and the sentiment of Tweets regarding those stocks.
Showed that sentiment of the company on Twitter closely
follows market movements. This research also showed positive
correlation between trading volume and Tweet volume.
PREVIOUS RESEARCH
Additional research in Sentiment Analysis of Twitter:
• Bifet & Frank, 2010 – Sentiment Knowledge Discovery in
Twitter Streaming Data.
• Pak & Paroubek, 2010 – Twitter as a Corpus for Sentiment
Analysis and Opinion Mining.
• Romero, Meeder, & Kleinberg, 2010 – Differences in the
Mechanics of Information Diffusion Across Topics: Idioms,
Political Hashtags, and Complex Contagion on Twitter.
• Castillo, Mendoza & Poblete, 2010 – Information Credibility
on Twitter.
• Diakopoulos & Shamma, 2010 – Characterizing Debate
Performance via Aggregated Twitter Sentiment.
RESEARCH SUMMARY
The main questions driving this study were:
• Can analysis of publicly available Tweets provide insight
for investing decisions?
• Do Tweets (and their subsequent sentiment) have any
effect on movement in the stock market?
• Can Tweets be mined and analyzed to predict daily
movements in the stock market?
• Does a Twitter user’s reputation have an effect on how
people perceive and use their shared investing ideas?
RESEARCH SUMMARY
To address those main drivers, the following research questions were
developed:
• RQ-1: Using a given sector of the stock market, does the
sentiment for that sector match the aggregated sentiment for the
stocks that make up that sector? How well does the sentiment
predict price / volume movement?
• RQ-2: Are there specific stocks within a given sector that supply
the majority of the sentiment for that sector? If so, do these stocks
supply sentiment in correlation to the weighting given to them by
ratings agencies (e.g., Standard & Poor’s)?
• RQ-3: Are there times of the day or days of the week that provide
a more accurate and informative sentiment for a stock or sector?
• RQ-4: Are there specific users that provide more ‘weight’ to a
sentiment of a stock or sector based on the users’ reputation?
RESEARCH SUMMARY
RQ-1 Hypotheses
• H1a: The sentiment of a sector will match the overall averaged
sentiment of all stocks within the sector.
• H1a0: States that there will be no noticeable relationship
between the sentiment of a sector and the overall averaged
sentiment of stocks within the sector.
• H1b: The sentiment of a sector can be used to predict the
movement of all stocks in that sector.
• H1b0: States that the sentiment of a sector will provide no
predictive capability.
• H1c: The sentiment of a sector or stock on any given day will
provide a prediction for the next day’s movement in that stock.
• H1c0: States that there will be no predictive capability on price
and sentiment from day to day.
RESEARCH SUMMARY
RQ-2 Hypotheses
• H2a: The sentiment of a stock within a given sector will affect
the sentiment of the overall sector based on the relative market
cap weighting of that stock.
• H2a0: States that the sentiment of a stock is not correlated
with the market cap weighting of the stock in that sector.
• H2b: The stocks that provide the most weight toward the
sentiment of a sector are also the stocks with the highest
number of mentions on Twitter.
• H2b0: States that there is no relationship between the
number of mentions on Twitter and the effect that these
stocks have on the sector sentiment.
RESEARCH SUMMARY
RQ-3 Hypothesis
• H3: There is a difference in the effect that Tweets sent during
non-market hours (i.e., evenings and weekends) and Tweets
sent during market hours have on sentiment and price.
• H30: States that there is no difference in the effect of
Tweets during market hours and non-market hours.
RESEARCH SUMMARY
RQ-4 Hypothesis
• H4: The number of followers of a Twitter user determines the
effect that users’ Tweets will have on sentiment for a stock or
sector.
• H40: States that there is no relationship between the
number of followers and sentiment on a stock or sector.
RESEARCH SUMMARY
Mapping Hypotheses to Research Questions
Research Question | Hypotheses
RQ-1: Using a given sector of the stock market, does the sentiment for that sector match the aggregated sentiment for the stocks that make up that sector? How well does the sentiment predict price / volume movement? | H1a, H1b, H1c
RQ-2: Are there specific stocks within a given sector that supply the majority of the sentiment for that sector? If so, do these stocks supply sentiment in correlation to the weighting given to them by ratings agencies (e.g., Standard & Poor’s)? | H2a, H2b
RQ-3: Are there times of the day or days of the week that provide a more accurate and informative sentiment for a stock or sector? | H3
RQ-4: Are there specific users that provide more ‘weight’ to a sentiment of a stock or sector based on the users’ reputation? | H4
RESEARCH MODEL
[Diagram: Twitter Sentiment Analysis for Stocks and Sectors]
• Stock & Sector Analysis (H1a, H1b, H1c)
• Sentiment Weighting within Sectors (H2a, H2b)
• Day / Time Analysis (H3)
• Analysis of Twitter Users (H4)
• Outputs: Information Content of Tweets; Correlations with Stock Market Prices; User Reputation; Predictive Nature of Tweets
RESEARCH METHOD
[Diagram: research method flow]
• Boxes: Twitter Data Collection; Sentiment Analysis; User Analysis; Stock Market Data; Price Analysis
• Intermediate results: Correlation of Twitter Sentiment with Price; Reputation of Twitter user
• Outcome: Understanding of predictive capabilities of Twitter Sentiment and the effect
of user reputation for investing decisions
RESEARCH METHODOLOGY
Data Collection
• Twitter API to collect tweets (tweet, sender, date, time)
• Tweets referencing companies and sectors are collected and
stored in a MySQL database for future study
• Tweets were identified using the nomenclature made popular by
StockTwits (www.stocktwits.com), in which users add a “$” to a
stock symbol. Example: The stock symbol for Apple is AAPL, so
a Tweet about Apple contains “$AAPL”.
• EODData.com market feed to gather Stock Market data (price
and volume)
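The "$" nomenclature above lends itself to simple pattern matching. The following is a minimal sketch, not the study's collection code: the regular expression and the 1 to 5 letter limit on symbols are assumptions made for illustration.

```python
import re

# Assumed pattern: "$" followed by 1-5 letters (e.g. "$AAPL").
# Dollar amounts such as "$5" are intentionally not matched.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")

def extract_symbols(tweet):
    """Return the upper-cased stock symbols mentioned in a Tweet."""
    return [symbol.upper() for symbol in CASHTAG.findall(tweet)]

print(extract_symbols("Long $AAPL and $xom today, paid $5"))
```

A Tweet can then be stored once per extracted symbol, or filtered against the list of symbols in a sector.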
RESEARCH METHODOLOGY
Market Data
• This study reviewed the Energy (XLE) and Consumer Staples
Sectors (XLP).
• Chosen to get different types of companies.
• Both have the same number of symbols in the sector.
• Used XLE and XLP Exchange Traded Funds (ETFs)
• ETFs are a ‘proxy’ for owning each company covered by the
ETF.
• ETFs are, generally, a weighted index made up of each
company within the sector. The company’s stock price is
weighted based on the market cap of the company.
• ETFs provide a method to diversify and/or invest in a sector
or industry without owning a large portfolio of companies.
Market Data
• XLE (top chart) shows a
non-trending volatile market
• Gains for the year =
$1.86 per share or
2.77% gain
• 42 companies make
up the XLE Sector
• XLP (bottom chart) shows
an upward-trending market
• Gains for the year =
$3.05 per share or
10.05% gain
• 42 companies make
up the XLP sector
RESEARCH METHODOLOGY
Sentiment Analysis
• Using the Python programming language and the Natural
Language Toolkit’s implementation of the Bayesian text
classification system, algorithms were implemented to
determine sentiment found within Tweets
• For Bayesian classification, a data set was needed to ‘train’ the
classifier to categorize data appropriately.
• To create the training data set, 10,000 Tweets were
randomly selected from the collection of Tweets.
• Each Tweet was ‘cleansed’ to remove identifying Twitter
user information, Twitter hash-tags and stock symbols.
• Each Tweet was then manually reviewed and assigned a
category
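The cleansing step can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the regular expression, the lower-casing, and the whitespace tidy-up are choices made here (the lower-casing mirrors the lower-case training samples shown in the deck).

```python
import re

# Assumed patterns: @user mentions, #hashtags, and $SYMBOL cashtags.
NOISE = re.compile(r"[@#]\w+|\$[A-Za-z]+")

def cleanse(tweet):
    """Strip user mentions, hash-tags and stock symbols, then tidy whitespace."""
    text = NOISE.sub("", tweet)
    return " ".join(text.lower().split())

print(cleanse("@trader $XLP Consumer staples outperforming #stocks"))
```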
RESEARCH METHODOLOGY
Sentiment Analysis (cont)
• Tweets were categorized as
• Bullish: denotes a positive sentiment.
• Bearish: denotes a negative sentiment.
• Neutral for those Tweets that do not convey any discernible
sentiment.
• Spam for those Tweets that aren’t delivering market
information.
RESEARCH METHODOLOGY
Training Dataset Samples
Bullish
• consumer staples outperforming the broader market, expect this to
continue
Bearish
• if dexia doesn't get a bailout, markets will plunge%+ in a session, it is a lot
bigger than lehman ever was.
Neutral
• what to expect from the big google music announcement tomorrow
Spam
• unlimited free tv shows on your pc, free channels
RESEARCH METHODOLOGY
Sentiment Analysis (cont)
• 1,000 Tweets of each classification were used in the training
dataset
• Using a built-in accuracy check algorithm, the training dataset
provided an 89.35% classification accuracy
• With the training data set created, each Tweet was analyzed
and assigned one of the four categories.
• Only Tweets assigned Bullish or Bearish were considered
during this study.
• Only Tweets mentioning the Energy Sector (XLE) and
Consumer Staples Sector (XLP) ETFs and the symbols that
make up the sectors were analyzed
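To make the train-then-classify workflow concrete, here is a small pure-Python Naive Bayes classifier. It is a stand-in for illustration only: the study used the Natural Language Toolkit's implementation, whereas the class below, its bag-of-words features, and its add-one smoothing are assumptions. The four training rows are the sample Tweets shown earlier in the deck.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, samples):
        # samples: iterable of (text, label) pairs
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in samples:
            words = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, labelled):
    """Fraction of (text, label) pairs the model labels correctly."""
    labelled = list(labelled)
    hits = sum(model.predict(text) == label for text, label in labelled)
    return hits / len(labelled)

TRAIN = [
    ("consumer staples outperforming the broader market, expect this to continue", "Bullish"),
    ("if dexia doesn't get a bailout, markets will plunge%+ in a session, it is a lot bigger than lehman ever was.", "Bearish"),
    ("what to expect from the big google music announcement tomorrow", "Neutral"),
    ("unlimited free tv shows on your pc, free channels", "Spam"),
]

model = TinyNaiveBayes().fit(TRAIN)
print(model.predict("markets will plunge"))
```

In practice the accuracy check would be run on labelled Tweets held out from training; four training rows are far too few for a meaningful figure.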
RESEARCH METHODOLOGY
[Diagram: Twitter, Twitter API, MySQL Database, Bayes Classification (with Training Dataset), Classified Tweet]
RESEARCH METHODOLOGY
Converting Qualitative to Quantitative
• To utilize the sentiment found within Tweets as a market
‘signal’, a quantitative measure was needed.
• The Bear/Bull ratio was created by counting the total number
of Tweets with Bearish sentiment during a period and dividing
that number by the total number of Tweets with Bullish
sentiment during a period.
• The Bear/Bull ratio follows the Put/Call ratio that is widely
known and followed to measure sentiment using the buying
and selling of Options in the stock market.
• The Put/Call ratio is calculated by dividing the number of
Puts (bearish activity) by the number of Calls (bullish
activity).
RESEARCH METHODOLOGY
Converting Qualitative to Quantitative (cont)
The Bear/Bull Ratio is used to describe the overall sentiment for a
symbol, sector or overall market using a single value.
For the Bear/Bull Ratio:
• A value of 1.0 would equate to an equal number of Bearish and
Bullish sentiment Tweets.
• A value greater than 1.0 would provide evidence that there are
more Bearish Tweets than Bullish Tweets during the measured
time period.
• A value less than 1.0 would provide evidence that there are
more Bullish Tweets than Bearish Tweets in a given time
period.
RESEARCH METHODOLOGY
Example of Daily Bear/Bull Ratio and Closing Price for XLE ETF
Date | Number of Bearish Tweets | Number of Bullish Tweets | Bear/Bull Ratio | XLE Close
5/1/2012 | 13 | 7 | 1.86 | 69.07
5/2/2012 | 5 | 5 | 1.00 | 67.95
5/3/2012 | 7 | 13 | 0.54 | 66.82
5/4/2012 | 9 | 13 | 0.69 | 65.29
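The ratios in the table above can be reproduced with a few lines of Python. The function name and the handling of days with zero Bullish Tweets are assumptions; the slides do not say how such days were treated.

```python
def bear_bull_ratio(bearish, bullish):
    """Bearish Tweet count divided by Bullish Tweet count for one period."""
    if bullish == 0:
        return None  # assumption: the ratio is undefined without Bullish Tweets
    return bearish / bullish

# (bearish, bullish) counts taken from the XLE example table
daily_counts = {
    "5/1/2012": (13, 7),
    "5/2/2012": (5, 5),
    "5/3/2012": (7, 13),
    "5/4/2012": (9, 13),
}

for date, (bearish, bullish) in daily_counts.items():
    print(date, round(bear_bull_ratio(bearish, bullish), 2))
```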
RESEARCH METHODOLOGY
Social Network Analysis
• An analysis of Twitter users was performed to determine
whether a Tweet sent by a user with more followers
provided more ‘weight’ to the sentiment of the symbol
mentioned in that Tweet.
• Using the concept of ReTweets, analysis was performed to
determine how far a user’s tweet travels.
• A ReTweet is simply when a user ‘forwards’ a Tweet by
another user.
DATA ANALYSIS
• Period of study – January 2012 through December 2012 (360 Days).
• During the collection period, a total of approximately 2.6 million Tweets
were collected from a total of 473,090 Twitter users.
• For this study, the following data was used:
• For XLE, 130,611 Tweets from 13,067 Twitter users.
• Average of 362.81 Tweets per day.
• Average of 9.99 Tweets per user.
• 1.09% of users sent 50% of Tweets.
• One user sent 6.67% of Tweets.
• For XLP, 144,214 Tweets from 37,760 Twitter users.
• Average of 400.59 Tweets per day.
• Average of 3.82 Tweets per user.
• 1.00% of users sent 50% of Tweets.
• One user sent 3.43% of Tweets.
DATA ANALYSIS
Description of Tweets for all symbols in XLE
Category | Count | Percentage
Number of Total Tweets | 130,611 |
Number of Bullish Tweets | 45,883 | 35.12%
Number of Bearish Tweets | 30,680 | 23.49%
Number of Neutral Tweets | 50,886 | 38.95%
Number of Spam Tweets | 3,482 | 2.67%
Number of Tweets with no classification | 0 | 0
DATA ANALYSIS
Description of Tweets for all symbols in XLP
Category | Count | Percentage
Number of Total Tweets | 144,214 |
Number of Bullish Tweets | 32,315 | 22.41%
Number of Bearish Tweets | 22,568 | 15.65%
Number of Neutral Tweets | 60,572 | 42.00%
Number of Spam Tweets | 28,757 | 19.94%
Number of Tweets with no classification | 2 | 0.001%
RESEARCH FINDINGS
H1a: The sentiment of a sector will match the overall averaged
sentiment of all stocks within the sector.
• H1a0 states that there will be no noticeable relationship
between the sentiment of a sector and the overall averaged
sentiment of stocks within the sector.
• For the analysis, the XLE and XLP ETF Bear/Bull ratios were
compared with the respective aggregated Bear/Bull ratios from
all symbols making up each sector.
RESEARCH FINDINGS
XLE Data:
• The XLE ETF averaged less than 5 Bullish Tweets per day and just over 6
Bearish Tweets per day
• Compare that to the aggregated counts of all 42 symbols that make up
the XLE sector:
• Bullish Tweets average approximately 150 Tweets per day
• Bearish Tweets average almost 89 Tweets per day.
XLP Data:
• The XLP ETF averaged less than 3 Bullish Tweets per day and just over 2
Bearish Tweets per day
• Compare that to the aggregated counts of all 42 symbols that make up
the XLP sector:
• Bullish Tweets average approximately 90 Tweets per day
• Bearish Tweets average almost 50 Tweets per day
XLE Distribution
• With such a low average count of Tweets per day, some concern exists that the
Central Limit Theorem isn't satisfied
• Reviewing the distributions, it is clear that the XLE Bear/Bull ratio (bottom left) is
not normally distributed while the Aggregated Symbol Bear/Bull ratio (bottom right)
is.
RESEARCH FINDINGS
[Figure: two histograms, x-axis Bear_Bull, y-axis Frequency. Left: "XLE Histogram of Bear_Bull" (x-axis 0.0 to 9.0). Right: "Histogram of Aggregated XLE Bear_Bull" with normal curve (x-axis 0.0 to 1.2; Mean 0.6156, StDev 0.2066, N 366).]
XLP Distribution
• With such a low average count of Tweets per day, some concern exists that the
Central Limit Theorem isn't satisfied
• Reviewing the distributions, it is clear that the XLP Bear/Bull ratio (bottom left) is
not normally distributed while the Aggregated Symbol Bear/Bull ratio (bottom right)
is.
RESEARCH FINDINGS
[Figure: two histograms, x-axis Bear_Bull, y-axis Frequency. Left: "XLP Histogram of Bear_Bull" (x-axis 0.0 to 9.0). Right: "Histogram of XLP Sector Bear_Bull" with normal curve (x-axis 0.0 to 1.2; Mean 0.5609, StDev 0.2581, N 366).]
RESEARCH FINDINGS
Based on the significant differences in distributions and
insufficient number of daily observations for either XLE or
XLP ETF's:
• There is not enough evidence available on a daily basis to
reject the null (H1a0)
RESEARCH FINDINGS
H1b: The sentiment of a sector can be used to predict the
movement of all stocks in that sector.
• H1b0 states that the sentiment of a sector will provide no
predictive capability.
H1c: The sentiment of a sector or stock on any given day will
provide a prediction for the next day’s movement in that stock.
• H1c0 states that there will be no predictive capability on price
and sentiment from day to day.
RESEARCH FINDINGS
As with H1a, given the different distributions and the
insufficient number of daily observations found previously for
both the XLE and XLP ETFs:
• There is not enough evidence available on a daily basis for
individual symbols to reject the null for both H1b and H1c.
Although there is insufficient evidence to reject H1b0 and H1c0:
• A new definition of sector sentiment was adopted and used to
continue the analysis.
• By using the aggregated sentiment of a sector as the Bear/Bull
ratio, additional analysis was performed.
RESEARCH FINDINGS
• Using the aggregated Bear/Bull ratio for the sectors covered by
XLE and XLP, a regression analysis was performed to analyze
whether the aggregated Bear/Bull ratio could predict daily price
movement for the XLE and XLP ETF’s and the symbols within
each sector.
• To perform regression analysis on stock market data, the time-
series data was transformed from a non-stationary series into a
stationary series.
• This transformation was accomplished by taking daily
closing price and creating a percentage change value from
one day to the next
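The percentage-change transformation described above can be sketched in a few lines. The function name is an assumption; the closing prices are the XLE closes from the earlier example table.

```python
def pct_change(closes):
    """Day-over-day percentage change; one fewer element than the input."""
    return [(today - prev) / prev * 100 for prev, today in zip(closes, closes[1:])]

xle_closes = [69.07, 67.95, 66.82, 65.29]
for change in pct_change(xle_closes):
    print(f"{change:.2f}%")
```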
RESEARCH FINDINGS
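Regression Analysis Equation
• The regression equation used throughout the study:
$P_i = a + b \, i_i + \varepsilon_i \quad (1)$
where:
$P_i$ is the predicted price at observation $i$
$i_i$ is the Bear/Bull ratio at observation $i$
$a$ and $b$ are the fitted intercept and slope, and $\varepsilon_i$ is the error term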
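For a single predictor, the coefficients in equation (1) have a closed-form least-squares solution. The sketch below is a generic ordinary least squares fit, not the statistics package used in the study; the function name is an assumption, and pairing each ratio with the same-day or next-day price change is left to the caller.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def predict(a, b, ratio):
    """Predicted value for a given Bear/Bull ratio."""
    return a + b * ratio
```

Here `x` would hold the Bear/Bull ratios and `y` the transformed (percentage change) prices.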
RESEARCH FINDINGS
Regression analysis (Cont)
• The majority of correlations are low
• Durbin-Watson values are between 1.7 and 2.3, which points
to little to no autocorrelation in the residuals. This isn’t a
surprise since we transformed the data into a stationary series.
• The signs of the correlation coefficients are negative, which
aligns with the idea behind the Bear/Bull ratio.
• Most symbols have very good F-statistics and correlations that
are statistically significant.
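The Durbin-Watson statistic mentioned above is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 indicate little autocorrelation. A minimal sketch (the function name is an assumption):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    den = sum(r ** 2 for r in residuals)
    return num / den
```

Residuals that alternate in sign push the statistic toward 4, while residuals that drift in one direction push it toward 0.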
RESEARCH FINDINGS
Regression analysis (Cont)
• For XLE:
• 36 out of 43 symbols have a statistically significant
correlation with 95% significance between the
transformed daily close and aggregated Bear/Bull.
• For XLP:
• 5 out of 43 symbols have a statistically significant
correlation with 95% significance between the transformed
daily close and aggregated Bear/Bull.
RESEARCH FINDINGS
Regression analysis (Cont)
• To test the regression analysis, the data set was split into two
parts to create an in-sample and out-of-sample data set.
• The in-sample data set was used to run the regression
analysis and the out-of-sample data set was used to run
predictions of price movement to determine how well the
model works.
• The in-sample data set consisted of 188 days of data while
the out-of-sample data set consisted of 90 days of data.
• In the finance world, it is standard practice to use 20% to
30% of data for out-of-sample data.
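For time-series data the split is normally chronological, so that the out-of-sample period follows the in-sample period. A minimal sketch, assuming the rows are already sorted by date and the out-of-sample size is given as a number of days (the function name is hypothetical):

```python
def chronological_split(rows, n_out):
    """Split date-ordered rows into (in_sample, out_of_sample)."""
    cut = len(rows) - n_out
    return rows[:cut], rows[cut:]

days = list(range(278))  # placeholder for 278 daily observations
in_sample, out_of_sample = chronological_split(days, n_out=90)
print(len(in_sample), len(out_of_sample))
```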
RESEARCH FINDINGS
Regression analysis (Cont)
• Using the regression analysis output and the in-sample / out-
of-sample data, the regression models were tested for
accuracy.
• To find the accuracy measurement, the directional prediction of
the Bear/Bull ratio was compared to the direction of the
percentage change of the stock.
• Only those symbols with statistically significant correlations at
the 95% confidence level were included.
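Directional accuracy, as described above, is the share of days on which the predicted and actual changes point the same way. The sketch below is an illustration; treating a zero change as "not up" is an assumption, since the slides do not say how flat days were scored.

```python
def directional_accuracy(predicted, actual):
    """Share of observations where predicted and actual changes agree in direction."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)

predicted_changes = [0.5, -0.2, 0.1, -0.3]  # made-up example values
actual_changes = [1.0, 0.4, 0.2, -0.1]
print(directional_accuracy(predicted_changes, actual_changes))
```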
RESEARCH FINDINGS
Regression analysis (Cont)
• For XLE:
• 24 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 51.79%.
• Median accuracy is 51.67%.
• Standard deviation is 4.73%.
• For XLP:
• 3 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 51.57%.
• Median accuracy is 52.22%.
• Standard deviation is 3.95%.
RESEARCH FINDINGS
Outcome of H1a, H1b and H1c
• As stated previously:
• There is insufficient evidence available on a daily basis to
reject the null for H1a.
• By the original definition of sentiment, there is insufficient
evidence available on a daily basis to reject the null for
both H1b and H1c.
• Using the modified definition of sentiment to use
aggregated sentiment:
• There is limited evidence to reject the null for H1b and
H1c.
RESEARCH FINDINGS
H2a: The sentiment of a stock within a given sector will affect the
sentiment of the overall sector based on the relative market cap
weighting of that stock within the sector.
• H2a0 states that the sentiment of a stock is not correlated with
the market cap weighting of the stock in that sector.
H2b: The stocks that provide the most weight toward the
sentiment of a sector are also the stocks with the highest number
of mentions on Twitter.
• H2b0 states that there is no relationship between the number
of mentions on Twitter and the effect that these stocks have
on the sector sentiment.
RESEARCH FINDINGS
Analysis for H2a
• The daily sentiment reading for each symbol was calculated
then multiplied by the index weighting and then regression
analysis was performed.
• For example, ExxonMobil (XOM) comprised ~18% of the
XLE ETF during the study
• XOM’s tweet volume was multiplied by this index weighting
to build a weighted sentiment Bear/Bull ratio
RESEARCH FINDINGS
Regression analysis for H2a
• For XLE:
• 4 out of 43 symbols had a statistically significant correlation with 95%
significance between daily close and aggregated Bear/Bull.
• 3 symbols with accuracy greater than or equal to 50%
• Average accuracy is 53.33%
• Median accuracy is 55.00%
• Standard deviation is 3.93%
• For XLP:
• 2 out of 43 symbols had a statistically significant correlation with 95%
significance between daily close and aggregated Bear/Bull.
• 1 symbol with accuracy greater than or equal to 50%
• Average accuracy is 49.44%
• Median accuracy is 49.44%
• Standard deviation is 0.56%
RESEARCH FINDINGS
Analysis for H2b
• Similarly to H2a, a regression analysis was performed.
• A weighting mechanism was developed to assign a weight to
each symbol dependent on its contribution to the number of
Tweets per day.
• This weighted contribution was then used to build the
aggregated sentiment signal, which was then used for
regression analysis as described previously.
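One way to build such a weighted signal is a weighted average of per-symbol Bear/Bull ratios, with each symbol's weight equal to its share of the day's Tweets. The slides do not give the exact formula, so the sketch below is an assumption for illustration; the symbols and numbers are made up.

```python
def weighted_bear_bull(ratios, tweet_counts):
    """Average per-symbol Bear/Bull ratios, weighted by each symbol's Tweet share."""
    total = sum(tweet_counts[symbol] for symbol in ratios)
    return sum(ratios[symbol] * tweet_counts[symbol] / total for symbol in ratios)

ratios = {"XOM": 0.5, "CVX": 1.0}        # hypothetical daily ratios
tweet_counts = {"XOM": 300, "CVX": 100}  # hypothetical daily Tweet counts
print(weighted_bear_bull(ratios, tweet_counts))
```

The same function works with index weights in place of Tweet counts, since the weights are normalized before use.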
RESEARCH FINDINGS
Regression analysis for H2b
• For XLE:
• 13 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 10 symbols with accuracy greater than or equal to 50%
• Average accuracy is 53.08%.
• Median accuracy is 53.33%.
• Standard deviation is 4.14%.
• For XLP:
• 2 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 2 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 51.67%.
• Median accuracy is 51.67%.
• Standard deviation is 0.56%.
RESEARCH FINDINGS
Outcome of H2a and H2b
• There is insufficient evidence available on a daily basis to
reject the null for H2a.
• There is limited evidence to support rejecting the null for H2b.
RESEARCH FINDINGS
H3: There is a difference in the effect that Tweets sent during non-
market hours (i.e., evenings and weekends) and Tweets sent during
market hours have on sentiment and price.
• H30 states that there is no difference in the effect of Tweets sent
during market hours and non-market hours.
Analysis for H3
• Tweets were split into two categories to describe whether the
Tweets were sent during trading hours or non-trading hours.
• Trading hours: For equity and index markets in the U.S., trading
hours are defined as 8:30 AM to 3:00 PM Central Time, Monday
through Friday.
• Non-trading hours: For equity and index markets in the US,
non-trading hours are defined as any time outside of the 8:30 AM
to 3:00 PM Central time including evenings and weekends.
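The trading-hours definition above maps directly onto a timestamp check. This sketch assumes the timestamp has already been converted to Central Time, ignores market holidays, and treats the 3:00 PM close as exclusive; all three are assumptions made here.

```python
from datetime import datetime, time

MARKET_OPEN = time(8, 30)   # 8:30 AM Central
MARKET_CLOSE = time(15, 0)  # 3:00 PM Central

def is_trading_hours(ts):
    """True if a Central Time timestamp falls within Monday-Friday trading hours."""
    is_weekday = ts.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and MARKET_OPEN <= ts.time() < MARKET_CLOSE

print(is_trading_hours(datetime(2012, 5, 1, 9, 0)))
```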
RESEARCH FINDINGS
Regression analysis for H3:
• XLE Trading Hours
• 39 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 24 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 51.06%.
• Median accuracy is 51.11%.
• Standard deviation is 3.09%.
• XLE Non-Trading Hours
• 36 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 20 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 49.85%.
• Median accuracy is 50.00%.
• Standard deviation is 4.16%.
RESEARCH FINDINGS
Regression analysis for H3:
• XLP Trading Hours
• 5 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 3 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 49.33%.
• Median accuracy is 51.11%.
• Standard deviation is 5.56%.
• XLP Non-Trading Hours
• 4 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 2 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 50.23%.
• Median accuracy is 49.44%.
• Standard deviation is 4.80%.
RESEARCH FINDINGS
Outcome of H3
• There is evidence available on a daily basis to reject the null
for H3 for the XLE sector but not for the XLP sector.
• For XLE, Tweets sent during trading hours provided a
slight improvement in accuracy over those sent during
non-trading hours.
RESEARCH FINDINGS
H4: The number of followers of a Twitter user determines the effect
that users’ Tweets will have on sentiment for a stock or sector.
• H40 states that there is no relationship between the number of
followers and sentiment on a stock or sector.
Analysis for H4
• Recall that:
• XLE had 130,611 Tweets and 13,067 unique users.
• XLP had 144,214 Tweets and 37,760 unique users.
• No single user had more than 30 Tweets per day.
• XLE's most prolific sender of Tweets, on average, sent 24.19
Tweets per day.
• XLP's most prolific sender of Tweets, on average, sent 13.85
Tweets per day.
RESEARCH FINDINGS
Analysis for H4
• To satisfy the Central Limit Theorem, the Top 50 users sorted
by number of followers for each sector were selected in order
to get an average of 30 Tweets per day.
• The top 50 users by number of followers comprised just
8.41% of total Tweets for XLE and 9.06% of total Tweets
for XLP
• The Tweets by the Top 50 users by number of followers for
both XLE and XLP were combined to create a Bear/Bull ratio
for each sector.
• This Top 50 Bear/Bull ratio was used in regression analysis
using the regression equation.
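Selecting the top users by follower count and measuring their share of Tweets can be sketched as below. The dictionary-based data layout and the function names are assumptions; the user names and counts are made up.

```python
def top_users_by_followers(followers, n=50):
    """Return the n user names with the most followers, largest first."""
    return sorted(followers, key=followers.get, reverse=True)[:n]

def tweet_share(tweets_by_user, users):
    """Fraction of all Tweets that were sent by the given users."""
    total = sum(tweets_by_user.values())
    return sum(tweets_by_user.get(user, 0) for user in users) / total

followers = {"a": 10, "b": 500, "c": 50}
tweets_by_user = {"a": 6, "b": 3, "c": 1}
top = top_users_by_followers(followers, n=2)
print(top, tweet_share(tweets_by_user, top))
```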
RESEARCH FINDINGS
Regression analysis for H4
• For XLE:
• 38 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 21 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 49.39%.
• Median accuracy is 50.00%.
• Standard deviation is 4.79%.
• For XLP:
• 4 out of 43 symbols have a statistically significant correlation with
95% significance between daily close and aggregated Bear/Bull.
• 3 symbols with accuracy greater than or equal to 50%.
• Average accuracy is 49.72%.
• Median accuracy is 51.11%.
• Standard deviation is 3.18%.
RESEARCH FINDINGS
Outcome of H4
There is insufficient evidence available on a daily basis to reject
the null for H4 for both individual users and the Top 50 users.
RESEARCH FINDINGS
Hypothesis Summary Table
Hypothesis | Outcome
H1a: Sector ETF sentiment will match the aggregated sentiment. | Insufficient evidence to reject the null hypothesis
H1b: Sector ETF sentiment can be used to predict market movement for all sector stocks. | Insufficient evidence to reject the null hypothesis
H1c: Sentiment can be used to predict next day price movement. | Insufficient evidence to reject the null hypothesis
H2a: Stocks will affect sentiment based on their index weighting. | Insufficient evidence to reject the null hypothesis
H2b: Stocks will affect sentiment based on how often they are mentioned. | There is limited evidence to support rejecting the null
H3: Tweets sent during trading and non-trading hours will affect sentiment differently. | There is limited evidence to support rejecting the null
H4: The number of followers of a Twitter user will affect sentiment. | Insufficient evidence to reject the null hypothesis
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
• Rather than try to predict daily movements, can the Bear/Bull
ratio be used in other ways?
• During this study, the idea of "extremes" in the Bear/Bull
ratio was investigated to determine whether they would
identify proper entry and exit signals
• Based on the contrarian approach to investing where
extreme sentiment is used as a signal to enter in the
opposite direction
• Can Bear/Bull extremes be used to enter the market and
provide adequate returns?
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
To find extremes, a simple approach was used
• Identify values at or above the 90th percentile as Bearish Extremes
and values at or below the 10th percentile as Bullish Extremes.
• A trading signal was generated if the Bear/Bull ratio closes above
the Bearish Extreme value or below the Bullish Extreme value.
The extreme values for XLE, XLP are:
• XLE:
• Bearish Extreme: >= 0.90
• Bullish Extreme: <= 0.43
• XLP:
• Bearish Extreme: >= 0.90
• Bullish Extreme: <= 0.33
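Extreme thresholds of this kind can be derived from the history of daily ratios, and a contrarian signal follows from comparing each new ratio to them. This sketch uses the standard library's decile calculation, which may differ slightly from whatever percentile method the study used; the function names and signal labels are assumptions.

```python
import statistics

def extreme_thresholds(ratios):
    """Return (bullish_extreme, bearish_extreme) as the 10th and 90th percentiles."""
    deciles = statistics.quantiles(ratios, n=10)  # nine cut points
    return deciles[0], deciles[-1]

def contrarian_signal(ratio, bullish_extreme, bearish_extreme):
    """Contrarian entry: buy on extreme bearishness, short on extreme bullishness."""
    if ratio >= bearish_extreme:
        return "buy"
    if ratio <= bullish_extreme:
        return "short"
    return None

# Using the XLE thresholds listed above
print(contrarian_signal(0.95, bullish_extreme=0.43, bearish_extreme=0.90))
```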
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
• Using TradeStation, a highly regarded professional investing
platform, an investing strategy was developed using Bear/Bull
ratio extreme values.
• Using the Aggregated Bear/Bull ratio, the strategy was tested
against the XLE and XLP ETFs as well as each of the symbols
within the sectors.
• This strategy was compared to a simple Buy and Hold strategy
and a Random Entry strategy.
• Buy and Hold means to buy a stock on Day 1 of the test
period and sell it on the last day.
• Random Entry means to enter at random times in the
market.
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
• Highlights of the Investing strategy:
• Test period: August 21, 2012 to December 31, 2012
• Entry criteria (If not already in a trade):
• Bearish Extreme = Buy
• Bullish Extreme = Short
• Direction: Long & Short
• Number of Shares: 500
• Holding period: 2 days
• Commission: $5 per trade
• Slippage: $0.10 per trade
• Slippage was used to simulate non-perfect entries
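The rules above can be expressed as a small simulation loop. This is a sketch only: the study used TradeStation, and the details below (entering at the close of the signal day, exiting at the close two days later, charging commission and slippage once per trade as flat dollar amounts) are assumptions made for illustration.

```python
def backtest(ratios, closes, bullish_extreme, bearish_extreme,
             shares=500, hold_days=2, commission=5.0, slippage=0.10):
    """Return the net profit or loss in dollars of each simulated trade."""
    trades = []
    i = 0
    while i + hold_days < len(closes):
        if ratios[i] >= bearish_extreme:
            direction = 1    # Bearish Extreme = Buy
        elif ratios[i] <= bullish_extreme:
            direction = -1   # Bullish Extreme = Short
        else:
            i += 1
            continue
        entry, exit_price = closes[i], closes[i + hold_days]
        gross = direction * (exit_price - entry) * shares
        trades.append(gross - commission - slippage)
        i += hold_days       # no new entry while a trade is open
    return trades
```

Summing the returned list gives the strategy's net result, and the share of positive entries gives its accuracy.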
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
Investing strategy outcomes for XLE
Metric                        XLE       All Symbols in XLE (Average)
Bear/Bull Sentiment Return    4.85%     3.86%
Bear/Bull Extreme Accuracy    54.55%    54.16%
Buy and Hold Return           -1.07%    1.09%
Random Entry Return           -3.62%    -2.61%
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
Investing strategy outcomes for XLP
Metric                        XLP       All Symbols in XLP (Average)
Bear/Bull Sentiment Return    -1.39%    -2.19%
Bear/Bull Extreme Accuracy    33.33%    34.60%
Buy and Hold Return           -2.10%    -1.87%
Random Entry Return           -2.52%    -1.64%
RESEARCH FINDINGS
Using the Bear/Bull Ratio in an Investment Strategy
• For the XLE ETF, the strategy produced a 578 basis point improvement
over buy and hold returns and a 723 basis point improvement over
random entry returns.
• Averaged across all symbols in the XLE sector, the strategy produced a
277 basis point improvement over buy and hold returns and a 511 basis
point improvement over random entry returns.
• For the XLP ETF, the strategy produced a 71 basis point improvement
over buy and hold returns and a 113 basis point improvement over
random entry returns.
• Averaged across all symbols in the XLP sector, the strategy produced a
32 basis point decrease in performance relative to buy and hold returns
and a 55 basis point decrease relative to random entry returns.
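A basis point is one hundredth of a percentage point, so these comparisons are the difference between two percentage returns multiplied by 100. A minimal sketch using the XLP figures from the outcome table above (the function name is hypothetical):

```python
def basis_point_difference(strategy_pct, benchmark_pct):
    """Difference between two percentage returns, in basis points (1% = 100 bp)."""
    return round((strategy_pct - benchmark_pct) * 100)


# XLP ETF returns in percent, taken from the XLP outcome table.
xlp_sentiment, xlp_buy_hold, xlp_random = -1.39, -2.10, -2.52
vs_buy_hold = basis_point_difference(xlp_sentiment, xlp_buy_hold)  # 71
vs_random = basis_point_difference(xlp_sentiment, xlp_random)      # 113
```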
CONCLUSIONS AND
FUTURE RESEARCH
• Due to the lower volume of Tweets for most symbols, it is
recommended to look at methods to aggregate sentiment rather
than use individual symbol sentiment for those symbols with a
small number of Tweets.
• Negative correlation between sentiment and next day price
movement points toward future analysis of using sentiment as a
contrarian indicator using the Bear/Bull ratio construct.
• Stocks with higher volatility appear to be better candidates for use
with Twitter Sentiment
• XLE and the symbols that make up the sector were more
volatile than XLP
• XLE Bear/Bull ratios were more accurate than XLP
• Tweets sent during market hours appear to provide more valuable
information relative to market movements than those sent during
non-market hours.
CONCLUSIONS AND
FUTURE RESEARCH
• The idea of a sentiment ‘extreme’ was shown to be a
potentially useful approach to using sentiment as a predictor
for price movement.
• The number of followers a user has on Twitter does not appear
to have any correlation with how that user’s tweets affect price
on the symbols studied.
• Stocks that exhibit high trading volume on a regular basis also
exhibit high Tweet volume on a regular basis.
• A small number of users send the majority of Tweets
discussing stocks and ETFs.
• Approximately 1% of users sent 50% of Tweets during the
study.
CONCLUSIONS AND
FUTURE RESEARCH
Avenues for Future Research
• Further research using Twitter sentiment extremes for investing
signals.
• Additional research into classification methods to attempt to find
faster or more effective classification techniques
• Further analysis of Tweet volume on a per-symbol, sector and
market basis compared to stock market volume.
• Further analysis into the use of aggregated sentiment to be used
across sectors or multiple symbols.
• Further analysis of intraday sentiment and market
correlations.
• Further analysis of longer time periods (weekly, monthly) and
market correlations.
• Further analysis of the interaction of volatility and Twitter sentiment.
QUESTIONS?
Any Questions?
Feel free to reach out to me afterwards with comments or
questions:
eric@ericbrown.com
(918) 928-2887

These slides cover the final defense presentation for my Doctorate degree. The topic: Analysis of Twitter Messages for Sentiment and Insight for use in Stock Market Decision Making.

  • 1. ANALYSIS OF TWITTER MESSAGES FOR SENTIMENT AND INSIGHT FOR USE IN STOCK MARKET DECISION MAKING ERIC D. BROWN DOCTORAL DISSERTATION FINAL DEFENSE
  • 2. AGENDA • Introduction • Previous Research • Research Summary • Research Model • Research Methodology • Data Analysis • Research Findings • Conclusions & Future Research
  • 3. INTRODUCTION • Sentiment has been an underlying factor in the investing world for many years. • Many companies create and track various types of sentiment • Consumer Confidence Index • Investors Intelligence Sentiment Index • American Association of Individual Investors Sentiment Survey • “Market Sentiment” • Rather than waiting days, weeks or months like current sentiment measures, can we use sentiment generated in real-time to improve trading performance and investment decisions? • Can we create a “sentiment of now” using social media or other user-generated content? • Can Twitter be used to determine the ‘sentiment of now’?
  • 4. INTRODUCTION • The goal of this study was to gain a more thorough understanding of Twitter content and the users that create it. • Can a Tweet convey sentiment with only 140 characters available? • If Tweets do convey some form of sentiment, can this sentiment be used in a predictive manner? • Can Twitter content and users be ‘tapped’ to build a methodology that identifies and evaluates likely investment opportunities?
  • 5. PREVIOUS RESEARCH • Wysocki (1998) – Found a strong positive correlation between volume of messages posted on message boards overnight and next day’s trading volume and stock returns. • Tumarkin and Whitelaw (2001) – Concluded that there are no predictive capabilities found within message board activity. • Antweiler and Frank (2004) – Used sentiment analysis to show strong positive correlation between message board posts and next day trading volume and volatility. Showed minor correlation between message board posts and next day price activity.
  • 6. PREVIOUS RESEARCH • Gu, et al (2006) – Found that aggregation of individual recommendations on stock message boards has no predictive power on future stock returns. • Das and Chen (2007) – Using sentiment analysis of messages on message boards, found no correlation between sentiment and individual stock price movement but did find positive correlation between the aggregate sentiment of a set of stocks and movement in the stock market. • Zhang (2009) – Studied the reputation of a message board poster and showed that a ‘better’ reputation was shared more widely and had a larger effect on sentiment.
  • 7. PREVIOUS RESEARCH • Bollen, Mao & Zeng (2010) – Using sentiment analysis, determined the ‘mood’ of the Twitter universe and then predicted the next day movement of the Dow Jones Industrial Average with 87.6% accuracy. • Accuracy isn’t everything. A Hedge Fund attempted to run their fund with this research and closed shop within a year. • Sprenger and Welpe (2010) – Focused on the S&P 100 stocks and the sentiment of Tweets regarding those stocks. Showed that sentiment of the company on Twitter closely follows market movements. This research also showed positive correlation between trading volume and Tweet volume.
  • 8. PREVIOUS RESEARCH Additional research in Sentiment Analysis of Twitter: • Bifet & Frank, 2010 – Sentiment Knowledge Discovery in Twitter Streaming Data. • Pak & Paroubek, 2010 - Twitter as a Corpus for Sentiment Analysis and Opinion Mining. • Romero, Meeder, & Kleinberg, 2010 - Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter • Castillo, Mendoza & Poblete, 2010 – Information Credibility on Twitter. • Diakopoulos & Shamma, 2010 – Characterizing Debate Performance via Aggregated Twitter Sentiment.
  • 9. RESEARCH SUMMARY The main questions driving this study were: • Can analysis of publicly available Tweets provide insight for investing decisions? • Do Tweets (and their subsequent sentiment) have any effect on movement in the stock market? • Can Tweets be mined and analyzed to predict daily movements in the stock market? • Does a Twitter user’s reputation have an effect on how people perceive and use their shared investing ideas?
  • 10. RESEARCH SUMMARY To address those main drivers, the following research questions were developed: • RQ-1: Using a given sector of the stock market, does the sentiment for that sector match the aggregated sentiment for the stocks that make up that sector? How well does the sentiment predict price / volume movement? • RQ-2: Are there specific stocks within a given sector that supply the majority of the sentiment for that sector? If so, do these stocks supply sentiment in correlation to the weighting given to them by ratings agencies (e.g., Standard & Poor’s)? • RQ-3: Are there times of the day or days of the week that provide a more accurate and informative sentiment for a stock or sector? • RQ-4: Are there specific users that provide more ‘weight’ to a sentiment of a stock or sector based on the users’ reputation?
  • 11. RESEARCH SUMMARY RQ-1 Hypotheses • H1a: The sentiment of a sector will match the overall averaged sentiment of all stocks within the sector. • H1a0: States that there will be no noticeable relationship between the sentiment of a sector and the overall averaged sentiment of stocks within the sector. • H1b: The sentiment of a sector can be used to predict the movement of all stocks in that sector. • H1b0: States that the sentiment of a sector will provide no predictive capability. • H1c: The sentiment of a sector or stock on any given day will provide a prediction for the next day’s movement in that stock. • H1c0: States that there will be no predictive capability on price and sentiment from day to day.
  • 12. RESEARCH SUMMARY RQ-2 Hypotheses • H2a: The sentiment of a stock within a given sector will affect the sentiment of the overall sector based on the relative market cap weighting of that stock. • H2a0: States that the sentiment of a stock is not correlated with the market cap weighting of the stock in that sector. • H2b: The stocks that provide the most weight toward the sentiment of a sector are also the stocks with the highest number of mentions on Twitter. • H2b0: States that there is no relationship between the number of mentions on Twitter and the effect that these stocks have on the sector sentiment.
  • 13. RESEARCH SUMMARY RQ-3 Hypothesis • H3: There is a difference in the effect that Tweets sent during non-market hours (i.e., evenings and weekends) and Tweets sent during market hours have on sentiment and price. • H30: States that there is no difference in the effect of Tweets during market hours and non-market hours.
  • 14. RESEARCH SUMMARY RQ-4 Hypothesis • H4: The number of followers of a Twitter user determines the effect that users’ Tweets will have on sentiment for a stock or sector. • H40: States that there is no relationship between the number of followers and sentiment on a stock or sector.
  • 15. RESEARCH SUMMARY Mapping Hypotheses to Research Questions
      RQ-1: Using a given sector of the stock market, does the sentiment for that sector match the aggregated sentiment for the stocks that make up that sector? How well does the sentiment predict price / volume movement?
          Hypotheses: H1a, H1b, H1c
      RQ-2: Are there specific stocks within a given sector that supply the majority of the sentiment for that sector? If so, do these stocks supply sentiment in correlation to the weighting given to them by ratings agencies (e.g., Standard & Poor’s)?
          Hypotheses: H2a, H2b
      RQ-3: Are there times of the day or days of the week that provide a more accurate and informative sentiment for a stock or sector?
          Hypothesis: H3
      RQ-4: Are there specific users that provide more ‘weight’ to a sentiment of a stock or sector based on the users’ reputation?
          Hypothesis: H4
  • 16. RESEARCH MODEL [Diagram] Twitter Sentiment Analysis For Stocks and Sectors: Stock & Sector Analysis (H1a, H1b, H1c); Sentiment Weighting within Sectors (H2a, H2b); Day / Time Analysis (H3); User Reputation Analysis of Twitter Users (H4). Outcomes: Information Content of Tweets; Correlations with Stock Market Prices; Predictive Nature of Tweets.
  • 17. RESEARCH METHOD [Diagram] Twitter Data Collection; Sentiment Analysis; User Analysis; Stock Market Data; Price Analysis; Correlation of Twitter Sentiment with Price; Reputation of Twitter user. Outcome: Understanding of predictive capabilities of Twitter Sentiment and the effect of user reputation for investing decisions.
  • 18. RESEARCH METHODOLOGY Data Collection • Twitter API to collect tweets (tweet, sender, date, time) • Tweets referencing companies and sectors are collected and stored in a MySQL database for future study • Using the nomenclature made popular by StockTwits (www.stocktwits.com). Example: The stock symbol for Apple is AAPL. Users following the StockTwits nomenclature add a “$” to the symbol – “$AAPL”. • EODData.com market feed to gather Stock Market data (price and volume)
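The "$" nomenclature described on the data collection slide can be illustrated with a short sketch. This is not the collection code used in the study; the helper name `extract_cashtags` and the assumption that a symbol is one to five letters are both hypothetical choices made for illustration.

```python
import re

# Assumption: a cashtag is "$" followed by 1-5 letters, not preceded by
# another word character. Real ticker formats can be broader than this.
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_])\$([A-Za-z]{1,5})\b")


def extract_cashtags(tweet: str) -> list:
    """Return the upper-cased symbols written as cashtags, in order of appearance."""
    return [symbol.upper() for symbol in CASHTAG_RE.findall(tweet)]


# Dollar amounts such as "$5" are ignored because they contain no letters.
print(extract_cashtags("Long $AAPL and $xom, paid $5 for lunch"))
```

A filter like this could decide which collected Tweets reference a company or sector before they are stored.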
  • 19. RESEARCH METHODOLOGY Market Data • This study reviewed the Energy (XLE) and Consumer Staples (XLP) sectors. • These sectors were chosen to get different types of companies. • Both have the same number of symbols in the sector. • Used XLE and XLP Exchange Traded Funds (ETF’s) • ETF’s are a ‘proxy’ for owning each company covered by the ETF. • ETF’s are, generally, a weighted index made up of each company within the sector. The company’s stock price is weighted based on the market cap of the company. • ETF’s provide a method to diversify and/or invest in a sector or industry without owning a large portfolio of companies.
  • 20. RESEARCH METHODOLOGY Market Data • XLE (top chart) shows a non-trending, volatile market • Gains for the year = $1.86 per share or 2.77% gain • 42 companies make up the XLE sector • XLP (bottom chart) shows an upward-trending market • Gains for the year = $3.05 per share or 10.05% gain • 42 companies make up the XLP sector
  • 21. RESEARCH METHODOLOGY Sentiment Analysis • Using the Python programming language and the Natural Language Toolkit’s implementation of the Bayesian text classification system, algorithms were implemented to determine sentiment found within Tweets • For Bayesian classification, a data set was needed to ‘train’ the classifier to categorize data appropriately. • To create the training data set, 10,000 Tweets were randomly selected from the collection of Tweets. • Each Tweet was ‘cleansed’ to remove identifying Twitter user information, Twitter hash-tags and stock symbols. • Each Tweet was then manually reviewed and assigned a category
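The 'cleansing' step described above might look like the following minimal sketch. The exact rules used in the study are not given, so the three patterns and the lower-casing step are assumptions, and `cleanse` is a hypothetical helper name.

```python
import re

# Assumed patterns for the three kinds of tokens the slide says were removed.
_PATTERNS = [
    re.compile(r"@\w+"),         # identifying Twitter user mentions
    re.compile(r"#\w+"),         # hash-tags
    re.compile(r"\$[A-Za-z]+"),  # stock symbols written as cashtags
]


def cleanse(tweet: str) -> str:
    """Strip mentions, hash-tags and cashtags, then normalise whitespace and case."""
    text = tweet
    for pattern in _PATTERNS:
        text = pattern.sub(" ", text)
    # Lower-casing is an assumption, chosen to match the sample training Tweets.
    return " ".join(text.split()).lower()


print(cleanse("@trader1 $XLP Consumer staples outperforming #stocks"))
```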
  • 22. RESEARCH METHODOLOGY Sentiment Analysis (cont) • Tweets were categorized as • Bullish: denotes a positive sentiment. • Bearish: denotes a negative sentiment. • Neutral for those Tweets that do not convey any discernible sentiment. • Spam for those Tweets that aren’t delivering market information.
  • 23. RESEARCH METHODOLOGY Training Dataset Samples Bullish • consumer staples outperforming the broader market, expect this to continue Bearish • if dexia doesn't get a bailout, markets will plunge%+ in a session, it is a lot bigger than lehman ever was. Neutral • what to expect from the big google music announcement tomorrow Spam • unlimited free tv shows on your pc, free channels
  • 24. RESEARCH METHODOLOGY Sentiment Analysis (cont) • 1,000 Tweets of each classification were used in the training dataset • Using a built-in accuracy check algorithm, the training dataset provided an 89.35% classification accuracy • With the training data set created, each Tweet was analyzed and assigned one of the four categories. • Only Tweets assigned Bullish or Bearish were considered during this study. • Only Tweets mentioning the Energy Sector (XLE) and Consumer Staples Sector (XLP) ETF’s and the symbols that make up the sectors were analyzed
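To make the train-then-classify idea concrete, here is a small pure-Python naive Bayes sketch. It is a stand-in, not the Natural Language Toolkit classifier the study used: it is a multinomial model with add-one smoothing, the class name `TinyNaiveBayes` is hypothetical, and the four training strings are abbreviated from the sample Tweets on the Training Dataset Samples slide rather than taken from the real 4,000-Tweet training set.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in text.split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, abbreviated from the slide's sample Tweets.
train = [
    ("consumer staples outperforming the broader market, expect this to continue", "Bullish"),
    ("if dexia doesn't get a bailout, markets will plunge in a session", "Bearish"),
    ("what to expect from the big google music announcement tomorrow", "Neutral"),
    ("unlimited free tv shows on your pc, free channels", "Spam"),
]
clf = TinyNaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("markets will plunge"), clf.predict("free tv channels"))
```

With only four training examples this says nothing about real accuracy; it only shows the mechanics of assigning one of the four categories.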
  • 26. RESEARCH METHODOLOGY Converting Qualitative to Quantitative • To utilize the sentiment found within Tweets as a market ‘signal’, a quantitative measure was needed. • The Bear/Bull ratio was created by counting the total number of Tweets with Bearish sentiment during a period and dividing that number by the total number of Tweets with Bullish sentiment during a period. • The Bear/Bull ratio follows the Put/Call ratio that is widely known and followed to measure sentiment using the buying and selling of Options in the stock market. • The Put/Call ratio is calculated by dividing the number of Puts (bearish activity) by the number of Calls (bullish activity).
  • 27. RESEARCH METHODOLOGY Converting Qualitative to Quantitative (cont) The Bear/Bull Ratio is used to describe the overall sentiment for a symbol, sector or overall market using a single value. For the Bear/Bull Ratio: • A value of 1.0 would equate to an equal number of Bearish and Bullish sentiment Tweets. • A value greater than 1.0 would provide evidence that there are more Bearish Tweets than Bullish Tweets during the measured time period. • A value less than 1.0 would provide evidence that there are more Bullish Tweets than Bearish Tweets in a given time period.
  • 28. RESEARCH METHODOLOGY Example of Daily Bear/Bull Ratio and Closing Price for XLE ETF
      Date        Bearish Tweets   Bullish Tweets   Bear/Bull Ratio   XLE Close
      5/1/2012    13               7                1.86              69.07
      5/2/2012    5                5                1.00              67.95
      5/3/2012    7                13               0.54              66.82
      5/4/2012    9                13               0.69              65.29
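The Bear/Bull ratio is simple enough to reproduce directly. The sketch below recomputes the four example values for the XLE ETF from the Tweet counts on this slide; the function name and the choice to return `None` when there are no Bullish Tweets are assumptions, since the slides do not say how that case was handled.

```python
from typing import Optional


def bear_bull_ratio(bearish: int, bullish: int) -> Optional[float]:
    """Bearish Tweet count divided by Bullish Tweet count for one period."""
    if bullish == 0:
        return None  # assumption: the ratio is treated as undefined
    return bearish / bullish


# (date, bearish count, bullish count) from the example slide
rows = [
    ("5/1/2012", 13, 7),
    ("5/2/2012", 5, 5),
    ("5/3/2012", 7, 13),
    ("5/4/2012", 9, 13),
]
for date, bear, bull in rows:
    print(date, round(bear_bull_ratio(bear, bull), 2))
```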
  • 29. RESEARCH METHODOLOGY Social Network Analysis • An analysis of Twitter users was performed to determine whether a Tweet sent by a user with more followers provided more ‘weight’ to the sentiment of the symbol mentioned in that Tweet. • Using the concept of ReTweets, analysis was performed to determine how far a user’s tweet travels. • A ReTweet is simply when a user ‘forwards’ a Tweet by another user.
  • 30. DATA ANALYSIS • Period of study – January 2012 through December 2012 (360 Days). • During the collection period, a total of approximately 2.6 million Tweets were collected from a total of 473,090 Twitter users. • For this study, the following data was used: • For XLE, 130,611 Tweets from 13,067 Twitter users. • Average of 362.81 Tweets per day. • Average of 9.99 Tweets per user. • 1.09% of users sent 50% of Tweets. • One user sent 6.67% of Tweets. • For XLP, 144,214 Tweets from 37,760 Twitter users. • Average of 400.59 Tweets per day. • Average of 3.82 Tweets per user. • 1.00% of users sent 50% of Tweets. • One user sent 3.43% of Tweets.
  • 31. DATA ANALYSIS Description of Tweets for all symbols in XLE
      Category                                  Count      Percentage
      Number of Total Tweets                    130,611
      Number of Bullish Tweets                  45,883     35.12%
      Number of Bearish Tweets                  30,680     23.49%
      Number of Neutral Tweets                  50,886     38.95%
      Number of Spam Tweets                     3,482      2.67%
      Number of Tweets with no classification   0          0
  • 32. DATA ANALYSIS Description of Tweets for all symbols in XLP
      Category                                  Count      Percentage
      Number of Total Tweets                    144,214
      Number of Bullish Tweets                  32,315     22.41%
      Number of Bearish Tweets                  22,568     15.65%
      Number of Neutral Tweets                  60,572     42.00%
      Number of Spam Tweets                     28,757     19.94%
      Number of Tweets with no classification   2          0.001%
  • 33. RESEARCH FINDINGS H1a: The sentiment of a sector will match the overall averaged sentiment of all stocks within the sector. • H1a0 states that there will be no noticeable relationship between the sentiment of a sector and the overall averaged sentiment of stocks within the sector. • For the analysis, the XLE and XLP ETF Bear/Bull ratios were compared with the respective aggregated Bear/Bull ratios from all symbols making up each sector.
  • 34. RESEARCH FINDINGS XLE Data: • The XLE ETF averaged fewer than 5 Bullish Tweets per day and just over 6 Bearish Tweets per day • Compare that to the aggregated counts of all 42 symbols that make up the XLE sector: • Bullish Tweets average approximately 150 Tweets per day • Bearish Tweets average almost 89 Tweets per day. XLP Data: • The XLP ETF averaged fewer than 3 Bullish Tweets per day and just over 2 Bearish Tweets per day • Compare that to the aggregated counts of all 42 symbols that make up the XLP sector: • Bullish Tweets average approximately 90 Tweets per day • Bearish Tweets average almost 50 Tweets per day
  • 35. RESEARCH FINDINGS XLE Distribution • With such a low average count of Tweets per day, some concern exists that the Central Limit Theorem isn't satisfied • Reviewing the distributions, it is clear that the XLE Bear/Bull ratio (bottom left) is not normally distributed while the Aggregated Symbol Bear/Bull ratio (bottom right) is. [Figure: histogram of XLE Bear_Bull (left); histogram of aggregated XLE Bear_Bull with normal fit (right): Mean 0.6156, StDev 0.2066, N 366]
  • 36. RESEARCH FINDINGS XLP Distribution • With such a low average count of Tweets per day, some concern exists that the Central Limit Theorem isn't satisfied • Reviewing the distributions, it is clear that the XLP Bear/Bull ratio (bottom left) is not normally distributed while the Aggregated Symbol Bear/Bull ratio (bottom right) is. [Figure: histogram of XLP Bear_Bull (left); histogram of XLP Sector Bear_Bull with normal fit (right): Mean 0.5609, StDev 0.2581, N 366]
  • 37. RESEARCH FINDINGS Based on the significant differences in distributions and insufficient number of daily observations for either XLE or XLP ETF's: • There is not enough evidence available on a daily basis to reject the null (H1a0)
  • 38. RESEARCH FINDINGS H1b: The sentiment of a sector can be used to predict the movement of all stocks in that sector. • H1b0 states that the sentiment of a sector will provide no predictive capability. H1c: The sentiment of a sector or stock on any given day will provide a prediction for the next day’s movement in that stock. • H1c0 states that there will be no predictive capability on price and sentiment from day to day.
  • 39. RESEARCH FINDINGS As with H1a, given the different distributions and the insufficient number of daily observations for the XLE and XLP ETF's found previously: • There is not enough evidence available on a daily basis for individual symbols to reject the null for both H1b and H1c. Although there is insufficient evidence to reject H1b0 and H1c0: • A new definition of sector sentiment was adopted and used to continue the analysis. • By using the aggregated sentiment of a sector as the Bear/Bull ratio, additional analysis was performed.
  • 40. RESEARCH FINDINGS • Using the aggregated Bear/Bull ratio for the sectors covered by XLE and XLP, a regression analysis was performed to analyze whether the aggregated Bear/Bull ratio could predict daily price movement for the XLE and XLP ETF’s and the symbols within each sector. • To perform regression analysis on stock market data, the time- series data was transformed from a non-stationary series into a stationary series. • This transformation was accomplished by taking daily closing price and creating a percentage change value from one day to the next
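The percentage-change transformation can be sketched in a few lines. The closing prices below are the four XLE closes from the example Bear/Bull table; the function name `pct_change` is a hypothetical choice.

```python
def pct_change(closes):
    """Day-over-day percentage change; the result has one fewer element than the input."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(closes, closes[1:])]


xle_closes = [69.07, 67.95, 66.82, 65.29]  # 5/1/2012 through 5/4/2012
print([round(change, 2) for change in pct_change(xle_closes)])
```

Working with changes rather than price levels is what removes the trend from the series before regression.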
  • 41. RESEARCH FINDINGS Regression Analysis Equation • The regression equation used throughout the study: $P_i = a + b\,i_i + \varepsilon_i$ (1) where: $P_i$ is the predicted price at observation $i$ and $i_i$ is the Bear/Bull ratio at observation $i$
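As an illustration of equation (1), the intercept a and slope b of a single-variable regression can be estimated by ordinary least squares. The slides do not state which estimator or software produced the study's regressions, so least squares is an assumption here, and the data points are made up purely to show the mechanics.

```python
def fit_ols(x, y):
    """Least-squares estimates (a, b) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up example: Bear/Bull ratios and a response generated from a = 1, b = -2.
ratios = [0.5, 0.8, 1.0, 1.2, 1.5]
response = [1.0 - 2.0 * r for r in ratios]
a, b = fit_ols(ratios, response)
print(round(a, 6), round(b, 6))
```

A negative b means a higher Bear/Bull ratio goes with a lower predicted value.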
  • 42. RESEARCH FINDINGS Regression analysis (Cont) • The majority of correlations are low • Durbin-Watson values are between 1.7 and 2.3, which points to little to no autocorrelation in the residuals. This isn’t a surprise since we transformed the data into a stationary series. • The signs of the correlation coefficients are negative, which aligns with the idea behind the Bear/Bull ratio. • Most symbols have very good F-statistics and correlations that are statistically significant.
  • 43. RESEARCH FINDINGS Regression analysis (Cont) • For XLE: • 36 out of 43 symbols have a statistically significant correlation with 95% significance between the transformed daily close and aggregated Bear/Bull. • For XLP: • 5 out of 43 symbols have a statistically significant correlation with 95% significance between the transformed daily close and aggregated Bear/Bull.
  • 44. RESEARCH FINDINGS Regression analysis (Cont) • To test the regression analysis, the data set was split into two parts to create an in-sample and out-of-sample data set. • The in-sample data set was used to run the regression analysis and the out-of-sample data set was used to run predictions of price movement to determine how well the model works. • The in-sample data set consisted of 188 days of data while the out-of-sample data set consisted of 90 days of data. • In the finance world, it is standard practice to use 20% to 30% of data for out-of-sample data.
  • 45. RESEARCH FINDINGS Regression analysis (Cont) • Using the regression analysis output and the in-sample / out-of-sample data, the regression models were tested for accuracy. • To find the accuracy measurement, the directional prediction of the Bear/Bull ratio was compared to the direction of the percentage change of the stock. • Only those symbols with statistically significant correlations at the 95% confidence level were tested.
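One way to read the accuracy measurement is as the share of out-of-sample days on which the predicted direction matches the actual direction. The sketch below assumes that reading; treating a change of exactly zero as "not up" is an additional assumption, and the numbers are made up.

```python
def directional_accuracy(predicted, actual):
    """Fraction of observations where predicted and actual changes share a direction."""
    hits = sum(1 for p, r in zip(predicted, actual) if (p > 0) == (r > 0))
    return hits / len(actual)


# Made-up predicted and realised percentage changes for four days.
predicted = [0.5, -0.2, 0.1, -0.4]
actual = [1.0, 0.3, 0.2, -0.1]
print(directional_accuracy(predicted, actual))
```

An accuracy near 50% on a measure like this is what a coin flip would be expected to achieve.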
  • 46. RESEARCH FINDINGS Regression analysis (Cont) • For XLE: • 24 symbols with accuracy greater than or equal to 50%. • Average accuracy is 51.79%. • Median accuracy is 51.67%. • Standard deviation is 4.73%. • For XLP: • 3 symbols with accuracy greater than or equal to 50%. • Average accuracy is 51.57%. • Median accuracy is 52.22%. • Standard deviation is 3.95%.
  • 47. RESEARCH FINDINGS Outcome of H1a, H1b and H1c • As stated previously: • There is insufficient evidence available on a daily basis to reject the null for H1a. • By the original definition of sentiment, there is insufficient evidence available on a daily basis to reject the null for both H1b and H1c. • Using the modified definition of sentiment to use aggregated sentiment: • There is limited evidence to reject the null for H1b and H1c.
  • 48. RESEARCH FINDINGS H2a: The sentiment of a stock within a given sector will affect the sentiment of the overall sector based on the relative market cap weighting assigned to that stock within the sector. • H2a0 states that the sentiment of a stock is not correlated with the market cap weighting of the stock in that sector. H2b: The stocks that provide the most weight toward the sentiment of a sector are also the stocks with the highest number of mentions on Twitter. • H2b0 states that there is no relationship between the number of mentions on Twitter and the effect that these stocks have on the sector sentiment.
  • 49. RESEARCH FINDINGS Analysis for H2a • The daily sentiment reading for each symbol was calculated then multiplied by the index weighting and then regression analysis was performed. • For example, ExxonMobil (XOM) comprised ~18% of the XLE ETF during the study • XOM’s tweet volume was multiplied by this index weighting to build a weighted sentiment Bear/Bull ratio
  • 50. RESEARCH FINDINGS Regression analysis for H2a • For XLE: • 4 out of 43 symbols had a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 3 symbols with accuracy greater than or equal to 50% • Average accuracy is 53.33% • Median accuracy is 55.00% • Standard deviation is 3.93% • For XLP: • 2 out of 43 symbols had a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 1 symbol with accuracy greater than or equal to 50% • Average accuracy is 49.44% • Median accuracy is 49.44% • Standard deviation is 0.56%
  • 51. RESEARCH FINDINGS Analysis for H2b • As with H2a, a regression analysis was performed. • A weighting mechanism was developed to assign a weight to each symbol dependent on its contribution to the number of Tweets per day. • This weighted contribution was then used to build the aggregated sentiment signal, which was then used for regression analysis as described previously.
  • 52. RESEARCH FINDINGS Regression analysis for H2b • For XLE: • 13 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 10 symbols with accuracy greater than or equal to 50% • Average accuracy is 53.08%. • Median accuracy is 53.33%. • Standard deviation is 4.14%. • For XLP: • 2 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 2 symbols with accuracy greater than or equal to 50%. • Average accuracy is 51.67%. • Median accuracy is 51.67%. • Standard deviation is 0.56%.
  • 53. RESEARCH FINDINGS Outcome of H2a and H2b • There is insufficient evidence available on a daily basis to reject the null for H2a. • There is limited evidence to support rejecting the null for H2b.
  • 54. RESEARCH FINDINGS H3: There is a difference in the effect that Tweets sent during non- market hours (i.e., evenings and weekends) and Tweets sent during market hours have on sentiment and price. • H30 states that there is no difference in the effect of Tweets sent during market hours and non-market hours. Analysis for H3 • Tweets were split into two categories to describe whether the Tweets were sent during trading hours or non-trading hours. • Trading hours: For equity and index markets in the U.S., trading hours are defined as 8:30 AM to 3:00 PM Central Time, Monday through Friday. • Non-trading hours: For equity and index markets in the US, non-trading hours are defined as any time outside of the 8:30 AM to 3:00 PM Central time including evenings and weekends.
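The trading-hours split could be implemented along these lines. The sketch assumes each Tweet timestamp has already been converted to Central Time, ignores market holidays, and treats 3:00 PM itself as outside trading hours; none of these details are specified on the slide.

```python
from datetime import datetime, time

MARKET_OPEN = time(8, 30)   # 8:30 AM Central
MARKET_CLOSE = time(15, 0)  # 3:00 PM Central


def is_trading_hours(ts: datetime) -> bool:
    """True if a Central Time timestamp falls on a weekday between open and close."""
    if ts.weekday() >= 5:  # Saturday or Sunday
        return False
    return MARKET_OPEN <= ts.time() < MARKET_CLOSE


print(is_trading_hours(datetime(2012, 5, 1, 9, 0)))   # Tuesday morning
print(is_trading_hours(datetime(2012, 5, 5, 10, 0)))  # Saturday morning
```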
  • 55. RESEARCH FINDINGS Regression analysis for H3: • XLE Trading Hours • 39 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 24 symbols with accuracy greater than or equal to 50%. • Average accuracy is 51.06%. • Median accuracy is 51.11%. • Standard deviation is 3.09%. • XLE Non-Trading Hours • 36 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 20 symbols with accuracy greater than or equal to 50%. • Average accuracy is 49.85%. • Median accuracy is 50.00%. • Standard deviation is 4.16%.
  • 56. RESEARCH FINDINGS Regression analysis for H3: • XLP Trading Hours • 5 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 3 symbols with accuracy greater than or equal to 50%. • Average accuracy is 49.33%. • Median accuracy is 51.11%. • Standard deviation is 5.56%. • XLP Non-Trading Hours • 4 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 2 symbols with accuracy greater than or equal to 50%. • Average accuracy is 50.23%. • Median accuracy is 49.44%. • Standard deviation is 4.80%.
  • 57. RESEARCH FINDINGS Outcome of H3 • There is evidence available on a daily basis to reject the null for H3 for the XLE sector but not for the XLP sector. • For XLE, Tweets sent during trading hours provided a slight improvement in accuracy over those sent during non-trading hours.
  • 58. RESEARCH FINDINGS H4: The number of followers of a Twitter user determines the effect that users’ Tweets will have on sentiment for a stock or sector. • H40 states that there is no relationship between the number of followers and sentiment on a stock or sector. Analysis for H4 • Recall that: • XLE had 130,611 Tweets and 13,067 unique users. • XLP had 144,214 Tweets and 37,760 unique users. • No single user had more than 30 Tweets per day. • XLE's most prolific sender of Tweets, on average, sent 24.19 Tweets per day. • XLP's most prolific sender of Tweets, on average, sent 13.85 Tweets per day.
  • 59. RESEARCH FINDINGS Analysis for H4 • To satisfy the Central Limit Theorem, the Top 50 users sorted by number of followers for each sector were selected in order to get an average of 30 Tweets per day. • The top 50 users by number of followers comprised just 8.41% of total Tweets for XLE and 9.06% of total Tweets for XLP • The Tweets by the Top 50 users by number of followers for both XLE and XLP were combined to create a Bear/Bull ratio for each sector. • This Top 50 Bear/Bull ratio was used in regression analysis using the regression equation.
  • 60. RESEARCH FINDINGS Regression analysis for H4 • For XLE: • 38 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 21 symbols with accuracy greater than or equal to 50%. • Average accuracy is 49.39%. • Median accuracy is 50.00%. • Standard deviation is 4.79%. • For XLP: • 4 out of 43 symbols have a statistically significant correlation with 95% significance between daily close and aggregated Bear/Bull. • 3 symbols with accuracy greater than or equal to 50%. • Average accuracy is 49.72%. • Median accuracy is 51.11%. • Standard deviation is 3.18%.
  • 61. RESEARCH FINDINGS Outcome of H4 There is insufficient evidence available on a daily basis to reject the null for H4 for both individual users and the Top 50 users.
  • 62. RESEARCH FINDINGS Hypothesis Summary Table
      H1a: Sector ETF sentiment will match the aggregated sentiment.
          Outcome: Insufficient evidence to reject the null hypothesis.
      H1b: Sector ETF sentiment can be used to predict market movement for all sector stocks.
          Outcome: Insufficient evidence to reject the null hypothesis.
      H1c: Sentiment can be used to predict next day price movement.
          Outcome: Insufficient evidence to reject the null hypothesis.
      H2a: Stocks will affect sentiment based on their index weighting.
          Outcome: Insufficient evidence to reject the null hypothesis.
      H2b: Stocks will affect sentiment based on how often they are mentioned.
          Outcome: There is limited evidence to support rejecting the null.
      H3: Tweets sent during trading and non-trading hours will affect sentiment differently.
          Outcome: There is limited evidence to support rejecting the null.
      H4: The number of followers of a Twitter user will affect sentiment.
          Outcome: Insufficient evidence to reject the null hypothesis.
  • 63. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy • Rather than try to predict daily movements, can the Bear/Bull ratio be used in other ways? • During this study, the idea of "extremes" in the Bear/Bull ratio was investigated to determine whether they would identify proper entry and exit signals • Based on the contrarian approach to investing where extreme sentiment is used as a signal to enter in the opposite direction • Can Bear/Bull extremes be used to enter the market and provide adequate returns?
  • 64. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy To find extremes, a simple approach was used: • Identify values at or above the 90th percentile as Bearish Extremes and values at or below the 10th percentile as Bullish Extremes. • A trading signal was generated if the Bear/Bull ratio closed at or above the Bearish Extreme value or at or below the Bullish Extreme value. The extreme values for XLE and XLP are: • XLE: • Bearish Extreme: >= 0.90 • Bullish Extreme: <= 0.43 • XLP: • Bearish Extreme: >= 0.90 • Bullish Extreme: <= 0.33
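One reading of the extreme rule on this slide is a percentile cut-off on the history of Bear/Bull ratios, with a contrarian signal whenever the ratio reaches either cut-off. The sketch below illustrates that reading only; the function names `extreme_thresholds` and `signal`, and the choice of the "inclusive" percentile method, are assumptions and not taken from the study.

```python
from statistics import quantiles


def extreme_thresholds(ratios, lower_pct=10, upper_pct=90):
    """Return (bullish_extreme, bearish_extreme) cut-offs for a ratio series.

    Assumption: extremes are defined as the 10th and 90th percentiles.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(ratios, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def signal(ratio, bullish_extreme, bearish_extreme):
    """Contrarian signal: buy on a bearish extreme, short on a bullish one."""
    if ratio >= bearish_extreme:
        return "buy"
    if ratio <= bullish_extreme:
        return "short"
    return None  # no extreme, no trade


# Usage with the XLE cut-offs quoted on the slide (0.43 and 0.90).
print(signal(0.95, bullish_extreme=0.43, bearish_extreme=0.90))
print(signal(0.40, bullish_extreme=0.43, bearish_extreme=0.90))
```

Keeping the threshold calculation separate from the signal function lets the published cut-offs be plugged in directly, as in the usage lines, without access to the underlying ratio history.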
  • 65. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy • Using TradeStation, a highly regarded professional investing platform, an investing strategy was developed using Bear/Bull ratio extreme values. • Using the Aggregated Bear/Bull ratio, the strategy was tested against the XLE and XLP ETFs as well as each of the symbols within the sectors. • This strategy was compared to a simple Buy and Hold strategy and a Random Entry strategy. • Buy and Hold means to buy a stock on Day 1 of the test period and sell it on the last day. • Random Entry means to enter at random times in the market.
  • 66. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy • Highlights of the investing strategy: • Test period: August 21, 2012 to December 31, 2012 • Entry criteria (if not already in a trade): • Bearish Extreme = Buy • Bullish Extreme = Short • Direction: Long & Short • Number of Shares: 500 • Holding period: 2 days • Commission: $5 per trade • Slippage: $0.10 per trade • Slippage was used to simulate non-perfect entries
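The strategy parameters on this slide can be sketched as a small backtest loop. This is a simplified illustration, not the TradeStation strategy used in the study. It assumes entry at the close of the signal day, exit at the close two trading days later, and commission and slippage charged as flat dollar amounts on both the entry and the exit order. The function name `backtest` and its arguments are hypothetical.

```python
def backtest(closes, ratios, bullish_extreme, bearish_extreme,
             shares=500, hold_days=2, commission=5.0, slippage=0.10):
    """Return net profit in dollars for the contrarian extreme strategy.

    closes and ratios are equal-length daily series. Costs are assumed
    to apply once per order (entry and exit), as flat dollar amounts.
    """
    pnl = 0.0
    i = 0
    while i + hold_days < len(closes):
        if ratios[i] >= bearish_extreme:
            direction = 1       # bearish extreme -> buy (long)
        elif ratios[i] <= bullish_extreme:
            direction = -1      # bullish extreme -> short
        else:
            i += 1              # no signal today, check the next day
            continue
        entry_price = closes[i]
        exit_price = closes[i + hold_days]
        pnl += direction * (exit_price - entry_price) * shares
        pnl -= 2 * (commission + slippage)  # entry order + exit order
        i += hold_days          # no new entry while a trade is open
    return pnl


# Usage: one long trade triggered by a bearish extreme on the first day.
closes = [100.0, 101.0, 102.0, 103.0, 104.0]
ratios = [0.95, 0.50, 0.50, 0.50, 0.50]
print(backtest(closes, ratios, bullish_extreme=0.43, bearish_extreme=0.90))
```

Advancing the index by the holding period enforces the "if not already in a trade" condition: signals that occur while a position is open are ignored.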
  • 67. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy Investing strategy outcomes for XLE • XLE: • Bear/Bull Sentiment Return: 4.85% • Bear/Bull Extreme Accuracy: 54.55% • Buy and Hold Return: -1.07% • Random Entry Return: -3.62% • All Symbols in XLE (Average): • Bear/Bull Sentiment Return: 3.86% • Bear/Bull Extreme Accuracy: 54.16% • Buy and Hold Return: 1.09% • Random Entry Return: -2.61%
  • 68. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy Investing strategy outcomes for XLP • XLP: • Bear/Bull Sentiment Return: -1.39% • Bear/Bull Extreme Accuracy: 33.33% • Buy and Hold Return: -2.10% • Random Entry Return: -2.52% • All Symbols in XLP (Average): • Bear/Bull Sentiment Return: -2.19% • Bear/Bull Extreme Accuracy: 34.60% • Buy and Hold Return: -1.87% • Random Entry Return: -1.64%
  • 69. RESEARCH FINDINGS Using the Bear/Bull Ratio in an Investment Strategy • For the XLE ETF, the strategy produced a 578 basis point improvement over buy and hold returns and a 723 basis point improvement over random entry returns. • Across all symbols in the XLE sector, the strategy produced a 277 basis point improvement over buy and hold returns and a 511 basis point improvement over random entry returns. • For the XLP ETF, the strategy produced a 71 basis point improvement over buy and hold returns and a 113 basis point improvement over random entry returns. • Across all symbols in the XLP sector, the strategy produced a 32 basis point decrease in performance relative to buy and hold returns and a 55 basis point decrease relative to random entry returns.
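As a worked check of the basis point comparisons, the sketch below recomputes the differences from the percentage returns tabulated on slides 67 and 68 (1 percentage point = 100 basis points). The helper name `basis_points` and the dictionary layout are illustrative assumptions.

```python
def basis_points(strategy_pct, benchmark_pct):
    """Difference between two percentage returns, in basis points."""
    return round((strategy_pct - benchmark_pct) * 100)


# Returns in percent, as tabulated on slides 67 and 68.
results = {
    "XLE":             {"sentiment": 4.85,  "buy_hold": -1.07, "random": -3.62},
    "XLE all symbols": {"sentiment": 3.86,  "buy_hold": 1.09,  "random": -2.61},
    "XLP":             {"sentiment": -1.39, "buy_hold": -2.10, "random": -2.52},
    "XLP all symbols": {"sentiment": -2.19, "buy_hold": -1.87, "random": -1.64},
}

for name, r in results.items():
    vs_hold = basis_points(r["sentiment"], r["buy_hold"])
    vs_random = basis_points(r["sentiment"], r["random"])
    print(f"{name}: {vs_hold} bp vs buy and hold, {vs_random} bp vs random entry")
```

From the tabulated returns this gives 71 and 113 bp for XLP, -32 and -55 bp for all XLP symbols, and 277 bp for all XLE symbols versus buy and hold, all matching the slide. The remaining XLE figures come out as 592 and 847 bp for the ETF and 647 bp for all symbols versus random entry, which differ from the 578, 723 and 511 bp quoted on the slide. The source does not indicate which set of numbers is correct.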
  • 70. CONCLUSIONS AND FUTURE RESEARCH • Due to the lower volume of Tweets for most symbols, it is recommended to look at methods to aggregate sentiment rather than use individual symbol sentiment for those symbols with a small number of Tweets. • Negative correlation between sentiment and next day price movement points toward future analysis of using sentiment as a contrarian indicator using the Bear/Bull ratio construct. • Stocks with higher volatility appear to be better candidates for use with Twitter Sentiment • XLE and the symbols that make up the sector were more volatile than XLP • XLE Bear/Bull ratios were more accurate than XLP • Tweets sent during market hours appear to provide more valuable information relative to market movements than those sent during non-market hours.
  • 71. CONCLUSIONS AND FUTURE RESEARCH • The idea of a sentiment ‘extreme’ was shown to be a potentially useful approach to using sentiment as a predictor for price movement. • The number of followers a user has on Twitter does not appear to have any correlation with how that user’s Tweets affect price on the symbols studied. • Stocks that exhibit high trading volume on a regular basis also exhibit high Tweet volume on a regular basis. • A small number of users send the majority of Tweets discussing stocks and ETFs. • Approximately 1% of users sent 50% of Tweets during the study.
  • 72. CONCLUSIONS AND FUTURE RESEARCH Avenues for Future Research • Further research using Twitter sentiment extremes for investing signals. • Additional research into classification methods to find faster or more effective classification techniques. • Further analysis of Tweet volume on a per-symbol, sector and market basis compared to stock market volume. • Further analysis of the use of aggregated sentiment across sectors or multiple symbols. • Further analysis of intraday sentiment and market correlations. • Further analysis of longer time periods (weekly, monthly) and market correlations. • Further analysis of the interaction of volatility and Twitter sentiment.
  • 73. QUESTIONS? Any Questions? Feel free to reach out to me afterwards with comments or questions: eric@ericbrown.com (918) 928-2887

Editor's Notes

  1. At the end of the slide: With the data in mind, let me walk you through the findings.
  2. Stationary vs non-stationary: Transformation was performed to remove trend and seasonality from the non-stationary data and to remove any ‘time’ issues from the data. This means that a stationary dataset will look the same regardless of when you look at it. This isn’t true with non-stationary data.
  3. P = price; I = sentiment index; a = a constant; B = coefficient; E = the error term.
  4. From http://www.investopedia.com/terms/d/durbin-watson-statistic.asp: Autocorrelation can be a significant problem in analyzing historical pricing information if one does not know to look out for it. For instance, since stock prices tend not to change too radically from one day to another, the prices from one day to the next could potentially be highly correlated, even though there is little useful information in this observation. In order to avoid autocorrelation issues, the easiest solution in finance is to simply convert a series of historical prices into a series of percentage-price changes from day to day.
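The transformation described in this note, together with the Durbin-Watson statistic it refers to, can be sketched in a few lines. The statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals: values near 2 suggest little first-order autocorrelation, and values near 0 suggest strong positive autocorrelation. The function names `pct_changes` and `durbin_watson` are illustrative and not taken from the study.

```python
def pct_changes(prices):
    """Convert a price series into day-to-day percentage changes."""
    return [(curr - prev) / prev * 100
            for prev, curr in zip(prices, prices[1:])]


def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Usage: three closing prices become two percentage changes.
print(pct_changes([100.0, 110.0, 99.0]))
# Residuals that alternate in sign give a statistic well above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```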
  5. Highlight the standard deviations: XLE: accuracy can be expected to fall between 47.06% and 56.52% approximately two-thirds of the time. XLP: accuracy can be expected to fall between 47.61% and 55.51% approximately two-thirds of the time.