Analysis of Twitter Messages for
Sentiment and Insight for use in Stock
Market Decision Making
Eric D. Brown
Dakota State University
Introduction
• Agenda
  •   Background
  •   Literature Review
  •   Research Summary & Model
  •   Research Questions & Hypotheses
  •   Contributions
  •   Research Methods
  •   Preliminary Results
  •   Discussion & Challenges
  •   Conclusions & Future Work
Background
• Sentiment has long been an underlying factor in the investing
  world
  • Consumer Confidence Index
  • Investors Intelligence Sentiment Index
  • “Market Sentiment”


• Rather than waiting days, weeks or months, can the
  ‘sentiment of now’ be used to improve trading performance
  and investing decisions?

• Can Twitter be used to determine the ‘sentiment of now’?
Background
The thoughts driving this research are:

• Can analysis of publicly available Twitter Messages provide
  insight for decision making for investing?

• Do Twitter messages (and their subsequent sentiment) have
  any effect on movement in the stock market?

• Can Twitter messages be mined and analyzed to predict
  movements in the stock market?

• Does a Twitter user’s reputation have an effect on how people
  perceive and use their shared investing ideas?
Literature Review
• Wysocki (1998) – Strong positive correlation between volume
  of messages posted on message boards overnight and next
  day’s trading volume and stock returns

• Tumarkin and Whitelaw (2001) – concluded that there are no
  predictive capabilities found within message board activity

• Antweiler and Frank (2004) – Used sentiment analysis to
  show strong positive correlation between message board
  posts and next day trading volume and volatility. Showed
  minor correlation between message board posts and next day
  price activity.
Literature Review
• Gu et al. (2006) – Found that the aggregation of individual
  recommendations on stock message boards has no
  predictive power on future stock returns

• Das and Chen (2007) – Using sentiment analysis of messages
  on message boards, found no correlation between sentiment
  and individual stock price movement but did find a positive
  correlation between the aggregate sentiment of a set of
  stocks and movement in the stock market

• Zhang (2009) – Studied the reputation of message board
  posters and showed that posts from users with a ‘better’
  reputation were shared more and had a greater effect on
  sentiment
Literature Review
• Bollen, Mao & Zeng (2010) – Using sentiment
  analysis, determined the ‘mood’ of the Twitter universe and
  then predicted the next-day movement of the Dow Jones
  Industrial Average with 87.6% accuracy.
  • This model is being used by a hedge fund to actively trade. The
    first month of trading showed profitability


• Sprenger and Welpe (2010) – Focused on the S&P 100 stocks
  and the sentiment of those stocks. Showed that Twitter
  sentiment about a company closely follows market movements.
  Also showed a positive correlation between trading volume and
  message volume on Twitter for that company.
  • Research laid the groundwork for TweetTrader.net
Literature Review
• Vincent & Armstrong (2010) – Undertook a research project
  to understand Twitter ‘buzz’ and measure the ‘change of
  context’ within Twitter messages. They called this change of
  context the ‘breaking point’: when messages turn from
  Bullish to Bearish (and vice versa).
  • Using these ‘breaking points’, a profitable automated trading
    system was developed to trade the Forex market.


• Saavedra, Hagerty, Uzzi (2011) – Studied a proprietary trading
  firm to determine if ideas shared through instant messaging
  platforms lead to increased performance (measured in terms
  of profitability).
Literature Review
• Additional research in Sentiment Analysis of Twitter:
  • Bifet & Frank, 2010 - Sentiment knowledge discovery in twitter
    streaming data
  • Pak & Paroubek, 2010 - Twitter as a Corpus for Sentiment analysis
    and Opinion Mining.
  • Romero, Meeder, & Kleinberg, 2010 - Differences in the
    Mechanics of Information Diffusion Across Topics: Idioms,
    Political Hashtags, and Complex Contagion on Twitter
  • Castillo, Mendoza & Poblete, 2010 - Information credibility on
    twitter.
  • Diakopoulos & Shamma, 2010 - Characterizing debate
    performance via aggregated twitter sentiment.
Research Summary
• The goal of this research is a more thorough understanding of
  Twitter users and their sharing of investing ideas and how
  those ideas can or should be used in investing decisions

• If Twitter messages do convey some form of sentiment which
  is correlated to activity in the stock market, can this sentiment
  be used in a predictive manner?

• Can this large distributed network of users be ‘tapped’ to build
  a decision support system for the generation of investing
  ideas?
Research Summary
• This research will attempt to:

  • Study how individuals use Twitter to share and consume
    knowledge in support of their investing decisions

  • Determine whether correlation exists between the sentiment of a
    Tweet and movement in the stock market

  • Determine whether there are times of day (or days of the week)
    that provide more ‘weight’ toward sentiment

  • Understand how a user’s reputation might affect the sentiment of
    a company or sector
Research Model
• Twitter Sentiment Analysis for Stocks and Sectors
  • Stock & Sector Analysis (H1a, H1b, H1c)
  • Sentiment Weighting within Sectors (H2a, H2b)
  • Day / Time Analysis (H3)
• User Reputation: Social Analysis of Twitter Users (H4a, H4b)
• Information Content of Twitter Messages
  • Correlations with Stock Market Movement
  • Predictive Nature of Shared Tweets
  • How users might use Twitter for Decision Support in investing
Research Questions & Hypotheses
• RQ-1: Using a given sector of the stock market, does the
  sentiment for that sector match the weighted sentiment for
  the stocks within that sector? How well does the sentiment
  predict price and volume movement?

     • H1a: The sentiment of a sector will match the overall
       averaged sentiment of all stocks within the sector.
     • H1b: The sentiment of a sector can be used to predict the
       movement of all stocks in that sector.
     • H1c: The sentiment of a sector or stock on any given day will
       provide a prediction for the next day’s movement in that stock
       or sector
Research Questions & Hypotheses
• RQ-2: Are there specific stocks within a given sector that
  supply the majority of the sentiment for that sector? If so, do
  these stocks supply sentiment in correlation to the weighting
  of those stocks in the sector?

  • H2a: The sentiment of a stock within a sector will affect the
    sentiment of the sector based on the relative weighting of that
    stock within the sector.

  • H2b: The stocks that provide the most weight toward the
    sentiment of a sector are also the stocks with the highest number
    of mentions on Twitter.
Research Questions & Hypotheses
• RQ-3: Are there times of the day or days of the week that
  provide a more accurate and informative sentiment for a stock
  or sector?

  • H3: Messages sent during non-market hours (i.e., evenings and
    weekends) will have the most effect on sentiment for the
    following day
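To test H3, each tweet first has to be labelled as sent during or outside market hours. The sketch below is one possible way to do that and is not part of the original study design. It assumes timestamps have already been converted to US Eastern time, uses the regular 9:30 to 16:00 weekday session, and ignores exchange holidays. The function names are hypothetical.

```python
from datetime import datetime, time

# Assumption: timestamps are naive datetimes already expressed in US Eastern time.
# Assumption: regular session only; exchange holidays and half days are ignored.
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def is_market_hours(ts: datetime) -> bool:
    """Return True if ts falls inside a regular weekday trading session."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return MARKET_OPEN <= ts.time() < MARKET_CLOSE


def split_by_session(timestamps):
    """Partition timestamps into (market_hours, off_hours) lists."""
    in_hours = [ts for ts in timestamps if is_market_hours(ts)]
    off_hours = [ts for ts in timestamps if not is_market_hours(ts)]
    return in_hours, off_hours
```

Sentiment could then be averaged separately for the two groups and each average compared with the following day's movement.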
Research Questions & Hypotheses
• RQ-4: Are there specific users that provide more ‘weight’ to a
  sentiment of a stock or sector based on the users’ reputation?
  Do retweets by these users (or of these users’ tweets) provide
  more weight for the sentiment of a stock or sector?

  • H4a: The number of followers of a Twitter user determines the
    effect that users’ tweets will have on sentiment for a stock or
    sector.

  • H4b: A message sent or retweeted by a user with a large number
    of Twitter followers will provide more weight toward the
    sentiment of a stock or sector.
Contributions
• Extends the body of knowledge for Sentiment Analysis of
  Twitter for decision support in the investing domain

• Extends the body of knowledge in regards to the information
  content of Twitter messages and how users might use that
  knowledge for decision support

• Gaining a better understanding of how a user’s reputation
  affects the sharing of their information

• Building a text corpus that can be used in future sentiment
  analysis research for Twitter messages
Research Method
• Inputs: Twitter; Stock Market Data
• Data Collection
• Analysis: Sentiment Analysis; Price & Volume Analysis; Social Analysis
• Results sought:
  • Positive correlation of sentiment and message volume with price/volume
  • Reputation of Twitter user
• Outcome: Understanding of the predictive capabilities of Twitter sentiment
  and the effect of user reputation on investing decision support
Research Method
• Data Collection
  • Using Twitter API to collect tweets (tweet, sender, date, time)
     • Tweets referencing companies and sectors are collected and stored
       in a MySQL database for future study
     • Using the nomenclature made popular by StockTwits
       (www.stocktwits.com). Example: The stock symbol for Apple is AAPL.
       Users following the StockTwits nomenclature add a “$” to the symbol
       – “$AAPL”.
     • Stocktwits.com describes their purpose as a place to:
        • …share ideas, market insights and trades on stocks, futures and the market
          in general *.


  • Using Yahoo Finance data feed to gather Stock Market data (price
    and volume)
     • Provides historical data
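The “$AAPL” convention described above makes it fairly simple to pull ticker mentions out of a tweet. The following is a minimal sketch, not the collection prototype used in the study. It assumes a cashtag is a “$” followed by one to five letters with an optional one-letter class suffix (real ticker formats vary), and the name `extract_cashtags` is hypothetical.

```python
import re

# Assumption: "$" + 1-5 letters, optionally ".X" (e.g. $BRK.B).
# Dollar amounts such as "$5" are not matched because they start with a digit.
CASHTAG = re.compile(
    r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,5}(?:\.[A-Za-z])?)(?![A-Za-z0-9])"
)


def extract_cashtags(text: str) -> list[str]:
    """Return upper-cased symbols mentioned as cashtags, in order, without repeats."""
    symbols = []
    for match in CASHTAG.finditer(text):
        symbol = match.group(1).upper()
        if symbol not in symbols:
            symbols.append(symbol)
    return symbols
```

A stored tweet could then be linked to every symbol it mentions before sentiment is scored.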
Research Method
• Sentiment Analysis
• Using a Naïve Bayesian text classification algorithm to determine
  sentiment of collected Tweets
      • Naïve Bayesian is being used for simplicity but also because many
        researchers have pointed out very minor differences between it and
        other sentiment analysis methods
      • A subset of the data collected will be manually assigned ‘sentiment’ to
        build the necessary training dataset

• Using the R programming language combined with Python and the
  WEKA data mining platform, a text classification algorithm will be
  implemented to determine sentiment

• For each tweet, the overall score is calculated and assigned.
  • Ideally, tweets will fall into +1 (Bullish), 0 (Neutral), -1 (Bearish)
    buckets.
  • Currently, tweet sentiments are summed without being
    normalized
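As an illustration of the Naïve Bayesian approach named above, here is a small multinomial classifier with add-one smoothing written in plain Python. It is a sketch only: the study planned to use R, Python and WEKA, and this is not that implementation. The class name, the whitespace tokenizer and the choice of smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # training tweets per label
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The labels can be the +1 / 0 / -1 buckets described above, taken from the manually labelled training subset.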
Research Method
• Price and Volume Analysis

  • Using regression and other analysis techniques, the movements
    in price and volume will be compared with sentiment of stocks
    and sectors to determine if any predictive capabilities exist
    between sentiment, tweet message volume, price movement and
    volume.

  • Autoregressive integrated moving average (ARIMA) models and
    Granger causality analysis are being considered for
    use in this project for modeling predictive behavior. Other
    appropriate statistical techniques may also be considered
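Before fitting ARIMA models or running Granger causality tests, a simpler first look is a lagged correlation: does sentiment on day t move with the return on day t+1? The sketch below computes a Pearson correlation at a chosen lag in plain Python. It is a much weaker check than the techniques named above and does not replace them. The function names are hypothetical, and both series are assumed to be daily and aligned by date.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag."""
    if len(sentiment) != len(returns):
        raise ValueError("series must be aligned and of equal length")
    if lag < 0 or lag >= len(sentiment):
        raise ValueError("lag must be between 0 and len(series) - 1")
    return pearson(sentiment[:len(sentiment) - lag], returns[lag:])
```

A correlation found this way says nothing about causation, which is one reason a Granger test would still be needed.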
Research Method
• Social Analysis

   • Using social graphs and analysis of Twitter users, determine
     whether a tweet sent by a user with more followers provides
     more ‘weight’ to the sentiment of the stocks mentioned in the
     tweet.

   • Using the concept of ‘retweets’, determine how far a user’s
     tweet travels and whether any form of reputation or ‘trust’
     in that user can be inferred from that reach.
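One way to express the ‘weight’ idea is a follower-weighted average of tweet scores. The sketch below is purely illustrative. The logarithmic weighting is an assumption chosen so that very large accounts do not swamp everyone else. It is not a scheme proposed in this research, and the function names are hypothetical.

```python
import math


def follower_weight(followers: int) -> float:
    """Assumed weighting: 1 + log(1 + followers), so every tweet counts at least once."""
    return 1.0 + math.log1p(max(followers, 0))


def weighted_sentiment(tweets):
    """tweets: iterable of (score, followers) pairs; returns the weighted mean score."""
    total_weight = 0.0
    weighted_sum = 0.0
    for score, followers in tweets:
        weight = follower_weight(followers)
        total_weight += weight
        weighted_sum += weight * score
    return weighted_sum / total_weight if total_weight else 0.0
```

Comparing this figure with the unweighted average on the same tweets would show how much follower counts change the picture.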
Preliminary Results
• A short study was conducted in May 2011 (May 2 to May 11) to
  determine viability of data collection and sentiment analysis
   • 2 Stock Market Sectors chosen to collect data:
      • Energy (XLE) – consists of 41 companies
      • Consumer Staples (XLP) – consists of 41 companies

   • Using the Twitter API and the collection prototype, a ten day run was
     initiated
      • 13,000 tweets collected for XLE, XLP and 82 companies comprising the
        sectors

• Basic Naïve Bayesian approach to determining sentiment using the R
  programming language

• For a quick test, Hu and Liu’s (2004) polarity dataset used to determine
  / assign sentiment
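The quick lexicon test can be pictured as follows: count polarity words in each tweet, reduce the balance to +1, 0 or -1, then average over tweets so that the result is normalized by tweet count. This is a sketch under stated assumptions. The two word sets are tiny hypothetical stand-ins, not the Hu and Liu lists, and the function names are invented for illustration.

```python
# Hypothetical stand-ins for a polarity lexicon; NOT the actual Hu and Liu word lists.
POSITIVE = {"gain", "strong", "bullish", "upgrade", "beat"}
NEGATIVE = {"loss", "weak", "bearish", "downgrade", "miss"}


def tweet_score(text: str) -> int:
    """Return +1 (bullish), 0 (neutral) or -1 (bearish) from the word balance."""
    tokens = [t.strip(".,!?:;()\"'").lower() for t in text.split()]
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (balance > 0) - (balance < 0)  # sign of the balance


def average_sentiment(texts) -> float:
    """Mean tweet score, which lies in [-1, 1] whatever the number of tweets."""
    scores = [tweet_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging rather than summing keeps a symbol with many tweets comparable to one with few.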
Preliminary Results
• The XLE ETF saw about a seven-point drop from a high of
  $80.80 on May 2, 2011 to a low of $73.70 on May 11, 2011.




               Courtesy of StockCharts.com
Preliminary Results
• XLE Average Sentiment: 0.115
• 1=Bullish / Positive; 0=Neutral; -1=Bearish / Negative
• 253 tweets captured for XLE
Preliminary Results
• XLE Price Movement compared with Sentiment
Preliminary Results




       Volume Data via Yahoo Finance
Preliminary Results
Top 5 Holdings of the XLE ETF and their Sentiment

  Company / Symbol         % of ETF     Price Change ($)   Average Sentiment
  Exxon / XOM               19.02 %          -7.16              -0.034
  Chevron / CVX             15.03 %          -6.99               0.113
  Schlumberger / SLB         7.12 %          -7.28               0.413
  ConocoPhillips / COP       5.17 %          -7.36               0.258
  Occidental / OXY           4.32 %         -13.31               0.173
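H1a and H2a both turn on comparing a sector's sentiment with the holdings-weighted sentiment of its stocks. The sketch below computes that weighted average from the five rows in the table above. It is an illustration only: the top five holdings cover about half of the ETF, so the result is a partial picture, and the function name is hypothetical.

```python
# (symbol, percent of ETF, average sentiment), copied from the XLE table above
XLE_TOP5 = [
    ("XOM", 19.02, -0.034),
    ("CVX", 15.03, 0.113),
    ("SLB", 7.12, 0.413),
    ("COP", 5.17, 0.258),
    ("OXY", 4.32, 0.173),
]


def holdings_weighted_sentiment(holdings) -> float:
    """Average sentiment weighted by each holding's share of the ETF."""
    total_weight = sum(weight for _, weight, _ in holdings)
    if total_weight == 0:
        raise ValueError("holdings must have a non-zero total weight")
    return sum(weight * score for _, weight, score in holdings) / total_weight


print(round(holdings_weighted_sentiment(XLE_TOP5), 3))
```

For these five rows the weighted figure comes to roughly 0.12, which can be set beside the 0.115 average reported for tweets that mention XLE directly. With ten days of data this comparison is not evidence for or against H1a.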
Preliminary Results
• The XLP ETF saw a slight upward move from $31.12 on May
  2, 2011 to $31.24 on May 11, 2011 .




              Courtesy of StockCharts.com
Preliminary Results
• XLP Average Sentiment: 0.5
• 1=Bullish / Positive; 0=Neutral; -1=Bearish / Negative
• 52 tweets captured for XLP
Preliminary Results
• XLP Price Movement compared with Daily Sentiment
Preliminary Results




       Volume Data via Yahoo Finance
Preliminary Results
Top 5 Holdings of the XLP ETF and their Sentiment

  Company / Symbol         % of ETF     Price Change ($)   Average Sentiment
  Procter & Gamble / PG     14.69 %           1.58               0.418
  Philip Morris / PM         9.60 %          -1.48               0.134
  Wal-Mart / WMT             8.00 %           1.40               0.171
  Coca-Cola / KO             7.48 %           0.41               0.671
  Kraft Foods / KFT          5.04 %           1.40               0.254
Preliminary Results
• Social Analysis
  • One of the users who sent a tweet during this test was a Twitter
    user named “gtotoy”.
      •   He has 6,502 Twitter followers
      •   Sent 28,869 tweets
      •   One tweet by gtotoy was retweeted by 10 separate Twitter users
      •   According to TweetReach.com, gtotoy’s reach over the last 50 tweets
          (as of Oct 26 2011 @ 12:22PM):
          • 13,161 people
          • 145,125 Impressions


  • Will gtotoy’s tweets (and subsequent retweets) provide more
    ‘weight’ for a stock or sector’s sentiment?
Social Graph for “gtotoy”




http://twiangulate.com/search/gtotoy/inner_circle/map/my_friends-1/graph_label-names/graph_bidir_only-0/
Preliminary Results - Discussion

• There’s not enough data gathered over the 10 day period to
  begin to answer any research questions.

• Purpose of the preliminary test was to validate the research
  method and approach – data can be collected, scored and
  analyzed.

• Data collection has been ongoing since May 1
Conclusions & Future Work
• There are some challenges to this research:

  • Building the training dataset will be key
     • Building a corpus of investing and trading ‘words’ for
       bullish, neutral and bearish opinions


  • Continued Access to Twitter API

  • Will Twitter “Spam” have an effect on sentiment?

  • Can sarcasm be detected? If not, how does it affect sentiment?
Conclusions & Future Work
• Based on the initial analysis presented, there appears to be an
  interesting study to be done to determine if Twitter Sentiment
  has predictive capabilities

• Next Steps:
  •   Dissertation Approval
  •   Continue data collection
  •   Determine Modeling and Predictive Analysis Approaches
  •   Complete Analysis & Research
  •   Write up
Questions?



         Thank you

Understanding Twitter Sentiment for Investing Decisions

  • 1. Analysis of Twitter Messages for Sentiment and Insight for use in Stock Market Decision Making Eric D. Brown Dakota State University
  • 2. Introduction • Agenda • Background • Literature Review • Research Summary & Model • Research Questions & Hypotheses • Contributions • Research Methods • Preliminary Results • Discussion & Challenges • Conclusions & Future Work
  • 3. Background • Sentiment has long been an underlying factor in the investing world • Consumer Confidence Index • Investors Intelligence Sentiment Index • “Market Sentiment” • Rather than waiting days, weeks or months, can the ‘sentiment of now’ be used to improve trading performance and investing decisions? • Can Twitter be used to determine the ‘sentiment of now’?
  • 4. Background The thoughts driving this research are: • Can analysis of publicly available Twitter Messages provide insight for decision making for investing? • Do Twitter messages (and their subsequent sentiment) have any effect on movement in the stock market? • Can Twitter messages be mined and analyzed to predict movements in the stock market? • Does a Twitter user’s reputation have an effect on how people perceive and use their shared investing ideas?
  • 5. Literature Review • Wysocki (1998) – Strong positive correlation between volume of messages posted on message boards overnight and next day’s trading volume and stock returns • Tumarkin and Whitelaw (2001) – Concluded that there are no predictive capabilities found within message board activity • Antweiler and Frank (2004) – Used sentiment analysis to show strong positive correlation between message board posts and next day trading volume and volatility. Showed minor correlation between message board posts and next day price activity.
  • 6. Literature Review • Gu, et al (2006) – Found that aggregation of individual recommendations on stock message boards has no predictive power on future stock returns • Das and Chen (2007) – Using sentiment analysis of messages on message boards, found no correlation between sentiment and individual stock price movement but did find positive correlation between the aggregate sentiment of a set of stocks and movement in the stock market • Zhang (2009) – Studied the reputation of message board posters and showed that posts by users with a ‘better’ reputation were shared more and had a greater effect on sentiment
  • 7. Literature Review • Bollen, Mao & Zeng (2010) – Using sentiment analysis, determined the ‘mood’ of the Twitter universe and then predicted the next-day movement of the Dow Jones Industrial Average with 87.6% accuracy. • This model is being used by a hedge fund to actively trade. The first month of trading showed profitability • Sprenger and Welpe (2010) – Focused on the S&P 100 stocks and the sentiment of those stocks. Showed that sentiment of the company on Twitter closely follows market movements. Also showed positive correlation between trading volume and message volume on Twitter for that company. • Research laid the groundwork for TweetTrader.net
  • 8. Literature Review • Vincent & Armstrong (2010) – Undertook a research project to understand Twitter ‘buzz’ and measure the ‘change of context’ within Twitter messages. They called this change of context the ‘breaking point’: the moment when messages turn from Bullish to Bearish (and vice versa). • Using these ‘breaking points’, a profitable automated trading system was developed to trade the Forex market. • Saavedra, Hagerty, Uzzi (2011) – Studied a proprietary trading firm to determine if ideas shared through instant messaging platforms lead to increased performance (measured in terms of profitability).
  • 9. Literature Review • Additional research in Sentiment Analysis of Twitter: • Bifet & Frank, 2010 - Sentiment knowledge discovery in Twitter streaming data • Pak & Paroubek, 2010 - Twitter as a Corpus for Sentiment Analysis and Opinion Mining • Romero, Meeder, & Kleinberg, 2010 - Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter • Castillo, Mendoza & Poblete, 2010 - Information credibility on Twitter • Diakopoulos & Shamma, 2010 - Characterizing debate performance via aggregated Twitter sentiment
  • 10. Research Summary • The goal of this research is a more thorough understanding of Twitter users and their sharing of investing ideas and how those ideas can or should be used in investing decisions • If Twitter messages do convey some form of sentiment which is correlated to activity in the stock market, can this sentiment be used in a predictive manner? • Can this large distributed network of users be ‘tapped’ to build a decision support system for the generation of investing ideas?
  • 11. Research Summary • This research will attempt to: • Study how individuals use Twitter to share and consume knowledge in support of their investing decisions • Determine whether correlation exists between the sentiment of a Tweet and movement in the stock market • Determine whether there are times of day (or days of the week) that provide more ‘weight’ toward sentiment • Understand how a user’s reputation might affect the sentiment of a company or sector
  • 12. Research Model [Diagram; box labels:] User Reputation: Social Analysis of Twitter Users (H4a, H4b); Twitter Sentiment Analysis for Stocks and Sectors; Stock & Sector Analysis (H1a, H1b, H1c); Sentiment Weighting within Sectors (H2a, H2b); Day / Time Analysis (H3); Information Content of Twitter Messages; Correlations with Stock Market Movement; Predictive Nature of Shared Tweets; How users might use Twitter for Decision Support in investing
  • 13. Research Questions & Hypotheses • RQ-1: Using a given sector of the stock market, does the sentiment for that sector match the weighted sentiment for the stocks within that sector? How well does the sentiment predict price and volume movement? • H1a: The sentiment of a sector will match the overall averaged sentiment of all stocks within the sector. • H1b: The sentiment of a sector can be used to predict the movement of all stocks in that sector. • H1c: The sentiment of a sector or stock on any given day will provide a prediction for the next day’s movement in that stock or sector
  • 14. Research Questions & Hypotheses • RQ-2: Are there specific stocks within a given sector that supply the majority of the sentiment for that sector? If so, do these stocks supply sentiment in correlation to the weighting of those stocks in the sector? • H2a: The sentiment of a stock within a sector will affect the sentiment of the sector based on the relative weighting of that stock within the sector. • H2b: The stocks that provide the most weight toward the sentiment of a sector are also the stocks with the highest number of mentions on Twitter.
  • 15. Research Questions & Hypotheses • RQ-3: Are there times of the day or days of the week that provide a more accurate and informative sentiment for a stock or sector? • H3: Messages sent during non-market hours (i.e., evenings and weekends) will have the most effect on sentiment for the following day
  • 16. Research Questions & Hypotheses • RQ-4: Are there specific users that provide more ‘weight’ to the sentiment of a stock or sector based on the user’s reputation? Do retweets by these users (or of these users’ tweets) provide more weight for the sentiment of a stock or sector? • H4a: The number of followers of a Twitter user determines the effect that user’s tweets will have on sentiment for a stock or sector. • H4b: A message sent or retweeted by a user with a large number of Twitter followers will provide more weight toward the sentiment of a stock or sector.
  • 17. Contributions • Extends the body of knowledge for Sentiment Analysis of Twitter for decision support in the investing domain • Extends the body of knowledge with regard to the information content of Twitter messages and how users might use that knowledge for decision support • Provides a better understanding of how a user’s reputation affects the sharing of their information • Builds a text corpus that can be used in future sentiment analysis research on Twitter messages
  • 18. Research Method [Flow diagram; box labels:] Twitter Data Collection; Stock Market Data; Price & Volume Analysis; Sentiment Analysis; Social Analysis; Positive correlation of sentiment and message volume with price/volume; Reputation of Twitter user; Understanding of predictive capabilities of Twitter sentiment and the effect of user reputation on investing decision support
  • 19. Research Method • Data Collection • Using Twitter API to collect tweets (tweet, sender, date, time) • Tweets referencing companies and sectors are collected and stored in a MySQL database for future study • Using the nomenclature made popular by StockTwits (www.stocktwits.com). Example: The stock symbol for Apple is AAPL. Users following the StockTwits nomenclature add a “$” to the symbol – “$AAPL”. • Stocktwits.com describes their purpose as a place to: • …share ideas, market insights and trades on stocks, futures and the market in general *. • Using Yahoo Finance data feed to gather Stock Market data (price and volume) • Provides historical data
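The StockTwits “$” convention described in slide 19 lends itself to a simple pattern match. The following is a minimal sketch in Python, not the collection prototype used in the study; the regular expression, the assumed symbol length of 1 to 5 letters, and the function name `extract_cashtags` are all illustrative assumptions.

```python
import re

# Assumption: a cashtag is "$" followed by 1-5 letters (e.g. "$AAPL"),
# not glued onto a preceding word, digit, underscore or another "$".
CASHTAG = re.compile(r"(?<![A-Za-z0-9_$])\$([A-Za-z]{1,5})\b")


def extract_cashtags(text):
    """Return upper-cased ticker symbols found in text, in order, without repeats."""
    symbols = []
    for match in CASHTAG.finditer(text):
        symbol = match.group(1).upper()
        if symbol not in symbols:
            symbols.append(symbol)
    return symbols


if __name__ == "__main__":
    print(extract_cashtags("Adding to $AAPL, trimming $xom before earnings"))
```

A tweet that mentions a dollar amount such as “$5” is ignored because the pattern requires letters after the “$”.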
  • 20. Research Method • Sentiment Analysis • Using a Naïve Bayesian text classification algorithm to determine sentiment of collected Tweets • Naïve Bayesian is being used for simplicity but also because many researchers have pointed out very minor differences between it and other sentiment analysis methods • A subset of the data collected will be manually assigned ‘sentiment’ to build the necessary training dataset • Using the R programming language combined with Python and the WEKA Data Mining Platform, a text classification algorithm will be implemented to determine sentiment • For each tweet, the overall score is calculated and assigned. • Ideally, tweets will fall into +1 (Bullish), 0 (Neutral), -1 (Bearish) buckets. • Currently, tweet sentiments are summed without being normalized
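To make the Naïve Bayesian step in slide 20 concrete, here is a small multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. It is a sketch under stated assumptions, not the R/WEKA implementation the slides refer to: the class name `NaiveBayesSentiment`, the whitespace tokenizer, the smoothing value, and the six training tweets are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.class_counts = Counter()            # label -> number of training tweets
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased whitespace tokens are good enough for a sketch.
        return text.lower().split()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in self.tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled training tweets: +1 bullish, 0 neutral, -1 bearish.
TRAIN = [
    ("strong earnings buy $XOM", 1),
    ("breakout rally bullish", 1),
    ("weak guidance sell $XOM", -1),
    ("bearish breakdown selloff", -1),
    ("earnings call scheduled today", 0),
    ("company announces meeting today", 0),
]

if __name__ == "__main__":
    model = NaiveBayesSentiment().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    print(model.predict("bullish rally"))
```

In practice the manually labeled subset mentioned in the slide would replace the toy training list.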
  • 21. Research Method • Price and Volume Analysis • Using regression and other analysis techniques, the movements in price and volume will be compared with sentiment of stocks and sectors to determine if any predictive capabilities exist between sentiment, tweet message volume, price movement and volume. • The Autoregressive integrated moving average (ARIMA) and Granger Causality Analysis techniques are being considered for use in this project for modeling predictive behavior. Other appropriate statistical techniques may also be considered
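Before reaching for ARIMA or Granger causality, the question in slide 21 can be explored with a much simpler check: does sentiment on day t correlate with the price change on day t+1? The sketch below computes that lagged Pearson correlation in plain Python. It is a stand-in for illustration only, not an implementation of either technique named in the slide; the function names and the one-day default lag are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with returns on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, returns)
    # Drop the last `lag` sentiment values and the first `lag` returns
    # so that each sentiment value lines up with the later return.
    return pearson(sentiment[:-lag], returns[lag:])
```

A high value of `lagged_correlation(daily_sentiment, daily_returns, lag=1)` would only suggest that a more careful predictive model is worth building; with ten days of data it would establish nothing on its own.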
  • 22. Research Method • Social Analysis • Using social graphs and analysis of Twitter users, determine whether a tweet sent by a user with more followers provides more ‘weight’ to the sentiment of the stocks mentioned in the tweet. • Using the concept of ‘retweets’, determine how far a user’s tweet travels; from this, can any form of reputation or ‘trust’ of that user be determined?
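One way to picture the ‘weight’ idea in slide 22 (and hypothesis H4b) is a follower-weighted average of tweet sentiment. The weighting below, log10(1 + followers), is purely an assumption chosen so that very large accounts do not swamp everyone else; the slides do not specify a weighting scheme, and the function name is hypothetical.

```python
import math


def follower_weighted_sentiment(tweets):
    """Average sentiment where each tweet counts in proportion to log10(1 + followers).

    tweets: iterable of (score, followers) pairs, score in {-1, 0, +1}.
    Returns 0.0 when no tweet carries any weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, followers in tweets:
        weight = math.log10(1 + followers)  # assumed weighting, not from the source
        weighted_sum += weight * score
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

Comparing this figure with the unweighted mean for the same tweets is one simple way to see whether high-follower accounts pull a stock's sentiment in a particular direction.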
  • 23. Preliminary Results • A short study was conducted in May 2011 (May 2 to May 11) to determine viability of data collection and sentiment analysis • 2 Stock Market Sectors chosen to collect data: • Energy (XLE) – consists of 41 companies • Consumer Staples (XLP) – consists of 41 companies • Using the Twitter API and the collection prototype, a ten-day run was initiated • 13,000 tweets collected for XLE, XLP and the 82 companies comprising the sectors • Basic Naïve Bayesian approach to determining sentiment using the R programming language • For a quick test, Hu and Liu’s (2004) polarity dataset was used to determine / assign sentiment
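The quick test in slide 23 relies on a polarity word list. A lexicon-based scorer of that general kind can be sketched as follows. The two tiny word sets are placeholders invented for this example and are not the Hu and Liu lexicon; the function names and the sign-based bucketing into +1 / 0 / -1 are likewise assumptions.

```python
import re

# Placeholder word lists for illustration only (NOT the Hu and Liu lexicon).
POSITIVE = {"great", "strong", "gains", "good", "up"}
NEGATIVE = {"weak", "loss", "drop", "bad", "down"}


def raw_score(text):
    """Count of positive words minus count of negative words in the tweet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def bucket(score):
    """Map a raw score onto +1 (bullish), 0 (neutral) or -1 (bearish)."""
    return (score > 0) - (score < 0)


def score_tweet(text):
    """Return (raw score, bucketed score) for one tweet."""
    raw = raw_score(text)
    return raw, bucket(raw)
```

Bucketing by sign is one way to address the normalization issue noted in slide 20, since a tweet with three positive words then counts the same as a tweet with one.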
  • 24. Preliminary Results • The XLE ETF saw about a seven-point drop from a high of $80.80 on May 2, 2011 to a low of $73.70 on May 11, 2011. Courtesy of StockCharts.com
  • 25. Preliminary Results • XLE Average Sentiment: 0.115 • 1=Bullish / Positive; 0=Neutral; -1=Bearish / Negative • 253 tweets captured for XLE
  • 26. Preliminary Results • XLE Price Movement compared with Sentiment
  • 27. Preliminary Results Volume Data via Yahoo Finance
  • 28. Preliminary Results: Top 5 Holdings of the XLE ETF and their Sentiment
      Company / Symbol         % of ETF   Price Change ($)   Average Sentiment
      Exxon / XOM              19.02 %     -7.16             -0.034
      Chevron / CVX            15.03 %     -6.99              0.113
      Schlumberger / SLB        7.12 %     -7.28              0.413
      ConocoPhillips / COP      5.17 %     -7.36              0.258
      Occidental / OXY          4.32 %    -13.31              0.173
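Hypotheses H1a and H2a concern how the sentiment of individual holdings relates to the sentiment of the sector. As a rough illustration, the weights and average sentiment values from the XLE table in slide 28 can be combined into a weight-averaged figure. This is a sketch only: it uses just the five listed holdings (about half of the ETF by weight), and the function name is hypothetical.

```python
# (weight in % of ETF, average sentiment) for the top five XLE holdings,
# copied from the slide 28 table.
XLE_TOP5 = {
    "XOM": (19.02, -0.034),
    "CVX": (15.03, 0.113),
    "SLB": (7.12, 0.413),
    "COP": (5.17, 0.258),
    "OXY": (4.32, 0.173),
}


def holdings_weighted_sentiment(holdings):
    """Sentiment averaged over holdings, each weighted by its share of the ETF."""
    total_weight = sum(weight for weight, _ in holdings.values())
    if total_weight == 0:
        raise ValueError("holdings must have non-zero total weight")
    weighted_sum = sum(weight * sentiment for weight, sentiment in holdings.values())
    return weighted_sum / total_weight


if __name__ == "__main__":
    print(round(holdings_weighted_sentiment(XLE_TOP5), 3))
```

For these five holdings the result is roughly 0.12, against the 0.115 average reported for XLE itself in slide 25. Given the ten-day window and the partial coverage of the ETF, that closeness should not be read as support for H1a.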
  • 29. Preliminary Results • The XLP ETF saw a slight upward move from $31.12 on May 2, 2011 to $31.24 on May 11, 2011. Courtesy of StockCharts.com
  • 30. Preliminary Results • XLP Average Sentiment: 0.5 • 1=Bullish / Positive; 0=Neutral; -1=Bearish / Negative • 52 tweets captured for XLP
  • 31. Preliminary Results • XLP Price Movement compared with Daily Sentiment
  • 32. Preliminary Results Volume Data via Yahoo Finance
  • 33. Preliminary Results: Top 5 Holdings of the XLP ETF and their Sentiment
      Company / Symbol         % of ETF   Price Change ($)   Average Sentiment
      Procter & Gamble / PG    14.69 %      1.58              0.418
      Philip Morris / PM        9.60 %     -1.48              0.134
      Wal-Mart / WMT            8.00 %      1.40              0.171
      Coca-Cola / KO            7.48 %      0.41              0.671
      Kraft Foods / KFT         5.04 %      1.40              0.254
  • 34. Preliminary Results • Social Analysis • One of the users that sent a Tweet during this test was a Twitter user named “gtotoy”. • He has 6,502 Twitter followers • Sent 28,869 tweets • One Tweet by gtotoy was retweeted by 10 separate Twitter users • According to TweetReach.com, gtotoy’s reach over the last 50 tweets (as of Oct 26 2011 @ 12:22PM): • 13,161 people • 145,125 Impressions • Will gtotoy’s tweets (and subsequent retweets) provide more ‘weight’ for a stock or sector’s sentiment?
  • 35. Social Graph for “gtotoy” http://twiangulate.com/search/gtotoy/inner_circle/map/my_friends-1/graph_label-names/graph_bidir_only-0/
  • 36. Preliminary Results - Discussion • Not enough data was gathered over the 10-day period to begin to answer any research questions. • Purpose of the preliminary test was to validate the research method and approach – data can be collected, scored and analyzed. • Data collection has been ongoing since May 1
  • 37. Conclusions & Future Work • There are some challenges to this research: • Building the training dataset will be key • Building a corpus of investing and trading ‘words’ for bullish, neutral and bearish opinions • Continued access to the Twitter API • Will Twitter “Spam” have an effect on sentiment? • Can sarcasm be detected? If not, how does it affect sentiment?
  • 38. Conclusions & Future Work • Based on the initial analysis presented, there appears to be an interesting study to be done to determine if Twitter Sentiment has predictive capabilities • Next Steps: • Dissertation Approval • Continue data collection • Determine Modeling and Predictive Analysis Approaches • Complete Analysis & Research • Write up
  • 39. Questions? Thank you