Stock prediction using social network

Stock Prediction Using Social Network Data
Rohit Tiwari (rtiwari2)
Chanon Hongsirikulkit (hongsir2)

Outline
- Introduction
- Data Sources
- APIs
- Filter Relevant Data
- Text Normalization
- Noise Removal
- Feature Extraction
- Topic Modeling
- Sentiment Analysis
- Tweet Features
- Prediction Model Construction
- Conclusion
- Future Works

Introduction
- A social network is a communication platform that contains valuable hidden knowledge
- Information on social networks can reflect real-world events
- Many studies exploit this information to enhance applications
  - Analyzing tweets that express information needs (Zhao and Mei 2013)
  - Using tweet-rate to predict movie box office revenues (Asur and Huberman 2010)
- Our survey focuses on using social network data to predict stock market movement
  - A false message on Twitter, “BREAKING: Two Explosions in the White House and Barack Obama is injured.” -> the Dow Jones and S&P 500 indexes dropped by close to 1%, the equivalent of hundreds of billions of dollars changing hands.
  - In August 2012, an Italian journalist set up a fake Twitter account for a member of Russia's government and tweeted that the president of Syria had been killed, causing brief fluctuations in the oil markets.
http://www.telegraph.co.uk/finance/markets/10013768/Bogus-AP-tweet-about-explosion-at-the-White-House-wipes-billions-off-US-markets.html

Formal Description: The Efficient Market Hypothesis (EMH)
- The EMH states that market prices fully reflect all available information. It implies that prices reflect changes in investor behavior, since investors take new information into account and act accordingly.
- Research asserts that investors' rational considerations are influenced by psychological biases and emotions.
- For several decades, direct surveys have been the prominent method for estimating public mood and investor sentiment. However, explicitly stated answers can be inaccurate or manipulated, and surveys cannot take behavior-based indicators into account.
J. Bollen and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.

General Methodology for Stock prediction
[Figure: pipeline diagram. Data Sources -> (via APIs) -> Relevant Dataset -> Data Preprocessing (text filter, text normalization, noise removal) -> Feature Extraction (topic modeling, sentiment analysis, tweet features) -> Features -> Classifiers (with training data) -> Results -> Correlation / Prediction Capability Testing]

Data Sources
- Twitter (Asur and Huberman 2010; Bollen and Mao 2011; Zhao and Mei 2013; Arias et al. 2015)
  - Streaming API -> collects tweets in real time
  - Search API -> searches and collects historical tweets up to one week in the past
- Yahoo Finance (Nguyen et al. 2015)
  - Collect historical stock prices
  - Collect posts from the Yahoo Finance message board
- Sina Weibo (Liu et al. 2015)
  - A Chinese microblogging service similar to Twitter

Filter Relevant Data from Corpus
- Data collected from social networks contains both relevant and non-relevant data for our specific domain
- We need to keep only the relevant data
- Approaches used in the literature
  - Filter by keywords -> exploit hashtags or cashtags in the messages
  - Apply LDA for topic modeling and then keep only the related topics (Arias et al. 2015)
M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
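
The keyword filter described on this slide could be sketched as follows. This is a minimal illustration, not the method of any cited paper: the watch list `WATCHED_TAGS`, the regular expression, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical watch list: cashtags and hashtags treated as relevant.
WATCHED_TAGS = {"$aapl", "$goog", "#stocks"}

# A tag is "$" or "#" followed by word characters (an assumed, simplified rule).
TAG_PATTERN = re.compile(r"[$#]\w+")


def extract_tags(text):
    """Return the set of lowercased hashtags and cashtags found in a message."""
    return {tag.lower() for tag in TAG_PATTERN.findall(text)}


def filter_relevant(messages, watched=WATCHED_TAGS):
    """Keep only messages that carry at least one watched tag."""
    return [m for m in messages if extract_tags(m) & watched]


messages = [
    "Buying $AAPL today",
    "Nice weather #sunny",
    "Markets are wild #Stocks",
]
relevant = filter_relevant(messages)
```

Here `relevant` keeps the first and third messages. Matching is case-insensitive because tags are lowercased before comparison.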

Text Normalization
The primary step in refining the data. It can involve the following tasks:
- Stop word removal
- Punctuation removal
- Lowercase conversion
- Compressing
  - Transform “Haaappyyyy” to “Happy”. This is done over multiple iterations, and the result is validated with a dictionary lookup at the end.
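
A minimal sketch of these normalization tasks is shown below. The stop-word list and dictionary are tiny hypothetical stand-ins, and the compression strategy (try shortening each run of repeated letters to two or one characters until a dictionary word is found) is one possible reading of the slide, not a confirmed algorithm from the cited work.

```python
import string
from itertools import groupby, product

# Hypothetical, tiny stand-ins for a real stop-word list and dictionary.
STOP_WORDS = {"i", "am", "so", "the", "a", "is", "to"}
DICTIONARY = {"happy", "so", "good", "market", "today", "feel"}


def compress(word, dictionary=DICTIONARY):
    """Shorten runs of repeated letters until the word matches the dictionary.

    Returns the word unchanged if no candidate is found.
    """
    if word in dictionary:
        return word
    runs = [(ch, len(list(grp))) for ch, grp in groupby(word)]
    # Each run may shrink to two letters or one ("aaa" -> "aa" or "a").
    options = [[ch * n for n in range(min(count, 2), 0, -1)] for ch, count in runs]
    for parts in product(*options):
        candidate = "".join(parts)
        if candidate in dictionary:
            return candidate
    return word


def normalize(text):
    """Lowercase, strip punctuation, compress elongated words, drop stop words."""
    text = text.lower()
    # Note: this also strips "$" and "#", so tag-based filtering must run first.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [compress(tok) for tok in text.split()]
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = normalize("I am sooo Haaappyyyy today!!!")
```

With the stand-in lists above, `tokens` is `["happy", "today"]`. Trying every combination of run lengths is exponential in the number of runs, which is acceptable for short words but would need pruning for long strings.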

Noise Removal in tweets
- Standard tools exist for noise removal: they use IDF (inverse document frequency) weighting to identify and remove very frequent terms.
- Named entity recognition (NER) system -> built on a conditional random fields (CRF) model to determine whether a tweet contains named entities related to a company (or another feature of interest). If a tweet contains no named entities from the company's keyword list, it is removed.
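
The two noise-removal ideas above can be illustrated roughly as follows. This sketch assumes tweets are already tokenized; the IDF threshold and keyword set are hypothetical, and the plain keyword check is only a stand-in for the CRF-based NER system mentioned on the slide.

```python
import math
from collections import Counter


def inverse_document_frequency(tokenized_docs):
    """Compute idf(term) = log(N / number of documents containing the term)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def drop_low_idf_terms(tokenized_docs, min_idf):
    """Remove terms that occur in so many documents that their IDF is below min_idf."""
    idf = inverse_document_frequency(tokenized_docs)
    return [[t for t in tokens if idf[t] >= min_idf] for tokens in tokenized_docs]


def mentions_company(tokens, company_keywords):
    """Keyword stand-in for NER: True if any token is in the company's keyword list."""
    return any(t in company_keywords for t in tokens)


docs = [
    ["rt", "apple", "earnings"],
    ["rt", "apple", "iphone"],
    ["rt", "lunch"],
]
cleaned = drop_low_idf_terms(docs, min_idf=0.1)  # "rt" appears everywhere, idf = 0
kept = [d for d in cleaned if mentions_company(d, {"apple", "aapl"})]
```

In this toy corpus "rt" is dropped because it occurs in every document, and the third tweet is removed because it mentions no company keyword.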

[Figure: cluttered information -> refined form]
Feature Extraction

Topic Modeling
- Some studies use the topics of messages as features for the forecasting model
- Several approaches have been proposed for topic extraction
  - Extract n-grams (unigrams or bigrams)
  - Latent Dirichlet Allocation (LDA)
  - Joint Sentiment-Topic (JST) -> extracts sentiment information and topics from text data simultaneously
  - Aspect-based sentiment -> extracts topics first, then calculates sentiment scores based on the distance between topics and emotion words and on the importance of each topic (Nguyen et al. 2015)
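
Of the approaches listed, n-gram extraction is the simplest to illustrate. The sketch below assumes tokenized input; the function names are made up for the example. LDA, JST, and aspect-based sentiment need dedicated libraries and are not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokenized_docs, n):
    """Count n-grams over a corpus; the counts can serve as topic features."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(ngrams(tokens, n))
    return counts


corpus = [
    ["apple", "stock", "rises"],
    ["apple", "stock", "falls"],
]
bigram_counts = ngram_counts(corpus, 2)
```

In this corpus the bigram `("apple", "stock")` is counted twice, while each of the other bigrams is counted once.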

Aspect-based sentiment algorithm
- Extracts topics first, then calculates sentiment scores based on the distance between topics and emotion words and on the importance of each topic (Nguyen et al. 2015)
[Figure: algorithm for extracting topics from the dataset]
[Figure: algorithm for extracting topics and their sentiment values]
T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.

Sentiment Analysis
- Some studies use sentiment information from social networks as features for their models
- There are two ways to obtain sentiment scores
  - Use existing software to calculate sentiment scores
  - Construct a classifier for sentiment classification
- Popular tools
  - GPOMS -> categorizes people’s emotions into 6 categories: calm, alert, sure, vital, kind, and happy
  - OpinionFinder (OF) -> classifies sentiment as positive or negative

Constructing Sentiment Classifier
- Have experts annotate sentiment data and use it as training data
- Extract features from the training data -> n-grams, POS tags
- Use a learning model (SVM, linear regression model) to learn from the training data
- Apply the trained classifier to the entire collection
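
The four steps above can be sketched end to end as follows. To keep the example dependency-free, a multinomial Naive Bayes classifier over unigram counts is swapped in for the SVM or linear regression models named on the slide; the class name, labels, and the four training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, tokenized_docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(tokenized_docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: (hypothetical) expert-annotated training data, already tokenized.
train_docs = [["great", "earnings"], ["strong", "growth"],
              ["terrible", "loss"], ["weak", "sales"]]
train_labels = ["pos", "pos", "neg", "neg"]

# Steps 2-3: unigram features are extracted inside fit(); train the model.
model = NaiveBayesSentiment().fit(train_docs, train_labels)

# Step 4: apply the classifier to unlabeled messages.
predictions = [model.predict(doc) for doc in [["strong", "earnings"], ["weak", "loss"]]]
```

With this toy training set the two unlabeled messages are classified as `"pos"` and `"neg"` respectively. A real system would need far more annotated data and richer features such as POS tags.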

Extracting Sentiment Features
Once the sentiment data has been classified, sentiment features can be generated in various ways.
Examples of sentiment features used in the literature:
- Average daily sentiment score
- Sentiment index = Number of positive tweets / Total number of tweets
- PNRatio = Number of positive tweets / Number of negative tweets
- Sentiment polarity = (ptw - ntw) / (ptw + ntw)
  - ptw : number of positive tweets
  - ntw : number of negative tweets
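
The three ratio features can be computed directly from per-tweet labels, as in this short sketch. The label strings `"pos"` and `"neg"` and the choice to return `None` when a denominator is zero are assumptions made for the example.

```python
def sentiment_features(labels):
    """Compute sentiment index, PNRatio, and polarity from per-tweet labels."""
    ptw = sum(1 for label in labels if label == "pos")  # positive tweets
    ntw = sum(1 for label in labels if label == "neg")  # negative tweets
    total = len(labels)
    return {
        "sentiment_index": ptw / total if total else None,
        "pn_ratio": ptw / ntw if ntw else None,
        "polarity": (ptw - ntw) / (ptw + ntw) if (ptw + ntw) else None,
    }


# One day's worth of (hypothetical) classified tweets.
features = sentiment_features(["pos", "pos", "pos", "neg", "neutral"])
```

For this day, ptw = 3 and ntw = 1 out of 5 tweets, so the sentiment index is 3/5, PNRatio is 3/1, and polarity is (3 - 1) / (3 + 1) = 0.5.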

Sentiment Features Testing
- Goal: ensure that sentiment information reflects real-world events and can be used for prediction
- Approaches used in the literature (Bollen and Mao 2011)
  - Causality testing: tests the correlation between sentiment information and stock market indicators (DJIA / VIX)
  - Self-organizing fuzzy neural network (SOFNN): tests the predictive capability of sentiment information
J. Bollen and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.
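
As a much simpler stand-in for a formal causality test, one can check whether sentiment on day t correlates with the market value on day t + lag. The sketch below uses a plain lagged Pearson correlation; it is not the causality test used by Bollen and Mao, and the series values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes non-zero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, prices, lag):
    """Correlate sentiment on day t with the price on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(sentiment, prices)
    return pearson(sentiment[:-lag], prices[lag:])


# Hypothetical daily series in which prices follow sentiment one day later.
sentiment = [1, 2, 3, 4, 5, 6]
prices = [9, 2, 4, 6, 8, 10]
r_lag1 = lagged_correlation(sentiment, prices, lag=1)
```

A high correlation at a positive lag only suggests that sentiment leads the market; a proper causality test also controls for the market's own past values.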

Extracting Tweet Features
Useful quantifiable information that can be extracted from the corpus:
- Number of followers of the company, or of a well-known person tweeting about the company (a typical MapReduce-style aggregation problem)
- Tweet volume (related to a specific identity or hashtag)
- Retweet volume (related to a specific hashtag coupled with an identity)
- Tweet-rate = Number of tweets / Duration over which those tweets were generated
- Tweet length
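
Several of these features can be computed with a few lines of code. In the sketch below, the tweet records and their keys (`text`, `created_at`, `is_retweet`) are hypothetical, and the duration is assumed to be the span between the first and last tweet, measured in hours.

```python
from datetime import datetime


def tweet_rate(timestamps):
    """Tweets per hour over the span between the first and last tweet."""
    if len(timestamps) < 2:
        return None
    hours = (max(timestamps) - min(timestamps)).total_seconds() / 3600
    return len(timestamps) / hours if hours > 0 else None


def tweet_features(tweets):
    """Aggregate simple volume, rate, and length features for a set of tweets."""
    timestamps = [t["created_at"] for t in tweets]
    return {
        "tweet_volume": len(tweets),
        "retweet_volume": sum(1 for t in tweets if t["is_retweet"]),
        "tweet_rate_per_hour": tweet_rate(timestamps),
        "mean_tweet_length": (
            sum(len(t["text"]) for t in tweets) / len(tweets) if tweets else None
        ),
    }


sample = [
    {"text": "AAPL up", "created_at": datetime(2015, 1, 5, 9, 0), "is_retweet": False},
    {"text": "RT AAPL up", "created_at": datetime(2015, 1, 5, 9, 30), "is_retweet": True},
    {"text": "sell", "created_at": datetime(2015, 1, 5, 10, 0), "is_retweet": False},
    {"text": "hold", "created_at": datetime(2015, 1, 5, 11, 0), "is_retweet": False},
]
features = tweet_features(sample)
```

The four sample tweets span two hours, giving a tweet-rate of 2.0 tweets per hour. Follower counts are omitted because they require account-level data rather than tweet-level data.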

Prediction Model Construction
1. Combine the features from the previous steps
- Topic features
- Sentiment features
- Tweet features
- Historical stock price features (additional features)

Google Heat Map:
Gives a fair idea of how any kind of information is concentrated geographically, e.g., Facebook trends.

Iterative Training & Validation
2. Train the model -> SVM, linear regression, neural networks
3. Test and evaluate the model
- The most popular method is a windowing mechanism, in which the model groups the tweets in a window (w1) spanning several days and analyzes their sentiment or other features.
- Then, in the subsequent window (w2) of 1-2 days, the stock indices are measured.
- Finally, w1 and w2 are analyzed together to find interesting patterns.
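
The windowing mechanism can be sketched as below. This is an illustration under stated assumptions: sentiment and closing prices are aligned daily lists, the w1 feature is the mean sentiment, and the w2 target is the relative change in the index. None of these choices is prescribed by the slide.

```python
def build_windows(daily_sentiment, daily_close, w1, w2):
    """Pair mean sentiment over w1 days with the index return over the next w2 days.

    Returns a list of (feature, target) tuples, one per sliding window position.
    """
    samples = []
    last_start = len(daily_close) - w1 - w2
    for start in range(last_start + 1):
        feature_days = daily_sentiment[start:start + w1]
        base = daily_close[start + w1 - 1]         # close on the last day of w1
        future = daily_close[start + w1 + w2 - 1]  # close on the last day of w2
        samples.append((sum(feature_days) / w1, (future - base) / base))
    return samples


# Hypothetical aligned daily series.
sentiment = [0.2, 0.4, 0.6, 0.8, 1.0]
close = [100.0, 100.0, 110.0, 121.0, 121.0]
samples = build_windows(sentiment, close, w1=2, w2=1)
```

With five days of data, `w1=2`, and `w2=1`, this yields three samples. The resulting pairs can then be fed to the model trained in step 2 or examined for correlation.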

Correlation of sentiments & indices
This involves formally testing for causal correlation between social network sentiment and stock market indices such as the Dow Jones, NASDAQ, NYSE, and VIX.
M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.

Conclusion
- Information on social networks reflects real-world events
- Social network data can be used to predict stock market movement to a certain degree
- The knowledge extracted from social media can be applied to different applications
  - Individual stock price prediction
  - Predicting the box-office revenue of a movie
  - Presidential/Senate election prediction based on campaign data

Future Works
- Work with datasets covering longer periods -> some current studies use only 15 transaction dates
- Combining information from different data sources might improve prediction accuracy -> Twitter is known to contain a lot of noisy data
- Come up with new features, such as the credibility of tweets -> most current research focuses on topic + sentiment without considering the reliability of the data

References
[1] M. Arias, A. Arratia, and R. Xuriguera, “Forecasting with Twitter Data,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 1, pp. 1-24, 2015.
[2] L. Liu, J. Wu, P. Li, and Q. Li, “A social-media-based approach to predicting stock comovement,” Expert Systems with Applications, vol. 42, no. 8, pp. 3893-3901, 2015.
[3] T. H. Nguyen, K. Shirai, and J. Velcin, “Sentiment analysis on social media for stock movement prediction,” Expert Systems with Applications, vol. 42, no. 24, pp. 9603-9611, 2015.
[4] S. Asur and B. A. Huberman, “Predicting the Future with Social Media,” 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 492-499, 2010.
[5] Z. Zhao and Q. Mei, “Questions about questions: an empirical analysis of information needs on Twitter,” Proceedings of the 22nd International Conference on World Wide Web, May 13-17, 2013, Rio de Janeiro, Brazil.
[6] J. Bollen and H. Mao, “Twitter Mood as a Stock Market Predictor,” Computer, vol. 44, no. 10, pp. 91-94, 2011.
[7] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng, “Exploiting Topic based Twitter Sentiment for Stock Prediction,” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 24-29, 2013.
[8] X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting Stock Market Indicators Through Twitter ‘I hope it is not as bad as I fear’,” The 2nd Collaborative Innovation Networks Conference - COINs2010, vol. 26, pp. 55-62, 2011.
[9] G. Ranco, D. Aleksovski, G. Caldarelli, M. Grčar, and I. Mozetič, “The Effects of Twitter Sentiment on Stock Price Returns,” PLoS ONE, vol. 10, no. 9, pp. 1-21, 2015.
[10] T. T. Vu, S. Chang, Q. T. Ha, and N. Collier, “An Experiment in Integrating Sentiment Features for Tech Stock Prediction in Twitter,” Workshop on Information Extraction and Entity Analytics on Social Media Data, pp. 23-38, 2012.
