< sen·ti·ment > the prevailing attitude of investors as to anticipated price development in a market.
Tim Harbers, CTO SNTMNT
DataScienceNL Meetup, November 8th 2012
Tim Harbers: Background
‣ BSc Computer Science
‣ MSc Computer Science
‣ Researcher
‣ Data Miner
‣ Technical Consultant
‣ Co-Founder and COO
‣ Co-Founder and CTO
The Rockstars
Vincent van Leeuwen: Customer Development
Kees van Nunen: Product Development
Durk Kingma: Data Mining Expert
Tim Harbers: Machine Learning Expert
‣ Balanced multidisciplinary team
‣ Two machine learning experts in predictive analysis and large datasets
‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
‣ Strong network in (Dutch) financial industry
‣ Young, enthusiastic team with a proven entrepreneurial mindset
How to select the right stock to invest in?
Our solution: Predicting stock price movement based on online buzz
Engineered based on academic research:
‣ Van Leeuwen (2011)
‣ Bollen et al. (2010)
‣ Sprenger and Welpe (2010)
‣ Sehgal and Song (2007)
Why would this work?
‣ Very different from traditional indicators
‣ News travels faster via social than traditional media
‣ Tremendous amount of data
‣ (Almost) nobody uses it yet
Why focus on Twitter?
‣ Public data & easily accessible
‣ Structured language
‣ 400M tweets per day
Historic Research
Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.
Sprenger and Welpe (2011): Analyzed correlation of the stock market and microblogs.
Financial Sentiment vs Brand Sentiment
Financial Sentiment          Brand Sentiment
Tweets relating to stocks    Tweets relating to brands
Written by traders           Written by consumers
Trader mumbo jumbo           Any language
More relevant                Larger dataset
Shorter term                 Longer term
Data setup
Period: June 2010 to April 2012
Stocks: Top 15 most tweeted stocks in S&P 500
Tweets: Financial dataset from Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
Other: Klout, PeerIndex
Sentiment analysis: Enabling computers to derive sentiment from natural language
Naive Approach: Dictionaries
‣ Use a dictionary of common positive and negative terms
‣ Count the number of positive and negative terms
‣ Use the difference between the two
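The dictionary approach can be sketched in a few lines of Python. The word lists below are tiny, made-up examples for illustration, not a dictionary used by SNTMNT, and `dictionary_score` is a hypothetical helper name.

```python
import re

# Tiny illustrative word lists (assumed for this sketch, not SNTMNT's dictionary).
POSITIVE = {"good", "great", "gain", "up", "bullish", "buy"}
NEGATIVE = {"bad", "loss", "down", "bearish", "sell", "weak"}

def dictionary_score(tweet):
    """Return (number of positive terms) minus (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

print(dictionary_score("Great earnings, stock is up"))   # 2
print(dictionary_score("Weak guidance, I would sell"))   # -2
print(dictionary_score("Not bad at all"))                # -1
```

The last call shows why the approach is naive: "not bad" is scored as negative because word counting ignores negation and context.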
SNTMNT’s approach: machine learning
‣ Label a training set of tweets (target)
‣ Use preprocessing techniques
‣ Use several feature extractors
‣ Create a sparse dataset
‣ Use supervised learning to train a machine learning model
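The steps above can be sketched end to end in plain Python. The slides do not say which preprocessing steps, feature extractors, or classifier SNTMNT uses, so everything here is an assumption chosen for illustration: placeholder substitution for URLs, mentions and tickers, unigram and bigram counts as sparse features, and a multinomial Naive Bayes classifier with add-one smoothing. The six training tweets are made up.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(tweet):
    """Lowercase, then replace URLs, @mentions and $tickers with placeholders."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"@\w+", " USER ", text)
    text = re.sub(r"\$[a-z]+", " TICKER ", text)
    return text

def extract_features(tweet):
    """Sparse feature vector: unigram and bigram counts, stored as a Counter."""
    tokens = re.findall(r"[A-Za-z']+", preprocess(tweet))
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

def train_naive_bayes(labeled_tweets):
    """Fit multinomial Naive Bayes on (tweet, label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for tweet, label in labeled_tweets:
        features = extract_features(tweet)
        class_counts[label] += 1
        feature_counts[label].update(features)
        vocabulary.update(features)
    return class_counts, feature_counts, vocabulary

def predict(model, tweet):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, feature_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(feature_counts[label].values()) + len(vocabulary)
        for feature, count in extract_features(tweet).items():
            if feature in vocabulary:  # ignore features never seen in training
                score += count * math.log((feature_counts[label][feature] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up hand-labeled training set (the "target" column).
training_set = [
    ("$AAPL looking strong, buying more", "positive"),
    ("Great quarter for $GOOG, going long", "positive"),
    ("$MSFT breaking out, very bullish", "positive"),
    ("$AAPL looks weak, selling my shares", "negative"),
    ("Terrible guidance from $GOOG, going short", "negative"),
    ("$MSFT breaking down, very bearish", "negative"),
]
model = train_naive_bayes(training_set)
print(predict(model, "feeling bullish on $AAPL"))  # positive
print(predict(model, "$GOOG looks weak"))          # negative
```

Storing each tweet as a `Counter` keeps only the non-zero features, which is what makes the dataset sparse: the vocabulary is large, but a single tweet touches only a handful of entries.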
Labeling
• 25K Financial tweets hand labeled
• 30K Commercial tweets hand labeled
• 1M #happy vs. #sad
Results
Financial tweets
‣ 84.3% accurate on 2-point scale (baseline: 60.4%)
‣ 76.8% accurate on 3-point scale (baseline: 65.0%)
‣ Beat Lexalytics (84.3% vs. 70.3%)
Commercial tweets
‣ 84.7% accurate on 2-point scale (baseline: 61.0%)
‣ 86.9% accurate on 3-point scale (baseline: 81.1%)
Stock Regression
Input: Sentiment scores, Mood states, Meta Data, Stock
Output: Trading Indication, Confidence
Many dimensions
‣ Tweet period
‣ Trading period
‣ Financial Tweets or Commercial Tweets
‣ Tweet Crunchers
‣ Models
‣ Trading strategy
Tweet Aggregation Problem
‣ Tweet volume
‣ Volume positive tweets
‣ Avg sentiment
‣ Sentiment Growth
‣ Etc.
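One way to compute the aggregates listed above is sketched below. It assumes each tweet already has a sentiment score in [-1, 1], that a tweet counts as positive when its score is above zero, and that "sentiment growth" means the day-over-day change in average sentiment. None of these definitions come from the slides, and the input data is made up.

```python
from collections import defaultdict

def aggregate_daily(scored_tweets):
    """Collapse (day, sentiment) pairs into one feature row per day."""
    by_day = defaultdict(list)
    for day, sentiment in scored_tweets:
        by_day[day].append(sentiment)

    rows = []
    previous_avg = None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        rows.append({
            "day": day,
            "tweet_volume": len(scores),
            "positive_volume": sum(score > 0 for score in scores),
            "avg_sentiment": avg,
            # Change in average sentiment versus the previous day; None on day one.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        })
        previous_avg = avg
    return rows

# Made-up scored tweets for a single stock.
scored = [
    ("2012-03-01", 0.5), ("2012-03-01", -0.5),
    ("2012-03-02", 1.0), ("2012-03-02", 0.5),
]
for row in aggregate_daily(scored):
    print(row)
```

Each row is one candidate input vector for the stock regression; the choice of aggregation window (here one calendar day) is itself one of the dimensions to tune.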
Machine Learning Models
‣ Linear Regression
‣ Bayesian Approaches
‣ Decision Trees
‣ Neural Nets
‣ Support Vector Machines
Results
‣ R² < 0.01
‣ Not usable as an independent trading model after transaction costs.
‣ Still usable as an extra indicator to be used by proven trading models.
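For reference, R² is the fraction of variance in the target that the model explains, so a value below 0.01 means less than 1% of the variation in price movement is accounted for. A minimal sketch of the standard definition follows; the numbers are made up and the slides do not say how or on which data split the reported value was computed.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]

# Always predicting the mean explains none of the variance.
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # 0.0

# A perfect prediction explains all of it.
print(r_squared(actual, actual))                # 1.0
```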
Products - next steps:
Sentiment APIs (B2B), Stock Dashboard (B2B2C), Trading Indicator API (B2B)
‣ Market leader and thought leader in financial sentiment analysis.
‣ Extend scope to further niche domains and languages.
‣ Getting more insights into added value of SNTMNT algorithm as indicator next to fundamental and technical analysis.
For more info, visit: www.SNTMNT.com
Any questions?