
20121108 sntmnt data_sciencenl



  1. < sen·ti·ment > The prevailing attitude of investors as to anticipated price development in a market. Tim Harbers, CTO, SNTMNT. DataScienceNL Meetup, November 8th, 2012
  2. Tim Harbers. Background: BSc Computer Science; MSc Computer Science; Researcher; Data Miner; Technical Consultant; Co-Founder and COO; Co-Founder and CTO
  3. The Rockstars
     Team: Vincent van Leeuwen (Customer Development); Kees van Nunen (Product Development); Durk Kingma (Data Mining Expert); Tim Harbers (Machine Learning Expert)
     ‣ Balanced multidisciplinary team
     ‣ Two machine learning experts in predictive analysis and large datasets
     ‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
     ‣ Strong network in (Dutch) financial industry
     ‣ Young, enthusiastic team with a proven entrepreneurial mindset
  4. How to select the right stock to invest in?
  5. Our solution: predicting stock price movement based on online buzz. Engineered based on academic research: Van Leeuwen (2011); Bollen et al. (2010); Sprenger and Welpe (2010); Sehgal and Song (2007)
  6. Why would this work? Very different from traditional indicators; news travels faster via social than traditional media; tremendous amount of data; (almost) nobody uses it yet
  7. Why focus on Twitter? Public data and easily accessible; structured language; 400M tweets per day
  8. Historic Research. Bollen (2010): created a model based on Twitter mood states, which was 86% accurate on the DJI. Sprenger and Welpe (2011): analyzed correlation of the stock market and microblogs
  9. Financial Sentiment vs Brand Sentiment
     Financial Sentiment: tweets relating to stocks; written by traders; trader mumbo jumbo; more relevant; shorter term
     Brand Sentiment: tweets relating to brands; written by consumers; any language; larger dataset; longer term
  10. Data setup
     Period: June 2010 to April 2012
     Stocks: top 15 most tweeted stocks in S&P 500
     Tweets: financial dataset from Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
     Other: Klout; PeerIndex
  11. Sentiment Scoring
  12. Financial tweets
  13. Commercial tweets
  14. Sentiment analysis: enabling computers to derive sentiment from natural language
  15. Naive Approach: Dictionaries. Use a dictionary of common positive and negative terms; count the number of positive and negative terms; use the difference between the two.
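The dictionary approach on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SNTMNT's code: the two word lists and the function name `dictionary_score` are made up for the example, and a real lexicon would be far larger.

```python
import re

# Hypothetical toy lexicons (assumption); a real dictionary would hold
# thousands of terms.
POSITIVE = {"good", "great", "buy", "bullish", "up", "gain"}
NEGATIVE = {"bad", "sell", "bearish", "down", "loss", "weak"}


def dictionary_score(text: str) -> int:
    """Return (number of positive terms) minus (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for token in tokens if token in POSITIVE)
    negatives = sum(1 for token in tokens if token in NEGATIVE)
    return positives - negatives


print(dictionary_score("Great earnings, bullish on $AAPL"))  # 2
print(dictionary_score("Weak guidance, time to sell"))       # -2
print(dictionary_score("not good"))                          # 1
```

The last call shows why the approach is called naive: "not good" scores +1 because plain term counting ignores negation, one of the difficulties listed on slide 18.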
  16. SNTMNT’s approach: machine learning. Label a training set of tweets (target); use preprocessing techniques; use several feature extractors; create a sparse dataset; use supervised learning to train a machine learning model.
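The five steps on this slide can be sketched end to end with the Python standard library. The slides do not say which preprocessing steps, feature extractors, or model SNTMNT used, so everything concrete here is an assumption chosen for illustration: URL and @user placeholders as preprocessing, unigram and bigram counts as features, and a multinomial Naive Bayes classifier. The four training tweets are invented.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, replace URLs and @mentions with placeholders, tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return re.findall(r"<url>|<user>|\$?[a-z0-9']+", text)


def extract_features(tokens):
    """Sparse feature vector: counts of unigrams and adjacent-word bigrams."""
    features = Counter(tokens)
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over sparse dicts."""

    def fit(self, feature_dicts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in zip(feature_dicts, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for name, count in features.items():
                if name in self.vocab:  # skip features never seen in training
                    seen = self.feature_counts[label][name]
                    score += count * math.log((seen + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, hand-labeled training set (assumption, for illustration only).
train = [
    ("Strong earnings, $AAPL looks bullish", "pos"),
    ("Great quarter, buying more $GOOG", "pos"),
    ("Weak guidance, $AAPL looks bearish", "neg"),
    ("Terrible quarter, selling my $GOOG", "neg"),
]
model = NaiveBayes().fit(
    [extract_features(preprocess(text)) for text, _ in train],
    [label for _, label in train],
)


def classify(text):
    return model.predict(extract_features(preprocess(text)))


print(classify("bullish on strong earnings"))  # pos
print(classify("bearish, weak guidance"))      # neg
```

Each tweet becomes a dictionary holding only the features it contains, which is what makes the dataset sparse: the vocabulary can grow very large while any single tweet touches a handful of entries.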
  17. Labeling • 25K financial tweets hand labeled • 30K commercial tweets hand labeled • 1M #happy vs. #sad
  18. Difficulties in sentiment analysis: authors / URLs; foreign languages; slang (aykm, lol, tgsttttptct); negation; target sentiment analysis
  19. Results
     Financial tweets:
     ‣ 84.3% accurate on 2-point scale (baseline: 60.4%)
     ‣ 76.8% accurate on 3-point scale (baseline: 65.0%)
     ‣ Beat Lexalytics (84.3% vs. 70.3%)
     Commercial tweets:
     ‣ 84.7% accurate on 2-point scale (baseline: 61.0%)
     ‣ 86.9% accurate on 3-point scale (baseline: 81.1%)
  20. Stock
  21. Stock Regression. Input: sentiment scores; mood states; meta data; stock. Output: trading indication; confidence
  22. Many dimensions: tweet period; trading period; financial tweets or commercial tweets; tweet crunchers; models; trading strategy
  23. Tweet Aggregation Problem: tweet volume; volume of positive tweets; average sentiment; sentiment growth; etc.
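The aggregation problem is how to turn many scored tweets into a few numbers per period that a regression model can use. The sketch below computes the four aggregates named on the slide. The input format, the field names, and the definition of sentiment growth (change in average sentiment from the previous period) are assumptions, since the slides do not define them.

```python
from collections import defaultdict


def aggregate(tweets):
    """Aggregate (period, sentiment_score) pairs into one row per period.

    Scores are assumed to lie in [-1, 1]; a score above 0 counts as positive.
    """
    by_period = defaultdict(list)
    for period, score in tweets:
        by_period[period].append(score)

    rows, previous_avg = [], None
    for period in sorted(by_period):
        scores = by_period[period]
        avg = sum(scores) / len(scores)
        rows.append({
            "period": period,
            "volume": len(scores),
            "positive_volume": sum(1 for s in scores if s > 0),
            "avg_sentiment": avg,
            # Assumed definition: change versus the previous period's average.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        })
        previous_avg = avg
    return rows


rows = aggregate([
    ("2012-04-02", 0.5), ("2012-04-02", -0.5),
    ("2012-04-03", 1.0), ("2012-04-03", 0.5),
])
for row in rows:
    print(row)
```

Changing the period key from a day to an hour or a week changes the tweet period, one of the dimensions listed on slide 22.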
  24. Machine Learning Models: linear regression; Bayesian approaches; decision trees; neural nets; support vector machines
  25. Results: R² < 0.01. Not usable as an independent trading model after transaction costs. Still usable as an extra indicator to be used by proven trading models.
  26. Products - next steps
     Products: Sentiment APIs (B2B); Stock Dashboard (B2B2C); Trading Indicator API (B2B)
     ‣ Market leader and thought leader in financial sentiment analysis
     ‣ Extend scope to further niche domains and languages
     ‣ Getting more insights into added value of SNTMNT algorithm as indicator next to fundamental and technical analysis
  27. For more info, Any questions?