20121108 sntmnt data_sciencenl

    1. < sen·ti·ment >: the prevailing attitude of investors as to anticipated price development in a market.
       Tim Harbers, CTO SNTMNT
       DataScienceNL Meetup, November 8th 2012
    2. Tim Harbers
       Background:
       • BSc Computer Science
       • MSc Computer Science
       • Researcher
       • Data Miner
       • Technical Consultant
       • Co-Founder and COO
       • Co-Founder and CTO
    3. The Rockstars
       • Vincent van Leeuwen: Customer Development
       • Kees van Nunen: Product Development
       • Durk Kingma: Data Mining Expert
       • Tim Harbers: Machine Learning Expert
       Team:
       • Balanced multidisciplinary team
       • Two machine learning experts in predictive analysis and large datasets
       • Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
       • Strong network in (Dutch) financial industry
       • Young, enthusiastic team with a proven entrepreneurial mindset
    4. How to select the right stock to invest in?
    5. Our solution: Predicting stock price movement based on online buzz
       Engineered based on academic research:
       • Van Leeuwen (2011)
       • Bollen et al. (2010)
       • Sprenger and Welpe (2010)
       • Sehgal and Song (2007)
    6. Why would this work?
       • Very different from traditional indicators
       • News travels faster via social than traditional media
       • Tremendous amount of data
       • (Almost) nobody uses it yet
    7. Why focus on Twitter?
       • Public data & easily accessible
       • Structured language
       • 400M tweets per day
    8. Historic Research
       • Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.
       • Sprenger and Welpe (2011): Analyzed correlation of the stock market and microblogs.
    9. Financial Sentiment vs Brand Sentiment
       Financial Sentiment       | Brand Sentiment
       Tweets relating to stocks | Tweets relating to brands
       Written by traders        | Written by consumers
       Trader mumbo jumbo        | Any language
       More relevant             | Larger dataset
       Shorter term              | Longer term
    10. Data setup
       • Period: June 2010 to April 2012
       • Stocks: Top 15 most tweeted stocks in S&P 500
       • Tweets: Financial dataset from Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
       • Other: Klout, Peerindex
    11. Sentiment Scoring
    12. Financial tweets
    13. Commercial tweets
    14. Sentiment analysis: Enabling computers to derive sentiment from natural language
    15. Naive Approach: Dictionaries
       • Use a dictionary of common positive and negative terms
       • Count the number of positive and negative terms
       • Use the difference between the two
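
The dictionary approach on this slide can be sketched in a few lines. This is a minimal illustration, not SNTMNT's code: the two word lists and the function name `dictionary_score` are hypothetical, and a real lexicon would be far larger.

```python
import re

# Hypothetical mini-lexicons, assumed for illustration only.
POSITIVE = {"good", "great", "up", "bullish", "buy", "gain"}
NEGATIVE = {"bad", "down", "bearish", "sell", "loss", "weak"}


def dictionary_score(tweet: str) -> int:
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


if __name__ == "__main__":
    print(dictionary_score("Great quarter, AAPL looks bullish"))
    print(dictionary_score("Not good, but not bad either"))
```

The second call shows why the approach is called naive: the negations are ignored, so "not good" and "not bad" simply cancel out.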
    16. SNTMNT's approach: machine learning
       • Label a training set of tweets (target)
       • Use preprocessing techniques
       • Use several feature extractors
       • Create a sparse dataset
       • Use supervised learning to train a machine learning model
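
The steps on this slide can be sketched end to end in plain Python. The slides do not say which preprocessing steps, feature extractors, or model SNTMNT used, so everything here is an assumption: URL and @mention placeholders as preprocessing, unigram and bigram counts as the two feature extractors, a dict as the sparse representation, and a simple perceptron as the supervised learner. The four labeled tweets are made up.

```python
import re
from collections import defaultdict


def preprocess(tweet):
    """Lowercase, then replace URLs and @mentions with placeholder tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "URL", text)
    text = re.sub(r"@\w+", "USER", text)
    return re.findall(r"[\w$#']+", text)


def extract_features(tokens):
    """Sparse feature dict built by two extractors: unigrams and bigrams."""
    features = defaultdict(float)
    for token in tokens:
        features["uni=" + token] += 1.0
    for left, right in zip(tokens, tokens[1:]):
        features["bi=" + left + "_" + right] += 1.0
    return features


def train_perceptron(examples, epochs=10):
    """Train on (tweet, label) pairs, where label is +1 or -1."""
    weights = defaultdict(float)
    data = [(extract_features(preprocess(t)), y) for t, y in examples]
    for _ in range(epochs):
        for features, label in data:
            score = sum(weights[n] * v for n, v in features.items())
            if label * score <= 0:  # misclassified: move weights toward label
                for name, value in features.items():
                    weights[name] += label * value
    return weights


def predict(weights, tweet):
    features = extract_features(preprocess(tweet))
    score = sum(weights.get(n, 0.0) * v for n, v in features.items())
    return 1 if score > 0 else -1


# Made-up hand-labeled training set (+1 positive, -1 negative).
TRAIN = [
    ("$AAPL breaking out, going long", 1),
    ("Strong earnings beat for $GOOG", 1),
    ("$MSFT looks weak, going short", -1),
    ("Terrible guidance, selling $IBM", -1),
]
weights = train_perceptron(TRAIN)

if __name__ == "__main__":
    print(predict(weights, "going long $GOOG"))
```

Only features that actually occur in a tweet are stored, which is what keeps the dataset sparse when the vocabulary grows to many thousands of terms.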
    17. Labeling
       • 25K Financial tweets hand labeled
       • 30K Commercial tweets hand labeled
       • 1M #happy vs. #sad
    18. Difficulties in sentiment analysis
       • Authors / URLs
       • Foreign languages
       • Slang: aykm, lol, tgsttttptct
       • Negation
       • Target Sentiment Analysis
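
Of the difficulties listed, negation is one that a small preprocessing step can partly address. The sketch below uses a common trick (prefixing every token after a negation word with `NOT_` until the next punctuation mark); the slides do not say whether SNTMNT handled negation this way, and the `NEGATIONS` list is a hypothetical, incomplete example.

```python
import re

# Assumed negation cues; not exhaustive.
NEGATIONS = {"not", "no", "never", "don't", "isn't", "won't", "can't"}
PUNCTUATION = set(".,!?;")


def mark_negation(tweet):
    """Prefix tokens that follow a negation word with NOT_ until punctuation."""
    pieces = re.findall(r"[\w']+|[.,!?;]", tweet.lower())
    marked, negating = [], False
    for piece in pieces:
        if piece in PUNCTUATION:
            negating = False  # the negation scope ends at punctuation
            continue
        if piece in NEGATIONS:
            negating = True
            marked.append(piece)
            continue
        marked.append("NOT_" + piece if negating else piece)
    return marked


if __name__ == "__main__":
    print(mark_negation("Not good, but the outlook is fine"))
```

With this marking, "good" and "NOT_good" become distinct features, so a classifier can learn opposite weights for them.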
    19. Results
       Financial tweets:
       • 84.3% accurate on 2-point scale (baseline: 60.4%)
       • 76.8% accurate on 3-point scale (baseline: 65.0%)
       • Beat Lexalytics (84.3% vs. 70.3%)
       Commercial tweets:
       • 84.7% accurate on 2-point scale (baseline: 61.0%)
       • 86.9% accurate on 3-point scale (baseline: 81.1%)
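
The slides do not state how the baseline figures were computed. One common choice, assumed here, is the majority-class baseline: the accuracy obtained by always predicting the most frequent label. The sketch below compares a classifier's accuracy against that baseline on made-up labels.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline(actual):
    """Accuracy of always predicting the most frequent label (assumed baseline)."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    labels = ["pos", "pos", "pos", "neg", "neg"]
    predictions = ["pos", "pos", "neg", "neg", "neg"]
    print(accuracy(predictions, labels), majority_baseline(labels))
```

A high baseline means the classes are unbalanced, so the gain over the baseline is more informative than the raw accuracy.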
    20. Stock
    21. Stock Regression
       Input:
       • Sentiment scores
       • Mood states
       • Meta Data
       • Stock
       Output:
       • Trading Indication
       • Confidence
    22. Many dimensions
       • Tweet period
       • Trading period
       • Financial Tweets or Commercial Tweets
       • Tweet Crunchers
       • Models
       • Trading strategy
    23. Tweet Aggregation Problem
       • Tweet volume
       • Volume positive tweets
       • Avg sentiment
       • Sentiment Growth
       • Etc.
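
The aggregation problem is how to turn many per-tweet scores into a few per-period features. A minimal sketch of the four aggregates named on the slide follows; the function name, the output keys, and the definition of sentiment growth as the change in average sentiment from the previous period are all assumptions.

```python
from statistics import mean


def aggregate(scores, previous_avg=None):
    """Aggregate per-tweet sentiment scores for one tweet period.

    scores: per-tweet sentiment scores (positive value = positive tweet).
    previous_avg: average sentiment of the previous period, if known.
    """
    avg_sentiment = mean(scores) if scores else 0.0
    return {
        "tweet_volume": len(scores),
        "positive_volume": sum(score > 0 for score in scores),
        "avg_sentiment": avg_sentiment,
        # Assumed definition: change in average sentiment between periods.
        "sentiment_growth": (
            None if previous_avg is None else avg_sentiment - previous_avg
        ),
    }


if __name__ == "__main__":
    print(aggregate([1, -1, 1, 1], previous_avg=0.25))
```

Each choice of period length and aggregate adds another dimension to the search described on the previous slide.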
    24. Machine Learning Models
       • Linear Regression
       • Bayesian Approaches
       • Decision Trees
       • Neural Nets
       • Support Vector Machines
    25. Results
       • R² < 0.01
       • Not usable as an independent trading model after transaction costs.
       • Still usable as an extra indicator to be used by proven trading models.
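
For context on the R² figure: R² is the share of variance in the target that a regression explains, so a value below 0.01 means the model explains less than 1% of it. The sketch below computes R² for a one-variable least-squares fit; the inputs are made up and the slides do not specify which of the listed models the figure refers to.

```python
def r_squared(x, y):
    """R^2 of a one-variable least-squares fit y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up data: a perfectly linear series and an uncorrelated one.
    print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
    print(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))
```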
    26. Products - next steps:
       • Sentiment APIs (B2B)
       • Stock Dashboard (B2B2C)
       • Trading Indicator API (B2B)
       Goals:
       • Market leader and thought leader in financial sentiment analysis.
       • Extend scope to niche domains and languages.
       • Getting more insights further into the added value of the SNTMNT algorithm as an indicator next to fundamental and technical analysis.
    27. For more info, visit: www.SNTMNT.com
       Any questions?
