20121108 sntmnt data_sciencenl
Presentation Transcript

  • < sen·ti·ment > the prevailing attitude of investors as to anticipated price development in a market.
    Tim Harbers, CTO SNTMNT
    DataScienceNL Meetup, November 8th 2012
  • Tim Harbers: Background
    ‣ BSc Computer Science
    ‣ MSc Computer Science
    ‣ Researcher
    ‣ Data Miner
    ‣ Technical Consultant
    ‣ Co-Founder and COO
    ‣ Co-Founder and CTO
  • The Rockstars
    Vincent van Leeuwen: Customer Development
    Kees van Nunen: Product Development
    Durk Kingma: Data Mining Expert
    Tim Harbers: Machine Learning Expert
    ‣ Balanced multidisciplinary team
    ‣ Two machine learning experts in predictive analysis and large datasets
    ‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
    ‣ Strong network in (Dutch) financial industry
    ‣ Young, enthusiastic team with a proven entrepreneurial mindset
  • How to select the right stock to invest in?
  • Our solution: Predicting stock price movement based on online buzz
    Engineered based on academic research:
    ‣ Van Leeuwen (2011)
    ‣ Bollen et al. (2010)
    ‣ Sprenger and Welpe (2010)
    ‣ Sehgal and Song (2007)
  • Why would this work?
    ‣ Very different from traditional indicators
    ‣ News travels faster via social than traditional media
    ‣ Tremendous amount of data
    ‣ (Almost) nobody uses it yet
  • Why focus on Twitter?
    ‣ Public data & easily accessible
    ‣ Structured language
    ‣ 400M tweets per day
  • Historic Research
    Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.
    Sprenger and Welpe (2011): Analyzed correlation of the stock market and microblogs.
  • Financial Sentiment vs Brand Sentiment
    Financial Sentiment: tweets relating to stocks; written by traders; trader mumbo jumbo; more relevant; shorter term
    Brand Sentiment: tweets relating to brands; written by consumers; any language; larger dataset; longer term
  • Data setup
    Period: June 2010 to April 2012
    Stocks: Top 15 most tweeted stocks in S&P 500
    Tweets: Financial dataset from Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
    Other: Klout, PeerIndex
  • Sentiment Scoring
  • Financial tweets
  • Commercial tweets
  • Sentiment analysis: Enabling computers to derive sentiment from natural language
  • Naive Approach: Dictionaries
    ‣ Use a dictionary of common positive and negative terms
    ‣ Count the number of positive and negative terms
    ‣ Use the difference between the two
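The dictionary approach on this slide can be sketched in a few lines of Python. This is only an illustration: the two word lists are hypothetical mini-lexicons invented for the example, not a dictionary from the talk.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"buy", "bullish", "up", "gain", "strong", "beat"}
NEGATIVE = {"sell", "bearish", "down", "loss", "weak", "miss"}

def dictionary_score(text):
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg

print(dictionary_score("Strong earnings beat, going to buy $AAPL"))  # 3
print(dictionary_score("Not bullish on this one"))                   # 1
```

The second call shows why the approach is naive: the negation is ignored, so a tweet that is clearly not positive still gets a positive score.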
  • SNTMNT’s approach: machine learning
    ‣ Label a training set of tweets (target)
    ‣ Use preprocessing techniques
    ‣ Use several feature extractors
    ‣ Create a sparse dataset
    ‣ Use supervised learning to train a machine learning model
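The five steps above can be sketched end to end with the standard library only. Everything specific here is an assumption made for illustration: the four training tweets are made up, the feature extractors are plain unigrams and bigrams, and the learner is a multinomial naive Bayes classifier. The slides do not say which preprocessing, features or model SNTMNT actually used.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(text):
    """Preprocessing: lowercase and tokenize, keeping cashtags and hashtags."""
    return re.findall(r"[a-z$#']+", text.lower())

def extract_features(tokens):
    """Two feature extractors (unigrams and bigrams) stored sparsely:
    only features that occur in the tweet get an entry."""
    feats = Counter(tokens)
    feats.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return feats

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed learner)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            lp = math.log(n / total)
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f, c in feats.items():
                if f in self.vocab:  # ignore features never seen in training
                    lp += c * math.log((self.feat_counts[label][f] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Step 1: a tiny hand-labeled training set (made-up tweets).
train = [
    ("$AAPL breaking out, buying more", "pos"),
    ("great earnings, $GOOG looks strong", "pos"),
    ("$MSFT looks weak, selling", "neg"),
    ("terrible guidance, dumping $XOM", "neg"),
]
docs = [extract_features(preprocess(text)) for text, _ in train]
model = NaiveBayes().fit(docs, [label for _, label in train])

def classify(text):
    return model.predict(extract_features(preprocess(text)))

print(classify("selling $AAPL, looks weak"))
print(classify("buying $GOOG, great earnings"))
```

Storing each tweet as a `Counter` keeps the dataset sparse: with tens of thousands of labeled tweets the vocabulary is large, but each tweet touches only a handful of features.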
  • Labeling
    ‣ 25K Financial tweets hand labeled
    ‣ 30K Commercial tweets hand labeled
    ‣ 1M #happy vs. #sad
  • Difficulties in sentiment analysis
    ‣ Authors / URLs
    ‣ Foreign languages
    ‣ Slang: aykm, lol, tgsttttptct
    ‣ Negation
    ‣ Target Sentiment Analysis
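Two of these difficulties, authors/URLs and negation, are commonly handled during preprocessing. The sketch below masks mentions and links with placeholder tokens and prefixes every word after a negation with `NOT_` until the next punctuation mark. It shows one common technique chosen for illustration; the placeholder names and the negation word list are assumptions, not details from the talk.

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}  # assumed, deliberately short list
PUNCTUATION = {".", ",", "!", "?", ";"}

def normalize(text):
    """Mask URLs and author mentions so they do not inflate the vocabulary."""
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return text.lower()

def tokenize(text):
    """Split normalized text into placeholders, words, cashtags/hashtags and punctuation."""
    return re.findall(r"<url>|<user>|[$#]?\w+(?:'\w+)?|[.,!?;]", normalize(text))

def mark_negation(tokens):
    """Prefix tokens that follow a negation with NOT_ until punctuation ends the scope."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            out.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out

tokens = tokenize("@trader I'm not buying $AAPL today. http://t.co/x looks great")
print(mark_negation(tokens))
```

With this marking, `buying` and `NOT_buying` become separate features, so a classifier can learn opposite weights for them.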
  • Results
    Financial tweets
    ‣ 84.3% accurate on 2-point scale (baseline: 60.4%)
    ‣ 76.8% accurate on 3-point scale (baseline: 65.0%)
    ‣ Beat Lexalytics (84.3% vs. 70.3%)
    Commercial tweets
    ‣ 84.7% accurate on 2-point scale (baseline: 61.0%)
    ‣ 86.9% accurate on 3-point scale (baseline: 81.1%)
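To read accuracy figures against a baseline, it helps to see how both are computed. The slide does not define its baseline; the sketch below assumes the usual choice of always predicting the most frequent class, and uses made-up labels.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the hand-assigned labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class (assumed baseline)."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

# Made-up labels on a 2-point scale.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "neg"]

print(accuracy(y_true, y_pred))   # 0.8
print(majority_baseline(y_true))  # 0.6
```

Under this assumption, a high baseline such as 81.1% would mean one class dominates the labels, which leaves less room for a model to improve on it.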
  • Stock
  • Stock Regression Input:  Sentiment scores  Mood states  Meta Data  Stock Output:  Trading Indication  Confidence
  • Many dimensions Tweet period Trading period Financial Tweets or Commercial Tweets Tweet Crunchers Models Trading strategy
  • Tweet Aggregation Problem  Tweet volume  Volume positive tweets  Avg sentiment  Sentiment Growth  Etc.
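The aggregation problem is turning many scored tweets into a few numbers per period. A minimal sketch, assuming a daily tweet period, scores in [-1, 1], and "sentiment growth" defined as the change in average sentiment from the previous day; the slide names the features but does not define them, and the data is made up:

```python
from collections import defaultdict
from datetime import date

def aggregate(tweets):
    """tweets: iterable of (day, score). Returns {day: feature dict}, in date order."""
    by_day = defaultdict(list)
    for day, score in tweets:
        by_day[day].append(score)

    features, prev_avg = {}, None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        features[day] = {
            "volume": len(scores),
            "positive_volume": sum(1 for s in scores if s > 0),
            "avg_sentiment": avg,
            # Undefined for the first day, since there is no previous period.
            "sentiment_growth": None if prev_avg is None else avg - prev_avg,
        }
        prev_avg = avg
    return features

# Made-up scored tweets for one stock.
tweets = [
    (date(2012, 4, 2), 0.5), (date(2012, 4, 2), -0.5),
    (date(2012, 4, 3), 1.0), (date(2012, 4, 3), 0.5), (date(2012, 4, 3), 0.0),
]
for day, feats in aggregate(tweets).items():
    print(day, feats)
```

Each feature dict is one row of input for the regression models, alongside the stock data for the matching trading period.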
  • Machine Learning Models Linear Regression Bayesian Approaches Decision Trees Neural Nets Support Vector Machines
  • Results R2 < 0.01 Not usable as an independent trading model after transaction costs. Still usable as an extra indicator to be used by proven trading models.
  • Products - next steps: Sentiment APIs Stock Dashboard Trading Indicator API (B2B) (B2B2C) (B2B) ‣ Market leader and thought leader financial sentiment analysis. ‣ Getting more insights ‣ Extend scope to further into added value of niche domains and SNTMNT algorithm as languages. indicator next to fundamental and technical analysis.
  • For more info, visit:www.SNTMNT.com Any questions?