
Recorded Future News Analytics for Financial Services


Published in: Economy & Finance, Business


  1. Recorded Future
     David Moon, Global Head of Financial Services
     Bill Ladd, Chief Analytic Officer
  2. What is Recorded Future?
     3/1/2011
     We believe that the content of the web has predictive power.
     So... we’ve harvested and organized the only real-time source for past, planned and speculative events on the web.
     To... allow users to “slice-and-dice” the web to make predictions.
  3. Web is Loaded with Predictions
     • Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum
     • Drought and malnutrition hinder next year’s development plans in Yemen...
     • “Strange new Russian worm set to unleash botnet on 4/1/2012...”
     • The carrier may select partners to set up a new carrier as early as next month
     • “According to TechCrunch China’s new 4G network will be deployed by mid-2010”
     • “... Dr Sarkar says the new facility will be operational by March 2014...”
     • “2010 is the year when Iran will kick out Islam. Ya Ahura we will.”
     • “...opposition organizers plan to meet on Thursday to protest...”
     • “Excited to see Mubarak speak this weekend...”
  4. The RF Stack
     Layers of the stack diagram, top to bottom:
     Application
       • Daily Average of Scores
       • API / FTP
     Aggregates
       • RF Scores & Aggregates
       • Client Scores & Aggregates: clients can use the same underlying data to define their own
           • Scores: proprietary sentiment, momentum, event score, etc.
           • Aggregates
     Scores
       • RF Scores – Sentiment & Momentum
     Time
       • Pub Date
       • Harvest Date
       • Inferred Dates
     Entities & Events – Extracted & Normalized
       • Events
       • Entities
     Sources
     Recorded Future-driven linguistic processing yields a corpus that is
       • Structured
       • Relationship driven
       • Machine-readable
       • Back-testable
  5. Case Studies
     Liquidity Management
       • Predicting liquidity with media coverage
     Short Term Trading
       • “Future event” study
     Strategy Allocation
       • Measuring investment strategy crowdedness with online media
     Risk Modeling
       • Anticipating future volatility with media sentiment and macroeconomic discussion
  6. Case 1 – Liquidity Management: Predicting Liquidity with Momentum
     Recorded Future momentum contains predictive information for dollar volume of S&P 500 companies.
       • Control for trailing market volume on a 1-day and 20-day basis.
       • Use 1-day trailing momentum.

     Call:
     lm(formula = Dollarvol.1 ~ 0 + lDollarvol.1 + smaDvol.Dollarvol.1 + smaxlMo, data = seriesdf)

     Residuals:
            Min         1Q     Median         3Q        Max
     -5.039e+09 -2.215e+07 -2.284e+06  1.813e+07  1.597e+10

     Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
     lDollarvol.1        0.513193   0.003237  158.54  < 2e-16 ***
     smaDvol.Dollarvol.1 0.471645   0.003817  123.56  < 2e-16 ***
     smaxlMo             0.077162   0.015683    4.92 8.67e-07 ***
     ---
     Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Residual standard error: 170900000 on 72109 degrees of freedom
     Multiple R-squared: 0.8539, Adjusted R-squared: 0.8539
     F-statistic: 1.405e+05 on 3 and 72109 DF, p-value: < 2.2e-16
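The regression in Case 1 can be approximated in a few lines. The following is only a sketch under stated assumptions: the slide does not say how `smaxlMo` is constructed, so the momentum regressor here is simply the raw 1-day lagged momentum, and the function names `build_rows` and `ols_no_intercept` are hypothetical. It fits a no-intercept least-squares model of dollar volume on 1-day lagged volume, the 20-day trailing average volume, and lagged momentum.

```python
from statistics import fmean


def build_rows(dollar_vol, momentum, window=20):
    """Return (y, [lag1, sma, lagged_momentum]) rows for one company's daily series."""
    rows = []
    for t in range(window, len(dollar_vol)):
        lag1 = dollar_vol[t - 1]                 # 1-day trailing dollar volume
        sma = fmean(dollar_vol[t - window:t])    # 20-day trailing average volume
        mom = momentum[t - 1]                    # 1-day trailing momentum (assumed raw)
        rows.append((dollar_vol[t], [lag1, sma, mom]))
    return rows


def ols_no_intercept(rows):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0][1])
    # Augmented matrix [X'X | X'y]
    a = [[sum(x[i] * x[j] for _, x in rows) for j in range(k)]
         + [sum(x[i] * y for y, x in rows)] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

Calling `ols_no_intercept(build_rows(dollar_vol, momentum))` returns the three coefficients in the order lagged volume, trailing average, momentum. A production analysis would more likely use R's `lm` as in the slide, or a statistics library, to obtain standard errors as well.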
  7. Case 2 – Short Term Trading: Future Event Distributions
     The study queried instances where there was advance notice of specific future events.
       • Events defined as one day long with S&P 500 constituents
       • These typically provided one to three days advance notice
       • ~19,000 unique events satisfied these criteria
     We controlled for earnings and non-earnings related news.
     Non-earnings related events are negative.
     [Figure: distribution of future events; horizontal axis t (days), with ~1-3 days of advance notice marked]
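The selection step in the future-event study can be sketched as a simple filter. This is an illustration only: the field names `published` and `event_date` and the function `advance_notice_events` are hypothetical and do not come from the Recorded Future API. It keeps instances whose publication date precedes the event date by at least one day.

```python
from datetime import date


def advance_notice_events(instances, min_lead=1, max_lead=None):
    """Keep instances published at least min_lead days before the event date.

    Each instance is a dict with hypothetical keys "published" and
    "event_date" (both datetime.date). max_lead optionally caps the lead time.
    """
    kept = []
    for inst in instances:
        lead = (inst["event_date"] - inst["published"]).days
        if lead >= min_lead and (max_lead is None or lead <= max_lead):
            kept.append({**inst, "lead_days": lead})
    return kept


# Usage: one instance with two days of advance notice, one published on the event day.
sample = [
    {"id": "a", "published": date(2010, 5, 3), "event_date": date(2010, 5, 5)},
    {"id": "b", "published": date(2010, 5, 5), "event_date": date(2010, 5, 5)},
]
selected = advance_notice_events(sample)
```

Passing `max_lead=3` would restrict the sample to the one-to-three-day window the slide describes as typical.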
  8. Case 2 – Short Term Trading: News “Should” be Priced in Immediately
     “Buy the rumor, sell the news” describes earnings related events.
       • Market adjusted returns increase on approach to the event day and decline thereafter.
     It does not describe non-earnings related events.
       • No increase in returns on approach to event day
       • Statistically significant increase in volume (0.3σ) and decrease in market adjusted returns
     Non-earnings related events were net negative.
     [Figure labels: Typical Publication Day, Predicted Event Day]
  9. Case 3 – Strategy Allocation: Quantifying Strategy Crowdedness
     Recorded Future data yielded an inverse correlation between the performance of a momentum strategy and the business media’s discussion of momentum.
     The study introduced a synthetic linguistic score.
       • Relied on standard API queries
       • Scored fragments based on momentum-related terms
     Increased discussion of momentum-related trading correlated with declining returns.
       • Inverse correlation with $NAV/share of momentum mutual fund
       • Monthly correlation of -0.56 over the past year
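A term-based fragment score and its correlation with fund performance might look like the sketch below. The term list `MOMENTUM_TERMS` is invented for illustration (the slide does not list the terms used), and `fragment_score` and `pearson` are hypothetical helper names.

```python
import math
import re

# Hypothetical term list; the study's actual terms are not given in the slides.
MOMENTUM_TERMS = ("momentum", "trend following", "relative strength")


def fragment_score(fragment, terms=MOMENTUM_TERMS):
    """Count whole-phrase occurrences of momentum-related terms in a fragment."""
    text = fragment.lower()
    return sum(len(re.findall(r"\b" + re.escape(t) + r"\b", text)) for t in terms)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Summing `fragment_score` over each month's fragments gives a monthly discussion series, and `pearson` of that series against the fund's monthly NAV per share gives a correlation of the kind the slide reports.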
  10. Case 4 – Risk Modeling: Volatility Forecasting Methodology
     Data Extraction
       • Extract all references to S&P 500 companies from Recorded Future’s structured content database from January 1, 2009 to December 9, 2010.
       • Includes synonyms (IBM vs. International Business Machines, etc.)
       • Reduce to only mentions on “Blog” sources.
       • Compute sentiment and momentum of text surrounding references to the Index over the time period.
     Data Aggregation
       • Compute daily series of count-weighted mean sentiment and momentum.
     Modeling
       • Calculate exponential moving averages of these values over a 26-day trailing window.
       • Regress against 1-month forward realized volatility of S&P 500.
     Model Assessment
       • Economic evaluation of model parameters: do they make sense?
       • Comparison to other volatility metrics: how does the signal compare?
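The aggregation and smoothing steps of this methodology can be sketched as follows. Two assumptions are made here that the slides do not confirm: each record is a `(day, score, count)` tuple, and the exponential moving average uses the common smoothing factor `alpha = 2 / (span + 1)`. The function names are hypothetical.

```python
from collections import defaultdict


def daily_weighted_mean(records):
    """records: iterable of (day, score, count) tuples.

    Returns {day: count-weighted mean score}, with days in sorted order.
    """
    num, den = defaultdict(float), defaultdict(float)
    for day, score, count in records:
        num[day] += score * count
        den[day] += count
    return {day: num[day] / den[day] for day in sorted(num) if den[day] > 0}


def ema(values, span=26):
    """Exponential moving average, assuming alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        # Seed with the first observation, then blend each new value in.
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out
```

Applying `ema(list(daily_weighted_mean(records).values()))` to the sentiment and momentum series yields smoothed regressors of the kind used in the model summary.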
  11. Case 4 – Risk Modeling: Model Summary

     Call:
     lm(formula = spyvol ~ vix + emamo + emaneg, data = blogus)

     Residuals:
            Min         1Q     Median         3Q        Max
     -0.0087503 -0.0020655 -0.0004415  0.0020463  0.0100361

     Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
     (Intercept) -1.237e-02  2.460e-03  -5.028 7.03e-07 ***
     vix          3.938e-04  2.511e-05  15.681  < 2e-16 ***
     emamo        2.337e-02  8.164e-03   2.863  0.00439 **
     emaneg       3.204e-01  3.631e-02   8.824  < 2e-16 ***
     ---
     Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Residual standard error: 0.003263 on 478 degrees of freedom
       (25 observations deleted due to missingness)
     Multiple R-squared: 0.6867, Adjusted R-squared: 0.6848
     F-statistic: 349.3 on 3 and 478 DF, p-value: < 2.2e-16

     • Regressors are VIX value, and 28-day EMAs of average momentum and negative sentiment in text surrounding S&P 500 companies.
     • Controlling for VIX, an increase in chatter around S&P 500 companies and an increase in negative sentiment around S&P 500 companies lead increases in one-month forward realized volatility.
     • Positive sentiment is NOT a statistically significant term in this model. Volatility driven by fear, not euphoria?
     • R-squared of 0.68 is respectable compared to VIX’s ability to predict 1-month forward volatility (R-squared 0.63).
     • RF data is orthogonal to market data: controlling for VIX leads to models with R-squared > 0.63.
  12. Getting Started – Data & Aggregates
     Data Instances
       • Sources & Documents
       • Entities & Events
           • Canonical events
           • Entity identifiers: tickers, industry taxonomy
       • Time
           • Publication Date
           • Event Date
       • Calculated Scores
           • Momentum, Sentiment
     Aggregates
       • US equities aggregates
           • Daily composite momentum and sentiment scores for constituents of the Russell 3000
       • Custom aggregates built on data elements
     [Diagram labels: Canonical info, Sentiment, Momentum, Event time, Co-occurring entities, Source metadata, Document metadata, RF State Data, Entity information]
  13. Access – Historical & Live Data
     Data Formats – JSON, CSV
     Historical Data Delivery – API, FTP
       • API – Historical results from raw data via web-service calls
       • FTP – Files of aggregates, and bulk history
     Live Data Delivery – API
       • Customized calls – as frequently as intra-day
       • RF Aggregates – calculated daily
     [Diagram: Live Download sends a JSON HTTP Request to the Recorded Future Web Service API and receives a JSON/CSV Response. Historical Batch Download sends an FTP Request to the Recorded Future FTP Archive and receives a .zip archive (csv aggregates, json/tsv instances). Both paths Load RF Data into the RF Customer Analytic Environment (R, Matlab, Java, Python, Excel, etc.).]
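Loading delivered files into an analytic environment could be as simple as the sketch below. The slides name the formats (JSON, CSV) but not the schemas, so the column names `day`, `ticker`, `momentum`, `sentiment` and the top-level JSON key `instances` are purely hypothetical, as are the function names. No Recorded Future endpoint is called here; the functions only parse text that has already been downloaded.

```python
import csv
import io
import json


def load_aggregates(csv_text):
    """Parse a CSV of daily aggregates (hypothetical columns) into dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "day": row["day"],
            "ticker": row["ticker"],
            "momentum": float(row["momentum"]),
            "sentiment": float(row["sentiment"]),
        })
    return rows


def load_instances(json_text):
    """Parse a JSON response, assuming a hypothetical top-level "instances" list."""
    return json.loads(json_text)["instances"]
```

The same parsed rows can then feed whichever tool the client prefers, in line with the slide's list of R, Matlab, Java, Python and Excel.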
  14. Applications – Slicing the Data
     Case Studies, revisited
     Liquidity Management
       • Pull aggregate Day/Company momentum data for S&P 500
     Short Term Trading
       • Pull instance data for S&P 500 companies where publish date is before event date
     Strategy Allocation
       • Pull instance data where document category is “Business/Finance” and score fragments based on word/phrase choice
     Risk Modeling
       • Pull aggregate momentum and sentiment data for the S&P 500 companies for a specified time period
     Different slices entail unique media-analytic feeds.
  15. Summary
     • Recorded Future provides the world’s only real-time source of past, planned and speculative events.
     • Designed for clients to create unique media-analytic feeds via web-services API and FTP access.
     • Applied to liquidity planning, short term trading, strategy allocation, and risk modeling, among other scenarios.