  1. 1. "You know, I've been dealing with these big mathematical models for forecasting the economy... if I could figure out a way to determine whether or not people are more fearful, or changing to more euphoric, and have a third way of figuring out which of the two things are working, I don't need any of this other stuff. I could forecast the economy better than any way I know how." Alan Greenspan, November 2007

Abstract

This quote from the former Fed Chairman alludes to a theoretical understanding of market dynamics proposed by behavioral economists, who suggest that “psychological, social, cognitive, and emotional factors influence the economic decisions of individuals and institutions [and] have consequences for market prices, returns, and resource allocation” (Wikipedia). Critics of behavioral economics cite the Efficient Market Hypothesis (EMH), which states that “it is impossible to "beat the market" because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, stocks always trade at their fair value on stock exchanges, making it impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. As such, it should be impossible to outperform the overall market through expert stock selection or market timing, and that the only way an investor can possibly obtain higher returns is by purchasing riskier investments.”

The debate between proponents of behavioral economics and the Efficient Market Hypothesis spans a vast range of topics that are far beyond the scope of this paper. What follows instead is an introductory exploration of whether sentiment data, extracted from Twitter, StockTwits, and market-focused chat rooms, can be used to predict daily stock market returns. Analysis was performed on 3 years of sentiment data from 2012-2015 provided by PsychSignal© for research purposes. The data consists of two sentiment-based time series indicators derived from a proprietary sentiment mining process developed by PsychSignal.
The Spider S&P 500 ETF (SPY) was used as the target variable for the forecasting analysis. Both contemporaneous and lagged relationships between the SPY and PsychSignal indicators were explored, as well as relationships between the SPY and two composite indicators derived from manipulations of the raw PsychSignal data. Finally, multivariate OLS regression models were applied to the data to determine whether PsychSignal’s sentiment-based indicators could be used to predict next-day returns in the Spider S&P 500 ETF.

Data

The data for this analysis was obtained from Quandl.com and PsychSignal’s website. According to PsychSignal’s website:

  • We measure two types of sentiment, Bullish and Bearish. They measure very specific financial language and not simply positive and negative.
  • Bullish sentiment can be interpreted as buyers or those who favor the buy side.
  • Bearish sentiment can be interpreted as sellers or those who favor the sell side.
  • Our sentiment is measured on a 0 - 4 scale, 0 being the lowest, 4 being the highest. We sometimes convert the 0 - 4 scale to %.
  • The 0 - 4 sentiment scale measures both the intensity of the sentiment expressed by individuals but also the collective volume of that sentiment expressed over the crowd mood.

A clarification was made via an email exchange with PsychSignal’s founder confirming that intensity refers to the degree of sentiment present in a statement. For example, compare the following statements:

  1. The SPY is really strong today
  2. The SPY doesn’t look like a bad buy here

The first statement would receive a higher intensity rating. The sentiment data is then aggregated into 4 indicators, which are charted in Figures 1 and 2 alongside the SPY:

  1. Bearish Sentiment (Intensity)
  2. Bullish Sentiment (Intensity)
  3. Bearish Volume (Quantity)
  4. Bullish Volume (Quantity)
  2. 2. Figure 1 SPY vs. Bullish and Bearish Sentiment (weekend sentiment data removed)

Figure 2 SPY vs. Bullish and Bearish Volume (weekend sentiment data removed)

Data Challenges

Upon visual inspection of the original PsychSignal data sets shown below in Figure 3, it appeared as if there were 3 distinct trends in the data that seemed to be the result of some systematic process associated with the way the data was collected.

Figure 3 Original PsychSignal Dataset revealing 3 systematic trends in the data prior to 2012 and after 2015

Prior to 2012 there is consistently extreme variance in the dataset. Additionally, there appears to be a systematic jump in both series subsequent to 2015. Neither phenomenon coincided with a similar change in the observed SPY data. Therefore, both were assumed to be the result of a systematic process associated with the data collection performed by PsychSignal. For that reason, analysis was confined to a 3-year period between 2012 and 2015. There are over 700 data points in this time period, which should be sufficient for this analysis.

Secondly, the original PsychSignal data included sentiment data collected during the weekend. A zoomed-in inspection of the rolling 10-day correlation between the SPY and the PsychSignal indicators, shown below, reveals a consistent seasonal decline in correlation that coincides with the weekend sentiment data.

Figure 4. Rolling 10 day correlation between SPY and Bullish/Bearish indicators. Cyclical pattern coincides with weekend sentiment measurement.

A choice was made to remove the weekend sentiment data based on the assumption that market participants discussing the markets during the weekend were likely ruminating over carried trading positions about which they were unsure, potentially resulting in bias.

Data Transformations

A visual inspection of the data reveals clear trends in the mean and variance of each series. Each series was differenced to remove the trend and induce stationarity, and then standardized to z-scores centered on the mean to minimize the impact of the different scales of measurement.
The resulting distributions are shown below.

Figure 5. Distribution of data after differencing and standardization

It is necessary to point out that both daily and annualized returns in the S&P 500 are leptokurtic (heavy-tailed), with a large concentration of values centered at the mean and larger tails than would be expected in a normal distribution. The SPY data used for this analysis is consistent with this historical tendency, as shown below.
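The differencing and standardization steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `closes` values are hypothetical numbers made up for the example (not SPY or PsychSignal data), and the helper names `difference`, `zscore`, and `excess_kurtosis` are assumptions rather than anything taken from the original analysis. Excess kurtosis is included because it is one common way to check tail weight: it is zero for a normal distribution and positive when the tails are heavier.

```python
import statistics


def difference(series):
    """First difference: value at t minus value at t-1."""
    return [b - a for a, b in zip(series, series[1:])]


def zscore(series):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)
    return [(x - mean) / sd for x in series]


def excess_kurtosis(series):
    """Population excess kurtosis: 0 for a normal distribution, > 0 for heavier tails."""
    n = len(series)
    mean = statistics.fmean(series)
    m2 = sum((x - mean) ** 2 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return m4 / m2 ** 2 - 3.0


# Hypothetical closing prices, for illustration only.
closes = [100.0, 100.5, 100.4, 101.2, 99.0, 99.3, 99.4, 102.5, 102.4, 102.6]

changes = difference(closes)      # daily net change, close to close
standardized = zscore(changes)    # comparable scale across series

print([round(z, 2) for z in standardized])
print(round(excess_kurtosis(standardized), 2))
```

Applying the same two steps to every series puts the SPY changes and the sentiment indicators on a common scale before they are compared.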
  3. 3. Figure 6. Leptokurtic (heavy-tailed) Distribution of Daily SPY Returns between 2012 and 2015

It appears as if this phenomenon is consistent in the PsychSignal data as well. The figure below shows an overlay of all of the distributions used in this analysis.

Figure 7. Overlay of data distributions

One of the assumptions that will be made throughout the following analysis is that violating the assumption of normality in these data will have a negligible impact on results due to the similarity in shape of each dataset’s distribution. Additionally, it will be assumed that the data provided by PsychSignal is an accurate reflection of the true sentiment expressed by market participants regardless of the potential impact on market prices. In other words, the data reflects the true feelings expressed by traders whether or not those feelings can predict market movement.

Figure 8. A visual inspection of the chart above reveals a consistent visual relationship between the PsychSignal index and survey-based data collected in the weekly NAAIM and AAII Sentiment Surveys

All other assumptions regarding modeling will be discussed in later sections.

Contemporaneous Analysis

After the necessary data cleaning and transformation, analysis was performed to determine if there were any contemporaneous relationships evident in the data. For clarification, contemporaneous in this context refers to the comparison of the standardized daily net change (closing price to closing price) in the SPY to the standardized daily change in sentiment indicators as reported by PsychSignal. Contemporaneous does not refer to intraday data in this context.

Figure 9 is a correlation matrix useful for visualizing the degree of correlation between variables. Blue values indicate positive correlations and red values indicate negative correlations. The size of the dot also reflects the degree of correlation. Most of what is seen in this graphic is in line with what one would expect with this data, i.e. Bullish indicators are positively correlated with the SPY and Bearish indicators are negatively correlated.
However, the Bull Volume series reveals a slight negative correlation to SPY, suggesting that as prices increase, the volume of positive sentiment decreases slightly. Additionally, Bull Volume and Bear Volume exhibit a strong positive correlation rather than a negative correlation.
  4. 4. Figure 9. Contemporaneous Correlation Matrix using Pearson Method. Scale is to the right. Both size and color of circle reflect degree of correlation.

Paired-sample t-tests for correlation were performed to determine if these correlations were significant. The following table shows that we must reject the null hypothesis that r = 0 and accept the alternate hypothesis that there is a non-zero correlation between these variables. All four correlations are highly significant, with extremely low p-values.

The scatterplot matrix in Figure 10 is further helpful in visualizing the relationships between the data. The red line is a Loess regression smoothing line used to better visualize the nature of the relationships in each plot. The color-coded rectangles are used to designate specific plots of interest. The blue rectangle highlights a very strong positive correlation between Bear Volume and Bull Volume. The green box highlights the relationship between the Bullish and Bearish intensity-based indicators and the SPY, revealing a moderate positive and negative correlation respectively, as would be expected. The two red boxes both show the relationship between SPY and the PsychSignal Volume indicators. Note the “V” or curvilinear relationship between Sentiment Volume and S&P price. Figures 11 and 12 below are helpful in visualizing this relationship in more depth.

Figure 11. Density Plot comparing SPY on the X-axis to Bull Volume on the Y-axis. Notice V shaped pattern

Figure 13 on the following page provides perhaps the best snapshot of the entire contemporaneous dataset. It stratifies the SPY data into 4 equal-sized groups (quartiles, based on frequency) and then displays the resulting distribution for all the other datasets that correspond to the data in each group.
              Correlation with SPY   Two-Tailed P Value   Degrees of Freedom   95% Confidence Interval
Bullish        0.4095                2.20E-16             717                   0.3467 < R <  0.4686
Bearish       -0.3818                2.20E-16             717                  -0.4425 < R < -0.3175
Bull Volume   -0.1636                0.00001044           717                  -0.2339 < R < -0.0915
Bear Volume   -0.4405                2.20E-16             717                  -0.4976 < R < -0.3796

Figure 10. Scatterplot Matrix of Contemporaneous Pairwise Relationships. Colored rectangles highlight 3 primary areas of interest

Figure 12 Density Plot comparing SPY (X axis) to Bear Volume (Y axis). Note that while the Bullish Volume tends to increase for both large negative and positive SPY returns, Bearish Volume exhibits a similar V shape but with far less density in the positive return tails of the SPY
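As a worked check of the correlation table, the sketch below computes the t statistic for a Pearson correlation and a 95% confidence interval for the Bullish row (r = 0.4095, 717 degrees of freedom, so n = 719). It assumes the interval was built with the Fisher z-transformation, which is a common default but is not stated in the source; the function names are hypothetical.

```python
import math


def correlation_t_stat(r, n):
    """t statistic and degrees of freedom for H0: true correlation = 0."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Values taken from the Bullish row of the table.
r, n = 0.4095, 719
t_stat, df = correlation_t_stat(r, n)
low, high = fisher_ci(r, n)

print(df, round(t_stat, 2), round(low, 4), round(high, 4))
```

Under that assumption the bounds land very close to the reported 0.3467 < R < 0.4686, and the large t statistic is consistent with the very small p-value in the table.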
  5. 5. Figure 13. Marginal Distribution Plot showing the distribution of PsychSignal data corresponding to 4 equal-sized quartiles of SPY data. Note the consistency in the order of the horizontal shifting of the Bullish and Bearish Sentiment Indicators. Additionally, the Volume Indicators are largely centered at zero for the interquartile range of the SPY but differentiate at both tails. Bullish Volume increases uniformly while Bearish Volume increases more for the negative return tail.

This is reflected in the data table below, which displays the descriptive statistics for Bullish and Bearish Volume on positive return days and negative return days, revealing that the mean of Bull Volume on up days and down days is not significantly different from zero while the mean of Bear Volume is.

Summary of Key Findings of Contemporaneous Analysis

  1. All four sentiment-based indicators exhibit statistically significant non-zero correlations to daily SPY returns.
  2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
  3. Bear Volume and Bull Volume indicators are stagnant on days when SPY volatility is low and both increase during high-volatility days regardless of the direction of the volatility.
  4. Bullish Volume increases more uniformly for both positive and negative return days while Bear Volume increases much more for negative days than it does for positive days.

Predictive Analysis

The second stage of my analysis was focused on exploring whether the sentiment-based data provided by PsychSignal could be used to forecast next-day returns of the SPY. Lags of t-1, t-2, t-3, and t-4 were used to predict the value of the target at time t in both analyses. A brief visual inspection of the data, shown in Figure 14 and Figure 15, raised immediate doubt regarding the predictive power of the indicators.

Figure 14 Correlation Matrix of all lagged series. Notice there is almost no correlation present between the SPY and any of the lagged sentiment indicators.
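The lagged comparison described above (the target at time t against an indicator at t-1 through t-4) can be illustrated with a small sketch. The data here is synthetic and purely hypothetical: `sentiment` is constructed to react only to the same day's return, so it is an assumption chosen to mimic the pattern in Figure 14, not PsychSignal data. The helper names are likewise assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(target, indicator, lag):
    """Correlation of target[t] with indicator[t - lag], for lag >= 0."""
    if lag == 0:
        return pearson(target, indicator)
    # Drop the first `lag` targets and the last `lag` indicator values to align them.
    return pearson(target[lag:], indicator[:-lag])


# Synthetic, hypothetical series: sentiment reacts to the same day's return only.
rng = random.Random(0)
n_days = 719
returns = [rng.gauss(0, 1) for _ in range(n_days)]
sentiment = [0.5 * r + rng.gauss(0, 1) for r in returns]

lag_corr = {lag: lagged_correlation(returns, sentiment, lag) for lag in range(5)}
for lag, corr in lag_corr.items():
    print(f"lag {lag}: correlation {corr:+.3f}")
```

With data built this way, the lag-0 correlation is clearly positive while lags 1 through 4 hover near zero, which is the signature of an indicator that moves with the market rather than ahead of it.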
  6. 6. Figure 15. Marginal Distribution Plot of PsychSignal indicators at Lag 1 conditioned on the following day’s returns in the SPY. Notice the lack of differentiation in the distributions compared to the contemporaneous distributions in Figure 13

A multivariate OLS model was then applied to the dataset using all four sentiment indicators at each lag individually and then with all 4 lags at the same time. The following assumptions were made:

  • There is a linear relationship between variables.
  • Multivariate normality.
  • Little multicollinearity.
  • No auto-correlation.
  • Homoscedasticity.

Figure 16. Results of the OLS Multivariate Regression Model

Only two lagged series, Lag1_Bearish and Lag4_Bearish, were found to be significant, and the model performed poorly overall with an R-Squared of .03.

Additional regression models were then applied iteratively, each consisting of only 1 lag at a time. In addition to the lagged models, two models were built using forward lags, i.e. I ran a regression backwards in time using the sentiment data at forward lags (t+1) and (t+2) to predict SPY at time t. The reasoning for this was that if the sentiment data could not predict SPY forward in time but could predict backwards in time, it would suggest that the sentiment data is reactionary to price movement rather than anticipatory of it. The R-Squared values of these models are shown below.

Figure 17. R-Squared of Lagged, Contemporaneous, and Reverse Lagged Models.

As you can see, all lagged models produce R-Squared values of nearly 0 while the contemporaneous and reverse lagged models exhibit larger and linearly descending R-Squared values, indicating that the sentiment reflected in the data is reactionary rather than anticipatory.

Summary of Key Findings

  1. All four sentiment-based indicators exhibit statistically significant non-zero contemporaneous correlations to daily SPY returns.
  2. Bear Volume and Bull Volume indicators are positively correlated, contrary to expectation.
  3. Bear Volume and Bull Volume indicators are stagnant on days when SPY volatility is low and both increase during high-volatility days regardless of the direction of the volatility.
  4. Bullish Volume increases more uniformly for both positive and negative return days while Bear Volume increases much more for negative return days than it does for positive days.
  5. Correlation between SPY data and sentiment data drops off significantly in lags 1-5.

SUMMARY OUTPUT Standardized

Regression Statistics
Multiple R          0.182102668
R Square            0.033161382
Adjusted R Square   0.011125174
Standard Error      0.998003864
Observations        719

ANOVA
             df    SS            MS         F          Significance F
Regression   16    23.98171227   1.498857   1.504859   0.091512499
Residual     702   699.2002223   0.996012
Total        718   723.1819345

                  Coefficients    Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept         -0.001966444    0.037222494      -0.05283   0.957883   -0.075047192    0.071114
Lag1_Bullish       0.020831277    0.056744276       0.367108  0.713649   -0.090577543    0.13224
Lag1_Bearish      -0.1554315      0.068882939      -2.25646   0.024349   -0.290672751   -0.02019
Lag1_BullVolum    -0.064300473    0.10332598       -0.62231   0.533942   -0.267165435    0.138564
Lag1_BearVolume    0.12832188     0.121951763       1.052235  0.293054   -0.111111996    0.367756
Lag2_Bullish      -0.04651744     0.064697438      -0.719     0.47238    -0.173541091    0.080506
Lag2_Bearish      -0.12377006     0.076165343      -1.62502   0.104608   -0.273309211    0.025769
Lag2_BullVolum     0.05869661     0.120948614       0.485302  0.627614   -0.178767734    0.296161
Lag2_BearVolume   -0.045350618    0.145316264      -0.31208   0.755071   -0.330657163    0.239956
Lag3_Bullish      -0.072203302    0.064672649      -1.11644   0.264615   -0.199178284    0.054772
Lag3_Bearish      -0.143175415    0.075859529      -1.88738   0.059522   -0.292114147    0.005763
Lag3_BullVolum     0.013903224    0.121460004       0.114468  0.9089     -0.224565155    0.252372
Lag3_BearVolume    0.004262675    0.1460306         0.02919   0.976721   -0.28244636     0.290972
Lag4_Bullish      -0.089134713    0.056103892      -1.58874   0.112568   -0.199286235    0.021017
Lag4_Bearish      -0.226359518    0.067750585      -3.34107   0.000879   -0.359377564   -0.09334
Lag4_BullVolum     0.028900516    0.103969722       0.277971  0.781117   -0.175228335    0.233029
Lag4_BearVolume    0.08721789     0.12336881        0.706969  0.479821   -0.154998142    0.329434
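The headline statistics in the regression summary can be recomputed from the ANOVA rows, which is a useful sanity check on how they relate. The sketch below uses only figures copied from the summary output (sums of squares, 16 regressors, 719 observations) and the standard textbook formulas; the variable names are illustrative.

```python
# Figures copied from the regression summary output.
n_obs = 719                    # Observations
k = 16                         # regressors: 4 indicators x 4 lags
ss_regression = 23.98171227    # explained sum of squares
ss_residual = 699.2002223      # unexplained sum of squares

ss_total = ss_regression + ss_residual
df_residual = n_obs - k - 1    # 702, matching the Residual row

# Share of variance explained by the model.
r_squared = ss_regression / ss_total

# Penalize R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F statistic: mean square of regression over mean square of residuals.
f_stat = (ss_regression / k) / (ss_residual / df_residual)

print(round(r_squared, 6), round(adj_r_squared, 6), round(f_stat, 4))
```

The recomputed values agree with the reported R Square, Adjusted R Square, and F. The adjusted figure of roughly 0.011 shows how little of the already small R-Squared survives once the 16 regressors are accounted for.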
  7. 7. 6. OLS multivariate models were ineffective in predicting future returns in the SPY yet were marginally effective when run backwards, suggesting that sentiment data is reactive rather than anticipatory.
  7. Based on contemporaneous analysis, there is evidence to suggest that market agents exhibit stronger tendencies to react to negative market movement than positive market movement.

Ongoing Research

  1. Applying more sophisticated data mining techniques to analyze the data
  2. Exploration of intraday data
  3. Exploring directional prediction models, i.e. “Up” or “Down” days
  4. Development of a heuristic-based trading indicator
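The forward-versus-backward comparison behind finding 6 can be sketched with single-predictor OLS fits at different time offsets. Everything here is a simplified illustration: the data is synthetic and hypothetical (sentiment is built to react to the current and previous day's return), a single indicator stands in for the four used in the paper, and the function names are assumptions.

```python
import random


def ols_r_squared(x, y):
    """R-squared of a simple OLS fit of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


def r_squared_at_offset(returns, sentiment, offset):
    """Regress returns[t] on sentiment[t + offset].

    offset < 0 uses past sentiment (a predictive model);
    offset > 0 uses future sentiment (a reverse-lag model).
    """
    n = len(returns)
    pairs = [(sentiment[t + offset], returns[t])
             for t in range(n) if 0 <= t + offset < n]
    x, y = zip(*pairs)
    return ols_r_squared(x, y)


# Synthetic, hypothetical data: sentiment reacts to today's and yesterday's return.
rng = random.Random(1)
n_days = 2000
returns = [rng.gauss(0, 1) for _ in range(n_days)]
sentiment = [0.5 * returns[t] + (0.3 * returns[t - 1] if t > 0 else 0.0) + rng.gauss(0, 1)
             for t in range(n_days)]

r2 = {offset: r_squared_at_offset(returns, sentiment, offset)
      for offset in (-2, -1, 0, 1, 2)}
for offset, value in r2.items():
    print(f"offset {offset:+d}: R-squared {value:.4f}")
```

For a purely reactive indicator like this one, R-Squared is near zero at negative offsets, largest at offset 0, and smaller but non-zero at offset +1, which is the qualitative shape the paper reports in Figure 17.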