Center for Financial Studies at the Goethe UniversityPhD Mini-courseFrankfurt, 25 January 2013Financial NetworksV. Inferri...
I. Mapping        II. MappingSystemic Risk   Financial Markets                                    2
Observing vs inferring• Observing links  – Exposures, payment flow, trade, co-ownership, joint    board membership, etc.  ...
AgendaV. Inferring Links•   Prices and Returns•   Controlling for common factors•   Correlation and dependence•   Signific...
Prices and Returns                     5
Prices vs Returns• Benefit of using returns vs prices for correlations:•   Normalization. All variables are denoted in sam...
Log vs Arithmetics Returns• Most often logarithmic returns are used in Finance. Why?• Benefit of using log returns vs arit...
Data Issues•   Which prices     – daily (low, high, close)     – time-zone issues     – high-frequency•   Missing prices  ...
Ticker   Name                     Sector                         WeightDAX stocks             ADS      Adidas             ...
Data: Prices of 30 stocks in DAX in 2011          date         ADS           ALV           BAYN       BAS           ...   ...
calculatereturns -command•   File (-file) : File with price data. The file must contain a table where each    row has the ...
Examples# Calculate daily log returnscalculatereturns -file daxprices-2011.csv -saveas daxreturns-2011.csv -methodlog -dat...
Data: Returns of 30 stocks in DAX in 2011          date         ADS.DE ALV.DE BAYN.DE BAS.DE      ...          2011-12-29 ...
Common Factors• A common factor may be affecting all the returns   – E.g bullish/bearish market• If we know the market ret...
Eigenvectors - example                         A plot of the normalised                         data (mean subtracted)    ...
Eigenvectors of asset correlations•   The eigenvectors of a correlation matrix are orthogonal to each other. The    data c...
pca -command•   File (-file) : File with input data.•   Components (-components) : Components to keep, e.g. 2- or 1 or 2-1...
Correlation and Covariance                             18
19
buildbycorrelation -command•   File (-file) : File from which to read data. To skip lines or use non-default field    deli...
•   Missing values (-missing) : Handling of missing or non-numeric values in data.    Alternatives:     –   Zero : missing...
Correlation Matrix & Network              A          B          C          A          0       0.72       0.31          B  ...
Examples# Calculate 100 day pairwise correlations with 1 day sliding window, considerany missing/non-numeric return as 0bu...
Heatmap -command•   Creates time series of heatmaps•   Like viz -command•   Allows mapping of data to cell size, shape, co...
28&29All components   Principal Component Removed                                               25
About Color PerceptionA and B are the sameshade of gray  http://upload.wikimedia.org/wikipedia/comm  ons/9/93/Optical_illu...
Anscombes QuartetConstructed in 1973 by Francis Anscombe to demonstrate both theimportance of graphing data before analyzi...
Significance of Correlations• We are estimating true correlations by means of  samples• Our estimates have errors• Questio...
Example - distribution of correlation in 30trials with random numbers         20 pairs                50 pairs        100 ...
Test for significance• Errors:   – Type I: False positive - we identify a correlation where there is     none   – Type II:...
30-32alpha -parameterSignificance level (-alpha) : Two-sided significance level forconfidence interval.Example:buildbycorr...
No signifance test, Principal   Only significant correlations ( =0.05),Component Removed               Principal Component...
Problem of Multiple Comparisons• Occurs when considering multiple simultanous estimates    – e.g. with correlation matrix ...
33-35bonferroni -parameterEnable Bonferroni correction (-bonferroni) : Option to use Bonferronicorrection in significance ...
Only significant correlations   Bonferroni correction, Only( =0.05), Principal Component   significant correlations ( =0.0...
Blog, Library and Demos at www.fna.fiDr. Kimmo Soramäkikimmo@soramaki.netTwitter: soramaki
Upcoming SlideShare
Loading in …5
×

Financial Networks V - Inferring Links

700 views

Published on

Fifth lecture of a PhD level course on "Financial Networks" at Center for Financial Research at Goethe University, Frankfurt.

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
700
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
25
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Financial Networks V - Inferring Links

  1. 1. Center for Financial Studies at the Goethe UniversityPhD Mini-courseFrankfurt, 25 January 2013Financial NetworksV. Inferring Links Dr. Kimmo Soramäki Founder and CEO FNA, www.fna.fi
  2. 2. I. Mapping II. MappingSystemic Risk Financial Markets 2
  3. 3. Observing vs inferring• Observing links – Exposures, payment flow, trade, co-ownership, joint board membership, etc. – Cause of link is known• Inferring links – Observing the effects and inferring a relationship – Cause of link is unknown – Time series on asset prices, trade volumes, balance sheet items 3
  4. 4. AgendaV. Inferring Links• Prices and Returns• Controlling for common factors• Correlation and dependence• Significant correlations• Multiple ComparisonsVI. Correlation Networks• Distance and Hierarchical Clustering• Minimum Spanning Tree & PMFG• Filtering• Visual Layouts 4
  5. 5. Prices and Returns 5
  6. 6. Prices vs Returns• Benefit of using returns vs prices for correlations:• Normalization. All variables are denoted in same units• Required by some matrix algebra• Returns (on asset prices) have low serial correlation• Returns are what matter once investement is made• Well use returns 6
  7. 7. Log vs Arithmetics Returns• Most often logarithmic returns are used in Finance. Why?• Benefit of using log returns vs arithmetic returns – Assumptions in statistical estimates (log-normal returns) – Easier calculations (integration, compounding) lead to smaller algorithmic complexity• Normaliy assumptions make log returns better• Well use log returns 7
  8. 8. Data Issues• Which prices – daily (low, high, close) – time-zone issues – high-frequency• Missing prices – Errors in source data – Partial holidays – Weekends/holidays – Exit and entry of assets• Handling options – Look up from other sources – Edit values, e.g. replace with same as previous (prices) or 0 (returns) – Exclude days/series from analysis – Decide on weekend returns depending on data 8
  9. 9. Ticker Name Sector WeightDAX stocks ADS Adidas clothing and footwear 2.0 ALV Allianz insurance 6.7 BAS BASF speciality chemicals 9.6 BAYN Bayer speciality chemicals 7.6 BEI Beiersdorf personal products 0.9 BMW BMW automobile manufacturers 3.3 CBK Commerzbank credit banks 1.0 CON Continental car parts manufacturers 0.8 DAI Daimler automobile manufacturers 5.8 DB1 Deutsche Börse securities brokers 1.5 DBK Deutsche Bank credit banks 5.1 DPW Deutsche Post logistics 1.9 DTE Deutsche Telekom fixed-line telecommunication 5.4 EOAN E.ON multi-utilities 6.2 FME Fresenius Medical Care health care 2.2 FRE Fresenius health care 1.6 HEI HeidelbergCement building materials 0.9 HEN3 Henkel personal products 1.5 IFX Infineon Technologies semiconductors 1.3 LHA Deutsche Lufthansa airlines 0.8 LIN Linde industrial gases 3.8 LXS Lanxess specialty chemicals 0.7 MRK Merck pharmaceuticals 1.0 MUV2 Munich Re re-insurance 2.9 RWE RWE multi-utilities 2.2 SAP SAP software 7.7 SDF K+S commodity chemicals 1.2 SIE Siemens diversified industrials 10.0 TKA ThyssenKrupp diversified industrials 1.3 VOW3 Volkswagen Group automobile manufacturers 3.4 9
  10. 10. Data: Prices of 30 stocks in DAX in 2011 date ADS ALV BAYN BAS ... 2011-12-30 50.26 73.91 49.4 53.89 2011-12-29 49.94 73.21 48.73 53.17 2011-12-28 49.79 73.15 47.36 52.39 2011-12-27 50.3 75.45 48.27 53.17 2011-12-23 50.29 75.97 48.13 52.83 2011-12-22 49.98 75.81 47.92 52.63 2011-12-21 49.43 75.37 47.04 52.4 2011-12-20 49.56 75.43 46.99 52.9 2011-12-19 47.96 72.23 44.75 51.15 2011-12-16 48.1 72.77 45.04 51.71 2011-12-15 47.89 73.47 44.99 51.13 2011-12-14 48 71.53 44.8 50.58 2011-12-13 48.45 73.04 45.73 51.1 ... 10
  11. 11. calculatereturns -command• File (-file) : File with price data. The file must contain a table where each row has the observation date as its first element, and asset prices as subsequent elements separated by a field delimiter. The first line read must contain names for each asset column (except the first column which contains the date). Mandatory.• Save As (-saveas) : File where result should be saved. Mandatory.• Date format (-dateformat) : Format of date in input file. Default value yyyy- MM-dd. Optional.• Method of calculation (-method) : Arithmetic (arithmetic) or logarithmic (log) returns. Mandatory. By default log.• Number of observations (-obs) : Larger than or equal to 2. Optional. By default 2.• Length of interval (-interval) : Larger than or equal to 1. Optional. By default 1.• Date order (-dateorder) : Date order in the file. Optional. Allowed values: [asc, desc]. By default asc. 11
  12. 12. Examples# Calculate daily log returnscalculatereturns -file daxprices-2011.csv -saveas daxreturns-2011.csv -methodlog -dateorder desc# Calculate two-day log returnscalculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl2.csv -methodlog -obs 2 -interval 2 -dateorder desc# Calculate two-day log returns with a one-day windowcalculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl1.csv -methodlog -obs 2 -interval 1 -dateorder desc# Calculate daily arithmetic returnscalculatereturns -file daxprices-2011.csv -saveas daxreturns-2da1.acsv -methodarithmetic -dateorder desc 12
  13. 13. Data: Returns of 30 stocks in DAX in 2011 date ADS.DE ALV.DE BAYN.DE BAS.DE ... 2011-12-29 0.0064 0.0095 0.0137 0.0135 2011-12-28 0.0030 0.0008 0.0285 0.0148 2011-12-27 -0.0102 -0.0310 -0.0190 -0.0148 2011-12-23 0.0002 -0.0069 0.0029 0.0064 2011-12-22 0.0062 0.0021 0.0044 0.0038 2011-12-21 0.0111 0.0058 0.0185 0.0044 2011-12-20 -0.0026 -0.0008 0.0011 -0.0095 2011-12-19 0.0328 0.0433 0.0488 0.0336 2011-12-16 -0.0029 -0.0074 -0.0065 -0.0109 2011-12-15 0.0044 -0.0096 0.0011 0.0113 2011-12-14 -0.0023 0.0268 0.0042 0.0108 2011-12-13 -0.0093 -0.0209 -0.0205 -0.0102 2011-12-12 0.0058 -0.0163 -0.0020 -0.0099 ... 13
  14. 14. Common Factors• A common factor may be affecting all the returns – E.g bullish/bearish market• If we know the market return, we can deduct it from the returns to look at excess returns to market• If we dont know, we can try to control for this unknown common factor via statistical methods• Principal Component Analysis allows us to identify and remove the common factor 14
  15. 15. Eigenvectors - example A plot of the normalised data (mean subtracted) -> First component explains maximum variance The data has two variables (x and y), the covariance matrix is 2x2 and thus there are two Eigenvectors The eigenvectors of the covariance matrix are overlayed as dotted lines 15
  16. 16. Eigenvectors of asset correlations• The eigenvectors of a correlation matrix are orthogonal to each other. The data can be projected in terms of its eigenvectors• Each eigenvector represents a Priciple Component – if values are similar, the component affects all returns similarly• Eigenvalues scale with the share of variance explaned by each component – Divide each by the sum to get share• Laloux et al (1999) find on S&P500 (and other markets) – The largest eigenvalue is well separated from the bulk and corresponds to the whole market as the corresponding eigenvector has roughly equal components – The next largest eigenvectors carry information about the real correlations and can be used in identifying clusters of strongly interacting assets – The smaller eigenvectors are noise (95%)L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 83 (1999)1467 ; T. Heimo, J. Saramaki, J-P. Onnela, K. Kaski (2007). Spectral and network methods in the analysis of correlationmatrices of stock returns, Physica A 383, pp. 147–151 16
  17. 17. pca -command• File (-file) : File with input data.• Components (-components) : Components to keep, e.g. 2- or 1 or 2-10 Optional.• Observations (-obs) : Number of observations in each period.• Interval (-interval) : Number of observations between each period.• Saveas (-saveas) : Name of file to store amended data.Examples# calculate returns corresponding to all but principal componentpca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2- -saveas dax-montly1.csv# calculate returns corresponding to second to 10th largest componentpca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2-10 -saveas dax-montly2.csv 17
  18. 18. Correlation and Covariance 18
  19. 19. 19
  20. 20. buildbycorrelation -command• File (-file) : File from which to read data. To skip lines or use non-default field delimiter use the delimiter and skiplines arguments. Mandatory.• Save as property (-saveas) : Property name of saved result. Optional. By default correlation.• Number of observations (-obs) : Larger than or equal to 2. Optional.• Length of interval (-interval) : Larger than or equal to 1. Optional.• -commitinterval : Larger than or equal to 1. Optional. By default 10000.• Save standard deviation (-savestdev) : Option to save standard deviation as vertex property. Optional. By default false.• Save average returns (-savereturns) : Option to save average returns as vertex property. Optional. By default true.• Preserve (-preserve) : Do not overwrite existing networks in database. Optional. By default true. 20
  21. 21. • Missing values (-missing) : Handling of missing or non-numeric values in data. Alternatives: – Zero : missing values are considered as 0. – NaN : default value corresponds to NaN and missing values setting in Default format settings. – Alert : missing values cause the calculation to fail. Optional. Allowed values: [Zero, NaN, Alert]. By default Zero.• Start date (-start) : Starting date of calculation. Optional.• End date (-end) : Ending date of calculation. Optional.• Date format (-dateformat) : Format of date in input file. Default value corresponds to Date format setting in Default format settings. Optional.• Date order (-dateorder) : Order of dates in the input file. If dates are ascending order, the newest network has fewer observations than other networks. If dates are descending order, the oldest network has fewer observations. Optional. Allowed values: [asc, desc]. By default desc. 21
  22. 22. Correlation Matrix & Network A B C A 0 0.72 0.31 B 0.72 0 0.60 C 0.31 0.60 0 22
  23. 23. Examples# Calculate 100 day pairwise correlations with 1 day sliding window, considerany missing/non-numeric return as 0buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval1 -preserve false# Calculate 100 day pairwise correlations with no sliding window, consider anymissing/non-numeric return as 0buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval100 -preserve false# Calculate pairwise correlations for whole data, alert about missing valuesbuildbycorrelation -file daxreturns-2011.csv -missing Alert -preserve false 23
  24. 24. Heatmap -command• Creates time series of heatmaps• Like viz -command• Allows mapping of data to cell size, shape, color, label, hover• Allows definition of color ranges and domains# create heatmap from correlationsheatmap -sortv vertex_id -p correlation -symmetric true -cellsizedefault 13 -transition 0 -cellhover correlation -colordomain (-1)-1 -palette darkblue-lightgray-darkred -saveas daxheat-def-Y 24
  25. 25. 28&29All components Principal Component Removed 25
  26. 26. About Color PerceptionA and B are the sameshade of gray http://upload.wikimedia.org/wikipedia/comm ons/9/93/Optical_illusion_greysquares.gif
  27. 27. Anscombes QuartetConstructed in 1973 by Francis Anscombe to demonstrate both theimportance of graphing data before analyzing it and the effectof outliers on statistical properties Mean of x 9 Variance of x 11 Mean of y ~7.50 Variance of y ~4.1 Correlation ~0.816 Linear regression: y = 3.00 + 0.500x Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician 61 (3): 248–254. 27
  28. 28. Significance of Correlations• We are estimating true correlations by means of samples• Our estimates have errors• Question: What is the chance that random data exhibits an observed correlation? 28
  29. 29. Example - distribution of correlation in 30trials with random numbers 20 pairs 50 pairs 100 pair 200 pairs 29
  30. 30. Test for significance• Errors: – Type I: False positive - we identify a correlation where there is none – Type II: False negative - we identify no correlation when there is one• We can control for Type I – Fisher transformation -> Normal distr. -> Confidence Interval• Possible to estimate Type II error rate - only more data will bring this down 30
  31. 31. 30-32alpha -parameterSignificance level (-alpha) : Two-sided significance level forconfidence interval.Example:buildbycorrelation -file daxreturns-2011.csv -missing Alert-obs 30 -interval 1 -alpha 0.05 -preserve false 31
  32. 32. No signifance test, Principal Only significant correlations ( =0.05),Component Removed Principal Component Removed 32
  33. 33. Problem of Multiple Comparisons• Occurs when considering multiple simultanous estimates – e.g. with correlation matrix of n=30 we have 30*30/2-30 = 420 estimates – > too many false positives by random chance alone – the system may not be accurate enough• Familywise error rate (FWER) - Probability of making one or more false positives among all the hypotheses when performing multiple hypotheses tests. – Solved with e.g. Bonferroni correction• False discovery rate (FDR) - Proportion of significant correlations that are false positives – Benjamini-Hochberg -procedure• Question: which is more important? Missing correlations that exist, or disregarding ones that do? 33
  34. 34. 33-35bonferroni -parameterEnable Bonferroni correction (-bonferroni) : Option to use Bonferronicorrection in significance testing. Optional. By default false.Examplebuildbycorrelation -file daxreturns.csv -missing Alert -obs30 -interval 1 -alpha 0.05 -bonferroni true -preserve false 34
  35. 35. Only significant correlations Bonferroni correction, Only( =0.05), Principal Component significant correlations ( =0.05),Removed Principal Component Removed 35
  36. 36. Blog, Library and Demos at www.fna.fiDr. Kimmo Soramäkikimmo@soramaki.netTwitter: soramaki

×