This document discusses conditional correlation and non-stationarity in correlation. It argues that correlation is often not stable over time and tends to increase during turbulent markets, with implications for risk modeling and portfolio diversification. The document reviews the literature on time-varying correlation and proposes adjusting risk models to include separate correlation regimes, so that conditional correlation can vary with market conditions. It shows how Markov-switching or two-regime models can account for changes in correlation during normal versus turbulent periods.
Data analysis test for association by Prof. Sachin Udepurkar
1) The document discusses analyzing relationships between variables through bivariate analysis. Bivariate analysis examines the relationship between two variables and can determine direction, strength, and statistical significance.
2) It provides examples of using scatter plots and calculating covariance to visually represent and quantify relationships between variables. Covariance measures how much two variables change together.
3) Calculating the correlation coefficient further standardizes and quantifies relationships, resulting in a number between -1 and 1 that indicates the strength and direction of a relationship. Strong positive or negative correlations near 1 or -1 show clear relationships between variables.
Regression analysis is a statistical technique used to model relationships between variables. It allows one to predict the average value of a dependent variable based on the value of one or more independent variables. The key ideas are that the dependent variable is influenced by the independent variables in a linear or curvilinear fashion, and regression provides an equation to estimate the dependent variable given values of the independent variables. Common applications of linear regression include forecasting, determining relationships between variables, and estimating how changes in one variable impact another.
This document provides an overview of econometric modeling techniques. It discusses objectives of econometric modeling including empirical verification of economic theories and policy analysis. It also describes types of econometric models such as single-equation regression models, simultaneous-equation models, and time series models. Model building criteria and assumptions of single-equation regression models are outlined along with methods for dealing with violations of assumptions like multicollinearity and autocorrelation.
Factor analysis was conducted on survey data from 30 respondents about factors influencing toothpaste purchase. The analysis aimed to reduce the 6 survey items into underlying factors. Correlation and significance tests supported conducting factor analysis. Principal component analysis extracted 2 factors explaining over 80% of the variance: 1) "health benefits" including preventing cavities and decay and strengthening gums, and 2) "smile attractiveness" including shiny teeth and fresh breath. Varimax rotation produced a clear factor structure which was validated through model fit measures.
This document discusses correlation and regression analysis. It defines correlation as a statistical measure of how two variables are related. A correlation coefficient between -1 and 1 indicates the strength and direction of the linear relationship between variables. A scatterplot can show this graphically. Regression analysis involves using one variable to predict scores on another variable. Simple linear regression uses one independent variable to predict a dependent variable, while multiple regression uses two or more independent variables. The goal is to identify the regression line that best fits the data with the least error. The coefficient of determination, R2, indicates how much variance in the dependent variable is explained by the independent variables.
The presentation was prepared to provide a brief overview of regression analysis: what it means and how it differs from related statistical measures like correlation. The last few slides briefly discuss the different types of data used in econometrics.
The document provides guidance on summarizing categorical data through frequency tables, bar charts, pie charts, and contingency tables. It emphasizes that data displays should follow the area principle to accurately show distributions and not be misleading. Conditional distributions and marginal distributions from contingency tables allow examining relationships between variables. Common errors include violating the area principle, misleading scales, small sample sizes, and overstating conclusions.
Discriminant analysis (DA) is a statistical technique used to predict group membership when the dependent variable is categorical and the independent variables are continuous. It identifies which variables discriminate between two or more naturally occurring groups. DA develops a linear equation to predict group membership based on weighted combinations of predictor variables. It aims to maximize the distance between group means to achieve strong discriminatory power. Like regression, DA assumes variables are normally distributed, cases are randomly sampled, and groups are mutually exclusive and collectively exhaustive. It requires at least two groups with minimal overlap and similar group sizes of at least five cases. DA can classify new cases into groups based on the discriminant functions derived from existing data.
This document discusses multiple linear regression analysis conducted to assess staff satisfaction levels at an educational institution. A questionnaire was administered to staff across multiple locations. Factor analysis was used to identify the variables that best predict overall satisfaction. A regression model was developed using satisfaction as the dependent variable and questions regarding workplace expectations, resources, communication, recognition, development opportunities, and opinions as independent variables. The model was analyzed in SPSS and showed high explanatory power, with no issues of multicollinearity between predictors.
This chapter discusses various statistical tests used for multiple regression analysis, including:
1. Testing individual regression coefficients and the overall model significance.
2. Testing whether two or more coefficients are equal.
3. Testing if coefficients satisfy certain restrictions.
4. Testing the stability of a regression model over time using the Chow test.
5. Testing linear vs. log-linear functional forms using the MacKinnon-White-Davidson test.
The chapter outlines different statistical approaches like confidence intervals, F-tests, and t-tests to evaluate hypotheses about coefficients and models.
This document provides an overview of correlation analysis procedures in SPSS, including bivariate correlation, partial correlation, and distance measures. It discusses interpreting correlation coefficients and significance values. Scatterplots are recommended to check assumptions before correlation. Hands-on exercises are included to find correlations between variables while controlling for other variables.
This presentation explains almost all the concepts that need to be understood and developed before running an OLS regression. The concepts of unconditional and conditional means are discussed in detail, along with the differences between the PRF and SRF.
This document discusses multiple linear regression analysis. It begins by defining a multiple regression equation that describes the relationship between a response variable and two or more explanatory variables. It notes that multiple regression allows prediction of a response using more than one predictor variable. The document outlines key elements of multiple regression including visualization of relationships, statistical significance testing, and evaluating model fit. It provides examples of interpreting multiple regression output and using the technique to predict outcomes.
This document discusses multicollinearity in regression analysis. It defines multicollinearity as a near linear relationship between predictor variables, which violates an assumption of classical linear regression. It provides an example of multicollinearity between product price and competitor prices. The effects of multicollinearity include indeterminate regression coefficients and infinite variance and covariance of coefficients when multicollinearity is perfect. Sources of multicollinearity include the data collection method, constraints in the population, model specification, and having more predictors than observations.
This document provides an overview of linear regression machine learning techniques. It introduces linear regression models using one feature and multiple features. It discusses estimating regression coefficients to minimize error and find the best fitting line. The document also covers correlation, explaining that a correlation does not necessarily indicate causation. Multiple linear regression is described as fitting a linear function to multiple predictor variables. The risks of overfitting with too complex a model are noted. Code examples of implementing linear regression in Scikit-Learn and Statsmodels are referenced.
Correlation is a statistical technique used to determine the degree of relationship between two variables. Correlational research aims to identify and describe relationships but does not imply causation. Positive correlation indicates high scores on one variable are associated with high scores on the other, while negative correlation means high scores on one variable are associated with low scores on the other. Correlational research can be used for explanatory or predictive purposes. More complex techniques like multiple regression allow prediction using combinations of variables. Threats to internal validity like subject characteristics must be controlled.
Second part of the Professional Development Course (non-degree): Technical English for Health Sciences Professionals. DEPARTAMENTO ADMINISTRATIVO SOCIAL. Escuela de Enfermería. ULA. Mérida. Venezuela. Offered in face-to-face mode for 3 or 4 credit units; costs are subsidized and depend on the region of the country requesting the course.
Technical English is defined by the vocabulary to be handled and the purpose for which English is studied. In general, technical English aims at reading comprehension, mainly of technical texts in the health disciplines in this case; for example, a student of Medicine or Nursing will start to encounter names of diseases, epidemiological approaches, and so on. This differs from general English, which focuses mostly on everyday communication and grammar.
The learning sessions present general notions of English written grammar and its transfer into Spanish. In this module, hands-on practice begins with choosing texts in which to observe the elements presented.
Participants then explore ideas drawn from online sources to deepen their learning of technical English.
Correlation analysis quantifies the relationship between two or more variables. It is used to determine how much one variable predicts another and whether their relationship is positive or negative. The Pearson correlation coefficient (r) measures the strength and direction of linear relationships between variables. SPSS software enables statistical analysis and calculation of correlation coefficients to interpret relationships in data.
This document discusses correlation and provides examples to illustrate key concepts:
1. Correlation quantifies the linear relationship between two variables and ranges from -1 to 1. Values closer to 1 or -1 indicate a stronger linear relationship.
2. Scatterplots visually depict the relationship and can show if variables are positively or negatively correlated.
3. The Pearson correlation coefficient (r) is a common measure of linear correlation calculated using variables' means, sums, and standard deviations.
4. Correlation only captures linear relationships and does not prove causation between variables. Additional analysis is needed to interpret correlated variables.
Correlation analysis is a statistical technique used to determine the degree of relationship between two quantitative variables. Scatterplots are used to graphically depict the relationship and identify if it is positive, negative, or no correlation. The correlation coefficient measures the strength and direction of correlation, ranging from -1 to 1. A significance test determines if a correlation is likely to have occurred by chance or is statistically significant. Different types of correlation include simple, multiple, partial, and autocorrelation.
Multiple Regression and Logistic Regression by Kaushik Rajan
1) Multiple Regression to predict Life Expectancy using independent variables Lifeexpectancymale, Lifeexpectancyfemale, Adultswhosmoke, Bingedrinkingadults, Healthyeatingadults and Physicallyactiveadults.
2) Binomial Logistic Regression to predict the Gender (0 - Male, 1 - Female) with the help of independent variables such as LifeExpectancy, Smokingadults, DrinkingAdults, Physicallyactiveadults and Healthyeatingadults.
Tools used:
> RStudio for Data pre-processing and exploratory data analysis
> SPSS for building the models
> LaTeX for documentation
Explains some advanced uses of multiple linear regression, including partial correlations, analysis of residuals, interactions, and analysis of change. See also previous lecture http://www.slideshare.net/jtneill/multiple-linear-regression
The document discusses various methods for scaling and measuring constructs, including unidimensional and multidimensional scaling. It describes exploratory factor analysis and confirmatory factor analysis as two common methods for analyzing multidimensional data. Exploratory factor analysis is used to uncover the underlying factor structure without prior hypotheses, while confirmatory factor analysis tests a hypothesized factor structure.
A review of the assumptions behind fundamental, macro, and statistical risk models. Pros and cons of each approach. Introducing adaptive hybrid risk models.
- Multinomial logistic regression predicts categorical membership in a dependent variable based on multiple independent variables. It is an extension of binary logistic regression that allows for more than two categories.
- Careful data analysis including checking for outliers and multicollinearity is important. A minimum sample size of 10 cases per independent variable is recommended.
- Multinomial logistic regression does not assume normality, linearity or homoscedasticity like discriminant function analysis does, making it more flexible and commonly used. It does assume independence between dependent variable categories.
- Noise in interest rate data propagates into calculated forward rates, making them more volatile and less correlated over time.
- The standard covariance estimator of forward rate volatility is biased, especially for longer-term forward rates and adjacent rates.
- New robust estimation techniques that use subsampling, averaging, and bias-correction can produce a lower estimated volatility and higher correlation than the standard approach. These account better for the effects of noise in the data.
This document outlines the schedule and topics for an advanced econometrics and Stata training course taking place in Beijing from November 17-26, 2019. The course will cover topics including introduction to econometrics and Stata, single and multiple regression, hypothesis testing, time series models, panel data models, and frontier analysis. Sessions are planned each morning and evening, with exercises and practice sessions interspersed.
The document summarizes a study on modeling risk aggregation and sensitivity analysis for economic capital at banks. It finds that different risk aggregation methodologies, such as historical bootstrap, normal approximation, and copula models, produce significantly different economic capital estimates ranging from 10% to 60% differences. The empirical copula approach tends to be the most conservative while normal approximation is the least conservative. The results indicate banks should take a conservative approach to quantify integrated risk and consider the impact of methodology choice and parameter uncertainty on economic capital estimates.
This document provides an analysis and application of the Markov Switching Multifractal (MSM) volatility model. The author applies the MSM to daily log-returns of the S&P 500, S&P 100, VIX, and VXO indices. The MSM constructs a multifractal model with over 1,000 states but only 4 parameters. It outperforms a normal distribution GARCH model in-sample and out-of-sample, though not a Student's t-distribution GARCH. However, the MSM forecasts volatility significantly better than both GARCH models at horizons of 20-50 days.
A researcher in attempting to run a regression model noticed a neg.docx
A researcher attempting to run a regression model noticed a negative beta sign for an explanatory variable when s/he was expecting a positive sign based on theoretical considerations. What advice would you give to the researcher as to what is going on, and what specific diagnostics would you look at? Explain conceptually and statistically the different ways you can correct for this problem.
Reason
One of the most common and important reasons for such situations is the existence of multicollinearity. Multicollinearity can happen if some of the independent variables are highly correlated to each other or to another variable that is not in the model.
Multicollinearity also has other symptoms such as
· Large variance for regression coefficients
· Non-significant individual coefficients while the general model is significant
· Change of marginal contributions depending on the variables in the model
· Large correlation coefficients in the correlation matrix of variables
It should, however, be noted that the overall model can preserve its predictive ability; it is only the explanatory power that is lost.
Before going to the solutions and measures the researcher can take, it is wise to step back and see the underlying reason for the multicollinearity. The extreme case where two variables are identical gives the best understanding of the problem.
In this case we are trying to define y as a function of x1 and x2 while in reality x1 = x2. Therefore any linear combination b1·x1 + b2·x2 is replaceable by infinitely many other linear combinations (i.e. (b1 + c)·x1 + (b2 − c)·x2 for any constant c).
It is easy to see that while y is predicted correctly in all these instances, the individual coefficients for x1 and x2 are meaningless.
Diagnosis
One of the most common diagnostics for multicollinearity is the variance inflation factor (VIF):
VIF_j = 1 / (1 − R_j²)
where R_j² is the coefficient of multiple determination from the regression of x_j on the other explanatory variables.
The variance inflation factor therefore measures how much the variance of each coefficient is inflated. When R_j² equals zero, VIF equals 1, which indicates no multicollinearity. A common heuristic is that any VIF larger than 10 is alarming and signals a case of strong multicollinearity.
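As an illustration of this diagnostic, here is a minimal sketch using the variance_inflation_factor function from statsmodels; the data and variable names are invented for the example, not taken from the assignment.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: x2 is nearly a linear copy of x1, so both
# should show large VIFs; x3 is independent and should sit near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor (column 0 is the constant, so skip it).
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, j), 1))
```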
Solutions
There are a few solutions to the multicollinearity problem:
1- Ignoring the problem completely is possible in cases where we only care about the final model fit and predictive capability rather than individual coefficients and explanatory power.
2- Removing some of the correlated variables from the model; this can be justified by arguing that the effect of a removed variable is still captured by the similar, highly correlated variables kept in the model.
3- Principal component analysis (or any orthogonal transformation) can reduce the predictors to a few orthogonal factors with no collinearity; however, we should note that interpreting the variables after a PC transformation is hard.
4- For cases where we intend to keep all the variables in the model without any major transformation, ridge regression can be used; it shrinks the coefficients to stabilize their variances (a minimal sketch follows).
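For point 4, a minimal sketch of ridge regression with scikit-learn, using the same kind of near-duplicate predictors as above; the penalty alpha=1.0 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# x2 is a near-copy of x1, so OLS coefficients are poorly determined;
# the ridge penalty shrinks and stabilizes them.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])

print("OLS:  ", LinearRegression().fit(X, y).coef_)   # unstable split of ~2.0
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)     # near-equal, stable
```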
This document provides an overview of descriptive statistics and inferential statistics. Descriptive statistics are used to describe basic features of data through simple summaries, while inferential statistics are used to make inferences about populations based on samples. Examples of descriptive statistics include measures of central tendency, dispersion, frequency distributions and contingency tables. Inferential statistics allow for comparisons between groups and populations through techniques like t-tests, analysis of variance, regression analysis, and other general linear models.
This document provides an overview of descriptive statistics and inferential statistics. Descriptive statistics are used to describe basic features of data through simple summaries, while inferential statistics are used to make generalizations beyond the sample data. Key concepts covered include measures of central tendency and dispersion, the general linear model, dummy variables, experimental and quasi-experimental designs, analysis of variance, analysis of covariance, and regression analysis.
The document discusses descriptive statistics and inferential statistics. Descriptive statistics are used to describe basic features of data through simple summaries, while inferential statistics are used to make inferences beyond the sample data to general populations. Some common descriptive statistics are measures of central tendency, dispersion, frequency, and contingency tables. Inferential statistics allow for comparisons between groups and determining the probability of observed differences occurring by chance. Regression analysis is also discussed as a technique used to model relationships between dependent and independent variables and understand how changes in independent variables impact the dependent variable.
This document provides an overview of key concepts in statistics including measures of central tendency, measures of dispersion, probability, correlation, time series analysis, and network planning methods like CPM and PERT. It defines terms like mean, median, mode, standard deviation, variance, probability, random experiment, correlation, time series, network, critical path, slack time, and differences between CPM and PERT. It also lists advantages and limitations of using PERT/CPM for project management.
This document provides an overview and summary of linear regression analysis theory and computing. It discusses linear regression models and the goals of regression analysis. It also introduces some key topics that will be covered in the book, including simple and multiple linear regression, model diagnosis, generalized linear models, Bayesian linear regression, and computational methods like least squares estimation. The book aims to serve as a one-semester textbook on fundamental regression analysis concepts for graduate students.
The document appears to be a statistics assignment submitted by a student analyzing daily stock price data of SBI, ICICI and HDFC banks from January 2012 to October 2012.
Key findings from the analysis include: SBI had the highest average price and turnover, while ICICI had the lowest variability in stock prices. A positive skewness was found for ICICI, indicating more high values, while SBI had a negative skewness. Correlation coefficients were computed between the stock prices and total traded quantities, and linear regression equations were formulated. Overall, the analysis aimed to identify which of the three bank stocks exhibited the most consistent patterns for investment purposes.
A gentle introduction to growth curves using SPSS
A brief introduction on how to conduct growth curve statistical analyses using SPSS software, including some sample syntax. Originally presented at IWK Statistics Seminar Series at the IWK Health Center, Halifax, NS, May 1, 2013.
This document provides an overview of exploratory factor analysis (EFA). EFA is used to uncover the underlying structure of a set of variables and identify latent variables that are inferred rather than directly observed. It allows investigation of factor structures and helps understand response patterns. The document discusses EFA assumptions, procedures for extraction and rotation of factors, and methods for determining the appropriate number of factors.
Factor analysis is a statistical technique used to identify underlying factors that explain the pattern of correlations within a set of observed variables. It groups variables that are highly correlated with each other into factors to reduce data dimensionality. The key steps are extracting factors with eigenvalues greater than 1, evaluating factor loadings to interpret the grouping of variables, and rotating factors to maximize interpretability of the results. SPSS output includes correlation coefficients, KMO/Bartlett's tests of sampling adequacy, eigenvalues, communalities, scree plots, and rotated component matrices.
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
We present an overview of regression analysis and its theoretical construct, then provide a graphic representation before performing multiple regression analysis step by step using SPSS (audio files accompany the tutorial).
This document discusses canonical correlation analysis (CCA), a statistical technique used to analyze relationships between sets of multiple dependent and independent variables. CCA identifies linear combinations of variables from each set that are most strongly correlated. The summary discusses how CCA works, its objectives of determining relationships between variable sets and maximizing correlation, and how it differs from ordinary correlation analysis by finding an optimal coordinate system. Limitations of CCA are also noted.
Using Cross Asset Information To Improve Portfolio Risk Estimation
There are obvious relationships between the various securities of a given firm that impact our expectations of risk. For example, if fixed income investors expect a corporate bond of a company to default, there must be a related bankruptcy event that would negatively impact shareholders in that firm. In this presentation, Nick will describe how to use data from bond and option markets to improve risk estimation for equity portfolios, and how to use information from the equity markets to improve estimation of credit risk in fixed income securities. The goal of the process is to create holistic risk estimation where all expectations of risk are mutually consistent across the entire capital structure of a firm, and related derivatives.
The Search for a Better Risk Model - MPT Forum Tokyo March 1st 2012
This presentation discusses the three most common ways to estimate a multi-factor risk model, sheds some light on the numerous assumptions underlying the models, and provides some thoughts about how to address those assumptions to make the models better fit the real world. The Northfield hybrid risk model is discussed. Non-stationary volatility, correlation, clusters in volatility, the use of forward-looking signals such as implied-volatility and cross-sectional dispersion, as well as the use of quantified news information to update risk forecasts are included.
Since the CAPM model Sharpe (1965) and the first “fundamental” model by King (1966) the use of “factors” in alpha generation and risk modeling has become mainstream. However, the types of factors we employ and the techniques we use to model relationships have in general not progressed much since. In addition, many of our favorite techniques assume that the world is static, whereas of course markets evolve and change dramatically; as we have seen so vividly illustrated over the last few years.
We review fundamental, macro-economic, and statistical factors, describing the advantages and disadvantages of each, and review some newer techniques that explicitly allow for evolving relationships in data sets and harness emerging technologies that can capture much more nuanced relationships than simple correlation: “flexible” least-squares regression, artificial immune systems, single-pass clustering, semantic clustering, social network influence measurement, layer-embedded networks, block-modeling, and more.
Nick Wade Using A Structural Model For Enterprise Risk, Dst Conference 2011...
The document discusses using Northfield's Enterprise Engine (EE) structural risk model to provide integrated risk analysis across asset classes for both portfolio management and enterprise risk purposes. Key points: 1) EE uses a consistent factors-based approach to model risk across equities, fixed income, property, and other asset classes; 2) It employs structural models to assess credit and illiquid asset risk rather than relying solely on ratings and appraisal smoothing; 3) EE can be used at different levels of detail for different audiences and provides customizable reporting and risk decomposition.
A brief literature review and roadmap through agent-based models of financial markets. Laying out the key decisions agent based model builders need to make and some of the empirical results from recent models investigating the effect of short-selling bans, leverage etc.
Balancing quantitative models with common sense 2008
1. Technological advances have led to more available data but distinguishing true signals from noise remains a challenge.
2. New techniques in machine learning, time series analysis, and decision systems allow for more sophisticated modeling but also risks of overfitting or relying too heavily on assumptions of history repeating.
3. As markets evolve with new types of participants and strategies, risk models must question traditional assumptions and incorporate more realistic representations of factors like liquidity, speculation, and asymmetric responses in bull and bear markets.
Looking at the risk within an investment period rather than just at the end of the period. Evaluating the effect on VaR of non-Normal distributions, jumps, and drift
2. Overview
- Correlation holds a pivotal place in our analysis of data and in the construction of forecasting models for return and risk
- Review the literature on correlation stability, with a particular focus on turbulent markets
- Backtrack: review the assumptions underlying correlation
- Explore its role in regression, factor analysis, and cluster analysis
- Discuss alternate distance measures and adjustments
- Show recent advances in regression, factor analysis, and cluster analysis that avoid non-stationarity issues
- Return to the thread: show how a simple regime-switching model allowing conditional correlation can be incorporated into the linear multifactor risk model
3. Correlation breakdown
The concept of a “correlation breakdown” describes the tendency of correlations to move towards 1 as markets melt down, erasing the benefit of diversification just when it is needed most. It would also have stark implications for the mutual fund industry, whose business model is at least in part based on providing access to diversification.
4. To open…
Increasing attention is being paid to the issue of correlations varying over time:
- (stocks) De Santis, G. and B. Gerard (1997), International asset pricing and portfolio diversification with time-varying risk, Journal of Finance, 52, 1881-1912.
- (stocks) Longin, F. and B. Solnik (2001), Extreme correlation of international equity markets, Journal of Finance, LVI(2), 646-676.
- (bonds) Hunter, D.M. and D.P. Simon (2005), A conditional assessment of the relationships between the major world bond markets, European Financial Management, 11(4), 463-482.
- (bonds) Solnik, B., C. Boucrelle and Y.L. Fur (1996), International market correlation and volatility, Financial Analysts Journal, 52(5), 17-34.
- Markov switching model: Chesnay, F. and E. Jondeau, “Does Correlation Between Stock Returns Really Increase During Turbulent Periods?”, Bank of France research paper.
To date little explored; however, implied correlation also seems useful, and more powerful than historical correlation in forecasting (we saw the same result with volatility):
- Campa, J.M. and P.H.K. Chang (1998), The forecasting ability of correlations implied in foreign exchange options, Journal of International Money and Finance, 17, 855-880.
5. Correlation stability
One of the first… Kaplanis (1988): STABLE
Tang (1995), Ratner (1992), Sheedy (1997): STABLE, although the crash of 1987 is regarded as an “anomaly”
Bertero and Mayer (1989), King and Wadhwani (1990) and Lee and Kim (1993): correlation has increased, but STABLE
Not quite so stable?
Erb et al. (1994): increases in bear markets
Longin and Solnik (1995): increases in periods of high volatility
Longin and Solnik (2001): increases in bear markets
6. Turbulence in the market
Kritzman (2009):
- Correlation of US and foreign stocks when both markets’ returns are one standard deviation above their mean: -17%
- Correlation of US and foreign stocks when both markets’ returns are one standard deviation below their mean: +76%
“Conditional correlations are essential for constructing properly diversified portfolios”
7. Correlation regimes
If it’s not stable, how about Markov switching models?
- Ramchand and Susmel (1998), Chesnay and Jondeau (2001): correlation, conditioned on market regime, increases in periods of high volatility
- Ang and Bekaert (1999): evidence for two regimes, a high-volatility/high-correlation regime and a low-volatility/low-correlation regime
8. Backtrack
Before we get too far down the track, let’s look at the assumptions underlying correlation and see what the implications are.
9. Correlation – a definition
Correlation measures the strength and direction of a linear relationship between two variables: differences from the mean in the numerator, standard deviations in the denominator. Not new… it dates from the 1880s.
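For reference, the definition the slide sketches is the Pearson coefficient; writing it out in standard notation (not shown on the slide):

```latex
r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
              {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,
               \sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
```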
10. Correlation – limitations & assumptions
- Linear: fits only the linear part of the relationship
- Stationary: susceptible to trend in the mean of either series
- Stationary: assumes the volatility of each series is unchanging over time
- No concept of higher moments
- Sensitive to the underlying distribution
- Sensitive to the presence of estimation error in the observations
- An incomplete measure of the relationship between two series
11. Correlation in regression
Stating the obvious, but correlation is at the heart of regression analysis:
- Remember to use something like Durbin-Watson to test for serial correlation
- Typical situation: huge R^2, low DW (positive serial correlation)
- If DW < 1, there is problematic positive serial correlation
- If you think the residuals are non-Normal, DW blows up; use the Breusch-Godfrey test instead
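A minimal sketch of the “huge R^2, low DW” situation using statsmodels; the AR(1) error coefficient and sample size are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# y trends with x but the errors are AR(1): R^2 looks great,
# yet the Durbin-Watson statistic flags positive serial correlation.
rng = np.random.default_rng(0)
n = 500
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.9 * e[t - 1] + rng.normal(scale=0.5)
y = 2.0 + 3.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("R^2:", res.rsquared)               # very high
print("DW: ", durbin_watson(res.resid))   # well below 2
# Breusch-Godfrey as the more robust alternative (LM stat, p-value):
print("BG: ", acorr_breusch_godfrey(res, nlags=1)[:2])
```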
12. Correlation in factor analysis
(Principal components analysis is a sub-class of the family of factor analysis techniques.)
Correlation is a key part of factor analysis. PCA uses the eigenvectors of the covariance matrix, and hence is affected by anything that impacts the volatility or correlation of the series.
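A minimal PCA sketch via the eigendecomposition of the covariance matrix, on simulated returns; the one-common-driver setup is an assumption for the example.

```python
import numpy as np

# Simulated returns: T observations of N series sharing one driver.
rng = np.random.default_rng(0)
T, N = 1000, 5
common = rng.normal(size=(T, 1))                  # one shared driver
returns = common + 0.5 * rng.normal(size=(T, N))  # plus idiosyncratic noise

cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                 # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance explained:", eigvals / eigvals.sum())
factors = returns @ eigvecs[:, :1]                # scores on the first PC
```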
13. Correlation in cluster analysis
Cluster analysis is closely related to factor analysis. It assigns members to a group depending on rules and a distance measure. For example, “complete linkage” cluster analysis adds a new member to the group whose least-related member is most highly related to the new member. Thus, depending on the choice of distance measure [correlation is the most common choice], cluster analysis may also be affected by any problems with correlation…
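A minimal sketch of complete-linkage clustering with a correlation-based distance using SciPy (its “correlation” metric is one minus the Pearson correlation); the two-group series are simulated for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Four series: two driven by f1, two by f2, plus noise.
rng = np.random.default_rng(0)
T = 500
f1, f2 = rng.normal(size=T), rng.normal(size=T)
series = np.array([f1 + 0.3 * rng.normal(size=T),   # group A
                   f1 + 0.3 * rng.normal(size=T),   # group A
                   f2 + 0.3 * rng.normal(size=T),   # group B
                   f2 + 0.3 * rng.normal(size=T)])  # group B

d = pdist(series, metric="correlation")       # 1 - correlation distances
Z = linkage(d, method="complete")             # complete linkage
print(fcluster(Z, t=2, criterion="maxclust")) # e.g. [1 1 2 2]
```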
14. One classification system
Jorge Luis Borges, “Other Inquisitions” 1937-1952. Animals are divided into: a) those that belong to the Emperor, b) embalmed ones, c) those that are trained, d) suckling pigs, e) mermaids, f) fabulous ones, g) stray dogs, h) those that are included in this classification, i) those that tremble as if they were mad, j) innumerable ones, k) those drawn with a very fine camel's hair brush, l) others, m) those that have just broken a flower vase, n) those that resemble flies from a distance.
15. Thoughts
Non-stationary volatility (ARCH, GARCH, etc.): we spend a heroic amount of time trying to forecast non-stationary volatility, but we often just ignore it when we calculate correlation, perform regression analysis, or run factor analysis (or PCA).
Non-stationary mean (trend): we often build models to capture the alpha in momentum, reversals, and other manifestations of a non-stationary mean, but we often ignore those when we calculate correlation, perform regression analysis, or run factor analysis.
Read the fine print…
16. Paralysis
Yes, I know: if you read all the fine print and believed that non-stationarity in the data rendered all these techniques useless, you’d never do any analysis and would eventually get fired for surfing the internet all day, for months and months, looking for The Right Approach. Fries with that?
17. More thoughts…
What happens ex-post when we analyze data? IR, IC. Is my risk model broken?
Measures such as IR and IC are also affected by non-stationarity; IC varies over time.
18. Non-stationarity again… Huber 2001
A manager has 2.33% forecast tracking error and -6.3% realized return. A 3-sigma event, or a broken risk model? The risk model was on target [ex-ante and ex-post SD both 2.xx]. Then why? Unfortunately, a -50bps per month alpha trend. Typical measures of risk are centered on the trend and thus ignore the risk of being consistently bad. (Or good! He could have had +6% return…)
Extending this idea, Qian and Hua (2004) define “strategy risk” as the standard deviation of the manager’s IC over time, and thus “forecast true active risk”:
Forecast Active Risk = std(IC) × Breadth^(1/2) × Forecast Tracking Error
Huber, Gerard. “Tracking Error and Active Management”, Northfield Conference Proceedings, 2001, http://www.northinfo.com/documents/164.pdf
19. An aside… Sharpe ratio
Trend effect: in declining markets, the fund with the higher total risk exhibits a higher (less negative) Sharpe ratio.
Another great way to stuff up your Sharpe ratio: imagine you have a model that only works sometimes [no, no, I know your models ALWAYS work…]. Be really, really good sometimes, and in cash the rest of the time [being responsible]. Your mean return is low, and your best months are the ones contributing all your volatility. Another manager doing the same thing with less skill can have a better Sharpe ratio.
21. Outlier dependence on correlation
The presence of outliers is problematic for correlation (think about regression). Use the Mahalanobis distance as a test for outliers.
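A minimal sketch of that outlier test, assuming a chi-square cutoff on the squared Mahalanobis distance; the planted outliers and the 99.9% threshold are illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

# Flag multivariate outliers: d^2 = (x - mu)' S^{-1} (x - mu),
# compared with a chi-square cutoff at the data's dimension.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X[:3] += np.array([4.0, -4.0])          # plant a few outliers

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)   # squared distances

cutoff = chi2.ppf(0.999, df=X.shape[1])            # 99.9% threshold
print("outliers:", np.where(d2 > cutoff)[0])       # indices 0, 1, 2
```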
23. Alternatives
We are looking for a measure of similarity, or shared behavior, or difference, or distance between two series. Ideally we want one with as few restrictions as possible, i.e. non-linear, robust to errors, not dependent on a particular distribution, and so on.
24. Related measures, adjustments and alternatives
- Rank correlation
- Disattenuation
- Total correlation / mutual information
- Cohesion/coherence
- Mahalanobis distance
- Euclidean distance (a special case of MD)
Masochists can read 74 pages on similarity measures in: Sneath, P.H.A. & Sokal, R.R. (1973), Numerical Taxonomy. Freeman, San Francisco.
25. Rank correlation
No linearity requirement. Spearman’s rank correlation is
rho = 1 − 6 Σ d_i² / (n(n² − 1))
where d_i = x_i − y_i is the difference between the ranks of corresponding values X_i and Y_i, and n is the number of values in each data set (the same for both sets).
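A quick illustration of the “no linearity requirement” point, using SciPy on a monotone but non-linear relationship:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# y is a monotone but strongly non-linear function of x:
# Pearson understates the association, Spearman captures it exactly.
x = np.linspace(1, 10, 50)
y = np.exp(x)

print("Pearson: ", pearsonr(x, y)[0])    # noticeably below 1
print("Spearman:", spearmanr(x, y)[0])   # exactly 1.0
```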
26. Disattenuation (what??)
Somewhat complicated, but essentially just an adjustment to correlation for the presence of estimation error in the underlying series; classically, r*_xy = r_xy / √(r_xx · r_yy), where r_xx and r_yy are the reliabilities of the two series. The adjustment tends to revise correlations upward.
27. Mutual information
Mutual information / total correlation: total correlation [Watanabe (1960)] expresses the amount of redundancy or dependency existing among a set of variables.
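The slide does not spell out the formula; the standard entropy-based statement of Watanabe’s total correlation is:

```latex
C(X_1,\dots,X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n)
```

i.e. the redundancy is zero only when the variables are jointly independent.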
28. Cohesion/coherence
The spectral coherence is a statistic that can be used to examine the relation between two signals or data sets. It is commonly used to estimate the power transfer between the input and output of a linear system. The squared coherence between two signals x(t) and y(t) is a real-valued function defined as
C_xy(f) = |G_xy(f)|² / (G_xx(f) · G_yy(f))
where G_xy is the cross-spectral density between x and y, and G_xx and G_yy are the autospectral densities of x and y respectively. The magnitude or power of the spectral density is denoted |G|.
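A minimal sketch using scipy.signal.coherence; the shared 50 Hz component and noise levels are invented for the example.

```python
import numpy as np
from scipy.signal import coherence

# Two signals sharing a 50 Hz component buried in independent noise:
# coherence should peak near 50 Hz and stay low elsewhere.
fs = 1000.0
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
shared = np.sin(2 * np.pi * 50 * t)
x = shared + rng.normal(scale=1.0, size=t.size)
y = shared + rng.normal(scale=1.0, size=t.size)

f, Cxy = coherence(x, y, fs=fs, nperseg=1024)
print("peak coherence %.2f at %.1f Hz" % (Cxy.max(), f[np.argmax(Cxy)]))
```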
29. Mahalanobis distance
MD is one example of a “Bregman divergence”, a group of distance measures. For two points x and y it is d(x, y) = √((x − y)' S⁻¹ (x − y)), where S⁻¹ is the inverse of the covariance matrix.
Clustering: classify the test point as belonging to the class for which the Mahalanobis distance is minimal. This is equivalent to selecting the class with the maximum likelihood.
Regression: Mahalanobis distance and leverage are often used to detect outliers, especially in the development of linear regression models. A point with a greater Mahalanobis distance from the rest of the sample population is said to have higher leverage, since it has a greater influence on the slope or coefficients of the regression equation. Specifically, Mahalanobis distance is also used to determine multivariate outliers; a point can be a multivariate outlier even if it is not a univariate outlier on any variable.
Factor analysis: recent research couples Mahalanobis distance with factor analysis and uses MD to determine whether a new observation is an outlier or a member of the existing factor set [Zhang 2003].
MD depends on covariance, so it is exposed to the same stationarity issues that affect correlation; however, as described above, it can help us reduce correlation’s outlier dependence.
30. Mahalanobis in action: chart borrowed from Kritzman, “Skulls, financial turbulence, and the implications for risk management”, July 2009.
31. Regression with nonstationary data: techniques have been developed specifically to allow time-varying sensitivities. FLS (flexible least squares) is primarily a descriptive tool that allows us to gauge the potential for time-evolution of exposures: minimize both the sum of squared residuals and the sum of squared dynamic errors (the period-to-period changes in the coefficient estimates). A sketch follows.
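A minimal dense-matrix sketch of the FLS objective (after Kalaba and Tesfatsion 1989): minimize the residual sum of squares plus a penalty λ on coefficient changes, solved jointly over all periods via the normal equations. A production version would exploit the block-tridiagonal structure rather than build a dense matrix; everything below is an illustrative assumption.

```python
import numpy as np

def fls(y, X, lam):
    """Flexible least squares: minimize sum((y_t - x_t'b_t)^2)
    + lam * sum(||b_{t+1} - b_t||^2) jointly over all b_t.
    y: (T,) responses; X: (T, k) regressors; lam: smoothness penalty."""
    T, k = X.shape
    A = np.zeros((T * k, T * k))
    b = np.zeros(T * k)
    for t in range(T):
        s = slice(t * k, (t + 1) * k)
        A[s, s] += np.outer(X[t], X[t])     # data-fit term
        b[s] += X[t] * y[t]
        if t > 0:                           # smoothness term linking t-1 and t
            sp = slice((t - 1) * k, t * k)
            A[s, s] += lam * np.eye(k)
            A[sp, sp] += lam * np.eye(k)
            A[s, sp] -= lam * np.eye(k)
            A[sp, s] -= lam * np.eye(k)
    return np.linalg.solve(A, b).reshape(T, k)

# Example: a slope coefficient that drifts upward over time.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.column_stack([np.zeros(T), np.linspace(0.5, 1.5, T)])
y = (X * beta_true).sum(axis=1) + 0.1 * rng.standard_normal(T)
betas = fls(y, X, lam=100.0)
print(betas[0], betas[-1])   # slope estimate rises from ~0.5 toward ~1.5
```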
32. FLS example: an example from Clayton and MacKinnon (2001); the coefficient apparently exhibits a structural shift in 1992.
33. Factor analysis with nonstationary data:
- Dahlhaus, R. (1997) ‘Fitting Time Series Models to Nonstationary Processes’, Annals of Statistics, Vol. 25, pp. 1–37.
- Del Negro and Otrok (2008) ‘Dynamic Factor Models with Time-Varying Parameters: Measuring Changes in International Business Cycles’, Federal Reserve Bank of New York.
- Eichler, M., Motta, G. and von Sachs, R. (2008) ‘Fitting dynamic factor models to non-stationary time series’, METEOR research memoranda RM/09/002, Maastricht University.
- Stock and Watson (2007) ‘Forecasting in dynamic factor models subject to structural instability’, Harvard.
There are techniques available, and they are being applied to financial series.
34. Cluster analysis with nonstationary data:
- Guedalia, London and Werman (1999) ‘An on-line agglomerative clustering method for nonstationary data’, Neural Computation, Vol. 11, No. 2, pp. 521–54.
- Aggarwal, C., Han, J., Wang, J. and Yu, P.S. (2004) ‘On Demand Classification of Data Streams’, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA.
- Widmer, G. and Kubat, M. (1996) ‘Learning in the Presence of Concept Drift and Hidden Contexts’, Machine Learning, Vol. 23, No. 1, pp. 69–101.
Again, there are techniques available to conquer the problem.
35. Clustering – artificial immune systems: non-stationary clustering is also related to the development of artificial immune systems. In modeling evolving data sets, you can think of each datum as “born” with a set of factors (immunity); the system subsequently develops immunity to new effects as they appear. You can decide whether (and how much) memory there should be of new and non-current factors. Each new observation could be one of three things: a member of an existing cluster, the first member of a new cluster, or an outlier – see the sketch below.
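A minimal online-clustering sketch of that three-way decision. The distance threshold and the "singletons are outlier candidates" rule are illustrative assumptions, not the algorithm of the cited papers; forgetting could be added by decaying the counts over time.

```python
import numpy as np

def online_assign(point, centroids, counts, r_member=2.0):
    """Assign one new observation: member of the nearest cluster if within
    r_member, otherwise it seeds a new cluster. Singleton clusters that
    never grow can be re-labelled as outliers afterwards."""
    point = np.asarray(point, dtype=float)
    if centroids:
        dists = [np.linalg.norm(point - c) for c in centroids]
        i = int(np.argmin(dists))
        if dists[i] < r_member:
            counts[i] += 1
            centroids[i] += (point - centroids[i]) / counts[i]  # running mean
            return 'member of cluster %d' % i
    centroids.append(point.copy())
    counts.append(1)
    return 'new cluster %d' % (len(centroids) - 1)

cents, cnts = [], []
for p in [[0, 0], [0.5, 0.2], [10, 10], [0.3, -0.1], [50, 50]]:
    print(online_assign(p, cents, cnts))
# Clusters that remain singletons (e.g. [50, 50]) are the outlier candidates.
```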
36. The story so far:
- A long time ago (in a galaxy far, far away?) correlation was stable.
- Recent evidence tends to suggest that correlation is very much regime-dependent, and the consensus seems to be that two regimes are sufficient.
- Correlation can be upset by outliers, trend, noise, and non-linearity.
- Correlation is the “default” choice in regression analysis, factor analysis, and cluster analysis – the core of our toolkit.
- More recent techniques, alternative measures, and adjustments exist to combat these effects.
37. What’s left:
- The linear multi-factor risk model
- Adjusting the model for non-stationarity
- Making the model “conditional” by allowing separate regimes
- Lunch
38. The linear multi-factor risk model:
- The relationship between R and F is linear ∀F
- The exposures E are the correlations between R and F
- There are N common-factor sources of return
- The distribution of F is stationary, Normal, i.i.d. ∀F (implicitly, the volatility of R and F is also stationary)
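For concreteness (the slide does not reproduce the equation, so this is the standard form such models take): $R_i$ is the return of asset $i$, $F_j$ the return of common factor $j$, $E_{ij}$ the exposure, and $\varepsilon_i$ the asset-specific residual:

$$R_i = \sum_{j=1}^{N} E_{ij}\, F_j + \varepsilon_i, \qquad F \sim \text{i.i.d. Normal}$$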
39. Effect on risk model estimation, if not properly addressed during estimation:
- Time-series (macro model): effect will be observed in exposures and correlations
- Cross-sectional (fundamental): effect observed in the correlation (covariance) matrix
- Statistical: effect observed in factor loadings (from FA), factor returns (from regression analysis), and in the covariance matrix. Ouch!
40. Limitations of the model:
- Symmetric – the response is the same whether a factor increases or decreases
- Linear – the response is the same for any size of move in a factor
- Stationary – we are assuming that the mean and volatility of every factor are stationary
41. Adjustments for non-stationarity (WARNING: MARKETING PLUG: all currently used in the Northfield risk model family):
- Exponential weighting
- Conditional variances
- Parkinson range measure
- Cross-sectional dispersion-inferred time-series variance adjustments
- Implied volatility
- Serial-correlation adjustments in short-horizon models
42. Is a single correlation adequate? Recent evidence suggests that correlations change over time, particularly during market turmoil. If true:
- This impacts our ability to diversify just when we need it most
- Our risk analysis may over-estimate the benefits of diversification when reporting our risk
- Optimized portfolios may be exposed to higher risk than we thought
One simple correction would be to have two sets of risk model factors, variances, and correlations:
- Estimate “normal” and “turbulent” volatilities and correlations for the factors
- Scale exposures to account for the probability that each regime will be visible within the risk forecast horizon
43. Implementation: use the Viterbi algorithm (Viterbi 1967) to detect states, and the Jennrich test (Jennrich 1970) to decide whether the correlation differences between states are significant – see the sketch below.
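A sketch under stated assumptions, not the exact estimation pipeline of the slides: state detection via a two-state Gaussian HMM (the hmmlearn package, assumed available, runs Viterbi decoding in .predict()), plus the Jennrich (1970) chi-squared statistic as usually stated.

```python
import numpy as np
from scipy.stats import chi2
from hmmlearn.hmm import GaussianHMM   # assumed installed

def detect_states(returns, n_states=2, seed=0):
    """Fit a Gaussian HMM to (T, k) factor returns; .predict() returns the
    most likely state sequence, i.e. the Viterbi path."""
    model = GaussianHMM(n_components=n_states, covariance_type="full",
                        random_state=seed)
    model.fit(returns)
    return model.predict(returns)

def jennrich_test(R1, R2, n1, n2):
    """Jennrich (1970) chi-squared test for equality of two p x p correlation
    matrices estimated from n1 and n2 observations; df = p(p-1)/2."""
    p = R1.shape[0]
    Rbar = (n1 * R1 + n2 * R2) / (n1 + n2)
    Rbar_inv = np.linalg.inv(Rbar)
    c = n1 * n2 / (n1 + n2)
    Z = np.sqrt(c) * Rbar_inv @ (R1 - R2)
    S = np.eye(p) + Rbar * Rbar_inv       # S_ij = delta_ij + rbar_ij * rbar^ij
    dZ = np.diag(Z)
    stat = 0.5 * np.trace(Z @ Z) - dZ @ np.linalg.solve(S, dZ)
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)        # statistic and p-value
```

A small p-value from jennrich_test says the two regimes' correlation matrices genuinely differ, which is the evidence needed before bothering with a two-regime model.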
44. Implementation 2: take a factor model.
- First pass: “clone” the factor set and assume all factors apply in both regimes
- Estimate factor variances and correlations separately for the two regimes
- Weight the security exposures to the factors based on the probability of each regime, e.g. Regime 1 90%, Regime 2 10%: if the single-correlation model gives security X a Market exposure of 1.25, then in the two-regime model its exposure to the “Market Normal” factor is 0.90 × 1.25, and its exposure to the “Market Turbulent” factor is 0.10 × 1.25
- Set correlations between the two regimes to zero across models (i.e. Correlation(“Market Normal”, “Market Turbulent”) = 0)
- Run risk analysis / optimization as usual (a sketch follows)
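A minimal sketch of the exposure-splitting arithmetic described above; the function name and the regime variances are our own illustrative assumptions, and specific (idiosyncratic) risk is omitted.

```python
import numpy as np

def two_regime_factor_cov(E, F_normal, F_turb, p_normal):
    """Build the 'cloned factor set' model: exposures are split by regime
    probability, and cross-regime factor correlations are set to zero.
    E: (n_assets, k) single-model exposures; F_normal/F_turb: (k, k) factor
    covariance per regime; p_normal: probability of the normal regime.
    Returns the common-factor covariance of the assets."""
    p_turb = 1.0 - p_normal
    B = np.hstack([p_normal * E, p_turb * E])   # (n, 2k) regime-scaled exposures
    k = F_normal.shape[0]
    F = np.zeros((2 * k, 2 * k))                # block-diagonal: zero correlation
    F[:k, :k] = F_normal                        # across regimes
    F[k:, k:] = F_turb
    return B @ F @ B.T

# Example mirroring the slide: market beta 1.25 split 90/10 across regimes.
E = np.array([[1.25]])
F_normal = np.array([[0.04]])   # hypothetical "normal" market variance
F_turb = np.array([[0.25]])     # hypothetical "turbulent" market variance
print(two_regime_factor_cov(E, F_normal, F_turb, p_normal=0.90))
```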
45. Limitations:
- Still linear – no accommodation of asymmetric correlation (i.e. UP ≠ DOWN)
- The same set of factors is used in both regimes; there is an opportunity to estimate completely different betas for each regime, but with small-sample problems?
- Pessimistic? If the probability assigned to the turbulent regime is high, risk estimates and optimal portfolios will be very conservative – perhaps too conservative?
46. Conclusions:
- Correlation lies at the heart of our favorite tools for data analysis (and risk model estimation); a clear understanding of its behavior is a requirement for good analysis
- Recent developments aid the incorporation of time-varying correlation and/or non-stationary processes
- Recent research strongly indicates that correlation is regime-dependent
- We can incorporate multiple regimes into standard risk models simply, but the linearity/symmetry assumptions remain
- Overall conclusion: read more papers?
47. References
Ang, A. and Bekaert, G. (1999) ‘International Asset Allocation with time-varying Correlations’, working paper, Graduate School of Business, Stanford University and NBER.
Banerjee, A., Merugu, S., Dhillon, I.S. and Ghosh, J. (2005) ‘Clustering with Bregman divergences’, Journal of Machine Learning Research, Vol. 6, pp. 1705–1749. http://jmlr.csail.mit.edu/papers/v6/banerjee05b.html
Bertero, E. and Mayer, C. (1989) ‘Structure and Performance: Global Interdependence of Stock Markets around the Crash of October 1987’, London, Centre for Economic Policy Research.
Chesnay, F. and Jondeau, E. (2001) ‘Does Correlation between Stock Returns really increase during turbulent Periods?’, Economic Notes by Banca Monte dei Paschi di Siena SpA, Vol. 30, No. 1, pp. 53–80.
Clayton, J. and MacKinnon, G. (2001) ‘The Time-Varying Nature of the Link Between REIT, Real Estate and Financial Asset Returns’, Journal of Real Estate Portfolio Management, January–March issue.
Erb, C.B., Harvey, C.R. and Viskanta, T.E. (1994) ‘Forecasting international Equity Correlations’, Financial Analysts Journal, pp. 32–45.
Jakulin, A. and Bratko, I. (2003a) ‘Analyzing Attribute Dependencies’, in N. Lavrač, D. Gamberger, L. Todorovski and H. Blockeel (eds), Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, Cavtat–Dubrovnik, Croatia, pp. 229–240.
Jennrich, R. (1970) ‘An Asymptotic χ2 Test for the Equality of Two Correlation Matrices’, Journal of the American Statistical Association, Vol. 65, No. 330.
48. References (continued)
Kalaba, R. and Tesfatsion, L. (1989) ‘Time-varying linear regression via flexible least squares’, Computers and Mathematics with Applications, Vol. 17, pp. 1215–1245.
Kaplanis, E. (1988) ‘Stability and Forecasting of the Comovement Measures of International Stock Market Returns’, Journal of International Money and Finance, Vol. 7, pp. 63–75.
Lee, S.B. and Kim, K.J. (1993) ‘Does the October 1987 Crash strengthen the co-Movements among national Stock Markets?’, Review of Financial Economics, Vol. 3, No. 1, pp. 89–102.
Longin, F. and Solnik, B. (1995) ‘Is the Correlation in International Equity Returns constant: 1960–1990?’, Journal of International Money and Finance, Vol. 14, No. 1, pp. 3–26.
Longin, F. and Solnik, B. (2001) ‘Extreme Correlation of International Equity Markets’, The Journal of Finance, Vol. 56, No. 2.
Mahalanobis, P.C. (1936) ‘On the generalised distance in statistics’, Proceedings of the National Institute of Sciences of India, Vol. 2, No. 1, pp. 49–55. http://ir.isical.ac.in/dspace/handle/1/1268
Nemenman, I. (2004) ‘Information theory, multivariate dependence, and genetic network inference’.
49. References (continued)
Osborne, J.W. (2003) ‘Effect sizes and the disattenuation of correlation and regression coefficients: lessons from educational psychology’, Practical Assessment, Research & Evaluation, Vol. 8, No. 11.
Qian, E. and Hua, R. (2004) ‘Active Risk and the Information Ratio’, Journal of Investment Management, Third Quarter.
Ramchand, L. and Susmel, R. (1998) ‘Volatility and Cross Correlation across major Stock Markets’, Journal of Empirical Finance, Vol. 5, No. 4, pp. 397–416.
Ratner, M. (1992) ‘Portfolio Diversification and the inter-temporal Stability of International Indices’, Global Finance Journal, Vol. 3, pp. 67–78.
Sharpe, W.F. (1998) ‘Morningstar’s Risk-adjusted Ratings’, Financial Analysts Journal, July/August, pp. 21–33.
Sheedy, E. (1997) ‘Is Correlation constant after all? (A Study of multivariate Risk Estimation for International Equities)’, working paper.
Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy, Freeman, San Francisco.
Tang, G. ‘Intertemporal Stability in International Stock Market Relationships: A Revisit’, The Quarterly Review of Economics and Finance, Vol. 35 (Special), pp. 579–593.
Viterbi, A. (1967) ‘Error Bounds for convolutional Codes and an asymptotically Optimum Decoding Algorithm’, IEEE Transactions on Information Theory, Vol. 13, No. 2, pp. 260–269.
Watanabe, S. (1960) ‘Information theoretical analysis of multivariate correlation’, IBM Journal of Research and Development, Vol. 4, pp. 66–82.
Yule, G.U. (1926) ‘Why do we sometimes get nonsense-correlations between time series?’, Journal of the Royal Statistical Society, Vol. 89, pp. 1–69.