An introductory overview of Correspondence Analysis.

### Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

### Usage Rights

© All Rights Reserved

**Gaetan Lion** (quantitative research at a large financial services company): Those are tough questions, Buta. I must admit I have not used Correspondence Analysis (CA) since I made this presentation, so I am not an expert in this discipline. I don't know whether you can use CA scores directly in a logistic regression. What you can do, however, is take the underlying categorical variable that feeds into your CA and include that categorical variable among the independent variables in your logistic regression.

On your second question about multicollinearity: I am unclear how CA would run into multicollinearity issues, since it typically deals with the 'correspondence' between just two categorical variables.
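As a rough illustration of what CA actually computes, and of the kind of row scores one might then feed into a downstream regression, here is a minimal sketch using NumPy's SVD on a made-up contingency table. The table, its dimensions, and the variable names are all hypothetical; this is the standard SVD-of-standardized-residuals recipe for CA, not anything taken from the slides.

```python
import numpy as np

# Hypothetical 3x3 contingency table of counts (e.g., age group x brand).
N = np.array([[20, 10, 5],
              [15, 25, 10],
              [5, 10, 30]], dtype=float)

P = N / N.sum()                  # correspondence matrix (relative frequencies)
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

# Standardized residuals: S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# CA is the SVD of S; squared singular values are the eigenvalues (inertias).
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal row coordinates: F = D_r^{-1/2} U diag(sv)
F = (U / np.sqrt(r)[:, None]) * sv

print("row coordinates on dim 1:", F[:, 0])
print("eigenvalues (inertias):", sv**2)
```

The first column of `F` gives each row category a numeric score on the first CA dimension; those scores are the kind of quantity one could, in principle, carry into a logistic regression as a numeric covariate, though whether that is appropriate depends on the application.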

**butasbutauskas**: Hello, thanks for the presentation, it's very useful.

I have a question concerning correspondence analysis and logistic regression.

If the task is to reduce categorical variable data, I have to choose correspondence analysis. However, can I then use the results (scores) from the correspondence analysis in a logistic regression? And can correspondence analysis cope with the multicollinearity problem?

**Gaetan Lion** (quantitative research at a large financial services company): PCA is not easy. You probably have to consult several sources, and you may have to accept that unless you are a professional mathematician you will not understand the whole thing. I don't. But I understand it well enough to interpret its results when I restudy the material.

The basic essence of PCA, however, is not so complicated. PCA creates principal components that are combinations of your independent variables, and it does so in such a way that the principal components are perpendicular to each other on a scatter plot. Those principal components are therefore completely uncorrelated, which is how PCA eliminates multicollinearity between independent variables.

Say you attempt to model auto sales nationwide, and you have 15 independent variables of a macroeconomic, demographic, or auto-industry nature. Many of them are correlated, so a multivariate model would suffer from multicollinearity. PCA essentially reduces those 15 variables to three principal components that are orthogonal to each other (zero correlation).
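The orthogonality claim above is easy to check numerically. Here is a small sketch with simulated data standing in for the correlated predictors in the auto-sales example (the data and dimensions are made up): three correlated variables are built from one shared latent driver, PCA is run via an eigendecomposition of the covariance matrix, and the resulting component scores come out mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 3 correlated predictors from one latent driver plus noise,
# a stand-in for the 15 correlated variables in the auto-sales example.
latent = rng.normal(size=200)
X = np.column_stack([latent + 0.3 * rng.normal(size=200) for _ in range(3)])

# PCA via eigendecomposition of the covariance matrix of centered data.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # sort descending by variance
scores = Xc @ eigvecs[:, order]           # principal-component scores

# The correlation matrix of the scores is the identity (up to rounding):
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 6))
```

The off-diagonal entries of `corr` are zero to numerical precision, which is exactly the "perpendicular components, no multicollinearity" property described above.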

For the precise calculation of the coordinates from those independent variables, I refer you to this book:

http://www.amazon.com/Principal-Components-Analysis-Quantitative-Applications/dp/0803931042/ref=sr_1_3?ie=UTF8&s=books&qid=1271956552&sr=1-3

which I studied and reviewed at one point. The Wikipedia entry on PCA also seemed pretty detailed and informative.

**d p** (service): Yes, Guy, now you're exactly on point. I need to understand the coordinates derived using PCA. I know their applications in further analysis, but I fail to grasp the basics of how the coordinates are obtained. I have searched many pages on the internet in vain, and I have attended Strang's lessons on linear algebra, but he loses me towards the end. Kindly help me out by explaining how to get the coordinates using PCA.

**Gaetan Lion** (quantitative research at a large financial services company): Part of your question is simple and part is tricky. You can see on slide 9 that the eigenvalue for dimension F1 equals the sum of (row mass × coordinate²). So the first row, for the 16-24 year olds, equals 15.3% × 0.718² = 0.079. You do that for all the age groups, sum them up, and you get the 0.095 eigenvalue for dimension F1.

To fully answer your question, we would next need to explain how PCA calculates the coordinates of F1 (the first principal component). This is the challenging part. It involves rotating the axes so that the first principal component (F1) captures the largest possible share of the variance in the data, with each subsequent component capturing the largest share of the remaining variance. By doing so, PCA deals with and eliminates multicollinearity between independent variables.
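The slide-9 arithmetic and the eigenvalue identity behind it can both be checked in a few lines. The first line reproduces the 16-24 row contribution quoted above; the rest verifies the general identity (eigenvalue = sum of row mass × coordinate²) on a small hypothetical contingency table, using the standard SVD-based CA computation. Only the 15.3% and 0.718 figures come from the slides; the table below is made up for the check.

```python
import numpy as np

# Slide-9 check for one row: row mass 15.3%, F1 coordinate 0.718.
print(round(0.153 * 0.718**2, 3))  # 0.079, as stated

# The same identity verified on a hypothetical 3x2 contingency table:
# the F1 eigenvalue equals the sum over rows of mass * coordinate^2.
N = np.array([[12.0, 8.0], [6.0, 14.0], [10.0, 5.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
F1 = (U[:, 0] / np.sqrt(r)) * sv[0]             # principal row coordinates, dim 1
print(np.isclose((r * F1**2).sum(), sv[0]**2))  # True
```

The identity holds by construction: each coordinate is a scaled singular-vector entry, so the mass-weighted sum of squared coordinates collapses to the squared singular value, which is the eigenvalue.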

PCA is not something one (me, anyway) can clearly explain in a couple of paragraphs. For further understanding, I recommend you study materials on the subject at Wikipedia, Slideshare.net, Google Knol, and similar places. With much studying, grasping the basics of PCA is not that difficult. But given its counterintuitive nature (the principal components are often unexplainable combinations of the X variables), it is not often used outside fairly intensive quantitative circles. That is unfortunate, because PCA is the key engine behind a lot of good methods, including Correspondence Analysis, Factor Analysis, and Discriminant Analysis, and it is also valuable as a stand-alone method.

You may have heard of Michael Mann's hockey stick controversy (global warming). It was essentially a PCA application. Steve McIntyre, a mathematician, uncovered that Michael Mann had overweighted certain tree-ring data, used as a proxy for temperature changes, in generating principal components that in turn produced the hockey-stick uptick in temperature during the second half of the 20th century. When McIntyre corrected this overweighting, the long-term temperature trend reverted to random and the hockey-stick pattern disappeared. That's a dramatic example suggesting that understanding PCA is a critical part of modern critical thinking.