- 1. MRQAP tutorial Fariba Karimi Fariba.karimi@gesis.org 24.11.2015
- 2. Multiple Regression Quadratic Assignment Procedure
- 3. Why regression in network analysis? • Inferential statistics have proven to have very useful applications to social network analysis. At a most general level, the question of "inference" is: how much confidence can I have that the pattern I see in the data I've collected is actually typical of some larger population, or that the apparent pattern is not really just a random occurrence?
- 4. OLS (Ordinary Least Squares) Y = β0 + β1X1 + β2X2 + ... + ε, where Y is the dependent variable, β0, β1, β2, ... are the coefficients, X1, X2, ... are the explanatory (independent) variables, and ε is the residual.
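
To make the OLS equation concrete, here is a minimal pure-Python sketch that estimates β0, β1, β2 by solving the normal equations. The function name `ols` and the toy data are assumptions made for illustration; they do not come from the slides.

```python
def ols(X, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical toy data generated from Y = 1 + 2*X1 + 3*X2 with no noise,
# so the fit should recover those coefficients up to rounding error.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
beta = ols(X, y)
```

With noisy data the residual ε is simply `y[i]` minus the fitted value for each observation.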
- 5. OLS (Ordinary Least Squares) - test: Y = β0 + β1X1 + β2X2 + ... + ε • Null hypothesis: β = 0. → A small p-value suggests that the coefficient is significantly different from zero. E.g. a p-value of 0.01 means that the coefficient is significant at the 1% level (99% confidence level).
- 6. OLS (Ordinary Least Squares) - test: Y = β0 + β1X1 + β2X2 + ... + ε • P-value: null hypothesis β = 0. → A small p-value suggests that the coefficient is significantly different from zero. E.g. a p-value of 0.01 means that the coefficient is significant at the 1% level (99% confidence level). • R-squared: quantifies model performance. E.g. R-squared = 0.4 means that the model explains 40% of the variance in the dependent variable.
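
As a small illustration of the R-squared statement, the sketch below computes R-squared as one minus the ratio of the residual to the total sum of squares. The function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the variance in y that the fitted values y_hat account for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained part
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and model predictions.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.2, 1.9, 3.2, 3.9, 4.8]
r2 = r_squared(y_obs, y_fit)
```

A model that always predicts the mean of Y gets R-squared = 0, and a perfect fit gets 1.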
- 7. Problem • Observations are not independent of each other. If A is connected to B and B is connected to C, it may be likely that A is connected to C. • Repeated observations → errors correlated with each other. Observations in the same row or column tend to be highly correlated, which distorts the standard errors.
- 8. Problem • Repeated observations → errors correlated with each other. Observations in the same row or column tend to be highly correlated, which distorts the standard errors.
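
One way to see where the repeated observations come from: when a network matrix is flattened into one regression observation per dyad, every actor appears in many observations. The sketch below assumes a small hypothetical 4-node matrix; `dyads` is an illustrative helper, not a library function.

```python
def dyads(matrix):
    """Flatten a square matrix into (sender, receiver, value) observations,
    skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [(i, j, matrix[i][j]) for i in range(n) for j in range(n) if i != j]

# Hypothetical tie strengths among 4 actors.
ties = [
    [0, 1, 0, 2],
    [1, 0, 3, 5],
    [0, 3, 0, 4],
    [2, 5, 4, 0],
]
observations = dyads(ties)
# Observations that involve actor 0 either as sender or as receiver.
with_actor_0 = [d for d in observations if 0 in (d[0], d[1])]
```

With n actors there are n(n-1) directed dyads and each actor takes part in 2(n-1) of them, so the observations share actors and cannot be treated as independent draws.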
- 9. What does QAP do? • Essentially, QAP “scrambles” the dependent variable data through many permutations. Scrambling the data repeatedly yields multiple random datasets of the dependent variable, and the analysis is then run on each of them. • Those datasets and analyses form an empirical sampling distribution, and we can compare our coefficient with this sampling distribution of coefficients from all the permuted datasets.
- 10. In other words … • We are preserving the dependence within rows / columns (rows and columns are permuted together), but removing the relationship between the dependent and independent variables.
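
The scrambling step can be sketched as a relabelling of the nodes: the same reordering is applied to rows and columns, so each row keeps its set of values while its alignment with the other matrices is broken. The helper name `permute_matrix` and the 3-node matrix are assumptions made for illustration.

```python
import random

def permute_matrix(matrix, order):
    """Entry (i, j) of the result is matrix[order[i]][order[j]]."""
    return [[matrix[a][b] for b in order] for a in order]

# Hypothetical 3-node tie matrix.
friendship = [
    [0, 1, 0],
    [1, 0, 3],
    [0, 3, 0],
]

# A fixed relabelling, to make the effect easy to trace by hand.
fixed = permute_matrix(friendship, [2, 0, 1])

# A random relabelling, as used in each QAP replication.
rng = random.Random(0)
order = list(range(len(friendship)))
rng.shuffle(order)
scrambled = permute_matrix(friendship, order)
```

Because rows and columns move together, the diagonal stays on the diagonal and the network structure itself is unchanged; only the node labels differ.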
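- 11. Friendship, age, class: Friendship tie ≈ Age difference + Education

  Friendship tie

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |

  Age difference

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 2 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |

  Education

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 2 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |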
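- 13. QAP procedure: Friendship tie ≈ Age difference + Education

  Friendship tie

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |

  Age difference

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 2 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |

  Education

  |   | A | B | C | D | E | F | G |
  |---|---|---|---|---|---|---|---|
  | A | 0 | 1 | 0 | 2 | 1 | 0 | 0 |
  | B | 1 | 0 | 3 | 5 | 1 | 4 | 2 |
  | C | 0 | 3 | 0 | 4 | 5 | 8 | 10 |
  | D | 2 | 5 | 4 | 0 | 0 | 3 | 2 |
  | E | 1 | 1 | 3 | 0 | 0 | 2 | 2 |
  | F | 0 | 4 | 2 | 3 | 3 | 0 | 1 |
  | G | 0 | 2 | 1 | 2 | 2 | 1 | 0 |

  • Permute the dependent variable many times and re-estimate the regression each time. This gives the sampling distribution of the coefficients.
  • The p-value is the proportion of permutations in which the permuted coefficient is at least as extreme as the observed one.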
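
The procedure can be sketched end to end in pure Python. This is a plain Y-permutation variant under stated assumptions: the names `ols`, `off_diagonal` and `mrqap`, the two-sided comparison, the number of permutations, and the node attributes are all hypothetical choices for illustration, and the sketch is not meant to reproduce what UCINET or `netlm` compute internally.

```python
import random

def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def off_diagonal(m):
    """Flatten a square matrix into its dyad values, skipping self-ties."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def mrqap(y, xs, n_perm=500, seed=0):
    """Y-permutation QAP: returns (coefficients, p-values for the slopes)."""
    rng = random.Random(seed)
    rows = list(zip(*(off_diagonal(x) for x in xs)))  # one row per dyad
    observed = ols(rows, off_diagonal(y))
    hits = [0] * len(xs)
    order = list(range(len(y)))
    for _ in range(n_perm):
        rng.shuffle(order)
        # Relabel the nodes of Y: rows and columns move together.
        y_perm = [[y[a][b] for b in order] for a in order]
        permuted = ols(rows, off_diagonal(y_perm))
        for s in range(len(xs)):
            # Two-sided comparison (an assumption of this sketch).
            if abs(permuted[s + 1]) >= abs(observed[s + 1]):
                hits[s] += 1
    return observed, [h / n_perm for h in hits]

# Hypothetical node attributes for 7 actors.
ages = [20, 25, 31, 38, 44, 52, 60]
edu = [0, 1, 0, 1, 1, 0, 0]
n = len(ages)
age_diff = [[abs(ages[i] - ages[j]) for j in range(n)] for i in range(n)]
same_edu = [[int(edu[i] == edu[j]) for j in range(n)] for i in range(n)]
# Synthetic dependent matrix driven only by age difference.
friendship = [[3 + 2 * age_diff[i][j] if i != j else 0 for j in range(n)]
              for i in range(n)]
coefs, p_values = mrqap(friendship, [age_diff, same_edu])
```

Only the dependent matrix is permuted, so the relationships among the explanatory matrices stay intact in every replication.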
- 14. QAP process – graph representation (figure: the network before and after reshuffling)
- 15. Available functions • UCINET: tools -> testing hypothesis -> dyadic -> regression (QAP) • R: library(statnet) -> netlm • C/Python?
- 16. Example 1 – there is no correlation
- 17. Example 1 – there is no correlation
- 18. Example 2 – there is a correlation
- 19. Example 2 – there is a correlation
- 20. Recap • QAP is useful when we have dyadic relationships in the data. • Use the netlm function in R for the regression analysis. • Disadvantage: it is slow for large networks.
- 21. References • Predicting with networks: nonparametric multiple regression analysis of dyadic data, D. Krackhardt (1988) • The SNA package, C. T. Butts (2014) • http://svitsrv25.epfl.ch/R-doc/library/sna/html/qaptest.html • http://www.stata.com/meeting/1nasug/simpson.pdf • http://www.erikgjesfjeld.net/uploads/3/7/6/8/37685481/sna_code_(gjesfjeld_and_phillips_2013).pdf