# Statistical Analysis

Statistical Analysis (5/6). Part of a series of six presentations introducing scientific research in cross-cultural studies using a quantitative approach.

Published in: Education, Technology

### Statistical Analysis

1. Quantitative Research Methodologies (5/6): Statistical Analysis. Prof. Dr. Hora Tjitra & Dr. He Quan, www.SinauOnline.com
2. Statistical Analysis Process
    - Data Preparation: cleaning and organizing the data for analysis. This involves checking or logging the data in, checking the data for accuracy, entering the data into the computer, etc.
    - Descriptive Statistics: describing the data. These statistics describe the basic features of the data in a study and provide simple summaries about the sample and the measures.
    - Inferential Statistics: testing hypotheses and models. These statistics investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone.
3. Data Preparation
    - Logging the Data. Data may come from several sources:
        - mail survey returns
        - coded interview data
        - pretest or posttest data
        - observational data
    - Checking the Data for Accuracy
        - Are the responses legible/readable?
        - Are all important questions answered?
        - Are the responses complete?
        - Is all relevant contextual information included (e.g., date, time, place)?
    - Developing a Database Structure. For each variable, document:
        - variable name
        - variable description
        - variable format (number, date, text)
        - instrument/method of collection
        - date collected
        - respondent or group
        - variable location (in database)
        - notes
    - Entering the Data into the Computer
        - Double Entry: in this procedure you enter the data once. Then you use a special program that allows you to enter the data a second time and checks each second entry against the first.
    - Data Transformation
        - missing values
        - item reversals
        - scale totals
        - categories
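Two of the preparation steps above, item reversal and double entry, can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 response scale, the function names and the sample entries are assumptions, not part of the slides.

```python
def reverse_item(score, low=1, high=5):
    """Reverse-score an item on an assumed low..high response scale."""
    return low + high - score


def double_entry_mismatches(first_entry, second_entry):
    """Return the positions where the second data entry disagrees with the first."""
    return [
        index
        for index, (a, b) in enumerate(zip(first_entry, second_entry))
        if a != b
    ]


# Hypothetical responses, keyed in twice by the data-entry clerk.
first_pass = [3, 4, 5, 1]
second_pass = [3, 2, 5, 1]

print(reverse_item(2))                                   # a 2 becomes a 4
print(double_entry_mismatches(first_pass, second_pass))  # positions to re-check
```

Any position reported by `double_entry_mismatches` would be checked against the original paper record before analysis.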
4. Descriptive Statistics (Univariate Analysis)
    - The Distribution: a summary of the frequency of individual values or ranges of values for a variable.
    - Central Tendency: an estimate of the "center" of a distribution of values.
    - Dispersion: the spread of the values around the central tendency.
5. The Distribution. Frequency distributions can be depicted in two ways:
    - A table, for example an age frequency distribution with five categories of age ranges defined.
    - A graph of the frequency distribution, often referred to as a histogram or bar chart.
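A frequency table of this kind can be built with a short Python sketch. The ages and the five age ranges below are hypothetical, chosen only to illustrate the idea; the slide's own table is not reproduced in the transcript.

```python
from collections import Counter

# Hypothetical ages and five hypothetical age categories (inclusive bounds).
ages = [19, 23, 31, 35, 38, 42, 47, 51, 58, 64]
age_ranges = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69)]


def frequency_table(values, ranges):
    """Count how many values fall into each (low, high) range."""
    counts = Counter()
    for value in values:
        for low, high in ranges:
            if low <= value <= high:
                counts[(low, high)] += 1
                break
    return [(low, high, counts[(low, high)]) for low, high in ranges]


table = frequency_table(ages, age_ranges)
for low, high, count in table:
    # The row of '#' marks is a crude text histogram of the same counts.
    print(f"{low}-{high}: {count} {'#' * count}")
```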
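6. Central Tendency
    - The Mean, or average, is probably the most commonly used method of describing central tendency.
    - The Median is the score found at the exact middle of the set of values.
    - The Mode is the most frequently occurring value in the set of scores.

    Example, for the scores 2, 3, 5, 3, 4, 3, 6:
    - M = (2+3+5+3+4+3+6)/7 = 26/7 = 3.71
    - Md: sorted, the scores are 2, 3, 3, 3, 4, 5, 6, so the middle score is 3. Interpolating within the tied scores, with the three 3s spread evenly over the interval 2.5 to 3.5 (boundaries 2.5, 2.83, 3.17, 3.5), gives Md = 3.33.
    - Mo = 3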
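The example can be checked with Python's standard `statistics` module. This is a minimal sketch; it assumes that the slide's Md = 3.33 is the interpolated (grouped) median with an interval width of 1, which is what `median_grouped` computes.

```python
from statistics import mean, median, median_grouped, mode

scores = [2, 3, 5, 3, 4, 3, 6]

m = mean(scores)                      # arithmetic mean, 26 / 7
md = median(scores)                   # middle score of the sorted values
md_interpolated = median_grouped(scores, interval=1)  # interpolates within the tied 3s
mo = mode(scores)                     # most frequent score

print(f"M = {m:.2f}, Md = {md}, interpolated Md = {md_interpolated:.2f}, Mo = {mo}")
```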
7. Dispersion
    - The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.
    - The Standard Deviation shows the relation that a set of scores has to the mean of the sample.

    Example, for the scores 15, 20, 21, 20, 36, 15, 25, 15 (M = 20.875):

    | Score | Deviation from M | Squared deviation |
    |---|---|---|
    | 15 | -5.875 | 34.515625 |
    | 20 | -0.875 | 0.765625 |
    | 21 | +0.125 | 0.015625 |
    | 20 | -0.875 | 0.765625 |
    | 36 | +15.125 | 228.765625 |
    | 15 | -5.875 | 34.515625 |
    | 25 | +4.125 | 17.015625 |
    | 15 | -5.875 | 34.515625 |

    Sum of squares = 350.875. Dividing by n - 1 = 7 gives 350.875 / 7 = 50.125, and SQRT(50.125) = 7.0799.
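The same steps can be followed in Python. The sketch below mirrors the hand calculation (deviations, squared deviations, division by n - 1) and compares the result with `statistics.stdev`, which also uses the n - 1 denominator.

```python
from math import sqrt
from statistics import mean, stdev

scores = [15, 20, 21, 20, 36, 15, 25, 15]

data_range = max(scores) - min(scores)           # highest minus lowest value
m = mean(scores)                                 # 20.875
deviations = [x - m for x in scores]             # each score minus the mean
sum_of_squares = sum(d * d for d in deviations)  # 350.875
variance = sum_of_squares / (len(scores) - 1)    # sample variance, n - 1 = 7
sd = sqrt(variance)

print(data_range, m, sum_of_squares, variance, round(sd, 4))
print(round(stdev(scores), 4))  # library result for comparison
```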
8. Normal Distribution
    - Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails, and are defined by two parameters: the mean (m) and the standard deviation (s).
    - The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. The share of values falling within each interval is:
        - (-1, 1): 68.26%
        - (-1.96, 1.96): 95%
        - (-2.58, 2.58): 99%
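These percentages follow from the standard normal distribution and can be reproduced with the error function in Python's `math` module. The helper name `central_area` is an assumption made for this sketch.

```python
from math import erf, sqrt


def central_area(z):
    """Probability that a standard normal value lies between -z and +z."""
    return erf(z / sqrt(2))


areas = {z: central_area(z) for z in (1.0, 1.96, 2.58)}
for z, area in areas.items():
    print(f"(-{z}, {z}): {area * 100:.2f}%")
```

The first interval comes out at about 68.27%; the 68.26% quoted above is the same value cut off rather than rounded.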
9. Hypothesis Testing / significance testing: a statistical procedure for discriminating between two statistical hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha, often denoted as H1).
    - The philosophical basis for hypothesis testing lies in the fact that random variation pervades all aspects of life, and in the desire to avoid being fooled by what might be chance variation.
    - The alternative hypothesis typically describes some change or effect that you expect or hope to see confirmed by data. For example, new drug A works better than standard drug B, or the accuracy of a new weapon targeting system is better than historical standards.
    - The null hypothesis embodies the presumption that nothing has changed, or that there is no difference.
10. The Statistical Inference Decision Matrix

    | What we conclude | In reality, H0 is true and H1 is false: there is no relationship; there is no difference, no gain; our theory is wrong | In reality, H0 is false and H1 is true: there is a relationship; there is a difference or gain; our theory is correct |
    |---|---|---|
    | We accept H0, reject H1. We say "There is no relationship"; "There is no difference, no gain"; "Our theory is wrong" | 1-α (e.g., .95), THE CONFIDENCE LEVEL. The probability of saying there is no relationship, difference or gain when in fact there is none; the probability of correctly not confirming our theory. 95 times out of 100, when there is no effect, we'll say there is none. | β (e.g., .20), TYPE II ERROR. The probability of saying there is no relationship, difference or gain when in fact there is one; the probability of not confirming our theory when it's true. 20 times out of 100, when there is an effect, we'll say there isn't. |
    | We reject H0, accept H1. We say "There is a relationship"; "There is a difference or gain"; "Our theory is correct" | α (e.g., .05), TYPE I ERROR (SIGNIFICANCE LEVEL). The probability of saying there is a relationship, difference or gain when in fact there is not; the probability of confirming our theory incorrectly. 5 times out of 100, when there is no effect, we'll say there is one. We should keep this small when we can't afford to risk wrongly concluding that our program works. | 1-β (e.g., .80), POWER. The probability of saying that there is a relationship, difference or gain when in fact there is one; the probability of confirming our theory correctly. 80 times out of 100, when there is an effect, we'll say there is. We generally want this to be as large as possible. |
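How α and power relate can be illustrated with a small calculation. The sketch below is not from the slides: it assumes a two-sided one-sample z-test with a known standard deviation, and the effect size, sigma and sample size are hypothetical values picked so that power lands near the .80 used in the matrix.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability of rejecting H0 in a two-sided z-test when the true mean shift is `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold, about 1.96
    shift = effect * n ** 0.5 / sigma           # true effect in standard-error units
    lower_tail = std_normal.cdf(-z_crit - shift)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    return lower_tail + upper_tail


type_i_rate = z_test_power(effect=0.0, sigma=1.0, n=32)  # H0 true: rejection rate equals alpha
power = z_test_power(effect=0.5, sigma=1.0, n=32)        # H0 false: this is 1 - beta
type_ii_rate = 1 - power                                 # beta

print(f"alpha = {type_i_rate:.3f}, power = {power:.3f}, beta = {type_ii_rate:.3f}")
```

With no true effect the rejection rate is simply α; with a true effect, raising the sample size or the effect size raises power and lowers β.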
11. Examples
12. Selecting the Appropriate Statistical Test

    The choice depends on the number of independent variables, the type of design (between-subject or within-subject), and the number of groups or levels of the independent variable:
    - One independent variable
        - Between-subject design
            - Two groups: independent-samples t-test
            - More than two groups: one-way analysis of variance
        - Within-subject design
            - Two groups or two levels of the independent variable: correlated t-test
            - More than two groups or more than two levels of the independent variable: repeated measures analysis of variance
    - Two independent variables
        - Between-subject design: two-way analysis of variance
13. Correlation
    - Correlation is a measure of the relation between two or more variables.
    - The measurement scales used should be at least interval scales, but other correlation coefficients are available to handle other types of data.
    - Correlation coefficients can range from -1.00 to +1.00.
14. The types of correlation

    Pearson r: used when data represent either interval or ratio scales; assumes a linear relationship between the variables.

    $$r = \frac{\sum Z_x Z_y}{N}$$

    | Subject | X | Y | Zx | Zy | ZxZy |
    |---|---|---|---|---|---|
    | A | 1 | 4 | -1.5 | -1.5 | 2.25 |
    | B | 3 | 7 | -1.0 | -1.0 | 1.00 |
    | C | 5 | 10 | -0.5 | -0.5 | 0.25 |
    | D | 7 | 13 | 0 | 0 | 0.00 |
    | E | 9 | 16 | 0.5 | 0.5 | 0.25 |
    | F | 11 | 19 | 1.0 | 1.0 | 1.00 |
    | G | 13 | 22 | 1.5 | 1.5 | 2.25 |

    N = 7, mean of X = 7.0, mean of Y = 13.0, Sx = 4.0, Sy = 6.0, ∑X = 49, ∑Y = 91, ∑Zx = ∑Zy = 0.0, ∑ZxZy = 7.00, ∑X² = 455, ∑Y² = 1435, so r = (∑ZxZy)/N = 7.00/7 = 1.00.

    Spearman rs, the rank correlation coefficient: used with ordered or ranked data.

    $$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

    | IQ rank | Leadership rank | D | D² |
    |---|---|---|---|
    | 1 | 4 | -3 | 9 |
    | 2 | 2 | 0 | 0 |
    | 3 | 1 | 2 | 4 |
    | 4 | 6 | -2 | 4 |
    | 5 | 3 | 2 | 4 |
    | 6 | 5 | 1 | 1 |

    ∑D = 0, ∑D² = 22, N = 6, so rs = 1 - [6 × 22 / (6 × 35)] = 1 - .63 = .37.

    How to judge the significance of r?
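Both coefficients can be recomputed from the tables above with a short Python sketch. The function names are assumptions. As on the slide, the z-scores use the population standard deviation (dividing by N), and the rank formula is the shortcut that applies when there are no tied ranks.

```python
from statistics import mean, pstdev

x = [1, 3, 5, 7, 9, 11, 13]
y = [4, 7, 10, 13, 16, 19, 22]
iq_rank = [1, 2, 3, 4, 5, 6]
leadership_rank = [4, 2, 1, 6, 3, 5]


def pearson_r(xs, ys):
    """Pearson r as the mean product of z-scores (population SD, divisor N)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys)]
    return sum(products) / len(xs)


def spearman_rs(ranks_a, ranks_b):
    """Spearman rank correlation from rank differences (no tied ranks)."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


r = pearson_r(x, y)
rs = spearman_rs(iq_rank, leadership_rank)
print(f"r = {r:.2f}, rs = {rs:.2f}")
```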
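15. T-Test: Testing for Differences Between Two Groups

    - Group 1 scores: 3, 2, 2, 1, 2, 2, 3, 2, 2, 1
    - Group 2 scores: 2, 2, 3, 3, 3, 3, 4, 3, 4

    | Statistic | Group 1 | Group 2 |
    |---|---|---|
    | N | 10 | 9 |
    | ∑X | 20 | 27 |
    | ∑X² | 44 | 85 |
    | Mean | 2.0 | 3.0 |
    | SS | 4.0 | 4.0 |
    | s | 0.6667 | 0.7071 |

    t = -3.17, df = (10-1)+(9-1) = 17, p = .004

    Report: t(17) = -3.17, p < .05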
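The t statistic can be recomputed with a pooled-variance, independent-samples t-test. This is a sketch under one assumption: the transcript lists the 19 scores in a single run, and the split used here (first ten scores to Group 1, last nine to Group 2) is the one that reproduces the slide's N, ∑X and ∑X² for each group. The p-value is not computed because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean

group1 = [3, 2, 2, 1, 2, 2, 3, 2, 2, 1]   # assumed split, see note above
group2 = [2, 2, 3, 3, 3, 3, 4, 3, 4]


def sum_of_squares(values):
    """Sum of squared deviations from the group mean (SS)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    df = (len(a) - 1) + (len(b) - 1)
    pooled_variance = (sum_of_squares(a) + sum_of_squares(b)) / df
    standard_error = sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error, df


t, df = independent_t(group1, group2)
print(f"t({df}) = {t:.2f}")
```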
16. ANOVA: a statistical procedure developed by R. A. Fisher that allows one to compare simultaneously the differences between two or more means.
    - One-way ANOVA: comparing the effects of different levels of a single independent variable.
    - Two-way ANOVA: comparing simultaneously the effects of two independent variables.

    Basic concepts:
    - Between-Groups Variance: estimate of the variance between group means.
    - Within-Groups Variance: estimate of the average variance within each group.
    - Homogeneity of variance: the variances of the groups are equivalent to each other.
17. Worked Example

    | Observation | Group 1 | Group 2 | Group 3 | Group 4 |
    |---|---|---|---|---|
    | 1 | 4 | 6 | 4 | 5 |
    | 2 | 2 | 3 | 5 | 8 |
    | 3 | 1 | 5 | 7 | 6 |
    | 4 | 3 | 4 | 6 | 5 |
    | Sum | 10 | 18 | 22 | 24 |
    | n | 4 | 4 | 4 | 4 |
    | Mean | 2.5 | 4.5 | 5.5 | 6.0 |
    | ∑X² | 30 | 86 | 126 | 150 |
    | s² | 1.67 | 1.67 | 1.67 | 2.00 |

    Overall: ∑X = 74, N = 16, grand mean = 4.625, ∑X² = 392. F max = 2.00/1.67 = 1.198.

    Degrees of freedom: df total = N-1 = 16-1 = 15; df between = k-1 = 4-1 = 3; df within = ∑(nj-1) = 12.

    F = MS between / MS within = 5.48

    Result table:

    | Source | SS | df | MS | F |
    |---|---|---|---|---|
    | Between-groups | 28.75 | 3 | 9.583 | 5.48 |
    | Within-groups | 21.00 | 12 | 1.750 | |
    | Total | 49.75 | 15 | | |

    SS total = SS between + SS within.
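The result table can be reproduced with a minimal one-way ANOVA in Python. The function name and the dictionary keys are assumptions made for this sketch; the data are the four groups from the example above.

```python
from statistics import mean

groups = [
    [4, 2, 1, 3],   # Group 1
    [6, 3, 5, 4],   # Group 2
    [4, 5, 7, 6],   # Group 3
    [5, 8, 6, 5],   # Group 4
]


def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way between-groups ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-groups: squared distance of each group mean from the grand mean, weighted by n.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: squared distance of each score from its own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```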
18. MANOVA: a technique which determines the effects of independent categorical variables on multiple continuous dependent variables. It is usually used to compare several groups with respect to multiple continuous variables. The main distinction between MANOVA and ANOVA is that several dependent variables are considered in MANOVA. While ANOVA tests for inter-group differences between the mean values of one dependent variable, MANOVA tests for differences between centroids, the vectors of the mean values of the dependent variables. One important index is the interaction.
19. Worked Example: the relationship between psychomotor test scores and the size of the target the subject aims at.

    | Source | SS | df | MS | F | Sig. of F |
    |---|---|---|---|---|---|
    | Target | 235.20 | 3 | 78.40 | 74.59 | .000 |
    | Device | 86.47 | 2 | 43.23 | 41.13 | .000 |
    | Light | 76.80 | 1 | 76.80 | 73.07 | .000 |
    | Target*Device | 104.20 | 6 | 17.37 | 16.52 | .000 |
    | Target*Light | 93.87 | 3 | 31.29 | 29.77 | .000 |
    | Target*Device*Light | 174.33 | 6 | 29.06 | 27.65 | .000 |
    | Model | 770.87 | 21 | 36.71 | 34.93 | .000 |
    | Within + Residual | 103.00 | 98 | 1.05 | | |
    | Total | 873.87 | 119 | | | |

    - Target: T(1), T(2), T(3), T(4)
    - Device: D(1), D(2), D(3)
    - Light: L(1), L(2)
    - Interactions: Target*Light, Target*Device, Target*Device*Light
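Each row of the table follows the same arithmetic: the mean square is SS divided by df, and F is that mean square divided by the Within + Residual mean square. The sketch below recomputes both columns from the SS and df values in the table; the dictionary layout and function name are assumptions. For the Model row, SS/df gives a mean square of about 36.71, and dividing that by the error mean square gives the F of 34.93.

```python
# (SS, df) for each source, taken from the table above.
sources = {
    "Target": (235.20, 3),
    "Device": (86.47, 2),
    "Light": (76.80, 1),
    "Target*Device": (104.20, 6),
    "Target*Light": (93.87, 3),
    "Target*Device*Light": (174.33, 6),
    "Model": (770.87, 21),
}
ss_error, df_error = 103.00, 98   # Within + Residual row


def mean_squares_and_f(sources, ss_error, df_error):
    """Return {source: (MS, F)} with MS = SS / df and F = MS / MS_error."""
    ms_error = ss_error / df_error
    return {
        name: (ss / df, (ss / df) / ms_error)
        for name, (ss, df) in sources.items()
    }


anova_table = mean_squares_and_f(sources, ss_error, df_error)
for name, (ms, f) in anova_table.items():
    print(f"{name:<22} MS = {ms:6.2f}  F = {f:6.2f}")
```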
20. Factor Analysis: a statistical technique used to reduce a set of variables to a smaller number of variables or factors. It examines the pattern of intercorrelations between the variables and determines whether there are subsets of variables (or factors) that correlate highly with each other but show low correlations with other subsets (or factors).
    - Variables: $x_1, x_2, x_3, x_4, \dots, x_m$
    - Factors: $z_1, z_2, z_3, z_4, \dots, z_n$

    Each variable is written as a weighted sum of the factors plus an error term, where the $b_{ij}$ are the factor loadings, and each factor as a weighted sum of the variables:

    $$x_1 = b_{11} z_1 + b_{12} z_2 + b_{13} z_3 + \dots + b_{1n} z_n + e_1$$

    $$z_1 = a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \dots + a_{1m} x_m$$

    Two types:
    - Exploratory Factor Analysis
    - Confirmatory Factor Analysis
21. Exploratory Factor Analysis (EFA): seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory, and one uses factor loadings to intuit the factor structure of the data.

    Assumptions of Exploratory Factor Analysis: no outliers, interval data, linearity, multivariate normality, and orthogonality for principal factor analysis.
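22. Worked Example

    | Item | Factor 1 | Factor 2 | Reliability |
    |---|---|---|---|
    | 1. I am satisfied with my family life | .808 | .270 | .846 |
    | 2. The state of my family life is very good | .791 | .268 | .846 |
    | 3. My actual family life is close to my ideal family life | .736 | .226 | .846 |
    | 4. I have gotten the important things I want in my family life | .709 | .251 | .846 |
    | 5. I am dissatisfied with the communication between my spouse and me; my spouse does not understand me | .672 | -.105 | .846 |
    | 6. I dislike my spouse's personality and personal habits | .644 | -.135 | .846 |
    | 7. I am very satisfied with the responsibilities both spouses take on in the marriage | .583 | .239 | .846 |
    | 8. Overall, I am very satisfied with my current job | .081 | .846 | .804 |
    | 9. I find happiness in my work | .100 | .820 | .804 |
    | 10. I really have no interest at all in this job | .052 | .717 | .804 |
    | 11. Most people who do this job are satisfied with it | .158 | .702 | .804 |
    | 12. Employees here often think about quitting | .160 | .551 | .804 |
    | Variance explained (%) | 29.999 | 25.366 | |
    | Cumulative variance explained (%) | 29.999 | 55.365 | |
    | Factor name | Family satisfaction | Job satisfaction | |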
23. Confirmatory Factor Analysis (CFA): seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors.

    There are two approaches to confirmatory factor analysis:
    - The Traditional Method. Confirmatory factor analysis can be accomplished through any general-purpose statistical package which supports factor analysis.
    - The SEM Approach. Confirmatory factor analysis can mean the analysis of alternative measurement (factor) models using a structural equation modeling package such as AMOS or LISREL.

    A minimum requirement of confirmatory factor analysis is that one hypothesizes beforehand the number of factors in the model; usually the researcher will also posit expectations about which variables will load on which factors.
24. © Tjitra, 2010. Thank you. Any comments and questions are welcome. Contact me at hora_t@sianuonline.com. www.SinauOnline.com