Statistical Analysis

Statistical Analysis (5/6). A series of six presentations introducing scientific research in cross-cultural areas, using a quantitative approach.


Published in Education, Technology
Transcript

  • 1. Quantitative Research Methodologies (5/6): Statistical Analysis Prof. Dr. Hora Tjitra & Dr. He Quan www.SinauOnline.com
  • 2. Statistical Analysis Process
    Data Preparation ... cleaning and organizing the data for analysis; involves logging the data in, checking the data for accuracy, entering the data into the computer, etc.
    Descriptive Statistics ... describing the data; used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures.
    Inferential Statistics ... testing hypotheses and models; investigating questions, models, and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone.
  • 3. Data Preparation
    Logging the Data: mail survey returns; coded interview data; pretest or posttest data; observational data.
    Checking the Data for Accuracy: Are the responses legible/readable? Are all important questions answered? Are the responses complete? Is all relevant contextual information included (e.g., date, time, place)?
    Developing a Database Structure: variable name; variable description; variable format (number, date, text); instrument/method of collection; date collected; respondent or group; variable location (in database); notes.
    Entering the Data into the Computer: double entry. In this procedure you enter the data once; then you use a special program that allows you to enter the data a second time and checks each second entry against the first.
    Data Transformation: missing values; item reversals; scale totals; categories.
  • 4. Descriptive Statistics (Univariate Analysis)
    The Distribution: a summary of the frequency of individual values or ranges of values for a variable.
    Central Tendency: an estimate of the "center" of a distribution of values.
    Dispersion: the spread of the values around the central tendency.
  • 5. The Distribution
    Frequency distributions can be depicted in two ways: a table shows an age frequency distribution with five categories of age ranges defined, and a graph shows the frequency distribution. The graph is often referred to as a histogram or bar chart.
  • 6. Central Tendency
    The Mean, or average, is probably the most commonly used method of describing central tendency.
    The Median is the score found at the exact middle of the set of values.
    The Mode is the most frequently occurring value in the set of scores.
    Example: 2, 3, 5, 3, 4, 3, 6
    M = (2+3+5+3+4+3+6)/7 = 26/7 = 3.71
    Md: sorted, the scores are 2, 3, 3, 3, 4, 5, 6; interpolating within the interval 2.5-3.5 that holds the three 3s (2.5 -- 2.83 -- 3.16 -- 3.5) gives Md = 3.33
    Mo = 3
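As a quick check of the arithmetic above, all three measures are available in Python's standard `statistics` module; `median_grouped` implements exactly the interpolated (grouped-data) median the slide uses. This is an illustrative sketch, not part of the original deck:

```python
import statistics

scores = [2, 3, 5, 3, 4, 3, 6]

mean = statistics.mean(scores)          # 26/7 ≈ 3.71
mode = statistics.mode(scores)          # most frequent score: 3
# The slide interpolates the median within the interval 2.5-3.5;
# median_grouped applies that same grouped-data formula.
md = statistics.median_grouped(scores)  # ≈ 3.33 (the plain median would be 3)
```

Note that `statistics.median(scores)` would return 3, the middle raw score; the grouped variant only matters when several observations share the median value.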
  • 7. Dispersion
    The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.
    The Standard Deviation shows the relation that a set of scores has to the mean of the sample. Data: (15, 20, 21, 20, 36, 15, 25, 15), M = 20.875
    15 - 20.875 = -5.875     (-5.875)^2  = 34.515625
    20 - 20.875 = -0.875     (-0.875)^2  = 0.765625
    21 - 20.875 = +0.125     (+0.125)^2  = 0.015625
    20 - 20.875 = -0.875     (-0.875)^2  = 0.765625
    36 - 20.875 = +15.125    (+15.125)^2 = 228.765625
    15 - 20.875 = -5.875     (-5.875)^2  = 34.515625
    25 - 20.875 = +4.125     (+4.125)^2  = 17.015625
    15 - 20.875 = -5.875     (-5.875)^2  = 34.515625
    Sum of Squares = 350.875
    350.875 / 7 = 50.125 (dividing by n - 1 = 7, the sample variance)
    SQRT(50.125) = 7.0799
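The same computation takes three lines with the standard `statistics` module, whose sample variance divides by n - 1, matching the slide's division by 7 (a sketch added for illustration):

```python
import statistics

data = [15, 20, 21, 20, 36, 15, 25, 15]

variance = statistics.variance(data)  # sum of squares 350.875 / (n - 1) = 50.125
sd = statistics.stdev(data)           # sqrt(50.125) ≈ 7.0799
rng = max(data) - min(data)           # range: 36 - 15 = 21
```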
  • 8. Normal Distribution
    Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails, and are described by two parameters: the mean (m) and the standard deviation (s).
    The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
    (-1, 1) ------------ 68.26%
    (-1.96, 1.96) ------ 95%
    (-2.58, 2.58) ------ 99%
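The coverage percentages above can be verified with the error function from Python's `math` module; the helper name `normal_coverage` is my own, used only for this sketch:

```python
import math

def normal_coverage(z: float) -> float:
    """P(-z < Z < z) for a standard normal variable Z."""
    return math.erf(z / math.sqrt(2))

# (-1, 1) -> ~0.6827, (-1.96, 1.96) -> ~0.95, (-2.58, 2.58) -> ~0.99
for z in (1.0, 1.96, 2.58):
    print(f"({-z}, {z}): {normal_coverage(z):.4f}")
```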
  • 9. Hypothesis Testing / Significance Testing
    A statistical procedure for discriminating between two statistical hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha, often denoted H1).
    • The philosophical basis for hypothesis testing lies in the fact that random variation pervades all aspects of life, and in the desire to avoid being fooled by what might be chance variation.
    • The alternative hypothesis typically describes some change or effect that you expect or hope to see confirmed by data. For example, new drug A works better than standard drug B, or the accuracy of a new weapon targeting system is better than historical standards.
    • The null hypothesis embodies the presumption that nothing has changed, or that there is no difference.
  • 10. The Statistical Inference Decision Matrix
    In reality, either H0 is true and H1 is false (there is no relationship, no difference or gain; our theory is wrong), or H0 is false and H1 is true (there is a relationship, a difference or gain; our theory is correct).
    If we accept H0 and reject H1 (we say "there is no relationship; there is no difference or gain; our theory is wrong"):
      When H0 is in fact true: a correct decision, with probability 1 - α (e.g., .95), THE CONFIDENCE LEVEL. The odds of saying there is no relationship, difference, or gain when in fact there is none; the odds of correctly not confirming our theory. 95 times out of 100, when there is no effect, we'll say there is none.
      When H0 is in fact false: TYPE II ERROR, with probability β (e.g., .20). The odds of saying there is no relationship, difference, or gain when in fact there is one; the odds of not confirming our theory when it's true. 20 times out of 100, when there is an effect, we'll say there isn't.
    If we reject H0 and accept H1 (we say "there is a relationship; there is a difference or gain; our theory is correct"):
      When H0 is in fact true: TYPE I ERROR (SIGNIFICANCE LEVEL), with probability α (e.g., .05). The odds of saying there is a relationship, difference, or gain when in fact there is not; the odds of confirming our theory incorrectly. 5 times out of 100, when there is no effect, we'll say there is one. We should keep this small when we can't afford the risk of wrongly concluding that our program works.
      When H0 is in fact false: POWER, with probability 1 - β (e.g., .80). The odds of saying there is a relationship, difference, or gain when in fact there is one; the odds of confirming our theory correctly. 80 times out of 100, when there is an effect, we'll say there is one. We generally want this to be as large as possible.
  • 11. Examples
  • 12. Selecting the Appropriate Statistical Test
    The choice depends on the type of design (between-subject vs. within-subject), the number of independent variables, and the number of groups or levels of the independent variables:
    Between-subject, one independent variable, two groups (or two levels): independent-sample t-test.
    Between-subject, one independent variable, more than two groups (or more than two levels): one-way analysis of variance.
    Between-subject, two independent variables: two-way analysis of variance.
    Within-subject, two groups: correlated-measures t-test.
    Within-subject, more than two groups: repeated-measures analysis of variance.
  • 13. Correlation
    Correlation is a measure of the relationship between two or more variables. The measurement scales used should be at least interval scales, but other correlation coefficients are available to handle other types of data. Correlation coefficients can range from -1.00 to +1.00.
  • 14. The Types of Correlation
    Pearson r: used when data represent either interval or ratio scales; assumes a linear relationship between the variables. r = (∑ZxZy)/N
    Subject   X    Y    Zx    Zy    ZxZy
    A         1    4   -1.5  -1.5   2.25
    B         3    7   -1.0  -1.0   1.00
    C         5   10   -0.5  -0.5   0.25
    D         7   13    0     0     0.00
    E         9   16   +0.5  +0.5   0.25
    F        11   19   +1.0  +1.0   1.00
    G        13   22   +1.5  +1.5   2.25
    N = 7, X̄ = 7.0, Ȳ = 13.0, Sx = 4.0, Sy = 6.0, ∑X = 49, ∑Y = 91, ∑Zx = ∑Zy = 0.0, ∑ZxZy = 7.00, ∑X² = 455, ∑Y² = 1435
    r = (∑ZxZy)/N = 7.00/7 = 1.00
    Spearman rs, the rank correlation coefficient: used with ordered or ranked data. rs = 1 - [6(∑D²)/N(N²-1)]
    Leadership rank   IQ rank    D     D²
    1                 4         -3     9
    2                 2          0     0
    3                 1          2     4
    4                 6         -2     4
    5                 3          2     4
    6                 5          1     1
                               ∑D=0  ∑D²=22
    N = 6, rs = 1 - [6(∑D²)/N(N²-1)] = 1 - [6×22/(6×35)] = 1 - .63 = .37
    How do we judge the significance of r?
  • 15. t-Test: Testing for Differences Between Two Groups
          Group 1   Group 2
          3         2
          2         2
          2         3
          1         3
          2         3
          2         3
          3         4
          2         3
          2         4
          1
    N     10        9
    ∑X    20        27
    ∑X²   44        85
    X̄     2.0       3.0
    SS    4.0       4.0
    s     0.6667    0.7071
    t = -3.17, df = (10-1) + (9-1) = 17, p = .004
    Report: t(17) = -3.17, p < .05
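A minimal pooled-variance t-test reproduces the slide's numbers; the helper name `pooled_t` is my own, and this is a sketch rather than the package the slide's output came from:

```python
import math

def pooled_t(g1, g2):
    """Independent-samples t statistic with pooled variance, plus df."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)       # within-group sums of squares
    ss2 = sum((x - m2) ** 2 for x in g2)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)          # pooled variance estimate
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))    # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2

group1 = [3, 2, 2, 1, 2, 2, 3, 2, 2, 1]
group2 = [2, 2, 3, 3, 3, 3, 4, 3, 4]
t, df = pooled_t(group1, group2)  # t ≈ -3.17, df = 17
```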
  • 16. ANOVA
    A statistical procedure developed by R. A. Fisher that allows one to compare simultaneously the differences between two or more means.
    One-way ANOVA ... comparing the effects of different levels of a single independent variable.
    Two-way ANOVA ... comparing simultaneously the effects of two independent variables.
    Basic concepts
    Between-Groups Variance ... estimate of the variance between group means.
    Within-Groups Variance ... estimate of the average variance within each group.
    Homogeneity of variance ... the variances of the groups are equivalent to each other.
  • 17. Worked Example
            Group 1   Group 2   Group 3   Group 4
            4         6         4         5
            2         3         5         8
            1         5         7         6
            3         4         6         5
    ∑X.j    10        18        22        24        ∑Xij = 74
    nj      4         4         4         4         N = 16
    x̄j      2.5       4.5       5.5       6.0       X̄ = 4.62
    ∑Xij²   30        86        126       150       ∑Xij² = 392
    sj²     1.67      1.67      1.67      2.00
    F max = 2.00/1.67 = 1.198
    Degrees of freedom: df total = N - 1 = 16 - 1 = 15; df between = k - 1 = 4 - 1 = 3; df within = ∑(nj - 1) = 12
    F = MS between / MS within = 5.48
    Result Table
    Source           SS      df    MS      F
    Between-groups   28.75   3     9.583   5.48
    Within-groups    21.00   12    1.750
    Total            49.75   15
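The sums of squares and the F ratio in the result table can be recomputed directly from the four groups (an illustrative sketch of the one-way ANOVA arithmetic, not the slide's software):

```python
groups = [
    [4, 2, 1, 3],  # Group 1
    [6, 3, 5, 4],  # Group 2
    [4, 5, 7, 6],  # Group 3
    [5, 8, 6, 5],  # Group 4
]
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups SS: group sizes times squared deviations of group means
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: squared deviations of scores from their own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1        # k - 1 = 3
df_within = n_total - len(groups)   # N - k = 12
f = (ss_between / df_between) / (ss_within / df_within)  # ≈ 5.48
```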
  • 18. MANOVA
    MANOVA is a technique which determines the effects of independent categorical variables on multiple continuous dependent variables. It is usually used to compare several groups with respect to multiple continuous variables.
    The main distinction between MANOVA and ANOVA is that several dependent variables are considered in MANOVA. While ANOVA tests for inter-group differences between the mean values of one dependent variable, MANOVA tests for differences between centroids, the vectors of the mean values of the dependent variables.
    One important index is the interaction.
  • 19. Worked Example
    Target: T(1), T(2), T(3), T(4); Device: D(1), D(2), D(3); Light: L(1), L(2)
    Interactions: Target*Light, Target*Device, Target*Device*Light
    Source                 SS       df    MS      F       Sig. of F
    Target                 235.20   3     78.40   74.59   .000
    Device                 86.47    2     43.23   41.13   .000
    Light                  76.80    1     76.80   73.07   .000
    Target*Device          104.20   6     17.37   16.52   .000
    Target*Light           93.87    3     31.29   29.77   .000
    Target*Device*Light    174.33   6     29.06   27.65   .000
    Model                  770.87   21    36.71   34.93   .000
    Within + Residual      103.00   98    1.05
    Total                  873.87   119
  • 20. Factor Analysis
    ...a statistical technique used to reduce a set of variables to a smaller number of variables or factors. It examines the pattern of intercorrelations between the variables and determines whether there are subsets of variables (or factors) that correlate highly with each other but show low correlations with other subsets (or factors).
    Variables: x1, x2, x3, x4, ... xm
    Factors: z1, z2, z3, z4, ... zn
    x1 = b11 z1 + b12 z2 + b13 z3 + ... + b1n zn + e1
    ...
    z1 = a11 x1 + a12 x2 + a13 x3 + ... + a1m xm
    ...
    • Exploratory Factor Analysis
    • Confirmatory Factor Analysis
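As an illustration of the extraction step, unrotated principal-component loadings can be obtained from the eigendecomposition of a correlation matrix. This is a sketch with a made-up 3-variable correlation matrix, using NumPy; it is not data from the slides:

```python
import numpy as np

# Hypothetical correlation matrix: x1 and x2 correlate highly,
# x3 mostly stands apart, suggesting two clusters.
R = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]     # reorder largest-first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)  # unrotated loadings (the b coefficients)
explained = eigvals / eigvals.sum()    # proportion of variance per component
```

With all components retained, the loadings reproduce the correlation matrix exactly; in an actual analysis only the components with large eigenvalues are kept as factors, which is where the data reduction happens.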
  • 21. Exploratory Factor Analysis (EFA)
    Seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory, and one uses factor loadings to intuit the factor structure of the data.
    Assumptions of Exploratory Factor Analysis: no outliers, interval data, linearity, multivariate normality, orthogonality (for principal factor analysis).
  • 22. Worked Example
    Factor loadings for twelve items on two factors:
    Item   Factor 1   Factor 2
    1.     .808       .270
    2.     .791       .268
    3.     .736       .226
    4.     .709       .251
    5.     .672       -.105
    6.     .644       -.135
    7.     .583       .239
    8.     .081       .846
    9.     .100       .820
    10.    .052       .717
    11.    .158       .702
    12.    .160       .551
    The values .846 and .804 are reported alongside the two item clusters (items 1-7 and 8-12), presumably scale reliability coefficients.
    % of variance: 29.999, 25.366; cumulative %: 29.999, 55.365
  • 23. Confirmatory Factor Analysis (CFA)
    Seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors.
    A minimum requirement of confirmatory factor analysis is that one hypothesize beforehand the number of factors in the model; usually the researcher will also posit expectations about which variables will load on which factors.
    There are two approaches to confirmatory factor analysis:
    The Traditional Method: confirmatory factor analysis can be accomplished through any general-purpose statistical package which supports factor analysis.
    The SEM Approach: confirmatory factor analysis can mean the analysis of alternative measurement (factor) models using a structural equation modeling package such as AMOS or LISREL.
  • 24. Thank You
    Any comments & questions are welcome.
    Contact me at hora_t@sianuonline.com
    www.SinauOnline.com
    © Tjitra, 2010