
# Data analysis

## by Kinshook Chaturvedi, Manager Signal & Train Control at DELHI METRO RAIL CORPORATION LIMITED on May 19, 2011

Uploaded via SlideShare as Microsoft PowerPoint

## Data Analysis: Presentation Transcript

• DATA ANALYSIS, Dr. Ajay Pandit, September 14, 2010
• Data Analysis
• Data Analysis involves three stages:
• Testing association between variables
• Determining the degree of association between the variables
• Estimating the values of the variables
• Identifying the technique
• The choice of technique depends largely on the scales of measurement of the variables, i.e. nominal, ordinal, interval, or ratio.
• Bivariate Analysis

| Dependent \ Independent | Nominal | Interval/Ratio |
|---|---|---|
| Nominal | Chi-square (χ²) | Discriminant analysis |
| Interval/Ratio | ANOVA | Regression/Correlation |

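The bivariate grid can be read as a lookup keyed on the scales of the independent and dependent variables. The sketch below is a minimal illustration, assuming the scale names are given as the plain strings `"nominal"` and `"interval/ratio"`; `TECHNIQUES` and `choose_technique` are hypothetical names, and ordinal data is deliberately left out because the grid does not cover it.

```python
# Hypothetical lookup mirroring the bivariate-analysis grid:
# keys are (independent-variable scale, dependent-variable scale).
TECHNIQUES = {
    ("nominal", "nominal"): "Chi-square test",
    ("interval/ratio", "nominal"): "Discriminant analysis",
    ("nominal", "interval/ratio"): "ANOVA",
    ("interval/ratio", "interval/ratio"): "Regression/correlation",
}


def choose_technique(independent: str, dependent: str) -> str:
    """Return the technique the grid suggests for the two scales."""
    key = (independent.lower(), dependent.lower())
    if key not in TECHNIQUES:
        # Ordinal (or any other) scale is not covered by the grid.
        raise ValueError(f"no technique listed for scales {key}")
    return TECHNIQUES[key]


print(choose_technique("nominal", "nominal"))         # Chi-square test
print(choose_technique("nominal", "interval/ratio"))  # ANOVA
```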
• CHI-SQUARE (χ²)
• The technique uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent.
• This test can also be interpreted as a comparison of two or more populations.
• Example
• The demand for an MBA program’s optional courses and majors is quite variable year over year.
• The research hypothesis is that the academic background of the students (i.e. their undergrad degrees) affects their choice of major.
• A random sample of data on last year’s MBA students was collected and summarized in a contingency table …
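• Example: The Data

| UG Degree \ MBA Major | Accounting | Finance | Marketing | Total |
|---|---|---|---|---|
| BA | 31 | 13 | 16 | 60 |
| BEng | 8 | 16 | 7 | 31 |
| BBA | 12 | 10 | 17 | 39 |
| Other | 10 | 5 | 7 | 22 |
| Total | 61 | 44 | 47 | 152 |
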
• Example
• We are interested in determining whether or not the academic background of the students affects their choice of MBA major. Thus our research hypothesis is:
• H₁: The two variables are dependent.
• Our null hypothesis, then, is:
• H₀: The two variables are independent.
• Example
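• In this case, our test statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

• (where k is the number of cells in the contingency table, i.e. rows × columns, and f_i and e_i are the observed and expected frequencies of cell i)
• Our rejection region is:

$$\chi^2 > \chi^2_{\alpha,\nu}$$

• where the number of degrees of freedom is ν = (r − 1)(c − 1).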
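The statistic χ² = Σ (f_i − e_i)² / e_i over the k cells, the degrees of freedom (r − 1)(c − 1), and the rule "reject H₀ when χ² exceeds χ²_{α,ν}" can be sketched as three small functions. This assumes observed and expected frequencies are passed as flat lists of the k cells; the function names are hypothetical, and the critical value is assumed to come from a χ² table rather than being computed here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all k cells (flat lists)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def degrees_of_freedom(rows, cols):
    """Degrees of freedom for an r x c contingency table."""
    return (rows - 1) * (cols - 1)


def in_rejection_region(statistic, critical_value):
    """True when the statistic exceeds the tabled chi^2_{alpha, nu}."""
    return statistic > critical_value


print(degrees_of_freedom(4, 3))  # 6 for the 4 x 3 MBA table
```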
• Example
• To calculate our χ² test statistic, we first need the expected frequency of each cell.
• The expected frequency of the cell in row i and column j is:

$$e_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{sample size}}$$
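The expected-frequency formula can be applied to the whole MBA table at once. This minimal sketch assumes the table is stored as a list of rows (BA, BEng, BBA, Other) with columns (Accounting, Finance, Marketing); `OBSERVED` and `expected_frequencies` are hypothetical names.

```python
# Observed counts: rows BA, BEng, BBA, Other; columns Accounting, Finance, Marketing.
OBSERVED = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]


def expected_frequencies(table):
    """e_ij = (row i total * column j total) / sample size, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(OBSERVED)
# Row 2 (BEng), column 3 (Marketing) sits at index [1][2].
print(round(expected[1][2], 2))  # 9.59
```

The expected frequencies preserve the row and column totals of the observed table, which is a quick sanity check on the arithmetic.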
• Contingency Table Set-up…
• Example: compute the expected frequency of the cell in row 2 (BEng) and column 3 (Marketing):

$$e_{23} = \frac{(31)(47)}{152} = 9.59$$

Compare this to the observed frequency $f_{23} = 7$. The expected frequencies of the other cells are computed the same way.
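• Example
• We can now compare observed with expected frequencies and calculate our test statistic:

Observed frequencies, with expected frequencies in parentheses:

| UG Degree \ MBA Major | Accounting | Finance | Marketing |
|---|---|---|---|
| BA | 31 (24.08) | 13 (17.37) | 16 (18.55) |
| BEng | 8 (12.44) | 16 (8.97) | 7 (9.59) |
| BBA | 12 (15.65) | 10 (11.29) | 17 (12.06) |
| Other | 10 (8.83) | 5 (6.37) | 7 (6.80) |

$$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \frac{(13 - 17.37)^2}{17.37} + \cdots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$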
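The full calculation, expected frequencies plus the sum over all twelve cells, fits in one function. This is a minimal sketch assuming the same list-of-rows layout as the table; `chi_square_from_table` is a hypothetical name. Because it uses unrounded expected frequencies, it reproduces the χ² = 14.70 reported for the example to two decimals.

```python
# Observed counts: rows BA, BEng, BBA, Other; columns Accounting, Finance, Marketing.
OBSERVED = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]


def chi_square_from_table(table):
    """Chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


stat = chi_square_from_table(OBSERVED)
print(round(stat, 2))  # 14.7
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same test on a table like `OBSERVED` and can serve as a cross-check.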
• Example
• We compare χ² = 14.70 with:

$$\chi^2_{\alpha,\nu} = \chi^2_{.05,\,(4-1)(3-1)} = \chi^2_{.05,\,6} = 12.5916$$

• Since our test statistic falls into the rejection region (14.70 > 12.5916), we reject
• H₀: The two variables are independent.
• in favor of
• H₁: The two variables are dependent.
• That is, there is evidence of a relationship between undergrad degree and MBA major.
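The decision above relies on a tabled critical value. As a cross-check (a swapped-in technique, not the slide's method), for an even number of degrees of freedom ν the upper-tail probability has the closed form P(χ²_ν > x) = e^(−x/2) Σ_{i=0}^{ν/2−1} (x/2)^i / i!, which needs only the standard library. The sketch assumes ν is even (here ν = 6); `chi2_upper_tail_even_df` is a hypothetical name.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi^2_df > x); this exact closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)  # next term: (x/2)^(i+1) / (i+1)!
    return math.exp(-half) * total


alpha = 0.05
df = (4 - 1) * (3 - 1)  # 6
# The tabled critical value should leave about alpha in the upper tail.
print(round(chi2_upper_tail_even_df(12.5916, df), 4))  # 0.05
p_value = chi2_upper_tail_even_df(14.70, df)
print(round(p_value, 3))  # 0.023
print(p_value < alpha)    # True, so H0 is rejected at the 5% level
```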
• Required Condition – Rule of Five…
• In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five.
• Note: combining rows or columns changes the dimensions of the table, so the degrees of freedom must be recalculated as (r − 1)(c − 1) for the new table.
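The rule of five can be checked mechanically before running the test. This minimal sketch flags any cell with an expected frequency below 5 and shows how pooling two rows changes the degrees of freedom; `violates_rule_of_five`, `combine_rows`, and the choice of pooling BBA with Other are hypothetical illustrations only, since every expected value in the MBA table is already at least 5 (the smallest is 6.37).

```python
# Observed counts: rows BA, BEng, BBA, Other; columns Accounting, Finance, Marketing.
OBSERVED = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]


def expected_frequencies(table):
    """e_ij = (row i total * column j total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def violates_rule_of_five(table):
    """True if any cell has an expected frequency below 5."""
    return any(e < 5 for row in expected_frequencies(table) for e in row)


def combine_rows(table, i, j):
    """New table with row j added into row i and row j removed."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [merged if k == i else row for k, row in enumerate(table) if k != j]


def degrees_of_freedom(table):
    """(r - 1)(c - 1) for the table as it currently stands."""
    return (len(table) - 1) * (len(table[0]) - 1)


print(violates_rule_of_five(OBSERVED))  # False
# Hypothetical illustration: pool BBA (index 2) with Other (index 3).
pooled = combine_rows(OBSERVED, 2, 3)
print(pooled)                      # [[31, 13, 16], [8, 16, 7], [22, 15, 24]]
print(degrees_of_freedom(pooled))  # 4
```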
• ANOVA

| Type of Measurement | Differences between three or more independent groups |
|---|---|
| Interval or ratio | One-way ANOVA |

• SAMPLE RESULTS OF PACKAGE SALES
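• ONE-WAY ANOVA

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} \left(X_{ij} - \bar{X}_{GM}\right)^2, \qquad df = nk - 1$$

$$SSB = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_{GM}\right)^2, \qquad df = k - 1$$

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n} \left(X_{ij} - \bar{X}_i\right)^2, \qquad df = k(n - 1)$$

• where k is the number of groups, n the number of observations per group, $\bar{X}_i$ the mean of group i, and $\bar{X}_{GM}$ the grand mean.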
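The three sums of squares can be computed directly from the raw groups, and they satisfy SST = SSB + SSW. This minimal sketch assumes each group is a list of numbers; it uses N − k for the within-groups degrees of freedom, which equals k(n − 1) when every group has n observations, and adds the usual F ratio (SSB/(k − 1)) / (SSW/(N − k)). `one_way_anova` is a hypothetical name, and the data in the usage example are made up, not the package-sales figures.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F ratio for one-way ANOVA."""
    values = [x for group in groups for x in group]
    n_total = len(values)
    k = len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Total: every observation about the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)
    # Between groups: group means about the grand mean, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: each observation about its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k  # equals k(n - 1) when every group has n values
    f_ratio = (ssb / df_between) / (ssw / df_within)
    return {
        "SST": sst, "SSB": ssb, "SSW": ssw,
        "df_between": df_between, "df_within": df_within, "F": f_ratio,
    }


# Hypothetical data: three groups of three observations each.
result = one_way_anova([[2, 4, 6], [5, 7, 9], [8, 10, 12]])
print(result["SST"], result["SSB"], result["SSW"])  # 78.0 54.0 24.0
print(result["F"])  # 6.75
```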
• For the package-sales example, SST = 97 and SSB = 40, so

$$SSW = SST - SSB = 97 - 40 = 57$$
• PACKAGE SALES DESCRIPTIVE STATISTICS
• PACKAGE SALES ANOVA SUMMARY TABLE
• THANKS