Application of Statistical Tools for Data Analysis
1. Application of Statistical Tools for Data Analysis in Research
Dr Joseph James V.
Professor of Commerce & Management, MSN Institute of Management & Technology, Chavara.
(Formerly Associate Professor and Head, P G & Research Department of Commerce, Fatima Mata National College (Autonomous), Kollam).
2. Probability
Approaches towards Probability
Basic terminology
◦ Experiments and events
◦ Mutually exclusive events
◦ Collectively exhaustive events / sample space
◦ Equally likely events
◦ Independent and dependent events
◦ Simple and compound events
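These terms can be made concrete with the classical (equally likely) approach. The sketch below assumes two fair dice; the event names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice: 36 equally likely outcomes (first, second).
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# Hypothetical simple events.
def sum_is_7(o): return o[0] + o[1] == 7
def sum_is_11(o): return o[0] + o[1] == 11
def first_is_6(o): return o[0] == 6
def second_is_6(o): return o[1] == 6
def first_low(o): return o[0] <= 3
def first_high(o): return o[0] >= 4

# Mutually exclusive: no outcome belongs to both events.
mutually_exclusive = not any(sum_is_7(o) and sum_is_11(o) for o in sample_space)
# Collectively exhaustive: together the events cover the whole sample space.
exhaustive = all(first_low(o) or first_high(o) for o in sample_space)
# Independent: P(A and B) equals P(A) * P(B); "A and B" is a compound event.
independent = (prob(lambda o: first_is_6(o) and second_is_6(o))
               == prob(first_is_6) * prob(second_is_6))
```

Here P(sum is 7) = 1/6, and all three checks evaluate to True.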
3. Theorems of Probability
Addition Theorem
◦ Mutually exclusive cases
◦ Not mutually exclusive cases
Multiplication Theorem (Joint Probability)
◦ Under statistical independence
◦ Under statistical dependence
Conditional probability
Revision of probability
◦ Bayes’ Theorem
Mathematical Expectation
Probability/Theoretical Distribution
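A minimal sketch of these theorems using exact fractions. The card-deck and two-machine figures are hypothetical illustrations, and the helper names (`p_union`, `p_joint`, `p_conditional`, `bayes`, `expectation`) are not taken from any library.

```python
from fractions import Fraction as F

def p_union(p_a, p_b, p_a_and_b=F(0)):
    """Addition theorem: P(A or B) = P(A) + P(B) - P(A and B).
    For mutually exclusive events P(A and B) = 0 (the default)."""
    return p_a + p_b - p_a_and_b

def p_joint(p_a, p_b_given_a):
    """Multiplication theorem: P(A and B) = P(A) * P(B | A).
    Under statistical independence, P(B | A) is simply P(B)."""
    return p_a * p_b_given_a

def p_conditional(p_a_and_b, p_a):
    """Conditional probability: P(B | A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a

def bayes(priors, likelihoods):
    """Bayes' theorem: revise prior P(H_i) into posterior P(H_i | D)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)                     # P(D), by total probability
    return [j / total for j in joints]

def expectation(values, probs):
    """Mathematical expectation: E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, probs))

# King or heart from a 52-card deck (not mutually exclusive).
p_king_or_heart = p_union(F(4, 52), F(13, 52), F(1, 52))
# Two aces in succession without replacement (statistical dependence).
p_two_aces = p_joint(F(4, 52), F(3, 51))
# Hypothetical: machines make 60% and 40% of output, with 2% and 5% defectives.
posterior = bayes([F(3, 5), F(2, 5)], [F(1, 50), F(1, 20)])
# Expected score of a fair die.
e_die = expectation(range(1, 7), [F(1, 6)] * 6)
```

With these figures, P(king or heart) = 4/13, P(two aces) = 1/221, the revised probabilities that a defective item came from machine 1 or machine 2 are 3/8 and 5/8, and E(X) for a fair die is 7/2.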
4. Statistical Data
• Measurement Scales
• Nominal
• Ordinal
• Interval
• Scale/Ratio
• Data Types
• Simple, Discrete and Continuous Data
• Temporal/Time series Data
• Cross Sectional Data
• Pooled Data
• Panel Data
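One way to see how these data types differ is to index observations by unit and period. In the sketch below the firm names, years and values are invented, and the balanced-panel check is an illustrative assumption rather than a standard routine.

```python
# Hypothetical observations: (unit, year, value).
records = [
    ("FirmA", 2021, 10.0), ("FirmA", 2022, 12.0), ("FirmA", 2023, 13.5),
    ("FirmB", 2021, 8.0),  ("FirmB", 2022, 9.5),  ("FirmB", 2023, 11.0),
]

def cross_section(data, year):
    """Cross-sectional data: many units observed at one point in time."""
    return {unit: v for unit, y, v in data if y == year}

def time_series(data, unit):
    """Time-series data: one unit observed over successive periods."""
    return {y: v for u, y, v in data if u == unit}

def is_balanced_panel(data):
    """True only if every unit is observed in every period (a balanced panel).
    Pooled data combine cross-sections from different periods without
    tracking the same units, so they typically fail this check."""
    units = {u for u, _, _ in data}
    years = {y for _, y, _ in data}
    observed = {(u, y) for u, y, _ in data}
    return observed == {(u, y) for u in units for y in years}
```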
5. A Broad Classification of Statistical Analysis
Descriptive Analysis
Difference Analysis
Relationship Analysis
Predictive Analysis
Analysis through Classification
6. Descriptive Analysis
Describes the characteristics of the data/distribution in summary form.
Tools:
• Measures of central tendency
Mean, Median, Mode, Partition values (quartiles, deciles, percentiles), Geometric Mean (GM), Harmonic Mean (HM), and Specialised Averages such as Index Numbers
• Measures of dispersion
• Skewness /Asymmetry
• Kurtosis / Peakedness or flatness
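Most of these measures are available in Python's standard `statistics` module (Python 3.8+ for `geometric_mean`); the moment-based skewness and kurtosis below are computed by hand. A sketch on a hypothetical sample:

```python
import statistics as st

data = [12, 15, 15, 18, 20, 22, 25, 30, 41]   # hypothetical sample

mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
quartiles = st.quantiles(data, n=4)   # partition values Q1, Q2, Q3
gm = st.geometric_mean(data)
hm = st.harmonic_mean(data)
sd = st.pstdev(data)                  # population standard deviation

def central_moment(xs, k):
    """k-th central moment about the mean."""
    m = st.mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

# Moment coefficient of skewness: 0 for a symmetric distribution.
skewness = central_moment(data, 3) / central_moment(data, 2) ** 1.5
# Moment coefficient of kurtosis (beta 2): 3 for a normal curve,
# above 3 leptokurtic (peaked), below 3 platykurtic (flat).
kurtosis = central_moment(data, 4) / central_moment(data, 2) ** 2
```

For this sample the mean is 22, the median 20, the mode 15 and the quartiles are [15.0, 20.0, 27.5]; HM < GM < AM as expected, and the long right tail (41) gives positive skewness.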
7. Difference Analysis
Tests whether a sample statistic differs significantly from the population parameter, or whether groups differ significantly from one another
◦ Categorical variables: cross-tabulation and chi-square test
◦ Ordinal or better:
Independent samples – Mann-Whitney U test
Dependent samples – Wilcoxon signed-rank test
◦ Scale/Ratio:
One variable – t test, one-way ANOVA
Two or more samples – ANOVA, MANOVA, MANCOVA, etc.
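As a hedged illustration of the categorical case, the sketch below computes Pearson's chi-square statistic for a hypothetical 2 × 2 crosstab using only the standard library. The p-value shortcut holds only for 1 degree of freedom. For the other tests, SciPy provides `scipy.stats.chi2_contingency`, `mannwhitneyu`, `wilcoxon`, `ttest_ind` and `f_oneway`; note that `chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will differ from this uncorrected one.

```python
import math

def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c crosstab.
    Returns (statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical crosstab: group (rows) by response Yes / No (columns).
table = [[30, 20],
         [15, 35]]
stat, df = chi_square_test(table)
p = p_value_df1(stat)
```

For this table the statistic is about 9.09 with 1 df and p < 0.01, so at the 5% level the response would not be independent of the group.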
9. Predictive Analysis
• Simple regression
– Uses of Regression Analysis
– The regression lines
– The regression equations
– Properties of regression coefficients
– Standard error of estimate
– The coefficient of determination (r²)
• Multiple regression analysis
– $Y = a + b_1X_1 + b_2X_2 + \dots + b_jX_j + e$, so that $E(Y) = a + b_1X_1 + b_2X_2 + \dots + b_jX_j$
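The simple-regression quantities listed above can be computed directly from deviation sums. This sketch assumes hypothetical x and y values; it also returns the regression coefficient of X on Y so that the property r² = b_yx × b_xy can be checked.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of Y = a + bX, with standard error of estimate and r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx                           # regression coefficient of Y on X
    a = my - b * mx                         # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    se_estimate = math.sqrt(sse / (n - 2))  # standard error of estimate
    r2 = 1 - sse / syy                      # coefficient of determination
    b_xy = sxy / syy                        # regression coefficient of X on Y
    return a, b, se_estimate, r2, b_xy

a, b, se, r2, b_xy = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Here b = 0.6, a = 2.2, r² = 0.6 (equal to 0.6 × 1.0, the product of the two regression coefficients) and the standard error of estimate is √0.8 ≈ 0.894.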
10. Interpretation of Regression Results
• Descriptive Statistics
• Correlations
• Variables Entered/Removed (stepwise regression)
• Model Summary (R, R square, adjusted R square and standard error of the estimate)
• ANOVA – F test and its p value
• Coefficients (constant, B – unstandardized, Beta – standardized, SE, t test and p values, confidence limits)
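Most entries of a typical regression output can be reproduced by ordinary least squares. The sketch below is a standard-library approximation on a small hypothetical data set with two predictors; the helper names are illustrative, and p values are omitted because they need the t and F distributions (available, for example, through `scipy.stats`).

```python
import math
import statistics

def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_summary(xs, y):
    """Regress y on the predictor columns in xs, with a constant added."""
    n, k = len(y), len(xs)
    p = k + 1                                    # parameters incl. constant
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    my = statistics.mean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    s2 = sse / (n - p)                           # residual variance
    se = [math.sqrt(s2 * inv[a][a]) for a in range(p)]
    r2 = 1 - sse / sst
    return {
        "B": coef,                               # constant first, then unstandardized B
        "Beta": [coef[j + 1] * statistics.stdev(col) / statistics.stdev(y)
                 for j, col in enumerate(xs)],   # standardized coefficients
        "SE": se,
        "t": [c / s for c, s in zip(coef, se)],
        "R square": r2,
        "Adj R square": 1 - (1 - r2) * (n - 1) / (n - p),
        "SE of estimate": math.sqrt(s2),
        "F": (r2 / k) / ((1 - r2) / (n - p)),    # ANOVA F with (k, n - k - 1) df
    }

# Hypothetical data: y = 1 + 2*x1 + 3*x2 plus a small fixed disturbance.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.1]
y = [1 + 2 * u + 3 * v + e for u, v, e in zip(x1, x2, noise)]
result = ols_summary([x1, x2], y)
```

The estimated B values land close to the constructed 1, 2 and 3, and adjusted R square is slightly below R square because it penalises the extra parameters.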
11. Assumptions of the Classical Linear Regression Model (CLRM)
• Assumption of Linearity
– Correlation and scatter plot
• Assumption of Normality
– Histogram with a fitted normal curve, or a Q-Q plot
– Box plots
– Descriptive statistics using skewness and kurtosis
– Goodness-of-fit tests, e.g., the Kolmogorov-Smirnov test, the Shapiro-Wilk test or the Jarque-Bera test (available in EViews)
• When the data are not normally distributed, a non-linear transformation (e.g., a log transformation) might fix the issue; however, it can introduce multicollinearity effects.
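The Jarque-Bera statistic combines the skewness and kurtosis checks into one test: JB = n/6 × (S² + (K − 3)²/4), which is approximately chi-square with 2 df under normality. The sketch below assumes a deterministic, hypothetical right-skewed series (exponentials of normal quantiles) to show how a log transformation can restore approximate normality; SciPy users would typically call `scipy.stats.jarque_bera` or `scipy.stats.shapiro` instead.

```python
import math
from statistics import NormalDist

def jarque_bera(xs):
    """Jarque-Bera normality test; returns (JB statistic, p-value)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    s = m3 / m2 ** 1.5                 # skewness
    k = m4 / m2 ** 2                   # kurtosis (3 for a normal curve)
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)        # upper tail of chi-square with 2 df
    return jb, p_value

# Hypothetical data: evenly spaced normal quantiles, then exponentiated.
n = 200
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(v) for v in z]       # right-skewed (log-normal shape)
logged = [math.log(v) for v in skewed]  # log transformation

jb_raw, p_raw = jarque_bera(skewed)
jb_log, p_log = jarque_bera(logged)
```

Normality is rejected for the raw series (p < 0.05) but not for the log-transformed one.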
12. Assumptions of the Classical Linear Regression Model (CLRM)
• Assumption of Stationarity
– First differencing and second differencing
– Smoothing by regressing the series on a deterministic time trend and generating the expected (fitted) values
– Unit root test: Augmented Dickey-Fuller (ADF)
• Assumption of Homoscedasticity (problem of heteroscedasticity)
– Check that there are no outliers
– The data points are independent (no autocorrelation in the residuals) – Durbin-Watson test
– The residuals are normally distributed with mean zero and constant variance – residual statistics and a histogram of the residuals
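Differencing and trend regression can be sketched in a few lines. The series below are hypothetical, and the ADF test itself is not reproduced because it relies on Dickey-Fuller critical values; in Python it is usually run with `statsmodels.tsa.stattools.adfuller`.

```python
def difference(series, order=1):
    """First (order=1) or second (order=2) differencing; each pass drops one observation."""
    for _ in range(order):
        series = [cur - prev for prev, cur in zip(series, series[1:])]
    return series

def detrend(series):
    """Regress the series on a deterministic time trend t = 0, 1, 2, ...
    and return (expected values from the trend, deviations from the trend)."""
    n = len(series)
    t = range(n)
    mt, my = (n - 1) / 2, sum(series) / n
    b = (sum((ti - mt) * (yi - my) for ti, yi in zip(t, series))
         / sum((ti - mt) ** 2 for ti in t))
    a = my - b * mt
    expected = [a + b * ti for ti in t]
    return expected, [yi - ei for yi, ei in zip(series, expected)]

# Hypothetical series with a quadratic trend: y_t = 5 + 2t + 0.5t^2.
y = [5 + 2 * t + 0.5 * t ** 2 for t in range(10)]
d1 = difference(y)           # 2.5 + t: still trending
d2 = difference(y, order=2)  # constant 1.0: trend removed

# Hypothetical series with a linear trend plus an alternating disturbance.
z = [3 + 0.8 * t + (1 if t % 2 else -1) for t in range(10)]
expected, deviations = detrend(z)
```

A quadratic trend survives first differencing but disappears after second differencing, which is why the order of differencing is chosen to match the trend.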
13. Assumptions of the Classical Linear Regression Model (CLRM)
• Assumption of No Autocorrelation
◦ Durbin-Watson (DW) statistic
◦ Correlogram Q-statistic (EViews output): autocorrelation and partial autocorrelation
• Problem of Multicollinearity
◦ Correlation matrix
◦ Tolerance and Variance Inflation Factor (VIF)
14. Test for Specification error
• Ramsey’s RESET
– A single test that gives an overall indication of specification error arising from an inadequate model specification, measurement errors and departures from normality.
– For the model to be correctly specified, the coefficients of the powers of the fitted values, when these are added as regressors alongside the independent variables, should be equal to zero. Ramsey's RESET tests this hypothesis.
15. Test for Specification error
• Ramsey’s RESET
• Estimate the LRM, $Y = \alpha + \beta_1X_1 + \beta_2X_2 + \dots + \beta_jX_j + e$, and save the fitted values $\hat{Y}$.
• Add powers of the fitted values ($\hat{Y}^2, \hat{Y}^3, \dots$) to the model and re-estimate it to test whether their coefficients ($\gamma$) are zero:
$Y = \alpha + \beta_1X_1 + \beta_2X_2 + \dots + \beta_jX_j + \gamma_1\hat{Y}^2 + \gamma_2\hat{Y}^3 + e$
• The coefficients $\gamma$ (on the squared fitted values, the cubed fitted values, etc.) are tested jointly for significance using an F test.
• EViews example
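The RESET steps can also be sketched outside EViews. The standard-library version below uses hypothetical data and helper names; it computes F = [(RSS_r − RSS_u)/q] / [RSS_u/(n − k_u)], which would then be compared with the F table for (q, n − k_u) df.

```python
import math
import statistics

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """OLS with an intercept; returns (fitted values, residual sum of squares)."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    return fitted, sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def ramsey_reset(columns, y, max_power=3):
    """Return (F, (q, df2)) for adding fitted^2 ... fitted^max_power to the model."""
    fitted, rss_r = ols(columns, y)              # restricted model
    # Standardise the fitted values before powering to keep the normal equations
    # well conditioned; since fitted values are a linear combination of the
    # original regressors, this leaves the F statistic unchanged.
    mu, sd = statistics.mean(fitted), statistics.pstdev(fitted)
    z = [(f - mu) / sd for f in fitted]
    extra = [[v ** k for v in z] for k in range(2, max_power + 1)]
    _, rss_u = ols(columns + extra, y)           # unrestricted model
    n, q = len(y), len(extra)
    k_u = len(columns) + q + 1                   # parameters in unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u)), (q, n - k_u)

# Hypothetical data: the same small disturbance on a linear and a quadratic relation.
x = list(range(1, 21))
noise = [0.5 * math.sin(3 * t) for t in x]
y_linear = [2 + 3 * t + e for t, e in zip(x, noise)]
y_quadratic = [1 + 0.5 * t ** 2 + e for t, e in zip(x, noise)]

f_linear, dfs = ramsey_reset([x], y_linear)
f_quadratic, _ = ramsey_reset([x], y_quadratic)
```

Fitting a straight line to the quadratic relation produces a very large F, flagging the specification error, while the correctly specified linear case yields a much smaller value.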