Application of Statistical Tools for Data Analysis

Application of Statistical Tools for Data Analysis in Research
Dr Joseph James V.
Professor of Commerce & Management, MSN Institute of Management & Technology, Chavara.
(Formerly Associate Professor and Head, P G & Research Department of Commerce, Fatima Mata National College (Autonomous), Kollam).

Probability
• Approaches towards Probability
• Basic terminology
◦ Experiments and events
◦ Mutually exclusive events
◦ Collectively exhaustive events / sample space
◦ Equally likely events
◦ Independent and dependent events
◦ Simple and compound events

Theorems of Probability
• Addition Theorem
◦ Mutually exclusive cases
◦ Not mutually exclusive cases
• Multiplication Theorem (Joint Probability)
◦ Under statistical independence
◦ Under statistical dependence
• Conditional probability
• Revision of probability
◦ Bayes' Theorem
• Mathematical Expectation
• Probability/Theoretical Distribution
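The addition theorem and Bayes' theorem listed above can be illustrated with a short sketch. The function names and the two-machine defect figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def addition_rule(p_a, p_b, p_a_and_b=0.0):
    """P(A or B) = P(A) + P(B) - P(A and B).
    For mutually exclusive events P(A and B) is 0."""
    return p_a + p_b - p_a_and_b


def bayes_posterior(priors, likelihoods):
    """Revise prior probabilities P(H_i) using likelihoods P(E | H_i).
    Returns the posterior probabilities P(H_i | E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability of the evidence, P(E)
    return [j / evidence for j in joint]


# Hypothetical data: machines A and B make 60% and 40% of output,
# with defect rates of 2% and 5%. Given a defective item, which machine?
priors = [0.6, 0.4]
likelihoods = [0.02, 0.05]
posterior = bayes_posterior(priors, likelihoods)
print(posterior)

# Hypothetical non-mutually-exclusive events with an overlap of 0.1
p_union = addition_rule(0.5, 0.3, 0.1)
print(p_union)
```

In this made-up case the defective item is more likely to have come from machine B, even though B produces less of the output, because its defect rate is higher.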

Statistical Data
• Measurement Scales
– Nominal
– Ordinal
– Interval
– Scale/Ratio
• Data Types
– Simple, Discrete and Continuous Data
– Temporal/Time series Data
– Cross Sectional Data
– Pooled Data
– Panel Data

A Broad Classification of Statistical Analysis
• Descriptive Analysis
• Difference Analysis
• Relationship Analysis
• Predictive Analysis
• Analysis through Classification

Descriptive Analysis
• Describe the characteristics of the data/distribution in a summary form
• Tools
◦ Measures of central tendency
– Mean, Median, Mode, Partition values, GM, HM, Specialised Averages like Index Numbers
◦ Measures of dispersion
◦ Skewness / Asymmetry
◦ Kurtosis / Peakedness or flatness
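As a minimal sketch of the descriptive tools above, the following uses only Python's standard `statistics` module. The data set is hypothetical, and skewness and kurtosis are computed with the simple moment-based (population) formulas, which is one of several conventions in use.

```python
import statistics as st


def skewness(xs):
    """Moment-based skewness: third central moment / sd^3."""
    n = len(xs)
    m = st.fmean(xs)
    sd = st.pstdev(xs)  # population standard deviation
    return sum((x - m) ** 3 for x in xs) / (n * sd ** 3)


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3 (a normal curve gives 0)."""
    n = len(xs)
    m = st.fmean(xs)
    sd = st.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (n * sd ** 4) - 3


data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical observations

summary = {
    "mean": st.mean(data),
    "median": st.median(data),
    "mode": st.mode(data),
    "std_dev": st.stdev(data),        # sample standard deviation
    "quartiles": st.quantiles(data),  # partition values Q1, Q2, Q3
    "skewness": skewness(data),
    "excess_kurtosis": excess_kurtosis(data),
}
for name, value in summary.items():
    print(name, value)
```

A positive skewness here indicates a longer right tail, which matches the mean lying above the median.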

Difference Analysis
• Tests whether a statistic is significantly different from the population parameter
◦ Crosstab and Chi-square test in the case of categorical variables
◦ In case of Ordinal or better:
– Independent samples – Mann-Whitney U test
– Dependent samples – Wilcoxon signed-rank test
◦ Scale/Ratio
– One variable – t test, one-way ANOVA
– Two or more samples – ANOVA, MANOVA, MANCOVA etc.
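For the crosstab and Chi-square case above, the test statistic can be sketched in plain Python as follows. The 2 x 2 table is hypothetical. The function returns only the statistic and degrees of freedom, and the result is compared with the tabulated 5% critical value for 1 degree of freedom (3.841) rather than computing a p value.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic for a contingency table.
    `table` is a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical crosstab: rows = gender, columns = preference (yes / no)
observed = [[30, 20],
            [20, 30]]
chi2, dof = chi_square_statistic(observed)
print(chi2, dof)

CRITICAL_5_PERCENT_1_DF = 3.841  # tabulated value
print("significant at 5%:", chi2 > CRITICAL_5_PERCENT_1_DF)
```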

Relationship Analysis
• Correlation Analysis
◦ Scatter Diagram
◦ Correlation graph
◦ Karl Pearson's coefficient of correlation
◦ Coefficient of determination (R square)
◦ Spearman's Rank correlation
◦ Partial Correlation
◦ Multiple correlation (Correlation Matrix)
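The difference between Karl Pearson's coefficient and Spearman's rank correlation can be seen in a small sketch. The helper names and the data are hypothetical; Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data with a monotonic but non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print("Pearson r:", r)
print("R square:", r ** 2)
print("Spearman rho:", rho)
```

Because y rises steadily with x but not in a straight line, the rank correlation is perfect while Pearson's coefficient falls slightly short of 1.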

Predictive Analysis
• Simple regression
– Uses of Regression Analysis
– The regression lines
– The regression equations
– Properties of regression coefficients
– Standard error of estimate
– The coefficient of determination (r²)
• Multiple regression analysis
– Y = a + b1X1 + b2X2 + … + bjXj + e
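A minimal sketch of simple regression of Y on X follows, giving the constant, the regression coefficient, the coefficient of determination and the standard error of estimate. The function name and the five data points are hypothetical.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit of Y = a + bX.
    Returns (a, b, r_squared, standard_error_of_estimate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    b = sxy / sxx    # regression coefficient of Y on X
    a = my - b * mx  # constant (intercept)

    fitted = [a + b * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)

    r_squared = 1 - ss_res / ss_tot
    std_error = sqrt(ss_res / (n - 2))  # n - 2 degrees of freedom
    return a, b, r_squared, std_error


# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r_squared, std_error = simple_regression(x, y)
print("Y =", a, "+", b, "X")
print("r squared:", r_squared)
print("standard error of estimate:", std_error)
```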

Interpretation of Regression Result
• Descriptive Statistics
• Correlations
• Variables Entered/Removed (Stepwise regression)
• Model Summary (R, R square, Adj R square & SE)
• ANOVA – p value
• Coefficients (Constant, B – unstandardized, Beta – standardized, SE, t test and p values, Confidence limits)

Assumptions of Classical Linear Regression Model (CLRM)
• Assumption of Linearity
– Correlation and scatter plot
• Assumption of Normality
– Histogram with a fitted normal curve, or a Q-Q plot
– Box plots
– Descriptive statistics using skewness and kurtosis
– Goodness-of-fit tests, e.g., the Kolmogorov-Smirnov test, the Shapiro-Wilk test or the Jarque-Bera test (available in EViews)
• When the data are not normally distributed, a non-linear transformation (e.g., a log transformation) might fix the issue; however, it can introduce effects of multicollinearity.
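The Jarque-Bera check mentioned above combines skewness and kurtosis into one statistic, JB = n/6 × (S² + (K − 3)²/4), which is compared with a Chi-square value for 2 degrees of freedom (5.991 at the 5% level). The sketch below is a plain-Python illustration with hypothetical data, not a reproduction of the EViews output.

```python
def jarque_bera(xs):
    """Jarque-Bera statistic from moment-based skewness (S) and kurtosis (K)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


CRITICAL_5_PERCENT_2_DF = 5.991  # tabulated Chi-square value

symmetric = [1, 2, 3, 4, 5]                  # hypothetical, symmetric
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]     # hypothetical, one extreme value

jb_symmetric = jarque_bera(symmetric)
jb_skewed = jarque_bera(skewed)
print(jb_symmetric, jb_symmetric > CRITICAL_5_PERCENT_2_DF)
print(jb_skewed, jb_skewed > CRITICAL_5_PERCENT_2_DF)
```

The test is an asymptotic one, so with samples as small as these the comparison is only illustrative.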

• Assumption of Stationarity
– First differencing and second differencing
– Smoothing by regressing on a deterministic time scale and generating expected values
– Unit root test – Augmented Dickey-Fuller (ADF)
• Assumption of Homoscedasticity (problem of Heteroscedasticity)
– Test that there are no outliers
– The data points are independent (no autocorrelation within the variables) – Durbin-Watson test
– The residuals are normally distributed with mean zero and have constant variance – residual statistics and histogram of the residuals
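First and second differencing, listed under stationarity above, can be sketched in a few lines. The series is hypothetical (a pure quadratic trend), chosen so that the effect of each round of differencing is obvious; a formal check would still need a unit root test such as ADF.

```python
def difference(series, order=1):
    """Return the series differenced `order` times.
    Each round replaces the series with y[t] - y[t-1]."""
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Hypothetical trending series: y[t] = t squared
series = [t ** 2 for t in range(1, 7)]  # 1, 4, 9, 16, 25, 36

first_diff = difference(series)            # still trending upward
second_diff = difference(series, order=2)  # constant, trend removed
print(first_diff)
print(second_diff)
```

Each round of differencing shortens the series by one observation.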

• Assumption of No Autocorrelation
◦ DW statistic
◦ Correlogram Q statistic – EViews output
– Autocorrelation and Partial Autocorrelation
• Problem of Multicollinearity
◦ Correlation matrix
◦ Tolerance and Variance Inflation Factor (VIF)
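The DW statistic and the VIF above can both be computed by hand in simple cases. In this sketch the residuals and predictors are hypothetical, and the VIF function covers only the two-predictor case, where the auxiliary R square is just the squared correlation between the two predictors.

```python
from math import sqrt


def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation; values near 0
    suggest positive and values near 4 suggest negative autocorrelation."""
    numerator = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / Tolerance = 1 / (1 - R^2), two-predictor case only."""
    tolerance = 1 - pearson_r(x1, x2) ** 2
    return 1 / tolerance


# Hypothetical residuals that flip sign every period
dw_alternating = durbin_watson([1, -1, 1, -1])
# Hypothetical residuals that drift steadily upward
dw_trending = durbin_watson([1, 2, 3, 4])
print(dw_alternating, dw_trending)

# Hypothetical, strongly related predictors
vif = vif_two_predictors([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(vif)
```

A commonly quoted rule of thumb treats a VIF above 10 as a sign of serious multicollinearity, though the cut-off is a convention rather than a formal test.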

Test for Specification error
• Ramsey's RESET
– A single test which gives an overall idea of the presence of specification error arising out of inadequacy of the model specification, measurement errors and errors with respect to normality.
– For the model to be precise and suitable, the coefficients of the powers of the fitted values, when the dependent variable is regressed on them along with the independent variables, should be equal to zero. Ramsey's RESET is a test in this direction.

Test for Specification error
• Ramsey’s RESET
• Estimate the LRM, Y = α + β1X1 + β2X2 + … + βjXj + ej, and save the fitted values (Ŷ).
• Include powers of the fitted values (Ŷ², Ŷ³, …) in the model and regress again to test whether their coefficients (γ) = 0 in the augmented model:
Y = α + β1X1 + β2X2 + … + βjXj + γ1Ŷ² + γ2Ŷ³ + ej
• The significance of γ (the coefficients of the squared fitted values, the 3rd power of the fitted values, etc.) is tested using the F test for generalization.
• EViews example
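The RESET steps can also be sketched outside EViews. The version below is a simplified plain-Python illustration under stated assumptions: one independent variable, only the squared fitted values added, and a small hand-written least-squares solver. All function names and data are hypothetical, and the F value is compared with the tabulated 5% critical value for (1, 7) degrees of freedom, 5.59.

```python
def ols(columns, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(columns)
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fitted_and_rss(columns, y):
    """Fitted values and residual sum of squares for a regression of y."""
    beta = ols(columns, y)
    fitted = [sum(b * c[i] for b, c in zip(beta, columns)) for i in range(len(y))]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(x, y):
    """F statistic for adding squared fitted values to Y = a + bX."""
    n = len(y)
    ones = [1.0] * n
    fitted, rss_restricted = fitted_and_rss([ones, x], y)       # step 1
    squared = [f ** 2 for f in fitted]
    _, rss_augmented = fitted_and_rss([ones, x, squared], y)    # step 2
    added_terms, params = 1, 3
    return ((rss_restricted - rss_augmented) / added_terms) / (
        rss_augmented / (n - params))                           # step 3


x = [float(t) for t in range(1, 11)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]  # hypothetical
y_linear = [2 + 3 * xi + e for xi, e in zip(x, noise)]      # correctly specified
y_quadratic = [2 + xi ** 2 + e for xi, e in zip(x, noise)]  # mis-specified as linear

f_linear = reset_f(x, y_linear)
f_quadratic = reset_f(x, y_quadratic)
CRITICAL_F_1_7 = 5.59  # tabulated 5% value for (1, 7) degrees of freedom
print(f_linear, f_linear > CRITICAL_F_1_7)
print(f_quadratic, f_quadratic > CRITICAL_F_1_7)
```

When the true relationship is linear the added term contributes almost nothing and F stays small; when a straight line is fitted to quadratic data, F is very large and the null hypothesis γ = 0 is rejected.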
