Understanding
Research in
SLL
James Dean Brown
Presented by:
Zahra Farajnezhad
Chapter 9:
Statistical Logic
Use a sample study to show how logic is applied in the three most commonly
reported families of statistical studies:
1_ Explore the strength of relationships between variables;
2_ Compare group means;
3_ Compare frequencies.
Stage 1: Focusing the Study
– Noticing a problem
– Identifying and operationally defining the constructs to be
examined
– Formulating research questions and hypotheses.
Identifying a Problem
– Problems are often derived from classroom teaching experiences or
from reading the literature in the field.
– The researcher must first notice a problem that is worthy of a
solution; one for which the answer will be indirectly useful to
teachers in theory building or more directly for actual language
teaching.
Operationalizing constructs as
variables
– The researcher must attempt to identify all the constructs that are pertinent
to solving the problem at hand.
– It often requires a great deal of hard thought because studies of language
learning and teaching are highly complex and have many variables.
– It can be frustrating because many constructs that are important to
language learning (e.g., students’ motivation and students’ ambition) may
be difficult to measure or operationalize as variables.
– The failure to identify or operationalize variables in the beginning could
threaten the entire logic and framework of a study.
Research Hypotheses
• A hypothesis is a precise, testable statement of what the researcher(s)
predict will be the outcome of the study.
• This usually involves proposing a possible relationship between two
variables: the independent variable (what the researcher changes)
and the dependent variable (what the researcher measures).
• In research, there is a convention that the hypothesis is written in
two forms, the null hypothesis, and the alternative hypothesis (called
the experimental hypothesis when the method of investigation is
an experiment).
Directional Hypothesis
• A one-tailed directional hypothesis predicts the nature of the effect of the
independent variable on the dependent variable. E.g., adults will correctly recall
more words than children.
• It can be formulated when there is a good theoretical reason, usually based on
previous research, to hypothesize that the relationship, if there is any, will be in
one direction or the other.
• In a correlational study, the directional hypothesis states whether we expect
a positive or a negative correlation, that is, how the two variables will be
related to each other. A directional hypothesis can state a negative correlation,
e.g., “the higher the number of Facebook friends, the lower the life
satisfaction score.”
Nondirectional Hypotheses
• A non-directional (or two-tailed) hypothesis simply states that there will be a
difference between the two groups/conditions but does not say which will be
greater/smaller, quicker/slower, etc. For example:
“There will be a difference between the number of cold symptoms experienced
in the following week after exposure to a virus for those participants who have
been sleep deprived for 24 hours compared with those who have not been sleep
deprived for 24 hours.”
• When the study is correlational, we simply state that variables will be correlated
but do not state whether the relationship will be positive or negative, e.g. there
will be a significant correlation between variable A and variable B.
• A two-tailed non-directional hypothesis predicts that the independent variable
will have an effect on the dependent variable, but the direction of the effect is not
specified.
Null Hypothesis
 The null hypothesis states that there is no relationship between the two
variables being studied (one variable does not affect the other).
 It states results are due to chance and are not significant in terms of
supporting the idea being investigated.
 A null hypothesis is a type of conjecture used in statistics that proposes that
there is no difference between certain characteristics of a population or data-
generating process.
 Hypothesis testing provides a method to reject a null hypothesis within a
certain confidence level. (Null hypotheses cannot be proven, though.)
Null Hypothesis Example:
 Participants who have been deprived of sleep for 24 hours will NOT have more cold
symptoms in the following week after exposure to a virus than participants who have
not been sleep deprived and any difference that does arise will be due to chance alone.
 With a directional correlational hypothesis: “There will NOT be a positive correlation
between the number of stressful life events experienced in the last year and the number of
coughs and colds suffered, whereby the more life events you have suffered, the more
coughs and colds you will have had.”
 With a non-directional or two tailed hypothesis: There will be NO difference between
the number of cold symptoms experienced in the following week after exposure to a
virus for those participants who have been sleep deprived for 24 hours compared with
those who have not been sleep deprived for 24 hours.
 For a correlational study: there will be NO correlation between variable A and variable B.
Alternative Hypotheses
 The alternative hypothesis describes the population parameters that the
sample data represent if the predicted relationship exists.
 The alternative hypothesis (H1) is the statement that the scores came
from different populations: the independent variable significantly affected
the dependent variable.
 Null hypothesis (H0): “There are no differences between the groups.” This is
the hypothesis that you are testing! Alternative hypothesis (Ha): “There are
effects/differences between the groups.” This is what you expect to find!
Alternative Hypotheses Example
 In a two-tailed test, the null hypothesis states that the population mean equals a
given value. For example, H0: μ = 100. The alternative hypothesis states that the
population mean does not equal that value. For example, Ha: μ ≠ 100.
 The null hypothesis (H0) states that there is no difference, effect, or correlation
in the population.
 H0 is assumed to be true unless there is enough evidence to reject it.
 The burden of proof is on the researcher.
 The researcher’s hypothesis (the alternative hypothesis, Ha) is only tested
indirectly.
Stage 2: Sampling
Sampling is a process used in statistical analysis in which a
predetermined number of observations are taken from a
larger population. The methodology used to sample from a larger
population depends on the type of analysis being performed, but it
may include simple random sampling or systematic sampling.
1_ Random Sampling
2_ Systematic Sampling
3_ Stratified Random Sampling
Random Sampling
 With random sampling, every item within a population has an equal
probability of being chosen. It is the furthest removed from any
potential bias because there is no human judgement involved in
selecting the sample.
 E.g., a random sample may include choosing the names of 25
employees out of a hat in a company of 250 employees.
The population is all 250 employees, and the sample is random
because each employee has an equal chance of being chosen.
Systematic Sampling
– Systematic sampling begins at a random starting point within the population
and uses a fixed, periodic interval to select items for a sample. The sampling
interval is calculated as the population size divided by the sample size.
Despite the sample population being selected in advance, systematic
sampling is still considered random if the periodic interval is determined
beforehand and the starting point is random.
– Systematic sampling is simpler and more straightforward than random
sampling. It can also be more conducive to covering a wide study area. On
the other hand, systematic sampling introduces certain arbitrary parameters
in the data. This can cause over- or under-representation of particular
patterns.
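As a rough illustration of the interval rule above, here is a minimal Python sketch. The function name systematic_sample and the list of 250 numbered employees are assumptions made for the example; they do not come from the chapter.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Pick every k-th item after a random start, with k = N // n."""
    rng = rng or random.Random()
    interval = len(population) // sample_size  # sampling interval k
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randrange(interval)  # random starting point in the first interval
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical population: employees numbered 1 to 250.
employees = list(range(1, 251))
sample = systematic_sample(employees, 25, random.Random(0))
print(sample)
```

Every selected employee is exactly one interval (here 10) away from the previous one, which is why a periodic pattern in the list can end up over- or under-represented.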
Systematic Sampling
 Because of its simplicity, systematic sampling is popular with
researchers.
 Other advantages of this methodology include eliminating the
phenomenon of clustered selection and a low probability of
contaminating data.
 Disadvantages include over- or under-representation of
particular patterns and a greater risk of data manipulation.
Stratified Random Sampling
 Stratified random sampling allows researchers to obtain a sample
population that best represents the entire population being studied
by dividing it into subgroups called strata.
 This method of statistical sampling, however, cannot be used in
every study design or with every data set.
 Stratified random sampling differs from simple random sampling,
which involves the random selection of data from an entire
population, so each possible sample is equally likely to occur.
Stratified Random Sampling
– Stratified random sampling involves first dividing a population into
subpopulations and then applying random sampling methods to each
subpopulation to form a test group. A disadvantage is that researchers cannot
always classify every member of the population into a subgroup.
– This is different from simple random sampling, which involves the random
selection of data from the entire population so that each possible sample is
equally likely to occur. In contrast, stratified random sampling divides the
population into smaller groups, or strata, based on shared characteristics. A
random sample is taken from each stratum in direct proportion to the size of
the stratum compared to the population.
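Proportional allocation can be sketched in a few lines of Python. The strata names, their sizes, and the function stratified_sample are hypothetical and chosen only so that the shares come out as whole numbers.

```python
import random


def stratified_sample(strata, sample_size, rng=None):
    """Draw a simple random sample from each stratum, proportional to its size."""
    rng = rng or random.Random()
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample owed to this stratum (rounded to a whole number).
        share = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical population of 250 learners split into three proficiency strata.
strata = {
    "beginner": list(range(0, 150)),
    "intermediate": list(range(150, 225)),
    "advanced": list(range(225, 250)),
}
sample = stratified_sample(strata, 50, random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
```

With other stratum sizes the rounded shares may not add up exactly to the requested sample size, so a real study would need a rule for distributing the remainder.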
Sampling Distribution
 A sampling distribution is the distribution of a statistic that is arrived
at through repeated sampling from a larger population.
 It describes the range of possible outcomes of a statistic, such
as the mean or mode of some variable, as it truly exists in a
population.
 The majority of data analyzed by researchers are actually drawn
from samples, and not populations.
Sampling Distribution
– A sampling distribution is a probability distribution of a statistic obtained
from a larger number of samples drawn from a specific population. The
sampling distribution of a given population is the distribution of
frequencies of a range of different outcomes that could possibly occur for
a statistic of a population.
– In statistics, a population is the entire pool from which a
statistical sample is drawn. A population may refer to an entire group of
people, objects, events, hospital visits, or measurements. A population
can thus be said to be an aggregate observation of subjects grouped
together by a common feature.
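The idea of a sampling distribution can be made concrete with a small simulation. The sketch below assumes an artificial population of 10,000 scores (mean about 100, standard deviation about 15); all names and numbers are illustrative.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, n_samples, rng=None):
    """Return the means of n_samples random samples drawn from the population."""
    rng = rng or random.Random()
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(42)
# Hypothetical population of test scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]
means = sampling_distribution_of_mean(population, 25, 2_000, rng)

print("population mean:", statistics.mean(population))
print("mean of sample means:", statistics.mean(means))
print("spread of sample means:", statistics.stdev(means))
```

The sample means cluster around the population mean, and their spread is much narrower than the spread of the individual scores.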
Stage 3: Setting up Statistical Decisions
• On the basis of research hypotheses, the researcher must:
1: Select the correct statistical procedures
2: Formulate statistical hypotheses
3: Select an alpha decision level
1: Choosing the correct statistics
 The choice will be based on clear thinking about:
1: How many variables there are;
2: Which variables are dependent, independent, moderator or control
variables;
3: Which scales (nominal, ordinal, or interval) are used for each.
 Then the researcher will have to decide the appropriateness of the
statistics that he/she used.
Statistical Hypotheses
– We can formulate the following shorthand versions:
H₀: r = 0 (r equals zero)
H₁: r > 0 (r is greater than zero)
H₂: r < 0 (r is less than zero)
H₃: r ≠ 0 (r does not equal zero)
Statistical Hypotheses
• A population is the entire group that is of interest in a study
• A sample is a subgroup taken from that population to represent it.
• When calculations are made to describe a sample, they are called
statistics (Brown, p. 114).
• If the same calculations were actually done for the entire
population, they would be called parameters. These parameters
would give the best picture of what is going on in a given
population.
The Conceptual Differences
Between Statistics and Parameters
 A statistic and a parameter are very similar. The difference between
a statistic and a parameter is that statistics describe a sample.
A parameter describes an entire population.
 We use different notation for parameters and statistics:
 The statistical symbols are usually Roman letters (e.g., X̄ and SD
for the sample mean and the standard deviation)
 The parameters are symbolized by Greek letters (μ and σ for the
population mean and the standard deviation)
Alpha Decision Level
– Typically the researcher sets alpha at 0.05. However, there are
instances when the researcher may decide to use a more stringent
level of alpha, e.g., 0.01.
– Alpha 0.05 indicates that the researcher is willing to take up to a
5% risk of making a Type I error when deciding statistical
significance. Alpha 0.01 indicates a willingness to take up to a
1% risk of a Type I error.
– A Type I error occurs when a researcher rejects the null hypothesis
when in fact it is true in the population.
Do you reject or fail to reject the
null hypothesis?
 The decision is made by examining the p level furnished by
the computer. Example: if the alpha level is set at .05,
inferential statistics with p levels of .05 or less are
statistically significant. When this is the case, H₀ is
rejected and Hₐ is supported.
How strong does the evidence
have to be to reject the Null?
– The researcher must set a criterion. This is the significance level, or alpha (α).
The conventional alpha level is .05. We are conservative about rejecting H₀.
– When testing for significance, we calculate a test statistic. The test statistic
allows us to determine the probability of obtaining our results under the
assumption that H₀ is true. If this probability is small enough, then H₀ is
probably not true, so we should reject it.
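One way to see what "the probability of obtaining our results under the assumption that H₀ is true" means is a permutation test. This is a swapped-in technique chosen for illustration, not the procedure the chapter describes, and the two groups of scores below are invented.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, rng=None):
    """Estimate how often chance alone gives a mean difference at least as large."""
    rng = rng or random.Random()
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical scores for two groups of eight learners.
group_a = [78, 85, 82, 90, 88, 84, 91, 87]
group_b = [70, 75, 72, 80, 68, 74, 77, 71]
p_value = permutation_p_value(group_a, group_b, rng=random.Random(0))
print(p_value)
```

A small value means that a difference this large rarely arises when the labels are assigned at random, which is the situation in which H₀ would be rejected.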
Determining Significance
• If the probability is lower than our significance level, we reject H₀
(p < .05). If the probability is not lower than our significance level,
we fail to reject H₀ (p > .05). H₀ is never “accepted” or “proven.”
• Decide what p-value would be “too unlikely.” This threshold is called
the alpha level. When the probability of a sample statistic falls below
this level, the result is said to be significant. Typical alpha levels
are .05 and .01.
Determining Significance
 Significance as a Probability Game
 There are four possible outcomes in a significance test, based on two
dimensions:
 The researcher’s decision about H₀.
 Whether H₀ is really true or false.
 The probability of each outcome can be determined.
Stage 4: Necessary Considerations
 Four types of information must be found:
1: The observed statistics (those that were actually calculated)
2: Whether the assumptions underlying those statistics were met
3: The degrees of freedom involved for each statistic
4: The critical values for each statistic
Observed Statistics
 Whether the results are a straightforward Pearson r or a
complicated looking analysis of variance table based on F
ratios (Chapter 11, Brown), the researcher does a lot of
adding, subtracting, dividing, and multiplying to get there.
Often, he/she does so on a mainframe computer using statistical
software such as the Statistical Package for the
Social Sciences (1975).
 The result of the calculations will be observed statistics.
Assumptions
 An assumption is a precondition that must be met for the particular
statistical analysis to be accurately applied.
 E.g., one of the assumptions that underlies the proper application of the
Pearson product-moment correlation coefficient (r) is that each set of scores
is on an interval scale. The scales involved must not be nominal or ordinal.
If they are other than interval scales, other statistics may be applied.
 If the data depart only slightly from the assumptions of the procedure, the
error in the inferences we draw is usually small; serious violations call for
a different statistic.
Degrees of Freedom
 Degrees of freedom are a quantity used primarily in statistics to
determine whether results are significant.
 In the simplest case, the degrees of freedom (df) are n − 1.
– The degrees of freedom can be calculated to help ensure the
statistical validity of chi-square tests, t-tests and even the more
advanced F-tests. These tests are commonly used to compare
observed data with data that would be expected to be obtained
according to a specific hypothesis.
Degrees of Freedom
 Because degrees of freedom calculations identify how many values
in the final calculation are allowed to vary, they can contribute to
the validity of an outcome. These calculations are dependent upon
the sample size, or observations, and the parameters to be
estimated, but generally, in statistics, degrees of freedom equal the
number of observations minus the number of parameters. This
means there are more degrees of freedom with a larger sample size.
Formula for Degrees of Freedom
 df = N − 1, where N is the number of values in the data set (sample size). Take a
look at the sample computation.
 Suppose the data set has four values (N = 4).
 Call the data set X and list its values. For this example, X includes:
15, 30, 25, 10
 This data set has a mean, or average of 20. Calculate the mean by adding the
values and dividing by N: (15+30+25+10)/4= 20
 Using the formula, the degrees of freedom would be calculated as df = N-1: In
this example, it looks like, df = 4-1 = 3
 This indicates that, in this data set, three numbers have the freedom to vary as
long as the mean remains 20.
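The same point can be checked in Python: once the mean is fixed at 20 and three values are chosen freely, the fourth value is no longer free. The helper name last_value_given_mean is an assumption for this sketch.

```python
def last_value_given_mean(free_values, mean):
    """Return the one remaining value that keeps the overall mean fixed."""
    n = len(free_values) + 1          # total number of values in the data set
    return mean * n - sum(free_values)


# The slide's data set: 15, 30 and 25 vary freely; the mean must stay 20.
print(last_value_given_mean([15, 30, 25], 20))  # 10

# Choose three different values and the fourth is again forced.
print(last_value_given_mean([5, 50, 20], 20))   # 5
```

Three values can vary, so df = 4 − 1 = 3.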
Critical Value
 The critical value is the value that the researcher might expect to
observe in the sample simply because of chance. In most cases, an
observed statistic must exceed the critical value to reject the null
hypothesis and thereby accept one of the alternative hypotheses.
 This critical value will vary from study to study even for the same
statistic because the degrees of freedom will usually vary, largely
owing to differences in the size of samples.
Stage 5: Statistical Decisions
1: Hypothesis testing (not to be confused with the common
meaning of “testing”);
2: The careful interpretation of the results;
3: An awareness of the potential pitfalls for a particular statistical
test.
Hypothesis testing
o The formal procedure statisticians follow to determine whether a
certain hypothesis is valid or not is referred to as hypothesis testing.
o By using hypothesis testing, statisticians can validate statements such
as, 'This washer only needs one gallon of water to wash a large load of
clothes.'
 Hypothesis testing is a 4-step process:
Step 1: Write the hypothesis.
Step 2: Create an analysis plan.
Step 3: Analyze the data.
Step 4: Interpret the results.
Interpretation of the Results
 Whenever we encounter a research finding based on the interpretation of a p value
from a statistical test, whether we realize it or not, we are discussing the result of a
formal hypothesis test. This is true irrespective of whether the test involves
comparisons of means, regression results or other types of statistical tests. As
readers of research, it is important to understand the underlying principles of
hypothesis testing, so that when faced with statistical results, we reach the right
conclusions and make good decisions about which findings are robust enough to
be translated into clinical practice.
 A result is statistically significant when the p-value is less than or equal to
alpha. This signifies that a change was detected: the default (null) hypothesis can
be rejected. If p-value > alpha: fail to reject the null hypothesis (i.e., not a
significant result). If p-value <= alpha: reject the null hypothesis (i.e., a
significant result).
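The decision rule can be written as a short function. This is only a sketch of the comparison between p and alpha; the function name decide is an assumption.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about H0 for a given p-value and alpha level."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))              # reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
print(decide(0.20))              # fail to reject H0
```

The same p-value can lead to different decisions under different alpha levels, which is why alpha has to be set before the results are examined.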
Chapter 10
Correlation
Stage 1:
Focusing the Study
– Identifying a problem
– Operationalizing Variables
– Research Hypotheses
Stage 2: Sampling
Stage 3: Setting up Statistical Decisions
– Choosing the correct statistic
– Statistical hypotheses
– Alpha decision level
Stage 4: Necessary Calculations
 Observed statistics
 Assumptions:
1: Independence
2: Normal Distribution
3: Interval Scales
4: Linear Relationship
 Degrees of Freedom
 Critical Values
Stage 5: Statistical Decisions
o Hypothesis Testing
o Interpretation of Results
o Potential Pitfalls
1: Restriction of Range
2: Skewness
3: Causality
Biserial Correlation
 The biserial correlation is a correlation between one or more
quantitative variables on the one hand and one or more binary
variables on the other. It was introduced by Pearson (1909).
 The biserial correlation coefficient varies between -1 and 1. 0
corresponds to no association (the means of the quantitative
variable for the two categories of the qualitative variable are
identical).
Biserial Correlation
– For the two-tailed test, the null H0 and alternative Ha hypotheses are as follows:
 H0 : r = 0
 Ha : r ≠ 0
– In the left one-tailed test, the following hypotheses are used:
 H0 : r = 0
 Ha : r < 0
– In the right one-tailed test, the following hypotheses are used:
 H0 : r = 0
 Ha : r > 0
Correlation Coefficient
 Correlation coefficients are used to measure how strong a relationship is
between two variables.
 There are several types of correlation coefficient, but the most popular
is Pearson’s. Pearson’s correlation (also called Pearson’s r) is a correlation
coefficient commonly used in linear regression.
 If you’re starting out in statistics, you’ll probably learn about
Pearson’s r first.
 In fact, when anyone refers to the correlation coefficient, they are usually
talking about Pearson’s.
Correlation Coefficient
 Correlation coefficient formulas are used to find how strong a relationship is
between data. The formulas return a value between -1 and 1, where:
 1 indicates a perfect positive relationship.
 -1 indicates a perfect negative relationship.
 A result of zero indicates no relationship at all
 A correlation coefficient of 1 means that for every positive increase in one variable,
there is a positive increase of a fixed proportion in the other. For example, shoe sizes
go up in (almost) perfect correlation with foot length.
 A correlation coefficient of -1 means that for every positive increase in one variable,
there is a decrease of a fixed proportion in the other. For example, the
amount of gas in a tank decreases in (almost) perfect correlation with speed.
 Zero means that an increase in one variable is not accompanied by any consistent
increase or decrease in the other. The two just aren’t related.
 The absolute value of the correlation coefficient gives us the relationship strength.
The larger the number, the stronger the relationship. For example, |-.75| = .75, which
has a stronger relationship than .65.
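For readers who want to see where the number comes from, here is a minimal Python sketch of Pearson's r computed from deviations around the two means. The study-hours and score data are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours of study and test scores for five learners.
hours = [2, 4, 6, 8, 10]
scores = [55, 60, 70, 72, 85]
print(pearson_r(hours, scores))
```

Data that fall exactly on a rising straight line give r = 1, and data on a falling straight line give r = -1.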
Kendall Tau
 Kendall’s Tau is a non-parametric measure of relationships between columns
of ranked data. The Tau correlation coefficient returns a value from -1 to 1, where:
 0 is no relationship,
 1 is a perfect relationship (the two rankings agree completely),
 -1 is a perfect inverse relationship (one ranking is the reverse of the other).
 With ranked columns, a negative value means the two rankings tend to run in
opposite directions. This can simply reflect one column having been ranked in
the opposite order, so check the ordering before interpreting Tau.
Kendall Tau
 Several versions of Tau exist.
 Tau-A and Tau-B are usually used for square tables (with equal columns
and rows). Tau-B will adjust for tied ranks. Tau-C is usually used for
rectangular tables. For square tables, Tau-B and Tau-C are essentially the
same.
 Most statistical packages have Tau-B built in, but when there are no tied
ranks you can use the following formula to calculate it by hand:
 Kendall’s Tau = (C – D) / (C + D)
Where C is the number of concordant pairs and D is the number
of discordant pairs.
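Counting concordant and discordant pairs by hand is tedious, so here is a small Python sketch of (C – D) / (C + D). It assumes there are no tied ranks, and the two rankings are invented.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's Tau from concordant and discordant pairs (no tied ranks)."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both rankings order this pair the same way
        elif product < 0:
            discordant += 1   # the rankings disagree on this pair
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical rankings of five essays by two raters.
rater_1 = [1, 2, 3, 4, 5]
rater_2 = [2, 1, 3, 5, 4]
print(kendall_tau(rater_1, rater_2))  # 8 concordant, 2 discordant -> 0.6
```

Of the ten possible pairs of essays, the raters disagree only on two, so Tau is (8 – 2) / (8 + 2) = 0.6.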
Kendall W
– Kendall's W (known as Kendall's coefficient of concordance) is a non-
parametric statistic. It is a normalization of the statistic of the Friedman test, and
can be used for assessing agreement among raters. Kendall's W ranges from 0 (no
agreement) to 1 (complete agreement).
 E.g., a number of people have been asked to rank a list of political concerns, from
most important to least important. Kendall's W can be calculated from these data.
If the test statistic W is 1, then all the survey respondents have been unanimous,
and each respondent has assigned the same order to the list of concerns. If W is 0,
then there is no overall trend of agreement among the respondents, and their
responses may be regarded as essentially random. Intermediate values
of W indicate a greater or lesser degree of unanimity among the various responses.
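A common formula for Kendall's W without tied ranks is W = 12S / (m²(n³ – n)), where m is the number of raters, n the number of items, and S the sum of squared deviations of the items' rank totals from their mean. The slide does not give the formula, so the sketch below rests on that assumption, and the rankings are invented.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m complete rankings of n items."""
    m = len(rankings)        # number of raters
    n = len(rankings[0])     # number of items being ranked
    totals = [sum(ranking[i] for ranking in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((total - mean_total) ** 2 for total in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three respondents who rank four concerns identically: W = 1.
unanimous = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(kendalls_w(unanimous))  # 1.0

# Two respondents with exactly opposite rankings: W = 0.
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]
print(kendalls_w(opposed))    # 0.0
```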
Multiple Regression
– Multiple regression is an extension of simple linear regression. It is used
when we want to predict the value of a variable based on the value of two or
more other variables. The variable we want to predict is called the dependent
variable (or sometimes, the outcome, target or criterion variable).
– Multiple regression also allows you to determine the overall fit (variance
explained) of the model and the relative contribution of each of the
predictors to the total variance explained. For example, you might want to
know how much of the variation in exam performance can be explained by
revision time, test anxiety, lecture attendance and gender "as a whole", but
also the "relative contribution" of each independent variable in explaining
the variance.
Standard Partial Regression
– The standard partial regression coefficient is the number of standard
deviations that Y would change for every one standard deviation change
in X1, if all the other X variables could be kept constant.
– When the purpose of multiple regression is prediction, the
important result is an equation
containing partial regression coefficients (slopes).
– The magnitude of the partial regression coefficient depends on
the unit used for each variable.
– When the purpose of multiple regression is understanding functional
relationships, the important result is an equation
containing standard partial regression coefficients.
 In that equation, b'1 is the standard partial regression coefficient
of Y on X1.
 The magnitude of the standard partial regression coefficients tells you
something about the relative importance of different variables; X variables
with bigger standard partial regression coefficients have a stronger
relationship with the Y variable.
Linear Regression
– Linear regression, while a useful tool, has significant limits. As its name implies,
it can’t easily match any data set that is non-linear. It can only be used to make
predictions that fit within the range of the training data set. And, most
importantly here, simple linear regression can only be fit to data sets with a
single dependent variable and a single independent variable.
– The general form of the equation for linear regression is: y = B * x + A
– where y is the dependent variable, x is the independent variable, and A and B are
coefficients dictating the equation. The difference between the equation for linear
regression and the equation for multiple regression is that the equation for
multiple regression must be able to handle multiple inputs, instead of only the
one input of linear regression.
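To show how A and B in y = B * x + A can be obtained, here is a minimal least-squares sketch in Python. Ordinary least squares is assumed as the fitting method, since the slide does not name one, and the data are invented so that they lie exactly on a line.

```python
def fit_line(x, y):
    """Least-squares estimates of A (intercept) and B (slope) in y = B * x + A."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 2 * x + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
A, B = fit_line(x, y)
print(A, B)  # 1.0 2.0
```

Multiple regression follows the same logic but estimates one slope per independent variable.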
Heteroscedasticity
– Heteroscedasticity is a hard word to pronounce, but it doesn't need to be a
difficult concept to understand. Put simply, heteroscedasticity refers to the
circumstance in which the variability of a variable is unequal across the
range of values of a second variable that predicts it.
– A scatterplot of these variables will often create a cone-like shape, as the
scatter (or variability) of the dependent variable (DV) widens or narrows as
the value of the independent variable (IV) increases. The inverse of
heteroscedasticity is homoscedasticity, which indicates that a DV's
variability is equal across values of an IV.
Heteroscedasticity
 [Figure: plot with random data showing heteroscedasticity]
 In statistics, a vector of random variables is heteroscedastic (or
heteroskedastic) if the variability of the random disturbance is different
across elements of the vector. Variability could be quantified by the
variance or any other measure of statistical dispersion. Heteroscedasticity
is the absence of homoscedasticity. A typical example is the set of
observations of income in different cities.
Heteroscedasticity
• The existence of heteroscedasticity is a major concern in regression
analysis and the analysis of variance, as it invalidates statistical tests of
significance that assume that the modelling errors all have the same
variance. While the ordinary least squares estimator is still unbiased in the
presence of heteroscedasticity, it is inefficient and generalized least
squares should be used instead.
• Because heteroscedasticity concerns expectations of the second moment of
the errors, its presence is referred to as misspecification of the second order.
Multicollinearity
– Multicollinearity is the occurrence of high intercorrelations among two or
more independent variables in a multiple regression model. Multicollinearity
can lead to skewed or misleading results when a researcher or analyst
attempts to determine how well each independent variable can be used most
effectively to predict or understand the dependent variable in a statistical
model.
– Multicollinearity can lead to wider confidence intervals that produce less
reliable probabilities in terms of the effect of independent variables in a
model. That is, the statistical inferences from a model with multicollinearity
may not be dependable.
 KEY TAKEAWAYS
 Multicollinearity is a statistical concept where independent
variables in a model are correlated.
 Multicollinearity among independent variables will result in less
reliable statistical inferences.
 It is better to use independent variables that are not correlated or
repetitive when building multiple regression models that use two
or more variables.
Data Transformation
 In statistics:
 Data transformation is the application of
a deterministic mathematical function to each point in a data set—that is, each
data point zi is replaced with the transformed value yi = f (zi), where f is a
function.
 Transforms are usually applied so that the data appear to more closely meet the
assumptions of a statistical inference procedure that is to be applied, or to
improve the interpretability or appearance of graphs.
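As a small illustration of yi = f(zi), the sketch below applies a base-10 logarithm to a strongly skewed set of values. The income figures and the choice of log10 as f are assumptions for the example.

```python
import math


def transform(data, f):
    """Replace each data point z with the transformed value f(z)."""
    return [f(z) for z in data]


# Hypothetical, strongly right-skewed incomes.
incomes = [20_000, 35_000, 50_000, 120_000, 1_000_000]
log_incomes = transform(incomes, math.log10)
print(log_incomes)
```

After the transformation the largest value is no longer fifty times the smallest, which is the sense in which transformed data can come closer to the assumptions of a statistical procedure.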
Phi Coefficient
– The Phi Coefficient is a measure of association between two binary
variables (i.e. living/dead, black/white, success/failure). It is also called the Yule
phi or Mean Square Contingency Coefficient and is used for contingency
tables when:
– At least one variable is a nominal variable.
– Both variables are dichotomous variables.
– The phi coefficient is a symmetrical statistic, which means
the independent variable and dependent variables are interchangeable.
The interpretation for the phi coefficient is similar to the Pearson
Correlation Coefficient. The range is from -1 to 1, where:
0 = no relationship.
+1 = a perfect positive relationship: all of your data fall in the
diagonal cells.
-1 = a perfect negative relationship: all of your data fall in the
off-diagonal cells.
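The slide does not give the computation, so the following is a sketch under stated assumptions: it uses the standard 2x2 formula, phi = (ad − bc) / sqrt((a+b)(c+d)(a+c)(b+d)), where a and d are the diagonal cell counts. The function name and the example counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 contingency table of counts [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # A row or column total of zero leaves phi undefined.
        raise ValueError("a row or column total is zero; phi is undefined")
    return (a * d - b * c) / denom

# All counts on the diagonal: perfect positive relationship.
print(phi_coefficient(10, 0, 0, 10))
# All counts off the diagonal: perfect negative relationship.
print(phi_coefficient(0, 10, 10, 0))
# Counts spread evenly: no relationship.
print(phi_coefficient(5, 5, 5, 5))
```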
Point-Biserial Correlation
– A point-biserial correlation is used to measure the strength and direction of
the association that exists between one continuous variable and one
dichotomous variable. It is a special case of the Pearson’s product-moment
correlation, which is applied when you have two continuous variables,
whereas in this case one of the variables is measured on a dichotomous
scale.
– E.g., you could use a point-biserial correlation to determine whether there
is an association between salaries, measured in dollars, and gender (i.e.,
your continuous variable would be "salary" and your dichotomous
variable would be "gender", which has two categories: "males" and
"females").
Spearman rho
Spearman Rank Correlation
 The Spearman rank correlation coefficient, rs, is
the nonparametric version of the Pearson correlation coefficient.
Your data must be ordinal, interval or ratio. Spearman’s returns a
value from -1 to 1, where:
+1 = a perfect positive correlation between ranks
-1 = a perfect negative correlation between ranks
0 = no correlation between ranks.
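– The formula for the Spearman rank correlation coefficient when
there are no tied ranks is:
rs = 1 − 6 Σ di² / (n(n² − 1)),
where di is the difference between the two ranks of observation i and n is
the number of observations.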
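The no-ties case can be sketched as follows. This is an illustration, not the presenter's own code: the function name and the sample scores are assumptions, and the sketch rejects tied values because the shortcut formula in the docstring only holds without ties.

```python
def spearman_rho_no_ties(x, y):
    """Spearman r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("tied values present; the no-ties formula does not apply")

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rank_x, rank_y = ranks(x), ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical scores on two tests for five learners.
print(spearman_rho_no_ties([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
```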
Tetrachoric Correlation
– Tetrachoric correlation is used to measure rater agreement for binary
data, that is, data with two possible answers (usually right or
wrong). The tetrachoric correlation estimates what the correlation
would be if the variables were measured on a continuous scale. It is used
for a variety of reasons, including the analysis of scores in Item Response
Theory (IRT) and the conversion of comorbidity statistics to correlation
coefficients. This type of correlation has the advantage that it is not
affected by the number of rating levels or by the marginal proportions for
rating levels.
– The term “tetrachoric correlation” comes from the tetrachoric
series, a numerical method used before the advent of
computers. While it’s more common to estimate correlations
with methods like maximum likelihood estimation, there is a
basic formula you can use.
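The slide does not reproduce the formula. One commonly quoted approximation, assumed here and not confirmed by the source, is r ≈ cos(π / (1 + sqrt(ad / bc))) for a 2x2 table [[a, b], [c, d]] in which a and d are the agreement cells. The sketch below implements only that approximation; the function name and counts are hypothetical.

```python
import math

def tetrachoric_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric correlation for counts [[a, b], [c, d]]."""
    if b * c == 0:
        # The ratio ad/bc is undefined when a disagreement cell is empty.
        raise ValueError("b and c must both be non-zero for this approximation")
    ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(ratio)))

# Hypothetical ratings: 40 + 40 agreements, 10 + 10 disagreements.
print(tetrachoric_approx(40, 10, 10, 40))
```

For a serious analysis, a maximum likelihood estimate from a statistics package is usually preferable to this shortcut.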
Tetrachoric Correlation
The two main assumptions are:
 The underlying variables come from a normal distribution. With only two
variables, this is impossible to test. You should, therefore, have a good
theoretical reason for using this particular type of correlation; in other words,
you might know that the type of data you are dealing with tends to follow a
normal distribution most of the time. Rating errors should also follow a
normal distribution.
 There is a latent continuous scale underneath your binary data. In other
words, the trait you are measuring should be continuous and not discrete.
 In addition, you may want to make sure that errors are independent between
raters and cases and the variance for errors is homogeneous across levels of
the independent variable.
Curvilinear
– Curvilinear regression analysis fits curves to data instead of the straight
lines you see in linear regression. Technically, it is a catch-all term for any
regression that involves a curve, for example quadratic regression and cubic
regression. About the only type that is not included in this catch-all definition
is simple linear regression.
Standard Error of Estimate (SEE)
– A linear regression gives us a best-fit line for a scatterplot of data. The standard
error of estimate (SEE) is one of the metrics that tells us about the fit of the line to
the data. The SEE is the standard deviation of the errors (or residuals).
– The standard error of estimate tells you approximately how large the prediction
errors (residuals) are for your data set, in the same units as Y. How well can you
predict Y? The answer is: to within about Se above or below the predicted value.
– Since you usually want your forecasts and predictions to be as accurate as
possible, you would be glad to find a small value for Se. You can interpret Se as a
standard deviation in the sense that, if you have a normal distribution for the
prediction errors, then you will expect about two-thirds of the data points to fall
within a distance Se either above or below the regression line.
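A small sketch may make the calculation concrete. It assumes simple linear regression with one predictor and the usual n − 2 denominator for the residual variance (some texts divide by n instead); the function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def standard_error_of_estimate(x, y):
    """Se = sqrt(sum of squared residuals / (n - 2))."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired observations")
    slope, intercept = fit_line(x, y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical study hours (x) and test scores (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 60, 67]
print(standard_error_of_estimate(hours, scores))
```

The result is in the same units as Y, so here it would be read as a typical prediction error in score points.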
Pearson r
 In statistics, the Pearson correlation coefficient (PCC), also referred to
as Pearson's r, the Pearson product-moment correlation
coefficient (PPMCC), or the bivariate correlation, is a measure of the
linear correlation between two variables.
 Pearson's correlation coefficient is the covariance of the two variables
divided by the product of their standard deviations. The form of the
definition involves a "product moment", that is, the mean (the first
moment about the origin) of the product of the mean-adjusted random
variables; hence the modifier product-moment in the name.
 The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear
equation describes the relationship between X and Y perfectly, with all data points
lying on a line for which Y increases as X increases. A value of −1 implies that all
data points lie on a line for which Y decreases as X increases. A value of 0 implies
that there is no linear correlation between the variables.
– More generally, note that (Xᵢ − X̄)(Yᵢ − Ȳ) is positive if and only if Xᵢ and Yᵢ lie on
the same side of their respective means. Thus the correlation coefficient is positive
if Xᵢ and Yᵢ tend to be simultaneously greater than, or simultaneously less than, their
respective means. The correlation coefficient is negative (anti-correlation) if Xᵢ and
Yᵢ tend to lie on opposite sides of their respective means. Moreover, the stronger is
either tendency, the larger is the absolute value of the correlation coefficient.
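The definition can be sketched directly from the deviations about the means. The function name and data below are illustrative assumptions. Because the factors of 1/n cancel, the sketch works with sums of products instead of explicit covariances and standard deviations.

```python
import math

def pearson_r(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Each product is positive when X_i and Y_i lie on the same side of their means.
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a variable with zero variance has no defined correlation")
    return sum_products / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores.
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))

# Coding a dichotomous variable as 0/1 and passing it to the same function
# gives the point-biserial correlation, since that is a special case of r.
group = [0, 0, 1, 1]
score = [1, 2, 3, 4]
print(pearson_r(group, score))
```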
One-Tailed Decisions
One-tailed test. Although this picture is shaded on the left, its mirror
image (i.e., shaded on the right) would also be a one-tailed
test.
– A one-tailed test requires a smaller sample size than a two-tailed test to
achieve the same power for the same effect, provided the effect lies in
the predicted direction.
– A one-tailed test is a statistical test in which the critical area of a
distribution is one-sided, so that it is either greater than or less than
a certain value, but not both. If the test statistic falls into
the one-sided critical area, the null hypothesis is rejected in favor of
the alternative hypothesis.
– A one-tailed test is also known as a directional hypothesis or
directional test.
One-Tailed Decisions
 A one-tailed test is a statistical hypothesis test set up to show that the
sample mean would be higher or lower than the population mean, but not
both.
 When using a one-tailed test, the analyst is testing for the possibility of
the relationship in one direction of interest, and completely disregarding
the possibility of a relationship in another direction.
 Before running a one-tailed test, the analyst must set up a null hypothesis
and an alternative hypothesis and set a significance level (alpha), against
which the computed probability value (p-value) is compared.
Two-Tailed Decisions
– In statistics, a two-tailed test is a method in which the critical area
of a distribution is two-sided; it tests whether a sample statistic is greater
than or less than a certain range of values.
– It is used in null-hypothesis testing and testing for statistical significance.
– If the test statistic falls into either of the critical areas, the null
hypothesis is rejected in favor of the alternative hypothesis.
– The two-tailed test gets its name from testing the area under both
tails of a normal distribution, although the test can also be used with
non-normal distributions.
 By convention two-tailed tests are used to determine significance at the
5% level, meaning each side of the distribution is cut at 2.5%.
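The difference between the two kinds of decision can be sketched with a z statistic. This assumes a test statistic that follows the standard normal distribution; the function names and the value z = 1.8 are hypothetical, and only Python's standard-library `statistics.NormalDist` is used.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_value(alpha, two_tailed):
    """Positive cut-off for z; a two-tailed test puts alpha / 2 in each tail."""
    tail_area = alpha / 2 if two_tailed else alpha
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)

def p_value(z, tail="two"):
    """p-value for z under a 'two', 'upper', or 'lower' tailed alternative."""
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tail == "upper":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "lower":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

alpha = 0.05
z = 1.8  # hypothetical observed test statistic

for label, tail, two_tailed in [("one-tailed", "upper", False), ("two-tailed", "two", True)]:
    decision = "reject H0" if p_value(z, tail) < alpha else "fail to reject H0"
    print(label, round(critical_value(alpha, two_tailed), 3), round(p_value(z, tail), 4), decision)
```

With these assumed numbers the same statistic is significant under the one-tailed test but not under the two-tailed test, which is why the direction of the hypothesis has to be fixed before the data are examined.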
Summary
 State your hypotheses: You are not attempting to prove your alternative
hypotheses. You are testing the null hypothesis. If you reject the null hypothesis,
then you are left with support for the alternative(s).
– Set your decision criteria: your alpha level determines whether you reject
the null hypothesis or fail to reject it.
– Describe the data you collected from the sample.
– Use inferential statistics: make inferences about the population from the
data collected from the sample, and generalize the results of the study to
the population.
Thank You

Understanding Research Methods in SLL

  • 1. Understanding Research in SLL James Dean Brown Presented by: Zahra Farajnezhad
  • 2. Chapter 9; Statistical Logic Use a sample study to show how logic is applied in the three most commonly reported families of statistical studies: 1_ Explore the strength of relationships between variables; 2_ Compare group means; 3_ Compare frequency.
  • 3. Stage1: Focusing the Study – Noticing a problem – Identifying and operationally defining the constructs to be examined – Formulating research questions and hypotheses.
  • 4. Identifying a Problem – These often derived from classroom teaching experiences or from reading the literature in the field. – The researcher must first notice a problem that is worthy of a solution; one for which the answer will be indirectly useful to teachers in theory building or more directly for actual language teaching.
  • 5. Operationalizing constructs as variables – The researcher must attempt to identify all the constructs that are pertinent to solving the problem at hand. – It often requires a great deal of hard thought because studies of language learning and teaching are highly complex and have many variables. – It can be frustrating because many constructs that are important to language learning (e.g., students’ motivation and students’ ambition) may be difficult to measure or operationalize as variables. – The failure to identify or operationalize variables in the beginning could threaten the entire logic and framework of a study.
  • 6. Research Hypotheses • Hypotheses: is a precise, testable statement of what the researcher(s) predict will be the outcome of the study. • This usually involves proposing a possible relationship between two variables: the independent variable (what the researcher changes) and the dependent variable (what the research measures). • In research, there is a convention that the hypothesis is written in two forms, the null hypothesis, and the alternative hypothesis (called the experimental hypothesis when the method of investigation is an experiment).
  • 7. Directional Hypothesis • A one-tailed directional hypothesis predicts the nature of the effect of the independent variable on the dependent variable. E.g., adults will correctly recall more words than children. • It can be formulated when there is a good theoretical reason, usually based on previous research, to hypothesize that the relationship, if there is any, will be in one direction or the other. • If we had a correlational study, the directional hypothesis would state whether we expect a positive or a negative correlation, we are stating how the two variables will be related to each other. The directional hypothesis can also state a negative correlation, e.g. the higher the number of face-book friends, the lower the life satisfaction score “
  • 8. Nondirectional Hypotheses • A non-directional (or two tailed hypothesis) simply states that there will be a difference between the two groups/conditions but does not say which will be greater/smaller, quicker/slower etc. Using our example above we would say “There will be a difference between the number of cold symptoms experienced in the following week after exposure to a virus for those participants who have been sleep deprived for 24 hours compared with those who have not been sleep deprived for 24 hours.” • When the study is correlational, we simply state that variables will be correlated but do not state whether the relationship will be positive or negative, e.g. there will be a significant correlation between variable A and variable B. • A two-tailed non-directional hypothesis predicts that the independent variable will have an effect on the dependent variable, but the direction of the effect is not specified.
  • 9. Null Hypothesis  The null hypothesis states that there is no relationship between the two variables being studied (one variable does not affect the other).  It states results are due to chance and are not significant in terms of supporting the idea being investigated.  A null hypothesis is a type of conjecture used in statistics that proposes that there is no difference between certain characteristics of a population or data- generating process.  Hypothesis testing provides a method to reject a null hypothesis within a certain confidence level. (Null hypotheses cannot be proven, though.)
  • 10. Null Hypothesis Example:  Participants who have been deprived of sleep for 24 hours will NOT have more cold symptoms in the following week after exposure to a virus than participants who have not been sleep deprived and any difference that does arise will be due to chance alone.  With a directional correlational hypothesis: There will NOT be a positive correlation between the number of stress life events experienced in the last year and the number of coughs and colds suffered, whereby the more life events you have suffered the more coughs and cold you will have had”  With a non-directional or two tailed hypothesis: There will be NO difference between the number of cold symptoms experienced in the following week after exposure to a virus for those participants who have been sleep deprived for 24 hours compared with those who have not been sleep deprived for 24 hours.  For a correlational: there will be NO correlation between variable A and variable B.
  • 11. Alternative Hypotheses  The alternative hypothesis describes the population parameters that the sample data represent if the predicted relationship exists.  The alternative hypothesis (H 1 ) is the statement that the scores came from different populations the independent variable significantly affected the dependent variable.  “There are no differences between the groups”. This is the hypothesis that you are testing! Alternative hypothesis (Ha): “There are effects/differences between the groups”. This is what you expect to find!
  • 12. Alternative Hypotheses Example  In a two-tailed test, the null hypothesis states that the population mean equals a given value. For example, H 0 :  = 100. In a two-tailed test, the alternative hypothesis states that the population mean does not equal the same given value as in the null hypothesis. For example, H a :  100. Two-Tailed Hypotheses  The Null Hypothesis (H o ) states that there is no difference, effect, or correlation in the population The Null Hypothesis (H o ) states that there is no difference, effect, or correlation in the population H o is assumed to be true unless there is enough evidence to reject it. H o is assumed to be true unless there is enough evidence to reject it. Burden of proof on the researcher Burden of proof on the researcher The researcher’s hypothesis (Alternative Hypothesis, H A ) is only tested indirectly.
  • 13. Stage 2: Sampling Sampling is a process used in statistical analysis in which a predetermined number of observations are taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, but it may include simple random sampling or systematic sampling. 1_ Random Sampling 2_ Systematic Sampling 3_ Stratified Random Sampling
  • 14. Random Sampling  With random sampling, every item within a population has an equal probability of being chosen. It is the furthest removed from any potential bias because there is no human judgement involved in selecting the sample.  E.g., a random sample may include choosing the names of 25 employees out of a hat in a company of 250 employees. The population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.
  • 15. Systematic Sampling – Systematic sampling begins at a random starting point within the population and uses a fixed, periodic interval to select items for a sample. The sampling interval is calculated as the population size divided by the sample size. Despite the sample population being selected in advance, systematic sampling is still considered random if the periodic interval is determined beforehand and the starting point is random. – Systematic sampling is simpler and more straightforward than random sampling. It can also be more conducive to covering a wide study area. On the other hand, systematic sampling introduces certain arbitrary parameters in the data. This can cause over- or under-representation of particular patterns.
  • 16. Systematic Sampling  Because of its simplicity, systematic sampling is popular with researchers.  Other advantages of this methodology include eliminating the phenomenon of clustered selection and a low probability of contaminating data.  Disadvantages include over- or under-representation of particular patterns and a greater risk of data manipulation.
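The interval rule described above (population size divided by sample size, with a random starting point) can be sketched in Python. This is only an illustration: the function name `systematic_sample` and the employee IDs are hypothetical, and the sketch assumes the population size is at least the sample size.

```python
import random

def systematic_sample(population, sample_size, rng=None):
    """Pick every k-th item after a random start, with k = N // n."""
    rng = rng or random.Random()
    interval = len(population) // sample_size   # sampling interval k
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    start = rng.randrange(interval)             # random starting point in [0, k)
    return [population[start + i * interval] for i in range(sample_size)]

# Hypothetical data: 250 employee IDs, sample of 25, so the interval is 10.
employees = list(range(1, 251))
sample = systematic_sample(employees, 25, random.Random(0))
```

Because only the starting point is random, any periodic pattern in the ordering of the population that lines up with the interval will be over- or under-represented, which is the disadvantage the slide mentions.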
  • 17. Stratified Random Sampling  Stratified random sampling allows researchers to obtain a sample population that best represents the entire population being studied by dividing it into subgroups called strata.  This method of statistical sampling, however, cannot be used in every study design or with every data set.  Stratified random sampling differs from simple random sampling, which involves the random selection of data from an entire population, so each possible sample is equally likely to occur.
  • 18. Stratified Random Sampling – Stratified random sampling involves first dividing a population into subpopulations and then applying random sampling methods to each subpopulation to form a test group. A disadvantage is when researchers can't classify every member of the population into a subgroup. – This is different from simple random sampling, which involves the random selection of data from the entire population so that each possible sample is equally likely to occur. In contrast, stratified random sampling divides the population into smaller groups, or strata, based on shared characteristics. A random sample is taken from each stratum in direct proportion to the size of the stratum compared to the population.
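A minimal sketch of proportional stratified sampling, assuming every member of the population has already been classified into exactly one stratum. The strata names and sizes below are invented for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(strata, sample_size, rng=None):
    """Draw a simple random sample from each stratum, sized in
    proportion to the stratum's share of the whole population."""
    rng = rng or random.Random()
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding can shift the overall
        # total by one or two when the shares are not whole numbers.
        n = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population of 200 learners in three proficiency strata.
strata = {
    "beginner": [f"B{i}" for i in range(120)],
    "intermediate": [f"I{i}" for i in range(60)],
    "advanced": [f"A{i}" for i in range(20)],
}
sample = stratified_sample(strata, 20, random.Random(1))
```

With these sizes the allocation is 12, 6, and 2, mirroring the 60%, 30%, and 10% shares of the population.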
  • 19. Sampling Distribution – A sampling distribution is the probability distribution of a statistic that is arrived at through repeated sampling from a larger population. – It describes the range of possible outcomes for a statistic, such as the mean or mode of some variable, as it truly exists in a population. – The majority of data analyzed by researchers are actually drawn from samples, and not populations.
  • 20. Sampling Distribution – A sampling distribution is a probability distribution of a statistic obtained from a larger number of samples drawn from a specific population. The sampling distribution of a given population is the distribution of frequencies of a range of different outcomes that could possibly occur for a statistic of a population. – In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.
  • 21. Stage 3: Setting up Statistical Decisions • On the basis of research hypotheses, the researcher must: 1: Select the correct statistical procedures 2: Formulate statistical hypotheses 3: Select an alpha decision level
  • 22. 1: Choosing the correct statistics – The choice will be based on clear thinking about: 1: How many variables there are; 2: Which variables are dependent, independent, moderator or control variables; 3: Which scales (nominal, ordinal, or interval) are used for each. – Then the researcher will have to decide on the appropriateness of the statistics that he/she used.
  • 23. Statistical Hypotheses – We can formulate the following shorthand versions: H₀: r = 0 (r equals zero); H₁: r > 0 (r is greater than zero); H₂: r < 0 (r is less than zero); H₃: r ≠ 0 (r does not equal zero).
  • 24. Statistical Hypotheses • A population is the entire group that is of interest in a study • A sample is a subgroup taken from that population to represent it. • When calculations are made to describe a sample, they are called statistics. (page,114, Brown) • If the same calculations were actually done for the entire population, they would be called parameters. These parameters would give the best picture of what is going on in a given population.
  • 25. The Conceptual Differences Between Statistics and Parameters – A statistic and a parameter are very similar. The difference is that a statistic describes a sample, whereas a parameter describes an entire population. – We use different notation for parameters and statistics: – Statistics are usually symbolized by Roman letters (e.g., X̄ and SD for the sample mean and standard deviation). – Parameters are symbolized by Greek letters (μ and σ for the population mean and standard deviation).
  • 26. Alpha Decision Level – Typically the researcher sets alpha at 0.05. However, there are instances when the researcher may decide to use a more stringent level of alpha, e.g., 0.01. – Alpha 0.05 indicates that the researcher is willing to take up to a 5% risk of making an error (Type I error) when deciding statistical significance. – Alpha 0.01 indicates that the researcher is willing to take up to a 1% risk of a Type I error. – A Type I error occurs when a researcher rejects the null hypothesis when in fact it is true in the population.
  • 27. Do you reject or fail to reject the null hypothesis? – The decision is made by examining the p level furnished by the computer. Example: if the alpha level is set at .05, inferential statistics with p levels of .05 or less are statistically significant. When this is the case, H₀ is rejected and Hₐ is supported.
  • 28. How strong does the evidence have to be to reject the Null? – The researcher must set a criterion. This is the significance level, or alpha (α). – The conventional alpha level is .05. – We are conservative about rejecting H₀. – When testing for significance, we calculate a test statistic. – The test statistic allows us to determine the probability of obtaining our results under the assumption that H₀ is true. – If this probability is small enough, then H₀ is probably not true, so we should reject it.
  • 29. Determining Significance • If the probability is lower than our significance level, we reject H₀ (p < .05). • If the probability is not lower than our significance level, we fail to reject H₀ (p > .05). • H₀ is never “accepted” or “proven.” • Decide what p-value would be “too unlikely”. This threshold is called the alpha level. When the probability of the sample result falls below this level, the result is said to be significant. Typical alpha levels are .05 and .01.
  • 30. Determining Significance – Significance as a Probability Game – There are four possible outcomes in a significance test, based on two dimensions: – The researcher’s decision about H₀. – Whether H₀ is really true or false. – The probability of each outcome can be determined.
  • 31. Stage 4: Necessary Considerations – Four types of information must be found: 1: The observed statistics (those that were actually calculated); 2: Whether the assumptions underlying those statistics were met; 3: The degrees of freedom involved for each statistic; 4: The critical values for each statistic.
  • 32. Observed Statistics  Whether the results are a straightforward Pearson r or a complicated looking analysis of variance table based on F ratios (Chapter 11, Brown), the researcher does a lot of adding, subtracting, dividing, and multiplying to get there. Often, he/she does so with a mainframe computer and using different statistical software like the Statistical Package for the Social Sciences (1975).  The result of the calculations will be observed statistics.
  • 33. Assumptions  An assumption is a precondition that must be met for the particular statistical analysis to be accurately applied.  E.g., one of the assumptions that underlies the proper application of the Pearson product-moment correlation coefficient (r) is that each set of scores is on an interval scale. The scales involved must not be nominal or ordinal. If they are other than interval scales, other statistics may be applied.  If the data don’t meet the assumptions of the procedure perfectly, we will have only a negligible amount of error in the inferences we draw.
  • 34. Degrees of Freedom – Degrees of freedom are a quantity used primarily in statistics to help determine whether results are significant. – In the simplest case, such as a single sample of n values, the degrees of freedom (df) are simply n − 1. – The degrees of freedom can be calculated to help ensure the statistical validity of chi-square tests, t-tests and even the more advanced F-tests. These tests are commonly used to compare observed data with data that would be expected under a specific hypothesis.
  • 35. Degrees of Freedom  Because degrees of freedom calculations identify how many values in the final calculation are allowed to vary, they can contribute to the validity of an outcome. These calculations are dependent upon the sample size, or observations, and the parameters to be estimated, but generally, in statistics, degrees of freedom equal the number of observations minus the number of parameters. This means there are more degrees of freedom with a larger sample size.
  • 36. Formula for Degrees of Freedom  df = N-1 (Where N is the number of values in the data set (sample size). Take a look at the sample computation.)  If there is a data set of 4, (N=4).  Call the data set X and create a list with the values for each data. For this example data, set X includes: 15, 30, 25, 10  This data set has a mean, or average of 20. Calculate the mean by adding the values and dividing by N: (15+30+25+10)/4= 20  Using the formula, the degrees of freedom would be calculated as df = N-1: In this example, it looks like, df = 4-1 = 3  This indicates that, in this data set, three numbers have the freedom to vary as long as the mean remains 20.
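The worked example above can be checked with a few lines of Python. The helper name `last_value_given_mean` is hypothetical; it simply shows that once the mean and N − 1 of the values are fixed, the remaining value is no longer free to vary.

```python
def last_value_given_mean(free_values, mean, n):
    """Return the one value that is forced once n - 1 values and the mean are fixed."""
    return mean * n - sum(free_values)

data = [15, 30, 25, 10]
n = len(data)
mean = sum(data) / n      # (15 + 30 + 25 + 10) / 4
df = n - 1                # degrees of freedom for this data set

# Fix the first three values: the fourth must equal the original 10.
forced = last_value_given_mean(data[:3], mean, n)

# Choose any other three values: the fourth is still fully determined
# if the mean is to stay the same.
other = last_value_given_mean([5, 40, 22], mean, n)
```

Here `forced` recovers the original fourth value, and `other` shows that three freely chosen values leave exactly one possible fourth value for the same mean.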
  • 37. Critical Value  The critical value is the value that the researcher might expect to observe in the sample simply because of chance. In most cases, an observed statistic must exceed the critical value to reject the null hypothesis and thereby accept one of the alternative hypotheses.  This critical value will vary from study to study even for the same statistic because the degrees of freedom will usually vary, largely owing to differences in the size of samples.
  • 38. Stage 5: Statistical Decisions 1: Hypothesis testing (not to be confused with the common meaning of “testing”); 2: The careful interpretation of the results; 3: An awareness of the potential pitfalls for a particular statistical test.
  • 39. Hypothesis testing o The formal procedure statisticians follow to determine whether a certain hypothesis is valid or not is referred to as hypothesis testing. o By using hypothesis testing, statisticians can validate statements such as, 'This washer only needs one gallon of water to wash a large load of clothes.'  Hypothesis testing is a 4-step process: Step 1: Write the hypothesis. Step 2: Create an analysis plan. Step 3: Analyze the data. Step 4: Interpret the results.
  • 40. Interpretation of the Results  Whenever we encounter a research finding based on the interpretation of a p value from a statistical test, whether we realize it or not, we are discussing the result of a formal hypothesis test. This is true irrespective of whether the test involves comparisons of means, regression results or other types of statistical tests. As readers of research, it is important to understand the underlying principles of hypothesis testing, so that when faced with statistical results, we reach the right conclusions and make good decisions about which findings are robust enough to be translated into clinical practice.  A result is statistically significant when the p-value is less than alpha. This signifies a change was detected: that the default hypothesis can be rejected. If p- value > alpha: Fail to reject the null hypothesis (i.e. not significant result). If p- value <= alpha: Reject the null hypothesis (i.e. significant result).
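The decision rule stated above (reject when the p-value is at or below alpha, otherwise fail to reject) can be written as a small function. This is a sketch only; the function name `decide` and the returned strings are illustrative choices, and the p-value is assumed to come from whatever test was actually run.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: p <= alpha -> reject H0; p > alpha -> fail to reject H0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

decision_a = decide(0.03)               # significant at the .05 level
decision_b = decide(0.03, alpha=0.01)   # not significant at the stricter .01 level
decision_c = decide(0.20)               # not significant
```

Note that the function never returns anything like "accept H0": failing to reject the null hypothesis is not the same as proving it.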
  • 42. Stage 1: Focusing the Study – Identifying a problem – Operationalizing Variables – Research Hypotheses
  • 43. Stage 2: Sampling Stage 3: Setting up Statistical Decisions – Choosing the correct statistic – Statistical hypotheses – Alpha decision level
  • 44. Stage 4: Necessary Calculations  Observed statistics  Assumptions: 1: Independence 2: Normal Distribution 3: Interval Scales 4: Linear Relationship  Degrees of Freedom  Critical Values
  • 45. Stage 5: Statistical Decisions o Hypothesis Testing o Interpretation of Results o Potential Pitfalls 1: Restriction of Range 2: Skewedness 3: Causality
  • 46. Biserial Correlation  The biserial correlation is a correlation between on one hand, one or more quantitative variables, and on the other hand one or more binary variables. It was introduced by Pearson (1909).  The biserial correlation coefficient varies between -1 and 1. 0 corresponds to no association (the means of the quantitative variable for the two categories of the qualitative variable are identical).
  • 47. Biserial Correlation – For the two-tailed test, the null H0 and alternative Ha hypotheses are as follows:  H0 : r = 0  Ha : r ≠ 0 – In the left one-tailed test, the following hypotheses are used:  H0 : r = 0  Ha : r < 0 – In the right one-tailed test, the following hypotheses are used:  H0 : r = 0  Ha : r > 0
  • 48. Correlation Coefficient  Correlation coefficients are used to measure how strong a relationship is between two variables.  There are several types of correlation coefficient, but the most popular is Pearson’s. Pearson’s correlation (called Pearson’s R) is a correlation coefficient commonly used in linear regression.  If you’re starting out in statistics, you’ll probably learn about Pearson’s R first.  In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.
  • 49. Correlation Coefficient  Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between -1 and 1, where:  1 indicates a strong positive relationship.  -1 indicates a strong negative relationship.  A result of zero indicates no relationship at all
  • 50. Correlation Coefficient – A correlation coefficient of 1 means that for every positive increase in one variable, there is a positive increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length. – A correlation coefficient of -1 means that for every positive increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed. – Zero means that an increase in one variable is not accompanied by any consistent increase or decrease in the other. The two just aren’t related. – The absolute value of the correlation coefficient gives us the relationship strength. The larger the number, the stronger the relationship. For example, |-.75| = .75, which indicates a stronger relationship than .65.
  • 52. Kendall Tau  Kendall’s Tau is a non-parametric measure of relationships between columns of ranked data. The Tau correlation coefficient returns a value of 0 to 1, where:  0 is no relationship,  1 is a perfect relationship.  A quirk of this test is that it can also produce negative values (i.e. from -1 to 0). Unlike a linear graph, a negative relationship doesn’t mean much with ranked columns (other than you perhaps switched the columns around), so just remove the negative sign when you’re interpreting Tau.
  • 53. Kendall Tau – Several versions of Tau exist. – Tau-A and Tau-B are usually used for square tables (with equal columns and rows). Tau-B will adjust for tied ranks. Tau-C is usually used for rectangular tables. For square tables, Tau-B and Tau-C are essentially the same. – Most statistical packages have Tau-B built in, but you can use the following formula to calculate it by hand: – Kendall’s Tau = (C – D) / (C + D), where C is the number of concordant pairs and D is the number of discordant pairs.
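A minimal sketch of the hand formula (C − D) / (C + D), counting concordant and discordant pairs directly. It assumes there are no tied ranks (tied pairs are simply skipped, which is not the Tau-B tie adjustment), and the two rater rankings are invented for illustration.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau from concordant (C) and discordant (D) pairs: (C - D) / (C + D)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables order the pair the same way
        elif product < 0:
            discordant += 1      # the variables order the pair oppositely
        # product == 0 means a tie; ties are ignored in this simple sketch
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical rankings of five items by two raters.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 3, 5, 4]
tau = kendall_tau(rater_a, rater_b)   # 8 concordant, 2 discordant pairs
```

For real data with ties, a statistics package's Tau-B is the safer choice, as the slide notes.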
  • 54. Kendall W – Kendall's W (known as Kendall's coefficient of concordance) is a non- parametric statistic. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters. Kendall's W ranges from 0 (no agreement) to 1 (complete agreement).  E.g., a number of people have been asked to rank a list of political concerns, from most important to least important. Kendall's W can be calculated from these data. If the test statistic W is 1, then all the survey respondents have been unanimous, and each respondent has assigned the same order to the list of concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various responses.
  • 55. Multiple Regression – Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). – Multiple regression also allows you to determine the overall fit (variance explained) of the model and the relative contribution of each of the predictors to the total variance explained. For example, you might want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance and gender "as a whole", but also the "relative contribution" of each independent variable in explaining the variance.
  • 56. Standard Partial Regression – The standard partial regression coefficient is the number of standard deviations that Y would change for every one standard deviation change in X₁, if all the other X variables could be kept constant. – When the purpose of multiple regression is prediction, the important result is an equation containing partial regression coefficients (slopes). – The magnitude of the partial regression coefficient depends on the unit used for each variable.
  • 57. Standard Partial Regression – When the purpose of multiple regression is understanding functional relationships, the important result is an equation containing standard partial regression coefficients (the equation itself appears only as an image on the slide), where b′₁ is the standard partial regression coefficient of Y on X₁. – The magnitude of the standard partial regression coefficients tells you something about the relative importance of different variables; X variables with bigger standard partial regression coefficients have a stronger relationship with the Y variable.
  • 58. Linear Regression – Linear regression, while a useful tool, has significant limits. As its name implies, it can’t easily match any data set that is non-linear. It can only be used to make predictions that fit within the range of the training data set. And, most importantly for this article, simple linear regression can only be fit to data sets with a single dependent variable and a single independent variable. – The general form of the equation for linear regression is: y = B * x + A – where y is the dependent variable, x is the independent variable, and A and B are coefficients dictating the equation. The difference between the equation for linear regression and the equation for multiple regression is that the equation for multiple regression must be able to handle multiple inputs, instead of only the one input of linear regression.
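To make the equation y = B * x + A concrete, here is a short sketch that estimates B (slope) and A (intercept) by ordinary least squares for one independent variable. The data are invented so that the points lie exactly on a line, and `fit_line` is a hypothetical helper rather than a library call.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope B and intercept A in y = B * x + A."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # B
    intercept = mean_y - slope * mean_x    # A
    return slope, intercept

# Hypothetical data: scores are exactly 50 + 2 * hours of study.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]
B, A = fit_line(hours, scores)
predicted_at_3_5 = B * 3.5 + A   # a prediction inside the observed range of x
```

The prediction is made at x = 3.5, inside the range of the observed data, in line with the slide's caution about predicting outside it.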
  • 59. Heteroscedasticity – Heteroscedasticity is a hard word to pronounce, but it doesn't need to be a difficult concept to understand. Put simply, heteroscedasticity refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it. – A scatterplot of these variables will often create a cone-like shape, as the scatter (or variability) of the dependent variable (DV) widens or narrows as the value of the independent variable (IV) increases. The inverse of heteroscedasticity is homoscedasticity, which indicates that a DV's variability is equal across values of an IV.
  • 60. Heteroscedasticity – Plot with random data showing heteroscedasticity – In statistics, a vector of random variables is heteroscedastic (or heteroskedastic) if the variability of the random disturbance is different across elements of the vector. Variability could be quantified by the variance or any other measure of statistical dispersion. Heteroscedasticity is the absence of homoscedasticity. A typical example is the set of observations of income in different cities.
  • 61. Heteroscedasticity • The existence of heteroscedasticity is a major concern in regression analysis and the analysis of variance, as it invalidates statistical tests of significance that assume that the modelling errors all have the same variance. While the ordinary least squares estimator is still unbiased in the presence of heteroscedasticity, it is inefficient and generalized least squares should be used instead. • Because heteroscedasticity concerns expectations of the second moment of the errors, its presence is referred to as misspecification of the second order.
  • 62. Multicollinearity – Multicollinearity is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model. Multicollinearity can lead to skewed or misleading results when a researcher or analyst attempts to determine how well each independent variable can be used most effectively to predict or understand the dependent variable in a statistical model. – Multicollinearity can lead to wider confidence intervals that produce less reliable probabilities in terms of the effect of independent variables in a model. That is, the statistical inferences from a model with multicollinearity may not be dependable.
  • 63. Multicollinearity – KEY TAKEAWAYS – Multicollinearity is a statistical concept where independent variables in a model are correlated. – Multicollinearity among independent variables will result in less reliable statistical inferences. – It is better to use independent variables that are not correlated or repetitive when building multiple regression models that use two or more variables.
  • 64. Data Transformation  In statistics:  Data transformation is the application of a deterministic mathematical function to each point in a data set—that is, each data point zi is replaced with the transformed value yi = f (zi), where f is a function.  Transforms are usually applied so that the data appear to more closely meet the assumptions of a statistical inference procedure that is to be applied, or to improve the interpretability or appearance of graphs.
  • 65. Phi Coefficient – The Phi Coefficient is a measure of association between two binary variables (i.e. living/dead, black/white, success/failure). It is also called the Yule phi or Mean Square Contingency Coefficient and is used for contingency tables when: – At least one variable is a nominal variable. – Both variables are dichotomous variables. A simple contingency table. Image: Michigan Dept. of Agriculture
  • 66. Phi Coefficient – The phi coefficient is a symmetrical statistic, which means the independent and dependent variables are interchangeable. The interpretation of the phi coefficient is similar to that of the Pearson Correlation Coefficient. The range is from -1 to 1, where: – 0 is no relationship. – 1 is a perfect positive relationship: most of your data falls along the diagonal cells. – -1 is a perfect negative relationship: most of your data is not on the diagonal.
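The slides do not give a formula for phi, so the sketch below assumes the standard 2×2 definition: with cell counts a, b (first row) and c, d (second row), phi = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). The function name and the example counts are illustrative only.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 contingency table with rows (a, b) and (c, d)."""
    # The denominator is zero if any row or column total is zero;
    # phi is undefined in that case and this sketch does not handle it.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

all_on_diagonal = phi_coefficient(10, 0, 0, 10)    # every case on the diagonal
all_off_diagonal = phi_coefficient(0, 10, 10, 0)   # every case off the diagonal
evenly_spread = phi_coefficient(5, 5, 5, 5)        # no association
```

The three calls correspond to the three reference points in the interpretation above: 1, -1, and 0.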
  • 67. Point-Biserial Correlation – A point-biserial correlation is used to measure the strength and direction of the association that exists between one continuous variable and one dichotomous variable. It is a special case of the Pearson’s product-moment correlation, which is applied when you have two continuous variables, whereas in this case one of the variables is measured on a dichotomous scale. – E.g., you could use a point-biserial correlation to determine whether there is an association between salaries, measured in dollars, and gender (i.e., your continuous variable would be "salary" and your dichotomous variable would be "gender", which has two categories: "males" and "females").
  • 68. Spearman rho Spearman Rank Correlation  The Spearman rank correlation coefficient, rs, is the nonparametric version of the Pearson correlation coefficient. Your data must be ordinal, interval or ratio. Spearman’s returns a value from -1 to 1, where: +1 = a perfect positive correlation between ranks -1 = a perfect negative correlation between ranks 0 = no correlation between ranks.
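  • 69. Spearman rho Spearman Rank Correlation – The formula for the Spearman rank correlation coefficient when there are no tied ranks is: rₛ = 1 − (6 Σdᵢ²) / (n(n² − 1)), where dᵢ is the difference between the two ranks of observation i and n is the number of observations.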
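A brief sketch of the no-tied-ranks computation, assuming the inputs are already ranks (1 to n, no ties). The function name and the example rankings are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between paired ranks."""
    n = len(ranks_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks assigned to five students on two tests.
test_1 = [1, 2, 3, 4, 5]
test_2 = [2, 1, 3, 5, 4]
rho = spearman_rho(test_1, test_2)            # sum of d^2 is 4
rho_reversed = spearman_rho(test_1, [5, 4, 3, 2, 1])
```

When ties are present this shortcut no longer holds exactly, and the coefficient is usually computed as Pearson's r on the ranks instead.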
  • 70. Tetrachoric Correlation – Tetrachoric correlation is used to measure rater agreement for binary data. Binary data is data with two possible answers, usually right or wrong. The tetrachoric correlation estimates what the correlation would be if measured on a continuous scale. It is used for a variety of reasons, including analysis of scores in Item Response Theory (IRT) and converting comorbidity statistics to correlation coefficients. This type of correlation has the advantage that it’s not affected by the number of rating levels, or the marginal proportions for rating levels.
  • 71. Tetrachoric Correlation – The term “tetrachoric correlation” comes from the tetrachoric series, a numerical method used before the advent of computers. While it’s more common to estimate correlations with methods like maximum likelihood estimation, there is a basic formula you can use.
  • 72. The two main assumptions are:  The underlying variables come from a normal distribution. With only two variables, this is impossible to test. You should, therefore, have a good theoretical reason for using this particular type of correlation; in other words, you might know that the type of data you are dealing with tends to follow a normal distribution most of the time. Rating errors should also follow a normal distribution.  There is a latent continuous scale underneath your binary data. In other words, the trait you are measuring should be continuous and not discrete.  In addition, you may want to make sure that errors are independent between raters and cases and the variance for errors is homogeneous across levels of the independent variable.
  • 73. Curvilinear – Curvilinear regression analysis fits curves to data instead of the straight lines you see in linear regression. Technically, it’s a catch-all term for any regression that involves a curve, for example, quadratic regression and cubic regression. About the only type that isn’t included in this catch-all definition is simple linear regression.
  • 74. Standard Error of Estimate (SEE) – A linear regression gives us a best-fit line for a scatterplot of data. The standard error of estimate (SEE) is one of the metrics that tells us about the fit of the line to the data. The SEE is the standard deviation of the errors (or residuals). – The standard error of estimate tells you approximately how large the prediction errors (residuals) are for your data set, in the same units as Y. How well can you predict Y? The answer is, to within about Se above or below. – Since you usually want your forecasts and predictions to be as accurate as possible, you would be glad to find a small value for Se. You can interpret Se as a standard deviation in the sense that, if you have a normal distribution for the prediction errors, then you will expect about two-thirds of the data points to fall within a distance Se either above or below the regression line.
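A hedged sketch of the SEE as the standard deviation of the residuals. The slide does not state the divisor; this example assumes the common convention for simple linear regression of dividing the sum of squared residuals by n − 2 (two estimated coefficients). The observed and predicted values are invented, and the predictions are taken as given rather than fitted here.

```python
import math

def standard_error_of_estimate(observed, predicted, n_params=2):
    """Square root of (sum of squared residuals / (n - n_params)).
    n_params=2 assumes a line with a slope and an intercept."""
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_params))

# Hypothetical observed Y values and the values some regression line predicted.
observed = [3, 5, 7, 10]
predicted = [3, 5, 8, 9]
se = standard_error_of_estimate(observed, predicted)   # residuals: 0, 0, -1, 1
```

The result is in the same units as Y, which is what lets it be read as "predictions are typically off by about this much".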
  • 75. Pearson r – In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a measure of the linear correlation between two variables. – Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
  • 76. Pearson r – The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables. – More generally, note that (Xᵢ − X̄)(Yᵢ − Ȳ) is positive if and only if Xᵢ and Yᵢ lie on the same side of their respective means. Thus the correlation coefficient is positive if Xᵢ and Yᵢ tend to be simultaneously greater than, or simultaneously less than, their respective means. The correlation coefficient is negative (anti-correlation) if Xᵢ and Yᵢ tend to lie on opposite sides of their respective means. Moreover, the stronger either tendency is, the larger the absolute value of the correlation coefficient.
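The definition above (covariance divided by the product of the standard deviations) can be sketched as follows. The common factor of n cancels between numerator and denominator, so the code works with sums of deviations directly. The function name and data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of products of deviations from the means, divided by
    the square root of the product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each product is positive when X_i and Y_i fall on the same side
    # of their respective means, negative when on opposite sides.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

r_perfect = pearson_r([1, 2, 3], [2, 4, 6])         # points on a rising line
r_inverse = pearson_r([1, 2, 3], [6, 4, 2])         # points on a falling line
r_partial = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])   # mostly, not fully, aligned
```

The third example gives a value between 0 and 1 because two of the four points swap order relative to a perfect line.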
  • 77. One-Tailed Decisions One-tailed test. Although this picture is shaded on the left, its mirror image (i.e. where it’s shaded on the right) would also be a one-tailed test.
  • 78. One-Tailed Decisions – A one-tailed test requires a smaller sample size to detect the same effect with the same power. – A one-tailed test is a statistical test in which the critical area of a distribution is one-sided, so that it is either greater than or less than a certain value, but not both. If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. – A one-tailed test is also known as a directional hypothesis or directional test.
  • 79. One-Tailed Decisions – A one-tailed test is a statistical hypothesis test set up to show that the sample mean would be higher or lower than the population mean, but not both. – When using a one-tailed test, the analyst is testing for the possibility of the relationship in one direction of interest, and completely disregarding the possibility of a relationship in the other direction. – Before running a one-tailed test, the analyst must set up a null hypothesis and an alternative hypothesis and establish a significance level (alpha), against which the resulting probability value (p-value) is compared.
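The one-tailed decision described on these slides can be sketched as follows. This is a hedged example that assumes a z-test with a known population standard deviation, which the slides do not specify; the function name `one_tailed_z_test`, its parameters, and the numbers are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05, direction="greater"):
    """One-tailed z-test sketch (hypothetical helper, known population SD assumed).

    Returns the z statistic, the one-sided p-value, and whether H0 is rejected.
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    if direction == "greater":
        # Critical area is the right tail only.
        p_value = 1 - std_normal.cdf(z)
    elif direction == "less":
        # Critical area is the left tail only.
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return z, p_value, p_value < alpha


# Made-up numbers: sample of 36 with mean 105, population mean 100, SD 15.
z, p_right, reject_right = one_tailed_z_test(105, 100, 15, 36, direction="greater")
_, p_left, reject_left = one_tailed_z_test(105, 100, 15, 36, direction="less")
```

With these numbers the right-tailed test rejects the null hypothesis, while the left-tailed test on the same data does not, which illustrates how a one-tailed test disregards an effect in the untested direction.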
  • 80. Two-Tailed Decisions – In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance. If the test statistic computed from the sample falls into either of the critical areas, the null hypothesis is rejected in favor of the alternative hypothesis. The two-tailed test gets its name from testing the area under both tails of a normal distribution, although the test can also be used with non-normal distributions.
  • 81. Two-Tailed Decisions – In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values. – It is used in null-hypothesis testing and testing for statistical significance. – If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis. – By convention, two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.
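The two-tailed rule, with the 5% level split into 2.5% in each tail, can be sketched the same way. As before, this assumes a z-test with a known population standard deviation; `two_tailed_z_test` and the data are hypothetical and only meant to illustrate the decision rule.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test sketch (hypothetical helper, known population SD assumed).

    Returns the z statistic, the critical value, the two-sided p-value,
    and whether H0 is rejected.
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    std_normal = NormalDist()
    # alpha is split evenly: alpha / 2 in each tail (2.5% per tail at the 5% level).
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-sided p-value: probability of a result at least this extreme in either tail.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Made-up numbers: population mean 100, SD 15, sample size 36.
z_a, crit, p_a, reject_a = two_tailed_z_test(105.0, 100, 15, 36)  # z is about 2.0
z_b, _, p_b, reject_b = two_tailed_z_test(104.5, 100, 15, 36)     # z is about 1.8
```

The second sample (z of about 1.8) would be significant in a right-tailed test at the 5% level but is not significant here, because the two-tailed critical value is roughly 1.96 rather than roughly 1.645.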
  • 82. Summary  State your hypotheses: You are not attempting to prove your alternative hypotheses. You are testing the null hypothesis. If you reject the null hypothesis, then you are left with support for the alternative(s).  Set your decision criteria. Your alpha level will tell you what to decide. Reject the null hypothesis. Fail to reject the null hypothesis.  Describe the data you collected from the sample Inferential Statistics; Making inferences about the population from the data collected from the sample; Generalize results from study to the population.