BUS308 Week 4 Lecture 1
Examining Relationships
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. Issues around correlation
2. The basics of Correlation analysis
3. The basics of Linear Regression
4. The basics of the Multiple Regression
Overview
Often in our detective shows when the clues are not providing a
clear answer – such as
we are seeing with the apparent continuing contradiction
between the compa-ratio and salary
related results – we hear the line “maybe we need to look at this
from a different viewpoint.”
That is what we will be doing this week.
Our investigation changes focus a bit this week. We started the
class by finding ways to
describe and summarize data sets – finding measures of the
center and dispersion of the data with
means, medians, standard deviations, ranges, etc. As interesting
as these clues were, they did not
tell us all we needed to know to solve our question about equal
work for equal pay. In fact, the
evidence was somewhat contradictory depending upon what
measure we focused on. In Weeks 2
and 3, we changed our focus to asking questions about
differences and how important different
sample outcomes were. We found that all differences were not
important, and that for many
relatively small result differences we could safely ignore them
for decision making purposes –
they were due to simple sampling (or chance) errors. We found
that this idea of sampling error
could extend into work and individual performance outcomes
observed over time; and that over-
reacting to such differences did not make much sense.
Now, in our continuing efforts to detect and uncover what the
data is hiding from us, we
change focus again as we start to find out why something
happened, what caused the data to act
as it did; rather than merely what happened (describing the data
as we have been doing). This
week we move from examining differences to looking at
relationships; that is, if some measure
changes does another measure change as well? And, if so, can
we use this information to make
predictions and/or understand what underlies this common
movement?
Our tools in doing this involve correlation, the measurement of
how closely two
variables move together; and regression, an equation showing
the impact of inputs on a final
output. A regression is similar to a recipe for a cake or other
food dish; take a bit of this and
some of that, put them together, and we get our result.
Correlation
We have seen correlations a lot, and probably have even used
them (formally or
informally). We know, for example, that all other things being
equal, the more we eat, the more we weigh. Kids, up to the early
teens, grow taller the older they
get. If we consistently speed,
we will get more speeding tickets than those who obey the
speed limit. The more efforts we put
into studying, the better grades we get. All of these are
examples of correlations.
Correlations exist in many forms. A somewhat specialized
correlation was the Chi Square contingency test (for multi-row,
multi-column tables) we looked at last week: if we find the
distributions differ, then we say that the variables are
related/correlated. This correlation would run from 0 (no
correlation) through positive values (the larger the value, the
stronger the relationship).
Probably the most commonly used correlation is the Pearson
Correlation Coefficient,
symbolized by r. It measures the strength of the association –
the extent to which measures change
together – between interval or ratio level measures. Excel’s Fx
Correl, and the Data Analysis
Correlation both produce Pearson Correlations.
Most correlations that we are familiar with show both the
direction (direct or inverse) as
well as the strength of the relationship, and run from -1.0 (a
perfect inverse correlation) through 0 (no correlation) to +1.0
(a perfect direct correlation). A direct correlation is positive;
that is, both variables move in the same direction,
variables move in the same direction,
such as weight and height for kids. An inverse, or negative,
correlation has variables moving in
different directions. For example, the number of hours you sleep
and how tired you feel; the
more hours, the less tired while the fewer hours, the more tired.
The strength of a correlation is shown by the value (regardless
of the sign). For example,
a correlation of +.78 is just as strong as a correlation of -.78;
the only difference is the direction
of the change. If we graphed a +.78 correlation the data points
would run from the lower left to
the upper right and somewhat cluster around a line we could
draw thru the middle of the data
points. A graph of a -.78 correlation would have the data points
starting in the upper left and run
down to the lower right. They would also cluster around a line.
Correlations below an absolute value (when we ignore the plus
or minus sign) of around
.70 are generally not considered to be very strong. The reason
for this is the coefficient of determination (CD). This equals
the square of the correlation and shows the amount of shared
variation between the two variables. Shared variation can be
roughly considered the reason that both variables move as they do
when one changes. The more shared variation, the better one
variable can be used to predict the other. If we square .70 we
get .49, or about 50% of the variation being shared. Anything
less is too weak a relationship to be of much help.
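As a quick numeric sketch (outside Excel, which the lecture uses), the Pearson r and the CD can be computed with Python's numpy. The study-hours and exam-score values below are invented for illustration.

```python
# A minimal sketch of the Pearson correlation and the coefficient of
# determination (CD). The data are invented; numpy's corrcoef stands
# in for Excel's CORREL function here.
import numpy as np

hours_studied = np.array([2, 4, 5, 7, 8, 10])
exam_scores = np.array([65, 70, 74, 80, 85, 92])

r = np.corrcoef(hours_studied, exam_scores)[0, 1]  # Pearson r
cd = r ** 2  # coefficient of determination: shared variation

print(f"r = {r:.2f}, CD = {cd:.2f}")  # a strong direct correlation
```

A CD above .50 (r above roughly .70 in absolute value) is the rough threshold the lecture describes for a practically useful relationship.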
Students often feel that a correlation shows a “cause-and-effect”
relationship; that is,
changes in one thing “cause” changes in the other variable. In
some cases, this is true – height
and weight for pre-teens, weight and food consumption, etc. are
all examples of possible cause-and-effect relationships; but we
can argue that even with these
there are other variables that
might interfere with the outcomes. And, in research, we cannot
say that one thing causes or
explains another without having a strong correlation present.
However, just as our favorite detectives find what they think is
a cause for someone to
have committed the crime, only to find that the motive did not
actually cause that person to
commit the crime; a correlation does not prove cause-and-
effect. An example of this is the
example the author heard in a statistics class of a perfect +1.00
correlation found between the
barrels of rum imported into the New England region of the
United States between the years of
1790 and 1820 and the number of churches built each year. If
this correlation showed a cause-
and-effect relationship, what does it mean? Does rum drinking
(the assumed result of importing
rum) cause churches to be built? Does the building of churches
cause the population to drink
more rum?
As tempting as each of these explanations is, neither is
reasonable – there is no theory or
justification to assume either is true. This is a spurious
correlation – one caused by some other,
often unknown, factor. In this case, the culprit is population
growth. During these years – many
years before Carrie Nation’s crusade against Demon Rum – rum
was the common drink for
everyone. It was even served on the naval ships of most
nations. And, as the population grew,
so did the need for more rum. At the same time, churches in the
region could only hold so many
bodies (this was before mega-churches that held multiple
services each Sunday); so, as the
population got too large to fit into the existing churches, new
ones were needed.
At times, when a correlation makes no sense we can find an
underlying variable fairly
easily with some thought. At other times, it is harder to figure
out, and some experimentation is
needed. The site http://www.tylervigen.com/spurious-correlations
is an interesting website devoted to spurious correlations; take
a look and see if you can explain them.
Regression
Linear. Even if the correlation is spurious, we can often use the
data in making
predictions until we understand what the correlation is really
showing us. This is what
regression is all about. Earlier correlations between age,
height, and even weight were
mentioned. In pediatrician offices, doctors will often have
charts showing typical weights and
heights for children of different ages. These are the results of
regressions, equations showing
relationships. For example (and these values are made up for
this example), a child’s height
might be his/her initial height at birth plus an average growth
of 3.5 inches per year. If the
average height of a newborn child is about 19 inches, then the
linear regression would be:
Height = 19 inches plus 3.5 inches * age in years, or in math
symbols:
Y = a + b*x, where y stands for height, a is the intercept or
initial value at age 0
(immediate birth), b is the rate of growth per year, and x is the
age in years.
In both cases, we would read and interpret it the same way: the
expected height of a child is 19
inches plus 3.5 inches times its age. For a 12-year-old, this
would be 19 + 3.5*12 = 19 + 42 = 61
inches, or 5 feet 1 inch (assuming the made-up numbers are
accurate).
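The made-up height equation can be checked directly; this tiny function just encodes Y = a + b*x with a = 19 and b = 3.5.

```python
# The made-up linear regression from the text: Height = 19 + 3.5 * age.
# a (19 inches) is the intercept; b (3.5 inches) is the growth per year.
def predicted_height(age_years):
    """Predicted height in inches for a child of the given age."""
    return 19 + 3.5 * age_years

print(predicted_height(12))  # 19 + 42 = 61.0 inches
```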
Multiple. That was an example of a linear regression having
one output and a single,
independent variable as an input. A multiple regression
equation is quite similar but has several
independent input variables. It could be considered similar to a
recipe for a cake:
Cake = cake mix + 2* eggs + 1½ * cup milk + ½ * teaspoon
vanilla + 2 tablespoons* butter.
A regression equation, either linear or multiple, shows us how
“much” each factor is used in or
influences the outcome. The math format of the multiple
regression equation is quite similar to
that of the linear regression; it just includes more variables:
Y = a + b1*X1 + b2*X2 + b3*X3 + …; where a is the intercept
value when all the inputs
are 0, the various b’s are the coefficients that are multiplied by
each variable value, and
the x’s are the values of each input.
A note on how to read the math symbols in the equations. The
Y is considered the output or
result, and is often called the dependent variable as its value
depends on the other factors. The
different b’s (b1, b2, etc.) are coefficients and read b-sub-1, b-
sub-2, etc. The subscripts 1, 2, etc.
are used to indicate the different coefficient values that are
related to each of the input variables.
The X-sub-1, X-sub-2, etc., are the different variables used to
influence the output, and are called
independent variables. In the recipe example, Y would be the
quality of the cake, a would be the
cake mix (a constant as we use all of what is in the box), the
other ingredients would relate to the
b*X terms. The 2*eggs would relate to b1*X1, where b1 would
equal 2 and X1 stands for eggs,
the second input relates to the milk, etc.
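To see the form Y = a + b1*X1 + b2*X2 in action, here is a sketch that recovers known coefficients from invented data using numpy's least-squares solver. Excel's Data Analysis Regression tool does the equivalent work; the data and coefficients below are made up.

```python
# A minimal multiple regression sketch: build y from known coefficients
# (a = 10, b1 = 2, b2 = 3), then recover them with least squares.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # first independent variable
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])  # second independent variable
y = 10 + 2.0 * x1 + 3.0 * x2              # dependent variable (output)

# Design matrix: a column of 1s for the intercept, then the inputs.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

a, b1, b2 = coeffs
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")  # recovers 10, 2, 3
```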
Summary
This week we changed our focus from examining differences to
looking for relationships
– do variables change in predictable ways? Correlation lets us
see both the strength and the
direction of change for two variables. Regression allows us to
see how some variables “drive” or
explain the change in another.
Pearson’s (for interval and ratio data variables) and Spearman’s
(for rank ordered or
ordinal data variables) are the two most commonly used
correlation coefficients. Each looks at
how a pair of variables moves in predictable patterns – either
both increasing together or one
increasing as the other decreases. The correlation ranges from -
1.00 (moving in opposite
directions) to +1.00 (moving in the same direction). These are
both examples of linear
correlation – how closely the variables move in a straight line
(if graphed). Curvilinear correlations exist but are not covered
in this class.
Regression equations show the relationship between
independent (input) variables and a dependent (output) variable.
Linear regression involves a pair
of variables as seen in the linear
correlations. Multiple regression uses several input
(independent) variables for a single output
(dependent) variable.
The basic form of the regression equation is the same for both
linear and multiple
regression equations. The only difference is in the number of
inputs used. The multiple
regression equation general form is:
Y = Intercept + coefficient1 * variable1 + coefficient2 *
variable2 + etc. or
Y = A + b1*X1 + b2*X2 + …; where A is the intercept value, b
is a coefficient value, and
X is the name of a variable, and the subscripts identify different
variables.
Summary
This week we changed focus from examining differences to
examining relationships –
how variables might move in predictable patterns. This, we
found, can be done with either
correlations or regression equations.
Correlations measure both the strength (the value of the
correlation) and the direction (the
sign) of the relationship. We looked at the Pearson Correlation
(for interval and ratio level data)
and the Spearman’s Rank Order Correlation (for ordinal level
data). Both range from -1.00 (a
perfect inverse correlation where as one value increases the
other decreases) to +1.00 (a perfect
direct correlation where both values increase together). A
perfect correlation means the data
points would fall on a straight line if graphed. One interesting
characteristic of these correlations
occurs when you square the values. This produces the
Coefficient of Determination (CD), which
gives us an estimate of how much variation is in common
between the two variables. CD values
of less than .50 are not particularly useful for practical
purposes.
Regression equations provide a formula that shows us how
much influence an input
variable has on the output; that is, how much the output changes
for a given change in an input.
Regression equations are behind commonly used information such as
the relationship
between height and weight for children that doctors use to
assess our children’s development.
That would be a linear regression, Weight = constant +
coefficient*height in inches or Y = A +
b*X, where Y stands for weight, A is the constant, b is the
coefficient, and X is the height. A
multiple regression is conceptually the same but has several
inputs impacting a single output.
If you have any questions on this material, please ask your
instructor.
After finishing with this lecture, please go to the first
discussion for the week, and engage
in a discussion with others in the class over the first couple of
days before reading the second
lecture.
BUS 308 Week 3 Lecture 1
Examining Differences - Continued
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. Issues around multiple testing
2. The basics of the Analysis of Variance test
3. Determining significant differences between group means
4. The basics of the Chi Square Distribution.
Overview
Last week, we found out ways to examine differences between a
measure taken on two
groups (two-sample test situation) as well as comparing that
measure to a standard (a one-sample
test situation). We looked at the F test which let us test for
variance equality. We also looked at
the t-test which focused on testing for mean equality. We noted
that the t-test had three distinct
versions, one for groups that had equal variances, one for
groups that had unequal variances, and
one for data that was paired (two measures on the same subject,
such as salary and midpoint for
each employee). We also looked at how the 2-sample unequal t-
test could be used to use Excel
to perform a one-sample mean test against a standard or
constant value. This week we expand
our tool kit to let us compare multiple groups for similar mean
values.
A second tool will let us look at how data values are distributed
– if graphed, would they
look the same? Different shapes or patterns often mean the
data sets differ in significant ways
that can help explain results.
Multiple Groups
As interesting as comparing two groups is, often it is a bit
limiting as to what it tells us.
One obvious issue that we are missing in the comparisons made
last week was equal work. This
idea is still somewhat hard to get a clear handle on. Typically,
as we look at this issue, questions
arise about things such as performance appraisal ratings,
education distribution, seniority impact,
etc.
Some of these can be tested with the tools introduced last week.
We can see, for
example, if the performance rating average is the same for each
gender. What we couldn’t do, at
this point however, is see if performance ratings differ by
grade, do the more senior workers
perform relatively better? Is there a difference between ratings
for each gender by grade level?
The same questions can be asked about seniority impact. This
week will give us tools to expand
how we look at the clues hidden within the data set about equal
pay for equal work.
ANOVA
So, let’s start taking a look at these questions. The first tool for
this week is the Analysis
of Variance – ANOVA for short. ANOVA is often confusing
for students; it says it analyzes
variance (which it does) but the purpose of an ANOVA test is to
determine if the means of
different groups are the same! Now, so far, we have considered
means and variance to be two
distinct characteristics of data sets; characteristics that are not
related, yet here we are saying that
looking at one will give us insight into the other.
The reason is due to the way the variance is analyzed. Just as
our detectives succeed by
looking at the clues and data in different ways, so does
ANOVA. There are two key variances
that are examined with this test. The first, called Within Group
variance, is the average variance
of the groups. ANOVA assumes the population(s) the samples
are taken from have the same
variation, so this average is an estimate of the population
variance.
The second is the variance of the entire group, Between Group
Variation, as if all the
samples were from the same group. Here are exhibits showing
two situations. In Exhibit A, the
groups are close together, in fact they are overlapping, and the
means are obviously close to each
other. The Between Group variation (which would be from the
data set that starts with the
orange group on the right and ends with the gray group on the
left) is very close to the Within
Group (the average) variation for the three groups.
So, if we divide our estimate of the Between Group (overall)
variation by the estimate of
our Within Group (average) variation, we would get a value
close to 1, and certainly less than
about 1.5. Recalling the F statistic from last week, we could
guess that there is not a significant
difference in the variation estimates. (Of course, with the
statistical test we do not guess but
know if the result is significant or not.)
Look at three sample distributions in Exhibit A. Each has the
same within group
variance, and the overall variance of the entire data set is not all
that much larger than the
average of the three separate groups. This would give us an F
relatively close to 1.00.
Exhibit A: No Significant Difference with Overall Variation
[Chart: three overlapping sample distributions clustered around
the same center.]
Exhibit B: Significant Difference with Overall Variation
Now, if we look at exhibit B, we see a different situation. Here
the group distributions do
not overlap, and the means are quite different. If we were to
divide the Between Group (overall)
variance by the Within Group (average) variance we would get a
value quite a bit larger than the
value we calculated with the previous samples, probably large
enough to indicate a difference
between the within and between group variation estimates.
And, again, we would examine this F
value for statistical significance.
This is essentially what ANOVA does; we will look at how and
the output in the next
lecture. If the F statistic is statistically significant (the null
hypothesis of no difference is
rejected), then we can say that the means are different. Neat!
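As a sketch of the same logic outside Excel, scipy's `f_oneway` runs a one-way ANOVA. The three groups below are invented, with clearly different means, so the F statistic comes out large and the p-value small.

```python
# A one-way ANOVA sketch on three made-up groups. f_oneway returns the
# F statistic and its p-value; a p-value below alpha (e.g. 0.05) means
# we reject the null that all group means are equal.
from scipy import stats

group_a = [48, 50, 51, 49, 52]  # mean near 50
group_b = [55, 57, 56, 58, 54]  # mean near 56
group_c = [61, 60, 63, 62, 59]  # mean near 61

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")
```

Because the groups barely overlap (as in Exhibit B), the between-group variance estimate dwarfs the within-group estimate, and the F value is far above 1.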
So, why bother learning a new tool to test means? Why don’t
we merely use multiple t-
tests to test each pair separately? Granted, it would take more
time than doing a single test, but
with Excel that is not much of an issue. The best reason to use
ANOVA is to ensure we do not
reduce our confidence in our results. If we use an alpha of
0.05, it is essentially saying we are
95% sure we made the right decision in rejecting the null.
However, if we do even 3 t-tests on
related data, our confidence drops to the P(Decision 1 correct +
Decision 2 correct + Decision 3
correct). As we recall from week 1, the probability of three
events occurring is the product of
each event separately, or .95*.95*.95 = 0.857! And in
comparing means for 6 groups (such as
means for the different grade levels), we have 6*5/2 = 15
comparisons, which would reduce our overall
confidence that all decisions were correct to about 46%. Not very
good. Therefore, a single ANOVA
test is much better for our confidence in making the right
decision than multiple T-tests.
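The confidence-erosion arithmetic above is easy to verify directly:

```python
# With alpha = 0.05, each test is "right" with probability 0.95, and
# the probability that several independent decisions are all right is
# the product of the individual probabilities.
confidence_3_tests = 0.95 ** 3
print(round(confidence_3_tests, 3))  # 0.857

# Six groups give 6*5/2 = 15 distinct pairwise comparisons.
comparisons = 6 * 5 // 2
confidence_15_tests = 0.95 ** comparisons
print(round(confidence_15_tests, 2))  # about 0.46
```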
The hypothesis testing procedure steps are set up in a similar
fashion to what we did in
with the t-tests. There is a single approach to wording the null
and alternate hypothesis
statements with ANOVA:
Ho: All means are equal
Ha: At least one mean differs.
The reason for this is simple. No matter how many groups we
are testing, if a single mean
differs, we will reject the null hypothesis. And, it can get
cumbersome listing all possible
outcomes of one or more means differing for the alternate.
One issue remains for us if we reject the null of no differences
among the means: which
means are different? This is done by constructing what we can
call, for now, difference
intervals. A difference interval will give us a range of values
that the “real” difference between
two means could really be. Remember, since the means are
from samples, they are close
approximations to the actual population mean, which might be a
bit larger or smaller than any
given mean. These difference intervals will take into account
the possible sampling error we
have. (How we do this will be discussed in lecture 2 for this
week.)
A difference interval might be -2 to +1.8. This says that the
actual difference when we
subtract one mean from another could be any value between -2
to +1.8. Since this interval says
the difference could be 0 (meaning the means could be the
same), we would find this pair of
means to be not significantly different. If, however, our
difference range was, for example, from
+1.8 to + 3.8 (the same range but all positive values), we would
say the difference between the
means is significant as 0 is not within the range.
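The decision rule for a difference interval can be stated in a couple of lines; the intervals below are the ones from the text.

```python
# Difference-interval rule: a pair of means differs significantly only
# when 0 falls outside the interval for their difference.
def significant(lower, upper):
    """True when the difference interval excludes zero."""
    return not (lower <= 0 <= upper)

print(significant(-2.0, 1.8))  # False: the interval contains 0
print(significant(1.8, 3.8))   # True: an all-positive interval
```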
ANOVA is a very useful tool when we need to compare multiple
groups. For example,
this can be used to see if average shipping costs are the same
across multiple shippers. The
average time to fill open positions using different advertising
approaches, or the associated costs
of each, can also be tested with this technique. With our equal
pay issues, we can look at mean
equality across grades of variables such as compa-ratio, salary,
performance rating, seniority, and
even raise.
Chi Square Tests
The ANOVA test somewhat relies upon the shape of the
samples, both with our
assumption that each sample is normally distributed with an
equal variance and with their
relative relationship (how close or distant they are). In many
cases, we are concerned more with
the distribution of our variables than with other measures. In
some cases, particularly with
nominal labels, distribution is all we can measure.
In our salary question, one issue that might impact our analysis
is knowing if males and
females are distributed across the grades in a similar pattern. If
not, then whichever gender holds
more higher-level jobs would obviously have higher salaries.
While this might be an affirmative
action or possible discrimination issue, it is not an equal pay for
equal work situation.
So, again, we have some data that we are looking at, but are not
sure how to make the
decision if things are the same or not. And, just by examining
means, we cannot tell anything about how the variables are
distributed.
But, have no fear, statistics comes to our rescue! Examining
distributions, or shapes, or
counts per group (all ways of describing the same data) is done
using a version of the Chi Square
test; and, after setting up the data Excel does the work for us.
In comparing distributions, and we can do this with discrete
(such as the number of
employees in each grade) variables or continuous variables
(such as age or years of service
which can take any value within a range if measured precisely
enough) that we divide into
ranges, we simply count how many are in each group or range.
For something like the
distribution of gender by grades; simply count how many males
and females are in each grade,
simple even if a bit tedious. For something like compa-ratio,
we first set up the range values we
are interested in (such as .80 up to but not including .90, etc.),
and then count how many values
fall within each group range.
These counts are displayed in tables, such as the following on
gender distribution by
grade. The first is the distribution of employees by grade level
for the entire sample, and the
second is the distribution by gender. The question we ask for
both kinds of tables is basically the same: is the difference
enough to be statistically significant, or meaningfully different
from our comparison standard?
Grade:     A    B    C    D    E    F
Overall   15    7    5    5   12    6

Grade:     A    B    C    D    E    F
Male       3    3    3    2   10    4
Female    12    4    2    3    2    2
The answer to the question of whether the distributions are
different enough, when using
the Chi Square test, depends on the group we are comparing
the distribution with. When we
are dealing with a single row table, we need to decide what our
comparison group or distribution
is. For example, we could decide to compare the existing
distribution or shape against a claim
that the employees are spread out equally across the 6 grades
with 50/6 = 8.33 employees in each
grade. Or we could decide to compare the existing distribution
against a pyramid shape - a more
typical organization hierarchy, with the most employees at the
lower grades (A and B) and fewer
at the top; for example, 17, 10, 8, 7, 5, 3. The expected
frequency per cell does not need to be a
whole number. What is important is having some justification
for the comparison distribution
we use.
When we have multi-row tables, such as the second example
with 2 rows, the comparison
group is known or considered to be basically the average of the
existing counts. We will get into
exactly how to set this up in the next lecture. In either case,
the comparison (or “expected”) distribution needs to have the
same row and column totals as the original or actual counts.
The hypothesis claims for either chi square test are basically the
same:
Ho: Variable counts are distributed as expected (a claim of no
difference)
Ha: Variable counts are not distributed as expected (a claim that
a difference exists)
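Both versions of the test can be sketched with scipy, using the counts from the tables above; scipy's functions compute the same statistics as Excel's Chi Square tools. The equal-spread expected distribution is the 50/6-per-grade claim discussed in the text.

```python
# Two Chi Square tests on the grade-distribution counts from the text.
from scipy import stats

# Single-row (goodness of fit) test: observed grade counts vs. an
# equal spread of 50/6 employees per grade.
observed = [15, 7, 5, 5, 12, 6]
expected = [50 / 6] * 6
chi2_gof, p_gof = stats.chisquare(observed, expected)
print(f"goodness of fit: chi2 = {chi2_gof:.2f}, p = {p_gof:.4f}")

# Multi-row test: the gender-by-grade table. The expected counts are
# built automatically from the row and column totals.
table = [[3, 3, 3, 2, 10, 4],    # male counts by grade A-F
         [12, 4, 2, 3, 2, 2]]    # female counts by grade A-F
chi2_ct, p_ct, dof, expected_counts = stats.chi2_contingency(table)
print(f"contingency: chi2 = {chi2_ct:.2f}, dof = {dof}, p = {p_ct:.4f}")
```

In each case a small p-value (below the chosen alpha) rejects the null that the counts are distributed as expected.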
Comparing distributions/shapes has a lot of uses in business.
Manufacturing generally
produces parts that have some variation in key measures; we can
use the Chi Square to see if the
distribution of these differences from the specification value is
normally distributed, or if the
distribution is changing over time (indicating something is
changing – such as machine
tolerances). The author used this approach to compare the
distribution/pattern of responses to
questions on an employee opinion survey between departments
and the overall division.
Different response patterns suggested the issue was a
departmental one while similar patterns
suggested that the division “owned” the results, indicating
which group should develop ways to
improve the results.
Summary
This week we looked at two different tests, one that looks for
mean differences among
two or more groups and one that looks for differences in
patterns, distributions, or shapes in the
data set.
The Analysis of Variance (ANOVA) test uses the ratio of the
variance of the entire data set (between groups) to the average
variance of the groups (within groups) to see if
at least one mean differs. If so, the
construction of difference intervals will tell us which of the
pairs of means actually differ.
The Chi Square tests look at patterns within data sets and let us
compare them to a
standard or to each other.
Both tests are found in the Data Analysis link in Excel and
follow the same basic set-up
process as we saw with the F and t-tests last week.
If you have any questions on this material, please ask your
instructor.
After finishing with this lecture, please go to the first
discussion for the week, and engage
in a discussion with others in the class over the first couple of
days before reading the second
lecture.
BUS 308 Week 2 Lecture 1
Examining Differences - overview
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. The importance of random sampling.
2. The meaning of statistical significance.
3. The basic approach to determining statistical significance.
4. The meaning of the null and alternate hypothesis statements.
5. The hypothesis testing process.
6. The purpose of the F-test and the T-test.
Overview
Last week we collected clues and evidence to help us answer
our case question about
males and females getting equal pay for equal work. As we
looked at the clues presented by the
salary and compa-ratio measures of pay, things got a bit
confusing, with results that did not seem to
be consistent. We found, among other things, that the male and
female compa-ratios were fairly
close together with the female mean being slightly larger. The
salary analysis showed a different
view; here we noticed that the averages were apparently quite
different with the males, on
average, earning more. Contradictory findings such as this are
not all that uncommon when
examining data in the “real world.”
One issue that we could not fully address last week was how
meaningful were the
differences? That is, would a different sample have results that
might be completely different, or
can we be fairly sure that the observed differences are real and
show up in the population as
well? This issue, often referred to as sampling error, deals with
the fact that random samples
taken from a population will generally be a bit different than the
actual population parameters,
but will be “close” enough to the actual values to be valuable in
decision making.
This week, our journey takes us to ways to explore differences,
and how significant these
differences are. Just as clues in mysteries are not all equally
useful, not all differences are
equally important; and one of the best things statistics will do
for us is tell us what differences
we should pay attention to and what we can safely ignore.
Side note: this is a skill that many managers could benefit from.
Not all differences in
performances from one period to another are caused by
intentional employee actions, some are
due to random variations that employees have no control over.
Knowing which differences to
react to would make managers much more effective.
In keeping with our detective theme, this week could be
considered the introduction of
the crime scene experts who help detectives interpret what the
physical evidence means and how
it can relate to the crime being looked at. We are getting into
the support being offered by
experts who interpret details. We need to know how to use
these experts to our fullest
advantage.
Differences
In general, differences exist in virtually everything we measure
that is man-made or
influenced. The underlying issue in statistical analysis is that at
times differences are important.
When measuring related or similar things, we have two types of
differences: differences in
consistency and differences in average values. Some examples
of things that should be the
“same” could be:
• The time it takes to drive to work in the morning.
• The quality of parts produced on the same manufacturing line.
• The time it takes to write a 3-page paper in a class.
• The weight of a 10-pound bag of potatoes.
• Etc.
All of these “should” be the same, as each relates to the same
outcome. Yet, they all differ. We
all experience differences in travel time, and the time it takes to
produce the same output on the
job or in school (such as a 3-page paper). Production standards
all recognize that outcomes
should be measured within a range rather than a single point.
For example, few of us would be
upset if a 10-pound bag of potatoes weighed 9.85 pounds or
would think we were getting a great
deal if the bag weighed 10.2 pounds. We realize that it is
virtually impossible for a given
number of potatoes to weigh exactly the same and we accept
this as normal.
One reason for our acceptance is that we know that variation
occurs. Variation is simply
the differences that occur in things that should be “the same.”
If we can measure things with
enough detail, everything we do in life has variation over time.
When we get up in the morning,
how long it takes to get to work, how effective we are at doing
the same thing over and over, etc.
Except for physical constants, we can say that things differ and
we need to recognize this. A side
note: variation exists in virtually everything we study (we have
more than one language, word,
sentence, paragraph, past actions, financial transactions, etc.),
but only in statistics do we bring
this idea front and center for examination.
This suggests that any population that we are interested in will
consist of things that are
slightly different, even if the population contains only one
“thing.” Males are not all the same,
neither are females. Manufactured parts differ in key
measurements; this is the reason we have
quality control checking to make sure the differences are not
too large. So, even if we measure
everything in our population we will have a mean that is
accompanied by a standard deviation
(or range). Managers and professionals need to manage this
variation, whether it is quantitative
(such as salary paid for similar work) or even qualitative (such
as interpersonal interactions with
customers).
The second reason that we are so concerned with differences is
that we rarely have all the
evidence, or all the possible measures of what we are looking
for. Having this would mean we
have access to the entire population (everything we are
interested in); rarely is this the case.
Generally, all decisions, analysis, research, etc. are done with
samples, a selected subset of the
population. And, with any sample we are not going have all the
information needed, obviously;
but we also know that each sample we take is going to differ a
bit. (Remember, variation is
everywhere, including in the consistency of sample values.) If
you are not sure of this, try
flipping a coin 10 times and repeating this for 10 trials; do you
expect to get the exact same
number of heads in each trial? Variation!
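The coin-flip experiment can be sketched in a few lines of Python (the course itself uses Excel; this is just an illustration with a fixed seed so the result is repeatable):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Ten trials of ten fair-coin flips each: count the heads in every trial.
trials = [sum(random.random() < 0.5 for _ in range(10)) for _ in range(10)]
print(trials)  # the head counts differ from trial to trial -- variation!
```

Run it and you will see the per-trial head counts bounce around even though every trial uses the same fair coin.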
Since we are making decisions using samples, we have even
more variation to consider
than simply that with the population we are looking at. Each
sample will be slightly different
from its population and from others taken from the same
population.
How do we make informed decisions with all this variation and
our not being able to
know the “real” values of the measures we are using? This
question is much like how detectives
develop the “motive” for a crime – do they know exactly how
the guilty party felt/thought when
they say “he was jealous of the success the victim had.” This
could be true, but it is only an
approximation of the true feelings, but it is “close enough” to
say it was the reason. It is similar
with data samples, good ones are “close enough” to use the
results to make decisions with. The
question we have now focuses on how do we know what the
data results show?
The answer lies with statistical tests. They can use the
observed variation to provide
results that let us make decisions with a known chance of being
wrong! Most managers hope to
be right just over 50% of the time, while a statistical decision can be
correct 95% or more of the time!
Quite an improvement.
Sampling. The use of samples brings us to a distinction in
summary statistics, between
descriptive and inferential statistics. With one minor exception
(discussed shortly), these two
appear to be the same: means, standard deviations, etc.
However, one very important distinction
exists in how we use these. Descriptive statistics, as we saw
last week, describes a data set. But,
that is all they do. We cannot use them to make claims or
inferences about any other larger
group.
Making inferences or judgements about a larger population is
the role of inferential
statistics and statistical tests. So, what makes descriptive
statistics sound enough to become
inferential statistics? The group they were taken from! If we
have a sample that is randomly
selected from the population (meaning that each member has the
same chance of being selected
at the start), then we have our best chance of having a sample
that accurately reflects the
population, and we can use the statistics developed from that
sample to make inferences back to
the population. (How we develop a randomly selected sample is
more of a research course issue,
and we will not go into these details. You are welcome to
search the web for approaches.)
Random Sampling. If we are not working with a random
sample, then our descriptive
statistics apply only to the group they are developed for. For
example, asking all of our friends
their opinion of Facebook only tells us what our friends feel; we
cannot say that their opinions
reflect all Facebook users, all Facebook users that fall in the
age range of our friends, or any
other group. Our friends are not a randomly selected group of
Facebook users, so they may not
be typical; and, if not typical users, cannot be considered to
reflect the typical users.
If our sample is random, then we know (or strongly suspect) a
few things. First, the
sample is unlikely to contain both the smallest and largest value
that exists in the larger
population, so an estimate of the population variation is likely
to be too small if based on the
sample. This is corrected by using a sample standard deviation
formula rather than a population
formula. We will look at what this means specifically in the
other lectures this week; but Excel
will do this for us easily.
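Python's standard library makes this correction visible (a minimal sketch with invented sample values; in Excel the same pair of formulas is STDEV.P versus STDEV.S):

```python
import statistics

data = [23.0, 25.5, 24.1, 26.3, 22.8]  # hypothetical sample values

pop_sd = statistics.pstdev(data)   # population formula: divides by n
samp_sd = statistics.stdev(data)   # sample formula: divides by n - 1

# The sample formula always gives the larger value, correcting the
# tendency of a sample to understate the population's spread.
print(pop_sd, samp_sd)
```

The sample version inflates the estimate slightly, which offsets the fact that a sample rarely captures the population's extreme values.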
Second, we know that our summary statistics are not the same
as the population’s
parameter values. We are dealing with some (generally small)
errors. This is where the new
statistics student often begins to be uncomfortable. How can we
make good judgements if our
information is wrong? This is a reasonable question, and one
that we, as data detectives, need to
be comfortable with.
The first part of the answer falls with the design of the sample,
by selecting the right
sample size (how many are in the sample), we can control the
relative size of the likely error.
For example, we can design a sample where the estimated error
for our average salary is about
plus or minus $1,000. Does knowing that our estimates could
be $1000 off change our view of
the data? If the female average was a thousand dollars more
and the male salary was a thousand
dollars less, would you really change your opinion about them
being different? Probably not
with the difference we see in our salary values (around 38K
versus 52K). If the actual averages
were closer together, this error range might impact our
conclusions, so we could select a sample
with a smaller error range. (Again, the technical details on how
to do this are found in research
courses. For our statistics class, we assume we have the correct
sample.)
Note, this error range is often called the margin of error. We
see this most often in
opinion polls. For example, if a poll said that the percent of
Americans who favored Federal
Government support for victims of natural disasters (hurricanes,
floods, etc.) was 65% with a
margin of error of +/- 3%; we would say that the true proportion
was somewhere between 62% and 68%, clearly a majority of the
population. Where the margin of
error becomes important to
know is when results are closer together, such as when support
is 52% in favor versus 48%
opposed, with a margin of error of 3%. This means the actual
support could be as low as 49% or
as high as 55%; meaning the results are generally too close to
make a solid decision that the issue
is supported by a majority, the proverbial “too close to call.”
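The arithmetic behind both poll examples is simple enough to write out (a sketch; the function name is ours, not a standard one):

```python
# Turn a poll percentage and its margin of error into the interval
# of plausible population values described above.
def poll_interval(pct, moe):
    return (pct - moe, pct + moe)

low, high = poll_interval(65, 3)
print(low, high)    # 62 68 -- clearly a majority either way

low2, high2 = poll_interval(52, 3)
print(low2, high2)  # 49 55 -- straddles 50%, "too close to call"
```

When the interval sits entirely above 50% we can call a majority; when it straddles 50%, we cannot.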
The second part of answering the question of how do we make
good decisions introduces
the tools we will be looking at this week, decision making
statistical tests that focus on
examining the size of observed differences to see if they are
“meaningful” or not. The neat part
of these tools is we do not need to know what the sampling
error was, as the techniques will
automatically include this impact into our results!
The statistical tools we will be looking at for the next couple of
weeks all “work” due to a
couple of assumptions about the population. First, the data
needs to be at the interval or ratio
level; the differences between sequential values needs to be
constant (such as in temperature or
money). Additionally, the data is assumed to come from a
population that is normally
distributed, the normal curve shape that we briefly looked at
last week. Note that many
statisticians feel that minor deviations from these strict
assumptions will not significantly impact
the outcomes of the tests.
The tools for this week and next use the same basic logic. If we
take a lot of samples
from the population and graph the mean for all of them, we will
get a normal curve (even if the
population is not exactly normal) distribution called the
sampling distribution of the mean.
Makes sense as we are using sample means. This distribution
has an overall, or grand, mean
equal to that of the population. The standard deviation equals
the standard deviation of the
population divided by the square root of the sample size. (Let’s
take this on faith for now, trust
me you do not want to see the math behind proving these. But
if you do, I invite you to look it
up on the web.) Now, knowing – in theory – what the mean
values will be from population
samples, we can look at how any given sample differs from
what we think the population mean
is. This difference can be translated into what is essentially a
z-score (although the specific
measure will vary depending upon the test we are using) that we
looked at last week. With this
statistic, we can determine how likely (the probability of)
getting a difference as large or larger
than we have purely by chance (sampling error from the actual
population value) alone.
If we have a small likelihood of getting this large of a
difference, we say that our
difference is too large to have been purely a sampling error, and
we say a real difference exists or
that the mean of the population that the sample came from is not
what we thought.
That is the basic logic of statistical testing. Of course, the
actual process is a bit more
structured, but the logic holds: if the probability of getting our
result is small (for example 4% or
0.04), we say the difference is significant. If the probability is
large (for example 37% or 0.37),
then we say there is not enough evidence to say the difference is
anything but a simple sampling
error difference from the actual population result.
The tools we will be adding to our bag of tricks this week will
allow us to examine
differences between data sets. One set of tools, called the t-
test, looks at means to see if the
observed difference is significant or merely a chance difference
due mostly to sampling error
rather than a true difference in the population. Knowing if
means differ is a critical issue in
examining groups and making decisions.
The other tool – the F-test for variance, does the same for the
data variation between
groups. Often ignored, the consistency within groups is an
important characteristic in
understanding whether groups having similar means can be said
to be similar or not. For
example, if a group of English majors all took two classes
together, one math and one English,
would you expect the grade distributions to be similar, or would
you expect one to show a larger
range (or variation) than the other?
We will see throughout the class that consistency and
differences are key elements to
understanding what the data is hiding from us, or trying to tell
us – depending on how you look
at it. In either case, as detectives our job is to ferret out the
information we need to answer our
questions.
Hypothesis Testing-Are Differences Meaningful
Here is where the crime scene experts come in. Detectives have
found something but are
not completely sure of how to interpret it. Now the training and
tools used by detectives and
analysts take over to examine what is found and make some
interpretations. The process or
standard approach that we will use is called the hypothesis
testing procedure. It consists of six
steps; the first four (4) set up the problem and how we will
make our decisions (and are done
before we do anything with the actual data), the fifth step
involves the analysis (done with
Excel), and the final and sixth step focuses on interpreting the
result.
The hypothesis testing procedure is a standardized decision-
making process that ensures
our decisions (on whether things are significantly different or
not) are based on the data
and not on other factors. Many times, our results are more
conservative than individual
managerial judgements; that is, a statistical decision will call
fewer things significantly different
than many managerial judgement calls. This statistical
tendency is, at times, frustrating for
managers who want to show that things have changed. At other
times, it is a benefit such as if
we are hoping that things, such as error rates, have not changed.
While a lot of statistical texts have slightly different versions of
the hypothesis testing
procedure (fewer or more steps), they are essentially the same,
and are a spinoff of the scientific
method. For this class, we will use the following six steps:
1. State the null and alternate hypothesis
2. Select a level of significance
3. Identify the statistical test to use
4. State the decision rule. Steps 1 – 4 are done before we
examine the data
5. Perform the analysis
6. Interpret the result.
Step 1
A hypothesis is a claim about an outcome. It comes in two
forms. The first is the null
hypothesis – sometimes called the testable hypothesis, as it is
the claim we perform all of our
statistical tests on. It is termed the “Null” hypothesis, shown as
Ho, as it basically says “no
difference exists.” Even if we want to test for a difference,
such as males and females having a
different average compa-ratio; in statistics, we test to see if
they do not.
Why? It is easier to show that something differs from a fixed
point than it is to show that
the difference is meaningful – I mean how can we focus on
“different?” What does “different”
mean? So, we go with testing no difference. The key rule
about developing a null hypothesis is
that it always contains an equal claim, this could be equal (=),
equal to or less than (<=), or equal
to or more than (=>).
Here are some examples:
Ex 1: Question: Is the female compa-ratio mean = 1.0?
Ho: Female compa-ratio mean = 1.0.
Ex 2: Q: is the female compa-ratio mean = the male compa-
ratio mean?
Ho: Female compa-ratio mean = Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-
ratio? Note that this
question does not contain an equal condition. In this case, the
null is the opposite of what
the question asks:
Ho: Female compa-ratio <= Male compa-ratio.
We can see by testing this null, we can answer our initial
question of a directional
difference. This logic is key to developing the correct test
claim.
A null hypothesis is always coupled with an alternate
hypothesis. The alternate is the
opposite claim as the null. The alternate hypothesis is shown as
Ha. Between the two claims, all
possible outcomes must be covered. So, for our three examples,
the complete step 1 (state the
null and alternate hypothesis statements) would look like:
Ex 1: Question: Is the female compa-ratio mean = 1.0?
Ho: Female compa-ratio mean = 1.0.
Ha: Female compa-ratio mean =/= (not equal to) 1.0
Ex 2: Q: is the female compa-ratio mean = the male compa-
ratio mean?
Ho: Female compa-ratio mean = Male compa-ratio mean.
Ha: Female compa-ratio mean =/= Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-
ratio?
Ho: Female compa-ratio <= Male compa-ratio
Ha: Female compa-ratio > Male compa-ratio. (Note that in this
case, the alternate
hypothesis is the question being asked, but the null is what we
always use as the
test hypothesis.)
When developing the null and alternate hypothesis,
1. Look at the question being asked.
2. If the wording implies an equality could exist (equal to, at
least, no more than, etc.),
we have a null hypothesis and we write it exactly as the
question asks.
3. If the wording does not suggest an equality (less than, more
than, etc.), it refers to the
alternate hypothesis. Write the alternate first.
4. Then, for whichever hypothesis statement you wrote, develop
the other to contain all
of the other cases. An = null should have a =/= alternate, an =>
null should have a <
alternate; a <= null should have a > alternate, and vice versa.
5. The order the variables are listed in each hypothesis must be
the same, if we list
males first in the null, we need to list males first in the
alternate. This minimizes
confusion in interpreting results.
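Rule 4 above, pairing each null relation with the alternate that covers all remaining cases, can be sketched as a small lookup (the function and dictionary names are ours, purely for illustration; the relation symbols match the lecture's notation):

```python
# Each null-hypothesis relation paired with the alternate relation
# that covers every remaining case (rule 4 above).
ALTERNATE = {"=": "=/=", "<=": ">", "=>": "<"}

def hypotheses(subject, relation, other):
    """Build the Ho/Ha pair, keeping the variables in the same order."""
    ho = f"Ho: {subject} {relation} {other}"
    ha = f"Ha: {subject} {ALTERNATE[relation]} {other}"
    return ho, ha

print(hypotheses("Female compa-ratio mean", "<=", "Male compa-ratio mean"))
```

Running this reproduces Example 3: the null carries the equality, and the alternate is the directional question we actually asked.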
Note: the hypothesis statements are claims about the population
parameters/values based
on the sample results. So, when we develop our hypothesis
statements, we do not consider the
sample values when developing the hypothesis statements. For
example, consider our desire to
determine if the compa-ratio and salary means for males and
females are different in the
population, based on our sample results. While the compa-ratio
means seemed fairly close
together, the salary means seemed to differ by quite a bit; in
both cases, we would test if the male
and female means were equal since that is the question we have
about the values in the
population.
If you look at the examples, you can notice two distinct kinds of
null hypothesis
statements. One has only an equal sign in it, while the other
contains an equal sign and an
inequality sign (<=, but it could be =>). These two types
correspond to two different research
questions and test results.
If we are only interested in whether something is equal or not,
such as if the male average
salary equals the female average salary; we do not really care
which is greater, just if they could
be the same in the population or not. For our equal salary
question, it is not important if we find
that the male’s mean is > (greater than) the female’s mean or if
the male’s mean is < (less than)
the female’s mean; we only care about a difference existing or
not in the population. This, by the
way, is considered a two-tail test (more on this later), as either
condition would cause us to say
the null’s claim of equality is wrong: a result of “rejecting the
null hypothesis.”
The other condition we might be interested in, and we need a
reason to select this
approach, occurs when we want to specifically know if one
mean exceeds the other. In this
situation, we care about the direction of the difference. For
example, only if the male mean is
greater than the female mean or if the male mean is less than the
female mean.
Step 2
The level of significance is another concept that is critical in
statistics but is often not
used in typical business decisions. One senior manager told the
author that their role was to
ensure that the “boss’ decisions were right 50% +1 of the time
rather than 50% -1.” This
suggests that the level of confidence that the right decisions are
being made is around 50%. In
statistics, this would be completely unacceptable.
A typical statistical test has a level of confidence of about 95%
that the right decision is
being made, with a typical range from 90% to 99%. This is done
with our chosen level of
significance. For this class, we will always use the most
common level of 5%, or more
technically alpha = 0.05. This means we will live with a 5%
chance of saying a difference is
significant when it is not and we really have only a chance
sampling error.
Remember, no decision that does not involve all the possible
information that can be
collected will ever have a zero possibility of being wrong. So,
saying we are 95% sure we made
the right call is great. Marketing studies often will use an alpha
of .10, meaning they are 90%
sure when they say the marketing campaign worked. Medical
studies will often use an alpha of
0.01 or even 0.001, meaning they are 99% or even 99.9% sure
that the difference is real and not
a chance sampling error.
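A quick sketch shows how the same test result can be significant under one field's alpha but not another's (the alpha values mirror the fields just mentioned; the p-value is hypothetical):

```python
# Typical alpha choices by field, as described above.
alphas = {"marketing": 0.10, "business default": 0.05, "medical": 0.01}

p_value = 0.03  # a hypothetical test result

decisions = {field: ("reject Ho" if p_value < alpha else "fail to reject Ho")
             for field, alpha in alphas.items()}
for field, decision in decisions.items():
    print(f"{field} (alpha={alphas[field]}): {decision}")
```

A p-value of 0.03 clears the marketing and business thresholds but not the medical one, which is exactly why the alpha must be chosen before looking at the data.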
Step 3
Choosing the statistical test and test statistic depends upon the
data we have and the
question we are asking. For this week, we will be using compa-
ratio data in the examples and
salary data in the homework – both are continuous and at least
interval level data. The questions
we will look at this week will focus on seeing if there is a
difference in the average pay (as
measured by either the compa-ratio or salary) between males
and females in the population,
based on our sample results. After all, if we cannot find a
difference in our sample, should we
even be working on the question?
In the quality improvement world, one of the strategies for
looking for and improving
performance of a process is to first look at and reduce the
variation in the data. If the data has a
lot of variation, we cannot really trust the mean to be very
reflective of the entire data set.
Our first statistical test is called the F-test. It is used when we
have at least interval level
data and we are interested in determining if the variances of two
groups are significantly
different or if the observed difference is merely chance
sampling error. The test statistic for this
is the F.
Once we know if the variances are the same or not, we can
move to looking for
differences between the group means. This is done with the T-
test and the t-statistic. Details on
these two tests will be given later; for now, we just need to
know what we are looking at and
what we will be using.
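To preview what these two tests measure, here is a sketch that computes the F statistic (ratio of sample variances) and the equal-variance t statistic by hand. The compa-ratio values are invented for illustration, and the p-values the lecture relies on are left to Excel's built-in functions:

```python
import math
import statistics

# Hypothetical compa-ratio samples (values invented for illustration).
males = [1.02, 0.98, 1.10, 1.05, 0.95, 1.08]
females = [1.00, 0.97, 1.03, 0.99, 1.01, 0.96]

# F statistic: ratio of the two sample variances, larger over smaller.
v_m, v_f = statistics.variance(males), statistics.variance(females)
F = max(v_m, v_f) / min(v_m, v_f)

# t statistic (equal-variance form): the difference in means divided
# by its standard error, built from the pooled variance.
n1, n2 = len(males), len(females)
sp2 = ((n1 - 1) * v_m + (n2 - 1) * v_f) / (n1 + n2 - 2)
t = ((statistics.fmean(males) - statistics.fmean(females))
     / math.sqrt(sp2 * (1 / n1 + 1 / n2)))

print("F =", round(F, 2), " t =", round(t, 2))
```

An F near 1 suggests similar variances; a t near 0 suggests similar means. How far from those values is "too far" is what the p-value, covered in Steps 4 and 6, tells us.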
Step 4
One of the rules in researching questions is that the decision
rule, how we are going to
make our decision once the analysis is done, should be stated
upfront and, technically, even
before we even get to the data. This helps ensure that our
decision is data driven rather than
being made by emotional factors to get the outcome we want
rather than the outcome that fits the
data. (Much like making our detectives go after the suspect that
did the crime rather than the one
they do not like and want to arrest, at least when they are being
honest detectives.)
The decision rule for our class is very simple, and will always
be the same:
Reject the null hypothesis if the p-value is less than our alpha
of .05. (Note: this would
be the same as saying that if the p-value is not less than 0.05,
we would fail to reject the null
hypothesis.)
We introduced the p-value last week, it is the probability of our
outcome being as large or
larger than we have by pure chance alone. The further from the
actual mean a sample mean is,
the less chance we have of getting a value that differs from the
mean that much or more; the
closer to the actual mean, the greater our chance would be of
getting that difference or more
purely by sampling error.
Our decision rule ties our criteria for significance of the
outcome, the step 2 choice of
alpha, with the results that the statistical tests will provide (and,
the Excel tests will give us the p-
values for us to use in making the decisions).
These four steps define our analysis, and are done before we do
any analysis of the data.
Step 5
Once we know how we will analyze and interpret the results, it
is time to get our sample
data and set it up for input into an Excel statistical function.
Some examples of how this data
input works will be discussed in the third lecture for this week.
This step is fairly easy: we simply run the statistical test
identified in Step 3 on our data.
The test to use is based on our question and the related
hypothesis claims. For this week, if we
are looking at variance equality, we will use the F-test. If we
are looking at mean equality, we
will use the T-test.
Step 6
Here is where we bring everything together and interpret the
outcomes.
What is constant about this step is the need to:
1. Look at the appropriate p-value (indicated in the test outputs,
as we will see in lecture
2).
2. Compare the p-value with our value for alpha (0.05).
3. Make a decision: if the test p-value is less than (<) 0.05, we
will reject the null
hypothesis. If the test p-value is 0.05 or more, we will fail to
reject the null hypothesis.
Rejecting the null hypothesis means that we feel the alternate
hypothesis is the more
accurate statement about the populations we are testing. This is
the same for all of our statistical
tests.
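The class decision rule is mechanical enough to write out directly (a sketch; the function name is ours, and the example p-values are the 4% and 37% cases discussed earlier):

```python
# The decision rule from Step 4: reject Ho when the p-value falls
# below the chosen alpha of 0.05.
ALPHA = 0.05

def interpret(p_value):
    if p_value < ALPHA:
        return "reject Ho -- the difference is statistically significant"
    return "fail to reject Ho -- the difference may be sampling error"

print(interpret(0.04))  # the 4% example: significant
print(interpret(0.37))  # the 37% example: not significant
```

The only judgment left after Step 4 is translating "reject" or "fail to reject" back into plain English about the original question.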
Once we have made our decision to reject or fail to reject the
null hypothesis, we need to
close the loop, and go back and answer our original question.
We need to take the statistical
result of rejecting or failing to reject the null and turn it into an
“English” …
BUS308 – Week 1 Lecture 1
Statistics
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. The basic ideas of data analysis.
2. Key statistical concepts and terms.
3. The basic approach for this class.
4. The case focus for the class.
What we are all about
Data, measurements, counts, etc., is often considered the
language of business. However, it also plays an important role
in our personal lives. Data, or more accurately, the
analysis of data answers our questions. These may be business
related or personal. Some questions we may have heard that
require data to answer include:
1. On average, how long does it take you to get to work? Or,
alternately, when do you have to leave to get to work on time?
2. For budget purposes, what is the average expense for
utilities, food, etc.?
3. Has the quality rejection rate on production Line 3 changed?
4. Did the new attendance incentive program reduce the
tardiness for the department?
5. Which vendor has the best average price for what we order?
6. Which customers have the most complaints about our
products?
7. Has the average production time decreased with the new
process?
8. Do different groups respond differently to an employee
questionnaire?
9. What are the chances that a customer will complain about or
return a product?
Note that all of these very reasonable questions require that we
collect data, analyze it, and reach some conclusion based upon
that result.
Making Sense of Data
This class is about ways to turn data sets, lots of raw numbers,
into information that we can use. This may include simple
descriptions of the data with measures such as average, range,
high and low values, etc. It also includes ways to examine the
information within the data set so that we can make decisions,
identify patterns, and identify existing relationships. This is
often called data analysis; some courses discuss this approach
with the term “data-based decision making.” During this class
we will focus on the logic of analyzing data and interpreting
these results.
What this class is not
This class is not a mathematics course. I know, it is called
statistics and it deals with numbers, but we do not focus on
creating formulas or even doing calculations. Excel will do all
of the calculations for us; for those of you who have not used
Excel before, and even for some who have, you will be
pleasantly surprised at how powerful and relatively easy to use
it is.
It is also not a class in collecting the data. Courses in research
focus on how to plan on collecting data so that it is fair and
unbiased. Statistics deals with working on the data after it has
been collected.
Class structure
There are two main themes to this class. The first focuses on
interpreting statistical outcomes. When someone says, the result
is statistically significant with a p-value of 0.01; we need, as
professionals, to know what it means. As you move higher into
business and other professional positions, you will probably
hear others report on studies using this kind of language. (Data
analysis is becoming increasingly common in business.)
The second thread focuses on how to take some data and
generate statistical reports using Excel. Excel is a fairly
common PC program that is part of Microsoft’s Office suite of
tools, and as such many businesses have it available for
professionals and managers. Even if you just do a quick analysis
of some data, this program is tremendously useful.
This class does not have a text, but rather provides the material
you need in three lectures each week. The first lecture is an
overview, it provides a structure about what the week’s focus is
all about. The second lecture focuses on understanding the
statistical tools being presented; how to read the outputs and
how to understand and interpret what they are telling us. The
third lecture for each week focuses on Excel and presenting the
steps needed to generate the statistical output.
Unlike other classes, we have three weekly discussions; one
related to each of the lecture segments. The intent is for you to
read a lecture and then go to the discussion thread for a couple
of days. Then go read the next lecture, discuss it for a couple of
days, and then finish with the last lecture. This chunking of
material is designed to let the information “sink in” before
moving to new things.
Introducing Statistical Analysis
Data analysis
Data analysis, whether statistical, financial, operational, etc.,
often appears to be a set of unrelated tools and techniques that
are somehow applied sequentially to get an answer to some
question. Indeed, most textbooks present statistical analysis this
way; introduce a topic, provide some examples, present practice
exercises, and then on to the next topic with new examples and
exercises that often have nothing to do with what was
previously presented.
This approach, while common in many numerical and even
qualitative courses, often leaves students with an incomplete
idea of how everything fits together. We are trying a different
approach in this class and will be using a single case/situation
to demonstrate the interconnectedness of our tools.
Data analysis, and particularly statistical analysis, is much like
solving a mystery. Those who work with these tools are like the
detectives we see on TV shows. In general, the process involves
a situation (or crime) presents itself and the team goes to work.
Initially, they look at the “big picture” to try and understand the
basics of the situation. After that, the focus shifts to specific
details as they examine suspects, look for and verify alibis, find
links between different individuals and activities; often this part
of the investigation seems uncoordinated and even a bit chaotic
with little obvious links to the overall situation. But, finally
everything is pulled together and all the various threads form a
conclusion and they “nab the culprit.”
So, to tie what the TV detectives do with what we, as data
analysts, will be doing, take a look at the following. Hopefully,
this will relate the familiar crime story to data analysis.
· The “crime” we focus on presents itself as some outcome;
results of a manufacturing process, customer satisfaction ratings
differences, financial outcomes, etc.; that we do not fully
understand.
· The “witnesses” we look at are the different data
measurements we have.
· Our “questions” are just that – questions about what we want
to find out from the data.
· Our “theory of the crime” focuses on how we think the data is
related to our questions.
· The “alibis” are the data summaries and test outcomes that
show if particular data is related
to the outcome or not.
· The “false leads” are data measures that are not actually
helpful in answering our questions.
· The "person(s) of interest" or suspects are the specific measurements or counts that could influence pay levels. These include grade level, seniority, performance appraisal rating, gender, raise, and education/degree level.
· And, finally, the “guilty party” is the data that is related to
any illegal pay discrepancies we
uncover.
Just as with any of our favorite shows, we need to take all of
these elements and work through them to come up with the
answers to our questions; and, often, an understanding of why
the issue exists at all.
The Crime
In this course, we will have a single “crime” to focus on. This
issue will form the basis for the lectures each week and the
assignments. We will be looking at a Human Resources issue:
are males and females getting equal pay for doing equal work?
As background, The Federal Equal Pay Act requires companies
to pay males and females the same pay if they are doing
(substantially) the same work. We will be taking the role of data
analysts (AKA detectives) in a company that has received some
evidence that they are going to have a Federal audit of their pay
practices due to a complaint on unfair pay practices. Our “job,”
the basis of the class assignments, is to determine if we do or do
not comply with the Equal Pay Act.
HR professionals often examine pay issues from two points of
view. One is the actual salary an employee makes, a very
obvious approach. This is the approach that you will take as
you do the weekly assignments. Each assignment and each
question require you to focus on the actual salaries paid to the
employees. What differences do we see, how consistent is the
data, what impacts salary outcomes, etc.?
The second approach is more relative in nature and deals with a
compensation measure called the compa-ratio (comparison-
ratio). This measure compares an employee’s salary to the
midpoint of the salary grade; this is done simply by dividing the
employee’s salary by the midpoint of the salary grade the
employee is in. (For those not familiar with salary grades, they
are groups of jobs within a company that generally require the
same skill and knowledge levels and are paid within the same
salary range. The midpoints of these ranges are considered to be
the market rate, or average pay that companies in the
community pay for similar work.) Examining compa-ratios lets
HR see how employees are distributed around the midpoint
without focusing on the actual different salaries involved. It
provides a second way to examine how males and females are
paid without worrying about the grades they are in. This
approach will be used in the weekly lectures to provide both an
example of how to do each homework problem and a way of
providing a different view on the equal pay question.
So, each week we will be looking at the pay practices of “our
company” in two ways. The lectures and the weekly
assignments will each deal with the same questions but will do
so with a different measure of pay. In the homework, you will
be asked to form tentative conclusions each about the equal pay
question using the insights from BOTH the lecture examples of
compa-ratio and the salary-based results from your homework
problems.
One additional point: the data used in the weekly lectures will be slightly different from the data set you will be working with.
We can consider these differences to be minor, as if the lecture
uses a different sample of employees, but one that is consistent
with the sample used for the homework. The conclusions
reached in each week’s homework should use the findings from
both the lecture’s examples and the homework problems. The
actual reason for the difference is that students in the past have
copied answers from websites and other students and handed
them in as their own original work. So, to keep this from
happening, the class data set you will be working with changes
periodically, so previous answers will not be correct. It does not
make sense to redo the lectures every time the data changes, so
the lecture’s salary and compa-ratio data is comparable but not
identical.
Getting Started
In real life on the job or with assignments we often, as do TV
detectives, have an overwhelming amount of data that we need
to sift through to get to our clues; and then interpret the clues to
get the information we need to answer our questions about what
happened with the process or outcome we are looking at. The
information that we are first presented with is typically a bunch
of numbers that measure, count, and code lots of things. Note
we have three kinds of data we will deal with:
· Measures tell us how much exists; for example, a salary measure tells us how many dollars someone is paid.
· Counts tell us how many exist, such as counting how many
employees have a master’s degree.
· Codes tell us about a characteristic; for example, we might code being male as M and being female as F. However, we could also use 0 for male and 1 for female. These numbers do not mean one gender is somehow "better" or "higher" than the other; they merely show a difference. They are identifiers. More about this later.
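The three kinds of data can be sketched in a few lines of Python (the mini data set below is invented purely for illustration; the class itself works in Excel):

```python
# Illustrative only: a few hypothetical employee records showing the three
# kinds of data described above (values are made up for this sketch).
employees = [
    {"salary": 58.0, "degree": 1, "gender": "M"},
    {"salary": 34.0, "degree": 0, "gender": "F"},
    {"salary": 47.0, "degree": 1, "gender": "F"},
]

# Measure: how much exists (dollars paid, here in thousands)
total_salary = sum(e["salary"] for e in employees)

# Count: how many exist (employees with a master's degree, coded 1)
masters_count = sum(1 for e in employees if e["degree"] == 1)

# Code: a characteristic label; "M"/"F" only mark a difference, not an order
genders = [e["gender"] for e in employees]

print(total_salary)   # 139.0
print(masters_count)  # 2
print(genders)        # ['M', 'F', 'F']
```

Note that the code column is never added or averaged; it is only used to label and separate groups.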
So, as data detectives, we approach any question by finding
numbers (measures, counts, and codes) that somehow relate to
the issue and the question we need to answer about the
situation. Once we have this data, we need to sort through it to find the clues that need to be examined to understand the situation or outcome. For this class, clues are what we get once we have done some statistical work on the data. This work, as we will see throughout the class, starts with relatively simple summaries – average values for different groups or things, measures of how consistent things are, etc. These summary measures become our first clues. And, just as with any good detective story, not all the clues are meaningful and some are not immediately apparent. The detective/analyst needs to work out what happened and what the clues mean in order to understand and "solve" the crime.
Before we start with the data and how to tease clues from it, we
need to understand a couple of concepts:
· Population: includes all of the “things” we are interested in;
for example, the population of the U.S. would include everyone
living in the country.
· Sample: involves only a selected sub-group of the population;
for example, those selected for a national opinion poll.
· Random Sample: a sample where every member of the
population has an initial equal chance of being selected; this is
the only way to obtain a sample that is truly representative of
the entire population. Details on how to conduct a random
survey are covered in research courses; we will assume the data
we will be working with comes from a random sample of a
company’s exempt employees. Note: an exempt employee, AKA
salaried employee, does not get overtime pay for working more
than 40 hours in a week (“exempt from overtime
requirements”).
· Parameter: a characteristic of the population; the average age
of everyone in the US would be a parameter.
· Statistic: a characteristic of a sample; the average age of
everyone you know who attends school would be a statistic as
the group is a sub-group of all students.
· Descriptive Statistics: measures that summarize characteristics
of a group.
· Inferential Statistics: measures that summarize the
characteristics of a random sample
and are used to infer the value of the population parameters.
· Statistical test: a quantitative technique to make a judgement
about population
parameters based upon random sample outcomes (statistics).
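The parameter/statistic distinction above can be made concrete with a short sketch. The "population" of ages below is invented for illustration, as is the sample size of 10; the point is only that the population mean is a parameter while the mean of a random sample is a statistic we use to infer it:

```python
import random

# A made-up "population": every age from 22 through 61.
random.seed(1)  # fixed seed so the sketch is reproducible
population_ages = list(range(22, 62))

# Parameter: a characteristic of the whole population.
parameter = sum(population_ages) / len(population_ages)

# Random sample: every member had an equal chance of selection.
sample = random.sample(population_ages, 10)

# Statistic: the same characteristic, computed on the sample only.
statistic = sum(sample) / len(sample)

print(parameter)   # 41.5
print(statistic)   # a sample mean; close to, but usually not equal to, 41.5
```

Re-running with different seeds shows sampling error in action: each random sample yields a slightly different statistic, all clustered around the fixed parameter.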
The Case
Our class, as a group of data analysts/detectives, will play the
role of helping a Human Resources Department (in our assumed
“company”) prepare for an audit from the government about our
pay practices. This routine audit will focus on the question of
equal pay for equal work, as required by both State and Federal
statutes. Specifically, these require that if males and females
are performing substantially the same job (equal work), then
they should be paid equally.
Of course, nothing is quite that simple. The laws do allow some
differences based on company policies calling for pay
differences due to performance, seniority, education, and – with
some companies – functional areas. Our company does have
policies saying we pay for organizational level (different kinds
of work, which are represented by job grades), performance (as
measured by the performance rating), seniority, experience, and
educational achievements.
Our first step is to decide upon some questions that need to be
answered, as questions lead to the data we need to collect. The
overall question, also known as the Research Question, is
simply: “Do males and females receive equal pay for equal
work?” This just means that if a male and female are doing the
same work for the company, are they paid the same? As
straightforward as this question seems, it is very difficult to
answer directly. So, after brainstorming, secondary or
intermediate (more basic) questions have been identified as
needing to be answered as we build our case towards the final
answer. Some of these secondary questions (which will be addressed throughout the course) include:
· Do we have any measures that show pay comparisons between
males and females?
· Since different factors influence pay, do males and females
fare differently on them;
such as age, service, education, performance ratings, etc.?
· How do the various salary related inputs interact with each
other? How do they
impact pay levels?
These general questions lead to our collecting data from a
random sample of employees. Note that a random sample
(covered in research courses) is the best approach to give us a
sample that closely represents the actual employee population.
The sample consists of 25 males and 25 females. The following
data was collected on each employee selected:
· Salary, rounded to the nearest $100 and recorded in thousands of dollars; for example, an annual salary of $38,825 is recorded as 38.8.
· Age, rounded (up or down) to the age as of the employee’s
nearest birthday.
· Seniority, rounded (up or down) to the nearest hiring
anniversary.
· Performance Appraisal Rating, based on a 100-point scale.
· Raise – the percent of their last performance merit increase.
· Job grade – groups of jobs that are considered substantially
similar work (for equal work purposes) that are grouped into
classifications ranging from A (the lowest grade) through F (the
highest grade). Note: all employees in this study are exempt
employees – paid with a salary and not eligible for overtime
payments. They are considered middle management and
professional level employees.
· Midpoint – the middle of the salary range assigned to each Job
Grade level. The midpoint is considered to be the average
market rate that companies pay for jobs within each grade.
· Degree – the educational achievement level, coded as 0 for
those having a Bachelor’s degree and 1 for those having a
Master’s degree or higher.
· Gender – coded as M for Males, and F for Females, also coded
0 (Males) and 1 (Females) for use in an advanced statistical
technique introduced in Week 4.
In addition to these collected measures, the HR Compensation
Department has provided the compa-ratio for each employee.
The Compa-ratio is defined as the salary divided by the
employee’s grade midpoint. For example, an employee with a
salary of $50,000 and a company salary range midpoint of
$48,000 would have a Compa-ratio of 50/48 = 1.042 (rounded to
three decimal places). Employees with a Compa-ratio greater than (>) 1.0 are paid more than the market rate for their job, while employees with a Compa-ratio less than (<) 1.0 are paid less than the prevailing market rate. Compensation professionals use
Compa-ratios to examine the spread and relative pay levels of
employees while the impact of grade is removed from the
picture.
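The compa-ratio calculation can be written as a one-line function. The 50/48 case below is the example from the text; the second call uses an invented salary just to show a below-market ratio:

```python
# Compa-ratio as defined in the text: salary divided by the grade midpoint,
# rounded to three decimal places.
def compa_ratio(salary: float, midpoint: float) -> float:
    return round(salary / midpoint, 3)

# The example from the text: a $50,000 salary against a $48,000 midpoint.
print(compa_ratio(50_000, 48_000))   # 1.042 -> paid above the market rate

# An invented below-midpoint salary for contrast.
print(compa_ratio(45_000, 48_000))   # 0.938 -> paid below the market rate
```

Because the midpoint divides out, compa-ratios from different grades can be compared directly, which is exactly why HR uses them.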
Here is the data collected that will be used in the lecture
examples and discussions.
ID  Salary  Compa-ratio  Mid  Age  Perf App.  Service  Gender  Raise  Deg.  Gender1  Grade
1   58   1.017   57   34   85    8   0   5.7   0   M   E
2   27   0.870   31   52   80    7   0   3.9   0   M   B
3   34   1.096   31   30   75    5   1   3.6   1   F   B
4   66   1.157   57   42   100  16   0   5.5   1   M   E
5   47   0.979   48   36   90   16   0   5.7   1   M   D
6   76   1.134   67   36   70   12   0   4.5   1   M   F
7   41   1.025   40   32   100   8   1   5.7   1   F   C
8   23   1.000   23   32   90    9   1   5.8   1   F   A
9   77   1.149   67   49   100  10   0   4     1   M   F
10  22   0.956   23   30   80    7   1   4.7   1   F   A
11  23   1.000   23   41   100  19   1   4.8   1   F   A
12  60   1.052   57   52   95   22   0   4.5   0   M   E
13  42   1.050   40   30   100   2   1   4.7   0   F   C
14  24   1.043   23   32   90   12   1   6     1   F   A
15  24   1.043   23   32   80    8   1   4.9   1   F   A
16  47   1.175   40   44   90    4   0   5.7   0   M   C
17  69   1.210   57   27   55    3   1   3     1   F   E
18  36   1.161   31   31   80   11   1   5.6   0   F   B
19  24   1.043   23   32   85    1   0   4.6   1   M   A
20  34   1.096   31   44   70   16   1   4.8   0   F   B
21  76   1.134   67   43   95   13   0   6.3   1   M   F
22  57   1.187   48   48   65    6   1   3.8   1   F   D
23  23   1.000   23   36   65    6   1   3.3   0   F   A
24  50   1.041   48   30   75    9   1   3.8   0   F   D
25  24   1.043   23   41   70    4   0   4     0   M   A
26  24   1.043   23   22   95    2   1   6.2   0   F   A
27  40   1.000   40   35   80    7   0   3.9   1   M   C
28  75   1.119   67   44   95    9   1   4.4   0   F   F
29  72   1.074   67   52   95    5   0   5.4   0   M   F
30  49   1.020   48   45   90   18   0   4.3   0   M   D
31  24   1.043   23   29   60    4   1   3.9   1   F   A
32  28   0.903   31   25   95    4   0   5.6   0   M   B
33  64   1.122   57   35   90    9   0   5.5   1   M   E
34  28   0.903   31   26   80    2   0   4.9   1   M   B
35  24   1.043   23   23   90    4   1   5.3   0   F   A
36  23   1.000   23   27   75    3   1   4.3   0   F   A
37  22   0.956   23   22   95    2   1   6.2   0   F   A
38  56   0.982   57   45   95   11   0   4.5   0   M   E
39  35   1.129   31   27   90    6   1   5.5   0   F   B
40  25   1.086   23   24   90    2   0   6.3   0   M   A
41  43   1.075   40   25   80    5   0   4.3   0   M   C
42  24   1.043   23   32   100   8   1   5.7   1   F   A
43  77   1.149   67   42   95   20   1   5.5   0   F   F
44  60   1.052   57   45   90   16   0   5.2   1   M   E
45  55   1.145   48   36   95    8   1   5.2   1   F   D
46  65   1.140   57   39   75   20   0   3.9   1   M   E
47  62   1.087   57   37   95    5   0   5.5   1   M   E
48  65   1.140   57   34   90   11   1   5.3   1   F   E
49  60   1.052   57   41   95   21   0   6.6   0   M   E
50  66   1.157   57   38   80   12   0   4.6   0   M   E
(Note that this table can be copied into an Excel file if you
would like to duplicate the examples provided in the lectures.)
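As a preview of the comparisons to come, the sketch below (pure Python rather than Excel) averages the compa-ratios of six rows copied from the table above, grouped by gender. With only six rows, the numbers are purely illustrative, not the full-sample answer:

```python
# Six rows copied from the table: (ID, compa-ratio, Gender1).
rows = [
    (1, 1.017, "M"),
    (3, 1.096, "F"),
    (4, 1.157, "M"),
    (7, 1.025, "F"),
    (8, 1.000, "F"),
    (12, 1.052, "M"),
]

# Group the compa-ratios by gender label.
by_gender = {}
for _id, ratio, gender in rows:
    by_gender.setdefault(gender, []).append(ratio)

# Print the mean compa-ratio for each group.
for gender, ratios in sorted(by_gender.items()):
    mean = sum(ratios) / len(ratios)
    print(gender, round(mean, 3))   # F 1.04, then M 1.075
```

This is exactly the kind of "first clue" the lectures build on: a raw group difference that still has to be tested before we treat it as meaningful.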
What kind of data do we have?
Just as all clues and information uncovered on mystery shows are not equally valuable, or even useful, not all data is equally useful in answering questions. But all data has some value. As
we look at this data set, it is clear that not all the data is the
same. We have some measures (salary, seniority, etc.) but we
also have some labels (ID, for example merely identifies
different employees in the data set, and is not useful for much
else). We have some data that are clearly codes, gender and
degree for example. In general, our data set can be sorted into
four kinds of data, nominal, ordinal, interval, and ratio (NOIR):
· Nominal: these are basically names or labels. For example, in
our data set we see Gender1 labeled as M and F (for males and
females). Other examples of nominal data include names of cars
(Ford, Chevrolet, Dodge, etc.), cities and states, flowers, etc.
Anything where the name/label just indicates a difference from
something else that is similar is nominal level data. Now, we
can “code” with words and letters (such as Male or M) but we
can also code them using 0 and 1 (for male and female) as
we do with the Gender variable. Regardless of one looking like
a label (letters) and one looking like a measurement (numbers),
both of these are simply ways to label males and females – they
indicate a difference between the groups only – not that one is
somehow higher than the other (as we typically think of 1 as
higher or more than 0).
Nominal level data are used in two ways. First, we can count
them; for example, how many males and females exist in the
group? Second, we can use them as group labels to identify
different groups, and list other characteristics in each group; a
list of all male and female compa-ratios will be quite helpful in
our analysis, for example.
· Ordinal: these variables add a sense of order to the difference,
but where the differences are not the same between levels.
Often, these variables are based on judgement calls creating
labels that can be placed in a rank order, such as good, better,
best. The grade and degree variables in our data set are ordinal.
We cannot assume that the amount of work to get the higher
degree or higher job grade is the same for all differences. Note:
Even though we only show education as bachelor and graduate,
we could include no high school diploma, high school diploma
on the low end and doctoral degree and professional
certification on the upper end.
· Interval: these variables have a constant difference between
successive values. Temperature is a common example – the
difference between, for example, 45 and 46 degrees is the same
amount of heat as between 87 and 88 degrees. Note: Often,
analysts will assume that personal judgement scores such as
Performance Appraisal ratings or responses on a questionnaire
scale using scores of 1 to 5 are ordinal as it cannot be proven
the differences are constant. Other researchers have suggested
that these measures can be considered interval in nature for
analysis purposes. We will consider performance appraisal
ratings to be interval level data for our analysis purposes.
· Ratio: these are interval measures that have a 0 point that
means none. For example, while 0 dollars in your wallet means
no money, a temperature of 0 degrees does not mean no heat.
Ratio level variables include salary, compa-ratio, midpoint, age,
service, and raise – even if our measurements do not go down to
0, each measure does have a 0 point that would mean none.
These differences are important, as we can do different kinds of
analysis with each level, and attempting to use the wrong level
of data in an approach will result in misleading or wrong
outcomes. Within our data set our variables fit into these
groups.
· Nominal: ID, Gender, Gender1 (merely labels showing a
difference)
· Ordinal: Grade, Degree (can be ordered from low to high; e.g., Grade A is the lowest and Grade F is the highest grade)
· Interval: Performance Rating (Constant difference between
values, but no meaningful 0 score)
· Ratio: Salary, Compa-ratio, Midpoint, Seniority, Age, Raise
(All have a 0 point that means none)
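The classification above can be captured in a small lookup, paired with the summaries each level typically supports. The "allowed summaries" lists encode a common rule of thumb, not a strict law (the text itself notes that appraisal-style scores are treated differently by different analysts):

```python
# The lecture's variables mapped to their measurement level (NOIR).
levels = {
    "ID": "nominal", "Gender": "nominal", "Gender1": "nominal",
    "Grade": "ordinal", "Degree": "ordinal",
    "Performance Rating": "interval",
    "Salary": "ratio", "Compa-ratio": "ratio", "Midpoint": "ratio",
    "Seniority": "ratio", "Age": "ratio", "Raise": "ratio",
}

# Rule-of-thumb summaries appropriate at each level; each level adds to
# what the level below it allows.
allowed_summaries = {
    "nominal":  ["counts", "mode"],
    "ordinal":  ["counts", "mode", "median", "rank order"],
    "interval": ["counts", "mode", "median", "mean", "standard deviation"],
    "ratio":    ["counts", "mode", "median", "mean", "standard deviation", "ratios"],
}

print(allowed_summaries[levels["Grade"]])             # median is fine for Grade
print("mean" in allowed_summaries[levels["Gender"]])  # False: no mean of a label
```

This is the practical payoff of the NOIR distinction: before averaging a column, check its level, because a "mean gender" or "mean ID" is a meaningless number.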
Wow – a lot of background material. But, now that we have this
covered, we can get to actually looking at our data. As
suggested above, the first question we need to ask is “do we
have any measures that show pay comparisons between our
males and females?”
Now, in lectures two and three for the week, we move on to some specific ways to see if the company is guilty of not paying males and females the same for doing equal work.
Summary
This class is about uncovering the secrets hidden within data
sets. As data detectives (AKA data analysts), we need to
develop both the tools and the logic to examine data sets and
use them to answer business questions.
This class will use a single business question: Are males and
females paid the same for doing equal work? Each week we will
look at different tools and techniques to summarize and make
sense out of the data set assigned to your class.
We looked at a lot of what we could call "background" in this lecture: information that is needed to understand what we are doing but which does little more than set up the problem.
This included the lecture’s data set and definitions of the
variables to be used, some statistical concepts that help identify
what we are doing and the kinds of data that we are using.
Next Steps
Please respond to Discussion Thread 1 for this week with your
initial response and responses to others over a couple of days
before moving on to reading the second lecture for the week.
Ask questions and please share what is unclear. Statistics has
been compared to learning a new language; we need to
understand what the terms mean and how they apply.
Please ask your instructor if you have any questions about this
material.
Source: http://www.echonyc.com/~wei/Fifth.html (09/24/2007)
THE FIFTH MODERNIZATION
by Wei Jingsheng
At the present, the media no longer play up the themes of
dictatorship of the proletariat and class struggle.
One reason is that this line of propaganda was used as sort of a
magical potion by the Gang of Four, who
have now been overthrown. Another reason, which is even more
important, is that the people have had
enough of all that and can no longer be deceived.
According to the laws of history, the new will not come about
until the old is gone. Now that the old is
gone, the people are rubbing their eyes in eager anticipation.
Finally, with God's blessing, there is a new
promise - the Four Modernizations. Chairman Hua, the wise
leader, and Vice-Chairman Deng (who the
people consider even wiser and greater) have defeated the Gang
of Four. Now democracy and prosperity, so
earnestly sought by those who shed their blood at Tian-an-men,
seem soon to be realized.
After the arrest of the Gang of Four, people eagerly hoped that
Vice-Chairman Deng, the so-called "restorer
of capitalism," would once again appear as a great towering
banner. Finally, Vice-Chairman Deng did
return to his post on the Central Committee. The people were
indeed excited, inspired, and ... [sic].
However, to the people's regret, the hated old political system
has not changed, and even any talk about the
much hoped for democracy and freedom is forbidden. People's
living conditions remain the same and the
"increased wages" are far behind the soaring commodity prices.
There has been some talk about the restoration of "capitalism"
and the bonus system. After some
investigation it was confirmed that the "invisible whip" for "the
maximum exploitation of workers," which
had been cursed by the Marxist ancestors, could not be used to
fool the people anymore. Although without
the leadership of the Great Helmsman, people can still be led by
the "wise leader" to catch up with and
surpass England, the United States, Japan, and Yugoslavia (?)
or the advanced world level. Taking part in
revolution is no longer "in vogue." Since entering a university
will greatly enhance a person's prestige,
people no longer need to hear the deafening noise of "class
struggle" slogans. The Four Modernizations
stand for everything that is good. Of course, it is still necessary
to act according to the spirit of the Central
Committee, as relayed to us by the April Fifth Academy. The
beautiful vision can materialize only under
unified leadership and guidance.
In ancient China, there were such maxims as "A cake in the
picture can appease hunger" and "Watching the
plums can quench the thirst." These witty and ironic remarks
were quite popular in ancient times, but today,
after a long and continuous development of history, people
should never take such stupid remarks seriously.
Yet some people not only believe in them but also carry them
out in practice.
For several decades, Chinese people have closely followed the
Great Helmsman. Communist ideology has
provided "the cake in the picture," and the Great Leap Forward
and Three Red Banners have served as
"plums for quenching thirst." People tightened their belts and
bravely forged ahead. Thirty years soon
passed and they have learned a lesson from experience. For
thirty years people were like "monkeys reaching
out for the moon and feeling only emptiness." Therefore, when
Vice-Chairman Deng put forward the
slogan, "Be practical," people's enthusiasm was like surging
waves. Time and again he was helped by the
people to come to power. The people expected him to review the
past and lead them to a realistic future with
a "seeking truth from facts" approach.
However, some people have warned us: Marxism-Leninism-Mao
Zedong Thought is the foundation of all
foundations; Chairman Mao was the Great Savior of the people;
"Without the Communist Party, there would
be no new China"; "Without Chairman Mao there would be no
new China"; and anyone disagreeing with
these will come to no good end. "Some people" even warned us:
Chinese people need dictatorship. His
superiority over feudal emperors precisely shows his greatness.
Chinese people need no democracy unless it
is "democracy under collective leadership" without which
democracy is not worth a dime. It is up to you to
believe or to doubt it, but the prisons (from which so many have
recently been released) were convincing
"proof."
However, someone has now given you a way out. Take the Four
Modernizations as the key link and follow
the principle of stability and unity and be brave (?) to serve the
revolution (?) as an old ox does. Then you
will find your way to paradise, namely the prosperity of
communism and the Four Modernizations. Some
well-intentioned people have given us this advice. "When you
cannot think straight, try hard to study
Marxism-Leninism-Mao Zedong Thought!" The reason why you
cannot think straight is your lack of
understanding which reflects on the level of your ideological
accomplishment. You should be obedient,
otherwise the leadership of your unit cannot forgive you! And
on and on.
I advise everyone not to believe such political swindlers
anymore. Knowing that we are being deceived, we
should implicitly believe in ourselves. We have been tempered
in the Cultural Revolution and cannot be
that ignorant now. Let us find out for ourselves what should be
done.
Why Democracy?
This question has been discussed by many people for centuries.
Others have conducted careful analyses and
indicated on the Democracy Wall how much better is democracy
than autocracy.
"People are the masters of history." Is this a fact or just empty
talk? Well, it can be both. How can there be
history without the people's strength and their participation in
making it? No Great Helmsman or Wise
leader can even exist, not to speak of creating history. From this,
we can see that without new Chinese
people, there would be no new China; but it is not true that
"without Chairman Mao, there would be no new
China." Vice-Chairman Deng is grateful to Chairman Mao for
saving his life. This is understandable. But is
it not reasonable too that he should be grateful to the "outcries"
that pushed him to the seat of power?
Would it be reasonable for him to respond to the outcries by
saying, "You must not denigrate Chairman
Mao, because he saved my life?" This makes "The people are
the masters of history" an empty slogan. It is
empty talk because people cannot master their own destiny
according to the majority will; because their
achievements have been credited to other people's accounts; and
because their rights have been used to
make somebody's royal crown. What kind of master is this? It
may be more correct to call them slaves. In
our history books the people are the masters who create
everything, but in real life they are lackeys, always
standing at attention and waiting to be "led" by leaders who
swell like dough under the effect of yeast.
People should have democracy. When they ask for democracy,
they are only demanding what is rightfully
theirs. Anyone refusing to give it to them is a shameless bandit
no better than a capitalist who robs workers
of their money earned with their sweat and blood. Do the people
have democracy now? No. Do they want
to be masters of their own destiny? Definitely yes. This was the
reason for the Communist Party's victory
over Kuomintang. But what then happened to the promise of
democracy? The slogan "people's democratic
dictatorship" was replaced by the dictatorship of the
proletariat." Even the "democracy" enjoyed by the
infinitesimal portion - one among tens of millions - was
abolished and replaced by the autocracy of the
"Great Leader." Thus, Peng Dehuai was overthrown because,
instead of following the Great Leader's
instruction, he had the audacity to show his temper in the Party.
Then a new promise was held out: Because
the leader is great, implicit faith in such a leader, rather than
democracy, will bring more happiness to the
people. People have believed in this promise, half reluctantly
and half willingly, until today. But are they
any happier? Are they richer or more prosperous?
Unconcealable facts show that they are poorer, more
miserable, and more backward. Why? This is the first question
to be considered. And what to do now? This
is the second question.
There is no need now to determine the ratio of Mao Zedong's
merits and shortcomings. He first spoke about
this as a self-defense. People should now think for a while and
see if, without Mao Zedong's autocracy,
China could be in its present backward state. Are Chinese
people stupid, or lazy, or unwilling to enjoy
wealth? Are they expecting too much? Quite the opposite. Then
why? The answer is quite obvious. Chinese
people should not have taken this road. Then why did they take
it? Only because they were led by that self-
exalting autocrat. If they did not take this road, he would
exercise dictatorship over them. The people could
see no other road and therefore had no choice. Is this not
deception? Can there be any merit in deception?
What road is this? It is called the "socialist road." According to
the definition of the Marxist ancestors,
socialism means that the people, or the proletariat, are their own
masters. Let me ask the Chinese workers
and peasants: With the meager wages you get every month,
whose master and what kind of master can you
be? Sad to relate, you are "mastered" by somebody else even in the matter of matrimony. Socialism
guarantees the producers' rights to the surplus production from
their labor over what is needed as a service
to the society. But this service is limitless. So are you not
getting only that miserable little wage "necessary
for maintaining the labor force for production?" Socialism
guarantees many rights, such as the right of a
citizen to receive an education, to use one's ability to the best
advantage, and so forth. But none of these rights
can be seen in our daily life. What we can see is only "the
dictatorship of the proletariat" and "a variation of
Russian autocracy" - Chinese socialist autocracy. Is this kind of
socialist road what people want? Can it be
claimed that autocracy means the people's happiness? Is this the
socialist road depicted by Marx and hoped
for by the people? Obviously not. Then what is it? Funny as it
may sound, it is like the feudal socialism
mentioned in the "Manifesto," or a feudal monarchy disguised
as socialism. We have heard that Soviet
Russia has been promoted from social feudalism to social
imperialism. Must Chinese people take the same
road? Some people have proposed that we should change
everything to fascist autocracy under feudal
socialism. To this I entirely agree, because the question of
merits or shortcomings does not exist here.
Let me say a word about the "National Socialism" the real name
of the notorious German fascism. These
fascists, also under an autocratic tyrant, called on the people to
tighten their belts and deceived the people by
telling them that they belonged to a great nation. Their main
purpose was to suppress the most rudimentary
form of democracy, because they clearly knew that democracy
was the most formidable and irresistible
enemy. On this basis, Stalin and Hitler shook hands and signed
the German-Soviet Pact whereby a socialist
state and a National Socialist State toasted the partition of
Poland while the peoples of both countries
suffered enslavement and poverty. Democracy is our only choice; in other words, if we want modernized economics, science, military science, and so forth, then there must be modernization of the people and of the social system.
The Fifth Modernization - What Kind of Democracy?
I would like to ask everyone: What do we want modernization
for? After all, some men feel that the age of
The Dream of the Red Chamber must have been perfectly all
right, because men were free to read, write
poetry, and fool around with women. One needed only to open
his mouth and food would be provided, only
raise an arm to be dressed. Well, today's privileged class get to
see foreign movies and live like gods. Such
a life-style is quite inaccessible to ordinary folk. What the
people want are the happy days which they can
truly enjoy and which are not worse than those enjoyed by
foreigners. All want prosperity, the kind of
prosperity which is universal and which can only result from
increased social productive forces. This is
obvious to everyone. However, there is still something
that some people overlook. Can people enjoy good
living when social productive forces have been increased? Now
the questions of authority, of domination,
of distribution, and of exploitation arise.
BUS308 Week 4 Lecture 1 Examining Relationships Expect.docx


  • 1. BUS308 Week 4 Lecture 1

Examining Relationships

Expected Outcomes

After reading this lecture, the student should be familiar with:
1. Issues around correlation
2. The basics of correlation analysis
3. The basics of linear regression
4. The basics of multiple regression

Overview

Often in our detective shows when the clues are not providing a clear answer – such as we are seeing with the apparent continuing contradiction between the compa-ratio and salary related results – we hear the line "maybe we need to look at this from a different viewpoint." That is what we will be doing this week.

Our investigation changes focus a bit this week. We started the class by finding ways to describe and summarize data sets – finding measures of the center and dispersion of the data with means, medians, standard deviations, ranges, etc. As interesting as these clues were, they did not tell us all we needed to know to solve our question about equal work for equal pay. In fact, the evidence was somewhat contradictory depending upon what
  • 2. measure we focused on. In Weeks 2 and 3, we changed our focus to asking questions about differences and how important different sample outcomes were. We found that all differences were not important, and that for many relatively small result differences we could safely ignore them for decision making purposes – they were due to simple sampling (or chance) errors. We found that this idea of sampling error could extend into work and individual performance outcomes observed over time; and that over-reacting to such differences did not make much sense.

Now, in our continuing efforts to detect and uncover what the data is hiding from us, we change focus again as we start to find out why something happened, what caused the data to act as it did; rather than merely what happened (describing the data as we have been doing). This week we move from examining differences to looking at relationships; that is, if some measure changes, does another measure change as well? And, if so, can we use this information to make predictions and/or understand what underlies this common movement?

Our tools in doing this involve correlation, the measurement of how closely two variables move together; and regression, an equation showing the impact of inputs on a final output. A regression is similar to a recipe for a cake or other food dish; take a bit of this and some of that, put them together, and we get our result.

Correlation
  • 3. We have seen correlations a lot, and probably have even used them (formally or informally). We know, for example, that all other things being equal, the more we eat, the more we weigh. Kids, up to the early teens, grow taller the older they get. If we consistently speed, we will get more speeding tickets than those who obey the speed limit. The more effort we put into studying, the better grades we get. All of these are examples of correlations.

Correlations exist in many forms. A somewhat specialized correlation was the Chi Square contingency test (for multi-row, multi-column tables) we looked at last week; if we find the distributions differ, then we say that the variables are related/correlated. This correlation would run from 0 (no correlation) through positive values (the larger the value, the stronger the relationship).

Probably the most commonly used correlation is the Pearson Correlation Coefficient, symbolized by r. It measures the strength of the association – the extent to which measures change together – between interval or ratio level measures. Excel's Fx Correl and the Data Analysis Correlation both produce Pearson Correlations.

Most correlations that we are familiar with show both the direction (direct or inverse) as well as the strength of the relationship, and run from -1.0 (a strong and perfect inverse
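To make the Pearson calculation concrete, here is a minimal Python sketch that computes r by hand on a small made-up data set (hours studied vs. exam score; the numbers are hypothetical, not from the class data). For the same two columns, Excel's Fx Correl should return the same value.

```python
# Pearson correlation r computed by hand on a hypothetical data set.
from math import sqrt

hours = [1, 2, 3, 4, 5]          # hours studied (made up)
score = [55, 62, 70, 74, 83]     # exam scores (made up)

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(score) / n

# r = (sum of co-deviations) / sqrt(sum of squared deviations of each
# variable); this is what Excel's CORREL computes for two ranges.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))
var_x = sum((x - mean_x) ** 2 for x in hours)
var_y = sum((y - mean_y) ** 2 for y in score)
r = cov / sqrt(var_x * var_y)

print(round(r, 4))  # ≈ 0.9953, a strong direct (positive) correlation
```

Because both lists rise together, r comes out close to +1; if the scores fell as hours rose, the same formula would produce a negative r.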
  • 4. correlation) through 0 (a weak, non-existent correlation) to +1.0 (a strong and perfect direct correlation). A direct correlation is positive; that is, both variables move in the same direction, such as weight and height for kids. An inverse, or negative, correlation has variables moving in different directions. For example, the number of hours you sleep and how tired you feel; the more hours, the less tired, while the fewer hours, the more tired.

The strength of a correlation is shown by the value (regardless of the sign). For example, a correlation of +.78 is just as strong as a correlation of -.78; the only difference is the direction of the change. If we graphed a +.78 correlation, the data points would run from the lower left to the upper right and somewhat cluster around a line we could draw through the middle of the data points. A graph of a -.78 correlation would have the data points starting in the upper left and running down to the lower right. They would also cluster around a line.

Correlations below an absolute value (when we ignore the plus or minus sign) of around .70 are generally not considered to be very strong. The reason for this is the coefficient of determination (CD). This equals the square of the correlation and shows the amount of shared variation between the two variables. Shared variation can be roughly considered the reason that both variables move as they do when one changes. The more the shared variation, the more one variable can be used to predict the other. If we square .70 we get .49, or about 50% of the variation being shared. Anything less is too weak a relationship to be of much help.
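The coefficient of determination idea takes only a line of Python to sketch; the correlation values below are illustrative only, chosen to match the examples above.

```python
# Coefficient of determination (CD) = r squared: the proportion of
# variation the two variables share. The r values are made up.
def coefficient_of_determination(r):
    return r ** 2

for r in (0.78, -0.78, 0.70, 0.30):
    cd = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  CD = {cd:.2f}")
```

Note that +.78 and -.78 give the same CD (the sign drops out when squaring), and that .70 is roughly the point where half the variation is shared; an r of .30 shares only about 9%, which is why weaker correlations are of little help for prediction.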
• 5. Students often feel that a correlation shows a “cause-and-effect” relationship; that is, changes in one thing “cause” changes in the other variable. In some cases, this is true – height and weight for pre-teens, weight and food consumption, etc. are all examples of possible cause-and-effect relationships; but we can argue that even with these there are other variables that might interfere with the outcomes. And, in research, we cannot say that one thing causes or explains another without having a strong correlation present. However, just as our favorite detectives find what they think is a motive for someone to have committed the crime, only to find that the motive did not actually cause that person to commit the crime, a correlation does not prove cause-and-effect. An example of this is one the author heard in a statistics class: a perfect +1.00 correlation found between the barrels of rum imported into the New England region of the United States between the years of 1790 and 1820 and the number of churches built each year. If this correlation showed a cause-and-effect relationship, what would it mean? Does rum drinking (the assumed result of importing rum) cause churches to be built? Does the building of churches cause the population to drink more rum? As tempting as each of these explanations is, neither is reasonable – there is no theory or justification to assume either is true. This is a spurious
• 6. correlation – one caused by some other, often unknown, factor. In this case, the culprit is population growth. During these years – many years before Carrie Nation’s crusade against Demon Rum – rum was the common drink for everyone. It was even served on the naval ships of most nations. And, as the population grew, so did the need for more rum. At the same time, churches in the region could only hold so many bodies (this was before mega-churches that held multiple services each Sunday); so, as the population got too large to fit into the existing churches, new ones were needed. At times, when a correlation makes no sense, we can find an underlying variable fairly easily with some thought. At other times, it is harder to figure out, and some experimentation is needed. The site http://www.tylervigen.com/spurious-correlations is an interesting website devoted to spurious correlations; take a look and see if you can explain them. Regression Linear. Even if the correlation is spurious, we can often use the data in making predictions until we understand what the correlation is really showing us. This is what regression is all about. Earlier, correlations between age, height, and even weight were mentioned. In pediatrician offices, doctors will often have charts showing typical weights and heights for children of different ages. These are the results of regressions, equations showing relationships. For example (and these values are made up for
• 7. this example), a child’s height might be his/her initial height at birth plus an average growth of 3.5 inches per year. If the average height of a newborn child is about 19 inches, then the linear regression would be: Height = 19 inches + 3.5 inches * age in years, or in math symbols: Y = a + b*x, where y stands for height, a is the intercept or initial value at age 0 (immediate birth), b is the rate of growth per year, and x is the age in years. In both cases, we would read and interpret it the same way: the expected height of a child is 19 inches plus 3.5 inches times its age. For a 12-year-old, this would be 19 + 3.5*12 = 19 + 42 = 61 inches, or 5 feet 1 inch (assuming the made-up numbers are accurate). Multiple. That was an example of a linear regression having one output and a single independent variable as an input. A multiple regression equation is quite similar but has several independent input variables. It could be considered to be similar to a recipe for a cake: Cake = cake mix + 2 * eggs + 1½ * cup milk + ½ * teaspoon vanilla + 2 tablespoons * butter. A regression equation, either linear or multiple, shows us how “much” each factor is used in or
• 8. influences the outcome. The math format of the multiple regression equation is quite similar to that of the linear regression; it just includes more variables: Y = a + b1*X1 + b2*X2 + b3*X3 + …; where a is the intercept value when all the inputs are 0, the various b’s are the coefficients that are multiplied by each variable value, and the X’s are the values of each input. A note on how to read the math symbols in the equations. The Y is considered the output or result, and is often called the dependent variable, as its value depends on the other factors. The different b’s (b1, b2, etc.) are coefficients, read b-sub-1, b-sub-2, etc. The subscripts 1, 2, etc. are used to indicate the different coefficient values that are related to each of the input variables. The X-sub-1, X-sub-2, etc., are the different variables used to influence the output, and are called independent variables. In the recipe example, Y would be the quality of the cake, a would be the cake mix (a constant, as we use all of what is in the box), and the other ingredients would relate to the b*X terms. The 2*eggs would relate to b1*X1, where b1 would equal 2 and X1 stands for eggs; the second input relates to the milk, etc. Summary This week we changed our focus from examining differences to looking for relationships – do variables change in predictable ways? Correlation lets us see both the strength and the direction of change for two variables. Regression allows us to see how some variables “drive” or
• 9. explain the change in another. Pearson’s (for interval and ratio data variables) and Spearman’s (for rank ordered or ordinal data variables) are the two most commonly used correlation coefficients. Each looks at how a pair of variables moves in predictable patterns – either both increasing together, or one increasing as the other decreases. The correlation ranges from -1.00 (moving in opposite directions) to +1.00 (moving in the same direction). These are both examples of linear correlation – how closely the variables move in a straight line (if graphed). Curvilinear correlations exist but are not covered in this class. Regression equations show the relationship between independent (input) variables and a dependent (output) variable. Linear regression involves a pair of variables, as seen in the linear correlations. Multiple regression uses several input (independent) variables for a single output (dependent) variable. The basic form of the regression equation is the same for both linear and multiple regression equations. The only difference is in the number of inputs used. The multiple regression equation general form is: Y = Intercept + coefficient1 * variable1 + coefficient2 * variable2 + etc., or Y = A + b1*X1 + b2*X2 + …; where A is the intercept value, b is a coefficient value, X is the name of a variable, and the subscripts identify different
• 10. variables. Summary This week we changed focus from examining differences to examining relationships – how variables might move in predictable patterns. This, we found, can be done with either correlations or regression equations. Correlations measure both the strength (the value of the correlation) and the direction (the sign) of the relationship. We looked at the Pearson Correlation (for interval and ratio level data) and the Spearman’s Rank Order Correlation (for ordinal level data). Both range from -1.00 (a perfect inverse correlation, where as one value increases the other decreases) to +1.00 (a perfect direct correlation, where both values increase together). A perfect correlation means the data points would fall on a straight line if graphed. One interesting characteristic of these correlations occurs when you square the values. This produces the Coefficient of Determination (CD), which gives us an estimate of how much variation is in common between the two variables. CD values of less than .50 are not particularly useful for practical purposes. Regression equations provide a formula that shows us how much influence an input variable has on the output; that is, how much the output changes for a given change in an input. Regression equations are behind commonly used
• 11. information, such as the relationship between height and weight for children that doctors use to assess our children’s development. That would be a linear regression: Weight = constant + coefficient * height in inches, or Y = A + b*X, where Y stands for weight, A is the constant, b is the coefficient, and X is the height. A multiple regression is conceptually the same but has several inputs impacting a single output. If you have any questions on this material, please ask your instructor. After finishing with this lecture, please go to the first discussion for the week, and engage in a discussion with others in the class over the first couple of days before reading the second lecture. BUS 308 Week 3 Lecture 1 Examining Differences - Continued Expected Outcomes After reading this lecture, the student should be familiar with: 1. Issues around multiple testing 2. The basics of the Analysis of Variance test 3. Determining significant differences between group means 4. The basics of the Chi Square Distribution.
• 12. Overview Last week, we found ways to examine differences between a measure taken on two groups (the two-sample test situation) as well as comparing that measure to a standard (the one-sample test situation). We looked at the F test, which let us test for variance equality. We also looked at the t-test, which focused on testing for mean equality. We noted that the t-test had three distinct versions: one for groups that have equal variances, one for groups that have unequal variances, and one for data that is paired (two measures on the same subject, such as salary and midpoint for each employee). We also looked at how the 2-sample unequal variance t-test could be used in Excel to perform a one-sample mean test against a standard or constant value. This week we expand our tool kit to let us compare multiple groups for similar mean values. A second tool will let us look at how data values are distributed – if graphed, would they look the same? Different shapes or patterns often mean the data sets differ in significant ways that can help explain results. Multiple Groups As interesting as comparing two groups is, it is often a bit limiting as to what it tells us. One obvious issue that we were missing in the comparisons made last week was equal work. This idea is still somewhat hard to get a clear handle on. Typically, as we look at this issue, questions
• 13. arise about things such as performance appraisal ratings, education distribution, seniority impact, etc. Some of these can be tested with the tools introduced last week. We can see, for example, if the performance rating average is the same for each gender. What we couldn’t do, at this point, however, is see if performance ratings differ by grade: do the more senior workers perform relatively better? Is there a difference between ratings for each gender by grade level? The same questions can be asked about seniority impact. This week will give us tools to expand how we look at the clues hidden within the data set about equal pay for equal work. ANOVA So, let’s start taking a look at these questions. The first tool for this week is the Analysis of Variance – ANOVA for short. ANOVA is often confusing for students; it says it analyzes variance (which it does), but the purpose of an ANOVA test is to determine if the means of different groups are the same! Now, so far, we have considered means and variance to be two distinct characteristics of data sets, characteristics that are not related; yet here we are saying that looking at one will give us insight into the other. The reason is due to the way the variance is analyzed. Just as our detectives succeed by
  • 14. looking at the clues and data in different ways, so does ANOVA. There are two key variances that are examined with this test. The first, called Within Group variance, is the average variance of the groups. ANOVA assumes the population(s) the samples are taken from have the same variation, so this average is an estimate of the population variance. The second is the variance of the entire group, Between Group Variation, as if all the samples were from the same group. Here are exhibits showing two situations. In Exhibit A, the groups are close together, in fact they are overlapping, and the means are obviously close to each other. The Between Group variation (which would be from the data set that starts with the orange group on the right and ends with the gray group on the left) is very close to the Within Group (the average) variation for the three groups. So, if we divide our estimate of the Between Group (overall) variation by the estimate of our Within Group (average) variation, we would get a value close to 1, and certainly less than about 1.5. Recalling the F statistic from last week, we could guess that there is not a significant difference in the variation estimates. (Of course, with the statistical test we do not guess but know if the result is significant or not.) Look at three sample distributions in Exhibit A. Each has the same within group variance, and the overall variance of the entire data set is not all that much larger than the average of the three separate groups. This would give us an F
• 15. relatively close to 1.00. Exhibit A: No Significant Difference with Overall Variation (figure: three overlapping sample distributions). Exhibit B: Significant Difference with Overall Variation. Now, if we look at Exhibit B, we see a different situation. Here the group distributions do not overlap, and the means are quite different. If we were to
• 16. divide the Between Group (overall) variance by the Within Group (average) variance, we would get a value quite a bit larger than the value we calculated with the previous samples, probably large enough to indicate a difference between the within and between group variation estimates. And, again, we would examine this F value for statistical significance. This is essentially what ANOVA does; we will look at how, and at the output, in the next lecture. If the F statistic is statistically significant (the null hypothesis of no difference is rejected), then we can say that the means are different. Neat! So, why bother learning a new tool to test means? Why don’t we merely use multiple t-tests to test each pair separately? Granted, it would take more time than doing a single test, but with Excel that is not much of an issue. The best reason to use ANOVA is to ensure we do not reduce our confidence in our results. If we use an alpha of 0.05, we are essentially saying we are 95% sure we made the right decision in rejecting the null. However, if we do even 3 t-tests on related data, our confidence drops to P(Decision 1 correct AND Decision 2 correct AND Decision 3 correct). As we recall from Week 1, the probability of three independent events occurring is the product of each event separately, or .95*.95*.95 = 0.857! And in comparing means for 6 groups (such as means for the different grade levels), we have 15 pairwise comparisons, which would reduce our overall confidence that all decisions were correct to about 46%. Not very good. Therefore, a single ANOVA test is much better for our confidence in making the right
• 17. decision than multiple t-tests. The hypothesis testing procedure steps are set up in a similar fashion to what we did with the t-tests. There is a single approach to wording the null and alternate hypothesis statements with ANOVA: Ho: All means are equal Ha: At least one mean differs.
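To make the between-versus-within idea concrete, here is a small Python sketch of the ANOVA F statistic computed by hand on three made-up groups (the data values and function name are illustrative only; in the course, Excel's Data Analysis ANOVA tool does this arithmetic for us):

```python
# Three made-up sample groups (e.g., a measure taken in three departments)
groups = [
    [4.1, 4.5, 4.3, 4.7, 4.4],
    [5.0, 5.2, 4.9, 5.3, 5.1],
    [4.2, 4.6, 4.4, 4.8, 4.5],
]

def one_way_anova_f(groups):
    """F = Between-Group variance estimate / Within-Group variance estimate."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: the spread inside each group
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    msb = ssb / (k - 1)   # between-group variance estimate
    msw = ssw / (n - k)   # within-group (average) variance estimate
    return msb / msw

F = one_way_anova_f(groups)
print(f"F = {F:.2f}")  # a large F suggests at least one mean differs
```

With these made-up numbers the middle group sits well away from the other two, so the between-group variance dwarfs the within-group variance and F comes out far above 1.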
• 18. The reason for this is simple. No matter how many groups we are testing, if a single mean differs, we will reject the null hypothesis. And, it can get cumbersome listing all possible outcomes of one or more means differing for the alternate. One issue remains for us if we reject the null of no differences among the means: which means are different? This is done by constructing what we can call, for now, difference intervals. A difference interval will give us a range of values that the “real” difference between two means could be. Remember, since the means are from samples, they are close approximations to the actual population mean, which might be a bit larger or smaller than any given sample mean. These difference intervals take into account the possible sampling error we have. (How we do this will be discussed in Lecture 2 for this week.) A difference interval might be -2 to +1.8. This says that the actual difference when we subtract one mean from another could be any value between -2 and +1.8. Since this interval says the difference could be 0 (meaning the means could be the same), we would find this pair of means to be not significantly different. If, however, our difference range was, for example, from +1.8 to +3.8 (a range with all positive values), we would say the difference between the means is significant, as 0 is not within the range. ANOVA is a very useful tool when we need to compare multiple groups. For example, it can be used to see if average shipping costs are the same
• 19. across multiple shippers. The average time to fill open positions using different advertising approaches, or the associated costs of each, can also be tested with this technique. With our equal pay issues, we can look at mean equality across grades for variables such as compa-ratio, salary, performance rating, seniority, and even raise. Chi Square Tests The ANOVA test somewhat relies upon the shape of the samples, both with our assumption that each sample is normally distributed with an equal variance and with their relative relationship (how close or distant they are). In many cases, we are concerned more with the distribution of our variables than with other measures. In some cases, particularly with nominal labels, distribution is all we can measure. In our salary question, one issue that might impact our analysis is knowing if males and females are distributed across the grades in a similar pattern. If not, then whichever gender holds more higher-level jobs would obviously have higher salaries. While this might be an affirmative action or possible discrimination issue, it is not an equal pay for equal work situation. So, again, we have some data that we are looking at but are not sure how to use to decide if things are the same or not. And, just by examining means, we cannot tell anything about how the variables are distributed.
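As a preview of the Chi Square approach described next, the gender-by-grade question can be checked by comparing the observed counts to the counts we would expect if gender and grade were unrelated. This Python sketch does the arithmetic by hand, using the counts from the tables shown later in this lecture (the variable names are illustrative; Excel's CHISQ.TEST function, given the actual and expected tables, would produce the matching p-value):

```python
# Observed counts by grade (A..F), taken from this lecture's tables
male = [3, 3, 3, 2, 10, 4]
female = [12, 4, 2, 3, 2, 2]

col_totals = [m + f for m, f in zip(male, female)]
row_totals = [sum(male), sum(female)]
grand = sum(col_totals)

# Expected count for each cell = row total * column total / grand total
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
observed = [male, female]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

df = (2 - 1) * (6 - 1)  # (rows - 1) * (columns - 1) = 5
print(f"chi-square = {chi_sq:.2f} on {df} degrees of freedom")
# The 5% critical value for 5 degrees of freedom is about 11.07; a larger
# statistic would reject the claim that gender and grade are unrelated.
```

Note how the expected counts here come from the row and column totals of the observed table itself, which is exactly the multi-row comparison standard the lecture describes.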
• 20. But, have no fear, statistics comes to our rescue! Examining distributions, or shapes, or counts per group (all ways of describing the same data) is done using a version of the Chi Square test; and, after setting up the data, Excel does the work for us. In comparing distributions – and we can do this with discrete variables (such as the number of employees in each grade) or continuous variables (such as age or years of service, which can take any value within a range if measured precisely enough) that we divide into ranges – we simply count how many are in each group or range. For something like the distribution of gender by grades, simply count how many males and females are in each grade; simple, even if a bit tedious. For something like compa-ratio, we first set up the range values we are interested in (such as .80 up to but not including .90, etc.), and then count how many values fall within each group range. These counts are displayed in tables, such as the following on gender distribution by grade. The first is the distribution of employees by grade level for the entire sample, and the second is the distribution by gender. The question we ask for both kinds of tables is basically the same: is the difference enough to be statistically significant or meaningfully different from our comparison standard?

Grade:    A   B   C   D   E   F
Overall  15   7   5   5  12   6

Grade:    A   B   C   D   E   F
Male      3   3   3   2  10   4
Female   12   4   2   3   2   2

• 21. The answer to the question of whether the distributions are different enough, when using the Chi Square test, depends on the group we are comparing the distribution with. When we are dealing with a single-row table, we need to decide what our comparison group or distribution is. For example, we could decide to compare the existing distribution or shape against a claim that the employees are spread out equally across the 6 grades, with 50/6 = 8.33 employees in each grade. Or we could decide to compare the existing distribution against a pyramid shape – a more typical organization hierarchy, with the most employees at the lower grades (A and B) and fewer at the top; for example, 17, 10, 8, 7, 5, 3. The expected frequency per cell does not need to be a whole number. What is important is having some justification for the comparison distribution we use. When we have multi-row tables, such as the second example with 2 rows, the comparison group is known, or considered to be basically the average of the existing counts. We will get into exactly how to set this up in the next lecture. In either case, the comparison (or “expected”) distribution needs to have row and column total sums that are the same as the original or actual
• 22. counts. The hypothesis claims for either chi square test are basically the same: Ho: Variable counts are distributed as expected (a claim of no difference) Ha: Variable counts are not distributed as expected (a claim that a difference exists) Comparing distributions/shapes has a lot of uses in business. Manufacturing generally produces parts that have some variation in key measures; we can use the Chi Square to see if the differences from the specification value are normally distributed, or if the distribution is changing over time (indicating something is changing – such as machine tolerances). The author used this approach to compare the distribution/pattern of responses to questions on an employee opinion survey between departments and the overall division. Different response patterns suggested the issue was a departmental one, while similar patterns suggested that the division “owned” the results, indicating which group should develop ways to improve the results. Summary This week we looked at two different tests: one that looks for mean differences among two or more groups, and one that looks for differences in
• 23. patterns, distributions, or shapes in the data set. The Analysis of Variance (ANOVA) test uses the difference between the variance of the entire data set and the average variance of the groups to see if at least one mean differs. If so, the construction of difference intervals will tell us which of the pairs of means actually differ. The Chi Square tests look at patterns within data sets and let us compare them to a standard or to each other. Both tests are found in the Data Analysis link in Excel and follow the same basic set-up process as we saw with the F and t-tests last week. If you have any questions on this material, please ask your instructor. After finishing with this lecture, please go to the first discussion for the week, and engage in a discussion with others in the class over the first couple of days before reading the second lecture. BUS 308 Week 2 Lecture 1 Examining Differences - overview
• 24. Expected Outcomes After reading this lecture, the student should be familiar with: 1. The importance of random sampling. 2. The meaning of statistical significance. 3. The basic approach to determining statistical significance. 4. The meaning of the null and alternate hypothesis statements. 5. The hypothesis testing process. 6. The purpose of the F-test and the t-test. Overview Last week we collected clues and evidence to help us answer our case question about males and females getting equal pay for equal work. As we looked at the clues presented by the salary and compa-ratio measures of pay, things got a bit confusing, with results that did not seem to be consistent. We found, among other things, that the male and female compa-ratios were fairly close together, with the female mean being slightly larger. The salary analysis showed a different view; here we noticed that the averages were apparently quite different, with the males, on average, earning more. Contradictory findings such as this are not all that uncommon when examining data in the “real world.” One issue that we could not fully address last week was how meaningful the differences were. That is, would a different sample have results that might be completely different, or can we be fairly sure that the observed differences are real and show up in the population as
• 25. well? This issue, often referred to as sampling error, deals with the fact that random samples taken from a population will generally be a bit different than the actual population parameters, but will be “close” enough to the actual values to be valuable in decision making. This week, our journey takes us to ways to explore differences, and how significant these differences are. Just as clues in mysteries are not all equally useful, not all differences are equally important; and one of the best things statistics will do for us is tell us what differences we should pay attention to and what we can safely ignore. Side note: this is a skill that many managers could benefit from. Not all differences in performance from one period to another are caused by intentional employee actions; some are due to random variations that employees have no control over. Knowing which differences to react to would make managers much more effective. In keeping with our detective theme, this week could be considered the introduction of the crime scene experts who help detectives interpret what the physical evidence means and how it can relate to the crime being looked at. We are getting into the support offered by experts who interpret details. We need to know how to use these experts to our fullest advantage. Differences
  • 26. In general, differences exist in virtually everything we measure that is man-made or influenced. The underlying issue in statistical analysis is that at times differences are important. When measuring related or similar things, we have two types of differences: differences in consistency and differences in average values. Some examples of things that should be the “same” could be: • The time it takes to drive to work in the morning. • The quality of parts produced on the same manufacturing line. • The time it takes to write a 3-page paper in a class. • The weight of a 10-pound bag of potatoes. • Etc. All of these “should” be the same, as each relates to the same outcome. Yet, they all differ. We all experience differences in travel time, and the time it takes to produce the same output on the job or in school (such as a 3-page paper). Production standards all recognize that outcomes should be measured within a range rather than a single point. For example, few of us would be upset if a 10-pound bag of potatoes weighed 9.85 pounds or would think we were getting a great deal if the bag weighed 10.2 pounds. We realize that it is virtually impossible for a given number of potatoes to weigh exactly the same and we accept this as normal. One reason for our acceptance is that we know that variation occurs. Variation is simply the differences that occur in things that should be “the same.” If we can measure things with
• 27. enough detail, everything we do in life has variation over time: when we get up in the morning, how long it takes to get to work, how effective we are at doing the same thing over and over, etc. Except for physical constants, we can say that things differ, and we need to recognize this. A side note: variation exists in virtually everything we study (we have more than one language, word, sentence, paragraph, past action, financial transaction, etc.), but only in statistics do we bring this idea front and center for examination. This suggests that any population that we are interested in will consist of things that are slightly different, even if the population contains only one kind of “thing.” Males are not all the same, neither are females. Manufactured parts differ in key measurements; this is the reason we have quality control checking, to make sure the differences are not too large. So, even if we measure everything in our population, we will have a mean that is accompanied by a standard deviation (or range). Managers and professionals need to manage this variation, whether it is quantitative (such as salary paid for similar work) or even qualitative (such as interpersonal interactions with customers). The second reason that we are so concerned with differences is that we rarely have all the evidence, or all the possible measures, of what we are looking for. Having this would mean we have access to the entire population (everything we are interested in); rarely is this the case. Generally, all decisions, analysis, research, etc. are done with samples, a selected subset of the
• 28. population. And, with any sample, we are not going to have all the information needed, obviously; but we also know that each sample we take is going to differ a bit. (Remember, variation is everywhere, including in the consistency of sample values.) If you are not sure of this, try flipping a coin 10 times for 10 trials; do you expect or get the exact same number of heads for each trial? Variation! Since we are making decisions using samples, we have even more variation to consider than simply that within the population we are looking at. Each sample will be slightly different from its population and from others taken from the same population. How do we make informed decisions with all this variation and our not being able to know the “real” values of the measures we are using? This question is much like how detectives develop the “motive” for a crime – do they know exactly how the guilty party felt/thought when they say “he was jealous of the success the victim had”? This could be true, but it is only an approximation of the true feelings; it is “close enough” to say it was the reason. It is similar with data samples; good ones are “close enough” to use the results to make decisions with. The question we have now focuses on how we know what the data results show. The answer lies with statistical tests. They can use the
• 29. observed variation to provide results that let us make decisions with a known chance of being wrong! Most managers hope to be right just over 50% of the time; a statistical decision can be correct 95% or more of the time! Quite an improvement. Sampling. The use of samples brings us to a distinction in summary statistics, between descriptive and inferential statistics. With one minor exception (discussed shortly), these two appear to be the same: means, standard deviations, etc. However, one very important distinction exists in how we use them. Descriptive statistics, as we saw last week, describe a data set. But that is all they do. We cannot use them to make claims or inferences about any other, larger group. Making inferences or judgements about a larger population is the role of inferential statistics and statistical tests. So, what makes descriptive statistics sound enough to become inferential statistics? The group they were taken from! If we have a sample that is randomly selected from the population (meaning that each member has the same chance of being selected at the start), then we have our best chance of having a sample that accurately reflects the population, and we can use the statistics developed from that sample to make inferences back to the population. (How we develop a randomly selected sample is more of a research course issue, and we will not go into those details. You are welcome to search the web for approaches.)
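The idea that a random sample "reflects" its population, while never matching it exactly, can be seen with a quick simulation. This Python sketch (the population values, seed, and sample sizes are all made up for illustration) draws several random samples from a synthetic salary population; each sample mean lands near, but not exactly on, the population mean, and that gap is the sampling error discussed above:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# A made-up "population" of 1,000 salaries centered near $50,000
population = [random.gauss(50_000, 8_000) for _ in range(1000)]
pop_mean = sum(population) / len(population)

# Draw a few random samples of 50; each member of the population has
# the same chance of being selected, as the lecture requires
for _ in range(3):
    sample = random.sample(population, 50)
    sample_mean = sum(sample) / len(sample)
    print(f"sample mean = {sample_mean:,.0f} "
          f"(off by {sample_mean - pop_mean:+,.0f})")
```

Running this shows each sample mean differing a bit from the population mean and from the other sample means, which is exactly the sample-to-sample variation the coin-flipping example illustrates.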
  • 30. Random Sampling. If we are not working with a random sample, then our descriptive statistics apply only to the group they are developed for. For example, asking all of our friends their opinion of Facebook only tells us what our friends feel; we cannot say that their opinions reflect all Facebook users, all Facebook users that fall in the age range of our friends, or any other group. Our friends are not a randomly selected group of Facebook users, so they may not be typical; and, if not typical, their opinions cannot be taken to reflect those of typical users. If our sample is random, then we know (or strongly suspect) a few things. First, the sample is unlikely to contain both the smallest and largest values that exist in the larger population, so an estimate of the population variation is likely to be too small if based on the sample. This is corrected by using a sample standard deviation formula rather than a population formula. We will look at what this means specifically in the other lectures this week; but Excel will do this for us easily. Second, we know that our summary statistics are not the same as the population's parameter values. We are dealing with some (generally small) errors. This is where the new statistics student often begins to be uncomfortable. How can we make good judgements if our information is wrong? This is a reasonable question, and one that we, as data detectives, need to
  • 31. be comfortable with. The first part of the answer falls with the design of the sample: by selecting the right sample size (how many are in the sample), we can control the relative size of the likely error. For example, we can design a sample where the estimated error for our average salary is about plus or minus $1,000. Does knowing that our estimates could be $1,000 off change our view of the data? If the female average was a thousand dollars more and the male average was a thousand dollars less, would you really change your opinion about them being different? Probably not with the difference we see in our salary values (around 38K versus 52K). If the actual averages were closer together, this error range might impact our conclusions, so we could select a sample with a smaller error range. (Again, the technical details on how to do this are found in research courses. For our statistics class, we assume we have the correct sample.) Note, this error range is often called the margin of error. We see this most often in opinion polls. For example, if a poll said that the percent of Americans who favored Federal Government support for victims of natural disasters (hurricanes, floods, etc.) was 65% with a margin of error of +/- 3%, we would say that the true proportion was somewhere between 62% and 68%, clearly a majority of the population. Where the margin of error becomes important to know is when results are closer together, such as when support is 52% in favor versus 48% opposed, with a margin of error of 3%. This means the actual
  • 32. support could be as low as 49% or as high as 55%; meaning the results are generally too close to make a solid decision that the issue is supported by a majority, the proverbial "too close to call." The second part of answering the question of how we make good decisions introduces the tools we will be looking at this week: decision-making statistical tests that focus on examining the size of observed differences to see if they are "meaningful" or not. The neat part of these tools is that we do not need to know what the sampling error was, as the techniques will automatically include this impact in our results! The statistical tools we will be looking at for the next couple of weeks all "work" due to a couple of assumptions about the population. First, the data needs to be at the interval or ratio level; the differences between sequential values need to be constant (such as in temperature or money). Additionally, the data is assumed to come from a population that is normally distributed, the normal curve shape that we briefly looked at last week. Note that many statisticians feel that minor deviations from these strict assumptions will not significantly impact the outcomes of the tests. The tools for this week and next use the same basic logic. If we take a lot of samples from the population and graph the means for all of them, we will get a normal distribution (even if the population itself is not exactly normal) called the
  • 33. sampling distribution of the mean. Makes sense, as we are using sample means. This distribution has an overall, or grand, mean equal to that of the population. Its standard deviation equals the standard deviation of the population divided by the square root of the sample size. (Let's take this on faith for now; trust me, you do not want to see the math behind proving these. But if you do, I invite you to look it up on the web.) Now, knowing – in theory – what the mean values will be from population samples, we can look at how any given sample differs from what we think the population mean is. This difference can be translated into what is essentially a z-score, which we looked at last week (although the specific measure will vary depending upon the test we are using). With this statistic, we can determine how likely it is (the probability of) getting a difference as large or larger than ours purely by chance (sampling error from the actual population value) alone. If we have a small likelihood of getting this large a difference, we say that our difference is too large to have been purely a sampling error, and we say a real difference exists or that the mean of the population that the sample came from is not what we thought. That is the basic logic of statistical testing. Of course, the actual process is a bit more structured, but the logic holds: if the probability of getting our result is small (for example 4% or 0.04), we say the difference is significant. If the probability is large (for example 37% or 0.37), then we say there is not enough evidence to say the difference is
  • 34. anything but a simple sampling error difference from the actual population result. The tools we will be adding to our bag of tricks this week will allow us to examine differences between data sets. One set of tools, called the t-test, looks at means to see if the observed difference is significant or merely a chance difference due mostly to sampling error rather than a true difference in the population. Knowing if means differ is a critical issue in examining groups and making decisions. The other tool, the F-test for variance, does the same for the data variation between groups. Often ignored, the consistency within groups is an important characteristic in understanding whether groups having similar means can be said to be similar or not. For example, if a group of English majors all took two classes together, one math and one English, would you expect the grade distributions to be similar, or would you expect one to show a larger range (or variation) than the other? We will see throughout the class that consistency and differences are key elements to understanding what the data is hiding from us, or trying to tell us – depending on how you look at it. In either case, as detectives our job is to ferret out the information we need to answer our questions. Hypothesis Testing – Are Differences Meaningful? Here is where the crime scene experts come in. Detectives have
  • 35. found something but are not completely sure of how to interpret it. Now the training and tools used by detectives and analysts take over to examine what is found and make some interpretations. The process, or standard approach, that we will use is called the hypothesis testing procedure. It consists of six steps; the first four set up the problem and how we will make our decisions (and are done before we do anything with the actual data), the fifth step involves the analysis (done with Excel), and the sixth and final step focuses on interpreting the result. The hypothesis testing procedure is a standardized decision-making process that ensures our decisions (on whether things are significantly different or not) are based on the data, and not on other factors. Many times, our results are more conservative than individual managerial judgements; that is, a statistical decision will call fewer things significantly different than many managerial judgement calls would. This statistical tendency is, at times, frustrating for managers who want to show that things have changed. At other times, it is a benefit, such as when we are hoping that things, such as error rates, have not changed. While a lot of statistical texts have slightly different versions of the hypothesis testing procedure (fewer or more steps), they are essentially the same, and are a spinoff of the scientific method. For this class, we will use the following six steps:
  • 36. 1. State the null and alternate hypothesis
2. Select a level of significance
3. Identify the statistical test to use
4. State the decision rule. Steps 1 – 4 are done before we examine the data
5. Perform the analysis
6. Interpret the result.
Step 1 A hypothesis is a claim about an outcome. It comes in two forms. The first is the null hypothesis – sometimes called the testable hypothesis, as it is the claim we perform all of our statistical tests on. It is termed the "null" hypothesis, shown as Ho, as it basically says "no difference exists." Even if we want to test for a difference, such as males and females having different average compa-ratios, in statistics we test to see if they do not. Why? It is easier to test against a fixed claim, such as equality, than against the open-ended claim "different" – what, exactly, would "different" mean as a starting point? So, we go with testing no difference. The key rule about developing a null hypothesis is that it always contains an equality claim; this could be equal (=), equal to or less than (<=), or equal to or greater than (>=). Here are some examples: Ex 1: Question: Is the female compa-ratio mean = 1.0?
  • 37. Ho: Female compa-ratio mean = 1.0.
Ex 2: Q: Is the female compa-ratio mean = the male compa-ratio mean? Ho: Female compa-ratio mean = Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-ratio? Note that this question does not contain an equal condition. In this case, the null is the opposite of what the question asks: Ho: Female compa-ratio <= Male compa-ratio. We can see that by testing this null, we can answer our initial question about a directional difference. This logic is key to developing the correct test claim.
A null hypothesis is always coupled with an alternate hypothesis. The alternate is the opposite claim from the null and is shown as Ha. Between the two claims, all possible outcomes must be covered. So, for our three examples, the complete step 1 (state the null and alternate hypothesis statements) would look like:
Ex 1: Question: Is the female compa-ratio mean = 1.0? Ho: Female compa-ratio mean = 1.0. Ha: Female compa-ratio mean =/= (not equal to) 1.0
  • 38. Ex 2: Q: Is the female compa-ratio mean = the male compa-ratio mean? Ho: Female compa-ratio mean = Male compa-ratio mean. Ha: Female compa-ratio mean =/= Male compa-ratio mean.
Ex. 3: Q: Is the female compa-ratio more than the male compa-ratio? Ho: Female compa-ratio <= Male compa-ratio Ha: Female compa-ratio > Male compa-ratio. (Note that in this case, the alternate hypothesis is the question being asked, but the null is what we always use as the test hypothesis.)
When developing the null and alternate hypothesis:
1. Look at the question being asked.
2. If the wording implies an equality could exist (equal to, at least, no more than, etc.), we have a null hypothesis and we write it exactly as the question asks.
3. If the wording does not suggest an equality (less than, more than, etc.), it refers to the alternate hypothesis. Write the alternate first.
4. Then, for whichever hypothesis statement you wrote, develop the other to contain all of the other cases. An = null should have a =/= alternate, a >= null should have a < alternate, and a <= null should have a > alternate.
  • 39. 5. The order the variables are listed in each hypothesis must be the same; if we list males first in the null, we need to list males first in the alternate. This minimizes confusion in interpreting results.
Note: the hypothesis statements are claims about the population parameters/values based on the sample results. So, when we develop our hypothesis statements, we do not consider the sample values. For example, consider our desire to determine if the compa-ratio and salary means for males and females are different in the population, based on our sample results. While the compa-ratio means seemed fairly close together, the salary means seemed to differ by quite a bit; in both cases, we would test if the male and female means were equal, since that is the question we have about the values in the population. If you look at the examples, you can notice two distinct kinds of null hypothesis statements. One has only an equal sign in it, while the other contains an equal sign and an inequality sign (<=, but it could be >=). These two types correspond to two different research questions and test results. If we are only interested in whether something is equal or not, such as if the male average
  • 40. salary equals the female average salary, we do not really care which is greater, just whether they could be the same in the population or not. For our equal salary question, it is not important if we find that the male's mean is > (greater than) the female's mean or if the male's mean is < (less than) the female's mean; we only care about a difference existing or not in the population. This, by the way, is considered a two-tail test (more on this later), as either condition would cause us to say the null's claim of equality is wrong: a result of "rejecting the null hypothesis." The other condition we might be interested in, and we need a reason to select this approach, occurs when we want to specifically know if one mean exceeds the other. In this situation, we care about the direction of the difference; for example, we may care only whether the male mean is greater than the female mean (or only whether it is less). Step 2 The level of significance is another concept that is critical in statistics but is often not used in typical business decisions. One senior manager told the author that their role was to ensure that the "boss' decisions were right 50% +1 of the time rather than 50% -1." This suggests that the level of confidence that the right decisions are being made is around 50%. In statistics, this would be completely unacceptable. A typical statistical test's level of confidence that the right decision is being made is
  • 41. about 95%, with a typical range from 90 to 99%. This is done with our chosen level of significance. For this class, we will always use the most common level of 5%, or more technically alpha = 0.05. This means we will live with a 5% chance of saying a difference is significant when it is not, and we really have only a chance sampling error. Remember, no decision that does not involve all the possible information that could be collected will ever have a zero possibility of being wrong. So, saying we are 95% sure we made the right call is great. Marketing studies often will use an alpha of .10, meaning they are 90% sure when they say the marketing campaign worked. Medical studies will often use an alpha of 0.01 or even 0.001, meaning they are 99% or even 99.9% sure that the difference is real and not a chance sampling error. Step 3 Choosing the statistical test and test statistic depends upon the data we have and the question we are asking. For this week, we will be using compa-ratio data in the examples and salary data in the homework – both are continuous and at least interval level data. The questions we will look at this week will focus on seeing if there is a difference in the average pay (as measured by either the compa-ratio or salary) between males and females in the population, based on our sample results. After all, if we cannot find a
  • 42. difference in our sample, should we even be working on the question? In the quality improvement world, one of the strategies for looking for and improving performance of a process is to first look at and reduce the variation in the data. If the data has a lot of variation, we cannot really trust the mean to be very reflective of the entire data set. Our first statistical test is called the F-test. It is used when we have at least interval level data and we are interested in determining if the variances of two groups are significantly different or if the observed difference is merely chance sampling error. The test statistic for this is the F. Once we know if the variances are the same or not, we can move to looking for differences between the group means. This is done with the t-test and the t-statistic. Details on these two tests will be given later; for now, we just need to know what we are looking at and what we will be using. Step 4 One of the rules in researching questions is that the decision rule, how we are going to make our decision once the analysis is done, should be stated upfront and, technically, even before we get to the data. This helps ensure that our decision is data driven rather than being made by emotional factors to get the outcome we want rather than the outcome that fits the
  • 43. data. (Much like making our detectives go after the suspect that committed the crime rather than the one they do not like and want to arrest, at least when they are being honest detectives.) The decision rule for our class is very simple, and will always be the same: Reject the null hypothesis if the p-value is less than our alpha of .05. (Note: this is the same as saying that if the p-value is not less than 0.05, we fail to reject the null hypothesis.) We introduced the p-value last week; it is the probability of getting an outcome as large or larger than ours by pure chance alone. The further a sample mean is from the actual mean, the less chance we have of getting a value that differs from the mean that much or more; the closer to the actual mean, the greater our chance of getting that difference or more purely by sampling error. Our decision rule ties our criterion for significance of the outcome, the step 2 choice of alpha, to the results that the statistical tests will provide (and the Excel tests will give us the p-values to use in making the decisions). These four steps define our analysis and are done before we do any analysis of the data. Step 5
  • 44. Once we know how we will analyze and interpret the results, it is time to get our sample data and set it up for input into an Excel statistical function. Some examples of how this data input works will be discussed in the third lecture for this week. This step is fairly easy: simply identify the statistical test we want to use. The test to use is based on our question and the related hypothesis claims. For this week, if we are looking at variance equality, we will use the F-test. If we are looking at mean equality, we will use the t-test. Step 6 Here is where we bring everything together and interpret the outcomes. What is constant about this step is the need to:
1. Look at the appropriate p-value (indicated in the test outputs, as we will see in lecture 2).
2. Compare the p-value with our value for alpha (0.05).
3. Make a decision: if the test p-value is less than (<) 0.05, we will reject the null hypothesis. If the test p-value is 0.05 or more (>=), we will fail to reject the null hypothesis. Rejecting the null hypothesis means that we feel the alternate hypothesis is the more
  • 45. accurate statement about the populations we are testing. This is the same for all of our statistical tests. Once we have made our decision to reject or fail to reject the null hypothesis, we need to close the loop and go back and answer our original question. We need to take the statistical result of rejecting or failing to reject the null and turn it into an "English" … BUS308 – Week 1 Lecture 1 Statistics Expected Outcomes After reading this lecture, the student should be familiar with:
1. The basic ideas of data analysis.
2. Key statistical concepts and terms.
3. The basic approach for this class.
4. The case focus for the class.
What we are all about Data, measurements, counts, etc., are often considered the language of business. However, data also plays an important role in our personal lives. Data, or more accurately the analysis of data, answers our questions. These may be business related or personal. Some questions we may have heard that require data to answer include:
1. On average, how long does it take you to get to work? Or, alternately, when do you have to leave to get to work on time?
2. For budget purposes, what is the average expense for utilities, food, etc.?
3. Has the quality rejection rate on production Line 3 changed?
4. Did the new attendance incentive program reduce the tardiness for the department?
5. Which vendor has the best average price for what we order?
6. Which customers have the most complaints about our products?
  • 46. 7. Has the average production time decreased with the new process?
8. Do different groups respond differently to an employee questionnaire?
9. What are the chances that a customer will complain about or return a product?
Note that all of these very reasonable questions require that we collect data, analyze it, and reach some conclusion based upon the result. Making Sense of Data This class is about ways to turn data sets, lots of raw numbers, into information that we can use. This may include simple descriptions of the data with measures such as average, range, high and low values, etc. It also includes ways to examine the information within the data set so that we can make decisions, identify patterns, and identify existing relationships. This is often called data analysis; some courses discuss this approach with the term "data-based decision making." During this class we will focus on the logic of analyzing data and interpreting the results. What this class is not This class is not a mathematics course. I know, it is called statistics and it deals with numbers, but we do not focus on creating formulas or even doing calculations. Excel will do all of the calculations for us; for those of you who have not used Excel before, and even for some who have, you will be pleasantly surprised at how powerful and relatively easy to use it is. It is also not a class in collecting the data. Courses in research focus on how to plan the collection of data so that it is fair and unbiased. Statistics deals with working on the data after it has been collected. Class structure There are two main themes to this class. The first focuses on interpreting statistical outcomes. When someone says the result is statistically significant with a p-value of 0.01, we need, as
  • 47. professionals, to know what it means. As you move higher into business and other professional positions, you will probably hear others report on studies using this kind of language. (Data analysis is becoming increasingly common in business.) The second thread focuses on how to take some data and generate statistical reports using Excel. Excel is a fairly common PC program that is part of Microsoft's Office suite of tools, and as such many businesses have it available for professionals and managers. Even if you just do a quick analysis of some data, this program is tremendously useful. This class does not have a text, but rather provides the material you need in three lectures each week. The first lecture is an overview; it provides a structure for what the week's focus is all about. The second lecture focuses on understanding the statistical tools being presented: how to read the outputs and how to understand and interpret what they are telling us. The third lecture for each week focuses on Excel and presents the steps needed to generate the statistical output. Unlike other classes, we have three weekly discussions, one related to each of the lecture segments. The intent is for you to read a lecture and then go to the discussion thread for a couple of days. Then go read the next lecture, discuss it for a couple of days, and then finish with the last lecture. This chunking of material is designed to let the information "sink in" before moving to new things. Introducing Statistical Analysis Data analysis Data analysis, whether statistical, financial, operational, etc., often appears to be a set of unrelated tools and techniques that are somehow applied sequentially to get an answer to some question. 
Indeed, most textbooks present statistical analysis this way; introduce a topic, provide some examples, present practice exercises, and then on to the next topic with new examples and exercises that often have nothing to do with what was previously presented. This approach, while common in many numerical and even
  • 48. qualitative courses, often leaves students with an incomplete idea of how everything fits together. We are trying a different approach in this class and will be using a single case/situation to demonstrate the interconnectedness of our tools. Data analysis, and particularly statistical analysis, is much like solving a mystery. Those who work with these tools are like the detectives we see on TV shows. In general, a situation (or crime) presents itself and the team goes to work. Initially, they look at the "big picture" to try and understand the basics of the situation. After that, the focus shifts to specific details as they examine suspects, look for and verify alibis, and find links between different individuals and activities; often this part of the investigation seems uncoordinated and even a bit chaotic, with few obvious links to the overall situation. But, finally, everything is pulled together, the various threads form a conclusion, and they "nab the culprit." So, to tie what the TV detectives do with what we, as data analysts, will be doing, take a look at the following. Hopefully, this will relate the familiar crime story to data analysis.
· The "crime" we focus on presents itself as some outcome – results of a manufacturing process, customer satisfaction ratings differences, financial outcomes, etc. – that we do not fully understand.
· The "witnesses" we look at are the different data measurements we have.
· Our "questions" are just that – questions about what we want to find out from the data.
· Our "theory of the crime" focuses on how we think the data is related to our questions.
· The "alibis" are the data summaries and test outcomes that show if particular data is related to the outcome or not.
· The "false leads" are data measures that are not actually helpful in answering our questions.
· The "person(s) of interest" or suspects are the specific measurements or counts that could
  • 49. influence pay levels. These include grade level, seniority, performance appraisal rating, gender, raise, and education/degree level.
· And, finally, the "guilty party" is the data that is related to any illegal pay discrepancies we uncover.
Just as with any of our favorite shows, we need to take all of these elements and work through them to come up with the answers to our questions; and, often, an understanding of why the issue exists at all. The Crime In this course, we will have a single "crime" to focus on. This issue will form the basis for the lectures each week and the assignments. We will be looking at a Human Resources issue: are males and females getting equal pay for doing equal work? As background, the Federal Equal Pay Act requires companies to pay males and females the same if they are doing (substantially) the same work. We will be taking the role of data analysts (AKA detectives) in a company that has received some evidence that they are going to have a Federal audit of their pay practices due to a complaint about unfair pay practices. Our "job," the basis of the class assignments, is to determine if we do or do not comply with the Equal Pay Act. HR professionals often examine pay issues from two points of view. One is the actual salary an employee makes, a very obvious approach. This is the approach that you will take as you do the weekly assignments. Each assignment and each question require you to focus on the actual salaries paid to the employees. What differences do we see, how consistent is the data, what impacts salary outcomes, etc.? The second approach is more relative in nature and deals with a compensation measure called the compa-ratio (comparison-ratio). This measure compares an employee's salary to the midpoint of the salary grade; this is done simply by dividing the employee's salary by the midpoint of the salary grade the employee is in. (For those not familiar with salary grades, they
  • 50. are groups of jobs within a company that generally require the same skill and knowledge levels and are paid within the same salary range. The midpoints of these ranges are considered to be the market rate, or average pay, that companies in the community pay for similar work.) Examining compa-ratios lets HR see how employees are distributed around the midpoint without focusing on the actual salaries involved. It provides a second way to examine how males and females are paid without worrying about the grades they are in. This approach will be used in the weekly lectures to provide both an example of how to do each homework problem and a different view on the equal pay question. So, each week we will be looking at the pay practices of "our company" in two ways. The lectures and the weekly assignments will each deal with the same questions but will do so with a different measure of pay. In the homework, you will be asked to form tentative conclusions about the equal pay question using the insights from BOTH the lecture examples of compa-ratio and the salary-based results from your homework problems. One additional point: the data used in the weekly lectures will be slightly different from the data set you will be working with. We can consider these differences to be minor, as if the lecture uses a different sample of employees, but one that is consistent with the sample used for the homework. The conclusions reached in each week's homework should use the findings from both the lecture's examples and the homework problems. The actual reason for the difference is that students in the past have copied answers from websites and other students and handed them in as their own original work. So, to keep this from happening, the class data set you will be working with changes periodically, so previous answers will not be correct.
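Since the compa-ratio described above is simply a salary divided by a grade midpoint, it is easy to sketch. The figures below are invented for illustration; they are not from the lecture or homework data sets:

```python
def compa_ratio(salary, grade_midpoint):
    """An employee's salary divided by the midpoint of their salary grade."""
    return salary / grade_midpoint

# A hypothetical employee paid $48,000 in a grade whose market-rate
# midpoint is $50,000 sits slightly below market:
print(round(compa_ratio(48_000, 50_000), 2))  # 0.96
```

A compa-ratio of 1.0 means the employee is paid exactly at the market midpoint, whatever their grade; this is what lets HR compare pay relative to market across different grades.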
It does not make sense to redo the lectures every time the data changes, so the lecture’s salary and compa-ratio data is comparable but not identical. Getting Started
  • 51. In real life on the job or with assignments we often, as do TV detectives, have an overwhelming amount of data that we need to sift through to get to our clues; and then interpret the clues to get the information we need to answer our questions about what happened with the process or outcome we are looking at. The information that we are first presented with is typically a bunch of numbers that measure, count, and code lots of things. Note we have three kinds of data we will deal with:
· Measures tell us how much exists; for example, a salary measure tells us how many dollars someone is paid.
· Counts tell us how many exist, such as counting how many employees have a master's degree.
· Codes tell us about a characteristic; for example, we might code being male as M and being female as F. However, we could also use 0 for male and 1 for female. These numbers do not mean one gender is somehow "better" or "higher" than the other; they merely show a difference. They are identifiers. More about this later.
So, as data detectives, we approach any question by finding numbers (measures, counts, and codes) that somehow relate to the issue and the question we need to answer about the situation. Once we have this data, we need to sort through it to find the clues that need to be examined to understand the situation or outcome. For this class, clues are what we get once we have done some statistical work on the data. This work, as we will see throughout the class, starts with relatively simple summaries – average values for different groups or things, measures of how consistent things are, etc. These summary measures become our first clues. And, just as with any good detective story, not all the clues are meaningful and some are not immediately apparent. The detective/analyst needs to find out what happened and what the clues mean to understand and "solve" the crime. Before we start with the data and how to tease clues from it, we need to understand a couple of concepts:
• 52. · Population: includes all of the “things” we are interested in; for example, the population of the U.S. includes everyone living in the country.
· Sample: involves only a selected sub-group of the population; for example, those selected for a national opinion poll.
· Random Sample: a sample in which every member of the population has an equal chance of being selected; this is the only way to obtain a sample that is truly representative of the entire population. Details on how to conduct a random survey are covered in research courses; we will assume the data we will be working with comes from a random sample of a company’s exempt employees. Note: an exempt employee, AKA salaried employee, does not get overtime pay for working more than 40 hours in a week (“exempt from overtime requirements”).
· Parameter: a characteristic of the population; the average age of everyone in the U.S. would be a parameter.
· Statistic: a characteristic of a sample; the average age of everyone you know who attends school would be a statistic, as the group is a sub-group of all students.
· Descriptive Statistics: measures that summarize characteristics of a group.
· Inferential Statistics: measures that summarize the characteristics of a random sample and are used to infer the values of the population parameters.
· Statistical test: a quantitative technique for making a judgement about population parameters based upon random sample outcomes (statistics).

The Case
Our class, as a group of data analysts/detectives, will play the role of helping a Human Resources Department (in our assumed “company”) prepare for an audit from the government about our pay practices. This routine audit will focus on the question of equal pay for equal work, as required by both State and Federal statutes. Specifically, these require that if males and females
• 53. are performing substantially the same job (equal work), then they should be paid equally. Of course, nothing is quite that simple. The laws do allow some differences based on company policies calling for pay differences due to performance, seniority, education, and – with some companies – functional areas. Our company does have policies saying we pay for organizational level (different kinds of work, which are represented by job grades), performance (as measured by the performance rating), seniority, experience, and educational achievements.
Our first step is to decide upon some questions that need to be answered, as questions lead to the data we need to collect. The overall question, also known as the Research Question, is simply: “Do males and females receive equal pay for equal work?” This just asks: if a male and a female are doing the same work for the company, are they paid the same? As straightforward as this question seems, it is very difficult to answer directly. So, after brainstorming, secondary or intermediate (more basic) questions have been identified that need to be answered as we build our case towards the final answer. Some of these secondary questions (which will be addressed throughout the course) include:
· Do we have any measures that show pay comparisons between males and females?
· Since different factors influence pay, do males and females fare differently on them, such as age, service, education, performance ratings, etc.?
· How do the various salary-related inputs interact with each other? How do they impact pay levels?
These general questions lead to our collecting data from a random sample of employees. Note that a random sample (covered in research courses) is the best approach to give us a sample that closely represents the actual employee population. The sample consists of 25 males and 25 females. The following data was collected on each employee selected:
• 54. · Salary – rounded to the nearest $100 and measured in thousands of dollars; for example, an annual salary of $38,825 is recorded as 38.8.
· Age – rounded (up or down) to the age as of the employee’s nearest birthday.
· Seniority – rounded (up or down) to the nearest hiring anniversary.
· Performance Appraisal Rating – based on a 100-point scale.
· Raise – the percent of their last performance merit increase.
· Job grade – groups of jobs that are considered substantially similar work (for equal work purposes), grouped into classifications ranging from A (the lowest grade) through F (the highest grade). Note: all employees in this study are exempt employees – paid with a salary and not eligible for overtime payments. They are considered middle management and professional level employees.
· Midpoint – the middle of the salary range assigned to each Job Grade level. The midpoint is considered to be the average market rate that companies pay for jobs within each grade.
· Degree – the educational achievement level, coded as 0 for those having a Bachelor’s degree and 1 for those having a Master’s degree or higher.
· Gender – coded as M for Males and F for Females; also coded 0 (Males) and 1 (Females) for use in an advanced statistical technique introduced in Week 4.
In addition to these collected measures, the HR Compensation Department has provided the compa-ratio for each employee. The Compa-ratio is defined as the salary divided by the employee’s grade midpoint. For example, an employee with a salary of $50,000 and a company salary range midpoint of $48,000 would have a Compa-ratio of 50/48 = 1.042 (rounded to three decimal places). Employees with a Compa-ratio greater than (>) 1.0 are paid more than the market rate for their job, while employees with a Compa-ratio less than (<) 1.0 are paid less than the prevailing market rate. Compensation professionals use Compa-ratios to examine the spread and relative pay levels of
• 55. employees while the impact of grade is removed from the picture. Here is the data collected that will be used in the lecture examples and discussions.

ID  Salary  Compa-ratio  Mid  Age  Perf App.  Service  Gender  Raise  Deg.  Gender1  Grade
 1      58        1.017   57   34         85        8       0    5.7     0        M      E
 2      27        0.870   31
• 64.
22      57        1.187   48   48         65        6       1    3.8     1        F      D
23      23        1.000   23   36         65        6       1    3.3     0        F      A
24      50        1.041   48   30         75        9       1    3.8     0        F      D
25      24        1.043   23   41         70        4       0    4       0        M      A
26      24        1.043   23   22         95        2       1    6.2     0        F      A
27      40        1.000   40   35         80        7       0    3.9     1        M      C
28      75        1.119   67   44         95        9       1    4.4     0        F      F
29      72        1.074   67   52         95        5       0    5.4     0        M      F
30      49        1.020   48   45         90       18       0    4.3     0        M      D
31      24        1.043   23   29         60        4       1    3.9     1        F      A
32      28        0.903   31   25         95        4       0    5.6     0        M      B
33      64        1.122   57   35         90        9       0    5.5     1        M      E
34      28        0.903   31   26         80        2       0    4.9     1        M      B
35      24        1.043   23   23         90        4       1    5.3     0        F      A
36      23        1.000   23   27         75        3       1    4.3     0        F      A
37      22        0.956   23   22         95        2       1    6.2     0        F      A
38      56        0.982   57   45         95       11       0    4.5     0        M      E
39      35        1.129   31   27         90        6       1    5.5     0        F      B
40      25        1.086   23   24         90        2       0    6.3     0        M      A
41      43        1.075   40   25         80        5       0    4.3     0        M      C
42      24        1.043   23   32        100        8       1    5.7     1        F      A
43      77        1.149   67   42         95       20       1    5.5     0        F      F
44      60        1.052   57   45         90       16       0    5.2     1        M      E
45      55        1.145   48   36         95        8       1    5.2     1        F      D
46      65        1.140   57   39         75       20       0    3.9     1        M      E
47      62        1.087   57   37         95        5       0    5.5     1        M      E
48      65        1.140   57   34         90       11       1    5.3     1        F      E
49      60        1.052   57   41         95       21       0    6.6     0        M      E
50      66        1.157   57   38         80       12       0    4.6     0        M      E

(Note that this table can be copied into an Excel file if you would like to duplicate the examples provided in the lectures.)

What kind of data do we have?
Just as all clues and information uncovered on mystery shows are not equally valuable, or even useful, not all data is equally useful in answering questions. But all data has some value. As
• 65. we look at this data set, it is clear that not all the data is the same. We have some measures (salary, seniority, etc.), but we also have some labels (ID, for example, merely identifies different employees in the data set and is not useful for much else). We have some data that are clearly codes, gender and degree for example. In general, our data set can be sorted into four kinds of data: nominal, ordinal, interval, and ratio (NOIR):
· Nominal: these are basically names or labels. For example, in our data set we see Gender1 labeled as M and F (for males and females). Other examples of nominal data include names of cars (Ford, Chevrolet, Dodge, etc.), cities and states, flowers, etc. Anything where the name/label just indicates a difference from something else that is similar is nominal level data. Now, we can “code” with words and letters (such as Male or M), but we can also code using 0 and 1 (for male and female), as we do with the Gender variable. Regardless of one looking like a label (letters) and one looking like a measurement (numbers), both of these are simply ways to label males and females – they indicate a difference between the groups only, not that one is somehow higher than the other (as we typically think of 1 as higher or more than 0). Nominal level data are used in two ways. First, we can count them; for example, how many males and females exist in the group? Second, we can use them as group labels to identify different groups and list other characteristics in each group; a list of all male and female compa-ratios will be quite helpful in our analysis, for example.
· Ordinal: these variables add a sense of order to the difference, but the differences are not the same between levels. Often, these variables are based on judgement calls creating labels that can be placed in a rank order, such as good, better, best. The grade and degree variables in our data set are ordinal.
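The two uses of nominal data just described – counting group members and using codes as group labels – along with the rank order that ordinal data adds, can be sketched in a few lines of code. This is a minimal illustration only (written in Python, which this Excel-based course does not otherwise use), run on made-up records rather than the lecture’s actual data set:

```python
# Illustrative (gender, grade) records -- NOT the lecture's actual data set
employees = [("M", "E"), ("F", "D"), ("F", "A"), ("M", "C"), ("F", "A")]

# Nominal use 1: count how many fall into each gender group
counts = {}
for gender, grade in employees:
    counts[gender] = counts.get(gender, 0) + 1
print(counts)  # {'M': 2, 'F': 3}

# Nominal use 2: use the codes as labels to split another variable into groups
grades_by_gender = {"M": [], "F": []}
for gender, grade in employees:
    grades_by_gender[gender].append(grade)
print(grades_by_gender)  # {'M': ['E', 'C'], 'F': ['D', 'A', 'A']}

# Ordinal data adds rank order: Grade A is lowest and F is highest, so sorting
# is meaningful even though the "distance" between grades is not constant
grade_order = "ABCDEF"
print(sorted({grade for gender, grade in employees}, key=grade_order.index))
# ['A', 'C', 'D', 'E']
```

This is the same counting and grouping an Excel pivot table or COUNTIF would do with the Gender column; the point is that at these measurement levels only counts, labels, and orderings are meaningful – not averages.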
We cannot assume that the amount of work to get the higher degree or higher job grade is the same for all differences. Note: Even though we only show education as bachelor and graduate, we could include no high school diploma, high school diploma
• 66. on the low end and doctoral degree and professional certification on the upper end.
· Interval: these variables have a constant difference between successive values. Temperature is a common example – the difference between, say, 45 and 46 degrees is the same amount of heat as between 87 and 88 degrees. Note: often, analysts will treat personal judgement scores, such as Performance Appraisal ratings or responses on a questionnaire scale using scores of 1 to 5, as ordinal, since it cannot be proven that the differences are constant. Other researchers have suggested that these measures can be considered interval in nature for analysis purposes. We will consider performance appraisal ratings to be interval level data for our analysis purposes.
· Ratio: these are interval measures that have a 0 point that means none. For example, while 0 dollars in your wallet means no money, a temperature of 0 degrees does not mean no heat. Ratio level variables include salary, compa-ratio, midpoint, age, service, and raise – even if our measurements do not go down to 0, each measure does have a 0 point that would mean none.
These differences are important, as we can do different kinds of analysis with each level, and attempting to use the wrong level of data in an approach will result in misleading or wrong outcomes. Within our data set, the variables fit into these groups:
· Nominal: ID, Gender, Gender1 (merely labels showing a difference)
· Ordinal: Grade, Degree (can be ordered from low to high; for example, Grade A is the lowest and Grade F is the highest grade)
· Interval: Performance Rating (constant difference between values, but no meaningful 0 score)
· Ratio: Salary, Compa-ratio, Midpoint, Seniority, Age, Raise (all have a 0 point that means none)
Wow – a lot of background material. But, now that we have this covered, we can get to actually looking at our data. As suggested above, the first question we need to ask is “do we
• 67. have any measures that show pay comparisons between our males and females?” Then, in lectures two and three for the week, we move on to some specific approaches we can use to see if the company is guilty of not paying males and females the same for doing equal work.

Summary
This class is about uncovering the secrets hidden within data sets. As data detectives (AKA data analysts), we need to develop both the tools and the logic to examine data sets and use them to answer business questions. This class will use a single business question: are males and females paid the same for doing equal work? Each week we will look at different tools and techniques to summarize and make sense out of the data set assigned to your class. We looked at a lot of what we could call “background” in this lecture – information that is needed to understand what we are doing but which does little more than set up the problem. This included the lecture’s data set, definitions of the variables to be used, some statistical concepts that help identify what we are doing, and the kinds of data that we are using.

Next Steps
Please respond to Discussion Thread 1 for this week with your initial response and responses to others over a couple of days before moving on to reading the second lecture for the week. Ask questions and please share what is unclear. Statistics has been compared to learning a new language; we need to understand what the terms mean and how they apply. Please ask your instructor if you have any questions about this material.

09/24/2007 12:37 PM  WEI JINGSHENG  Page 1 of 7  http://www.echonyc.com/~wei/Fifth.html
• 68. THE FIFTH MODERNIZATION
by Wei Jingsheng

At present, the media no longer play up the themes of dictatorship of the proletariat and class struggle. One reason is that this line of propaganda was used as a sort of magical potion by the Gang of Four, who have now been overthrown. Another reason, which is even more important, is that the people have had enough of all that and can no longer be deceived. According to the laws of history, the new will not come about until the old is gone. Now that the old is gone, the people are rubbing their eyes in eager anticipation. Finally, with God's blessing, there is a new promise - the Four Modernizations. Chairman Hua, the wise leader, and Vice-Chairman Deng (whom the people consider even wiser and greater) have defeated the Gang of Four. Now democracy and prosperity, so earnestly sought by those who shed their blood at Tian-an-men, seem soon to be realized. After the arrest of the Gang of Four, people eagerly hoped that Vice-Chairman Deng, the so-called "restorer of capitalism," would once again appear as a great towering banner. Finally, Vice-Chairman Deng did return to his post on the Central Committee. The people were indeed excited, inspired, and ... [sic]. However, to the people's regret, the hated old political system has not changed, and even any talk about the much hoped for democracy and freedom is forbidden. People's living conditions remain the same and the "increased wages" are far behind the soaring commodity prices. There has been some talk about the restoration of "capitalism"
• 69. and the bonus system. After some investigation it was confirmed that the "invisible whip" for "the maximum exploitation of workers," which had been cursed by the Marxist ancestors, could not be used to fool the people anymore. Although without the leadership of the Great Helmsman, people can still be led by the "wise leader" to catch up with and surpass England, the United States, Japan, and Yugoslavia (?) or the advanced world level. Taking part in revolution is no longer "in vogue." Since entering a university will greatly enhance a person's prestige, people no longer need to hear the deafening noise of "class struggle" slogans. The Four Modernizations stand for everything that is good. Of course, it is still necessary to act according to the spirit of the Central Committee, as relayed to us by the April Fifth Academy. The beautiful vision can materialize only under unified leadership and guidance. In ancient China, there were such maxims as "A cake in the picture can appease hunger" and "Watching the plums can quench the thirst." These witty and ironic remarks were quite popular in ancient times, but today, after a long and continuous development of history, people should never take such stupid remarks seriously. Yet some people not only believe in them but also carry them out in practice. For several decades, Chinese people have closely followed the Great Helmsman. Communist ideology has provided "the cake in the picture," and the Great Leap Forward and Three Red Banners have served as
• 70. "plums for quenching thirst." People tightened their belts and bravely forged ahead. Thirty years soon passed and they have learned a lesson from experience. For thirty years people were like "monkeys reaching out for the moon and feeling only emptiness." Therefore, when Vice-Chairman Deng put forward the slogan, "Be practical," people's enthusiasm was like surging waves. Time and again he was helped by the people to come to power. The people expected him to review the past and lead them to a realistic future with a "seeking truth from facts" approach. However, some people have warned us: Marxism-Leninism-Mao Zedong Thought is the foundation of all foundations; Chairman Mao was the Great Savior of the people; "Without the Communist Party, there would be no new China"; "Without Chairman Mao there would be no new China"; and anyone disagreeing with these will come to no good end. "Some people" even warned us: Chinese people need dictatorship. His superiority over feudal emperors precisely shows his greatness. Chinese people need no democracy unless it is "democracy under collective leadership" without which democracy is not worth a dime. It is up to you to believe or to doubt it, but the prisons (from which so many have recently been released) were convincing "proof." However, someone has now given you a way out. Take the Four Modernizations as the key link and follow the principle of stability and unity and be brave (?) to serve the revolution (?) as an old ox does. Then you will find your way to paradise, namely the prosperity of
• 71. communism and the Four Modernizations. Some well-intentioned people have given us this advice: "When you cannot think straight, try hard to study Marxism-Leninism-Mao Zedong Thought!" The reason why you cannot think straight is your lack of understanding, which reflects on the level of your ideological accomplishment. You should be obedient, otherwise the leadership of your unit cannot forgive you! And on and on. I advise everyone not to believe such political swindlers anymore. Knowing that we are being deceived, we should implicitly believe in ourselves. We have been tempered in the Cultural Revolution and cannot be that ignorant now. Let us find out for ourselves what should be done.

Why Democracy?
This question has been discussed by many people for centuries. Others have conducted careful analyses and indicated on the Democracy Wall how much better democracy is than autocracy. "People are the masters of history." Is this a fact or just empty talk? Well, it can be both. How can there be history without the people's strength and their participation in making it? No Great Helmsman or Wise Leader can even exist, not to speak of creating history. From this, we can see that without new Chinese people, there would be no new China; but it is not true that "without Chairman Mao, there would be no new China." Vice-Chairman Deng is grateful to Chairman Mao for saving his life. This is understandable. But is it not reasonable too that he should be grateful to the "outcries" that pushed him to the seat of power?
• 72. Would it be reasonable for him to respond to the outcries by saying, "You must not denigrate Chairman Mao, because he saved my life?" This makes "The people are the masters of history" an empty slogan. It is empty talk because people cannot master their own destiny according to the majority will; because their achievements have been credited to other people's accounts; and because their rights have been used to make somebody's royal crown. What kind of master is this? It may be more correct to call them slaves. In our history books the people are the masters who create everything, but in real life they are lackeys, always standing at attention and waiting to be "led" by leaders who swell like dough under the effect of yeast. People should have democracy. When they ask for democracy, they are only demanding what is rightfully theirs. Anyone refusing to give it to them is a shameless bandit no better than a capitalist who robs workers of their money earned with their sweat and blood. Do the people have democracy now? No. Do they want to be masters of their own destiny? Definitely yes. This was the reason for the Communist Party's victory over the Kuomintang. But what then happened to the promise of democracy? The slogan "people's democratic dictatorship" was replaced by "the dictatorship of the proletariat." Even the "democracy" enjoyed by the infinitesimal portion - one among tens of millions - was abolished and replaced by the autocracy of the "Great Leader." Thus, Peng Dehuai was overthrown because,
• 73. instead of following the Great Leader's instruction, he had the audacity to show his temper in the Party. Then a new promise was held out: because the leader is great, implicit faith in such a leader, rather than democracy, will bring more happiness to the people. People have believed in this promise, half reluctantly and half willingly, until today. But are they any happier? Are they richer or more prosperous? Unconcealable facts show that they are poorer, more miserable, and more backward. Why? This is the first question to be considered. And what to do now? This is the second question. There is no need now to determine the ratio of Mao Zedong's merits and shortcomings. He first spoke about this as a self-defense. People should now think for a while and see if, without Mao Zedong's autocracy, China could be in its present backward state. Are Chinese people stupid, or lazy, or unwilling to enjoy wealth? Are they expecting too much? Quite the opposite. Then why? The answer is quite obvious. Chinese people should not have taken this road. Then why did they take it? Only because they were led by that self-exalting autocrat. If they did not take this road, he would exercise dictatorship over them. The people could see no other road and therefore had no choice. Is this not deception? Can there be any merit in deception? What road is this? It is called the "socialist road." According to the definition of the Marxist ancestors, socialism means that the people, or the proletariat, are their own masters. Let me ask the Chinese workers and peasants: With the meager wages you get every month, whose master and what kind of master can you be? Sad to relate, you are "mastered" by somebody else even in the matter of matrimony. Socialism
• 74. guarantees the producers' rights to the surplus production from their labor over what is needed as a service to the society. But this service is limitless. So are you not getting only that miserable little wage "necessary for maintaining the labor force for production?" Socialism guarantees many rights, such as the right of a citizen to receive an education, to use his ability to the best advantage, and so forth. But none of these rights can be seen in our daily life. What we can see is only "the dictatorship of the proletariat" and "a variation of Russian autocracy" - Chinese socialist autocracy. Is this kind of socialist road what people want? Can it be claimed that autocracy means the people's happiness? Is this the socialist road depicted by Marx and hoped for by the people? Obviously not. Then what is it? Funny as it may sound, it is like the feudal socialism mentioned in the "Manifesto," or a feudal monarchy disguised as socialism. We have heard that Soviet Russia has been promoted from social feudalism to social imperialism. Must Chinese people take the same road? Some people have proposed that we should change everything to fascist autocracy under feudal socialism. To this I entirely agree, because the question of merits or shortcomings does not exist here. Let me say a word about "National Socialism," the real name of the notorious German fascism. These fascists, also under an autocrat tyrant, called on the people to tighten their belts and deceived the people by telling them that they belonged to a great nation. Their main purpose was to suppress the most rudimentary form of democracy, because they clearly knew that democracy was the most formidable and irresistible enemy. On this basis, Stalin and Hitler shook hands and signed the German-Soviet Pact whereby a socialist state and a National Socialist State toasted the partition of
• 75. Poland while the peoples of both countries suffered enslavement and poverty. If we do not want democracy as our only choice or, in other words, if we want modernized economics, science, military science, and so forth, then there must be modernization of the people and of the social system.

The Fifth Modernization - What Kind of Democracy?
I would like to ask everyone: What do we want modernization for? After all, some men feel that the age of The Dream of the Red Chamber must have been perfectly all right, because men were free to read, write poetry, and fool around with women. One needed only to open his mouth and food would be provided, only raise an arm to be dressed. Well, today's privileged class gets to see foreign movies and live like gods. Such a life-style is quite inaccessible to ordinary folk. What the people want are the happy days which they can truly enjoy and which are not worse than those enjoyed by foreigners. All want prosperity, the kind of prosperity which is universal and which can only result from increased social productive forces. This is obvious to everyone. However, there is still something overlooked by somebody. Can people enjoy good living when social productive forces have been increased? Now the questions of authority, of domination, of distribution, and of exploitation arise.