BUS308 – Week 1 Lecture 2
Describing Data
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. Basic descriptive statistics for data location
2. Basic descriptive statistics for data consistency
3. Basic descriptive statistics for data position
4. Basic approaches for describing likelihood
5. Difference between descriptive and inferential statistics
What this lecture covers
This lecture focuses on describing data and how these descriptions can be used in an analysis. It also introduces and defines some specific descriptive statistical tools and results.
Even if we never become data detectives or run statistical tests ourselves, we will be bombarded with statistics and statistical outcomes. We need to understand what they are telling us and how they help uncover what the data means for the “crime,” AKA our research question or issue.
How we obtain these results will be covered in Lecture 1-3.
Detecting
In our favorite detective shows, starting out always seems difficult. The detectives have a crime, but no real clues or suspects, no idea of what happened, no “theory of the crime,” etc. This is much where we are now with our question on equal pay for equal work.
The process followed is remarkably similar across the different shows. First, a case or situation presents itself. The heroes start by understanding the background of the situation and those involved. They move on to collecting clues and following hints, some of which do not turn out to be helpful. They then start to build relationships between and among clues and facts, tossing out ideas that seemed good but lead to dead ends or unhelpful insights (false leads, etc.). Finally, a conclusion is reached and the initial question of “whodunit” is solved.
Data analysis, and specifically statistical analysis, is done in much the same way, as we will see.
Descriptive Statistics
Week 1 Clues
We are interested in whether or not males and females are paid the same for doing equal work. So, how do we go about answering this question? The “victim” in this question could be considered the difference in pay between males and females, specifically when they are doing equal work. An initial examination (Doc, was it murder or an accident?) involves obtaining basic information to see if we even have cause to worry.
The first action in any analysis involves collecting the data. This generally involves drawing a random sample from the population of employees so that we have a manageable data set to work from. In this case, our sample, presented in Lecture 1, gave us 25 males and 25 females spread throughout the company. A quick look at the sample by HR provided assurance that the group looked representative of the company workforce we are concerned with as a whole. Now we can confidently collect clues to see whether we should be concerned.
As with any detective, the first issue is to understand the “who” and “what” about the victim. In this case, we need to use our sample to understand basic information about how males and females are paid. Understanding data sets typically involves looking at several characteristics. These descriptive measures describe the data set. Typical descriptive measures include:
• Measures of location, such as the average (AKA mean), the median (middle point), and the mode (most often occurring value, if it exists).
• Measures of consistency, such as the range (largest value minus the smallest value), variance, and standard deviation.
• Measures of position, showing where a single data point is within the data set, such as percentile and rank.
• Measures of likelihood, showing the probability of obtaining specific outcomes.
Note: Descriptive statistics describe a particular data set and can only be used for that data set. However, we often want to use a sample to “infer” back to a larger population. In this case, we would use inferential statistics. Most measures, except for variance and standard deviation, are calculated the same way in both cases. We will see the specific difference for those two later in this lecture.
The key to whether we have descriptive statistics or inferential statistics lies with the group we are taking the measures on. If we are only concerned with that group, we use descriptive statistics. If, however, we want to use that group to make inferences, claims, and conclusions about a larger population, then we take a random sample from the population and use inferential statistics (allowing us to infer back to the population). Our class data sets (both the lecture and homework) are random samples from a larger population, so we will basically be using inferential statistical measures.
Note that this is not a complete list of possible descriptive statistics. Excel’s Descriptive Statistics function (described in Lecture 3 for this week) includes a couple of measures that focus on the shape of the data distribution. These have some specialized uses that we will not be getting into.
Location Measures
Perhaps the most often asked question about a data set is: what is the average? The intent is to get a measure that shows us the center of the data. Unfortunately, average is a somewhat imprecise term that could mean any of the three measures of location identified above. So, as analysts, we tend to be more precise and use mean, median, and mode.
While these all tell us something about where the data might be clustered, they can provide very different views of the data. An example of this comes from something the author heard back in high school. At that time, the mean per capita income for citizens of Kuwait was about $25,000; the median income was around $125; and the mode was $25! The very high (due to oil revenues) income of the Royal family accounted for much of this difference, but just look at the different impressions we get about the country depending on which value we look at.
• Mean, AKA average, is the sum of all the values divided by the count. This can be considered the “weighted center” of the data set. For example, the mean of 1, 2, 3, 4, and 5 = (1+2+3+4+5)/5 = 15/5 = 3. The mean is generally the best measure of location because it uses all of the data values, but it requires interval or ratio level data. Thus, while we can average salary, compa-ratio, seniority, etc., we cannot average gender or gender1 (even if one is coded in numbers) or grade in our data set.
• The median is the middle value in an ordered (listed from low to high) data set. This is the “physical center” of a data set. For example, the median of 1, 2, 3, 4, and 5 = 3, the middle value. If we have an even number of values, the median is the average of the middle two values. Medians can be found on ordinal, interval, or ratio level data.
• The mode is the most frequently occurring value. This is more or less the “popular center,” as it is where most values group together. A data set may have no mode, one mode, or more than one. Modes may occur with any level of data. The data set 1, 1, 2, 2, 2, 2, 3, 8, 8, 9 has a primary mode of 2 (four occurrences) and two secondary modes of 1 and 8 (two occurrences each).
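The three location measures can be checked quickly in code. The sketch below is only an illustration (the course itself uses Excel), assuming Python's standard `statistics` module and `collections.Counter`; it reproduces the mean and median of 1, 2, 3, 4, 5 and ranks the values of the mode example by how often they occur.

```python
import statistics
from collections import Counter

example = [1, 2, 3, 4, 5]
mode_example = [1, 1, 2, 2, 2, 2, 3, 8, 8, 9]

# Mean: sum of the values divided by the count
mean = statistics.mean(example)

# Median: middle value of the ordered list (average of the middle two if the count is even)
median = statistics.median(example)
even_median = statistics.median([1, 2, 3, 4])  # (2 + 3) / 2

# Mode(s): count each value, then list (value, count) pairs from most to least frequent
counts = Counter(mode_example).most_common()
primary_mode = counts[0][0]
# Secondary modes: the values tied for the next-highest count
secondary_modes = [value for value, n in counts[1:] if n == counts[1][1]]

print(float(mean), float(median), even_median)  # 3.0 3.0 2.5
print(primary_mode, secondary_modes)            # 2 [1, 8]
```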
Consistency/Variation Measures
While they do not have the popularity of their location cousins, measures of consistency or variation within the data are as important as (some say more important than) measures of central tendency for understanding what the data is trying to tell us. Very consistent data, with little variation, has a mean that is very representative of the data and is unlikely to change much if we resample the population. Data with a large amount of variation tends to have unstable means, meaning that the mean would change a lot across multiple samples. Inconsistent data (having large variation) is often a problem for businesses, particularly for manufacturing operations, as it means the results they produce differ and might often not meet quality specifications. Predictions based on data with large variation are rarely useful. Consider attempting to estimate how long it would take you to get to work if your route had frequent traffic accidents that made the travel time different every day.
The key measures of variation are:
• Range, which equals the maximum value minus the minimum value. For our example data set of 1, 2, 3, 4, and 5, the range is 5 – 1 = 4.
• Variance, which is the average of the squared differences between each value in the data set and the mean. To get the variance, find the mean of the data, subtract this value from each of the data points, square each result (to get rid of the negative differences), add them up, and divide by the total count. For our example data set, this would look like:
Value | Mean | Difference (value - mean) | Squared difference
1 | 3 | -2 | 4
2 | 3 | -1 | 1
3 | 3 | 0 | 0
4 | 3 | 1 | 1
5 | 3 | 2 | 4
Sum of squared differences = 10
Variance = 10/5 = 2
The problem with variance is that it is expressed in squared units. So, if our data set were dollars, the variance would be 2 dollars squared. How should we interpret dollars squared? In general, we do not; we use the next measure instead.
• Standard deviation is the (positive) square root of the variance. It returns the dispersion measure to the same units as the original data, so we can compare it to the data values. For our example, the standard deviation is the square root of 2 dollars squared, or about 1.4 dollars. This much easier-to-understand measure tells us that values typically lie about 1.4 dollars from the mean (in our example, which has a mean of 3 dollars). Strictly, it is the square root of the average squared difference rather than the simple average difference (which here would be 1.2 dollars), but it plays a similar role.
• Important point about the variance and standard deviation: when we find these values for a population (the entire group we are interested in), we divide the numerator (the sum of squared differences) by the number of values in the population. However, when we have a sample of the entire group and want to use it to estimate the population variance or standard deviation, we create the inferential estimate by dividing the numerator by the (count – 1). This adjustment increases the estimate to take into account that we most likely do not have the extreme low and extreme high values from the population in our sample, so the sample's variation is less than that of the group we are using the sample to describe.
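The variance table and the population-versus-sample distinction can be reproduced in a few lines. This is a sketch assuming Python's standard `math` and `statistics` modules rather than Excel (where, as an aside, VAR.P and STDEV.P divide by the count while VAR.S and STDEV.S divide by count - 1). The manual steps mirror the table: differences from the mean, squares, sum, then divide.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

# Range: largest value minus smallest value
data_range = max(data) - min(data)

# Variance step by step, mirroring the table: differences, squares, sum
mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_diffs)

pop_variance = sum_sq / len(data)            # divide by the count (population)
sample_variance = sum_sq / (len(data) - 1)   # divide by count - 1 (sample estimate)

pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check the manual results against the standard library
assert math.isclose(pop_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(data_range, pop_variance, round(pop_sd, 2), sample_variance, round(sample_sd, 2))
# 4 2.0 1.41 2.5 1.58
```

The sample version is larger (2.5 versus 2.0), which is the upward adjustment described above.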
Just as detectives want to know what victims typically did and how consistent they were in their behavior around the time of the crime (for example: Was he usually in this area, and if not, why was he there last night?), examining location and consistency measures provides a similar perspective on data variables and how they behave.
Applying the Information: Equal Pay Questions
OK, we can now start looking at our data set to see what the numbers are hiding and develop some clues. As with all analysis, we start with questions, then identify the tools to use for those questions, and finally apply those tools to the data. Our initial question is: do males and females get equal pay for equal work? We also said we needed to start by finding some measures that show how pay compares between males and females. Let’s take a look at some of the group and sub-group data. A couple of measures that might answer this question are:
• What are the group averages for each variable?
• What are the average male and female compa-ratios? (Remember, you will work with the Salary variable in the homework.)
• How consistent are the compa-ratios for each?
Note that we will be focusing on the compa-ratio data in the lectures, while you will focus on the same questions using salary in the weekly homework assignments. As described, compa-ratio is the result of dividing an employee’s salary by their grade midpoint. It generally ranges from about 0.80 to 1.20 in most pay plans. The value of this measure is that it removes the impact of different grades (we assume each grade represents a different level of work from the other grades and contains equal work for all the jobs within it). While not a perfect measure, it is the start of measuring what is paid for equal work. Side note: a grade’s midpoint is generally pegged to the average market pay needed to hire new employees into a job.
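The compa-ratio calculation is a single division. The sketch below is illustrative only; the function name and the salary and midpoint figures are hypothetical, not values from the lecture data set.

```python
def compa_ratio(salary: float, grade_midpoint: float) -> float:
    """Salary divided by the grade midpoint (hypothetical helper for illustration)."""
    if grade_midpoint <= 0:
        raise ValueError("grade midpoint must be positive")
    return salary / grade_midpoint

# Hypothetical employees in the same grade, with a midpoint of 50,000
print(round(compa_ratio(45_000, 50_000), 2))  # 0.9 -> paid below the midpoint
print(round(compa_ratio(55_000, 50_000), 2))  # 1.1 -> paid above the midpoint
```

Because both employees are divided by the same midpoint, their ratios can be compared directly with those of employees in other grades.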
Week 1 Question 1
Question 1 asks for some summary statistics. Part A asks you to use the Excel Descriptive Statistics function (more on this in the third lecture), while Part B asks for some specific statistics using the Fx function list (again, how to do this is covered in Lecture 3). The purpose of these specific requests is to let you show mastery in using these two Excel tools.
For Part A, the mean, standard deviation, and range of the entire compa-ratio data set are highlighted. These show us that the mean is 1.062, the standard deviation is 0.077, and the range is 0.34. As interesting as these values are, they do not really tell us anything on their own. Measures generally need to be compared to provide information.
This is where Part B comes in. We see that the male and female averages (1.056 and 1.069 (rounded), respectively) appear relatively close and are on opposite sides of the overall mean. The standard deviations are also close, at 0.084 and 0.07, and bracket the standard deviation of the entire data set (0.077). The ranges are both smaller than the overall range, meaning that neither gender has both the smallest and largest value. From both the range and standard deviation results, the female compa-ratios appear to be slightly more clustered (less variation, more consistent) than the male values.
Two things stand out. First, perhaps surprisingly, the females appear to be paid more relative to their grade midpoints than the males. Second, the measures of dispersion are fairly close, with males being slightly more spread out than females. So far, nothing seems to create any concerns, as we expect sample results to differ a bit from the overall population values. These differences seem small enough that they might be simple sampling error: if we resampled (such as the data set you will be working with), the male and female results might switch.
Remember, when you do this problem in the homework, use the salary data. As practice, you can copy the data set into a practice Excel file and try to replicate the same answers shown in the lectures. Ask a question if you are unsure how to do this or do not get the same results using the lecture data set.
Position Measures
Often, we are interested in where within a data set a particular measure falls. This opens up the idea of distributions: how the data values are spread across the range of values. Our detectives would be looking at where victims typically went and where they spent their time, the pattern of their normal behavior.
Distributions. Location and consistency measures are important for summarizing the data set. Important as they are, they do not always give us all the information we need. At times we want to know how specific values fit within the data set. For example, we might want to compare the 10th highest male and female values to get a sense of how relative positions within the data range differ. This often means we need to examine the distribution, or shape, of the data, which shows us how each data value relates to all of the other values in the sample.
One important tool in analyzing data sets that we will not cover (we cannot cover everything, alas) is graphical analysis: looking at how data sets are distributed when graphed. One example will show how powerful these techniques can be. One very common graph is a histogram, which shows how many times each value (or range of values) occurs. For example, if you tossed a pair of dice 50 times, you might get results like those in the following table, which shows what we got. The histogram shows the distribution, or shape, of the data, with the x-axis (horizontal) showing the sum of the numbers on the two faces and the y-axis (vertical) showing how often we observed a particular result in our 50 tosses.
Outcomes from tossing a pair of dice
Sum showing | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
Frequency seen | 1 | 2 | 4 | 3 | 9 | 12 | 7 | 5 | 4 | 1 | 2
[Figure: histogram of the 50 tosses; x-axis: sum showing (2 to 12); y-axis: frequency (0 to 14)]
A couple of things we can do with distributions can easily be shown with this histogram. First, we can find the center, in this case 7. We can see that there are two tails around the center: one to the left, showing counts for values less than the middle value of 7, and one to the right, showing how often we got values greater than 7. Visually, we can see that the further away from the center we get, the less often (or less likely) we generally are to get any particular outcome. Ways to quantify these observations are discussed below.
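A minimal sketch (Python, used here only for illustration) that rebuilds a text version of the histogram from the frequency table and puts numbers on the center and tails: the peak is taken as the most frequent sum, and each tail's share is the proportion of tosses on that side of the peak.

```python
# Frequencies from the 50 dice tosses in the table above (sum showing -> times seen)
freq = {2: 1, 3: 2, 4: 4, 5: 3, 6: 9, 7: 12, 8: 7, 9: 5, 10: 4, 11: 1, 12: 2}

total = sum(freq.values())                  # number of tosses
center = max(freq, key=freq.get)            # most frequent sum (the peak)
left_tail = sum(n for s, n in freq.items() if s < center)
right_tail = sum(n for s, n in freq.items() if s > center)

# Text histogram: one '#' per toss
for s, n in freq.items():
    print(f"{s:>2} | {'#' * n}")

print(total, center, left_tail / total, right_tail / total)  # 50 7 0.38 0.38
```

In this particular set of tosses the two tails happen to hold the same number of results (19 each).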
Our detectives use this logic when they attempt to find out where all the persons of interest were at the critical times. These approaches provide more detailed information about how the data looks than the summaries of dispersion examined earlier.
Position Measures. Central tendency and variation are group descriptive measures, particularly the mean and standard deviation, which use all the values in the data set in their calculation. At times, however, we are concerned with specific values within the distribution, such as:
• Quartiles,
• Percentiles, or centiles,
• The 5-number summary, or
• Z-score.
Quartiles and Percentiles. These measures divide the data into groups: four with quartiles and 100 with percentiles. One example that many of you might be familiar with is the percentile (AKA percentile rank). This is often used when doctors describe a child as being in the 80th percentile in height or weight for his/her age. This means that 80% of other children at this age are at or below this particular child’s measure. Percentiles range from the 1st to the 100th %-tile, meaning the lowest score would be at the first (or 1%-tile) and the highest score would be at the 100%-tile. Percentiles are very useful for comparing groups.
The general percentile formula lets us find percentiles, deciles (the 10% divisions), and/or quartiles, although Excel will do this for us. The formula is:
Lp = (n + 1) * P/100; where
Lp is the location (position in the ordered list) of the desired percentile
n is the size/count of the data set
P is the desired percentile; using 25, 50, or 75 gives the quartile points (25 gives the location of the first quartile, for example), while using 10, 20, etc. gives the decile points.
Example: if we wanted to find the cut-off for the first (or lowest) quartile of the data, also known as the 25th percentile, in a data set of 50, we would use (50+1)*25/100 = 12.75, or the 13th value from the bottom in an ordered list. By the convention used in this course, we always round up to the next whole value. (Excel's exclusive percentile and quartile functions instead interpolate between the 12th and 13th values.)
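The Lp formula and the round-up convention can be sketched as follows. This is an illustration under stated assumptions: the two helper names are hypothetical, the stand-in data is simply the numbers 1 to 50, and `statistics.quantiles(..., method="exclusive")` is included only to show the interpolated alternative on the same (n + 1) basis.

```python
import math
import statistics

def percentile_location(n: int, p: float) -> float:
    """Location of the p-th percentile in an ordered list of n values: Lp = (n + 1) * P / 100."""
    return (n + 1) * p / 100

def percentile_round_up(values, p):
    """Value at Lp rounded up to the next whole position (the round-up convention)."""
    ordered = sorted(values)
    position = math.ceil(percentile_location(len(ordered), p))
    position = min(max(position, 1), len(ordered))  # keep the position within 1..n
    return ordered[position - 1]                    # positions are 1-based, lists 0-based

print(percentile_location(50, 25))  # 12.75
print(percentile_location(25, 25))  # 6.5
print(percentile_location(25, 75))  # 19.5

values = list(range(1, 51))             # hypothetical stand-in data: the numbers 1..50
print(percentile_round_up(values, 25))  # 13 (the 13th value)

# Interpolating instead of rounding up, on the same (n + 1) basis
print(statistics.quantiles(values, n=4, method="exclusive")[0])  # 12.75
```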
5-Number Summary. As its name suggests, the 5-number summary identifies five key values in a data set: the minimum value, 1st quartile, median (or 2nd quartile), 3rd quartile, and maximum value. For the compa-ratio data set used in the lectures, the 5-number summary can be found from the following table results. The 1st quartile for either gender group of 25 is located at (25+1) * 25/100 = 6.5, or the 7th value in a rank-ordered list. The 3rd quartile is located at 19.5, or the 20th value. For the entire sample of 50, these values are located at the 13th and 39th rank-ordered places, respectively. Here is a 5-number summary for the overall compa-ratio values in the sample:
Compa-ratio 5-number summary: 0.870, 1.013, 1.051, 1.134, 1.210.
More on this shortly.
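A five-number summary can be sketched in a few lines. To keep it short, this version assumes Python's `statistics.quantiles` with `method="exclusive"`, which uses the same (n + 1) positions as the Lp formula but interpolates between neighboring values instead of rounding up, so its quartiles can differ slightly from hand results that round up. It is shown on the small example data sets from earlier, not on the lecture's compa-ratio data; the function name is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, with quartiles on the (n + 1) basis (interpolated)."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

print(five_number_summary([1, 1, 2, 2, 2, 2, 3, 8, 8, 9]))  # (1, 1.75, 2.0, 8.0, 9)
print(five_number_summary([1, 2, 3, 4, 5]))                 # (1, 1.5, 3.0, 4.5, 5)
```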
Z-score. What is often of more value is looking at where specific measures lie within each range. The z-score shows how far from the mean a specific data point lies, measured in standard deviation units. (I know that sounds strange, but keep reading.) The z-score provides a measure of how many standard deviations a particular score lies from the mean, and in what direction (above or below). The z-score formula is:
Z = (individual score – mean) / (standard deviation).
Looking at this formula, we can see that a score above the mean would give us a positive z-score, a score below the mean would give us a negative z-score, and a score that exactly equals the mean would give us a z-score of 0. For most data sets, nearly all z-scores fall between -3.0 and +3.0.
For example, in our example data set (1, 2, 3, 4, and 5) (see above for descriptive statistics on this data set), the z-score for 2 would be (2-3)/1.4 = -1/1.4 = -0.71. The negative value means that 2 is below (or less than) the mean and is 0.71 standard deviation units away from the mean (0.71 times the standard deviation of 1.4 = 1).
Using this measure, we can easily examine the relative placement of scores. For example, a compa-ratio of 1.06 would have z-scores of 0.048 for males, -0.129 for females, and -0.03 for the overall group. (We will see how we got these values shortly.) Thus, we can see that a person with this compa-ratio is slightly above average for males, but below average for the overall group and for females.
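The z-score formula translates directly into code. This sketch (Python, for illustration; the helper name is ours) reproduces the -0.71 for the example data set and, using the group means and standard deviations reported in Question 1, the z-scores for a compa-ratio of 1.06 (the overall value prints as -0.026, which rounds to the -0.03 quoted above).

```python
import math

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example data set 1..5: mean 3, standard deviation sqrt(2), about 1.4
print(round(z_score(2, 3, math.sqrt(2)), 2))  # -0.71

# A compa-ratio of 1.06 against the group summaries reported in Question 1
groups = {
    "males": (1.056, 0.084),
    "females": (1.069, 0.07),
    "overall": (1.062, 0.077),
}
for name, (mean, sd) in groups.items():
    print(name, round(z_score(1.06, mean, sd), 3))
```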
Applying the information
Week 1 Question 2
Question 2 asks for a 5-number summary for the overall compa-ratio data set as well as for the male and female sub-groups within the data.
Note: Lecture 1-3 will show the same screenshot with the cell formulas displayed.
One of the first observations, confirming an earlier one, is that neither the male nor the female data set has both the largest and smallest values. The males appear to have a slightly lower overall range of values than the females do. Some other interesting observations include the relatively similar 3rd quartile values for all three groups and the lower midpoint (median) for females, meaning that more females are lower in the overall range than males. More males are in the first quartile than females. What other observations can you make about how employees are distributed within their respective compa-ratio ranges?
Week 1 Question 3
Looking at how a single point lies within a data range is often helpful for gaining insight into how the distributions are positioned. Question 3 asks us to examine where the midpoint (median) of each gender’s data set fits within the entire compa-ratio data set. The PERCENTRANK.EXC function returns a percentile rank, the percent of data values that fall at or below a given value. For example, the PERCENTRANK.EXC of the median would be the 50%-tile, as half the values are above and half below the median (as expected).
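The idea behind PERCENTRANK.EXC can be sketched as follows. This is an approximation written for illustration, not Excel's own code: it assumes a value found in the data is ranked as (number of values below it + 1) / (n + 1), with linear interpolation for values that fall between data points. Excel also truncates its result to three significant digits by default, which this sketch does not do. The function name is hypothetical.

```python
from bisect import bisect_left

def percent_rank_exc(values, x):
    """Approximate exclusive percent rank of x within values (hypothetical helper).

    For a value present in the data: (count of values below x + 1) / (n + 1).
    For a value between two data points: linear interpolation between their ranks.
    """
    ordered = sorted(values)
    n = len(ordered)
    if not ordered[0] <= x <= ordered[-1]:
        raise ValueError("x must lie within the range of the data")

    def rank_of(v):
        # bisect_left returns how many ordered values are strictly below v
        return (bisect_left(ordered, v) + 1) / (n + 1)

    i = bisect_left(ordered, x)
    if ordered[i] == x:
        return rank_of(x)
    lo, hi = ordered[i - 1], ordered[i]
    return rank_of(lo) + (x - lo) / (hi - lo) * (rank_of(hi) - rank_of(lo))

print(percent_rank_exc([1, 2, 3, 4, 5], 3))  # 0.5: the median sits at the 50%-tile
```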
When we look at the male median, we see it falls at the 51st %-tile, meaning it is slightly above the overall median. The female median (half of the female compa-ratios are below this value, remember) falls at the 33rd %-tile! This means that most of the females are in the bottom half of the distribution, even though (from Question 2) females have the “higher” range. Interesting.
The z-score is a measure of relative placement based on the mean rather than the median. A value that equals the mean would have a z-score of 0, a value greater than the mean would have a positive z-score, and a value less than the mean would have a negative z-score. Both the male and female medians fall below the overall compa-ratio mean, with the female median being relatively lower in the distribution. This is consistent with what the percentile scores suggested. Overall, these two questions suggest that males and females are not distributed the same way within the compa-ratio data set.
Likelihood Measures
Likelihood, or probability, focuses on how often we can expect to see an outcome. In statistics, many decisions are made based on how likely, or more accurately, how unlikely, it is to see an outcome.
Probability
Probability is the likelihood that an event will happen. For example, if we toss a fair coin, we have a 50/50 chance, or a probability of .5, of getting a head. If we pick a date between 1 and 7, we have a 1 out of 7 chance (or a probability of 1/7 = .14, or 14%) that it will be a Wednesday in the current month. Statisticians recognize three types of probabilities:
• Theoretical – based on a theory. For example, since a die (half of a pair of dice) has 6 sides, and our theory says each face is equally likely to show up when we toss it, we expect to see a 1 on about 1/6th of our tosses (assuming we toss it a lot).
• Empirical – count based. If we see that an accident happens on our way to work on 5 days within every 4 weeks, we can say the probability of an accident today is 5/20, or 25%, since there are 20 work days within a 4-week period. An empirical probability equals the number of successes we see divided by the number of times we could have seen the outcome.
• Subjective – a guess based on some experience or feeling.
There are some basic probability rules that will be helpful during the course. The probability
• of something (an event) happening is called P(event),
• of two things happening together, called joint probability, is P(A and B),
• of either one or the other (or both) events occurring is P(A or B),
• of something occurring given that something else has occurred, called conditional probability, is P(A|B) (read as the probability of A given B).
• Complement rule: P(not A) = 1 - P(A).
Two other terms are also needed. Mutually exclusive means that the elements of one data set do not belong to another; for example, males and pregnant are mutually exclusive data sets. The other term we frequently hear with probability is collectively exhaustive; this means that the groups together include every member of the data set (every possible outcome is listed).
Some rules, which apply to both theoretical and empirical probabilities, for dealing with these different probability situations include:
• P(event) = (number of successes)/(number of attempts or possible outcomes)
• P(A and B) = P(A)*P(B) for independent events, or P(A)*P(B|A) for dependent events. (P(B|A) is a conditional probability: the probability of B occurring given that A has occurred.)
• P(A or B) = P(A) + P(B) – P(A and B); if A and B cannot occur together (such as the example of male and pregnant), then P(A and B) = 0.
• P(A|B) = P(A and B)/P(B).
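Because these rules hold for theoretical probabilities, they can be checked by brute force on the dice example. This sketch (Python, for illustration only) lists all 36 equally likely outcomes of tossing two fair dice and compares counted probabilities with the rule-based ones. The events chosen (A = first die shows 6, B = second die shows 6, and a sum of 7) are our own illustrations, not the lecture's; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes for two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Theoretical probability: outcomes where event is true / all possible outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_6(o):
    return o[0] == 6

def second_is_6(o):
    return o[1] == 6

p_a, p_b = prob(first_is_6), prob(second_is_6)
p_a_and_b = prob(lambda o: first_is_6(o) and second_is_6(o))
p_a_or_b = prob(lambda o: first_is_6(o) or second_is_6(o))

assert p_a_and_b == p_a * p_b                      # independent: P(A and B) = P(A)*P(B)
assert p_a_or_b == p_a + p_b - p_a_and_b           # P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_and_b / p_b == p_a                      # P(A|B) = P(A and B)/P(B), equal to P(A) here
assert prob(lambda o: not first_is_6(o)) == 1 - p_a  # complement rule

print(p_a_and_b, p_a_or_b, prob(lambda o: sum(o) == 7))  # 1/36 11/36 1/6
```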
One of the more interesting uses of probabilities (other than forecasting the likelihood of rain on our days off) is comparing outcome likelihoods for different groups.
• The probability of randomly picking a female [P(F)] from the group is the same as that of randomly picking a male [P(M)]: 25 specified outcomes/50 possible outcomes = 0.5. This is a simple empirical probability: counts divided by counts.
• We can get a bit more complicated, such as the probability of picking a female from a specific grade such as B: P(F|B), the probability of picking a female given (from) only grade B. Again, this is empirical: we have 7 employees in grade B, and 4 of these are females, so P(F|B) = 4/7.
• Now, the probability of picking a female who is also in grade B (from the entire data set) is 4 females out of 50 = 4/50 = 0.08, empirically. We can also find this using the P(A and B) formula referenced above: P(F and B) = P(F)*P(B|F), since the events of female and grade B are not independent (P(F)*P(B) = 0.5 * 7/50 = 0.07, not 0.08). So, we know P(F) = .5 and P(B|F) = 4/25 = .16 (4 females out of 25 are in grade B), so by formula, P(Female and grade B) = .5 * .16 = 0.08, the same result.
• The complement rule is often helpful. If we want to find the probability of picking any employee EXCEPT a female in grade B, we could figure out the probability for each of the other gender and grade combinations and add them together, OR we could simply say that the probability of not picking a female in grade B is 1 – P(Female and grade B), or 1 - 0.08, or 0.92. Note that this 0.92 includes all of the males; the probability of picking a female who is not in grade B is P(F) – P(F and B) = 0.5 - 0.08 = 0.42. We will use this property of probabilities a lot in the rest of the class.
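The grade B calculations can be checked with exact fractions. This sketch uses only the counts quoted in the lecture (50 employees, 25 of them female, 7 in grade B, 4 of whom are female); the variable names are ours.

```python
from fractions import Fraction

# Counts quoted in the lecture's sample
total = 50
females = 25
grade_b = 7
females_in_b = 4

p_f = Fraction(females, total)                 # P(F) = 1/2
p_b = Fraction(grade_b, total)                 # P(B) = 7/50
p_f_and_b = Fraction(females_in_b, total)      # joint, counted directly: 4/50
p_f_given_b = Fraction(females_in_b, grade_b)  # P(F|B) = 4/7
p_b_given_f = Fraction(females_in_b, females)  # P(B|F) = 4/25

# Multiplication rule for dependent events gives the same joint probability
assert p_f * p_b_given_f == p_f_and_b
# Conditional probability rule: P(F|B) = P(F and B) / P(B)
assert p_f_and_b / p_b == p_f_given_b
# Not independent: P(F) * P(B) is 7/100, which differs from P(F and B)
assert p_f * p_b != p_f_and_b

print(float(p_f_and_b))        # 0.08, female and in grade B
print(float(1 - p_f_and_b))    # 0.92, anyone except a female in grade B
print(float(p_f - p_f_and_b))  # 0.42, female but not in grade B
```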
As we can see, probabilities can show us a lot, and determining their values can be somewhat complex. The nice thing is that this is about as complicated as it gets.
Applying the information
Week 1 Question 4
Question 4 gives us some probability values: how likely are we to exceed the respective gender midpoints (medians) in the entire data set? We are looking at both the empirical and the normal curve probabilities. If the data set is normally distributed, the probabilities should be fairly close; if not, we have a clue that the data might not be normally distributed over the entire data range.
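The comparison in Question 4 can be sketched as below. This assumes Python's `statistics.NormalDist` in place of Excel, and it assumes the normal curve uses the data's mean and sample standard deviation (the lecture does not show which standard deviation was used). The empirical value counts values strictly above x; the tiny example data set shows how far the two can differ in a small sample. The function name is hypothetical.

```python
import statistics
from statistics import NormalDist

def prob_exceeding(values, x):
    """Return (empirical, normal-curve) probability that a value exceeds x."""
    # Empirical: share of the observed values that are strictly greater than x
    empirical = sum(1 for v in values if v > x) / len(values)
    # Normal curve built from the data's mean and sample standard deviation (an assumption)
    normal = 1 - NormalDist(statistics.mean(values), statistics.stdev(values)).cdf(x)
    return empirical, normal

print(prob_exceeding([1, 2, 3, 4, 5], 3))  # (0.4, 0.5)
```

When the two numbers are far apart for a real data set, that is the clue, mentioned above, that the data may not follow a normal curve.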
The male empirical probability of exceeding the male midpoint (median) in the entire data set is 50% (close to the 51st percentile value we got above), and it is 55% assuming normality; these are fairly close. The female probabilities are 68% and 60%, respectively; again, not too far off. The data again support the idea that a lot of females are at the higher end of the compa-ratio distribution.
Drawing Conclusions: Week 1 Question 5
As interesting as the numbers are in themselves, they mean very little unless we can interpret them and apply that insight to the question(s) at hand.
Recapping our results, we see that while the female average compa-ratio is somewhat higher than the male average, the probability and distribution outcomes suggest that males and females are not distributed in the same fashion and that more of the females are relatively lower in their range than the males.
While we have not yet accounted for equal work, it appears that there are some issues suggesting that males and females are not paid the same within the company, at least enough to warrant more investigation.
On our detective shows, we might say that we have some evidence, but not yet enough to take it to the grand jury for an indictment.
Summary
This lecture looked at descriptive statistics and what they can tell us about a data set. We reviewed the questions asked in the Week 1 assignment and the answers to each question using the COMPA-RATIO variable. The focus of this lecture was on interpreting presented results, as that is a more frequent activity for professionals than actually developing the measures.
Specifically, we looked at developing the following information.
Note that this was created by listing each tool as we introduced it, its data requirements, and then a typical question that would require this tool. By copying this information to a second Excel sheet and sorting the columns, we can create a guide to when to use each tool, as shown below.
Now, we move on to some specific ways to set up Excel to provide the results that we just looked at.
Before we do, however, please respond to Discussion Thread 2 for this week with your initial response, and respond to others over a couple of days before moving on to the next lecture for the week.
Please ask your instructor if you have any questions about this material.
BUS308 – Week 1 Lecture 2 Describing Data Expected Out.docx

  • 1. BUS308 – Week 1 Lecture 2 Describing Data Expected Outcomes After reading this lecture, the student should be familiar with: 1. Basic descriptive statistics for data location 2. Basic descriptive statistics for data consistency 3. Basic descriptive statistics for data position 4. Basic approaches for describing likelihood 5. Difference between descriptive and inferential statistics What this lecture covers This lecture focuses on describing data and how these descriptions can be used in an analysis. It also introduces and defines some specific descriptive statistical tools and results. Even if we never become a data detective or do statistical tests, we will be exposed and bombarded with statistics and statistical outcomes. We need to understand what they are telling us and how they help uncover what the data means on the “crime,” AKA research question/issue. How we obtain these results will be covered in lecture 1-3. Detecting In our favorite detective shows, starting out always seems
  • 2. difficult. They have a crime, but no real clues or suspects, no idea of what happened, no “theory of the crime,” etc. Much as we are at this point with our question on equal pay for equal work. The process followed is remarkably similar across the different shows. First, a case or situation presents itself. The heroes start by understanding the background of the situation and those involved. They move on to collecting clues and following hints, some of which do not pan out to be helpful. They then start to build relationships between and among clues and facts, tossing out ideas that seemed good but lead to dead-ends or non-helpful insights (false leads, etc.). Finally, a conclusion is reached and the initial question of “who done it” is solved. Data analysis, and specifically statistical analysis, is done quite the same way as we will see. Descriptive Statistics Week 1 Clues We are interested in whether or not males and females are paid the same for doing equal work. So, how do we go about answering this question? The “victim” in this question could be considered the difference in pay between males and females, specifically when they are doing equal work. An initial examination (Doc, was it murder or an accident?) involves obtaining basic information to see if we even have cause to worry.
  • 3. The first action in any analysis involves collecting the data. This generally involves conducting a random sample from the population of employees so that we have a manageable data set to operate from. In this case, our sample, presented in Lecture 1, gave us 25 males and 25 females spread throughout the company. A quick look at the sample by HR provided us with assurance that the group looked representative of the company workforce we are concerned with as a whole. Now we can confidently collect clues to see if we should be concerned or not. As with any detective, the first issue is to understand the “who” and “what” about the victim. In this case, we need to use our sample to understand basic information about how males and females are paid. Understanding data sets typically involves look at several characteristics. These descriptive measures describe the data set. Typical descriptive measures include: • Measures of location such as the average (AKA mean), the median (middle point), and mode (most often occurring value if it exists). • Measures of consistency such as range (largest value minus the smallest value), variance, and standard deviation. • Measure of position showing where a single data point is within the data set, such as percentile and rank.
  • 4. • Measures of likelihood showing the probability of obtaining specific outcomes. Note: Descriptive statistics describe a particular data set and can only be used for that data set. However, often we want to use a sample to “infer” back to a larger population. In this case, we would use inferential statistics. Most measures, except for variance and standard deviation, are calculated the same way. We will see the specific difference for those two later in this lecture. The key to whether we have descriptive statistics or inferential statistics lies with the group we are taking the measures on. If we are only concerned with that group, we use descriptive statistics. If, however, we want to use that group to make inferences, claims, and conclusions about a larger population, then we take a random sample from the population and use inferential statistics (allowing us to infer back to the population). Our class data sets – both the lecture and homework – are random samples from a larger population, so we will basically be using inferential statistical measures. Note that these are not the complete list of possible descriptive statistics. Excel’s Descriptive Statistics function (described in Lecture 3 for this week) includes a couple of measures that focus on data distribution shape. These have some specialized uses that we will not be getting into.
  • 5. Location Measures Perhaps the most often asked question about data sets is what is the average? The intent is to get a measure that shows us the center of the data. Unfortunately, average is a somewhat imprecise term that could mean all three of our measures of location identified above. So, as analysts we tend to be more precise and use mean, median, and mode. While these all tell us something about where the data might be clustered, they can provide very different views of the data. An example of this comes from an example the author heard back in High School. At that time, the mean per capita income for citizens of Kuwait was about $25,000; the median income was around $125; and the mode was $25! The very high (due to oil revenues) income of the Royal family accounted for much of this difference, but just look at the different impressions we get about the country depending on which value we look at. • Mean, AKA average, is the sum of all the values divided by the count. This can be considered the “weighted center” of the data set. For example, the mean of 1, 2, 3, 4, and 5 = 1+2+3+4+5/5 = 15/5 = 3. The mean is generally the best measure for any data set as it uses all of the data values and requires interval or ratio level data. Thus, while we can average salary, compa-ratio, seniority, etc., we cannot average gender or gender1 (even if
  • 6. one is coded in numbers) or grade in our data set. • The median is the middle value in an ordered (listed from low to high) data set. This is the “physical center” of a data set. For example, the median of 1, 2, 3, 4, and 5 = 3, the middle value. If we have an even number of values, the median is the average of the middle two values. Medians can be found on ordinal, interval, or ratio level data. • The mode is the most frequently occurring value. This is more or less the “popular center” as it is where most numbers group together. A data set may have no modes or one or more. Modes may occur with any level of data. The data set 1,1,2,2,2,2,3,8,8,9 has a primary mode of 2, and two secondary modes of 1 and 8. Consistency/Variation Measures While they do not have the popularity of their location cousins, knowing the consistency or variation within the data is as important, some say even more important, as knowing the central tendency for us to understand what the data is trying to tell us. Very consistent data, with little variation, has a mean that is very representative of the data and is unlikely to change much if we resample the population. Data with a large amount of variation tends to have unstable means, meaning that these values would change a lot with multiple samples. Inconsistent data (having large variation) is often a problem for businesses, particularly for manufacturing operations, as it means the results they produce differ and might
often not meet the quality specifications. Predictions based on data with large variation are rarely useful. Consider trying to estimate how long it would take you to get to work if your route had frequent traffic accidents that made the travel time different every day.
The key measures of variation are:
• Range, which equals the maximum value minus the minimum value. For our example data set of 1, 2, 3, 4, and 5, the range is 5 - 1 = 4.
• Variance, which is the average of the squared differences between each value in the data set and the mean. To get the variance, find the mean of the data, subtract it from each of the data points, square each result (to get rid of the negative differences), add the squares up, and divide by the total count. For our example data set, this looks like:
Value   Mean   Difference   Squared
1       3      -2           4
2       3      -1           1
3       3       0           0
4       3       1           1
5       3       2           4
Sum of squared differences = 10
Variance = 10/5 = 2. The problem with variance is that it is expressed in squared units. So, if our data set were dollars, the variance would be 2 dollars squared. How should we interpret dollars squared? In general, we do not, and we use the next measure instead.
• Standard deviation is the (positive) square root of the variance. It returns the dispersion measure to the same units as the original data, so we can compare it to the data values. For our example, the standard deviation is the square root of 2 dollars squared, or about 1.4 dollars. This much easier to understand measure tells us that values typically lie about 1.4 dollars from the mean (in our example, a mean of 3 dollars).
• An important point about the variance and standard deviation: when we find these values for a population, the entire group we are interested in, we divide the sum of squared differences by the count. However, when we have a sample of the entire group (and want to use this sample to estimate the population variance or standard deviation), we create the inferential estimate by dividing by (count - 1). This adjustment increases the estimate to account for the fact that our sample most likely does not contain the extreme low and extreme high values of the population, so its variation is smaller than that of the group the sample is meant to describe.
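The location and variation calculations above can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the course materials (which use Excel); it reuses the lecture's example data sets and shows both the population (divide by count) and sample (divide by count - 1) versions of the variance and standard deviation.

```python
import statistics
from collections import Counter

data = [1, 2, 3, 4, 5]  # the lecture's example data set

# Location measures
mean = statistics.mean(data)          # (1+2+3+4+5)/5 = 3
median = statistics.median(data)      # middle value of the ordered list = 3

# Mode: multimode() returns only the value(s) tied for the highest count,
# so a Counter is used to see the secondary modes as well.
mode_data = [1, 1, 2, 2, 2, 2, 3, 8, 8, 9]
primary_modes = statistics.multimode(mode_data)  # [2]
counts = Counter(mode_data)                       # 2 appears 4 times; 1 and 8 twice each

# Variation measures
value_range = max(data) - min(data)   # 5 - 1 = 4
pop_var = statistics.pvariance(data)  # population: 10 / 5 = 2
pop_sd = statistics.pstdev(data)      # sqrt(2), about 1.41
samp_var = statistics.variance(data)  # sample estimate: 10 / (5 - 1) = 2.5
samp_sd = statistics.stdev(data)      # sqrt(2.5), about 1.58

print(mean, median, primary_modes, value_range)
print(pop_var, round(pop_sd, 2), samp_var, round(samp_sd, 2))
```

In Excel terms, VAR.P and STDEV.P correspond to the population versions, while VAR.S and STDEV.S correspond to the sample versions.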
Just as detectives want to know what victims typically did and how consistent they were in their behavior around the time of the crime (for example: was he usually in this area, and if not, why last night?), examining location and consistency measures provides a similar perspective on data variables and how they behave.
Applying the Information: Equal Pay Questions
OK, we can now start looking at our data set to see what the numbers are hiding and develop some clues. As with all analysis, we start with questions, then identify the tools to use for those questions, and finally apply those tools to the data. Our initial question is: do males and females get equal pay for equal work? We also said we needed to start by asking whether we have some measures that show pay comparisons between males and females. Let’s take a look at some of the group and sub-group data. A few measures that might answer this question are:
• What are the group averages for each variable?
• What are the average male and female compa-ratios? (Remember, you will work with the Salary variable in the homework.)
• How consistent are the compa-ratios for each?
Note that we will be focusing on the compa-ratio data in the
lectures, while you will focus on the same questions using salary in the weekly homework assignments. As described, compa-ratio is the result of dividing an employee’s salary by their grade midpoint. It generally ranges from about 0.80 to 1.20 in most pay plans. The value of this measure is that it removes the impact of different grades (each of which we assume represents a different level of work from the other grades and contains equal work for all the jobs within the grade). While not a perfect measure, it is a start on measuring what is paid for equal work. Side note: a grade’s midpoint is generally pegged to the average market pay needed to hire new employees into a job.
Week 1 Question 1
Question 1 asks for some summary statistics. Part A asks you to use the Excel Descriptive Statistics tool (more on this in Lecture 1-3), while Part B asks for some specific statistics using the fx function list (again, how to do this is covered in Lecture 1-3). The purpose of these specific requests is to let you show mastery of these two Excel tools.
For Part A, the mean, standard deviation, and range of the entire compa-ratio data set are highlighted. This shows us that the mean is 1.062, the standard deviation is 0.077, and the range is 0.34. As interesting as these values are, they do not really tell us anything on their own. Measures generally need to be compared to provide information.
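A minimal sketch of the compa-ratio calculation and the kind of group comparison that Part B asks for. The employee records below are invented for illustration only (they are not the course data set), and the helper names `compa_ratio` and `summarize_by_group` are hypothetical. The standard deviation shown is the sample version (count - 1), since the lecture data are a sample.

```python
from statistics import mean, stdev

# Hypothetical records: (gender, salary, grade midpoint). Not the course data.
employees = [
    ("M", 55.0, 50.0),
    ("M", 57.0, 60.0),
    ("M", 46.0, 46.0),
    ("F", 52.0, 50.0),
    ("F", 66.0, 60.0),
    ("F", 31.0, 31.0),
]


def compa_ratio(salary, midpoint):
    """Salary divided by grade midpoint; 1.00 means paid exactly at the midpoint."""
    return salary / midpoint


def summarize_by_group(records):
    """Return {group: (mean, sample SD, range)} of compa-ratios, plus an 'All' group."""
    groups = {"All": []}
    for gender, salary, midpoint in records:
        ratio = compa_ratio(salary, midpoint)
        groups.setdefault(gender, []).append(ratio)
        groups["All"].append(ratio)
    return {
        name: (mean(values), stdev(values), max(values) - min(values))
        for name, values in groups.items()
    }


for name, (avg, sd, rng) in summarize_by_group(employees).items():
    print(f"{name}: mean={avg:.3f} sd={sd:.3f} range={rng:.3f}")
```

As in the lecture, the information comes from comparing the group rows with each other and with the overall row, not from any single number.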
This is where Part B comes in. We see that the male and female averages (1.056 and 1.069, rounded, respectively) appear relatively close and lie on opposite sides of the overall mean. The standard deviations are also close, at 0.084 and 0.07, and bracket the standard deviation of the entire data set. Both ranges are smaller than the overall range, meaning that neither gender has both the smallest and the largest value. The female compa-ratios appear to be slightly more clustered (less variation, more consistent) than the male values, based on both the range and the standard deviation results.
Two things stand out. First, perhaps surprisingly, the females appear to be paid more relative to their grade midpoints than the males. Second, the measures of dispersion are fairly close, with males slightly more spread out than females. So far, nothing seems to create any concerns, as we expect sample results to differ a bit from the overall population values. These differences seem small enough that they might be simple sampling error: if we resampled (as with the data set you will be working with), the male and female results might switch.
Remember, when you do this problem in the homework, use the salary data. As practice, you can copy the data set into a practice Excel file and try to replicate the answers shown in the lectures. Ask a question if you are unsure how to do this or do not get the same
results using the lecture dataset.
Position Measures
Often, we are interested in where within a data set a particular measure falls. This opens up the idea of distributions: how the data values are spread across the range of values. Our detectives would be looking at where victims typically went and where they spent their time, the pattern of their normal behavior.
Distributions. Location and consistency measures are important for summarizing the data set. Important as they are, they do not always give us all the information we need. At times we want to know how specific values fit within the data set. For example, we might want to compare the 10th highest male and female values to get a sense of how relative positions within the data range differ. This often means we need to examine the distribution, or shape, of the data, which shows us how each data value relates to all of the other values within the sample.
One important tool in analyzing data sets that we will not cover (we cannot cover everything, alas) is graphical analysis: looking at how data sets are distributed when graphed. One example will show how powerful these techniques can be. A very common graph is the histogram, a count of how many times each value occurs. For example, if you tossed a pair of dice 50 times, you might get the following results. The table shows the results we got. The histogram shows the distribution, or shape, of the data, with the
x-axis (horizontal) showing the sum of the numbers on the two faces and the y-axis (vertical) showing how often we observed a particular result in our 50 tosses.
Outcomes from tossing a pair of dice
Sum showing      2   3   4   5   6   7    8   9   10   11   12
Frequency seen   1   2   4   3   9   12   7   5   4    1    2
[Figure: histogram of the dice outcomes; x-axis: sum of the two faces (2 to 12); y-axis: frequency (0 to 14)]
A couple of things we can do with distributions can be easily shown with this histogram. First, we can find the center, in this case 7. We can see that
there are two tails around the center: one to the left, showing counts for values less than the middle value of 7, and one to the right, showing how often we got values greater than 7. Visually, we can see that the further away from the center we get, the less often (or less likely) we are to get any particular outcome. Ways to quantify these observations are discussed below. Our detectives use this logic when they attempt to find out where all the persons of interest were at the critical times. These approaches describe the data in more detail than the summary measures of dispersion examined earlier.
Position Measures. Central tendency and variation are group descriptive measures, particularly the mean and standard deviation, which use all the values in the data set in their calculation. At times, however, we are concerned with specific values within the distribution, such as:
• Quartiles,
• Percentiles, or centiles,
• The 5-number summary, or
• Z-score.
Quartiles and Percentiles. These measures divide the data into groups: four with quartiles and 100 with percentiles. One example that many of you might be familiar with is the percentile (AKA percentile rank). This is often used when doctors describe a child as being in the 80th percentile in height or weight for his/her age. This means that
80% of other children of this age are at or below this particular child’s measure. Percentiles range from the 1st to the 100th percentile, meaning the lowest score would be at the 1st percentile and the highest score at the 100th. Percentiles are very useful for comparing groups.
The general percentile formula lets us find percentiles, deciles (the 10% divisions), and/or quartiles, although Excel will do this for us. The formula is:
Lp = (n + 1) * P/100
where
Lp is the location (rank position in an ordered list) of the desired percentile,
n is the size/count of the data set, and
P is the desired percentile; using 25, 50, or 75 gives the quartile points, while using 10, 20, etc. gives the decile points.
Example: if we wanted to find the cut-off for the first (or lowest) quartile of the data, also known as the 25th percentile, in a data set of 50, we would use (50 + 1) * 25/100 = 12.75, or the 13th value from the bottom of an ordered list. By convention, we always round up to the next whole value.
5-Number Summary. As its name suggests, the 5-number summary identifies five key values in a data set: the minimum value, 1st quartile, median or 2nd
quartile, 3rd quartile, and maximum value. For the compa-ratio data set used in the lectures, the 5-number summary can be found from the following table results. The 1st quartile for either gender group of 25 is located at (25 + 1) * 25/100 = 6.5, or the 7th value in a rank-ordered list. The 3rd quartile is located at (25 + 1) * 75/100 = 19.5, or the 20th value. For the entire sample of 50, these values are located at the 13th and 39th rank-ordered places, respectively. Here is the 5-number summary for the overall compa-ratio values in the sample:
Compa-ratio 5-number summary: 0.870, 1.013, 1.051, 1.134, 1.210.
More on this shortly.
Z-score. What is often of more value is looking at where specific measures lie within each range. The z-score shows how far from the mean a specific data point lies, measured in standard deviation units. (I know that sounds strange, but keep reading.) The z-score tells us how many standard deviations a particular score lies from the mean, and in what direction (above or below). The z-score formula is:
Z = (individual score - mean) / (standard deviation)
Looking at this formula, we can see that a score above the mean gives a positive z-score, a score below the mean gives a negative z-score, and a score that exactly equals the mean gives a z-score of 0. For most data sets, nearly all z-scores fall between -3.0 and +3.0.
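The z-score formula translates directly into code. This minimal sketch reuses the lecture's example data (mean 3, standard deviation about 1.4) and the rounded compa-ratio means and standard deviations reported in Question 1; because those inputs are rounded, the results can differ from the lecture's figures in the last decimal place (the overall value rounds to the lecture's -0.03).

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Example data set 1..5: mean 3, population SD sqrt(2), about 1.41.
print(round(z_score(2, 3, math.sqrt(2)), 2))  # -0.71

# A compa-ratio of 1.06 against each group's (mean, SD) as reported in Question 1.
group_stats = {"male": (1.056, 0.084), "female": (1.069, 0.07), "overall": (1.062, 0.077)}
z_for_106 = {name: z_score(1.06, m, sd) for name, (m, sd) in group_stats.items()}
for name, z in z_for_106.items():
    print(name, round(z, 3))  # about 0.048, -0.129 and -0.026
```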
For example, in our example data set (1, 2, 3, 4, and 5; see above for its descriptive statistics), the z-score for 2 would be (2 - 3)/1.4 = -1/1.4 = -0.71. The negative value means that 2 is below (less than) the mean and is 0.71 standard deviation units away from it (0.71 times the standard deviation of 1.4 = 1). Using this measure, we can easily examine the relative placement of scores. For example, a compa-ratio of 1.06 would have z-scores of 0.048 for males, -0.129 for females, and -0.03 for the overall group. (We will see how we got these values shortly.) Thus, a person with this compa-ratio is slightly above average for males but below average for the overall group and for females.
Applying the information
Week 1 Question 2
Question 2 asks for a 5-number summary for the overall compa-ratio data set as well as for the male and female sub-groups within the data. Note: Lecture 1-3 will show the same screen shot with the cell formulas displayed. One of the first observations, confirming an earlier one, is that neither the male nor the female data set has both the largest and smallest values.
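A sketch of the Question 2 calculation, using the location formula Lp = (n + 1) * P/100 with the lecture's round-up convention. The function names are hypothetical, and Excel's QUARTILE.INC and QUARTILE.EXC functions interpolate between ranks rather than rounding up, so their results may differ slightly from this sketch.

```python
import math
import statistics


def percentile_location(n, p):
    """Lecture formula Lp = (n + 1) * P / 100, rounded up to a whole rank (1-based).

    The rank is capped at n so that very small data sets do not run past the end.
    """
    return min(math.ceil((n + 1) * p / 100), n)


def five_number_summary(values):
    """Minimum, 1st quartile, median, 3rd quartile, maximum."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[percentile_location(n, 25) - 1]  # rank is 1-based, list index is 0-based
    q3 = ordered[percentile_location(n, 75) - 1]
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]


# Ranks used in the lecture: 7th and 20th for a group of 25, 13th and 39th for all 50.
print(percentile_location(25, 25), percentile_location(25, 75))  # 7 20
print(percentile_location(50, 25), percentile_location(50, 75))  # 13 39
print(five_number_summary(range(1, 26)))  # (1, 7, 13, 20, 25)
```

The median is taken with `statistics.median`, which averages the middle two values for an even count, matching the definition given earlier in the lecture.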
The males appear to have a slightly lower overall range of values than do the females. Some other interesting observations include the relatively similar 3rd quartile values for all three groups and the lower median (midpoint) for females, meaning that more females are lower in the overall range than males. More males are in the first quartile than females. What other observations can you make about how employees are distributed within their respective compa-ratio ranges?
Week 1 Question 3
Often, looking at where a single point lies within a data range is helpful for getting some insight into how the distributions are positioned. Question 3 asks us to examine where the midpoint (median) of each gender’s data set fits within the entire compa-ratio data set. The PERCENTRANK.EXC function returns a percentile rank, the percent of data values that fall at or below a given value. For example, the PERCENTRANK.EXC of the median would be the 50th percentile, as half the values are above and half below the median (as expected). When we look at the male median, we see it falls at the 51st percentile, meaning it is slightly above the overall median. The female median (half of the female compa-ratios are below this value, remember) falls at the 33rd percentile! This means that more than half of the females are in the bottom half of the overall distribution, even though (from Question 2) females have the “higher” range. Interesting.
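The percentile rank described above (the share of values at or below a given value) can be sketched in a few lines; its counterpart, the share of values strictly above a value, is the empirical probability of exceeding it that Question 4 uses. This follows the plain definition given in the text, and the helper names are hypothetical. Excel's PERCENTRANK.EXC uses its own (n + 1)-based formula, so its results will not match this sketch exactly, particularly for small data sets.

```python
def percent_at_or_below(values, x):
    """Fraction of values that are less than or equal to x (0.0 to 1.0)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(v <= x for v in values) / len(values)


def percent_above(values, x):
    """Fraction of values strictly greater than x: the empirical chance of exceeding x."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(v > x for v in values) / len(values)


scores = list(range(1, 101))  # 1, 2, ..., 100
print(percent_at_or_below(scores, 33))  # 0.33
print(percent_above(scores, 33))        # 0.67

# With very few values, the median sits a bit above 50% under this definition.
print(percent_at_or_below([1, 2, 3, 4, 5], 3))  # 0.6
```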
The z-score is a measure of relative placement based on the mean rather than the median. A value that equals the mean has a z-score of 0, a value greater than the mean has a positive z-score, and a value less than the mean has a negative z-score. Both the male and female medians fall below the overall compa-ratio mean, with the female median being relatively lower in the distribution. This is consistent with what the percentile scores suggested. Overall, these two questions suggest that males and females are not distributed the same way within the compa-ratio data set.
Likelihood Measures
Likelihood, or probability, focuses on how often we can expect to see an outcome. In statistics, many decisions are made based upon how likely, or more accurately, how unlikely it is to see an outcome.
Probability
Probability is the likelihood that an event will happen. For example, if we toss a fair coin, we have a 50/50 chance, or a probability of .5, of getting a head. If we pick one of the dates from the 1st to the 7th of the current month at random, exactly one of those seven dates falls on a Wednesday, so we have a 1 out of 7 chance (a probability of 1/7 = .14, or 14%) of picking a Wednesday. Statisticians recognize three types of probabilities:
• Theoretical: based on a theory. For example, a die (half of a pair of dice) has 6 sides, and our theory says each face is equally likely to show up when we toss it; we therefore expect to see a 1 about 1/6 of the times we toss it (assuming we toss it a lot).
• Empirical: count based. If we see that an accident happens on our way to work 5 times (days) in every 4 weeks, we can say the probability of an accident today is 5/20, or 25%, since there are 20 work days in a 4-week period. An empirical probability equals the number of successes we see divided by the number of times we could have seen the outcome.
• Subjective: a guess based on some experience or feeling.
There are some basic probability rules that will be helpful during the course. The probability
• of something (an event) happening is called P(event),
• of two things happening together, called joint probability, is P(A and B),
• of either one or the other (or both) events occurring is P(A or B),
• of something occurring given that something else has occurred, called conditional probability, is P(A|B) (read as “probability of A given B”).
• Complement rule: P(not A) = 1 - P(A).
Two other terms are also needed. The first, mutually exclusive,
means that the elements of one data set do not belong to another; for example, males and pregnant are mutually exclusive data sets. The other term we frequently hear with probability is collectively exhaustive, which simply means that the listed events together cover every possible outcome. Some rules for dealing with these different probability situations, which apply to both theoretical and empirical probabilities, include:
• P(event) = (number of successes)/(number of attempts or possible outcomes)
• P(A and B) = P(A)*P(B) for independent events, or P(A)*P(B|A) for dependent events (P(B|A) is the conditional probability of B occurring given that A has occurred).
• P(A or B) = P(A) + P(B) - P(A and B); if A and B cannot occur together (such as the example of male and pregnant), then P(A and B) = 0.
• P(A|B) = P(A and B)/P(B).
One of the more interesting uses of probabilities (other than forecasting the likelihood of rain on our days off) is comparing outcome likelihoods for different groups.
• The probability of randomly picking a female [P(F)] is the same as that of randomly picking a male [P(M)] from the group: 25 specified outcomes/50
possible outcomes = 0.5. This is a simple empirical probability: counts divided by counts.
• We can get a bit more complicated, such as the probability of picking a female from a specific grade such as B, P(F|B): the probability of picking a female given that we pick only from grade B. Again, this is empirical: we have 7 employees in grade B, and 4 of these are females, so P(F|B) = 4/7.
• Now, the probability of picking a female who is also in grade B (from the entire data set) is 4 females out of 50 = 4/50 = 0.08, empirically. We can also find this using the P(A and B) formula referenced above: P(F and B) = P(F)*P(B|F), since the events female and grade B are not independent. We know P(F) = .5 and P(B|F) = 4/25 (4 of the 25 females are in grade B), so by theory, P(Female and grade B) = .5 * .16 = 0.08, the same result.
• The complement rule is often helpful. If we want to find the probability of picking anyone other than a grade B female, we could figure out the probability for each of the other gender and grade combinations and add them together, OR we could simply say that this probability is 1 - P(Female and grade B) = 1 - 0.08 = 0.92. We will use this property of probabilities a lot in the rest of the class.
As we can see, probabilities can tell us a lot and can be somewhat complex to determine. The nice thing is that this is about as complicated
as it gets.
Applying the information
Week 1 Question 4
Question 4 gives us some probability values: how likely are we to exceed each gender’s median (midpoint) in the entire data set? We are looking at the empirical and normal curve probabilities. If the data set is normally distributed, the probabilities should be fairly close; if not, we have a clue that the data might not be normally distributed over the entire data range. The male empirical probability of exceeding the male median in the entire data set is 50% (consistent with the 51st percentile value we got above), and 55% assuming normality, which is fairly close. The female probabilities are 68% and 60%, respectively; again, not too far off. The data again support the idea that a lot of females are at the lower end of the compa-ratio distribution.
Drawing Conclusions: Week 1 Question 5
As interesting as the numbers are themselves, they mean very little unless we can interpret their meaning and apply that insight to the question(s) at hand.
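Before the recap, here is a minimal sketch that re-checks the probability rules from the Likelihood section using the grade B counts quoted there (50 employees, 25 female, 7 in grade B, 4 of whom are female), and computes a normal-curve probability of exceeding a value as in Question 4. `fractions.Fraction` keeps the probabilities exact, and `statistics.NormalDist` (Python 3.8+) supplies the normal curve. The cut-off in the last line is illustrative (one standard deviation above the overall mean from Question 1), not one of the actual gender medians, which are not listed in this lecture.

```python
from fractions import Fraction
from statistics import NormalDist

# Counts quoted in the lecture.
total, females, grade_b, females_in_b = 50, 25, 7, 4

p_f = Fraction(females, total)                 # P(F) = 25/50 = 1/2
p_b = Fraction(grade_b, total)                 # P(B) = 7/50
p_f_and_b = Fraction(females_in_b, total)      # joint, by counting: 4/50 = 0.08
p_b_given_f = Fraction(females_in_b, females)  # P(B|F) = 4/25

# Multiplication rule for dependent events: P(F and B) = P(F) * P(B|F).
assert p_f * p_b_given_f == p_f_and_b

# Conditional probability: P(F|B) = P(F and B) / P(B) = 4/7.
p_f_given_b = p_f_and_b / p_b

# Complement rule: probability of picking anyone who is not a grade B female.
p_not_f_and_b = 1 - p_f_and_b                  # 23/25 = 0.92

# Addition rule: P(F or B) = P(F) + P(B) - P(F and B) = (25 + 7 - 4)/50.
p_f_or_b = p_f + p_b - p_f_and_b


def normal_exceed(x, mean, sd):
    """P(X > x) if X is normally distributed with the given mean and SD."""
    return 1 - NormalDist(mean, sd).cdf(x)


print(p_f_given_b, float(p_not_f_and_b), p_f_or_b)  # 4/7 0.92 14/25
# Illustrative cut-off one SD above the overall mean of 1.062 (SD 0.077).
print(round(normal_exceed(1.062 + 0.077, 1.062, 0.077), 3))  # 0.159
```

Because P(F|B) = 4/7 differs from P(F) = 1/2, female and grade B behave as dependent events here, which is why the P(A)*P(B|A) form of the multiplication rule is used.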
Recapping our results, we see that while the female average compa-ratio is somewhat higher than the male average, the probability and distribution outcomes suggest that males and females are not distributed in the same fashion and that more of the females are relatively lower in the range than the males. While we have not yet accounted for equal work, there appear to be some issues suggesting that males and females are not paid the same within the company, at least enough to warrant more investigation. On our detective shows, we might say that we have some evidence, but not yet enough to take it to the grand jury for an indictment.
Summary
This lecture looked at descriptive statistics and what they can tell us about a data set. We reviewed the questions asked in the Week 1 assignment and the answers for each question using the COMPA-RATIO variable. The focus of this lecture was on interpreting presented results, as that is a more frequent activity for professionals than actually developing the measures. Specifically, we developed the following information.
Note that this was created by listing each tool as we introduced it, its data requirements, and a typical question that would require it. By copying this information to a second Excel sheet and sorting the columns, we can create a guide to when to use each tool, as shown below.
Now we move on to some specific ways to set up Excel to provide the results we just looked at. Before we do, however, please respond to Discussion Thread 2 for this week with your initial response and responses to others over a couple of days before moving on to reading the second lecture for the week. Please ask your instructor if you have any questions about this material.