Statistics is both the science of uncertainty and the technology
of extracting information from data.
A statistic is a summary measure of data.
Descriptive statistics are methods that describe and summarize
data.
Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis Toolpak add-in
Statistical Methods for Summarizing Data
A frequency distribution is a table that shows the number of
observations in each of several nonoverlapping groups.
Categorical variables naturally define the groups in a frequency
distribution.
To construct a frequency distribution, we need only count the
number of observations that appear in each category.
This can be done using the Excel COUNTIF function.
Frequency Distributions for Categorical Data
Example 3.16: Constructing a Frequency Distribution for Items
in the Purchase Orders Database
List the item names in a column on the spreadsheet.
Use the function =COUNTIF($D$4:$D$97,cell_reference),
where cell_reference is the cell containing the item name
Construct a column chart to visualize the frequencies.
Relative frequency is the fraction, or proportion, of the total.
If a data set has n observations, the relative frequency of
category i is:
Relative frequency of category i = (frequency of category i) / n
We often multiply the relative frequencies by 100 to express
them as percentages.
A relative frequency distribution is a tabular summary of the
relative frequencies of all categories.
Relative Frequency Distributions
Example 3.17: Constructing a Relative Frequency Distribution
for Items in the Purchase Orders Database
First, sum the frequencies to find the total number (note that the
sum of the frequencies must be the same as the total number of
observations, n).
Then divide the frequency of each category by this value.
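The counting and dividing steps in Examples 3.16 and 3.17 can also be sketched outside Excel. The Python snippet below is only an illustration: the item names are hypothetical placeholders rather than values from the Purchase Orders database, and `Counter` plays the role that COUNTIF plays in the spreadsheet.

```python
from collections import Counter

# Hypothetical item names standing in for the item column of the database.
items = ["Gasket", "O-Ring", "Gasket", "Shielded Cable", "Gasket", "O-Ring"]

# Frequency distribution: one count per category (the job COUNTIF does per row).
frequencies = Counter(items)

# Relative frequency of category i = frequency of category i / n.
n = sum(frequencies.values())
relative = {item: count / n for item, count in frequencies.items()}

for item, count in frequencies.items():
    print(f"{item}: {count} ({100 * relative[item]:.1f}%)")
```

Because the frequencies sum to n, the relative frequencies sum to 1, which is a quick check on the table.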
For numerical data that consist of a small number of discrete
values, we may construct a frequency distribution similar to the
way we did for categorical data; that is, we simply use
COUNTIF to count the frequencies of each discrete value.
Frequency Distributions for Numerical Data
In the Purchase Orders data, the A/P terms are all whole
numbers 15, 25, 30, and 45.
Example 3.18: Frequency and Relative Frequency Distribution
for A/P Terms
A graphical depiction of a frequency distribution for numerical
data in the form of a column chart is called a histogram.
Frequency distributions and histograms can be created using the
Analysis Toolpak in Excel.
Click the Data Analysis tools button in the Analysis group
under the Data tab in the Excel menu bar and select Histogram
from the list.
Excel Histogram Tool
Specify the Input Range corresponding to the data. If you
include the column header, then also check the Labels box so
Excel knows that the range contains a label. The Bin Range
defines the groups (Excel calls these “bins”) used for the
frequency distribution.
Histogram Dialog
If you do not specify a Bin Range, Excel will automatically
determine bin values for the frequency distribution and
histogram, which often results in a rather poor choice.
If you have discrete values, set up a column of these values in
your spreadsheet for the bin range and specify this range in the
Bin Range field.
Using Bin Ranges
We will create a frequency distribution and histogram for the
A/P Terms variable in the Purchase Orders database.
We defined the bin range below the data in cells H99:H103 as
follows:
Month
15
25
30
45
Example 3.19: Using the Histogram Tool
Histogram tool results:
For numerical data that have many different discrete values with
little repetition or are continuous, a frequency distribution
requires that we define groups by specifying
the number of groups,
the width of each group, and
the upper and lower limits of each group.
Choose between 5 and 15 groups, and the width of each should be
equal.
Choose the lower limit of the first group (LL) as a whole
number smaller than the minimum data value and the upper
limit of the last group (UL) as a whole number larger than the
maximum data value.
Histograms for Numerical Data
The data range from a minimum of $68.75 to a maximum of
$127,500; set the lower limit of the first group to $0 and the
upper limit of the last group to $130,000.
If we select 5 groups, using equation (3.2) the width of each
group is ($130,000 - 0) / 5 = $26,000
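The width calculation above can be sketched as a small Python helper. The function name `group_limits` is hypothetical; the inputs are the limits and group count quoted for the Cost per Order data. The upper limits it returns are the kind of values one would place in a bin range, assuming each bin is identified by its upper limit.

```python
def group_limits(lower_limit, upper_limit, num_groups):
    """Return the common group width and the upper limit of each group."""
    width = (upper_limit - lower_limit) / num_groups
    uppers = [lower_limit + width * (i + 1) for i in range(num_groups)]
    return width, uppers

# LL = $0, UL = $130,000, 5 groups, as in the text.
width, bins = group_limits(0, 130_000, 5)
print(width)  # 26000.0
print(bins)   # [26000.0, 52000.0, 78000.0, 104000.0, 130000.0]
```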
Example 3.20: Constructing a Frequency Distribution and
Histogram for Cost per Order
Ten-group histogram
Set the cumulative relative frequency of the first group equal to
its relative frequency. Then add the relative frequency of the
next group to the cumulative relative frequency.
For example, the cumulative relative frequency in cell D3 is
computed as =D2+C3 = 0.000 + 0.447 = 0.447.
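The running-sum rule can be sketched in Python as follows. Only the first value, 0.447, comes from the example; the remaining relative frequencies are hypothetical and were chosen so that the total is 1.

```python
from itertools import accumulate

# First value from the example; the others are hypothetical placeholders.
relative = [0.447, 0.287, 0.149, 0.117]

# Each cumulative value is the previous cumulative value plus the
# relative frequency of the current group.
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

The last cumulative relative frequency should equal 1 (up to rounding), which is a useful check on the column.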
Example 3.21 Computing Cumulative Relative Frequencies
The kth percentile is a value at or below which at least k
percent of the observations lie. The most common way to
compute the kth percentile is to order the data values from
smallest to largest and calculate the rank of the kth percentile
using the formula nk/100 + 0.5, where n is the number of
observations, rounded to the nearest integer.
Statistical software packages use different methods that often
involve interpolating between ranks instead of rounding, thus
producing different results.
The Excel function PERCENTILE.INC(array, k) computes the
kth percentile of data in the range specified in the array field,
where k is in the range 0 to 1, inclusive (i.e., including 0 and
1).
Percentiles
Compute the 90th percentile for Cost per order in the Purchase
Orders data.
Rank of kth percentile = nk/100 + 0.5
n = 94; k = 90
For the 90th percentile, the rank is
= 94(90)/100+0.5 = 85.1 (round to 85)
Value of the 85th observation = $74,375
Using the Excel function PERCENTILE.INC(G4:G97,0.9), the
90th percentile is $73,737.50, which is different from using
formula (3.3).
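The difference between the two approaches can be sketched in Python. The rank rule follows the formula nk/100 + 0.5 from the text. The interpolation rule shown is the common inclusive linear-interpolation method, which is assumed here to be what PERCENTILE.INC does. The sample data and function names are hypothetical.

```python
import math

def percentile_rank(n, k):
    """Rank of the kth percentile from the formula nk/100 + 0.5."""
    return n * k / 100 + 0.5

def percentile_by_rank(data, k):
    """Round the rank to the nearest integer and return that observation."""
    ordered = sorted(data)
    rank = math.floor(percentile_rank(len(ordered), k) + 0.5)
    rank = min(max(rank, 1), len(ordered))  # keep the rank inside 1..n
    return ordered[rank - 1]

def percentile_interpolated(data, k):
    """Interpolate linearly at position (n - 1)k/100, counted from 0."""
    ordered = sorted(data)
    position = (len(ordered) - 1) * k / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

sample = [10, 20, 30, 40, 50, 60]  # hypothetical data
print(percentile_rank(94, 90))              # rank from the text, about 85.1
print(percentile_by_rank(sample, 90))       # 60
print(percentile_interpolated(sample, 90))  # 55.0
```

On the same six values the two rules return 60 and 55, which illustrates why the manual result and the Excel result differ.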
Examples 3.22 and 3.23: Computing Percentiles
Data > Data Analysis > Rank and Percentile
90.3rd percentile = $74,375 (same result as manually computing
the 90th percentile)
Example 3.24 Excel Rank and Percentile Tool
The value $74,375, which was computed manually as the 90th
percentile using formula (3.3), is the 90.3rd percentile value in
the Rank and Percentile tool output.
Quartiles break the data into four parts.
The 25th percentile is called the first quartile, Q1;
the 50th percentile is called the second quartile, Q2;
the 75th percentile is called the third quartile, Q3; and
the 100th percentile is the fourth quartile, Q4.
One-fourth of the data fall below the first quartile, one-half are
below the second quartile, and three-fourths are below the third
quartile.
The Excel function QUARTILE.INC(array, quart) computes quartiles,
where array specifies the range of the data and quart is a whole
number between 1 and 4, designating the desired quartile.
Quartiles
Compute the Quartiles of the Cost per Order data
First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81
Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25
Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75
Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00
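A similar calculation can be sketched in Python with the standard library. This assumes that the `inclusive` method of `statistics.quantiles` uses the same interpolation as QUARTILE.INC; the data values are hypothetical, not the Cost per Order data.

```python
import statistics

data = [10, 20, 30, 40, 50, 60]  # hypothetical data

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The fourth quartile (100th percentile) is simply the maximum.
q4 = max(data)

print(q1, q2, q3, q4)  # 22.5 35.0 47.5 60
```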
Example 3.25 Computing Quartiles in Excel
A cross-tabulation is a tabular method that displays the number
of observations in a data set for different subcategories of two
categorical variables.
A cross-tabulation table is often called a contingency table.
The subcategories of the variables must be mutually exclusive
and exhaustive, meaning that each observation can be classified
into only one subcategory, and, taken together over all
subcategories, they must constitute the complete data set.
Cross-Tabulations
Sales Transactions database
Count the number (and compute the percentage) of books and
DVDs ordered by region.
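The counting behind a cross-tabulation can be sketched in Python. The transactions below are hypothetical (region, product) pairs, not rows from the Sales Transactions database.

```python
from collections import Counter

# Hypothetical (region, product) pairs, one per transaction.
transactions = [
    ("East", "Book"), ("East", "DVD"), ("West", "Book"),
    ("West", "Book"), ("South", "DVD"), ("East", "Book"),
]

counts = Counter(transactions)
regions = sorted({region for region, _ in transactions})
products = sorted({product for _, product in transactions})

# Rows are regions, columns are products; each observation lands in one cell.
table = {r: {p: counts[(r, p)] for p in products} for r in regions}

# Percentage of each region's orders by product (row percentages).
percent = {
    r: {p: 100 * table[r][p] / sum(table[r].values()) for p in products}
    for r in regions
}

for r in regions:
    print(r, table[r], percent[r])
```

Because the subcategories are mutually exclusive and exhaustive, the cell counts add up to the number of transactions.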
Example 3.26: Constructing a Cross-Tabulation
Cross-Tabulation Visualization: Chart of Regional Sales by
Product
Select the Insert tab.
Highlight the data.
Click on chart type, then subtype.
Use Chart Tools to customize.
Creating Charts in Microsoft Excel
Excel distinguishes between vertical and horizontal bar charts,
calling the former column charts and the latter bar charts.
A clustered column chart compares values across categories
using vertical rectangles;
a stacked column chart displays the contribution of each value
to the total by stacking the rectangles;
a 100% stacked column chart compares the percentage that each
value contributes to a total.
Column and bar charts are useful for comparing categorical or
ordinal data, for illustrating differences between sets of values,
and for showing proportions or percentages of a whole.
Column and Bar Charts
Example 3.2: Creating a Column Chart
Highlighted Cells
Highlight the range C3:K6, which includes the headings and
data for each category. Click on the Column Chart button and
then on the first chart type in the list (a clustered column chart).
To add a title, click on the first icon in the Chart Layouts group.
Click on “Chart Title” in the chart and change it to “EEO
Employment Report—Alabama.” The names of the data series
can be changed by clicking on the Select Data button in the
Data group of the Design tab. In the Select Data Source dialog
(see below), click on “Series1” and then the Edit button. Enter
the name of the data series, in this case “All Employees.”
Change the names of the other data series to “Men” and
“Women” in a similar fashion.
Line charts provide a useful means for displaying data over
time.
You may plot multiple data series in line charts; however, they
can be difficult to interpret if the magnitude of the data values
differs greatly. In that case, it would be advisable to create
separate charts for each data series.
Line Charts
Example 3.3: A Line Chart for China Export Data
Pie Charts
A pie chart displays the relative proportion of each part to the
whole by partitioning a circle into pie-shaped areas.
Example 3.4: A Pie Chart for Census Data
Data visualization professionals don't recommend using pie
charts. In a pie chart, it is difficult to compare the relative sizes
of areas; however, the bars in the column chart can easily be
compared to determine relative ratios of the data.
If you do use pie charts, restrict them to small numbers of
categories, always ensure that the numbers add to 100%, and
use labels to display the group names and actual percentages.
Avoid three-dimensional (3-D) pie charts—especially those that
are rotated—and keep them simple.
An area chart combines the features of a pie chart with those of
line charts.
Area charts present more information than pie or line charts
alone but may clutter the observer’s mind with too many details
if too many data series are used; thus, they should be used with
care.
Area Charts
Example 3.5: An Area Chart for Energy Consumption
Scatter charts show the relationship between two variables. To
construct a scatter chart, we need observations that consist of
pairs of variables.
Scatter Charts
Example 3.6: A Scatter Chart for Real Estate Data
A bubble chart is a type of scatter chart in which the size of the
data marker corresponds to the value of a third variable;
consequently, it is a way to plot three variables in two
dimensions.
Bubble Charts
Example 3.7: A Bubble Chart for Stock Comparisons
Stock chart
Surface chart
Doughnut chart
Radar chart
Miscellaneous Excel Charts
Many applications of business analytics involve geographic
data. Visualizing geographic data can highlight key data
relationships, identify trends, and uncover business
opportunities. In addition, it can often help to spot data errors
and help end users understand solutions, thus increasing the
likelihood of acceptance of decision models.
Companies like Nike use geographic data and information
systems for visualizing where products are being distributed and
how that relates to demographic and sales information. This
information is vital to marketing strategies.
Geographic mapping capabilities were introduced in Excel 2000
but were not available in Excel 2002 and later versions. These
capabilities are now available through Microsoft MapPoint
2010, which must be purchased separately.
Geographic Data
Visualizing and Exploring Data
Data visualization is the process of displaying data (often in
large quantities) in a meaningful fashion to provide insights that
will support better decisions.
Data visualization improves decision-making, provides
managers with better analysis capabilities that reduce reliance
on IT professionals, and improves collaboration and information
sharing.
Data Visualization
Tabular data can be used to determine exactly how many units
of a certain product were sold in a particular month, or to
compare one month to another.
For example, we see that sales of product A dropped in
February, specifically by 6.7% (computed as 1 – B3/B2).
Beyond such calculations, however, it is difficult to draw big
picture conclusions.
Example 3.1: Tabular vs. Visual Data Analysis
A visual chart provides the means to
easily compare overall sales of different products (Product C
sells the least, for example);
identify trends (sales of Product D are increasing), other
patterns (sales of Product C are relatively stable while sales of
Product B fluctuate more over time), and exceptions (Product
E’s sales fell considerably in September).
Example 3.1: Tabular vs. Visual Data Analysis
A dashboard is a visual representation of a set of key business
measures. It is derived from the analogy of an automobile’s
control panel, which displays speed, gasoline level,
temperature, and so on.
Dashboards provide important summaries of key business
information to help manage a business process or function.
Dashboards
Hypothesis Testing – Examples and
Case Studies
How Hypothesis Tests Are Reported
Determine the null hypothesis and the
alternative hypothesis.
Collect and summarize the data into a
test statistic.
Use the test statistic to determine the p-value.
The result is statistically significant if the p-value is less than
or equal to the level of significance.
Testing Hypotheses About Proportions and Means
If the null and alternative hypotheses are expressed in terms of
a population proportion, mean, or difference between two means,
and if the sample sizes are large, the test statistic is simply the
corresponding standardized score computed assuming the null
hypothesis is true, and the p-value is found from a table of
percentiles for standardized scores.
Example 2: Weight Loss for Diet vs Exercise
Did dieters lose more fat than the exercisers?
Diet Only: sample mean = 5.9 kg, sample standard deviation = 4.1 kg, sample size n = 42
Exercise Only: sample mean = 4.1 kg, sample standard deviation = 3.7 kg, sample size n = 47
measure of variability = sqrt[(0.633)² + (0.540)²] = 0.83,
where 0.633 = 4.1/sqrt(42) and 0.540 = 3.7/sqrt(47) are the
standard errors of the two sample means.
Step 1. Determine the null and alternative hypotheses.
Null hypothesis: No difference in average fat lost in population
for two methods. Population mean difference is zero.
Alternative hypothesis: There is a difference in average fat lost
in population for two methods. Population mean difference is
not zero.
Step 2. Collect and summarize data into a test statistic.
The sample mean difference = 5.9 – 4.1 = 1.8 kg and the
standard error of the difference is 0.83.
So the test statistic: z = (1.8 – 0)/0.83 = 2.17
Step 3. Determine the p-value.
Recall the alternative hypothesis was two-sided.
p-value = 2 × [proportion of bell-shaped curve above 2.17] = 0.03
Step 4. Make a decision.
The p-value of 0.03 is less than or equal to 0.05, so the result is
statistically significant.
If there were really no difference between dieting and exercise as
fat loss methods, we would see such an extreme result only 3% of
the time, or 3 times out of 100.
We prefer to believe that the truth does not lie with the null
hypothesis. We conclude that there is a statistically significant
difference between average fat loss for the two methods.
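The arithmetic in this example can be sketched in Python using only the standard library. The function name `two_sample_z` is hypothetical, and the normal tail area is taken from `math.erfc` instead of a printed table, so the unrounded results differ slightly from the rounded figures 0.83, 2.17, and 0.03.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic and two-sided p-value for a difference of two means."""
    se1 = sd1 / math.sqrt(n1)              # standard error of the first mean
    se2 = sd2 / math.sqrt(n2)              # standard error of the second mean
    se_diff = math.sqrt(se1**2 + se2**2)   # measure of variability
    z = (mean1 - mean2 - 0) / se_diff      # null value of the difference is 0
    # Two-sided p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se_diff, z, p_value

se_diff, z, p_value = two_sample_z(5.9, 4.1, 42, 4.1, 3.7, 47)
print(round(se_diff, 2), round(z, 2), round(p_value, 3))
```

The decision is the same either way: the p-value is below 0.05, so the difference is statistically significant.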
Example 3: Public Opinion About President
On May 16, 1994, Newsweek reported the results of a public
opinion poll that asked: “From everything you know about Bill
Clinton, does he have the honesty and integrity you expect in a
president?” (p. 23).
Poll surveyed 518 adults and 233, or 0.45 of them (clearly less
than half), answered yes.
Could Clinton’s adversaries conclude from this that only a
minority (less than half) of the population of Americans thought
Clinton had the honesty and integrity to be president?
Step 1. Determine the null and alternative hypotheses.
Null hypothesis: There is no clear winning opinion on this
issue; the proportions who would answer yes or no are each
0.50.
Alternative hypothesis: Fewer than 0.50, or 50%, of the
population would answer yes to this question. The majority do
not think Clinton has the honesty and integrity to be president.
Step 2. Collect and summarize data into a test statistic.
Sample proportion is: 233/518 = 0.45.
The standard deviation = sqrt[(0.50)(1 – 0.50)/518] = 0.022.
Test statistic: z = (0.45 – 0.50)/0.022 = –2.27
Step 3. Determine the p-value.
Recall the alternative hypothesis was one-sided.
p-value = proportion of bell-shaped curve below –2.27.
Exact p-value = 0.0116.
Step 4. Make a decision.
The p-value of 0.0116 is less than 0.05, so we conclude that the
proportion of American adults in 1994 who believed Bill
Clinton had the honesty and integrity they expected in a
president was significantly less than a majority.
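The same steps can be sketched in Python. The function name `one_proportion_z` is hypothetical. Because the code carries full precision instead of the rounded values 0.45 and 0.022, it gives a z of about –2.28 and a p-value of about 0.011, close to but not identical to the figures above.

```python
import math

def one_proportion_z(successes, n, p0):
    """z statistic and lower-tail p-value for H0: p = p0 vs Ha: p < p0."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)  # standard deviation under the null
    z = (p_hat - p0) / sd
    # Lower-tail area of the standard normal curve below z.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return p_hat, sd, z, p_value

p_hat, sd, z, p_value = one_proportion_z(233, 518, 0.50)
print(round(p_hat, 2), round(sd, 3), round(z, 2), round(p_value, 4))
```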
Revisiting Case Studies: How Journals Present Tests
Whereas newspapers and magazines tend to simply report the
decision from hypothesis testing, journals tend to report
p-values as well.
This allows you to make your own decision, based on the
severity of a Type I error and the magnitude of the p-value.
Case Study 5.1: Quitting Smoking with
Nicotine Patches
Compared the smoking cessation rates for smokers randomly
assigned to use a nicotine patch versus a placebo patch.
Null hypothesis: The proportions of smokers in the population
who would quit smoking using a nicotine patch and using a
placebo patch are the same.
Alternative hypothesis: The proportion of smokers in the
population who would quit smoking using a nicotine patch is
higher than the proportion who would quit using a placebo
patch.
Higher smoking cessation rates were observed in the active
nicotine patch group at 8 weeks (46.7% vs 20%) (P < .001)
and at 1 year (27.5% vs 14.2%) (P = .011).
(Hurt et al., 1994, p. 595)
Conclusion: p-values are quite small: less than 0.001 for
difference after 8 weeks and equal to 0.011 for difference after
a year. Therefore, rates of quitting are significantly higher
using a nicotine patch than using a placebo patch after 8 weeks
and after 1 year.
Case Study 6.4: Smoking During
Pregnancy and Child’s IQ
The study investigated the impact of maternal smoking on the
subsequent IQ of the child at 1, 2, 3, and 4 years of age.
Null hypothesis: Mean IQ scores for children whose mothers
smoke 10 or more cigarettes a day during pregnancy are same as
mean for those whose mothers do not smoke, in populations
similar to one from which this sample was drawn.
Alternative hypothesis: Mean IQ scores for children whose
mothers smoke 10 or more cigarettes a day during pregnancy are
not the same as mean for those whose mothers do not smoke, in
populations similar to one from which this sample was drawn.
Children born to women who smoked 10+ cigarettes per day
during pregnancy had developmental quotients at 12 and 24
months of age that were 6.97 points lower (averaged across
these two time points) than children born to women who did not
smoke during pregnancy (95% CI: 1.62,12.31, P = .01); at 36
and 48 months they were 9.44 points lower (95% CI:
4.52, 14.35, P = .0002). (Olds et al., 1994, p. 223)
Researchers conducted two-tailed tests to allow for the
possibility that the mean IQ score could actually be higher for
those whose mothers smoke. The CI provides evidence of the
direction in which the difference falls. The p-value simply tells
us there is a statistically significant difference.
For Those Who Like Formulas
Statistics
Spring 2019
Module 3 Comprehensive Problem
INFERENTIAL STATISTICS – HYPOTHESIS TESTING
Either individually or in groups of 2 or 3, your task is to
perform some real-world inferential statistics. You will take a
claim that someone has made, form a hypothesis from that,
collect the data necessary to test the hypothesis, perform a
hypothesis test, and interpret the results.
You will test to see if less than 50% of students participate in
the Student Evaluation of Teaching system (SETS) in the
School of Business Administration at USCA. Why or Why not?
Determine and describe the type of data that you will collect
and how you plan to collect this data in order to answer your
questions. You will need to collect data on many characteristics
of your sample so that these characteristics can later be
compared somehow (e.g., before and after data; comparisons by
gender, major, type, year, age, etc.). Define the population and
the sample that you will be studying. (You must sample at least
100 students in the SOBA.)
Project Components
The report will include a description of the problem, and why
you think it is important, or what you hope to gain from testing
the hypothesis. It should also include the context of the data, all
data collected, and the values generated in EXCEL. A decision
and conclusion should be stated. An analysis should follow
with what the conclusion means in terms of the original
problem. The report should be in narrative format like you were
writing for a newspaper or magazine, must be typed, printed,
and should be double spaced.
An excellent final report (100 points) will have the following
components.
· An introduction to the problem including the claim(s) being
tested
· The context (who, what, where, when, why, how) of the data
(remember this is in narrative format) and any possible
problems with collecting the data
· Descriptive statistics and/or tables depending on your type of
data
· Appropriate graphs (every project should have at least one
graph or chart of the data in it)
· Inferential statistics including ...
· the null and alternative hypotheses written symbolically
· statistical output including a test statistic and p-value
· a graph showing the critical and non-critical regions, test
statistic, and p-value
· the decision and a conclusion written in terms of the original
claim
· Conclusion
· Suggestions for the next time this project is done
· No statistical usage errors
What can we test?
Some things are easier to test than other things. The purpose of
this project is to expose you to the process of hypothesis testing
in a real-world application. You may test means, proportions, or
linear correlation. You may have one or more samples. You may
categorize your variables in one or two ways.
If you are dealing with one sample, then you will need some
numerical value to test against. The claim "more people prefer
Pepsi than Coke" becomes a claim that the proportion of Pepsi
drinkers is greater than 0.5. There are not two independent
samples (Pepsi drinkers / Coke drinkers), just one sample
categorized in two ways. A problem with the Pepsi / Coke thing
is that it omits other soft drinks because that is more difficult to
do. A chi-square goodness of fit test would be more appropriate
in this case.
Categorical Data
If your data consists solely of categories and not measured
quantities, then you should be looking at proportions or counts.
Things to look for that let you know you're dealing with
categorical data or proportions include: proportions, percents,
counts, frequencies, fractions, or ratios. If your data consists of
names or labels, you're dealing with categorical data.
You really need to think about the response that was recorded
for each case. Did you record a yes/no response for each case
or did you record a number that means something? If it was a
yes/no or other categorical data, then this is the place to be.
Example Claims about Categorical Data
· 93.1% of Americans feel there should not be nudity on
television during children's viewing time.
http://www.parentstv.org/PTC/publications/lbbcolumns/2003/0528.asp
This is a claim about a single proportion. We know this because
the value includes a percentage and the data is categorical (yes
or no), not numerical. The original claim here could be written
as p=0.931.
Quantitative (Numerical) Data
If your data consists of measured quantities, then you will
probably be testing a mean or perhaps correlation between two
variables. It is possible to test a claim about a standard
deviation, but that is rare, and not covered in this course.
There are four main ways to analyze means.
1. A test about a single mean that requires a number as the
claimed value.
2. A test about two independent means doesn't need a number
because you compare them to each other. This compares the
same thing in two different groups.
3. A test for two dependent means, often called paired samples,
compares two values for each case in the same group.
4. The Analysis of Variance is an extension of the two
independent samples case where there are more than two
groups.
You can also perform correlation and regression with two
quantitative variables. Simple regression, with just one
predictor variable, is covered in the book. Multiple regression,
with several predictor variables, is not covered in the textbook
but is available online.
Example Claims about Quantitative Data:
· Women live five years longer than men.
http://www.medicalnewstoday.com/medicalnews.php?newsid=18866
This is a claim about two averages, the average lifespan of
women and that of men. We don't know the average of either
gender (they're given in the article); we just know that women
are supposed to live five years longer than men. When you're
working with one sample, it's important to have a value to
compare against, but with two samples, you don't need a value
for each, just the difference between the two (in this case 5
years). The original claim here could be written as μw-μm=5
(the difference in the mean ages of women and men is 5 years).
· Seat belts save lives.
http://dot.state.il.us/trafficsafety/seatbelt june 2006.pdf and
http://www-fars.nhtsa.dot.gov/FinalReport.cfm?stateid=17&title=states&title2=fatalities_and_fatality_rates&year=2005
Okay, this claim is all over the place, but I wanted to give some
links on how it would be tested.
You could take the data regarding the percent of people wearing
their seat belts and compare it to the fatality rate. These are two
numerical values that are paired together for each case
(probably based on an annual report). Remember that you
cannot perform correlation and regression with categorical
variables. The original claim that seat belts save lives would be
interpreted as a negative correlation (as seat belt use goes up,
fatalities go down) and would be written as ρ<0.
Sample Final Report
Available online are sample projects and resources. Your
project may not be as long or detailed.
Assignment is due April 15th, either electronically prior to the
start of class or a hard copy at the start of class.
Hypothesis testing
Hypothesis testing: procedure
We ask a yes/no question about a population.
We answer the question yes, and answer the question no, using
symbols for the population means.
We label one answer the null hypothesis and the other answer
the alternative hypothesis.
We decide the criterion for rejecting the null hypothesis. The
test is one of: two-tailed, right-tailed, or left-tailed. We take a
sample, and calculate our test statistic (Z or t for now).
We determine whether the observed test statistic is in the
rejection region (critical region or tail) of the distribution.
If the statistic is in the rejection region, we reject the null
hypothesis and accept the alternative hypothesis.
If the statistic is not in the rejection region, we retain the null
hypothesis, and do not accept the alternative hypothesis.
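The rejection-region check in this procedure can be sketched as a small Python function. The function name and the tail labels are hypothetical. The critical values used in the calls, 1.96 (two-tailed) and 1.645 (one-tailed), are the usual standard normal values for a 0.05 significance level and are passed in as positive magnitudes.

```python
def in_rejection_region(statistic, critical_value, tail):
    """Return True if the test statistic falls in the rejection region.

    critical_value is given as a positive magnitude.
    """
    if tail == "two-tailed":
        return abs(statistic) >= critical_value
    if tail == "right-tailed":
        return statistic >= critical_value
    if tail == "left-tailed":
        return statistic <= -critical_value
    raise ValueError("tail must be two-tailed, right-tailed, or left-tailed")

print(in_rejection_region(2.17, 1.96, "two-tailed"))     # True: reject the null
print(in_rejection_region(-2.27, 1.645, "left-tailed"))  # True: reject the null
print(in_rejection_region(1.00, 1.96, "two-tailed"))     # False: retain the null
```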
STATISTICS PROJECT: Hypothesis Testing
INTRODUCTION
My topic is the average tuition cost of a 4-yr. public college.
Since I will soon be transferring to a 4-yr. college, I thought
this topic would be perfect. "The College Board" says that the
average tuition cost of college is $5836 per year. I will be
researching online the costs of different public colleges to test
this claim. I will be using the T-test for a mean, since my
sample size will be less than 30 and the population standard
deviation is unknown. I will also use the Chi-Square Test of
Independence.
HYPOTHESIS
I think the average cost of tuition is lower than the average
stated by “The College Board”.
H0: μ ≥ $5836
H1: μ < $5836 (claim)
DATA ANALYSIS
I collected my data from various college websites. I looked up
the cost of tuition per year and the number of students enrolled.
Here is what I came up with:
College                         Tuition   Number of Students
Central Washington University   $4392     10,200
University of Washington        $5985     25,469
Washington State University     $5888     18,432
Western Washington University   $4356     13,000
Evergreen State University      $4590     4400
Eastern Washington University   $5904     10,000
Peninsula College               $3639     10,120
University of Oregon            $6174     20,394
Portland State University       $5208     24,284
Oregon State University         $5604     19,362
Southern Oregon University      $5233     5000
Eastern Oregon University       $4500     3000
Western Oregon University       $5763     4500
University of Idaho             $4410     11,739
Idaho State University          $4400     13,000
There weren’t really any large gaps or outliers in the data that I
collected. There was a gap between 5,000 – 10,000 students.
But the rest was mostly consistent. The lowest tuition was
$3639 from Peninsula College and the highest tuition was
$6174 from the University of Oregon. Some of the websites
were hard to find the information I wanted, but I eventually
found it. Some of the websites were specific as to undergraduate
or graduate and some probably contain both. I should have done
further research to make sure that my numbers only contain
undergraduates and not graduates. So, that is one possible
mistake in the data collection.
HYPOTHESIS TESTING
T-Test for a Mean
Step 1: State the hypothesis and identify the claim.
I claim that the average cost of college tuition is less than the
$5836 per year reported by “The College Board”. At
α = .025, can it be concluded that the average is less than $5836
based on a sample of 15 colleges?
H0: μ ≥ $5836
H1: μ < $5836 (claim)
Step 2: Find the critical value
At α = .025 and d.f. = 14, the critical value is -2.145.
Step 3: Compute the sample test value. Sample mean = 5069.73, s = 787.80
t = (5069.73 - 5836)/(787.80/sqrt(15)) = -3.767
Step 4: Make the decision to reject or not reject the null
hypothesis. Reject the null hypothesis since -3.767 falls in the
critical region.
Step 5: Summarize the results.
I will reject the null hypothesis since there is enough evidence
to support the claim that the average cost of tuition is less than
$5836 per year.
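The t-test above can be reproduced with a short Python sketch using the fifteen tuition figures from the data table. The critical value -2.145 is taken from the report as given, not computed; the variable names are illustrative.

```python
import math
import statistics

# Tuition figures for the 15 colleges listed in the report.
tuition = [4392, 5985, 5888, 4356, 4590, 5904, 3639, 6174,
           5208, 5604, 5233, 4500, 5763, 4410, 4400]

claimed_mean = 5836      # value stated by "The College Board"
critical_value = -2.145  # from the report (alpha = .025, d.f. = 14)

mean = statistics.mean(tuition)
s = statistics.stdev(tuition)  # sample standard deviation (n - 1 divisor)
t = (mean - claimed_mean) / (s / math.sqrt(len(tuition)))

# Left-tailed test: reject H0 when t falls below the critical value.
reject_null = t < critical_value

print(round(mean, 2), round(s, 2), round(t, 3), reject_null)
```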
Chi-Squared Independence Test
Step 1: State the hypotheses and identify the claim.
I claim that there is a correlation between the number of
students at a college and the cost of tuition per year. Here is
the data that I collected:
Cost of Tuition   Number of Students
                  3000-9,999   10,000-16,999   17,000-23,999   24,000-30,999   Total
$3500-4500        1            5               0               0               6
$4501-5500        2            0               0               1               3
$5501-6500        1            1               3               1               6
Total             4            6               3               2               15
At α = .025, can we conclude that the cost of tuition is dependent on
the number of students?
H0: The cost of tuition is independent of the number of students
that attend the college. (χ²=0)
H1: The cost of tuition is dependent on the number of students
that attend the college. (claim) (χ²>0)
Step 2: Find the critical value:
The critical value is 14.449 since the degrees of freedom are
(3-1)(4-1)=6.
Step 3: Compute the test value.
First we have to find the expected values, where each expected
value is (row total)(column total)/(grand total):
E1,1 = (6)(4)/15=1.6
E2,1 = (3)(4)/15=.8
E3,1 = (6)(4)/15=1.6
E1,2 = (6)(6)/15=2.4
E2,2 = (3)(6)/15=1.2
E3,2 = (6)(6)/15=2.4
E1,3 = (6)(3)/15=1.2
E2,3 = (3)(3)/15=.6
E3,3 = (6)(3)/15=1.2
E1,4 = (6)(2)/15=.8
E2,4 = (3)(2)/15=.4
E3,4 = (6)(2)/15=.8
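The twelve expected values can also be generated from the observed counts in one pass. This is a minimal sketch, assuming the observed table is entered as a list of rows (tuition ranges) by columns (enrollment ranges); the variable names are illustrative.

```python
# Observed counts: rows are tuition ranges, columns are enrollment ranges.
observed = [
    [1, 5, 0, 0],  # $3500-4500
    [2, 0, 0, 1],  # $4501-5500
    [1, 1, 3, 1],  # $5501-6500
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total)(column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 1) for e in row])
```

Each printed row corresponds to one tuition range, so the values can be compared directly against the list above.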
The completed table is shown:
                     Number of Students
Cost of Tuition      3000-9,999   10,000-16,999   17,000-23,999   24,000-30,999   Total
$3500-4500           1 (1.6)      5 (2.4)         0 (1.2)         0 (.8)          6
$4501-5500           2 (.8)       0 (1.2)         0 (.6)          1 (.4)          3
$5501-6500           1 (1.6)      1 (2.4)         3 (1.2)         1 (.8)          6
Total                4            6               3               2               15
Then the test value is χ² = ∑ (O-E)²/E
= (1-1.6)²/1.6 + (5-2.4)²/2.4 + (0-1.2)²/1.2 + (0-.8)²/.8
+ (2-.8)²/.8 + (0-1.2)²/1.2 + (0-.6)²/.6 + (1-.4)²/.4
+ (1-1.6)²/1.6 + (1-2.4)²/2.4 + (3-1.2)²/1.2 + (1-.8)²/.8
= 13.333
Step 4: Make the decision to reject or not to reject the null
hypothesis. Do not reject the null hypothesis since 13.333 is
less than 14.449.
Step 5: Summarize the results.
There is not enough evidence to support the claim that the cost
of tuition is dependent on the number of students that attend the
college.
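The test value and the decision can be reproduced from the observed counts alone. This is a minimal sketch under the same assumptions as the hand calculation; the names are illustrative, and the critical value 14.449 is copied from Step 2 rather than computed.

```python
# Observed counts: rows are tuition ranges, columns are enrollment ranges.
observed = [
    [1, 5, 0, 0],
    [2, 0, 0, 1],
    [1, 1, 3, 1],
]
critical_value = 14.449  # table value quoted in Step 2 (d.f. = 6)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell of the table.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tailed test: reject H0 only when the test value exceeds the critical value.
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, d.f. = {degrees_of_freedom}, "
      f"reject H0: {reject_null}")
```

The script gives a test value of 13.333 with 6 degrees of freedom, which is below 14.449, so the null hypothesis is not rejected.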
SUMMARY
My first hypothesis test supported my claim that the average
tuition at 4-year universities is less than the published
average. “The College Board” reported that the average tuition
was $5836 per year. I thought that was a little high. The average
tuition of the fifteen colleges that I researched was $5069.73.
If I had researched colleges all around the country instead of
just our surrounding states, I might have come up with different
numbers. Another thing that may have caused this test to be a
little off was that some of the tuition costs I collected may
include other fees and some may not. When I looked them up, some
fees were listed separately and some were not. This could have
led to a Type I error, where the null hypothesis is true but is
rejected.
My second hypothesis test, about whether the cost of tuition is
dependent on the number of students that attend the college, did
not support my claim. I thought that the fewer students a college
has, the cheaper its tuition would be, but that wasn’t the case.
One main problem I can see with collecting my data is that the
college websites sometimes gave the number of students as “over”
or “approximately” some figure. So, these weren’t the exact
numbers of students enrolled. Also, as stated earlier, the counts
could include both undergraduates and graduates, because some of
the websites didn’t list them separately. Tuition is higher for
graduates, so they should not have been included in this study,
and including them would have thrown off the number of students.
So, these problems may have affected the outcome a little, but I
don’t think enough to change the conclusion.
It would also have been interesting to test whether tuition is
higher in urban areas, where more people live, versus rural areas,
where there are not as many people. I would be inclined to say
that this is true, but it would need to be tested further to say
for sure. It would also be interesting to do this same testing
for private colleges to see if they have the same results. I
thought it was fun to come up with our own hypotheses and try to
prove ourselves right or wrong using what we have learned all
quarter. It was a good test of our skills, and it gave me a better
understanding of how the formulas really work than just doing the
homework examples in the book.

Statistics is both the science of uncertainty and the technology.docx

  • 1. Statistics is both the science of uncertainty and the technology of extracting information from data. A statistic is a summary measure of data. Descriptive statistics are methods that describe and summarize data. Microsoft Excel supports statistical analysis in two ways: 1. Statistical functions 2. Analysis Toolpak add-in Statistical Methods for Summarizing Data A frequency distribution is a table that shows the number of observations in each of several nonoverlapping groups. Categorical variables naturally define the groups in a frequency distribution. To construct a frequency distribution, we need only count the number of observations that appear in each category. This can be done using the Excel COUNTIF function. Frequency Distributions for Categorical Data Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database List the item names in a column on the spreadsheet. Use the function =COUNTIF($D$4:$D$97,cell_reference), where cell_reference is the cell containing the item name
  • 2. Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database Construct a column chart to visualize the frequencies. Relative frequency is the fraction, or proportion, of the total. If a data set has n observations, the relative frequency of category i is: We often multiply the relative frequencies by 100 to express them as percentages. A relative frequency distribution is a tabular summary of the relative frequencies of all categories. Relative Frequency Distributions Example 3.17: Constructing a Relative Frequency Distribution for Items in the Purchase Orders Database First, sum the frequencies to find the total number (note that the
  • 3. sum of the frequencies must be the same as the total number of observations, n). Then divide the frequency of each category by this value. For numerical data that consist of a small number of discrete values, we may construct a frequency distribution similar to the way we did for categorical data; that is, we simply use COUNTIF to count the frequencies of each discrete value. Frequency Distributions for Numerical Data In the Purchase Orders data, the A/P terms are all whole numbers 15, 25, 30, and 45. Example 3.18: Frequency and Relative Frequency Distribution for A/P Terms A graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a histogram. Frequency distributions and histograms can be created using the Analysis Toolpak in Excel. Click the Data Analysis tools button in the Analysis group under the Data tab in the Excel menu bar and select Histogram
  • 4. from the list. Excel Histogram Tool Specify the Input Range corresponding to the data. If you include the column header, then also check the Labels box so Excel knows that the range contains a label. The Bin Range defines the groups (Excel calls these “bins”) used for the frequency distribution. Histogram Dialog If you do not specify a Bin Range, Excel will automatically determine bin values for the frequency distribution and histogram, which often results in a rather poor choice. If you have discrete values, set up a column of these values in your spreadsheet for the bin range and specify this range in the Bin Range field. Using Bin Ranges We will create a frequency distribution and histogram for the A/P Terms variable in the Purchase Orders database. We defined the bin range below the data in cells H99:H103 as follows:
  • 5. Month 15 25 30 45 Example 3.19: Using the Histogram Tool Histogram tool results: Example 3.19: Using the Histogram Tool For numerical data that have many different discrete values with little repetition or are continuous, a frequency distribution requires that we define by specifying the number of groups, the width of each group, and the upper and lower limits of each group. Choose between 5 to 15 groups, and the range of each should be equal. Choose the lower limit of the first group (LL) as a whole number smaller than the minimum data value and the upper limit of the last group (UL) as a whole number larger than the maximum data value. Histograms for Numerical Data
  • 6. The data range from a minimum of $68.75 to a maximum of $127,500; set the lower limit of the first group to $0 and the upper limit of the last group to $130,000. If we select 5 groups, using equation (3.2) the width of each group is ($130,000 - 0) / 5 = $26,000 Example 3.20: Constructing a Frequency Distribution and Histogram for Cost per Order Ten-group histogram Example 3.20: Constructing a Frequency Distribution and Histogram for Cost per Order Set the cumulative relative frequency of the first group equal to its relative frequency. Then add the relative frequency of the next group to the cumulative relative frequency. For, example, the cumulative relative frequency in cell D3 is computed as =D2+C3 = 0.000 + 0.447 = 0.447. Example 3.21 Computing Cumulative Relative Frequencies
  • 7. The kth percentile is a value at or below which at least k percent of the observations lie. The most common way to compute the kth percentile is to order the data values from smallest to largest and calculate the rank of the kth percentile using formula (3.3): Rank of kth percentile = nk/100 + 0.5, rounded to the nearest integer. Statistical software uses different methods that often involve interpolating between ranks instead of rounding, thus producing different results. The Excel function PERCENTILE.INC(array, k) computes the kth percentile of data in the range specified in the array field, where k is in the range 0 to 1, inclusive (i.e., including 0 and 1). Percentiles Compute the 90th percentile for Cost per order in the Purchase Orders data. With n = 94 and k = 90, the rank of the 90th percentile is 94(90)/100 + 0.5 = 85.1 (round to 85). Value of the 85th observation = $74,375. Using the Excel function =PERCENTILE.INC(G4:G97,0.9), the 90th percentile is $73,737.50, which is different from using formula (3.3).
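The difference between the rank formula and an interpolating method can be sketched in Python. This is only an illustration under stated assumptions: the data list is hypothetical (the Purchase Orders data are not reproduced here), the helper names are invented for this example, and `statistics.quantiles` with `method="inclusive"` is used as a stand-in for an interpolating method; to my understanding it interpolates in the same way as PERCENTILE.INC, but that equivalence is an assumption, not something stated in the slides.

```python
import math
import statistics


def percentile_rank(n, k):
    """Rank of the kth percentile using formula (3.3): nk/100 + 0.5."""
    return n * k / 100 + 0.5


def percentile_by_rank(values, k):
    """Order the data, round the rank to the nearest integer, and return that observation."""
    ordered = sorted(values)
    rank = percentile_rank(len(ordered), k)
    position = math.floor(rank + 0.5)               # round half up
    position = min(max(position, 1), len(ordered))  # keep the rank inside the data
    return ordered[position - 1]


# The rank from the worked example: n = 94, k = 90.
rank_90 = percentile_rank(94, 90)

# Hypothetical data: ten values, 10 through 100.
data = list(range(10, 110, 10))
by_rank = percentile_by_rank(data, 90)

# Interpolating method: 99 cut points, so index 89 is the 90th percentile.
interpolated = statistics.quantiles(data, n=100, method="inclusive")[89]

print(rank_90, by_rank, interpolated)
```

On this small data set the two approaches give different answers for the same percentile, which mirrors the $74,375 versus $73,737.50 discrepancy in the example.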
  • 8. Examples 3.22 and 3.23: Computing Percentiles Data > Data Analysis > Rank and Percentile 90.3rd percentile = $74,375 (same result as manually computing the 90th percentile) Example 3.24 Excel Rank and Percentile Tool The value $74,375, computed manually as the 90th percentile using formula (3.3), is the 90.3rd percentile value according to the tool. Quartiles break the data into four parts. The 25th percentile is called the first quartile, Q1; the 50th percentile is called the second quartile, Q2; the 75th percentile is called the third quartile, Q3; and the 100th percentile is the fourth quartile, Q4. One-fourth of the data fall below the first quartile, one-half are below the second quartile, and three-fourths are below the third quartile. The Excel function QUARTILE.INC(array, quart) computes quartiles, where array specifies the range of the data and quart is a whole number between 1 and 4, designating the desired quartile. Quartiles
  • 9. Compute the Quartiles of the Cost per Order data First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81 Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25 Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75 Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00 Example 3.25 Computing Quartiles in Excel A cross-tabulation is a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables. A cross-tabulation table is often called a contingency table. The subcategories of the variables must be mutually exclusive and exhaustive, meaning that each observation can be classified into only one subcategory, and, taken together over all subcategories, they must constitute the complete data set. Cross-Tabulations Sales Transactions database
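A cross-tabulation of two categorical variables can be sketched in Python as follows. The records below are hypothetical stand-ins for the Sales Transactions database, and the function names are invented for this illustration.

```python
from collections import Counter

# Hypothetical (region, product) records; not the actual Sales Transactions data.
orders = [
    ("East", "Book"), ("East", "DVD"), ("West", "Book"),
    ("West", "Book"), ("North", "DVD"), ("East", "Book"),
]


def cross_tabulate(records):
    """Count observations for every (row category, column category) pair."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * v / total for c, v in cells.items()}
    return result


table = cross_tabulate(orders)
percentages = row_percentages(table)

for region, cells in table.items():
    print(region, cells, percentages[region])
```

Because each record falls into exactly one cell, the cell counts add up to the number of records, which is the mutually exclusive and exhaustive property described above.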
  • 10. Count the number (and compute the percentage) of books and DVDs ordered by region. Example 3.26: Constructing a Cross-Tabulation Cross-Tabulation Visualization: Chart of Regional Sales by Product Select the Insert tab. Highlight the data. Click on chart type, then subtype. Use Chart Tools to customize. Creating Charts in Microsoft Excel
  • 11. Excel distinguishes between vertical and horizontal bar charts, calling the former column charts and the latter bar charts. A clustered column chart compares values across categories using vertical rectangles; a stacked column chart displays the contribution of each value to the total by stacking the rectangles; a 100% stacked column chart compares the percentage that each value contributes to a total. Column and bar charts are useful for comparing categorical or ordinal data, for illustrating differences between sets of values, and for showing proportions or percentages of a whole. Column and Bar Charts Example 3.2: Creating a Column Chart Highlighted Cells Highlight the range C3:K6, which includes the headings and data for each category. Click on the Column Chart button and then on the first chart type in the list (a clustered column chart). Example 3.2: Creating a Column Chart To add a title, click on the first icon in the Chart Layouts group. Click on “Chart Title” in the chart and change it to “EEO Employment Report—Alabama.” The names of the data series can be changed by clicking on the Select Data button in the
  • 12. Data group of the Design tab. In the Select Data Source dialog (see below), click on “Series1” and then the Edit button. Enter the name of the data series, in this case “All Employees.” Change the names of the other data series to “Men” and “Women” in a similar fashion. Line charts provide a useful means for displaying data over time. You may plot multiple data series in line charts; however, they can be difficult to interpret if the magnitude of the data values differs greatly. In that case, it would be advisable to create separate charts for each data series. Line Charts Example 3.3: A Line Chart for China Export Data Pie Charts A pie chart displays the relative proportion of each category to the total by partitioning a circle into pie-shaped areas. Example 3.4: A Pie Chart for Census Data
  • 13. Pie Charts Data visualization professionals don't recommend using pie charts. In a pie chart, it is difficult to compare the relative sizes of areas; however, the bars in the column chart can easily be compared to determine relative ratios of the data. If you do use pie charts, restrict them to small numbers of categories, always ensure that the numbers add to 100%, and use labels to display the group names and actual percentages. Avoid three-dimensional (3-D) pie charts—especially those that are rotated—and keep them simple. An area chart combines the features of a pie chart with those of line charts. Area charts present more information than pie or line charts alone but may clutter the observer’s mind with too many details if too many data series are used; thus, they should be used with care. Area Charts Example 3.5: An Area Chart for Energy Consumption Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables. Scatter Charts
  • 14. Example 3.6: A Scatter Chart for Real Estate Data A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions. Bubble Charts Example 3.7: A Bubble Chart for Stock Comparisons Stock chart Surface chart Doughnut chart Radar chart Miscellaneous Excel Charts Many applications of business analytics involve geographic data. Visualizing geographic data can highlight key data relationships, identify trends, and uncover business opportunities. In addition, it can often help to spot data errors and help end users understand solutions, thus increasing the likelihood of acceptance of decision models. Companies like Nike use geographic data and information
  • 15. systems for visualizing where products are being distributed and how that relates to demographic and sales information. This information is vital to marketing strategies. Geographic mapping capabilities were introduced in Excel 2000 but were not available in Excel 2002 and later versions. These capabilities are now available through Microsoft MapPoint 2010, which must be purchased separately. Geographic Data Visualizing and Exploring Data Data visualization - the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions. Data visualization improves decision-making, provides managers with better analysis capabilities that reduce reliance on IT professionals, and improves collaboration and information sharing. Data Visualization Tabular data can be used to determine exactly how many units
  • 16. of a certain product were sold in a particular month, or to compare one month to another. For example, we see that sales of product A dropped in February, specifically by 6.7% (computed as 1 – B3/B2). Beyond such calculations, however, it is difficult to draw big-picture conclusions. Example 3.1: Tabular vs. Visual Data Analysis A visual chart provides the means to easily compare overall sales of different products (Product C sells the least, for example) and to identify trends (sales of Product D are increasing), other patterns (sales of Product C are relatively stable while sales of Product B fluctuate more over time), and exceptions (Product E’s sales fell considerably in September). A dashboard is a visual representation of a set of key business measures. It is derived from the analogy of an automobile’s control panel, which displays speed, gasoline level, temperature, and so on. Dashboards provide important summaries of key business information to help manage a business process or function. Dashboards
  • 17. Hypothesis Testing – Examples and Case Studies How Hypothesis Tests Are Reported Determine the null hypothesis and the alternative hypothesis. Collect and summarize the data into a test statistic. Use the test statistic to determine the p-value. The result is statistically significant if the p-value is less than or equal to the level of significance.
  • 18. Testing Hypotheses About Proportions and Means If the null and alternative hypotheses are expressed in terms of a population proportion, mean, or difference between two means and if the sample sizes are large, the test statistic is simply the corresponding standardized score computed assuming the null hypothesis is true, and the p-value is found from a table of percentiles for standardized scores. Example 2: Weight Loss for Diet vs Exercise Did dieters lose more fat than the exercisers? Diet Only: sample mean = 5.9 kg, sample standard deviation = 4.1 kg, sample size n = 42, standard error = 4.1/√42 = 0.633. Exercise Only: sample mean = 4.1 kg, sample standard deviation = 3.7 kg, sample size n = 47, standard error = 3.7/√47 = 0.540. Measure of variability = √[(0.633)² + (0.540)²] = 0.83
  • 19. Example 2: Weight Loss for Diet vs Exercise Step 1. Determine the null and alternative hypotheses. Null hypothesis: No difference in average fat lost in population for two methods. Population mean difference is zero. Alternative hypothesis: There is a difference in average fat lost in population for two methods. Population mean difference is not zero. Step 2. Collect and summarize data into a test statistic. The sample mean difference = 5.9 – 4.1 = 1.8 kg and the standard error of the difference is 0.83. So the test statistic: z = (1.8 – 0)/0.83 = 2.17. Step 3. Determine the p-value. Recall the alternative hypothesis was two-sided. p-value = 2 × [proportion of bell-shaped curve above 2.17] = 2 × 0.015 = 0.03.
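Steps 2 and 3 of this example can be rechecked with a short Python sketch. The function name is invented for illustration, and the standard normal distribution from Python's `statistics` module stands in for the table of standardized scores. Because the sketch does not round intermediate values, its results differ slightly from the rounded slide arithmetic (z comes out near 2.16 rather than 2.17).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized score for a difference in two means, with null difference 0."""
    se1 = sd1 / sqrt(n1)            # standard error of the first sample mean
    se2 = sd2 / sqrt(n2)            # standard error of the second sample mean
    se_diff = sqrt(se1**2 + se2**2)  # the "measure of variability"
    z = (mean1 - mean2 - 0) / se_diff
    # Two-sided p-value: twice the area of the bell-shaped curve beyond |z|.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se_diff, z, p_two_sided


# Diet only: mean 5.9, sd 4.1, n 42. Exercise only: mean 4.1, sd 3.7, n 47.
se_diff, z, p_value = two_sample_z(5.9, 4.1, 42, 4.1, 3.7, 47)

print(round(se_diff, 2), round(z, 2), round(p_value, 2))
```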
  • 20. Step 4. Make a decision. The p-value of 0.03 is less than or equal to 0.05, so … If really no difference between dieting and exercise as fat loss methods, would see such an extreme result only 3% of the time, or 3 times out of 100. Prefer to believe truth does not lie with null hypothesis. We conclude that there is a statistically significant difference between average fat loss for the two methods. Example 3: Public Opinion About President On May 16, 1994, Newsweek reported the results of a public opinion poll that asked: “From everything you know about Bill Clinton, does he have the honesty and integrity you expect in a president?” (p. 23). Poll surveyed 518 adults and 233, or 0.45 of them (clearly less than half), answered yes. Could Clinton’s adversaries conclude from this that only a minority (less than half) of the population of Americans thought Clinton had the honesty and integrity to be president?
  • 21. Step 1. Determine the null and alternative hypotheses. Null hypothesis: There is no clear winning opinion on this issue; the proportions who would answer yes or no are each 0.50. Alternative hypothesis: Fewer than 0.50, or 50%, of the population would answer yes to this question. The majority do not think Clinton has the honesty and integrity to be president. Step 2. Collect and summarize data into a test statistic. Sample proportion is: 233/518 = 0.45. The standard deviation = √[(0.50)(1 – 0.50)/518] = 0.022. Test statistic: z = (0.45 – 0.50)/0.022 = –2.27. Example 3: Public Opinion About President Step 3. Determine the p-value. Recall the alternative hypothesis was one-sided. p-value = proportion of bell-shaped curve below –2.27. Exact p-value = 0.0116. Step 4. Make a decision. The p-value of 0.0116 is less than 0.05, so we conclude that the proportion of American adults in 1994 who believed Bill
  • 22. Clinton had the honesty and integrity they expected in a president was significantly less than a majority. Revisiting Case Studies: How Journals Present Tests Whereas newspapers and magazines tend to simply report the decision from hypothesis testing, journals tend to report p-values as well. This allows you to make your own decision, based on the severity of a Type I error and the magnitude of the p-value. Case Study 5.1: Quitting Smoking with Nicotine Patches Compared the smoking cessation rates for smokers randomly assigned to use a nicotine patch versus a placebo patch. Null hypothesis: The proportions of smokers in the population who would quit smoking using a nicotine patch and a placebo patch are the same. Alternative hypothesis: The proportion of smokers in the population who would quit smoking using a nicotine patch is
  • 23. higher than the proportion who would quit using a placebo patch. Higher smoking cessation rates were observed in the active nicotine patch group at 8 weeks (46.7% vs 20%) (P < .001) and at 1 year (27.5% vs 14.2%) (P = .011). (Hurt et al., 1994, p. 595) Conclusion: p-values are quite small: less than 0.001 for difference after 8 weeks and equal to 0.011 for difference after a year. Therefore, rates of quitting are significantly higher using a nicotine patch than using a placebo patch after 8 weeks and after 1 year. Case Study 6.4: Smoking During Pregnancy and Child’s IQ Study investigated impact of maternal smoking on subsequent IQ of child at 1, 2, 3, and 4 years of age.
  • 24. Null hypothesis: Mean IQ scores for children whose mothers smoke 10 or more cigarettes a day during pregnancy are the same as the mean for those whose mothers do not smoke, in populations similar to the one from which this sample was drawn. Alternative hypothesis: Mean IQ scores for children whose mothers smoke 10 or more cigarettes a day during pregnancy are not the same as the mean for those whose mothers do not smoke, in populations similar to the one from which this sample was drawn. Children born to women who smoked 10+ cigarettes per day during pregnancy had developmental quotients at 12 and 24 months of age that were 6.97 points lower (averaged across these two time points) than children born to women who did not smoke during pregnancy (95% CI: 1.62, 12.31, P = .01); at 36 and 48 months they were 9.44 points lower (95% CI: 4.52, 14.35, P = .0002). (Olds et al., 1994, p. 223) Researchers conducted two-tailed tests to allow for the possibility that the mean IQ score could actually be higher for those whose mothers smoke. The CI provides evidence of the direction in which the difference falls. The p-value simply tells us there is a statistically significant difference. For Those Who Like Formulas
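As one concrete formula, the one-sided test of a single proportion from Example 3 (233 of 518 answering yes, null value 0.50) can be sketched in Python. The function name is invented for illustration. The sketch keeps unrounded intermediate values, so it gives a z of roughly –2.28 and a p-value of roughly 0.011, slightly different from the –2.27 and 0.0116 obtained in the slides after rounding the sample proportion to 0.45 and the standard deviation to 0.022.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """Standardized score for a sample proportion, computed assuming the null value p0."""
    p_hat = successes / n
    sd_null = sqrt(p0 * (1 - p0) / n)  # standard deviation under the null hypothesis
    z = (p_hat - p0) / sd_null
    # One-sided (left-tailed) p-value: area of the bell-shaped curve below z.
    p_lower = NormalDist().cdf(z)
    return p_hat, sd_null, z, p_lower


p_hat, sd_null, z_prop, p_lower = one_proportion_z(233, 518, 0.50)

print(round(p_hat, 2), round(sd_null, 3), round(z_prop, 2), round(p_lower, 4))
```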
  • 25. Statistics Spring 2019 Module 3 Comprehensive Problem
  • 26. INFERENTIAL STATISTICS – HYPOTHESIS TESTING Either individually or in groups of 2 or 3, your task is to perform some real-world inferential statistics. You will take a claim that someone has made, form a hypothesis from that, collect the data necessary to test the hypothesis, perform a hypothesis test, and interpret the results. You will test to see if less than 50% of students participate in the Student Evaluation of Teaching system (SETS) in the School of Business Administration at USCA. Why or Why not? Determine and describe the type of data that you will collect and how you plan to collect this data in order to answer your questions. You will need to collect data on many characteristics of your sample so that these characteristics can later be compared somehow (e.g., before and after data; comparisons by gender, major, type, year, age, etc.) Define the population and the sample that you will be studying. (you must sample at least 100 students in the SOBA)Project Components The report will include a description of the problem, and why you think it is important, or what you hope to gain from testing the hypothesis. It should also include the context of the data, all data collected, and the values generated in EXCEL. A decision and conclusion should be stated. An analysis should follow with what the conclusion means in terms of the original problem. The report should be in narrative format like you were writing for a newspaper or magazine, must be typed, printed, and should be double spaced. An excellent final report (100 points) will have the following components. · An introduction to the problem including the claim(s) being tested · The context (who, what, where, when, why, how) of the data
  • 27. (remember this is in narrative format) and any possible problems with collecting the data · Descriptive statistics and/or tables depending on your type of data · Appropriate graphs (every project should have at least one graph or chart of the data in it) · Inferential statistics including ... · the null and alternative hypotheses written symbolically · statistical output including a test statistic and p-value · a graph showing the critical and non-critical regions, test statistic, and p-value · the decision and a conclusion written in terms of the original claim · Conclusion · Suggestions for the next time this project is done · No statistical usage errors What can we test? Some things are easier to test than other things. The purpose of this project is to expose you to the process of hypothesis testing in a real-world application. You may test means, proportions, or linear correlation. You may have one or more samples. You may categorize your variables in one or two ways. If you are dealing with one sample, then you will need some numerical value to test against. The claim "more people prefer Pepsi than Coke" becomes a claim that the proportion of Pepsi drinkers is greater than 0.5. There are not two independent samples (Pepsi drinkers / Coke drinkers), just one sample categorized in two ways. A problem with the Pepsi / Coke thing is that it omits other soft drinks because that is more difficult to do. A chi-square goodness of fit test would be more appropriate in this case. Categorical Data If your data consists solely of categories and not measured quantities, then you should be looking at proportions or counts.
  • 28. Things to look for that let you know you're dealing with categorical data or proportions include: proportions, percents, counts, frequencies, fractions, or ratios. If your data consists of names or labels, you're dealing with categorical data. You really need to think about the response that was recorded for each case. Did you record a yes/no response for each case or did you record a number that means something? If it was a yes/no or other categorical data, then this is the place to be. Example Claims about Categorical Data · 93.1% of Americans feel there should not be nudity on television during children's viewing time. http://www.parentstv.org/PTC/publications/lbbcolumns/2003/05 28.asp This is a claim about a single proportion. We know this because the value includes a percentage and the data is categorical (yes or no), not numerical. The original claim here could be written as p=0.931. Quantitative (Numerical) Data If your data consists of measured quantities, then you will probably be testing a mean or perhaps correlation between two variables. It is possible to test a claim about a standard deviation, but that is rare, and not covered in this course. There are four main ways to analyze means. 1. A test about a single mean that requires a number as the claimed value. 2. A test about two independent means doesn't need a number because you compare them to each other. This compares the same thing in two different groups. 3. A test for two dependent means, often called paired samples, compares two values for each case in the same group. 4. The Analysis of Variance is an extension of the two independent samples case where there are more than two groups.
  • 29. You can also perform correlation and regression with two quantitative variables. Simple regression, with just one predictor variable, is covered in the book. Multiple regression, with several predictor variables, is not covered in the textbook but is available online. Example Claims about Quantitative Data: · Women live five years longer than men. http://www.medicalnewstoday.com/medicalnews.php?newsid=1 8866 This is a claim about two averages, the average lifespan of women and that of men. We don't know the average of either gender (they're given in the article), we just know that women are supposed to live five years longer than men. When you're working with one sample, it's important to have a value to compare against, but with two samples, you don't need a value for each, just the difference between the two (in this case 5 years). The original claim here could be written as μw-μm=5 (the difference in the mean ages of women and men is 5 years). · Seat belts save lives. http://dot.state.il.us/trafficsafety/seatbelt june 2006.pdf and http://www- fars.nhtsa.dot.gov/FinalReport.cfm?stateid=17&title=states&titl e2=fatalities_and_fatality_rates&year=2005. Okay, this claim is all over the place, but I wanted to give some links on how it would be tested. You could take the data regarding the percent of people wearing their seat belts and compare it to the fatality rate. These are two numerical values that are paired together for each case (probably based on an annual report). Remember that you cannot perform correlation and regression with categorical variables. The original claim that seat belts save lives would be interpreted as a negative correlation (as seat belt use goes up, fatalities go down) and would be written as ρ<0. Sample Final Report Available online are sample projects and resources. Your project may not be as long or detailed.
  • 30. Assignment is due April 15th, either electronically prior to the start of class or a hard copy at the start of class. Hypothesis testing
  • 31. Hypothesis testing: procedure 1. We ask a yes/no question about a population. 2. We answer the question yes, and answer the question no, using symbols for the population means. 3. We label one answer the null hypothesis and the other answer the alternative hypothesis. 4. We decide the criterion for rejecting the null hypothesis. 5. The test is one of: two-tailed, right-tailed, or left-tailed. 6. We take a sample, and calculate our test statistic (Z or t for now). 7. We find if the observed test statistic is in the rejection region (critical region or tail) of the distribution. 8. If the statistic is in the rejection region, we reject the null hypothesis and accept the alternative hypothesis. 9. If the statistic is not in the rejection region, we retain the null hypothesis, and do not accept the alternative hypothesis.
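The rejection-region decision in the last steps of this procedure can be sketched in Python for a Z statistic. This is a minimal illustration with an invented function name; it covers only the standard normal case, and the significance level alpha is an input the analyst chooses.

```python
from statistics import NormalDist


def in_rejection_region(z, alpha, tail):
    """Return True if the observed Z statistic falls in the rejection region.

    tail is "left", "right", or "two" (two-tailed).
    """
    std_normal = NormalDist()
    if tail == "left":
        return z <= std_normal.inv_cdf(alpha)
    if tail == "right":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "two":
        # Split alpha evenly between the two tails.
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Reject the null hypothesis when the statistic is in the rejection region,
# otherwise retain it.
for z, tail in [(2.17, "two"), (-2.27, "left"), (1.5, "right")]:
    decision = "reject" if in_rejection_region(z, 0.05, tail) else "retain"
    print(z, tail, decision, "the null hypothesis")
```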
  • 32. STATISTICS PROJECT: Hypothesis Testing INTRODUCTION My topic is the average tuition cost of a 4-yr. public college. Since I will soon be transferring to a 4-yr. college, I thought this topic would be perfect. "The College Board" says that the average tuition cost of college is $5836 per year. I will be researching online the costs of different public colleges to test this claim. I will be using the T-test for a mean, since my sample size is going to be less than 30 and the population standard deviation is unknown. I will also use the Chi-Square Test of Independence. HYPOTHESIS I think the average cost of tuition is lower than the average stated by “The College Board”. H0: μ ≥ $5836. H1: μ < $5836 (claim). DATA ANALYSIS I collected my data from various college websites. I looked up the cost of tuition per year and the number of students enrolled. Here is what I came up with:
  • 33. College: tuition per year, number of students. Central Washington University: $4392, 10,200; University of Washington: $5985, 25,469; Washington State University: $5888, 18,432; Western Washington University: $4356, 13,000; Evergreen State University: $4590, 4400; Eastern Washington University: $5904, 10,000; Peninsula College: $3639, 10,120; University of Oregon: $6174, 20,394; Portland State University: $5208, 24,284; Oregon State University: $5604, 19,362; Southern Oregon University: $5233, 5000;
  • 34. Eastern Oregon University: $4500, 3000; Western Oregon University: $5763, 4500; University of Idaho: $4410, 11,739; Idaho State University: $4400, 13,000. There weren’t really any large gaps or outliers in the data that I collected. There was a gap between 5,000 and 10,000 students, but the rest was mostly consistent. The lowest tuition was $3639 from Peninsula College and the highest tuition was $6174 from the University of Oregon. On some of the websites it was hard to find the information I wanted, but I eventually found it. Some of the websites were specific as to undergraduate or graduate and some probably contain both. I should have done further research to make sure that my numbers only contain undergraduates and not graduates. So, that is one possible mistake in the data collection. HYPOTHESIS TESTING T-Test for a Mean Step 1: State the hypothesis and identify the claim. I claim that the average cost of college tuition is less than the $5836 per year stated by “The College Board”. At α = .025, can it be concluded that the average is less than $5836
  • 35. based on a sample of 15 colleges? H0: μ ≥ $5836. H1: μ < $5836 (claim). Step 2: Find the critical value. At α = .025 and d.f. = 14, the critical value is -2.145. Step 3: Compute the sample test value. With sample mean = 5069.73 and s = 787.80, t = (5069.73 - 5836)/(787.80/sqrt(15)) = -3.767. Step 4: Make the decision to reject or not reject the null hypothesis. Reject the null hypothesis since -3.767 falls in the critical region. Step 5: Summarize the results. I will reject the null hypothesis since there is enough evidence to support the claim that the average cost of tuition is less than $5836 per year. Chi-Squared Independence Test Step 1: State the hypotheses and identify the claim. I claim that there is a correlation between the number of students at a college and the cost of tuition per year. Here is the data that I collected: Cost of Tuition (rows) by Number of Students (columns: 3000-9,999; 10,000-16,999; 17,000-23,999; 24,000-30,999; Total).
  • 36. $3500-4500: 1, 5, 0, 0, 6. $4501-5500: 2, 0, 0, 1, 3. $5501-6500: 1, 1, 3, 1, 6. Total: 4, 6, 3, 2, 15. At α = .025, can we conclude that the cost of tuition is dependent on the number of students? H0: The cost of tuition is independent of the number of students that attend the college. (χ² = 0) H1: The cost of tuition is dependent on the number of students
  • 37. that attend the college. (claim) (χ² > 0) Step 2: Find the critical value. The critical value is 14.449 since the degrees of freedom are (3-1)(4-1) = 6. Step 3: Compute the test value. First we have to find the expected values: E1,1 = (6)(4)/15 = 1.6; E2,1 = (3)(4)/15 = .8; E3,1 = (6)(4)/15 = 1.6; E1,2 = (6)(6)/15 = 2.4; E2,2 = (3)(6)/15 = 1.2; E3,2 = (6)(6)/15 = 2.4; E1,3 = (6)(3)/15 = 1.2; E2,3 = (3)(3)/15 = .6; E3,3 = (6)(3)/15 = 1.2; E1,4 = (6)(2)/15 = .8; E2,4 = (3)(2)/15 = .4; E3,4 = (6)(2)/15 = .8. The completed table is shown, with expected values in parentheses: Cost of Tuition (rows) by Number of Students (columns: 3000-9,999; 10,000-16,999; 17,000-23,999; 24,000-30,999; Total).
  • 38. $3500-4500: 1 (1.6), 5 (2.4), 0 (1.2), 0 (.8), 6. $4501-5500: 2 (.8), 0 (1.2), 0 (.6), 1 (.4), 3. $5501-6500: 1 (1.6), 1 (2.4), 3 (1.2), 1 (.8), 6. Total: 4, 6, 3, 2, 15. Then the test value is χ² = ∑ (O-E)²/E = (1-1.6)²/1.6 + (5-2.4)²/2.4 + (0-1.2)²/1.2 + (0-.8)²/.8 + (2-.8)²/.8 + (0-1.2)²/1.2 + (0-.6)²/.6 + (1-.4)²/.4 + (1-1.6)²/1.6 + (1-2.4)²/2.4 + (3-1.2)²/1.2 + (1-.8)²/.8 = 13.333
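Both test values in this report can be rechecked with a short Python sketch that uses only the standard library. The tuition figures and the observed counts are taken from the report; the function names are invented for illustration, and the two critical values are copied from the report as constants rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Tuition per year for the 15 colleges listed in the report.
tuition = [4392, 5985, 5888, 4356, 4590, 5904, 3639, 6174,
           5208, 5604, 5233, 4500, 5763, 4410, 4400]

# Observed counts: rows are tuition ranges, columns are enrollment ranges.
observed = [
    [1, 5, 0, 0],
    [2, 0, 0, 1],
    [1, 1, 3, 1],
]

T_CRITICAL = -2.145      # from the report: alpha = .025, d.f. = 14, left-tailed
CHI2_CRITICAL = 14.449   # from the report: alpha = .025, d.f. = 6


def one_sample_t(values, mu0):
    """t = (sample mean - claimed mean) / (s / sqrt(n))."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))


def chi_square_statistic(table):
    """Sum of (O - E)^2 / E, with E = (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


t_value = one_sample_t(tuition, 5836)
chi2_value = chi_square_statistic(observed)

print(round(t_value, 3), "reject H0" if t_value < T_CRITICAL else "do not reject H0")
print(round(chi2_value, 3), "reject H0" if chi2_value > CHI2_CRITICAL else "do not reject H0")
```

Computing the expected counts from the row and column totals inside the function avoids retyping the twelve expected values by hand, which is where a slip is most likely.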
  • 39. Step 4: Make the decision to reject or not to reject the null hypothesis. Do not reject the null hypothesis since 13.333 is less than 14.449. Step 5: Summarize the results. There is not enough evidence to support the claim that the cost of tuition is dependent on the number of students that attend the college. SUMMARY My first claim, that the tuition cost of 4-year universities is less than the stated average, was supported. The average as stated by “The College Board” was $5836 per year. I thought that was a little high. The average tuition of the fifteen colleges that I researched was $5069.73. Maybe if I had researched colleges all around the country instead of just our surrounding states I would have come up with different numbers. Another thing that may have caused this test to be a little off was that when I was collecting data, some of the costs of tuition may include other fees and some may not. When I looked them up, some fees were listed separately and some were not. This could have led to a Type I error where the null hypothesis was true and it was rejected. My second claim, that the cost of tuition is dependent on the number of students that attend the college, was not supported. I thought that the fewer the students that attend a specific college, the cheaper the tuition would be, but that wasn’t the case. One main problem I can see with collecting my data is that on the college websites for the number of students, some said “over” or “approximately”. So, these weren’t the exact numbers of students enrolled. Also, as stated earlier, some of the students could be undergraduates or graduates. Some of the websites didn’t list them separately. Tuition is higher for graduates, so they should not have been included in this study
  • 40. and it would have thrown off the number of students. So, these may have affected the outcome a little, but I don’t think enough to change the result of the hypothesis test. It would have also been interesting to test whether the tuition is higher in urban areas where more people live versus rural areas where there are not as many people. I would be inclined to say that this is true, but it would need to be tested further to say for sure. It would also be interesting to do this same testing for private colleges to see if they have the same results. I thought this was fun to come up with our own hypothesis and try to prove ourselves right or wrong using what we have learned all quarter. It was a good test of our skills and it gave me a better understanding of how the formulas really work rather than just doing the homework examples in the book.