This document outlines a statistical methods course project on analyzing the gender pay gap. Students will generate random sample data on gender, age, education, occupation, experience, salary, marital status, and number of children. They will organize, describe, analyze, and interpret the data using statistical techniques like frequency tables, graphs, measures of center and spread, probability distributions, confidence intervals, hypothesis testing, correlation, and linear regression. The project involves multiple parts where students apply their learning and reflect on their work through metacognitive evaluations. The goal is to determine if the sample data shows evidence of a gender pay gap.
MATH 1342 Elementary Statistical Methods
The Gender Pay Gap Project - Spring 2022
Goal: The Economic Policy Institute (https://www.epi.org/about/) declares that working
women are paid less than men. More precisely, a typical woman is paid 83 cents for
every dollar a man is paid. The goal of this project is to determine the existence of
the gender pay gap.
This assignment will satisfy the following core objectives:
1. Critical Thinking Skills - to include creative thinking, innovation, inquiry, and the
analysis, evaluation, and synthesis of information.
2. Communication Skills - to include effective written, oral, and visual
communication.
3. Empirical and Quantitative Skills - to include applications of scientific and
mathematical concepts.
This assignment will assess the following student learning outcomes:
SLO 1: Explain the use of data collection and statistics as tools to reach reasonable
conclusions.
SLO 2: Recognize, examine and interpret the basic principles of describing and
presenting data.
SLO 5: Examine, analyze and compare various sampling distributions for both
discrete and continuous random variables.
SLO 6: Describe and compute confidence intervals.
SLO 7: Solve linear regression and correlation problems.
SLO 8: Perform hypothesis testing using statistical methods.
This assignment will assess the following metacognitive student learning outcomes:
MSLO 1: Apply metacognitive skills to plan for all stages of assignments.
1/10/2022
MSLO 2: Apply metacognitive skills to monitor comprehension of learning and task
completion.
MSLO 3: Apply metacognitive skills to evaluate task performance.
General Instructions
Students are going to organize, describe, analyze and interpret data generated randomly
from a website. As a guide for the project, students will answer some questions and follow
some instructions. As a final product, students will create a well-organized, creative
PowerPoint presentation about their findings.
• All slides should be grammatically correct and error-free.
• All math content must be correct.
• Pictures should be added to enhance the meaning of the text and visual design.
Students should upload both the data file used and the PowerPoint presentation into
Canvas before the due date. This project is the Signature Assignment for the course.
Part I
This first assignment will assess the following student learning outcomes:
SLO 1: Explain the use of data collection and statistics as tools to reach reasonable
conclusions.
MSLO 1: Apply metacognitive skills to plan for all stages of assignments.
Create the sample data
• Access the website http://www.randat.com/
• Select the following variables (Note: first and last names appear by default)
o Gender
o Age. Select any interval.
o Education
o Occupation
o Experience (Years)
o Salary. Choose the “Per year” option.
o Marital Status
o Number of Children
• Select 200 for the Number of Rows.
• Click on the Generate button. In the table you will see the data generated.
• Save the data as an Excel (.XLSX) file. This will be your sample for the project.
Start the PowerPoint presentation
• Make the title slide in your PowerPoint presentation. This slide should include
the project title, your name, course name and number-section (MATH 1342-**
Elementary Statistical Methods), and semester (Spring 2022).
• The first slide should include a video with an introduction, i.e., a short description,
the goal of the project, and the importance of the topic. The video should be embedded
in the PowerPoint (not linked) and be at least 1 minute long. Students can use any
program or device to create the video. You don’t have to appear on video, but
in that case you must create a picture slideshow instead.
• Make your second slide in your PowerPoint presentation. Your data was created
using a random data generator. Give your opinion on whether applying statistical tools
to your sample data can lead to reasonable conclusions. Also explain why you
chose the range you selected for the age variable. Hints:
o If the data is generated by a program, what type of data collection is it?
o What are the benefits and disadvantages of this method?
o Do you think your results will be close to real numbers?
• Make your next slide in your PowerPoint presentation. In this slide answer the
question: Why can’t we use real data for this study? Explain the possible
ethical issues of working with real data. Identify possible flaws or biases of a
real-data study.
Upload both your PowerPoint presentation and the Excel file with the data of your sample.
Address your metacognitive thinking (use separate Word document)
• Explain why in some studies it is important to use simulated data instead of real
data.
• Explain why it is important to understand essential real-world problems.
• Explain why it is important to include real-world problems in the class
material.
Part II
This second assignment will assess the following student learning outcomes:
SLO 2: Recognize, examine and interpret the basic principles of describing and
presenting data.
MSLO 2: Apply metacognitive skills to monitor comprehension of learning and task
completion.
Organizing and presenting data
• After examining the values for each variable in your study, recognize each
variable as qualitative, quantitative discrete, or quantitative continuous. Show
your results as a list or a table on the next slide of the PowerPoint. Explain in
writing why you classified each variable in that way.
• In the Excel file, copy the Education column to the next Excel sheet and organize
the values of the Education variable through a frequency table. The frequency
table should include four columns: the first for the descriptions, the second for
the frequencies, the third for the relative frequencies, and the fourth for the
percentages.
• Using the frequency table generated in the previous step, present the values
graphically: make a pie chart from the percentages column and a bar graph
from the frequencies column for Education.
• Copy the table and both graphs to the next slide of the presentation.
• Analyze the results and write a paragraph presenting your findings and
conclusion.
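As a rough illustration of the frequency table described above, the following Python sketch counts each Education value and derives the relative frequency and percentage columns. The `education` list and the `frequency_table` helper are made-up placeholders, not part of the project files; the project itself asks for this table in Excel.

```python
from collections import Counter

# Hypothetical Education values (an assumption for illustration only);
# replace them with the Education column of your own sample.
education = ["Bachelor", "Master", "High School", "Bachelor", "Doctorate",
             "Bachelor", "Master", "High School", "Bachelor", "Master"]


def frequency_table(values):
    """Return rows of (description, frequency, relative frequency, percentage)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    # most_common() lists the categories from most to least frequent.
    for description, frequency in counts.most_common():
        relative = frequency / total
        rows.append((description, frequency, relative, 100 * relative))
    return rows


table = frequency_table(education)
for description, frequency, relative, percent in table:
    print(f"{description:<12} {frequency:>4} {relative:>8.3f} {percent:>7.1f}%")
```

The frequencies should add up to the sample size and the relative frequencies to 1, which is a quick way to check the table before building the pie chart and bar graph.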
Address your metacognitive thinking (use the same Word document as
the previous assignment)
• Explain why it is important to use different methods of organizing and
presenting data.
• Explain why it is important to provide a conclusion and explanation for your
results.
• Reflect on your results of the Probability and Statistics Project. What would you
change in your planning if you knew your grade ahead of time? What
learning strategies can you apply to this project?
Part III
This third assignment will assess the following student learning outcomes:
SLO 2: Recognize, examine and interpret the basic principles of describing and
presenting data.
MSLO 3: Apply metacognitive skills to evaluate task performance.
Organizing and presenting data
• In the Excel file, copy the Salary column to the next Excel sheet. Use the filter tool to
separate the Salary variable into two groups: one for Females and the other for
Males.
o First, designate two separate columns aside from your data set, one for
Female Salaries and the other for Male Salaries; write the titles. Now
filter (this tool is under the Data tab) the Gender variable to see only female
values; then copy the corresponding female salary values and paste them into
the Female Salaries column you created.
o Repeat the previous steps to copy and paste the salaries for males into the
Male Salaries column.
• Create a histogram to present the data for each group.
o Open https://www.socscistatistics.com/descriptive/histograms/
o Copy the data from one group (Female or Male) into the data box
o Click Generate
o Go to Edit tools and select the number of classes; you can choose any
number from 8 to 10
o Click Edit histogram
o Copy the histogram and Frequency table to the next slide of your
presentation.
o Repeat the same steps for the second group
• Describe data values by calculating the following measurements:
o minimum value, maximum value, mean (average), first quartile, median
(or second quartile), third quartile, and the standard deviation.
o Find the IQR and check the data for possible outliers.
o Repeat the same steps for the second group.
• Analyze the results and write down a paragraph presenting your findings and
conclusion. Is there a gender pay gap? Explain.
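The measurements listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `salaries` list is invented, the `describe` helper is hypothetical, and the outlier check uses the common 1.5 × IQR fences. Quartile conventions differ between tools, so the quartiles from a calculator or Excel may not match these exactly.

```python
import statistics

# Hypothetical salaries in dollars per year (an assumption for illustration);
# replace them with your Female Salaries or Male Salaries column.
salaries = [42000, 55000, 61000, 48000, 73000,
            150000, 58000, 66000, 52000, 390000]


def describe(values):
    """Summarize a list of numbers and flag outliers with the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": iqr,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


summary = describe(salaries)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Running the same function on both salary columns gives two summaries that can be compared side by side when writing the findings paragraph.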
Address your metacognitive thinking
• Differentiate between discrete and continuous variables. For example, why do
we have to use different types of graphs for discrete and continuous variables?
• Reflect on the study strategies that you used to get ready for this assignment.
Which learning strategies are best for you? What strategies have you tried that
didn’t work? How long did it take to complete this part of the project?
Part IV
This fourth assignment will assess the following student learning outcomes:
SLO 5: Examine, analyze and compare various sampling distributions for both
discrete and continuous random variables.
MSLO 2: Apply metacognitive skills to monitor comprehension of learning and task
completion.
MSLO 3: Apply metacognitive skills to evaluate task performance.
Discrete probability distribution function
• In this part of the assignment, you work with the Female Salary column that you
created in Part III.
o Find the probability 𝑝 that a female gets a salary over $150,000.
o Consider that ten females are randomly selected, with replacement,
from your Female Salary sample. Identify the characteristics (number of
trials (𝑛) and probability of success (𝑝)) of the Binomial distribution
𝑋~𝐵(𝑛, 𝑝).
o Describe and graph the sampling distribution for the number of females,
out of 10, with a salary over $150,000.
o What is the probability that three or fewer out of 10 have that salary?
• Now, you work with the Male Salary column that you created in Part III.
o Find the probability 𝑝 that a male gets a salary over $150,000.
o Consider that ten males are randomly selected, with replacement,
from your Male Salary sample. Identify the characteristics of the Binomial
distribution 𝑋~𝐵(𝑛, 𝑝).
o Describe and graph the sampling distribution for the number of males,
out of 10, with a salary over $150,000.
o What is the probability that three or fewer out of 10 have that salary?
Students can use https://classcalc.com/graphing-calculator for the graphs.
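The binomial steps above can be sketched in Python as follows. The `salaries` list is invented for illustration, and `binomial_pmf` is a hypothetical helper; the success probability is estimated as the share of sample salaries over $150,000, and the ten draws are treated as independent because they are made with replacement.

```python
from math import comb

# Hypothetical salaries in dollars per year (an assumption for illustration);
# replace them with your Female Salary or Male Salary column.
salaries = [42000, 155000, 61000, 180000, 73000,
            152000, 58000, 66000, 210000, 90000]

threshold = 150_000
n = 10  # number of people drawn with replacement

# Probability of "success": the share of sample salaries over the threshold.
p = sum(s > threshold for s in salaries) / len(salaries)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Full distribution for k = 0, 1, ..., n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# P(X <= 3): add the probabilities for k = 0, 1, 2, 3.
prob_three_or_fewer = sum(pmf[:4])

# Crude text bar chart of the distribution.
for k, prob in enumerate(pmf):
    print(f"{k:>2} {prob:.4f} {'#' * round(prob * 50)}")
print(f"p = {p:.2f}, P(X <= 3) = {prob_three_or_fewer:.4f}")
```

Repeating the calculation with the other salary column gives the second distribution, so the two values of 𝑝 and the two graphs can be compared.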
Continuous probability distribution function
• Assume that Female salaries follow a normal distribution with the mean and
standard deviation found in Part III.
o Identify the characteristics of the Normal distribution 𝑋~𝑁(𝜇, 𝜎).
o Describe and graph the normal distribution for Female salaries, then
shade the area for salaries over $150,000.
o What is the probability of randomly selecting a female from your sample
with a salary over $150,000?
• Assume that Male salaries follow a normal distribution with the mean and standard
deviation found in Part III.
o Identify the characteristics of the Normal distribution 𝑋~𝑁(𝜇, 𝜎).
o Describe and graph the normal distribution for Male salaries, then shade
the area for salaries over $150,000.
o What is the probability of randomly selecting a male from your sample
with a salary over $150,000?
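Under the normal model described above, the probability of a salary over $150,000 is the area to the right of 150,000 under the curve. A minimal Python sketch follows; the values of `mu` and `sigma` are invented placeholders for the mean and standard deviation found in Part III.

```python
from statistics import NormalDist

# Hypothetical mean and standard deviation in dollars (assumptions for
# illustration); replace them with the values you found in Part III.
mu = 95_000
sigma = 40_000

salary_model = NormalDist(mu, sigma)  # X ~ N(mu, sigma)

threshold = 150_000
z_score = (threshold - mu) / sigma

# P(X > threshold) is the shaded right-tail area: 1 minus the area to the left.
prob_over = 1 - salary_model.cdf(threshold)

print(f"z = {z_score:.3f}")
print(f"P(X > {threshold:,}) = {prob_over:.4f}")
```

Comparing this model-based probability with the sample proportion used for the binomial distribution is one way to judge how well the normal assumption fits the generated data.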
Conclusion
Comparing and analyzing the previous results, explain whether males have higher salaries
than females. Do you think that the salary values correspond to real people? Please
present arguments and evidence that support your conclusion.
Address your metacognitive thinking
• Explain why it is important to use graphs in your project.
• Reflect on the progress of your project. Is everything going according to plan, or
are you considering some changes?
Part V
This fifth assignment will assess the following student learning outcomes:
SLO 6: Describe and compute confidence intervals.
MSLO 2: Apply metacognitive skills to monitor comprehension of learning and task
completion.
MSLO 3: Apply metacognitive skills to evaluate task performance.
Overview
To determine whether the difference between the female and male salary means is
statistically significant, you should compare the confidence intervals for those
groups. If there is no overlap, the difference is significant. If the intervals
overlap, you could conclude for this project that the difference between groups is
not statistically significant; note that this overlap rule is conservative, and the
hypothesis test in Part VI revisits the question.
Confidence intervals
Complete the following tasks:
• Compute a 95% confidence interval for the female salaries (assume that female
salaries follow a Normal Distribution).
o Use the Female Salary column that you created in Part III (scaffolding
assignment 3) to identify the size, the mean, and the standard
deviation of the sample.
o Use the calculator to create the confidence interval for the female
salaries.
• Compute a 95% confidence interval for the male salaries (assume that male
salaries follow a Normal Distribution).
o Use the Male Salary column that you created in Part III (scaffolding
assignment 3) to identify the size, the mean, and the standard
deviation of the sample.
o Use the calculator to create the confidence interval for the male
salaries.
• Is there a gender pay gap? To answer this question, use a paragraph to describe
the confidence intervals computed above using complete sentences.
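The interval calculation and the overlap comparison can be sketched in Python as follows. Everything here is illustrative: the two salary lists are invented, `z_confidence_interval` and `intervals_overlap` are hypothetical helpers, and a z critical value is assumed because the project treats salaries as normal. With small samples a t critical value is the usual choice, so a calculator may give slightly wider intervals.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) for the mean, using a z critical value."""
    n = len(values)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(values) / sqrt(n)
    center = mean(values)
    return center - margin, center + margin


def intervals_overlap(a, b):
    """True when intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical salaries in dollars per year (assumptions for illustration);
# replace them with your Female Salary and Male Salary columns.
female = [52000, 61000, 58000, 49500, 73000, 66000, 80500, 57000]
male = [55000, 64000, 59000, 71000, 69500, 62000, 90000, 60500]

female_ci = z_confidence_interval(female)
male_ci = z_confidence_interval(male)

print(f"Female 95% CI: ({female_ci[0]:,.0f}, {female_ci[1]:,.0f})")
print(f"Male   95% CI: ({male_ci[0]:,.0f}, {male_ci[1]:,.0f})")
print("Intervals overlap:", intervals_overlap(female_ci, male_ci))
```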
Address your metacognitive thinking
• Why is it important to use samples for studies rather than the whole population?
Explain in writing.
• Why is it important to have a representative sample? Explain in writing.
• Is there a chance that the real data is not part of your confidence interval?
Explain in writing.
Part VI
This sixth assignment will assess the following student learning outcomes:
SLO 8: Perform hypothesis testing using statistical methods.
MSLO 2: Apply metacognitive skills to monitor comprehension of learning and task
completion.
MSLO 3: Apply metacognitive skills to evaluate task performance.
Hypothesis Test
Perform a test for the following hypotheses:
𝐻0: 𝜇𝑤 = 𝜇𝑚
𝐻𝑎: 𝜇𝑤 ≠ 𝜇𝑚
Where 𝜇𝑤 represents the mean salary for women and 𝜇𝑚 represents the mean
salary for men.
That is, you are testing if female and male salaries are different.
• Write your null and alternative hypotheses in written form. Identify the claim.
• Write the variables you will be using for the hypothesis testing. To simplify your
work, use 𝜇 = 𝜇𝑚, 𝜎 = 𝜎𝑤, 𝑥̅ = 𝜇𝑤, 𝑛 = 𝑛𝑤
• Analyze the sample data by performing the Z-test calculations that will ultimately
allow you to reject or fail to reject the null hypothesis.
• Compute the p-value. Then write down your decision and conclusion about the
Gender Pay Gap (one paragraph).
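The Z-test steps above can be sketched in Python as follows. The sketch follows the simplification given in the instructions: the male mean plays the role of 𝜇, and the female standard deviation, mean, and sample size supply 𝜎, 𝑥̅, and 𝑛. All numbers are invented placeholders, `two_sided_z_test` is a hypothetical helper, and the 0.05 significance level is an assumption, since the project text does not state one.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(x_bar, mu, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: mean = mu vs Ha: mean != mu."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    # Two-sided p-value: the area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical summary values in dollars (assumptions for illustration);
# replace them with the values from your own sample.
mu = 104_000     # male mean, used as the reference mean
x_bar = 98_000   # female sample mean
sigma = 30_000   # female standard deviation
n = 100          # number of female salaries

z, p_value, reject_h0 = two_sided_z_test(x_bar, mu, sigma, n)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Decision:", "reject H0" if reject_h0 else "fail to reject H0")
```

The decision is the statistical statement about the null hypothesis; the conclusion restates it in terms of the gender pay gap claim.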
Address your metacognitive thinking:
• Why is it important to identify which hypothesis is the claim? Explain in writing.
• What is the difference between the decision and conclusion in hypothesis
testing? Explain in writing.
• Reflect on the progress of your project. Is everything going according to plan, or
are you considering some changes?
Part VII
This seventh assignment will assess the following student learning outcomes:
SLO 7: Solve linear regression and correlation problems.
MSLO 3: Apply metacognitive skills to evaluate task performance.
In this section, you will determine the correlation between the Salary and the
Experience variables.
• Use your Excel file to construct a scatter plot of Salary versus Experience.
Place the Experience values on the x-axis and the Salary values on the y-axis.
• Visually analyze the graph to determine whether there is any
correlation between those variables.
• Find the correlation coefficient r to support your findings. Use the correlation
coefficient to identify the type of linear correlation: none, weak, moderate, or strong.
• If a linear correlation exists between the variables, state the
corresponding regression equation.
• Write a two-paragraph conclusion to describe your findings. Explain whether the
regression equation can be used to make predictions.
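The correlation coefficient and the least-squares regression line can be sketched in Python as follows. The `experience` and `salary` lists are invented for illustration, and `correlation_and_regression` is a hypothetical helper that applies the standard formulas for Pearson's r and the line y = intercept + slope · x.

```python
from math import sqrt


def correlation_and_regression(x, y):
    """Return (r, slope, intercept) for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical paired data (assumptions for illustration); replace them
# with the Experience (x) and Salary (y) columns of your sample.
experience = [1, 3, 5, 7, 9, 12, 15]
salary = [40000, 52000, 58000, 71000, 79000, 88000, 104000]

r, slope, intercept = correlation_and_regression(experience, salary)

print(f"r = {r:.3f}")
print(f"Regression equation: salary = {intercept:,.0f} + {slope:,.0f} * experience")
```

A value of r near 0 only rules out a linear pattern, so the scatter plot should still be inspected before deciding whether the regression equation is suitable for predictions.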
Address your metacognitive thinking:
• If there is no linear correlation, does it mean there is no correlation at all?
• Explain why it is important to be able to identify correlations and how we use
them.
• Now that you are almost ready to submit the project, compare your progress on
this scaffolding part to the previous submissions.
Conclusion
On the last slide of your project, reflect on your findings. Use at least three complete
paragraphs:
• What did you learn about the Gender Pay Gap?
• What did you learn about statistical tools and methods?
• Reflect on your work on your project. What did you learn about yourself? Was there
anything specific that you discovered (study habits, following the plan, etc.)?
• What about the project material; did everything go as planned?