Angeela Gauchan
Dr. Nicholas Jacob
Intro to Probability and Statistics 1223
April 4th, 2021
2010: Time spent by different sexes in Europe
Project Part I
I was looking for a data set for the project and came across
this one. I see a lot of influencers from European countries on
my social media feed (mostly Instagram), and they seem to have a
perfectly balanced lifestyle, so a dataset showing how people in
European countries realistically spend their time, broken down
by country and sex, appealed to me. The data cover how people
spend their time on activities such as paid work, housework, and
family.
The dataset is available from the following websites, which
provide both the source page and the CSV files.
https://perso.telecom-paristech.fr/eagan/class/igr204/datasets
http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=tus_00week&lang=en
Several variables can be used to interpret this dataset. The
quantitative data are the time spent (in hh:mm), the
participation time (in hours), and the participation rate.
These data will be used to compare and contrast how males and
females spend their time in various areas of their lives based
on where they live. Other data that can be used are categorical:
the activity categories, which include time spent sleeping,
eating, other unspecified personal care, employment related and
travel as part of/during main and second job, main and second
job related travel, activities related to employment, study,
school and university except homework, homework, free time
study, household and family care, food management except
dishwashing, cleaning, laundry, construction, shopping,
childcare, visiting, computing, hobbies and so on. Because we
are comparing different countries, we will also be looking at a
nominal variable.
TABLE 1.1
I can see some differences in time use between males and
females just by looking at a few sections of the results.
Compared with men, many women tend to spend more time doing
house chores. They seem to spend around the same amount of
time on childcare, however. I'd like to learn about the
correlations and disparities between them in other aspects of
their lives by looking into the specifics of the dataset.
Project Part II
Below is the updated version of the dataset that I chose, which
lists the nine most common variables.
TABLE 2.1
SEX | GEO/ACL00 | Total | Personal care | Sleep | Eating | Employment, related activities | Main and second job and related travel | Household and family care | Leisure, social and associative life | TV and video | Travel except travel related to jobs
Males | Belgium | 24:00:00 | 10:45 | 8:15 | 1:49 | 3:07 | 3:05 | 2:28 | 5:58 | 2:35 | 1:30
Males | Bulgaria | 24:00:00 | 11:54 | 9:08 | 2:07 | 3:32 | 3:27 | 2:37 | 4:46 | 2:41 | 1:07
Males | Germany (including former GDR from 1991) | 24:00:00 | 10:40 | 8:08 | 1:43 | 3:27 | 3:21 | 2:22 | 5:42 | 2:09 | 1:25
...
Females | Norway | 24:00:00 | 10:27 | 8:10 | 1:20 | 2:38 | 2:37 | 3:47 | 5:40 | 1:39 | 1:11
For the second part of the project, I examined the frequency and
relative frequency of the categories on which males and females
in the listed European countries spend the most time. I took the
average time for each category and reported how often the time
spent on that activity was less than the average.
TABLE 2.2
Time most spent (average, hh:mm) | Frequency (below-average time) | Relative frequency
Personal care (avg 10:54) | 16 | 11.8%
Sleep (avg 8:28) | 16 | 11.8%
Eating (avg 1:38) | 17 | 12.5%
Employment related activities (avg 3:16) | 13 | 8.9%
Main and second job (avg 3:14) | 13 | 8.9%
Household and family care (avg 3:12) | 14 | 10.3%
Leisure, social and associative life (avg 4:55) | 15 | 11.11%
TV and video (avg 2:07) | 15 | 11.11%
Travel (avg 1:14) | 16 | 11.85%
Total | 145 | 100%
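The relative frequencies in Table 2.2 can be recomputed with a short sketch like the one below. The counts are copied from the table; the function and variable names are illustrative. Note that the counts as listed add up to 135 rather than the stated total of 145, so the recomputed percentages need not match every entry in the table.

```python
# Below-average counts copied from Table 2.2 (one entry per activity).
counts = {
    "Personal care": 16,
    "Sleep": 16,
    "Eating": 17,
    "Employment related activities": 13,
    "Main and second job": 13,
    "Household and family care": 14,
    "Leisure, social and associative life": 15,
    "TV and video": 15,
    "Travel": 16,
}


def relative_frequencies(freqs):
    """Return each count divided by the sum of all counts."""
    total = sum(freqs.values())
    return {name: n / total for name, n in freqs.items()}


rel = relative_frequencies(counts)
total_count = sum(counts.values())

for name, share in rel.items():
    print(f"{name}: {share:.1%}")
print("Total:", total_count)
```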
According to the dataset, Europeans tend to spend the majority
of their time on personal care and sleep. What is even more
fascinating is that the majority of them work a second job and
spend an average of 6 hours 30 minutes on both jobs, which
sounds like a good reason for you and me to relocate to Europe.
Since the majority of social media influencers are freelancers,
their feeds can appear flawless. After all, their work is to be
on social media.
Next, I broke the same categories down by gender, as shown
below.
TABLE 2.3
Time most spent (average, hh:mm) | Gender: Male | Gender: Female
Personal care (avg 10:54) | 8 | 9
Sleep (avg 8:28) | 10 | 6
Eating (avg 1:38) | 8 | 8
Employment related activities (avg 3:16) | 1 | 15
Main and second job (avg 3:14) | 1 | 12
Household and family care (avg 3:12) | 14 | 0
Leisure, social and associative life (avg 4:55) | 4 | 9
TV and video (avg 2:07) | 4 | 11
Travel (avg 1:14) | 7 | 9
Since we're looking at counts below the average, it appears that
women in the listed European countries spend less time working
outside the home and less time on leisure and social activities.
In comparison to males, they are more often below the average
for travel and less often below the average for sleep. No
females fall below the average in household and family care,
implying that they spend much of their time caring for their
families. It was also interesting to learn that men and women
devote nearly equal amounts of time to personal care.
PROJECT PART III
For project part III, I created two graphs to display one of the
quantitative variables of my dataset, namely the time spent in
hours and minutes (hh:mm). I converted the times into decimal
fractions of a 24-hour day, and the box plot of the activity
with the most time spent is shown below.
Fig 3.1
Fig 3.3
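The conversion from hh:mm to decimal can be sketched as follows. The sketch assumes the decimals are fractions of a 24-hour day, which is consistent with the values in Table 5.3 (for example, 10:45 becomes 0.45); the function name is illustrative and not part of the original spreadsheet.

```python
def hhmm_to_day_fraction(text):
    """Convert an 'h:mm' or 'hh:mm' string to a fraction of a 24-hour day."""
    hours, minutes = text.split(":")
    return (int(hours) * 60 + int(minutes)) / (24 * 60)


# Personal-care times for males in Belgium and Bulgaria, from Table 2.1.
belgium = hhmm_to_day_fraction("10:45")
bulgaria = hhmm_to_day_fraction("11:54")

print(round(belgium, 2), round(bulgaria, 2))
```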
By looking at these graphs you can gather a variety of data.
According to my calculations I’ve listed the five-number
summary, mean, and standard deviation below.
Five number Summary:
Minimum: 0.42
Quartile Q1: 0.44
Median: 0.45
Quartile Q3: 0.47
Maximum: 0.50
Average (Mean): 0.45
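A five-number summary like the one above can be computed with Python's standard library. This sketch uses only the 14 male personal-care values that can be read from Table 5.3, so its quartiles need not match the summary above, which may cover more rows. The choice of the "inclusive" quartile method is an assumption; spreadsheet programs and textbooks use several slightly different conventions.

```python
import statistics

# Personal-care fractions for the 14 male rows of Table 5.3.
male_personal_care = [0.45, 0.50, 0.44, 0.44, 0.47, 0.49, 0.47,
                      0.45, 0.45, 0.45, 0.44, 0.43, 0.43, 0.42]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


summary = five_number_summary(male_personal_care)
mean = statistics.mean(male_personal_care)

print("min, Q1, median, Q3, max:", summary)
print("mean:", round(mean, 2))
```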
There are no outliers in this data set, as you can tell by
looking at fig 3.3. The distribution of the data appears skewed
to the right because the mean time is larger than the median
time. It also has no gaps, which suggests the data are
continuous.
PROJECT PART IV
I suspected that the average time spent on personal care is
greater than the time spent on any of the other categories.
Ho: The mean time spent on personal care by females = the mean
time spent on personal care by males.
Ha: The mean time spent on personal care by females ≠ the mean
time spent on personal care by males.
To check my hypothesis, all I have to do is compare the average
amount of time spent on personal care by males and females.
Although the figures do not differ greatly, table 2.3 shows
that females spend more time on personal care than males. As a
result, my hypothesis is supported. We can also see in the graph
that women in Europe tend to stay at home longer than men. This
may be one of the reasons they spend more time on themselves.
Simple formula: (f=female, m=male)
Ho: μf = μm
Ha: μf ≠ μm
For the categorical variable, I decided to look at which of the
nine chosen variables has the lowest time spent by both males
and females.
Ho: The proportion of time spent in employment related
activities is p= 0.25
Ha: The proportion of time spent in employment related
activities is p <0.25.
I chose this because the most time spent variable was used in all
of the previous projects, but I think we should also use the least
time spent variable to formulate hypotheses and draw
conclusions.
PROJECT PART V
Going back and testing the quantitative hypothesis, I created a
bootstrap sample. I found that the average amount of time spent
on personal care by males was 0.45, while the average amount of
time spent on personal care by females was 0.46, which is very
close. Similarly, I found that the standard error for the male
sample was 0.01, whereas for the female sample it was 0.004. I
then computed the 95% confidence interval for the male and
female means. For males, the interval was between 0.44 and
0.46, whereas for females the interval lies between 0.45 and
0.47. With this we reject the null hypothesis because the
estimated means are not equal to each other. Below are the two
histograms for the bootstrap distributions of the male and
female means (Table 5.1).
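A bootstrap interval for a mean can be sketched as below. The data are the 14 male personal-care fractions readable from Table 5.3. The number of resamples (5000), the fixed seed, and the percentile method for the interval are all assumptions made for illustration; the report does not say how its bootstrap samples were generated, so the numbers this sketch prints may differ from the ones quoted above.

```python
import random
import statistics

# Personal-care fractions for the 14 male rows of Table 5.3.
male_personal_care = [0.45, 0.50, 0.44, 0.44, 0.47, 0.49, 0.47,
                      0.45, 0.45, 0.45, 0.44, 0.43, 0.43, 0.42]


def bootstrap_means(values, n_resamples=5000, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    size = len(values)
    return [statistics.mean(rng.choices(values, k=size))
            for _ in range(n_resamples)]


means = bootstrap_means(male_personal_care)
se = statistics.stdev(means)          # bootstrap standard error

# Percentile interval: middle 95% of the sorted bootstrap means.
ordered = sorted(means)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]

print(f"SE = {se:.4f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

Running the same function on the female values would give the second interval, and the two intervals can then be compared side by side.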
I also tested the categorical hypothesis using bootstrapping. I
found a standard error of 0.0064. The 95% interval that I found
for the proportion of time spent on employment related
activities is between 0.1249 and 0.1453. Below is the histogram
for the bootstrap distribution (Table 5.2). Since the null
value of p = 0.25 is not within the confidence interval, we
reject the null hypothesis.
TABLE 5.1
TABLE 5.2
TABLE 5.3
SEX | GEO/ACL00 | Total | Personal care | Sleep | Eating | Employment, related activities | Main and second job and related travel | Household and family care | Leisure, social and associative life | TV and video | Travel except travel related to jobs
Males | Belgium | 24:00:00 | 0.45 | 0.34 | 0.08 | 0.13 | 0.13 | 0.10 | 0.25 | 0.11 | 0.06
Males | Bulgaria | 24:00:00 | 0.50 | 0.38 | 0.09 | 0.15 | 0.14 | 0.11 | 0.20 | 0.11 | 0.05
Males | Germany (including former GDR from 1991) | 24:00:00 | 0.44 | 0.34 | 0.07 | 0.14 | 0.14 | 0.10 | 0.24 | 0.08 | 0.06
Males | Estonia | 24:00:00 | 0.44 | 0.35 | 0.05 | 0.19 | 0.18 | 0.11 | 0.21 | 0.10 | 0.05
Males | Spain | 24:00:00 | 0.47 | 0.36 | 0.07 | 0.18 | 0.18 | 0.07 | 0.22 | 0.08 | 0.05
Males | France | 24:00:00 | 0.49 | 0.36 | 0.10 | 0.16 | 0.16 | 0.10 | 0.20 | 0.09 | 0.04
Males | Italy | 24:00:00 | 0.47 | 0.35 | 0.08 | 0.18 | 0.17 | 0.07 | 0.21 | 0.08 | 0.07
Males | Latvia | 24:00:00 | 0.45 | 0.36 | 0.06 | 0.21 | 0.20 | 0.08 | 0.20 | 0.10 | 0.06
Males | Lithuania | 24:00:00 | 0.45 | 0.35 | 0.06 | 0.20 | 0.20 | 0.09 | 0.20 | 0.11 | 0.05
Males | Poland | 24:00:00 | 0.45 | 0.35 | 0.06 | 0.17 | 0.17 | 0.10 | 0.22 | 0.11 | 0.05
Males | Slovenia | 24:00:00 | 0.44 | 0.35 | 0.06 | 0.16 | 0.16 | 0.11 | 0.23 | 0.09 | 0.05
Males | Finland | 24:00:00 | 0.43 | 0.35 | 0.06 | 0.16 | 0.16 | 0.09 | 0.25 | 0.10 | 0.05
Males | United Kingdom | 24:00:00 | 0.43 | 0.35 | 0.06 | 0.17 | 0.17 | 0.10 | 0.22 | 0.11 | 0.06
Males | Norway | 24:00:00 | 0.42 | 0.33 | 0.06 | 0.17 | 0.17 | 0.10 | 0.24 | 0.09 | 0.06
Females | Belgium | 24:00:00 | 0.47 | 0.36 | 0.08 | 0.08 | 0.08 | 0.17 | 0.21 | 0.09 | 0.06
Females | Bulgaria | 24:00:00 | 0.48 | 0.38 | 0.08 | 0.11 | 0.11 | 0.21 | 0.16 | 0.09 | 0.04
Females | G...
2. Use the two-way table from part 2 to complete the following
(a) Create at least 2 conditional probabilities from your
two-way table. Interpret their meanings and explain how they
were computed. Include the following formula for conditional
probability
P(A|B) = P(A∩B) / P(B)
3. Add to your report!
(a) Include all items requested above. Include text and graphics
describing the processes you have
completed.
The eighth and final report submission will be graded by the
following criteria:
• Continuation - 10 points. The report includes the material from the previous reports and any errors that interfered with the cohesion of the report have been fixed.
• Statistical analysis - 10 points. The statistical tests are all provided and typeset correctly.
• Formulas - 10 points. The appropriate formula is used and included in report correctly typeset.
• Interpretations - 10 points. The results of the statistical analysis are clearly explained and interpreted in the context of the problem. The conclusions accurately reflect the analysis and are well supported.
• Writing quality - 10 points. The paper is readable and clearly written. There are few, if any, grammatical or spelling errors and they do not interfere with the clarity of the paper.
If you have any questions about this assignment feel free to
email me or come by my office.
Math 1223 Project Part 7: Quantitative Inference with Formulas
Please use the data you originally collected for part 1. You will
add these new parts to report part 2, 3,
4, 5, and 6.
1. For this project you must find some published or existing
data. Possible sources include: almanacs,
magazines and journal articles, textbooks, web resources,
athletic teams, newspapers, professors with
experimental data, campus organizations, electronic data
repositories, etc. Your dataset must have
at least 25 cases, two categorical variables and two quantitative
variables. It is also recommended that you are interested in the
material included in the dataset.
2. Use the techniques of the text to repeat your hypothesis test.
(a) Repeat your hypothesis test on the quantitative variable
utilizing the appropriate formulas for
your situation. Compute 95% confidence interval and compare
to results from bootstrapping.
3. Add to your report!
(a) Include all items requested above. Include text and graphics
describing the processes you have
completed.
The seventh report submission will be graded by the following
criteria:
• Continuation - 10 points. The report includes the material from the previous reports and any errors that interfered with the cohesion of the report have been fixed.
• Statistical analysis - 10 points. The statistical tests are all provided and typeset correctly.
• Formulas - 10 points. The appropriate formula is used and included in report correctly typeset.
• Interpretations - 10 points. The results of the statistical analysis are clearly explained and interpreted in the context of the problem. The conclusions accurately reflect the analysis and are well supported.
• Writing quality - 10 points. The paper is readable and clearly written. There are few, if any, grammatical or spelling errors and they do not interfere with the clarity of the paper.
If you have any questions about this assignment feel free to
email me or come by my office.
Math 1223 Project Part 6: Categorical Inference with Formulas
Please use the data you originally collected for part 1. You will
add these new parts to report part 2, 3,
4, and 5.
1. For this project you must find some published or existing
data. Possible sources include: almanacs,
magazines and journal articles, textbooks, web resources,
athletic teams, newspapers, professors with
experimental data, campus organizations, electronic data
repositories, etc. Your dataset must have
at least 25 cases, two categorical variables and two quantitative
variables. It is also recommended that you are interested in the
material included in the dataset.
2. Use the techniques of the text to repeat your hypothesis test.
(a) Repeat your hypothesis test on the categorical variable
utilizing the appropriate formulas for your
situation. Compute 95% confidence interval and compare to
results from bootstrapping.
3. Add to your report!
(a) Include all items requested above. Include text and graphics
describing the processes you have
completed.
The sixth report submission will be graded by the following
criteria:
• Continuation - 10 points. The report includes the material from the previous reports and any errors that interfered with the cohesion of the report have been fixed.
• Statistical analysis - 10 points. The statistical tests are all provided and typeset correctly.
• Formulas - 10 points. The appropriate formula is used and included in report correctly typeset.
• Interpretations - 10 points. The results of the statistical analysis are clearly explained and interpreted in the context of the problem. The conclusions accurately reflect the analysis and are well supported.
• Writing quality - 10 points. The paper is readable and clearly written. There are few, if any, grammatical or spelling errors and they do not interfere with the clarity of the paper.
If you have any questions about this assignment feel free to
email me or come by my office.
Mass Shootings from 1982 – 2019
Cougan Collins
Class: SU20-INTRO TO PROBABILITY AND STATS_01
Instructor: Nicholas Jacob
I will examine the data on mass shootings that occurred from
1982 to 2019. The data can be found
here https://www2.stetson.edu/~jrasp/data.htm. The original
data set was not arranged by year, so I sorted the mass
shootings into chronological order to make it easier to read.
There was some additional data that was repetitive and not
useful for my analysis, so I removed it.
The categorical variables are the case, location (state/city),
date, summary of the shooting, location (where it happened),
prior signs of mental health issues, mental health details,
whether the weapons were obtained legally, weapon type, race,
gender, and type of shooting. The quantitative variables are
the number of fatalities, the age of the shooter, the number of
injured, and the total victims.
Below is an image of four cases of my data set.
What I am most interested in learning from the mass shootings
that took place during this timeframe is how many of the
shooters had prior signs of mental illness. If the data show
that most of these shooters had some kind of mental illness
beforehand, it would suggest that we need to pay closer
attention to those who have a mental illness, especially if
they show any signs of hostility towards classmates or fellow
workers. If mental health problems are one of the main warning
signs for a mass shooter, then we could develop better programs
that would give these individuals the tools they need to avoid
becoming the next mass shooter.
I chose to make two tables that show the frequency and relative
frequency of the mass shooters who had prior signs of mental
illness and who obtained their guns legally. The number of
shooters who had prior signs of mental illness is much higher
than the number who did not. Since my table includes forty-one
shooters whose prior mental condition is unclear, it makes the
relative frequency of those who had prior mental illness appear
to be lower. If we removed the unclear cases, the relative
frequency would be about 80%, which indicates that mental
illness is a key factor among those who commit a mass shooting.
As you look at the second table, about 71% of the weapons they
used were obtained legally. If you removed the unclear
accounts, about 84% of the guns were obtained legally.
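The effect of dropping the "Unclear" responses can be checked with a short sketch. The counts are copied from the two frequency tables in this report; the function name is illustrative. With the unclear cases removed, the share with prior signs of mental illness is 59/76, roughly 77.6%, which the text rounds to about 80%.

```python
# Counts copied from the two frequency tables in this report.
mental_illness = {"Yes": 59, "No": 17, "Unclear": 41}
weapons_legal = {"Yes": 83, "No": 16, "Unclear": 18}


def share_of_yes(table, drop_unclear=False):
    """Relative frequency of 'Yes', optionally ignoring 'Unclear' cases."""
    counts = {answer: n for answer, n in table.items()
              if not (drop_unclear and answer == "Unclear")}
    return counts["Yes"] / sum(counts.values())


for label, table in [("Prior signs of mental illness", mental_illness),
                     ("Weapons obtained legally", weapons_legal)]:
    print(label,
          f"all cases: {share_of_yes(table):.1%},",
          f"without unclear: {share_of_yes(table, drop_unclear=True):.1%}")
```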
Our third table, which is a two-way table, shows a correlation
between having a mental illness prior to the shooting and
having guns legally. This correlation would suggest that if we
want to help reduce the number of mass shootings, then we need
to have stricter gun rules for those with mental illnesses.
However, we need to keep in mind that having a mental illness
doesn’t mean a person will become a mass shooter or harm
anyone. So, creating restrictions would need to be done fairly.
Perhaps further study might show that certain kinds of mental
illnesses might be correlated with mass shootings.
We will never be able to stop all mass shootings, but the data
I have compiled indicate that we could decrease the number of
shootings by having more substantial restrictions on owning
guns for people who have mental illnesses. If parents own guns
legally and have no mental illness but their child is mentally
ill, then they should secure their weapons so the child has no
access to them.
Four cases from the data set:
case: Welding shop shooting
location (state/city): Miami, Florida
date: 8/20/1982
summary: Junior high school teacher Carl Robert Brown, 51, opened fire inside a welding shop and was later shot dead by a witness as he fled the scene.
fatalities: 8
injured: 3
total_victims: 11
location (where it happened): Other
age_of_shooter: 51
prior_signs_mental_health_issues: Yes
mental_health_details: His second wife left him because he refused to seek psychological help. He had become increasingly isolated. One former student said he was "off his rocker."
weapons_obtained_legally: Yes
weapon_type: One shotgun
race: white
gender: Male
type: Mass

case: Dallas nightclub shooting
location (state/city): Dallas, Texas
date: 6/29/1984
summary: Abdelkrim Belachheb, 39, opened fire at an upscale nightclub after a woman rejected his advances. He was later arrested.
fatalities: 6
injured: 1
total_victims: 7
location (where it happened): Other
age_of_shooter: 39
prior_signs_mental_health_issues: Yes
mental_health_details: During his last meal with his wife, he confessed he was depressed and had visited psychiatric hospitals in Belgium.
weapons_obtained_legally: No
weapon_type: One semiautomatic handgun
race: white
gender: Male
type: Mass

case: San Ysidro McDonald's massacre
location (state/city): San Ysidro, California
date: 7/18/1984
summary: James Oliver Huberty, 41, opened fire in a McDonald's restaurant before he was shot dead by a police officer.
fatalities: 22
injured: 19
total_victims: 41
location (where it happened): Other
age_of_shooter: 41
prior_signs_mental_health_issues: Yes
mental_health_details: The day before the shooting, he tried to make an appointment at a mental health clinic.
weapons_obtained_legally: Yes
weapon_type: One semiautomatic handgun, one rifle (assault), one shotgun
race: white
gender: Male
type: Mass

case: United States Postal Service shooting
location (state/city): Edmond, Oklahoma
date: 8/20/1986
summary: Postal worker Patrick Sherrill, 44, opened fire at a post office before committing suicide.
fatalities: 15
injured: 6
total_victims: 21
location (where it happened): Workplace
age_of_shooter: 44
prior_signs_mental_health_issues: Unclear
mental_health_details: He was worried he had inherited mental problems and rebuffed a pastor's suggestion he seek psychiatric counseling. His family members denied he had a history of mental illness.
weapons_obtained_legally: Yes
weapon_type: Three semiautomatic handguns
race: white
gender: Male
type: Mass
Prior signs of mental illness
Response Frequency Relative Frequency
Yes 59 0.504273504
No 17 0.145299145
Unclear 41 0.35042735
Total 117 1
Weapons obtained legally
Response Frequency Relative Frequency
Yes 83 0.709401709
No 16 0.136752137
Unclear 18 0.153846154
Total 117 1
Two-way table
 | Weapons obtained legally | Weapons obtained illegally | Weapons obtained unclear
Prior signs of mental illness | 46 | 8 | 5
No prior signs of mental illness | 12 | 4 | 0
Unclear prior signs of mental illness | 25 | 3 | 14
Next, I will examine the statistical summary of the total number
of victims in these mass shootings, and I will provide a
histogram and a box plot. I also included a close-up of the main
data of the box plot to make it easier to see. The charts show
that the distribution is skewed to the right, as the mean is
greater than the median. Also, you will notice in the box plot
that there are several outliers ranging from 36 to 604. If I
removed the extreme outlier (604), the mean would change to
15.40517241, which shows how outliers can inflate the average.
As you can see from the histogram, the majority of mass
shootings have between three and thirteen victims injured or
killed. We can conclude that a typical mass shooting will have
between three and thirteen victims who are injured or killed.
Total victims - Summary
Mean 20.43589744
Median 11
Standard Dev. 56.27184141
Variance 3190.248011
Q1 7
Q2 11
Q3 18
IQR 11
Min 3
Max 604
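The figures in this summary can be cross-checked with a little arithmetic. The sketch below uses the quartiles from the summary and the overall total of 2391 victims (948 fatalities plus 1443 injuries, as reported later in this project). The 1.5 × IQR rule for flagging outliers is an assumption; the report does not say which rule the box plot used, although the resulting cutoff of 34.5 is consistent with outliers starting at 36.

```python
# Values taken from the summary table and the later totals in this report.
q1, q3 = 7, 18
n_cases = 117
total_victims = 2391      # 948 fatalities + 1443 injuries
largest = 604             # the extreme outlier

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr          # assumed 1.5 * IQR outlier rule

mean_all = total_victims / n_cases
mean_without_largest = (total_victims - largest) / (n_cases - 1)

print("IQR:", iqr, "upper fence:", upper_fence)
print("mean with all cases:", round(mean_all, 8))
print("mean without the largest case:", round(mean_without_largest, 8))
```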
Hypothesis test for a quantitative variable
I suspect that if we examine all the mass shootings in my chart,
we will discover that there are more injuries than deaths.
Ho: The mean of the injuries will be greater than the mean of
the deaths in the mass shootings in my report.
Ha: The mean of the injuries will be less than the mean of the
deaths in the mass shootings in my report.
It will be easy to determine whether my hypothesis is supported
because all I have to do is compare the mean of the deaths in
the chart to the mean of the injuries. If my hypothesis is
supported, the numbers will show it, and they do. The mean of
deaths is 8.10, and the mean of injuries is 12.33. Therefore, my
hypothesis is supported. I will add that I based my hypothesis
on the fact that when a mass shooting happens, people tend to
run as fast as they can, which puts them at a higher risk of
injuring themselves. Also, as the shooter shoots into a crowd,
he or she is not going to have a kill shot every time, which
means that there will be more people injured from the stray
bullets as well.
Simple formula: (i = injuries, d = deaths)
Ho: μi = μd
Ha: μi < μd
Hypothesis test for a categorical variable
I suspect that the majority of mass shooters have some form of
mental illness and showed signs of mental illness before they
committed the mass shooting.
Ho: The frequency of mass shooters who had prior signs of
mental illness is equal to the frequency of mass shooters who
had no prior signs of mental illness.
Ha: The frequency of mass shooters who had prior signs of
mental illness is greater than the frequency of mass shooters
who had no prior signs of mental illness.
The main reason I suspect the majority of mass shooters have
some form of mental illness is that it is hard to imagine a
person in his or her right mind deciding to kill as many people
as he or she can. We can easily determine whether the
hypothesis is supported by looking at the frequency table
below. It clearly supports my hypothesis because 59 cases had
prior signs of mental illness, and 17 did not.
Prior signs of mental illness
Response Frequency Relative Frequency
Yes 59 0.504273504
No 17 0.145299145
Unclear 41 0.35042735
Total 117 1
Simple formula:
Ho: p Mental Illness=1/2
Ha: p Mental Illness >1/2
Bootstrap test for my categorical variable
To test my hypothesis further, I used bootstrapping with 52
generated samples to determine whether my hypothesis should be
accepted or rejected. My bootstrap mean was 0.778846154, and the
standard error was 0.042. Computing the 95% confidence interval
with a z score of 1.96, I found the range to be 0.6930511954 to
0.864641114, which means that I can be 95% confident that the
majority of mass shooters had some previous signs of mental
illness. Since the whole interval lies above 1/2, the bootstrap
supports my hypothesis.
Bootstrap test for my quantitative variable
To test my hypothesis further, I used bootstrapping with 52
generated samples for the injured and the fatalities to
determine whether my hypothesis should be accepted or rejected.
My bootstrap mean for the fatalities is 8.010804709, and my
bootstrap mean for injuries is 11.43202709. The standard error
is 0.728691332 for fatalities and 4.73563476 for injuries.
If my hypothesis were being tested by the bootstrap means
alone, I would have failed to reject my null hypothesis because
the number of injuries is greater than the number of deaths.
However, based on the 95% confidence intervals, I must reject
the hypothesis because, comparing the ranges for fatalities and
injuries, it is possible for the deaths to be greater than the
injuries, since the range for the injury numbers is much wider.
As you can see in the charts below, the fatalities range between
6.553422046 and 9.468187372. The injuries range from
1.960757572 to 20.90329661.
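The two ranges quoted above are consistent with the rule "mean plus or minus two standard errors", and the sketch below reproduces them from the bootstrap means and standard errors given in the text. The function name is illustrative, and treating overlapping intervals as a sign that the difference is not established is a rough check rather than a formal test.

```python
def two_se_interval(mean, se):
    """Approximate 95% interval: mean minus and plus two standard errors."""
    return mean - 2 * se, mean + 2 * se


# Bootstrap means and standard errors quoted in the report.
fatal_low, fatal_high = two_se_interval(8.010804709, 0.728691332)
inj_low, inj_high = two_se_interval(11.43202709, 4.73563476)

# The intervals overlap if each one starts before the other ends.
overlap = inj_low <= fatal_high and fatal_low <= inj_high

print(f"fatalities: {fatal_low:.6f} to {fatal_high:.6f}")
print(f"injuries:   {inj_low:.6f} to {inj_high:.6f}")
print("intervals overlap:", overlap)
```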
Next, I will use the formulas we learned in Module 6 to test my
hypothesis again. Let’s begin with our categorical data, which
records whether the shooter had prior signs of mental illness.
Simple formula:
Ho: p Mental Illness=1/2
Ha: p Mental Illness >1/2
There are 76 shooters with a clear Yes or No record, with 59 of
them having prior signs of mental illness. I suspect that more
than half of the shooters had prior signs of mental illness. I
will use an alpha of 0.05. My sample size is 76, the null
proportion is 1/2, and my statistic is 0.776315789. The SE is
0.057353933, Z = 4.817730411, and Z* = 1.644853627. Using
StatKey, my 95% CI ranges from 0.684 to 0.868. This is almost
identical to my initial bootstrap test above, which once again
offers support for my hypothesis. Because Z is greater than Z*,
I reject the null hypothesis in favor of the alternative.
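The one-proportion z calculation described above can be reproduced with the standard library. This is a sketch using the counts from the text (59 of 76) and the standard error under the null proportion, SE = sqrt(p(1 − p)/n); the variable names are illustrative. A z value above the critical value z* falls in the rejection region of this one-sided test.

```python
import math
from statistics import NormalDist

# Values from the report: 59 of 76 shooters with a clear record,
# null proportion 1/2, significance level 0.05.
n, successes, p0, alpha = 76, 59, 0.5, 0.05

p_hat = successes / n                      # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)          # SE under the null hypothesis
z = (p_hat - p0) / se                      # test statistic
z_star = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
p_value = 1 - NormalDist().cdf(z)          # upper-tail p-value

print(f"p_hat = {p_hat:.9f}, SE = {se:.9f}")
print(f"z = {z:.9f}, z* = {z_star:.9f}, p-value = {p_value:.2e}")
```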
Next, let’s examine the quantitative data using the new formula
from Module 6. I am very interested in how this will compare to
the previous module.
Simple formula: (i = injuries, d = deaths)
Ho: μi = μd
Ha: μi < μd
Excel values for the categorical test:
Quantity | Symbol | Value
Sample size | n | 76
Proportion | p | 0.5
Statistic | p hat | 0.776315789
Alpha | alpha | 0.05
z* | | 1.644853627
z | (phat-p)/SE | 4.817730411
SE | sqrt(p*(1-p)/n) | 0.057353933
We have a sample size of 2391. Out of the 2391, there were 948
fatalities and 1443 injuries. The SE = 0.014460896, P1-P2 =
-0.20702635, Z = -14.31628741, and P = 8.6564E-47. I have
included a snapshot of the Excel numbers. However, when I put
this problem into StatKey, you can see from the chart below
that there is only a .013 chance that the deaths and injuries
will be different from one another. From the Excel nonpooled
numbers, the 95% CI for the difference ranges from -0.234755148
to -0.17929755. Though this formula gave me different numbers
from the previous example above, I would still have to reject
the null hypothesis. Also, my instructor told me to include the
following CI numbers, which range from -0.027728799 to
0.027728799.
In this section, I am only retesting the quantitative part of my
hypothesis with a t-test. Using the formula we learned this
week, I came up with the following numbers.
 | Fatalities | Injuries | Combined
n | 2391 | 2391 | 4782
Count | 948 | 1443 | 2391
Proportion | 0.396486826 (p1) | 0.603513174 (p2) | 0.5 (pooled proportion)

Pooled SE | 0.014460896
p1-p2 | -0.20702635
z | -14.31628741
p | 8.6564E-47
tail | 1
two times | 1.73E-46
Not pooled SE | 0.014147606
z* | 1.959963985
95% CI | -0.234755148 to -0.17929755
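The Excel numbers above follow from the usual two-proportion formulas, and the sketch below reproduces that arithmetic from the counts in the report. It uses the pooled standard error for the z statistic and the non-pooled standard error for the confidence interval; the variable names are illustrative. The sketch only repeats the calculation and does not judge whether a two-proportion test suits these data.

```python
import math
from statistics import NormalDist

# Counts from the report: 948 fatalities and 1443 injuries out of 2391.
n1 = n2 = 2391
x1, x2 = 948, 1443

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Pooled SE (used for the test statistic under Ho: p1 = p2).
pooled = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = diff / se_pooled

# Non-pooled SE (used for the 95% confidence interval).
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * se_unpooled
ci = (diff - margin, diff + margin)

print(f"p1 - p2 = {diff:.8f}, pooled SE = {se_pooled:.9f}, z = {z:.8f}")
print(f"non-pooled SE = {se_unpooled:.9f}, margin = {margin:.9f}")
print(f"95% CI: {ci[0]:.9f} to {ci[1]:.9f}")
```

The margin of error printed here, about 0.0277, matches the plus-or-minus figure quoted in the text.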
Module 7
Average | 4.230769231
Standard Dev. | 45.9914506
SE | 0.940561764
Tstat | 4.498130153
Positive | 0.034017483
CL (Lower, Upper) | 4.198773687, 4.262764774
If you compare these numbers to my previous test, they are
quite different. With such a high confidence level, I will
accept my null hypothesis. I am amazed at how different each
statistical test is. My question would be, which is the most
reliable method?
For the last part of our project, I will provide two conditional
probabilities from a modified version of my two-way table from
earlier in this project.
What proportion of shooters had a prior sign of mental illness?
This is determined by dividing 39 by 54, which equals 0.72. In
other words, 72% had prior signs of mental illness.
What proportion of shooters had no prior signs of mental illness
and obtained weapons illegally? This is determined by dividing 4
by 54, which equals 0.074. In other words, 7.4% had no prior
signs of mental illness and had obtained weapons illegally.
Please refer to the following formula that is to be included in
this final project.
I have enjoyed examining the statistics of my project. I am
thankful for the opportunity to have had a glimpse into what
goes into looking at projects like mine.
You can view my edited Excel file from this link
https://www.dropbox.com/s/3xezqdqoth3v17q/Mother%20Jones%20-%20Mass%20Shootings%20Database%5EJ%201982%20-%202019%20revised.xlsx?dl=0
Two-way table
 | Weapons obtained legally | Weapons obtained illegally | Total
Prior signs of mental illness | 39 | 0 | 39
No prior signs of mental illness | 11 | 4 | 15
Total | 50 | 4 | 54
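The conditional probability formula P(A|B) = P(A∩B) / P(B) can be applied to this two-way table as sketched below. The counts come from the table; the helper names are illustrative. Dividing by the grand total of 54, as in 39/54 and 4/54, gives a marginal and a joint proportion; a conditional probability divides by the total of the row being conditioned on instead.

```python
from fractions import Fraction

# Counts from the final two-way table
# (rows: prior signs; columns: how the weapons were obtained).
table = {
    ("prior signs", "legal"): 39,
    ("prior signs", "illegal"): 0,
    ("no prior signs", "legal"): 11,
    ("no prior signs", "illegal"): 4,
}
total = sum(table.values())


def joint(row, col):
    """P(row and col): the cell count over the grand total."""
    return Fraction(table[(row, col)], total)


def marginal_row(row):
    """P(row): the sum of the joint probabilities across that row."""
    return sum(joint(r, c) for (r, c) in table if r == row)


def conditional_col_given_row(col, row):
    """P(col | row) = P(row and col) / P(row)."""
    return joint(row, col) / marginal_row(row)


print("P(prior signs) =", marginal_row("prior signs"))
print("P(no prior signs and illegal) =", joint("no prior signs", "illegal"))
print("P(illegal | no prior signs) =",
      conditional_col_given_row("illegal", "no prior signs"))
print("P(legal | prior signs) =",
      conditional_col_given_row("legal", "prior signs"))
```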