What to submit: Please submit a single Word file
containing your numerical results, comments and graphics (if
any) for all questions. Also submit the worksheets (if any) you
used to produce the report - a total of two separate files
(teaching assistants will review the worksheets in the event of
errors in the report).
We have been tossing coins (or letting a computer toss them for
us) to see what happens. One thing we learned is that we can get
a slightly different answer each time we do it. There are cases
in which it is possible to get an exact answer. This Assignment
explores that. You can read the Exhaustion Method to see how
you could generate a list of all possible outcomes for a coin
tossing experiment. We suggest you use it to list the outcomes
for seven tosses. We hope this will help you see that such a list
is not too difficult to make if you go at it in a logical and
organized way. However, you do not need to follow the process
to do this Assignment. Please use the list of all 128 outcomes to
answer the questions 8-10 in the assignment and to check the
list you made.
Statistics 1 Assignment 2 (40 points)
Q.1 (2 pts) In a survey of engineers at a hard drive
manufacturer it was found that 18% were female, 7% were
black, 35% had degrees in electrical or computer engineering,
and 40% were under the age of 35. Would it make sense to
present this information in a pie chart? Why or why not?
Q.2 (2 pts)
In basketball, some fouls result in "free throws" (unimpeded
shots) by the player fouled. Over his career, a basketball player
has scored on 1210 free throw attempts and missed 214 free
throw attempts. What is his estimated probability of
successfully scoring on a free throw attempt?
Q.3 (2 pts)
A political commentator makes the following observation in
1991: "From 1973 to 1982, the US economy grew at an annual
rate of only 2%. From 1983 to 1990, the growth rate doubled to
4%. That's a big difference." Review the spreadsheet on GDP
that was presented in this chapter and critique this statement,
especially with respect to choice of comparison periods.
Q.4 (2 pts)
Consider the following data on the median home value in
Boston neighborhoods (from the mid 20th century):
22
13.1
17.8
20.3
15.4
11.7
25.3
15.2
27.1
23.2
23.1
18.1
32.9
20.3
21.1
21.1
19.9
23.1
16.1
10.4
Find the standard normal score for the first value (22). (For
purposes of calculating the standard deviation, you can consider
this either as the entire population or as a sample.)
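As a rough sketch of the calculation, the standard score is the value minus the mean, divided by the standard deviation. The helper name `z_score` and its `population` switch are our own illustrative choices, not part of the course spreadsheets; the data are copied from the list above.

```python
import statistics

# Median home values copied from the list in Q.4.
values = [22, 13.1, 17.8, 20.3, 15.4, 11.7, 25.3, 15.2, 27.1, 23.2,
          23.1, 18.1, 32.9, 20.3, 21.1, 21.1, 19.9, 23.1, 16.1, 10.4]

def z_score(x, data, population=True):
    """Standard score of x relative to data (hypothetical helper).

    population=True divides by n when computing the SD,
    population=False divides by n - 1 (sample SD).
    """
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return (x - mean) / sd

z_pop = z_score(values[0], values, population=True)
z_samp = z_score(values[0], values, population=False)
print(round(z_pop, 3), round(z_samp, 3))
```

The two answers differ only slightly because the two standard deviations differ by a factor of the square root of 20/19.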
Q.5 (3 pts)
Evidence has been produced that famous people are less likely
to die in the month of their birthday than in other months. The
(skeptical) hypothesis is that dying is equally likely in any
month regardless of birthday.
Now suppose that out of 120 celebrity deaths, only 7 occurred
in the month of their birthday.
Imagine a hat with 12 cards, each card a month, as well as a list
of the 120 celebrity birthdays. We shuffle and pick a card,
noting whether it matched the first celebrity birth month. We
then repeat this (replacing the card each time, of course), each
time noting whether the month picked from the hat matched the
next birth month, etc., until we have gone all the way through
the 120 names on the list.
Then we repeat this procedure 100 times, each time recording
how many matches we got between the 120 picks from the hat,
and the list of 120 birthdays. We got the following frequency
distribution. What is your conclusion and why?
Number dying in birthday month    Frequency
6                                 1
7                                 3
8                                 9
9                                 20
10                                32
11                                25
12                                7
13                                1
14                                2
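The hat procedure described above could be imitated on a computer roughly as follows. This is only a sketch under one assumption: since each card is replaced, every pick matches the next birth month with probability 1/12, so we do not need the actual list of birthdays. The function names and the seed are ours.

```python
import random
from collections import Counter

def count_matches(rng, n_people=120, n_months=12):
    """One pass through the list of 120 names.

    Assumption: each pick from the hat matches the birth month
    with probability 1/12, independently of the other picks.
    """
    return sum(rng.randrange(n_months) == 0 for _ in range(n_people))

def simulate(n_trials=100, seed=None):
    """Repeat the whole procedure n_trials times and tabulate the match counts."""
    rng = random.Random(seed)
    return Counter(count_matches(rng) for _ in range(n_trials))

freq = simulate(n_trials=100, seed=1)
for k in sorted(freq):
    print(k, freq[k])

# Proportion of trials with 7 or fewer matches (7 is the observed count).
p_est = sum(c for k, c in freq.items() if k <= 7) / sum(freq.values())
print(p_est)
```

Your table will differ from the one printed in the question, since each run of 100 trials gives slightly different frequencies.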
Q.6 (4 pts) With the CBC simulation that you already ran, run it
again nine more times and report the ten p-values you obtain.
So, you will (1) toss a coin ten times, (2) repeat step 1 a
thousand times, recording what proportion of the 1000 got 7 or
more heads, and then (3) do steps 1 and 2 nine more times
for a total of 100,000 tosses. NOTE: here is a <link> that will
lead you to an Excel spreadsheet already set up to do this and
another Excel for Windows spreadsheet using Box Sampler (you
can download a macro-enabled workbook or you can install Box
Sampler on your Windows computer). So all you really need to
do is press a key or click on a couple menu items ten times and
write down what you get. Then make a nice statistical summary
of the results and estimate the true p-value. Also give an
estimate of how far off that value might be from the true value.
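If you prefer a programming language to the spreadsheets, the three steps might look like the sketch below. It is not the spreadsheet's own method; the function name, the seed, and the use of the standard error of the ten estimates as a measure of "how far off" are our assumptions.

```python
import random
import statistics

def p_value_estimate(rng, n_tosses=10, threshold=7, n_reps=1000):
    """Steps 1 and 2: proportion of n_reps experiments with threshold or more heads."""
    hits = 0
    for _ in range(n_reps):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= threshold:
            hits += 1
    return hits / n_reps

rng = random.Random(2024)  # arbitrary seed so the run can be repeated
estimates = [p_value_estimate(rng) for _ in range(10)]  # step 3: ten p-values

mean_p = statistics.fmean(estimates)
# One possible measure of uncertainty: the standard error of the mean of the ten estimates.
std_err = statistics.stdev(estimates) / len(estimates) ** 0.5
print(estimates)
print(round(mean_p, 4), round(std_err, 4))
```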
Q.7 (5 pts) This exercise continues our work with the CBC
story. In the text we remarked that cutting the number of major
medical errors in half would have been more impressive if the
number of errors had been larger. Redo question 6, but this time
imagine we had 20 major medical errors to assign to years. If
you use one of the spreadsheets we provided, you will need to
make at least these changes:
a. Change the number of tosses from 10 to 20.
b. The formula that counts how many times 2008 came up will
have to be changed to point to a range of 20 numbers rather than
10.
c. The table where you record the frequency distribution of the
outcomes will have to expand to have 21 rows instead of 11.
(Make good use of cut-and-paste here.)
d. Cutting the errors in half will now mean 14 or more errors in
2008 so you will have to change what you count in the
frequency table when you compute the p-value.
Report a frequency table of outcomes and a p-value. Compare
the p-value to what you got with just 10 medical errors.
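Changes a through d can be mirrored in code. The sketch below assumes the same fair-coin model as before, with 20 tosses, a 21-row frequency table, and a p-value counted from 14 upward; the names and the seed are illustrative only.

```python
import random
from collections import Counter

def frequency_table(n_tosses=20, n_reps=1000, seed=None):
    """Frequency of each head count 0..n_tosses over n_reps experiments."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(n_tosses))
        for _ in range(n_reps)
    )
    # One row per possible outcome, including outcomes that never occurred.
    return [counts.get(k, 0) for k in range(n_tosses + 1)]

table = frequency_table(seed=7)
for k, f in enumerate(table):
    print(k, f)

# p-value: proportion of experiments with 14 or more heads.
p_value = sum(table[14:]) / sum(table)
print(p_value)
```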
Q.8 (3 Points) Use the list of 128 outcomes for seven tosses of a
coin (the link is in the assignment introduction) to make a table
for the frequency and probability distribution of the random
variable "number of heads in seven tosses". (You can do this
simply by counting.) There should be eight possible values, and
each requires a probability. What do your probabilities add up
to?
Q.9 (2 Points) Use the table you made above to compute the
probability that the number of heads will be at least double the
number of tails (this translates to "five or more heads").
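The counting in Q.8 and Q.9 can also be checked by machine. This sketch builds its own list of outcomes with `itertools.product` rather than reading the linked list, so the order of outcomes is different, but the set of 128 outcomes is the same. Exact fractions are used so the probabilities can be added without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**7 = 128 sequences of H and T.
outcomes = ["".join(t) for t in product("HT", repeat=7)]

# Frequency distribution of the number of heads.
freq = Counter(o.count("H") for o in outcomes)

# Probability distribution: frequency divided by the number of outcomes.
prob = {k: Fraction(freq[k], len(outcomes)) for k in sorted(freq)}
for k, p in prob.items():
    print(k, freq[k], p)

print("total:", sum(prob.values()))

# Q.9: probability of five or more heads.
p_five_or_more = sum(p for k, p in prob.items() if k >= 5)
print(p_five_or_more)
```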
Q.10 (8 Points Total) Suppose you bought some really cheap
blank DVDs at the dollar store. Then you look them up on the
web and find that half these disks are dead on arrival and when
data are recorded on the remainder, about half of those become
unreadable within the first year, half of the survivors die in the
second year, etc. Let's see what happens over seven years. We
can model the distribution of "time before failure" with a coin
toss. Make a table for the probability distribution of the number
of tosses before you got a head (=failure). For example this
random variable assigns 3 to TTTHTHT and 0 to HTHTHTH,
meaning one disk lasted three years and another was dead on
arrival. (Make sure you get the right counts for these two
examples before you continue.) If you never get a head
(TTTTTTT), assign the value 7.
Use the "list of 128 outcomes" linked in the instructions for this
homework to answer the following two questions.
a. (5 points) What is the probability that a disk will last 6 years
(i.e. 7 tosses of the coin) before failing?
b. (3 points) Often of interest in such situations is the mean
time before failure (MTBF). This is a common spec for
computer hard drives. Find this for the distribution above.
Note: Could you use a simulation to solve these problems? Yes
and a link will be included to a Box Sampler solution in the
model answer. However, the setup is a bit involved and we are
not asking you to do a simulation for this problem. This is the
type of simulation more suited to a programming language than
a statistical analysis package.
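No simulation is needed for the exact answer: the same kind of counting works. The sketch below is one way to tabulate "tosses before the first head" over all 128 outcomes; the function name is ours, and the list is generated in code rather than read from the linked file.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def tosses_before_first_head(outcome):
    """Number of tails before the first H; 7 (the full length) if there is no H."""
    idx = outcome.find("H")
    return len(outcome) if idx == -1 else idx

# Check the two examples from the question before continuing.
print(tosses_before_first_head("TTTHTHT"), tosses_before_first_head("HTHTHTH"))

outcomes = ["".join(t) for t in product("HT", repeat=7)]
freq = Counter(tosses_before_first_head(o) for o in outcomes)
prob = {k: Fraction(freq[k], len(outcomes)) for k in sorted(freq)}
for k, p in prob.items():
    print(k, freq[k], p)

# Mean time before failure: sum of value times probability.
mtbf = sum(k * p for k, p in prob.items())
print(mtbf, float(mtbf))
```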
Q.11 (1 pts) Here is a table of column percents for Department
D in the Berkeley study of graduate admissions.
Female
the admission rates of males and females.
Q.13 (4 pts) Here is a contingency table for the variables Dept.
and Admit from Berkeley.
            A     B     C     D     E     F     All
Admitted    601   370   322   269   147   46    1755
Rejected    332   215   596   523   437   668   2771
All         933   585   918   792   584   714   4526
Find P(C), P(R) and P(C∩R). (Note: The last is the probability
of C intersect R, in case your browser does not show math
symbols. "R" means "rejected".)
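Each of these probabilities is a count from the table divided by the grand total. A minimal sketch, with the counts typed in from the table above (the dictionary names are ours):

```python
from fractions import Fraction

# Counts copied from the Dept. by Admit contingency table.
admitted = {"A": 601, "B": 370, "C": 322, "D": 269, "E": 147, "F": 46}
rejected = {"A": 332, "B": 215, "C": 596, "D": 523, "E": 437, "F": 668}

total = sum(admitted.values()) + sum(rejected.values())

# P(C): applicants to Dept. C (column total) over all applicants.
p_c = Fraction(admitted["C"] + rejected["C"], total)
# P(R): rejected applicants (row total) over all applicants.
p_r = Fraction(sum(rejected.values()), total)
# P(C and R): the single cell "Dept. C, rejected" over all applicants.
p_c_and_r = Fraction(rejected["C"], total)

print(float(p_c), float(p_r), float(p_c_and_r))
```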
Possible Outcomes for Seven Tosses of a Fair Coin
HHHHHHH THHHHHH HTHHHHH TTHHHHH HHTHHHH THTHHHH HTTHHHH TTTHHHH
HHHTHHH THHTHHH HTHTHHH TTHTHHH HHTTHHH THTTHHH HTTTHHH TTTTHHH
HHHHTHH THHHTHH HTHHTHH TTHHTHH HHTHTHH THTHTHH HTTHTHH TTTHTHH
HHHTTHH THHTTHH HTHTTHH TTHTTHH HHTTTHH THTTTHH HTTTTHH TTTTTHH
HHHHHTH THHHHTH HTHHHTH TTHHHTH HHTHHTH THTHHTH HTTHHTH TTTHHTH
HHHTHTH THHTHTH HTHTHTH TTHTHTH HHTTHTH THTTHTH HTTTHTH TTTTHTH
HHHHTTH THHHTTH HTHHTTH TTHHTTH HHTHTTH THTHTTH HTTHTTH TTTHTTH
HHHTTTH THHTTTH HTHTTTH TTHTTTH HHTTTTH THTTTTH HTTTTTH TTTTTTH
HHHHHHT THHHHHT HTHHHHT TTHHHHT HHTHHHT THTHHHT HTTHHHT TTTHHHT
HHHTHHT THHTHHT HTHTHHT TTHTHHT HHTTHHT THTTHHT HTTTHHT TTTTHHT
HHHHTHT THHHTHT HTHHTHT TTHHTHT HHTHTHT THTHTHT HTTHTHT TTTHTHT
HHHTTHT THHTTHT HTHTTHT TTHTTHT HHTTTHT THTTTHT HTTTTHT TTTTTHT
HHHHHTT THHHHTT HTHHHTT TTHHHTT HHTHHTT THTHHTT HTTHHTT TTTHHTT
HHHTHTT THHTHTT HTHTHTT TTHTHTT HHTTHTT THTTHTT HTTTHTT TTTTHTT
HHHHTTT THHHTTT HTHHTTT TTHHTTT HHTHTTT THTHTTT HTTHTTT TTTHTTT
HHHTTTT THHTTTT HTHTTTT TTHTTTT HHTTTTT THTTTTT HTTTTTT TTTTTTT
Now you just need to go through the list and count to get a
frequency (or probability) distribution. For example, here is a
count of heads for the first five outcomes.
HHHHHHH 7
THHHHHH 6
HTHHHHH 6
TTHHHHH 5
HHTHHHH 6
The Method of Exhaustion
This exercise is intended to give you a feel for where the
theoretical probabilities come from. It is the word processor
equivalent of a tree diagram. It's based on repeating a pattern,
but the pattern is actually harder to see at the beginning, so we
will start with a list of possible outcomes for two tosses of a
fair coin (which is an approximate model for the sexes of
children born in a family with two children). The possibilities
are
HH
TH
HT
TT
We will use this to build the corresponding list for three tosses.
In three tosses, we could have any of the above outcomes
followed by a head plus any of the above outcomes followed by
a tail, for a total of eight possibilities. Use Copy and Paste in
your word processor to make two copies of the outcomes for
two tosses.
HH
TH
HT
TT
HH
TH
HT
TT
Then add an H at the end of the top four and a T at the end of
the bottom four to get these outcomes for three
tosses.
HHH
THH
HTH
TTH
HHT
THT
HTT
TTT
You can continue in this pattern to create lists of possible
outcomes for 4, 5, 6 and 7 or more tosses of a fair coin. The
number of outcomes should double each time. You should find
that this is repetitive but not difficult and not too time
consuming. This approach allows you to compute theoretical
probabilities for a few tosses. If you print your document you
will find that continuing to, say, 20 tosses, would use up a lot of
trees. Still, you could answer any question by simple counting.
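The copy, paste and append pattern can be written as a short program, which saves the trees. This is a sketch of the same doubling idea, not a replacement for doing it once by hand; the function names are ours.

```python
def extend(outcomes):
    """Two copies of the list: H appended to the first copy, T to the second."""
    return [o + "H" for o in outcomes] + [o + "T" for o in outcomes]

def all_outcomes(n_tosses):
    """List every outcome for n_tosses tosses by repeated doubling."""
    outcomes = ["H", "T"]  # one toss
    for _ in range(n_tosses - 1):
        outcomes = extend(outcomes)
    return outcomes

print(all_outcomes(2))
print(all_outcomes(3))
print(len(all_outcomes(7)))
```

The lists for two and three tosses come out in the same order as the ones built by hand above, and the length doubles with each extra toss.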
If we wish to work with large numbers of tosses, a formula
might be handy. The way to find a formula is to look for
patterns in easy cases we can do by hand. For that purpose,
please make a frequency table and some sort of display for the
outcomes of 4, 5, 6 and 7 tosses of a fair coin. Here is three
tosses done as an example:
# heads   freq.
0         1
1         3
2         3
3         1
Here is a stem-and-leaf plot. If you wonder where the leaves
came from, think of "1" as "1.0".
0|0
1|000
2|000
3|0
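To check the tables and displays you make for 4, 5, 6 and 7 tosses, something like the following sketch could be used. It counts heads by brute force over every outcome, so it does not give away the formula; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def heads_frequency(n_tosses):
    """Frequency of each head count, found by listing every outcome (1 = head)."""
    counts = Counter(sum(t) for t in product((0, 1), repeat=n_tosses))
    return [counts[k] for k in range(n_tosses + 1)]

def stem_and_leaf(freqs):
    """One line per head count, with one '0' leaf per outcome."""
    return [f"{k}|{'0' * f}" for k, f in enumerate(freqs)]

for n in (3, 4):
    freqs = heads_frequency(n)
    print(n, "tosses:", freqs)
    print("\n".join(stem_and_leaf(freqs)))
```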