Statistics project

1. Introduction
Everyone needs some form of entertainment, though some people seek it more often than
others. Some spend too much time entertaining themselves, while others are so absorbed in
their work that they forget the need to have fun. Focusing on students, I decided to
investigate the correlation between the number of hours per day spent in front of a screen
for entertainment and the student's average letter grade. I was curious to find out the
extent to which entertainment affects a student's letter grade. I did this by posing these
questions to each respondent:
What is your average letter grade in school?
How many hours per day do you spend in front of a screen for entertainment
purposes (Facebook, TV, video games, etc.; school or work do not count)?
The source of the data and the sampling method:
2. Hypothesis
I believe that the population mean screen time for entertainment will be about 3 hours per
day, because people today are highly dependent on TV and video games for entertainment. I
also believe that the figure I obtain may be lower than the true mean, because people tend
to underestimate the amount of television that they watch daily.
3. To ensure that the information acquired is relevant, the questions asked must be
relevant and unambiguous. I made sure that the questions were simple to answer, well worded,
in a sensible order, and did not offend the respondent. I also used closed-ended questions
so that the answers would be easy to compare. In taking the sample, I tried to cover every
part of the population to avoid undercoverage. A face-to-face interview can give better
results, although interviewer error can at times affect the quality of the sample.

4. To obtain my sample, I sent an email survey through the Angel Database to 150
students: 25 students from each of my 6 classes (25 × 6 = 150 students). The 50 students who
responded to the email survey formed my sample. Some of my classes are in person and some
are online, so I received data from both. Because these students were my classmates, I
chose my experimental units on the basis of convenience. I also tried to introduce some
random sampling by giving each student a chance to be selected.
5. My population is the students of Jamestown Community College. The school has a
total of 1024 students, most of whom come from the Northeast region of America. The student
body is diverse: the majority are white, followed by Black Americans, with the smallest
groups being non-residents and students of unknown ethnicity.
6. Within the student population of Jamestown Community College, which has 1024
students, we can classify the students based on the following criteria:
Gender: male/female
Enrollment: full-time or part-time undergraduate
Age: ranging from 16 to 60 years
From the population of 1024 students, I took a sample of 50 students. Of these, 25
were male and 24 were female.
Of the 50 students, 32 were full-time students and 18 were part-time students. On
this basis, both groups were adequately represented, since the total population has 638
full-time students and 342 part-time students.
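The representation claim above can be checked with a little arithmetic. The sketch below is only an illustration: the variable names are my own, and it assumes the population shares should be taken over the 638 + 342 students whose enrollment status is quoted, which is fewer than the 1024 total.

```python
# Counts quoted in the text above.
sample_full_time, sample_part_time = 32, 18
pop_full_time, pop_part_time = 638, 342

# Assumption: shares are computed over the students with a stated
# enrollment status (638 + 342), not over the 1024 total.
sample_share = sample_full_time / (sample_full_time + sample_part_time)
pop_share = pop_full_time / (pop_full_time + pop_part_time)

print(f"Full-time share in sample:     {sample_share:.1%}")
print(f"Full-time share in population: {pop_share:.1%}")
```

The two shares come out close to each other, which is what "adequate representation" means here.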
The age distribution of the population ought to be represented in the sample. However,
since most students at JCC are between 17 and 25 years old, age was not taken into account
during sampling.

7. The intended sample size was 150 students, drawn from 6 classes with 25 students
selected from each. Of these, 50 responded to the email and provided the data. This means
that 150 - 50 = 100 students were non-responsive. The percentage of non-responsive students
is therefore:
(Number of non-responsive students / intended sample size) × 100 = (100/150) × 100 =
66.67%.
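The same non-response calculation, written out as a minimal Python sketch (the variable names are illustrative, not part of the original analysis):

```python
invited = 150    # students who were sent the email survey
responded = 50   # students who replied

non_responsive = invited - responded
non_response_rate = non_responsive / invited * 100

print(f"Non-responsive students: {non_responsive}")
print(f"Non-response rate: {non_response_rate:.2f}%")
```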
8. Sampling biases lead to a sample that is not representative of the population. A
possible sampling bias in my case is selection bias, because I selected only students who
were my classmates, so students outside my classes had no chance of being selected.
Students pursuing other courses that demand more of their time would probably spend fewer
hours on a screen for entertainment. Self-selection bias could also have occurred, because
only the students who chose to reply to the email are in the sample; in addition, students
whose email addresses I did not have were locked out.
Age bias could also have occurred, because I tended to survey people within my own age
bracket, since they are easy to reach.
Other forms of bias that could have occurred include gender and place of origin.
Hours on screen Grade Letter
3 A
4 A
3 A
1 A
2 A
2 A
1 A

4 A
3 A
2.5 A
2 A
2 A
2 B+
3 B+
1 B+
4 B+
1 B+
3 B+
3 B+
4 B+
6 B+
6 B+
2 B+
2 B+
4 B+
4 B+
3 B+
4 B
2 B
2 B
3 B

6 B
5 B
2 B
2 B
1 B
4 B
4 C+
1 C+
1 C+
3 C+
5 C+
6 C+
5 C
3 C
5 C
2 C
3 C
6 C
5 F
Descriptive Statistics: hours on screen
Variable          N    N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3
Hours on screen   50   0    3.150   0.215     1.519   1.000     2.000   3.000    4.000

Variable          Maximum   Range   Mode   N for Mode
Hours on screen   6.000     5.000   2      12
9. Numerical Summary of the Data:
Measures of central tendency:
The mean is 3.150 hours spent on the screen per day by the fifty students.
The median is 3.000 hours spent on the screen.
The mode is 2 hours, reported by 12 students.
Measures of spread:
The range is 5 hours and the standard deviation is 1.519 hours. The range of 5 means that
students spend between 1 hour and 6 hours on the screen. The standard deviation describes
how far the values in the data set typically lie from the mean: with a mean of 3.15 hours,
a standard deviation of 1.519 hours indicates a fairly wide spread relative to the range
of 5.
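As a cross-check on the summary above, the same measures can be recomputed with Python's standard `statistics` module. This is a sketch rather than the original analysis: the `counts` dictionary is my own tally of the raw data table (value in hours mapped to number of students).

```python
import statistics

# Tally of the raw data table: hours on screen -> number of students.
counts = {1: 7, 2: 12, 2.5: 1, 3: 11, 4: 9, 5: 5, 6: 5}
hours = [value for value, n in counts.items() for _ in range(n)]

mean = statistics.mean(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)
stdev = statistics.stdev(hours)          # sample standard deviation (n - 1)
data_range = max(hours) - min(hours)

print(f"N={len(hours)} mean={mean:.3f} median={median:.3f} mode={mode}")
print(f"stdev={stdev:.3f} range={data_range}")
```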
Interpretation of the five-number summary:
The minimum of 1 means that the least screen time reported by any student was 1 hour.
Q1 is 2 hours; with the data set arranged in ascending order, the 12th and 13th students
(0.25 × 50 = 12.5) spent 2 hours on the screen, so at least a quarter of the students spent
2 hours or less.
Median (Q2) is 3.000 hours; with the data set in ascending order, the 25th and 26th
students (0.5 × 50 = 25) spent 3 hours on average, so at least half of the students spent
3 hours or less.
Q3, the third quartile, is 4 hours; the 37th and 38th students (0.75 × 50 = 37.5) spent
4 hours on the screen, so at least three quarters of the students spent 4 hours or less.
The maximum of 6 means that the most screen time reported by any student was 6 hours.
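The five-number summary can be reproduced with `statistics.quantiles`. This is a sketch under two assumptions: the `counts` tally is my own reading of the raw data table, and the function's default "exclusive" method is used, which places the quartiles at positions based on n + 1.

```python
import statistics

# Tally of the raw data table: hours on screen -> number of students.
counts = {1: 7, 2: 12, 2.5: 1, 3: 11, 4: 9, 5: 5, 6: 5}
hours = sorted(value for value, n in counts.items() for _ in range(n))

# Default method="exclusive" interpolates at positions i * (n + 1) / 4.
q1, q2, q3 = statistics.quantiles(hours, n=4)
five_number = (min(hours), q1, q2, q3, max(hours))

print("min, Q1, median, Q3, max =", five_number)
# Values at the sorted positions named in the text (1-based).
print("12th/13th:", hours[11], hours[12])
print("25th/26th:", hours[24], hours[25])
print("37th/38th:", hours[36], hours[37])
```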

Using the 3s test to identify an outlier:
Suspect - mean = 6 - 3.15 = 2.85
3s = 3 × 1.519 = 4.557
Since suspect - mean < 3s, 6 is not an outlier. The same holds for the minimum
(3.15 - 1 = 2.15 < 4.557).
There are no outliers in my data set.
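The 3s rule is simple enough to sketch in a few lines. The code below uses the reported mean and standard deviation and checks both extremes of the data; the names are illustrative.

```python
mean = 3.150   # reported sample mean (hours)
s = 1.519      # reported sample standard deviation (hours)

threshold = 3 * s
suspects = [1.0, 6.0]   # minimum and maximum of the data set

# A value is flagged only if it lies more than 3s from the mean.
outliers = [x for x in suspects if abs(x - mean) > threshold]

for x in suspects:
    print(f"|{x} - {mean}| = {abs(x - mean):.2f} vs 3s = {threshold:.3f}")
print("Outliers:", outliers)
```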
10. Categorical Variable Analysis
I chose a pie chart because it makes a categorical data set quick and easy to interpret.
The chart shows that the class is performing well, with 54% of the students (27 of 50)
attaining a B+ or above. Fewer students attain the lower grades: 26% (13 of 50) have a C+
or below.

11. Histogram Graph
[Figure: Histogram of hours on screen with a fitted normal curve. x-axis: hours on screen
(0 to 6); y-axis: frequency (0 to 12). Mean 3.15, StDev 1.519, N 50.]
12. The distribution is skewed to the right, with its mode at 2 hours and a tail
extending to 6 hours. The clustering around 2 to 3 hours may be due to the fact that most of
the students in the sample were members of the same classes, sharing the same class hours
and assignments, and so spending similar amounts of time at school and at home; hence the
time spent on the screen would be expected to centre at about 3 hours.
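A rough text version of the frequency distribution makes the shape easier to see. This sketch counts each distinct value rather than reproducing the original histogram bins, which are not recoverable here; the `counts` tally is my own reading of the raw data table.

```python
from collections import Counter
import statistics

# Tally of the raw data table: hours on screen -> number of students.
counts = {1: 7, 2: 12, 2.5: 1, 3: 11, 4: 9, 5: 5, 6: 5}
hours = [value for value, n in counts.items() for _ in range(n)]

freq = Counter(hours)
for value in sorted(freq):
    print(f"{value:>4} h | {'#' * freq[value]} ({freq[value]})")

# A mean above the median is a common informal hint of right skew.
mean = statistics.mean(hours)
median = statistics.median(hours)
print(f"mean={mean:.2f} median={median:.2f} mean>median: {mean > median}")
```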
13. Confidence Interval for the Numeric Data
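The 95% confidence interval for the population mean is 3.15 ± 0.43 hours, where the margin
of error is t × SE Mean = 2.010 × 0.215 ≈ 0.43 (49 degrees of freedom). This implies that
the true mean for the population lies between 3.15 - 0.43 = 2.72 hours, the lower
confidence limit, and 3.15 + 0.43 = 3.58 hours, the upper confidence limit.
The interval is based on the t-distribution (a one-sample t interval).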
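A t-based 95% interval can be recomputed from the summary statistics reported earlier (N = 50, mean 3.150, StDev 1.519). The sketch below hard-codes the two-sided critical value for 49 degrees of freedom as read from a t-table; that constant and the variable names are assumptions of this example, not part of the original output.

```python
import math

n = 50
mean = 3.150
s = 1.519

# Assumption: two-sided 95% critical value for 49 degrees of freedom,
# taken from a standard t-table rather than computed.
t_crit = 2.0096

standard_error = s / math.sqrt(n)
margin = t_crit * standard_error
lower, upper = mean - margin, mean + margin

print(f"SE = {standard_error:.3f}, margin = {margin:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```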

14. Hypothesis Testing for the Numeric Data
a) The null hypothesis H0 is that the true mean of the population is 3 hours (µ = 3),
i.e. the sample mean of 3.15 does not differ significantly from it. The alternative
hypothesis is that the true mean differs from 3 hours.
b)
$t_{\text{calculated}} = \frac{(\bar{x} - \mu)\sqrt{N}}{S}$
Where: $\bar{x}$ is the sample mean, $\mu$ the hypothesized true mean, $N$ the sample size
and $S$ the sample standard deviation.
$t_{\text{calculated}} = \frac{(3.15 - 3)\sqrt{50}}{1.519} \approx 0.698$
Using the t-test: the calculated t score is 0.698, while the tabulated two-sided t score at
the 95% confidence level with 49 degrees of freedom is about 2.010. Since the calculated
t score is less than the tabulated t score,
c) We retain (fail to reject) the null hypothesis H0, i.e. there is no significant
difference between the hypothesized mean of the population and the sample mean.
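The one-sample t statistic can be recomputed from the reported summary values. As before, this is a sketch: the critical value is hard-coded from a t-table for 49 degrees of freedom (two-sided, 95%), and the names are illustrative.

```python
import math

n = 50
sample_mean = 3.150
s = 1.519
mu0 = 3.0          # hypothesized population mean (hours)

# Assumption: tabulated two-sided 95% critical value for 49 degrees of freedom.
t_crit = 2.0096

t_calculated = (sample_mean - mu0) * math.sqrt(n) / s
reject_h0 = abs(t_calculated) > t_crit

print(f"t = {t_calculated:.3f}, critical value = {t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```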
16. Conclusion
The population in this survey is reasonably represented in terms of gender and mode of
learning. However, it was not possible to have respondents from all ethnic groups, and age
was not considered during sampling.
The sampling method was workable, but in the future, questions should be phrased in a
better way to increase the number of respondents.
Based on the inferences made above, I believe the sample mean is not significantly
different from the hypothesized population mean of 3 hours, and I can conclude that
students spend on average about 3.15 ± 0.43 hours per day in front of a screen for
entertainment purposes. Does this affect their grades? In this sample, lower grades tend to
go with more time spent on the screen, which suggests an inverse relationship, although a
survey of this kind cannot show that screen time causes the lower grades.
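One simple way to look at the relationship between screen time and grades is to compare the mean hours within each letter grade. The sketch below groups the rows of the raw data table by grade; it is descriptive only, the dictionary layout is my own, and it is not necessarily the "further analysis" carried out for this report.

```python
import statistics

# Hours on screen grouped by letter grade, copied from the raw data table.
hours_by_grade = {
    "A":  [3, 4, 3, 1, 2, 2, 1, 4, 3, 2.5, 2, 2],
    "B+": [2, 3, 1, 4, 1, 3, 3, 4, 6, 6, 2, 2, 4, 4, 3],
    "B":  [4, 2, 2, 3, 6, 5, 2, 2, 1, 4],
    "C+": [4, 1, 1, 3, 5, 6],
    "C":  [5, 3, 5, 2, 3, 6],
    "F":  [5],
}

mean_by_grade = {g: statistics.mean(h) for g, h in hours_by_grade.items()}

for grade, m in mean_by_grade.items():
    print(f"{grade:<2} n={len(hours_by_grade[grade]):>2} mean={m:.2f} h")
```

The group means rise from the A group to the C and F groups, though not at every step (B+ is slightly above B), and the F group contains a single student.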

