2. Statistics
• Last class, we covered a lot of the basic words used in descriptive
statistics, but we didn’t really define Statistics
• Statistics is the mathematics of collecting and analyzing data to draw conclusions and
make predictions
• Note the difference between this and the common usage of the word statistics in our society,
“stats”
• stats are a form of statistics, but only a very small part of what we’ll do in this course
Statistics Inference Probability Variables Sampling Studies Ethics
3. Statistics
• We practice statistics by gathering information about variables; a variable is
something that can take on different values (it can vary)
• When these things are numbers they’re often referred to as random
variables in the context of statistics
• A value that a variable takes on is called a datum (singular – plural is
data)
• If we collect data on a variable, we call that collection a data set
4. Statistics
• There are two parts to this course, with a small bridge-like part in
between
• The first part is about descriptive statistics
• The part of statistics where we describe data sets
• Organizing and summarizing data
• Graphing
• Using numbers (finding an average, for example)
• We’ll also be looking at Inferential Statistics
• Formal methods used for drawing conclusions from ‘good’ data
5. Statistics
• We’re interested in more than just what is, and what data we’ve been
able to gather.
• We want to draw conclusions and make predictions in general.
• Suppose we know that 460 of 1000 people we asked said they are going
to vote yes on Proposition X.
• Can we say that 46% of the entire population will vote yes on this proposition?
• The data set consisting of the 1000 yes’s and no’s that we collected is called a
sample.
• Any number that describes this sample (for instance, that 46% are yes’s) is called a
statistic.
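The Proposition X statistic can be computed directly from the sample counts. A minimal Python sketch (the variable names are illustrative, not from the course):

```python
# Counts from the Proposition X example above.
yes_votes = 460
sample_size = 1000

# The sample proportion is a statistic: a number that describes the sample.
sample_proportion = yes_votes / sample_size

print(f"{sample_proportion:.0%} of the sample said yes")
```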
6. Inference
• In the case of Proposition X, the population is the entire potential
collection of yes’s and no’s.
• A parameter is a number that is a property of the population.
• Samples have Statistics; Populations have Parameters.
• It might seem tempting to say that they’re equal, but they mean very different
things.
• Thinking of our 46%: the sample and its statistic are known (we counted them), while the
population parameter is unknown, and we use the statistic to estimate how much of the entire
population will vote yes.
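To see the difference between a parameter and a statistic, here is a small simulation sketch. It assumes a made-up population in which exactly 48% vote yes; in a real poll the parameter is unknown, and the population size and seed here are arbitrary choices.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 voters; 1 = yes, 0 = no.
# We choose the parameter ourselves here, which is never possible in practice.
population = [1] * 48_000 + [0] * 52_000
parameter = sum(population) / len(population)

# Draw a simple random sample of 1,000 voters and compute the statistic.
sample = random.sample(population, 1000)
statistic = sum(sample) / len(sample)

print("parameter (population proportion):", parameter)
print("statistic (sample proportion):    ", statistic)
```

The statistic usually lands near the parameter but rarely matches it exactly, and a different sample would give a different statistic.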
7. Inference
• An inference is a conclusion that you draw from information you receive.
• Inferential statistics takes information from a sample and applies it to the
population.
• Population – a collection of persons, things, or objects under study.
• Sample – a portion (or subset) of the population, used to study and gain information
about the population.
• Data are the result of sampling from a population.
8. Probability
• Probability is the ‘bridge’ between descriptive and inferential statistics
• How likely is it that 46% of the entire electorate actually
votes yes on Proposition X?
• That is, assuming we didn’t ask our question at the headquarters of ‘vote yes on
Proposition X’, or ‘vote no on Proposition X’.
• Or how close to this 46% can we expect?
• Probability is a mathematical tool used to study randomness.
• It deals with the chance (or likelihood) of an event occurring.
• Probability is the tool used to transition from descriptive statistics to inferential
statistics.
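One common way to answer "how close to this 46% can we expect?" is a margin of error. The sketch below is a preview, not something defined on these slides: it assumes a simple random sample and uses the normal-approximation multiplier 1.96 for a rough 95% range.

```python
import math

p_hat = 0.46   # sample proportion from the Proposition X example
n = 1000       # sample size

# Standard error of a sample proportion (assumes a simple random sample).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% margin of error, using the normal-approximation multiplier 1.96.
margin = 1.96 * standard_error

print(f"standard error: {standard_error:.4f}")
print(f"rough 95% range: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```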
9. Probability
• If you toss a fair coin four times, the outcomes may not be two
heads and two tails.
• However, if you toss the same coin 4,000 times, the outcomes will
very likely be close to half heads and half tails
• The expected theoretical probability of heads in any one toss is ½
or 0.5.
• The theory of probability began with the study of games of chance
such as poker.
• In your study of statistics, you will use the power of mathematics
through probability calculations to analyze and interpret your data.
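The coin-toss claim can be tried out with a short simulation. This is a sketch only: the function name is made up for illustration, and the seed is fixed just so repeated runs agree.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def proportion_heads(tosses):
    """Simulate fair coin tosses and return the fraction that land heads."""
    heads = sum(random.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for tosses in (4, 4000):
    print(tosses, "tosses:", proportion_heads(tosses))
```

With 4 tosses the fraction can only be 0, 0.25, 0.5, 0.75, or 1; with 4,000 tosses it settles near 0.5.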
10. Variables
• Qualitative Variables
• Some variables are characteristics – not numbers
• These variables are also called categorical because the data fall into categories
• Quantitative Variables
• Are measured at one of four possible levels of measurement
• Nominal Level – the number is simply a name (Lowest level of measurement)
• ZIP codes, bus route numbers, Social Security numbers, phone numbers, etc.
• Ordinal Level – numbers can be ranked
• Attitudes toward math and newspapers, numbers of stars in a restaurant or movie review, etc
11. Variables
• Interval Level – variables may be ranked, and the differences between data values
have a definite meaning, but there is no meaningful zero, so phrases like
“twice as big” have no meaning
• Shoe size, temperature in Fahrenheit and Celsius (but not the Kelvin scale)
• Ratio Level – variables may be ranked, the differences between data values
have definite meaning, and there is a meaningful zero, so phrases like “twice as big” DO
make sense (highest level of measurement)
• Age, height, number of pets, Kelvin scale, income, weight, speed, many more!
• Variables at the interval or ratio level of measurement can be divided into two kinds of
variables – those that are counted, and those that are measured
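A quick way to see why "twice as big" fails at the interval level: compare 40 and 80 degrees Fahrenheit, then convert both to the Kelvin scale. The temperatures are arbitrary example values, and the helper function is written for this sketch.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to kelvins."""
    return (deg_f - 32) * 5 / 9 + 273.15

cool_f, warm_f = 40, 80

# Interval level: the ratio of the raw Fahrenheit numbers is 2.0,
# but that number depends on where the scale puts its zero.
print("ratio of Fahrenheit readings:", warm_f / cool_f)

# Ratio level: Kelvin has a true zero, so this ratio is meaningful.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)
print("ratio of Kelvin readings:", round(kelvin_ratio, 3))
```

On the Kelvin scale, 80 °F is only about 8% warmer than 40 °F, not twice as warm.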
12. Variables
• Counted Variables
• These are called discrete because there are gaps between the possible values (whole numbers)
• Measured Variables
• These are called continuous because they can be given to any degree of accuracy,
depending on the measuring device and how you decide to round the numbers
• This leads to the concept of boundaries
13. Boundaries
• Suppose you weigh the amount of chicken you are going to eat, and it
comes in at 3 ounces. However, if you want to be more accurate, you
switch your scale to grams.
• So, say your scale now reads 84 grams, which, if you do some quick converting,
you’ll find is really about 2.96301 oz.
• You see, nothing really weighs exactly 3 oz, or 84 grams.
• Realistically, if it weighs between 83.5 and 84.5 grams, it will read 84 grams
• These numbers, 83.5 and 84.5 are boundaries, and the difference between them (1 gram) shows the
accuracy to which you’re weighing the object
• If you had a more accurate scale, you might find that it really weighs 84.2 grams
• Again, nothing really weighs exactly 84.2 grams; rather it weighs between 84.15 and 84.25 grams
• These are its boundaries, and they are 0.1 gram apart
• Don’t worry about something weighing exactly 84.25 grams and being rounded wrong because
nothing weighs exactly 84.25 grams!
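The boundary rule above (half of the measuring precision on either side of the reading) can be sketched in Python. The function name is made up for illustration, and `Decimal` is used so that values like 84.15 come out exactly.

```python
from decimal import Decimal

def boundaries(reading, precision):
    """Return the (lower, upper) boundaries of a rounded measurement.

    Both arguments are strings so Decimal can represent them exactly.
    """
    half = Decimal(precision) / 2
    value = Decimal(reading)
    return value - half, value + half

print(boundaries("84", "1"))      # scale that reads to the nearest gram
print(boundaries("84.2", "0.1"))  # scale that reads to the nearest 0.1 gram
```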
14. Sampling
• Gathering information about an entire population often costs too much or
is not possible.
• Instead, we use a sample of the population.
• A sample should have the same characteristics as the population it is
representing.
• Most statisticians use various methods of random sampling in an attempt to achieve
this goal.
15. Sampling Techniques
• Simple Random Sample (or Random
Sample) – every member of a population
has an equal chance of being selected for
the sample.
• Suppose you want to form a study group with
three other people.
• To choose a random sample of size three from
the other members of your class, assign each
member of the class a number from 0 – 30
(assuming there are 31 other people in your
class, since you are leaving yourself out.)
16. Sampling Techniques
• Now, using your TI-83 or TI-84, press MATH (left side, three buttons down)
• Use your left arrow to go to PRB (probability)
• Either scroll down to 5:randInt( , or simply press 5.
• Now, enter 0, 30) and press ENTER 3 times. (If the same number comes up more than once, press
ENTER again.)
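If you have a computer instead of a calculator, the same selection can be sketched in Python. This is an alternative to the calculator steps, not part of them; `random.sample` draws without replacement, so there is no need to redraw on a repeat.

```python
import random

# Classmates are numbered 0 through 30, as in the example above.
classmates = range(0, 31)

# random.sample draws without replacement, so no number can repeat.
study_group = random.sample(classmates, 3)
print(study_group)
```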
17. Sampling Techniques
• There are other ways to do this sort of a random sample
• There are tables of random numbers
• You could put each person’s name on a slip of paper and put them into a hat
18. Sampling Techniques
• Random Sample – every member of a population has an equal chance of
being selected for the sample.
• This is the preferred method of sampling, but how do you actually accomplish that?
• Say you’re doing a poll.
• You assign a random number to each registered voter, select a certain number of these
numbers, and contact those voters.
• Some of them aren’t home
• Some of them have had their phones disconnected
• Some of them won’t talk to you
• Sometimes the phone is answered by a three-year-old.
19. Sampling Techniques
• Another technique is called systematic sampling.
• You could pick every tenth number in the phone book for instance.
• However, even if every person answered the phone and your question, you still
wouldn’t have a truly random sample.
• People with the same name are listed next to each other, so their chances of being selected
together are greatly decreased, since you are taking every tenth entry.
• Cell phones
• People with no phones at all
• Still, it’s less work than generating and assigning random numbers, and not at all a bad way
to get close to a random sample.
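A systematic sample is easy to sketch with list slicing. The phone book below is a made-up list, and starting from a random position within the first ten entries is an added assumption (the slide only says to take every tenth number).

```python
import random

def systematic_sample(items, step):
    """Pick every `step`-th item, starting from a random position in the first `step`."""
    start = random.randrange(step)
    return items[start::step]

# Hypothetical phone book with 95 entries.
phone_book = [f"entry {i}" for i in range(95)]
print(systematic_sample(phone_book, 10))
```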
20. Sampling Techniques
• What you’d like to avoid is a convenience sample.
• Using results that are readily available or easily gathered.
• You go online to see how much it’s going to rain in the next week. At the bottom of the page is a
poll asking ‘Do you believe that the government can stop climate change?’
• The problem here is, who is going to answer?
• Self Selecting
• Care about the question/answer
• Triggered by the idea of climate change, or government intervention
• We won’t know which groups, the yes’s or no’s, will respond more, but we know it’s not in any
way a random response.
21. Sampling Techniques
• Another technique is called stratified
sampling.
• Think of strata, or layers, of a rock.
• Say you want to know how many units students are
taking, but you want to be sure that you get a fairly
equal number of men and women in your sample.
• Instead of getting a random sample from the whole
group, first split the group by sex and then
get a random sample from each subgroup.
• You can do this with more than two groups as well
• Suppose you want to make sure you get a fairly equal
number of people born in the 1960’s, in the 1970’s, the
1980’s, and the 1990’s.
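Stratified sampling (split first, then sample randomly within each group) can be sketched as follows. The roster and the function name are hypothetical, made up for illustration.

```python
import random

def stratified_sample(members, key, per_group):
    """Split members into strata by `key`, then draw a random sample from each stratum."""
    strata = {}
    for member in members:
        strata.setdefault(key(member), []).append(member)
    return {label: random.sample(group, per_group) for label, group in strata.items()}

# Hypothetical roster of (student id, decade of birth) pairs, 10 students per decade.
roster = list(enumerate(["1960s", "1970s", "1980s", "1990s"] * 10))
picked = stratified_sample(roster, key=lambda student: student[1], per_group=3)
for decade, students in picked.items():
    print(decade, students)
```

Every decade is guaranteed the same number of students in the sample, which a single random draw from the whole roster would not guarantee.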
22. Sampling Techniques
• Another technique is called cluster sampling.
• This is actually how unemployment figures are estimated,
and it’s often used in evaluating medical techniques.
• Select, at random, a bunch of neighborhoods, or hospitals,
and get the information about every person or procedure in
the selected parts.
• This is the reverse of stratifying, where you split first and then
randomly select;
• Here, you randomly select the groups and then try to do a census
of the groups chosen.
• Census – information gathered from every member of the population.
• Why not just do a Census? Well – it’s very expensive, and not always
possible.
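Cluster sampling reverses the order: choose whole groups at random, then take everyone in the chosen groups. A minimal sketch, with made-up neighborhoods and household ids:

```python
import random

def cluster_sample(clusters, number_of_clusters):
    """Randomly choose whole clusters, then keep every member of the chosen clusters."""
    chosen = random.sample(sorted(clusters), number_of_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical neighborhoods, each with a list of household ids.
neighborhoods = {
    "North": [1, 2, 3],
    "South": [4, 5],
    "East": [6, 7, 8, 9],
    "West": [10, 11],
}
print(cluster_sample(neighborhoods, 2))
```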
23. Observations and Experimental Studies
• The purpose of many studies is to investigate the relationship between two
variables
• If we notice something, or observe something, we are doing an
observational study
• Even if we are asking someone a question or measuring them or having them fill out a
survey – we’re not altering the information; just collecting it
• A research study comparing the risk of developing lung cancer, between smokers and
non-smokers, would be one example
• This type of research draws a conclusion by comparing subjects against a control
group, in cases where the researcher has no control over which subjects end up in which group
• One of the main reasons for performing observational research is ethical
concerns (for instance, researchers cannot assign people to smoke)
24. Observations and Experimental Studies
• Sometimes researchers need to manipulate things a little to see what
effects are produced
• When you do that, manipulate things, you’re conducting an experimental
study
• There are many terms, protocols, and ethical considerations associated with doing experimental
studies
• Suppose there are trials of a new drug meant to reduce blood pressure
• The drug (and the dosage) is the independent or explanatory variable
• You want to see if it has an effect on blood pressure
• The blood pressure readings are the dependent or outcome variable
25. Experimental Design
• You then give some patients the drug
• This is the treatment group
• And you would give some patients a placebo
• This is the control group
26. Experimental Design
• It is possible that someone’s blood pressure might well drop just because
they believe that the new medicine will be effective
• The placebo (Latin for “I will please”) is a fake pill to try and control for
this possibility
• The volunteers do not know which group they are in (blind study)
• If you do not tell the people taking their blood pressure which patients are
which, you now have a double blind study
• This way, you eliminate the power of suggestion on both sides
27. Experimental Design
• You look for a difference in the average blood pressure of the treatment
group compared to the control group after the drug has been taken for a
suitable amount of time
• If the difference is big enough, you declare that your drug works
• We’ll spend a lot of time in this course talking about how to find out if it IS
big enough
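The comparison itself is just a difference of two averages. The readings below are invented for illustration only (they are not real trial data), and deciding whether the difference is "big enough" is left to the methods covered later in the course.

```python
from statistics import mean

# Made-up systolic readings (mmHg) for illustration only; not real trial data.
treatment_group = [128, 131, 125, 122, 130]
control_group = [138, 135, 141, 133, 139]

# A negative difference means the treatment group's average is lower.
difference = mean(treatment_group) - mean(control_group)
print("difference in average blood pressure:", round(difference, 1))
```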
28. Experimental Design
• Even with all of these precautions, it might turn out that the drug
“worked”, not because of the specially developed chemical you put in, but
because of some supposedly unimportant substance you used to fill up
space in the pill
• This might be discovered some time later when the drug stops working
because you started using some other filler
• When something else besides the independent variable is responsible for
a difference in the dependent variable between the control group and the
treatment group, we call that something else a confounding variable
• It confounds, or confuses, the analysis of the effect
29. Ethics
• The widespread misuse and misrepresentation of statistical information
often give the field a bad name
• Ways that data can be misused and misrepresented
• Creating datasets that largely confirm prior expectations
• Altering data in existing datasets
• Changing measuring instruments without reporting the change
• Misrepresenting the number of experimental subjects
• The Vaccine War
31. Ethics
• Many types of statistical fraud are difficult to spot
• Some researchers simply stop collecting data once they have just enough to prove
what they had hoped to prove
• They don’t want to take the chance that a more extensive study would complicate
their lives by producing data contradicting their hypothesis
• Professional organizations, like the American Statistical Association,
clearly define expectations for researchers
• There are even laws in the federal code about the use of research data
32. Ethics – Human Participant
• Ethics and Law dictate that researchers should be mindful of the safety of
their research subjects.
• The U.S. Department of Health and Human Services oversees federal regulation of
research studies with the aim of protecting participants.
• Research institutions, to ensure the safety of all human subjects,
establish oversight committees known as Institutional Review Boards
(IRB)
33. Ethics – Human Participant
• All planned studies must be approved in advance by the IRB
• Key protections that are mandated by law include:
• Risks to participants must be minimized and reasonable with respect to projected benefits
• Participants must give informed consent
• The risks of participation must be clearly explained to the subjects of the study
• Subjects must consent in writing, and researchers are required to keep documentation of the consent
• Data collected from individuals must be guarded carefully to protect their privacy
• Understanding these safeguards and protections is important so that you can
recognize proper data analysis