Unraveling Multimodality with Large Language Models.pdf
Chapter 1 review
1. Chapter 1 Review
1.1 WHAT IS STATISTICS?
1.2 RANDOM SAMPLES
1.3 INTRODUCTION TO EXPERIMENTAL DESIGN
2. Page 4
Introduction
Statistics is the study of how to
collect, organize, analyze, and interpret numerical
information from data.
Generally the first thing one must do is gather data. In order to
do that, we need to identify the individuals or objects to be
included in our study and the characteristics that are of
interest.
Individuals are the people or objects included in the study.
A variable is a characteristic of the individual to be measured or
observed.
3. Variables
The variables in a study can be quantitative or
qualitative.
A quantitative variable has a value or numerical
measurement for which operations such as addition or
averaging make sense.
A qualitative variable describes an individual by placing the
individual into a category or group.
4. Some VS All
In population data, the data are from every
individual of interest.
Data from a specific population are fixed and complete
A population parameter is a numerical measure that
describes an aspect of a population
In sample data, the data are from only some of the
individuals of interest.
Data from a sample may vary from sample to sample and are
not complete
A sample statistic is a numerical measure that describes an
aspect of a sample
5. Page 7
Levels of Measurement
Levels of Measurement are another way to classify
data, they indicate the type of arithmetic that is
appropriate for the data
Categorize the level of measurement, take the highest
level appropriate. Consider which calculations are
suitable for the data.
Nominal: data that consists of names, labels, or categories. There
are no implied criteria to order the data from largest to smallest.
Ordinal: data that can be arranged in order, but differences
between data values either cannot be determined or are meaningless.
Interval: data that can be arranged in order and differences
between data values are meaningful.
Ratio: data that can be arranges in order, both differences and ratios
of data values are meaningful. Data at the ratio level have a true zero.
6. Page 13
Sampling Techniques
One of the most important sampling techniques is
the simple random sample.
In a simple random sample every sample of a specific size
has an equal chance of being selected and every individual of
the population has an equal chance of being selected.
To get a random sample use a random-number table or an
random-number generator
7. Page 15
Simulation
A simulation is a numerical copy, imitation, or
representation of a real-world phenomenon.
Random-number tables can be used for simulations
Advantages (p 22): numerical and statistical simulations can
fit real-world problems extremely well. Allows a researcher to
explore situations that might be very dangerous in real-life.
8. Page 15
Sampling With or Without Replacement
Sampling with replacement means that although a
number (or individual) is selected for the sample, it
is not removed from the population. Thus, the same
number may be selected for the sample more than
once.
Simulations often use sampling with replacement.
Sampling without replacement means that when a
number (or individual) is selected for a sample it is
not replaced in the population.
If you need to sample without replacement, generate more
items than you need for the sample and remove duplicate
values.
9. Page 16
Other Sampling Techniques
Random Sample: Use a simple random sample
from the entire population.
In our text, assume (simple) random samples are used
Stratified Sampling: Divide the entire population
into distinct subgroups, called strata, based on a
specific characteristic such that all members of a
stratum share the specific characteristic. Draw
random sample from each stratum.
10. Other Sampling Techniques
Systematic Sampling: Number all members of the
population sequentially. From a random starting
point, include every kth member of the population in
the sample.
Advantage: easy to get
Disadvantage: will not work if a population is repetitive or
cyclic in nature
Cluster Sampling: Divide the entire population
into pre-existing segments or clusters. Make a
random selection of clusters, include every member
of each selected cluster in the sample.
11. Other Sampling Techniques
Multistage Sampling: Use a variety of sampling
methods to create successively smaller groups at
each stage, the final sample consists of clusters.
Convenience Sampling: Create a sample by using
data from population members that are readily
available.
Disadvantage: may be severely biased
12. Page 21
Planning a Statistical Study
1. Identify the individuals or objects of interest
2. Specify the variables and protocols for taking measurements or
making observations
3. Determine whether you will use the entire population or a sample.
If using a sample, choose an appropriate method.
In a census, measurements or observations from the entire population are
used.
Advantage: gives complete information about the population
Disadvantage: obtaining a census can be expensive and difficult
4. Make a data collection plan.
Choose a collection technique: observational study, experiment, or survey
5. Collect the data.
6. Use descriptive statistics methods and make decisions using
appropriate inferential statistics methods.
7. Note any concerns you have about your data collection and
recommendations for future studies.
13. Page 22
Gathering Data
In an observational study, observations and
measurements of individuals are conducted in a way
that doesn’t change the response or variable being
measured.
14. Gathering Data
In an experiment, a treatment is deliberately
imposed on the individuals in order to observe a
possible change in the response or variable being
measured.
Statistical experiments are commonly used to determine the
effect or a treatment. The design needs to control for other
possible causes of the effect.
The Placebo Effect occurs when a subject receives no treatment
but (incorrectly) believes he or she is in fact receiving the
treatment and responds favorably.
15. Experiments cont…
A completely randomized experiment is one in which a
random process is used to assign each individual to one of the
treatments; this accounts for the placebo effect.
Randomly divide individuals into two groups: treatment and
control
The treatment group receives the treatment. The control group
receives a dummy or placebo treatment that is disguised to look
like the real treatment.
After the treatment cycle, compare the groups.
16. Page 24
Experiments cont…
It is difficult to control all the variables that might influence
the response to a treatment. One to control some of the
variables is through blocking.
A block is a group of individuals sharing some common features
that might affect the treatment.
In a randomized block experiment, individuals are first
sorted into blocks, and then a random process is used to assign
each individual in the black to one of the treatments.
17. Experiments cont…
Features of a good experimental design:
There is a control group.
Randomization is used to assign individuals to the two
treatment groups (this helps to prevent bias in selecting
members for each group)
Replication of the experiment on many patients/individuals
reduces the possibility that the results are due to chance
A double-blind experiment is one in which the individual
and the observer do not know which group the individual is in
which helps control subtle bias that the observer may pass
onto the individual.