Hm306 week 6

© 2016
A Practical Approach to Analyzing
Healthcare Data
Chapter 7 – Sample Selections

Types of Studies - Descriptive
• Descriptive studies – performed to generate
hypotheses for more formal studies
– Cross-sectional study – describes the characteristics
of a population at a specific point in time
• Often used for prevalence studies
– Applied descriptive studies
• Data mining
• Exploratory data analysis

Types of Studies - Analytic
• Analytic studies – more formal studies designed to test a
specific hypothesis
– Case-control study – involves both a case group (subjects with
the attribute under investigation) and a control group (those
without the attribute)
• Members of the case and control groups are often matched based on
demographics
• Typically a retrospective study
• May not be used to determine cause and effect; can calculate odds
ratio
• Weakness – dependent on the subject’s ability to recall events
– Cohort study – involves case and control groups, but the groups are
identified before the study is performed
• Prospective study
• May not be used to determine cause and effect; can calculate relative
risk
• May take a long time to complete
• Not useful if the attribute studied is rare
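The two measures named above can be sketched with a small calculation. This is a minimal illustration, not taken from the textbook: the 2x2 counts are invented, and the cell layout (a, b, c, d) is an assumed convention.

```python
# Assumed 2x2 layout (hypothetical counts, for illustration only):
#                  outcome present   outcome absent
#   exposed              a                 b
#   unexposed            c                 d

def odds_ratio(a, b, c, d):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts
a, b, c, d = 30, 70, 10, 90
or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)
print(round(or_value, 2), round(rr_value, 2))
```

The odds ratio is the measure usually reported for case-control designs and the relative risk for cohort designs, which matches the pairing on the slide.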

Types of Studies - Experimental
• Allow the determination of a cause and effect
relationship between variables
• Randomized Controlled Trials (RCT)
– Used to determine the effectiveness of new
drugs/treatment protocols
• Blinded studies
– Single blind – the subject does not know whether they are
assigned to the case or control group
– Double blind – neither the subject nor the researcher knows
whether the subject is assigned to the case or control group
– Triple blind – subject, researcher and analyst are all
blinded as to the group assignment of the subject

Why select a sample?
• Often the population is too large to collect data from every
unit of analysis or subject
• Statistical inference is used to draw conclusions about
a population based on a sample
• Vocabulary:
– Population or universe – all subjects that are under study
and eligible to be sampled
– Sample – selected subset of the population
– Sampling frame – A listing of all of the subjects in the
population
– Variable of interest – Quantity to be estimated (denial rate,
coding error rate, overpayment, underpayment, etc.)

Statistically Valid Sample
• Large enough to provide information with
sufficient precision to meet the goals of the
analysis
• Probability sample where each item has
an equal chance of being selected
• Must be reproducible

Defining the Variable of Interest
• What is the percent of lab orders that are not signed by a
physician during 2012?
– Universe – all lab orders during 2012
• What is the amount over/under paid due to incorrect E/M level
assignment during January?
– Universe –
• E/M services billed during January
• E/M services provided during January
• Must refine question to determine if billed date or service date should
be used for defining the universe
• What is the coding accuracy rate for secondary diagnosis
codes on inpatient accounts during the first quarter?
– Universe –
• All secondary diagnoses coded during first quarter
• All inpatient accounts during first quarter
• Must refine question to determine if diagnosis codes or charts are the
unit of analysis

Simple Random Sampling
• It is the statistical equivalent of drawing sampling units
from a hat.
• Each sampling unit (claim, chart, etc.) must have the
same probability of selection.
• Note that some random number generators will allow
the user to set a ‘seed’. If that feature is available, the
analyst should always set a seed. This will ensure
that the sample can be replicated.
• A simple random sample is not appropriate if the frame
cannot be listed or if it is important that the sample
contain particular (rare) subsets of the population.

Random Number Generators
• All random number generators are based on
mathematical functions that need a ‘seed’ or
starting point
• The use of a seed ensures that two independent
samples drawn using the same software and the same
seed will result in the same series of random numbers
and a reproducible sample
• Excel
– RAND() function does not allow a seed
– Random Number Generation in Data Analysis
ToolPak does allow a seed
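The effect of a seed can be sketched in a few lines. This is only an illustration using Python's standard `random` module rather than Excel or RAT-STATS; the seed values 2016 and 2017 are arbitrary choices.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random numbers in [0, 1) from a generator started at `seed`."""
    rng = random.Random(seed)  # a private generator, so the seed fully determines the series
    return [rng.random() for _ in range(n)]

first = draw(2016)
second = draw(2016)      # same seed: the same series is produced again
different = draw(2017)   # different seed: a different series

print(first == second)
print(first == different)
```

Recording the seed along with the software and version used is what lets another analyst regenerate the same sample later.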

Simple Random Sampling
Steps
• Method 1:
– The members of the sampling frame should be assigned a
random number between 0 and 1
– The frame may then be sorted by the random number
– The first ‘n’ will be the simple random sample of size ‘n’
• Method 2:
– Assign a sequence number from 1 to ‘N’ to each member
of the sampling frame (N is the population size)
– Use a random number generator (e.g., RAT-STATS) to select
‘n’ random numbers from 1 to ‘N’ (n is the sample size)
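Both methods can be sketched as follows. This is a minimal illustration that uses Python's `random` module in place of Excel or RAT-STATS; the frame of 200 claim identifiers and the seed are hypothetical.

```python
import random

def srs_method_1(frame, n, seed):
    """Method 1: attach a random number in [0, 1) to each unit, sort, keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), unit) for unit in frame]
    keyed.sort(key=lambda pair: pair[0])
    return [unit for _, unit in keyed[:n]]

def srs_method_2(frame, n, seed):
    """Method 2: number the units 1..N, then draw n distinct sequence numbers."""
    rng = random.Random(seed)
    picks = rng.sample(range(1, len(frame) + 1), n)  # sampling without replacement
    return [frame[i - 1] for i in sorted(picks)]

# Hypothetical sampling frame of 200 claims
frame = [f"claim-{i:03d}" for i in range(1, 201)]
sample_1 = srs_method_1(frame, 10, seed=42)
sample_2 = srs_method_2(frame, 10, seed=42)
print(sample_1)
print(sample_2)
```

The two methods generally select different units even with the same seed, but each gives every unit the same chance of selection and each can be replicated from the seed.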

Systematic Random Sampling
• A systematic random sample is a simple random sample that
is selected using a particular technique. If the population
includes ‘N’ members and we wish to draw a sample of size
‘n’, then a systematic random sample could be selected by
choosing every (N/n)th member of the population as the
sample.
– The selection should start at random from a member between
the 1st and (N/n)th member.
• NOTE: If N/n is not a whole number, then round down to the
next lower whole number to determine the sampling interval.
• In order to ensure that a systematic random sample is truly
random, the population should not be sorted in an order that
might bias the sample.
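The selection rule above can be sketched as follows. This is a minimal illustration, assuming the frame is already in an order that will not bias the sample; the frame of 1,000 items, the sample size of 90 and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, seed):
    """Take every k-th unit, k = N // n, from a random start between 1 and k."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                 # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randint(1, k)  # 1-based position, inclusive of both ends
    positions = [start + i * k for i in range(n)]
    return [frame[p - 1] for p in positions]

# Hypothetical frame: items numbered 1..1000; 1000 // 90 gives an interval of 11
frame = list(range(1, 1001))
sample = systematic_sample(frame, 90, seed=7)
print(sample[:5])
```

Rounding the interval down guarantees that the last selected position never runs past the end of the frame.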

Stratified Random Sampling
• Population is divided into unique subsets or strata
• Strata should be mutually exclusive and exhaustive. In other
words, each of the members of the population should be in one and
only one stratum.
• A simple random sample is then selected from each of the strata
• The size of the sample in each stratum may be equal or may be
assigned proportionally according to the relative size of each stratum
• Stratified sampling is appropriate when the quantity to be estimated
may vary among natural subgroups (strata) of the population
• Typical strata in healthcare may be:
– CPT® Code (E/M levels)
– Physician
– Specialty
– Clinic

Example
• Example: An analyst wishes to select a stratified random
sample of 90 from a population of 1,000 E/M visits. The
distribution of E/M visits in the population is:
– Level 1: 55
– Level 2: 183
– Level 3: 236
– Level 4: 309
– Level 5: 217

Example
• Example: An analyst wishes to select a stratified
random sample of 90 from a population of 1,000
E/M visits. The distribution of E/M visits in the
population is:
Level    Population Count (N)   % of Population   Sample Size (n)
1        55
2        183
3        236
4        309
5        217
Totals   1,000                  100%              90
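One way to fill in the blank columns is proportional allocation, one of the two options mentioned on the stratified sampling slide. The sketch below is an illustration, not the textbook's worked answer; rounding each stratum to the nearest whole number is an assumption.

```python
# Population counts by E/M level, taken from the example
population = {1: 55, 2: 183, 3: 236, 4: 309, 5: 217}
total = sum(population.values())   # 1,000
sample_size = 90

# Share of the population in each stratum, as a percentage
percent = {level: 100 * count / total for level, count in population.items()}

# Proportional allocation, rounded to the nearest whole unit (assumed rounding rule)
allocation = {level: round(sample_size * count / total)
              for level, count in population.items()}

for level in population:
    print(level, population[level], f"{percent[level]:.1f}%", allocation[level])
print("Total sampled:", sum(allocation.values()))
# allocation -> {1: 5, 2: 16, 3: 21, 4: 28, 5: 20}, which sums to 90
```

Simple rounding happens to hit the target of 90 here. With other counts the rounded sizes can sum to slightly more or less than the target, in which case one stratum has to be adjusted by hand.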

Cluster Sampling
• The population is divided into subsets much
like the strata in stratified sampling
• Clusters should be mutually exclusive and
exhaustive
• All members of each cluster are selected to
be a part of the sample
• Clusters are selected at random
• Cluster sampling is appropriate when it is
difficult to access all of the population

Cluster Sampling
Example
The director of the emergency department
would like to audit the accuracy of charge
capture for the first quarter of 2010.
Unfortunately, she is not able to obtain a full
listing of the patients that pass through the ED
for a sampling frame. Instead, a cluster sample
will be drawn using date of service as the
cluster. Select 10 dates via simple random
sampling to produce a cluster sample.
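The date selection in this example can be sketched as follows. This is a minimal illustration using Python's standard library; the seed is an arbitrary choice, and every ED account with a selected date of service would then be pulled for the audit.

```python
import random
from datetime import date, timedelta

def dates_between(start, end):
    """List every calendar date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]

# Each date of service in the first quarter of 2010 is one cluster
clusters = dates_between(date(2010, 1, 1), date(2010, 3, 31))

rng = random.Random(2010)                        # arbitrary seed, recorded for replication
selected_dates = sorted(rng.sample(clusters, 10))  # simple random sample of 10 clusters

for d in selected_dates:
    print(d.isoformat())
```

Only the list of dates needs to be known in advance, which is why this design works when a full patient listing cannot be obtained.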

Non-probability Sampling
• Random sample not required if:
– Study is exploratory or a focused review
– Example: If we wish to determine educational
opportunities for improving documentation, we may
sample accounts with few secondary diagnoses to
determine if there is a pattern in the types of
diagnosis codes most likely to be missed
• Typically, this sample is driven by some
exploratory data analysis or data mining to help
‘steer’ the sample to subjects most likely to have
the issue of interest

Non-probability Sampling
• Convenience sampling
– Example – sample first ‘n’ customers that enter the
hospital cafeteria
• Judgment sampling
– Use exploratory data analysis based on experience
or history
– AKA focused review
– Example – Know from history that the customer
satisfaction in cafeteria is lowest at lunch time
because of long lines. Select sample at that time to
try to improve process.
• Quota sampling
– Subjects divided into groups
– Judgment sample used within each group
– Example – may select first 10 male and 10 female
customers to cafeteria

RAT-STATS
• Statistical program provided by the Office of the Inspector
General (OIG)
• Free and downloadable from the OIG website – PC only (no
Mac version)
• Functionality
– Determine sample size
– Create random numbers for sample selection
– Analyze sample data from simple, cluster and stratified sampling
• Two types of studies:
– Attribute – variable of interest is a rate or proportion
– Variable – variable of interest is an interval or ratio quantity

RAT-STATS Demonstration
• Instructor:
– Reproduce the demo on pages 125 to 131
with a local installation of RAT-STATS
– Students should practice in the lab

Sample Size
• Sample size is dependent on:
– Standard Deviation of the quantity to be estimated
– Desired precision (width of confidence interval)
– Sampling method
– Size of the population (if it is relatively small)
– Resources available to perform the study
• Any analyst who quotes a sample size without asking for the
above information is not making an informed choice regarding
sample size
• The standard deviation of the quantity to be estimated
typically is derived from a pilot study or previous review
– OIG current recommendation for a pilot study is 30

Sample Size
Attribute Study
• Determined by:
– Anticipated rate of occurrence (50% results in largest sample)
– Confidence level
– Desired precision range

Sample Size
Attribute Study
• A larger sample size is required for:
– A higher level of confidence
– An anticipated rate of occurrence closer
to 50%
– A smaller (narrower) precision range
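These three relationships can be checked with the standard normal-approximation formula for a proportion, n = z² p(1 − p) / e². The sketch below is an assumption-laden illustration: it is not claimed to be the calculation RAT-STATS performs, it treats precision as the half-width `e` of the confidence interval, and the optional finite population correction is a common textbook adjustment.

```python
import math
from statistics import NormalDist

def attribute_sample_size(p, confidence, half_width, population=None):
    """Sample size for estimating a proportion p to within +/- half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / half_width ** 2
    if population is not None:
        # Finite population correction, relevant when the population is small
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

base = attribute_sample_size(p=0.50, confidence=0.95, half_width=0.05)
higher_confidence = attribute_sample_size(p=0.50, confidence=0.99, half_width=0.05)
rate_far_from_half = attribute_sample_size(p=0.10, confidence=0.95, half_width=0.05)
narrower = attribute_sample_size(p=0.50, confidence=0.95, half_width=0.03)
small_population = attribute_sample_size(0.50, 0.95, 0.05, population=1000)

print(base, higher_confidence, rate_far_from_half, narrower, small_population)
```

Comparing each result against `base` shows the pattern on the slide: higher confidence and narrower precision raise the sample size, while an anticipated rate far from 50% lowers it.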

Sample Size
Variable Study
• Determined by:
– Probe sample mean and standard deviation
– Confidence level
– Desired precision range

Sample Size
Variable Study
• A larger sample size is required for:
– A higher level of confidence
– A larger probe standard deviation
– A smaller (narrower) precision range
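A comparable check for a variable study uses the standard formula n = (z · s / e)², where s is the probe standard deviation and e is the desired half-width of the interval. This is a sketch under assumptions: the dollar amounts are hypothetical, the normal critical value is used instead of a t value, and it is not claimed to match the RAT-STATS calculation.

```python
import math
from statistics import NormalDist

def variable_sample_size(std_dev, confidence, half_width):
    """Sample size for estimating a mean to within +/- half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * std_dev / half_width) ** 2)

# Hypothetical probe result: standard deviation of $120 in overpayment per claim
base = variable_sample_size(std_dev=120, confidence=0.95, half_width=25)
higher_confidence = variable_sample_size(120, 0.99, 25)
larger_std_dev = variable_sample_size(200, 0.95, 25)
narrower = variable_sample_size(120, 0.95, 10)

print(base, higher_confidence, larger_std_dev, narrower)
```

Each of the three variations produces a larger sample than `base`, in line with the list above.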

Sample Size and Precision
• In both types of studies, attribute or variable, a
higher level of precision requires a larger sample
size
• A higher level of precision is equivalent to requiring a
narrower confidence interval for a set confidence
level
• Note that increasing ‘n’ in both the proportion and
mean confidence interval formulas results in
narrower intervals (all other variables held constant)
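The last point can be illustrated numerically. The sketch below assumes the usual large-sample interval formulas, p ± z·sqrt(p(1 − p)/n) for a proportion and x̄ ± z·s/sqrt(n) for a mean; the rate of 20% and standard deviation of 120 are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_half_width(p, n, confidence=0.95):
    """Half-width of the large-sample confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

def mean_half_width(std_dev, n, confidence=0.95):
    """Half-width of the large-sample confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / math.sqrt(n)

sizes = [30, 100, 400, 1600]
prop_widths = [proportion_half_width(0.20, n) for n in sizes]
mean_widths = [mean_half_width(120, n) for n in sizes]

for n, pw, mw in zip(sizes, prop_widths, mean_widths):
    print(n, round(pw, 4), round(mw, 2))
```

Because n sits under a square root in both formulas, quadrupling the sample size only halves the width of the interval.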
