The document provides guidance on sample size calculations for impact evaluations. It discusses setting the sample size when outcomes are continuous variables, proportions, or in cluster-based evaluations. Key points covered include:
- Setting power, significance level, expected means or proportions for treatment and control groups, and standard deviations.
- Using online calculators or software to determine the minimum sample size.
- Adjusting the sample size to account for data loss, cost considerations like different costs for treated vs. control groups, and statistical dependence within clusters.
- Computing the design effect to adjust the sample size for cluster-based evaluations based on the intracluster correlation.
- Considering multiple outcomes and adjusting the sample size accordingly.
Sample Size Calculations for Impact Evaluations
1. Sample size calculations for
impact evaluations
Marcos Vera-Hernandez
UCL, IFS, and PEPA
m.vera@ucl.ac.uk
2. Objectives
• When doing an impact evaluation, sometimes
we can use existing data. Some other times,
we need to collect our own data. Then:
– What sample size should I use?
– Considerations to minimize data collection costs
– How should I adjust the sample size computation if I use more than one outcome to assess the intervention?
• Emphasis on practical issues rather than
theoretical concepts
3. Topics to be covered
• Setting the stage/ basic definitions
• Sample size with continuous outcomes
– Computations
– Cost minimization
– Data loss
• Sample size with proportions
– Computations
• Cluster based impact evaluations
– Partial take-up and non-compliance
– Intra cluster correlation
– Computations
– Cost minimization
• Adjustment for multiple outcomes
4. Setting the stage
• An intervention to evaluate
– Training program (six months starting July 2014)
• An outcome variable to assess the success of
the intervention
– Labour earnings as of December 2015
• Intervention was allocated using a randomized
experiment:
– Treatment and Control
– Treated = participants, Control = non-participants
5. Setting the stage
• Method to estimate the effect of the
intervention on the outcome variables
(earnings)
– We need to interview treated and control
individuals in January 2016
– There are many individuals
– We will draw a random sample of them (and ask
them for their earnings)
6. Setting the stage
– Compute average earnings for individuals who
were allocated to treatment
– Compute average earnings for individuals who
were allocated to control
– The difference between these two averages
include a random component because we only
used a sample to compute them (not the entire
population)
– We carry out a test of hypothesis to see if the
difference is statistically significant
7. Power and Significance
• Null hypotheses to test:
– The intervention does not have an effect on mean
earnings
– The mean earnings of treated individuals are the same as the mean earnings of control individuals
• This average is not the average in the sample drawn to do the interviews and gather the data; it is the average in the entire population
– Note, the null hypothesis is pessimistic
– The intervention will be deemed successful if the
null hypothesis is rejected
8. Significance
• Probability that our test will reject the null
hypothesis when it is true
– Also known as α
• Probability that our test will say that the average earnings of treatment and control individuals are different when in reality they are the same
• Usually, we ask this probability to be 0.05 (or
smaller)
9. Power
• Probability that our test will reject the null
hypothesis when it is false
• Probability that our test will say that the average earnings of treatment and control individuals are different when in reality they are different
• Usually, we ask this probability to be 0.80 (or
higher)
10. Conducting sample size calculations for continuous
outcomes
What is the minimum sample size required to achieve
an acceptable level of power (usually 0.8) at a given
significance level (usually 0.05)?
The test comprises the difference in the average of the outcome variable
between treated and control individuals (test of means)
We will focus on two-sided tests because they are the most common in impact evaluations
11. • Example of continuous outcomes: earnings
• What information is needed?
– Power and α
– Mean of outcome variable for treated (μA)
– Mean of outcome variable for controls (μB)
– Standard deviation of the outcome variables for
treated and controls
• Usually assumed to be the same
– Ratio of sample in treated versus controls (nA/nB)
• Usually 1 but there might be good reasons to have
other ratios (see later on)
13. An Example for continuous outcomes
Using available data sources (e.g. Labour Force Survey, British Household Panel Survey…) and/or other published studies, I estimate that:
The mean of monthly earnings will be 600 for the controls
(μB=600)
The standard deviation of monthly earnings will be 1600: (σ=1600)
• I will assume the same standard deviation for treated and controls
Based on my reading of existing studies, I expect that the mean of monthly
earnings will be 800 for the treated (μA=800)
Other standard parameters: Power =0.80, α=0.05, Ratio nA/nB = 1
http://powerandsamplesize.com/Calculators/Compare-2-Means/2-Sample-Equality
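The calculation behind that online calculator can be sketched in a few lines of Python (this is not the presenter's code; it is the standard equality-of-two-means formula with the example's parameters, and the function name n_two_means is my own). With exact normal quantiles it gives about 1005 per group; the slide's 1003 reflects the calculator's rounding of the z-values (roughly 1.96 and 0.84).

from math import ceil
from statistics import NormalDist

def n_two_means(mu_a, mu_b, sd, alpha=0.05, power=0.80, kappa=1.0):
    # Minimum sample sizes for a two-sided test of equality of two means.
    # kappa = nA/nB, the ratio of treated to control sample sizes.
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = z.inv_cdf(power)
    n_b = (1 + 1 / kappa) * (sd * (z_alpha + z_beta) / (mu_a - mu_b)) ** 2
    return ceil(kappa * n_b), ceil(n_b)  # (nA, nB)

n_a, n_b = n_two_means(mu_a=800, mu_b=600, sd=1600)
print(n_a, n_b, n_a + n_b)   # about 1005 + 1005 = 2010 (slide: 1003 + 1003 = 2006)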
14. [Screenshot: online calculator output for the earnings example]
15. We will need data on 1003 individuals in the control group (nB = 1003)
The software only gives us the sample size in group B. However, as we specified a ratio of 1 (nA/nB = 1), this means that we will also need data on 1003 individuals in the treatment group (nA = 1003)
Total sample size: 1003 + 1003 = 2006
What happens if we specify a different ratio? For
example: (nA/nB = 2)
16. [Screenshot: online calculator output with the ratio nA/nB = 2]
17. What happens if we specify a different ratio?
For example: (nA/nB = 2)
In this case, nB = 752
Using the relation (nA/nB = 2), we can easily obtain
nA = 2 x 752= 1504
Note that 1504 + 752 = 2256, which is larger than 2006. The smallest total sample size is achieved with a ratio of 1 (see the quick check below).
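Re-running the same sketch from slide 13 with kappa = 2 reproduces this pattern (the exact figures again differ by a few units from the calculator because of z-value rounding):

n_a, n_b = n_two_means(mu_a=800, mu_b=600, sd=1600, kappa=2)
print(n_a, n_b, n_a + n_b)   # about 1507 + 754 = 2261 (slide: 1504 + 752 = 2256), larger than the ratio-1 total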
18. Some important relations
• The larger the power, the larger the sample
• The smaller α, the larger the sample
• The larger the difference between the means of
the treatment and control group, the smaller the
sample
• The larger the standard deviation, the larger the
sample
19. Cost minimization
• Sometimes, it is much more costly to collect
information from treated individuals than
from controls
– For instance, we might have to pay the training
program for treated individuals
– In this case, it makes sense to try several ratios to
see which one minimizes the cost
• Of course, the books also contain formulas that give
you the result directly
20. Cost Minimization
Ratio N Controls N treated Cost per treated Cost per control Total cost
1 1003 1003 2000 1400 3410200
0.9 1059 953.1 2000 1400 3388800
0.8 1128 902.4 2000 1400 3384000
0.7 1218 852.6 2000 1400 3410400
0.6 1337 802.2 2000 1400 3476200
All these combinations give you the same power at the same α, so
it makes sense to use the one that minimizes the cost
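A hypothetical way to build a table like this: loop over candidate ratios, take the sample sizes from the sketch on slide 13, and add up the costs (the per-person costs of 2000 and 1400 are the slide's illustrative figures; totals differ slightly from the table because of z-value rounding).

cost_per_treated, cost_per_control = 2000, 1400   # illustrative costs from the slide's table
for ratio in (1.0, 0.9, 0.8, 0.7, 0.6):           # ratio = nA/nB
    n_a, n_b = n_two_means(mu_a=800, mu_b=600, sd=1600, kappa=ratio)
    total_cost = n_a * cost_per_treated + n_b * cost_per_control
    print(ratio, n_b, n_a, total_cost)
# all rows give the same power at the same alpha; pick the cheapest (around ratio 0.8 here)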
21. Data loss
• The sample sizes obtained are the minimum required to achieve the specified power at the specified α
• But it is realistic to think that some data will be lost:
– An individual who agrees to participate may later change his or her mind
– We may not be able to locate an individual who initially agreed to participate
– Some individuals will not want to answer the question on
earnings
– Some other data (gender, age, education) might be missing
• We should inflate the minimum sample size by some
percentage to allow for this loss of data
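A minimal sketch of that inflation step, assuming a 10% expected loss (the 10% figure is purely illustrative):

from math import ceil

def inflate_for_attrition(n_min, expected_loss=0.10):
    # Sample size to recruit so that n_min observations remain after losing expected_loss of the data.
    return ceil(n_min / (1 - expected_loss))

print(inflate_for_attrition(2006, 0.10))   # 2229 individuals recruited to end up with about 2006 usable ones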
22. Conducting sample size calculations for outcomes
which are measured as proportions
What is the minimum sample size required to achieve
an acceptable level of power (usually 0.8) at a given
significance level (usually 0.05)?
We will focus on two-sided tests because they are the most common in impact evaluations
23. • Example: proportion unemployed
• What information is needed?
– Power and α
– Proportion for treated (pA)
– Proportion for controls (pB)
• The standard deviation is not needed for proportions (it is determined by the proportion itself)
– Ratio of sample in treated versus controls (nA/nB)
• Usually 1 but there might be good reasons to have
other ratios (see later on)
24. An Example for proportions
Using available data sources (e.g. Labour Force Survey, British Household Panel Survey…) and/or other published studies, I estimate that:
The proportion of unemployed control individuals will be 0.3
(pB=0.3)
Based on my reading of existing studies, I expect proportion of
unemployed treated individuals to be 0.1 (pA=0.1)
Other standard parameters: Power =0.80, α=0.05, Ratio nA/nB =
1
http://powerandsamplesize.com/Calculators/Compare-2-Proportions/2-Sample-Equality
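The analogous sketch for proportions (again not the presenter's code; it follows the standard equality-of-two-proportions formula used by this type of calculator, and the function name n_two_proportions is my own):

from math import ceil
from statistics import NormalDist

def n_two_proportions(p_a, p_b, alpha=0.05, power=0.80, kappa=1.0):
    # Minimum sample sizes for a two-sided test of equality of two proportions (kappa = nA/nB).
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n_b = (p_a * (1 - p_a) / kappa + p_b * (1 - p_b)) * (z_sum / (p_a - p_b)) ** 2
    return ceil(kappa * n_b), ceil(n_b)  # (nA, nB)

print(n_two_proportions(p_a=0.1, p_b=0.3))   # roughly 59 per group at alpha = 0.05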
25. [Screenshot: online calculator output for the unemployment proportions example]
26. About proportions
• For a given difference between pA and pB, the
closer the proportions are to 0.5, the larger
the sample required will be
– This is because the standard deviation is higher
when the proportions are close to 0.5
• All other considerations are the same as for continuous outcomes, including data loss, cost minimization…
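A quick check of that point with the same sketch: the same 0.2 difference in proportions needs a larger sample when the proportions sit near 0.5 than when they sit near the tail (figures approximate):

print(n_two_proportions(p_a=0.4, p_b=0.6))    # near 0.5: about 95 per group
print(n_two_proportions(p_a=0.05, p_b=0.25))  # near the tail: about 47 per group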
28. Cluster based impact evaluations
• Outcomes at the individual level (earnings, test
scores…) but intervention is allocated at a higher
level (area, school,…)
– Because of logistical/political feasibility or to avoid contamination from treatment to control (e.g. two students in the same class)
– The higher level (school, area…) is referred to as the cluster
• Computations have to be adjusted for two factors
– Partial take-up and non-compliance
– Statistical dependence of individuals belonging to the
same cluster
29. Partial take-up and non-compliance
• The training intervention that we are evaluating is for
individuals aged 18-20 and takes place in some local area
authorities (LEA) and not others
– The cluster is the LEA
– Only 80% of individuals aged 18-20 sign up for the intervention in
treated LEA
– 10% of individuals aged 18-20 sign up for the intervention in control
LEA (administrative error...)
• We must re-adjust the income averages of 600 & 800
– μA= 0.8 x 800 +(1- 0.8) x 600 = 760
– μB= 0.9 x 600 +(1- 0.9) x 800 = 620
• You would do the same for proportions
• Sometimes, you do not need to adjust because take-up
is full and non-compliance is null
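A minimal sketch of that adjustment, reproducing the slide's numbers (0.8 take-up in treated LEAs, 0.1 sign-up in control LEAs; the function name is my own):

def adjusted_mean(takeup, mean_participants, mean_nonparticipants):
    # Expected mean outcome in an area, given the share who actually receive the intervention.
    return takeup * mean_participants + (1 - takeup) * mean_nonparticipants

mu_a = adjusted_mean(0.8, 800, 600)   # treated areas: 0.8*800 + 0.2*600 = 760
mu_b = adjusted_mean(0.1, 800, 600)   # control areas: 0.1*800 + 0.9*600 = 620
print(mu_a, mu_b)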
30. Statistical dependence within cluster
• Things happen within a cluster (school, area...) that make individuals have similar values of the outcome variable (teacher quality, plant closures,...)
– If I already collected data from 10 people from the same
area, collecting data from another person from the same
area does not add much additional information (if the
statistical dependence is strong)
• The statistic to measure such statistical dependence
is the Intra Cluster Correlation
31. Statistical dependence within cluster
• Intra Cluster Correlation (ICC): ICC = S²b / (S²b + S²w)
• S²b = variance of the outcome variable (i.e. earnings) between clusters
– Compute the average of the outcome variable in each cluster
– Compute the variance of those averages
• S²w = variance of the outcome variable (i.e. earnings) within clusters
– Compute the variance of the outcome variable in each cluster
– Compute the average of all those variances
• In Stata, you can easily compute it using: loneway earnings idcluster
• If there is maximum dependence (all individuals in the same cluster have the same value of the outcome variable) then S²w = 0, and the ICC is 1
• If there is no dependence within clusters (all clusters have the same average value of the outcome variable) then S²b = 0, and the ICC is 0
• The ICC is between 0 and 1. Usually, the larger the cluster, the smaller the ICC
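A sketch of the ICC computed exactly as this slide describes (variance of the cluster means, divided by that plus the average within-cluster variance). The toy data are made up for illustration; in practice Stata's loneway uses an ANOVA-based estimator that can differ slightly from this simple plug-in version.

from statistics import mean, pvariance

def icc(clusters):
    # clusters: a list of lists, one inner list of outcome values per cluster.
    s2_between = pvariance([mean(c) for c in clusters])   # variance of the cluster averages
    s2_within = mean([pvariance(c) for c in clusters])    # average of the within-cluster variances
    return s2_between / (s2_between + s2_within)

# toy example: three areas whose earnings are similar within each area
areas = [[580, 600, 620], [700, 720, 740], [640, 660, 680]]
print(icc(areas))   # 0.9 here, i.e. strong within-cluster dependence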
32. Statistical dependence within cluster
• Intra Cluster Correlation (ICC): ICC = S²b / (S²b + S²w)
• How do I know the ICC for my sample size calculation?
• You can either use a dataset similar to the one that you will be collecting (same outcome variable, same definition of cluster, similar population)
• You can obtain it from published articles or from publicly accessible research proposals
33. Statistical dependence within cluster
• Design Effect (DE): DE = 1 + (m - 1) × ρ
where m = number of individuals per cluster and ρ = intra-cluster correlation coefficient
Once we have computed the sample size ignoring any statistical
dependence within cluster (using the methods previously outlined),
then we must multiply that sample size by the Design Effect to
obtain the total sample size adjusted for statistical dependence
within clusters
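The design effect as a one-line sketch (the function name is my own):

def design_effect(m, rho):
    # DE = 1 + (m - 1) * rho, with m individuals per cluster and rho the intra-cluster correlation.
    return 1 + (m - 1) * rho

print(design_effect(10, 0.05), design_effect(15, 0.05))   # 1.45 and 1.70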
34. Example cluster based impact evaluation
• Once we have adjusted for partial take-up and non-
compliance, we assume that:
– μA= 0.8 x 800 +(1- 0.8) x 600 = 760
– μB= 0.9 x 600 +(1- 0.9) x 800 = 620
• We use software to compute the required sample
size ignoring any issues related to statistical
dependence within clusters
• We obtain 2047x2=4094 individuals using standard
levels of power and α (see screenshot in following
slide)
35. [Screenshot: online calculator output for the adjusted means, 760 vs 620]
36. Example cluster based impact evaluation
• Now, we must take the 4094 and adjust it for statistical
dependence
• Using published studies, we assume that the ICC of
earnings (at the area level) will be 0.05
– If we have 10 individuals per cluster, the DE will be
1+(10-1) x 0.05 = 1.45
– If we have 15 individuals per cluster, the DE will be
1+(15-1) x 0.05 = 1.70
• If we choose 10 individuals per cluster, we multiply 4094 by 1.45 =
5936 total number of individuals (the total number of clusters will
be 5936 /10=594)
• If we choose 15 individuals per cluster, we multiply 4094 by 1.70 =
6960 total number of individuals (the total number of clusters will
be 6960 /15=464)
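Putting the pieces together for this example, reusing the design_effect sketch from slide 33 (the unadjusted total of 4094 is the slide's figure; rounding up gives numbers within one or two of the slide's):

from math import ceil

n_unadjusted = 4094   # 2047 per arm from the calculator, after the take-up adjustment
for m in (10, 15):
    de = design_effect(m, 0.05)
    n_total = ceil(n_unadjusted * de)
    n_clusters = ceil(n_total / m)
    print(m, de, n_total, n_clusters)
# m = 10: about 5937 individuals in 594 clusters; m = 15: about 6960 individuals in 464 clusters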
37. Software for cluster based IE
• You do not need special software; you can use what we have just done in a spreadsheet
• The software below will not adjust for partial take-up or non-compliance directly; you have to make the adjustment before entering the parameters
• STATA:
– Install the user command sampclus
• Type: sampsi 760 620, sd(1600) power(0.8)
• And then type: sampclus, rho(0.05) obsclus(10)
– An alternative is to use this other STATA command, which allows you to enter different numbers of treated and control clusters:
• http://www.population-health.manchester.ac.uk/biostatistics/research/software/clsampsi/
• This can be very useful if the cost of a treatment cluster is different from the cost of a control cluster or if there are other constraints that limit the number of either treatment or control clusters
• Optimal Design:
– http://hlmsoft.net/od/
– http://sitemaker.umich.edu/group-based/optimal_design_software
– For proportions, this software uses a different way of adjustment
38. Cost Minimization for cluster based IE
• As we saw, there are different combinations of number of
clusters and number of people per cluster that give the same
power
– If we choose 10 individuals per cluster, we need 5936 individuals (594
clusters, each with 10 individuals)
– If we choose 15 individuals per cluster, we need 6960 individuals (464
clusters, each with 15 individuals)
• Usually there is a fixed cost of going into a cluster plus a
marginal cost of interviewing a person within each cluster
• By trying different number of individuals per cluster, we can
choose the combination of number of clusters and number of
individuals per cluster that minimizes the cost
• Software such as clsampsi or Optimal Design can also do this for us
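A hypothetical cost comparison in the spirit of this slide, continuing the sketches above. The fixed cost per cluster (500) and cost per interview (50) are made-up figures; with real costs the same loop picks the cheapest cluster size.

from math import ceil

fixed_cost_per_cluster, cost_per_interview = 500, 50   # illustrative assumptions only
for m in (5, 10, 15, 20):
    de = design_effect(m, 0.05)
    n_total = ceil(4094 * de)
    n_clusters = ceil(n_total / m)
    cost = n_clusters * fixed_cost_per_cluster + n_total * cost_per_interview
    print(m, n_clusters, n_total, cost)
# choose the m (and hence number of clusters) with the smallest total cost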
39. Cost Minimization for cluster based IE
• If the cost of each treatment cluster is higher than the cost of
each control cluster, we might be able to further decrease
costs (but keep the same level of power) by using more
control clusters than treatment clusters
– Software clsampsi might be very useful for this (http://www.population-health.manchester.ac.uk/biostatistics/research/software/clsampsi/)
41. Multiple Outcomes
• It is standard to use more than one indicator to assess the
success of the intervention
• For instance: earnings and unemployment
• We want to be sure that the probability of rejecting any of the null hypotheses (two in this case) when they are both true is still 0.05
• This is known as the Familywise Error Rate
• This will not be true if we use an α of 0.05 for each hypothesis
• Probability of rejecting at least one of the two null hypotheses (earnings or unemployment) =
• 1 – Probability of rejecting neither =
• 1 – (1 - 0.05)(1 - 0.05) = 0.0975, which is larger than 0.05
• The intuition is that because we are carrying out more than one test, it
gets easier to reject one of them (because of bad luck) even if they are
all true
• This will mean that we will have to use an α for each test
that is smaller than 0.05 (adjustment)
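The arithmetic behind this point as a sketch:

def familywise_error_rate(alpha_per_test, n_tests):
    # Probability of at least one false rejection when all nulls are true and the tests are independent.
    return 1 - (1 - alpha_per_test) ** n_tests

print(familywise_error_rate(0.05, 2))   # 0.0975, larger than the intended 0.05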
42. Adjustments for multiple outcomes
• They consist of using an α for each test that is
smaller than 0.05, so that the FWER is kept at 0.05
• Many methods:
• Bonferroni:
– α= FWER/(Number of Outcomes)
– α= 0.05/2= 0.025
• Tukey:
– α = 1 - (1 - FWER)^(1/√number of outcomes)
– α = 1 - (1 - 0.05)^(1/√2) = 0.0356
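Both adjustments as a sketch; the Tukey-style formula uses a square root in the exponent, which is what reproduces the slide's 0.0356 (with 1/k in the exponent instead, the result would be 0.0253):

def bonferroni_alpha(fwer, k):
    return fwer / k

def tukey_alpha(fwer, k):
    return 1 - (1 - fwer) ** (1 / k ** 0.5)

print(bonferroni_alpha(0.05, 2))         # 0.025
print(round(tukey_alpha(0.05, 2), 4))    # 0.0356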
43. Example of adjustments for multiple
outcomes
• We go back to the two examples of the beginning:
earnings and proportion of unemployed
• But we carry out the computations using an α of
0.025 (following the Bonferroni method) so that the
FWER is 0.05
44. [Screenshot: calculator output at α = 0.025, earnings example]
45. [Screenshot: calculator output at α = 0.025, unemployment example]
46. Example of adjustments for multiple
outcomes
• So we will need 1214x2 = 2428 individuals for earnings and 72x2 = 144 for the proportion unemployed
• Clearly, we must go for the maximum of these two numbers: 2428 individuals
• Note that this is larger than the number we computed at the beginning (2006), which did not adjust for multiple outcomes
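Re-running the earlier sketches at the Bonferroni-adjusted α of 0.025 reproduces these orders of magnitude (the small gap from the slide's 2428 comes from z-value rounding):

n_a, n_b = n_two_means(mu_a=800, mu_b=600, sd=1600, alpha=0.025)
print(n_a + n_b)                                   # about 2434 (slide: 2428)
print(n_two_proportions(0.1, 0.3, alpha=0.025))    # about 72 per group (slide: 72 x 2 = 144)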