AN ASSIGNMENT
ON
ADVANCED BIOSTATISTICS
SUBMITTED TO
Professor Dr. Md. Shah Jahan Mazumder
Chairman
Department of Agricultural Statistics
&
Dean
Faculty of Agricultural Economics and Business Studies
Syndicate Member
Sylhet Agricultural University, Sylhet-3100
SUBMITTED BY
Muhammed Hossain
MS Student
ID. No. 1401011201
Registration No. 0445
Session: January-June’2014
Department of Parasitology
Faculty of Veterinary and Animal Science
Sylhet Agricultural University, Sylhet-3100
ACKNOWLEDGEMENT
At the outset, the author wishes to acknowledge the immeasurable grace and kindness
of Almighty Allah, who enabled the author to complete the assignment successfully.
Preparing an up-to-date assignment on Advanced Biostatistics is no small task. The
author would like to express his heartfelt gratitude to his eminent assignment
supervisor, Professor Dr. Md. Shah Jahan Mazumder, Chairman, Department of
Agricultural Statistics and Dean, Faculty of Agricultural Economics and Business
Studies, Sylhet Agricultural University, Sylhet, for his scholastic guidance,
supervision, intensive care, continuous inspiration, wise criticism and cordial
assistance in the successful completion of this assignment.
Special thanks to Professor Dr. Md. Jamal Uddin Bhuiyan, Chairman, Department
of Parasitology, Faculty of Veterinary & Animal Science, Sylhet Agricultural
University, Sylhet, for his moral support, guidance and encouragement in writing
this assignment.
Cordial thanks to Dr. Tilak Chandra Nath, Lecturer, Department of Parasitology,
Faculty of Veterinary & Animal Science, Sylhet Agricultural University, Sylhet, for
his support and encouragement in writing this assignment.
I also wish to express my appreciation and gratitude to the various authors and
publishers whose books were consulted in collecting information.
The Author
June’ 2014
INDEX
SL. NO. CONTENTS
01 Acknowledgement
02 Objectives
03 Introduction
04 General Discussion
05 Research Methodology
06 Correlation and Regression
07 Chi-Square Test
08 Hypothesis Testing
09 Design of Experiment
10 Life Table
11 Sample and Sampling
12 Skewness and Kurtosis
13 Conclusion
14 References
Introduction
Statistics are measurements, enumerations or estimates of natural or social phenomena,
systematically arranged to exhibit their inner relation. To produce financial estimates for a
programme such as Social Security, projections of the population in the coverage area are
needed, and one of the essential components of population projections is a projection of
mortality, which is the subject of this study. Biostatistics matters here because any data
must be studied properly before it can be analyzed. Mortality rates are presented
in this study in the context of life tables, which are commonly used by actuaries and
demographers. The most familiar measure of dependence between two quantities is the
Pearson product-moment correlation coefficient, or "Pearson's correlation coefficient",
commonly called simply "the correlation coefficient". It is obtained by dividing the
covariance of the two variables by the product of their standard deviations. Karl Pearson
developed the coefficient from a similar but slightly different idea by Francis Galton. A life
table is a concise way of showing the probabilities of a member of a particular population
living to or dying at a particular age. In this study, the life tables are used to examine the
mortality changes in the Social Security population over time. As actual data have become
more abundant and more reliable, the use of approximate analytical methods has become less
necessary and less acceptable. Hypothesis testing is a core topic in statistics. In research, a
hypothesis is a proposed explanation of the phenomenon under study. The null hypothesis is
the hypothesis that the researcher sets out to challenge; generally, it represents the current
explanation of the feature being tested. Hypothesis testing comprises the procedures used to
determine which outcomes would lead to rejection of the null hypothesis at a specified level
of significance. This tells the researcher whether the results carry enough evidence against
the conventional wisdom embodied in the null hypothesis. Today, mortality is most
commonly represented in the form of a life table, which
gives probabilities of death within one year at each exact integral age. These probabilities are
generally based on tabulations of deaths in a given population and estimates of the size of that
population. For this study, functions in the life table can be generated from the qx, where qx is
the probability of death within a year of a person aged x. Although a life table does not give
mortality at non-integral ages or for non-integral durations, as can be obtained from a
mathematical formula, acceptable methods for estimating such values are well known. Two
basic types of life tables are presented in this study, period-based tables and cohort-based
tables. Each type of table can be constructed either based on actual population data or on
expected future experience. A period life table is based on, or represents, the mortality
experience of an entire population during a relatively short period of time, usually one to
three years. Life tables based directly on population data are generally constructed as period
life tables because death and population data are most readily available on a time period
basis. Such tables are useful in analyzing changes in the mortality experienced by a
population through time. Cohort life tables based directly on population experience data are
relatively rare, because of the need for data of consistent quality over a very long period of
time. Cohort tables can, however, be readily produced, reflecting mortality rates from a series
of period tables for past years, projections of future mortality, or a combination of the two.
Such tables are superior to period tables for the purpose of projecting a population into the
future when mortality is expected to change over time, and for analyzing the generational
trends in mortality. The entry lx in the life table shows the number of survivors of that birth
cohort at each succeeding exact integral age. Another entry, dx, shows the number of deaths
that would occur between succeeding exact integral ages among members of the cohort. The
entry denoted Lx gives the number of person-years lived between consecutive exact integral
ages x and x+1 and Tx gives the total number of person-years lived beyond each exact
integral age x, by all members of the cohort. The final entry in the life table, ex, gives the
average number of years of life remaining for members of the cohort still alive at exact
integral age x, and is called the life expectancy. Biostatistics is among the most crucial
subjects in the biological sciences: statistics is used throughout the field to judge the
significance of experiments and test results. Statistics is indispensable to planning in the
modern age, which is termed "the age of planning"; almost all over the world, governments
are resorting to planning for economic development, and statistical data and techniques of
statistical analysis are immensely useful in addressing economic problems. Biostatistics is
therefore an indispensable subject to study.
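The life-table columns described above can be generated from a set of death probabilities qx. The sketch below is a minimal Python illustration, using invented qx values rather than real mortality data, and assuming deaths fall on average at mid-year (so Lx = lx − dx/2):

```python
# Build a tiny life table from assumed death probabilities q_x.
# The q_x values are invented for illustration, not real mortality data.

def life_table(qx, radix=100_000):
    """Return rows (x, lx, dx, Lx, Tx, ex) for ages 0..len(qx)-1.

    Assumes deaths fall on average at mid-year (Lx = lx - dx/2) and
    that everyone alive at the last listed age dies during it.
    """
    n = len(qx)
    qx = list(qx[:-1]) + [1.0]            # close the table at the last age
    lx = [float(radix)]                   # survivors to exact age x
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))
    dx = [l * q for l, q in zip(lx, qx)]       # deaths between ages x and x+1
    Lx = [l - d / 2 for l, d in zip(lx, dx)]   # person-years lived in [x, x+1)
    Tx = [sum(Lx[i:]) for i in range(n)]       # person-years lived beyond age x
    ex = [t / l for t, l in zip(Tx, lx)]       # life expectancy at age x
    return list(zip(range(n), lx, dx, Lx, Tx, ex))

rows = life_table([0.010, 0.002, 0.003, 0.005, 1.0])
print(" x        lx       dx        Lx        Tx     ex")
for x, l, d, L, T, e in rows:
    print(f"{x:>2} {l:9.0f} {d:8.0f} {L:9.0f} {T:9.0f} {e:6.2f}")
```

Real tables add refinements (separation factors at age 0, abridged age groups), but the relationships among the columns are exactly those described above.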
Objectives:
1. To know the uses and importance of Biostatistics in the field of veterinary science.
2. To know the research methodology and the process of data analysis.
3. To study the correlation between the variables.
4. To study the significance of different tests, such as the t-test, chi-square test and F-test.
5. To know the detailed procedure of sample and sampling.
General Discussion
Definition of statistics:
Statistics can be defined as the collection, presentation and interpretation of numerical data.
Statistics are numerical statements of facts in any department of enquiry placed in relation to
each other (Bowley). Statistics are measurements, enumerations or estimates of natural or
social phenomena, systematically arranged to exhibit their inner relation (Connor).
Biostatistics:
Biostatistics may be defined as the collection, organization, presentation, analysis and
interpretation of numerical data related to biological science.
Scope and importance of Statistics:
1. Statistics and planning: Statistics is indispensable to planning in the modern age, which
is termed "the age of planning". Almost all over the world, governments are resorting to
planning for economic development.
2. Statistics and economics: Statistical data and techniques of statistical analysis are
immensely useful in dealing with economic problems, such as wages, prices, time series
analysis and demand analysis.
3. Statistics and business: Statistics is an indispensable tool of production control. Business
executives rely more and more on statistical techniques for studying the needs and desires
of their valued customers.
4. Statistics and industry: In industry, statistics is widely used in quality control.
5. Statistics and mathematics: Statistics and mathematics are intimately related; recent
advancements in statistical techniques are the outcome of wide applications of mathematics.
6. Statistics and modern science: In medical science, statistical tools for the collection,
presentation and analysis of observed facts relating to the causes and incidence of diseases,
and to the results of applying various drugs and medicines, are of great importance.
7. Statistics, psychology and education: In education and psychology, statistics has found
wide application, such as determining the reliability and validity of a test, factor
analysis, etc.
8. Statistics and war: In war, the theory of decision functions can be of great assistance to
military personnel in planning "maximum destruction with minimum effort".
Statistics in business and management:
1. Marketing: Statistical analyses are frequently used to provide information for decision-
making in the field of marketing. It is necessary first to find out what can be sold and then to
evolve a suitable strategy, so that the goods reach the ultimate consumer. A skilful analysis
of data on production, purchasing power, manpower, habits of competitors, habits of
consumers and transportation costs should be considered in any attempt to establish a new
market.
2. Production: In the field of production statistical data and method play a very important
role.
3. Finance: Financial organizations discharging their finance function effectively depend
very heavily on statistical analysis of facts and figures.
4. Banking: Banking institutions have found it increasingly necessary to establish research
departments within their organizations for the purpose of gathering and analyzing
information, not only regarding their own business but also regarding the general economic
situation and every segment of business in which they may have an interest.
5. Investment: Statistics greatly assists investors in making clear and valued judgments in
their investment decisions, in selecting securities which are safe and have the best prospects
of yielding a good income.
6. Purchase: The purchase department, in discharging its function, makes use of statistical
data to frame suitable purchase policies, such as what to buy, what quantity to buy, when to
buy, where to buy and from whom to buy.
7. Accounting: Statistical data are also employed in accounting, particularly in the auditing
function, where the techniques of sampling and estimation are frequently used.
8. Control: The management control process combines statistical and accounting methods in
making the overall budget for the coming year, including sales, materials, labour and other
costs, net profits and capital requirements.
Biostatistics and Statistical Programming: Scope International covers all aspects of
statistics. The team is committed to providing the highest-quality services and always works
with the sponsor's goal in mind; flexibility remains a key priority. Services include statistical
advice on study design and protocol preparation, sample size calculations, statistical analysis
plans (SAP), statistical analysis of clinical data including validation using SAS, and
statistical listings and reports.
Thesis Title: “Prevalence of hookworms and Thread worm infections in
human population of tea garden area of Sylhet region”
Research Methodology
Study Area, Time period and populations
The study will be conducted over a period of 12 months, from June 2014 to May 2015. To
investigate helminth parasites, faeces will be randomly collected from people in different tea
garden areas of the Sylhet region. The tea garden dwellers are predominantly tea garden
workers and traders, but some of them engage in producing vegetables and rearing cattle,
goats and chickens. Housing facilities are very poor. There is electricity, but potable water
and sanitary toilet facilities are lacking in the tea garden areas of the Sylhet region.
Collection of stool specimens
Fresh stool specimens will be collected from 360 tea garden workers after they have been
properly informed. The stool samples will be collected into dry, clean, transparent, screw-cap
universal bottles. The specimens will be processed on the day of collection at the
Parasitology Laboratory, Sylhet Agricultural University, Sylhet.
Macroscopy
Stool samples will be examined visually, within transparent containers, for consistency and
for the presence of blood, mucus or adult worms.
Microscopy
A pea-sized quantity of formed stool specimen will be placed in a clean universal bottle and
homogenized with a few drops of normal saline. Direct saline and iodine smears will be
made on clean slides and examined with the 10x and 40x objective lenses for parasites. A
sedimentation technique will be employed to concentrate all the stool specimens.
Culture of worm’s larvae
Stool samples previously confirmed by microscopy to carry hookworm eggs will be
cultured to recover larvae using the Harada–Mori technique (Rai et al., 1997). Wear gloves
when performing this procedure. Cut a narrow (3/8 by 5 inch) strip of filter paper, and taper
it slightly at one end. Smear 0.5 to 1 g of faeces in the centre of the strip. Add 3 to 4 ml of
distilled water to a 15 ml conical centrifuge tube. Insert the filter paper strip into the tube so
that the tapered end is near the bottom of the tube. The water level should be slightly
(0.5 inch) below the faecal spot. It is not necessary to cap the tube; however, a cork stopper
or a cotton plug may be used. The tube will be allowed to stand upright in a rack at 25 to
28°C. Distilled water will be added to maintain the original level (usually evaporation takes
place over the first 2 days, but then the culture becomes stabilized). The tube will be kept for
10 days and checked daily by withdrawing a small amount of fluid from the bottom of the
tube. Prepare a smear on a glass slide, cover with a cover slip, and examine with the 10x
objective. Examine the larvae for motility and typical morphological features to reveal
whether hookworm or Strongyloides larvae are present. Filariform larvae of
Ancylostoma duodenale and Necator americanus are characterized by a blunt head and tail,
with no gap between the oesophagus and intestine. Whereas the oesophagus of the larvae of
Ancylostoma duodenale does not end in a thistle-funnel shape, that of the larvae of
Necator americanus does. This is the main feature used to differentiate between the two
species (Okolie, 2007).
Statistical analysis: The collected data will be entered, stored and coded using Microsoft
Excel 2008. Statistical tests will be chosen depending on the findings. Association between
the variables will be assessed using the statistical software SPSS.
Time Frame:
Serial No. Name of Activity Duration
01 Collection of required instruments and laboratory set-up 1 month
02 Sample collection, preparation, culture and examination 10 months
03 Thesis writing and presentation 1 month
Correlation and Regression
The goal of a correlation analysis is to see whether two measurement variables covary, and
to quantify the strength of the relationship between the variables, whereas regression
expresses the relationship in the form of an equation. For example, in students taking a Maths
and English test, we could use correlation to determine whether students who are good at
Maths tend to be good at English as well, and regression to determine whether the marks in
English can be predicted for given marks in Maths.
Use of Correlation
We can use the correlation coefficient, such as the Pearson Product Moment Correlation
Coefficient, to test if there is a linear relationship between the variables. To quantify the
strength of the relationship, we can calculate the correlation coefficient (r). Its numerical
value ranges from +1.0 to −1.0. r > 0 indicates a positive linear relationship, r < 0 indicates a
negative linear relationship, while r = 0 indicates no linear relationship.
Use of Regression
In regression analysis, the problem of interest is the nature of the relationship itself between
the dependent variable (response) and the (explanatory) independent variable. The analysis
consists of choosing and fitting an appropriate model, done by the method of least squares,
with a view to exploiting the relationship between the variables to help estimate the expected
response for a given value of the independent variable. For example, if we are interested in
the effect of age on height, then by fitting a regression line, we can predict the height for a
given age.
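As an illustration of the least-squares fitting just described, the short Python sketch below fits a line to invented age-height data and predicts the height at a new age (none of these numbers come from the assignment's own data):

```python
# Fit a simple least-squares regression line y = a + b*x, then predict.
# The age/height pairs are invented for illustration.

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: covariance over variance of x
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

ages = [2, 4, 6, 8, 10]                 # years (hypothetical)
heights = [86, 101, 114, 126, 137]      # cm (hypothetical)
a, b = fit_line(ages, heights)
print(f"height = {a:.1f} + {b:.2f} * age")
print(f"predicted height at age 7: {a + b * 7:.1f} cm")
```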
Uses of Correlation and Regression
There are three main uses for correlation and regression.
• One is to test hypotheses about cause-and-effect relationships. In this case, the
experimenter determines the values of the X-variable and sees whether variation in X
causes variation in Y. For example, giving people different amounts of a drug and
measuring their blood pressure.
• The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case,
neither variable is determined by the experimenter; both are naturally variable. If an
association is found, the inference is that variation in X may cause variation in Y, or
variation in Y may cause variation in X, or variation in some other factor may affect
both X and Y.
• The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable. The linear correlation
coefficient is the ratio between the covariance and the product of standard deviations
of both variables.
The linear correlation coefficient is denoted by the letter r.
Properties of the Correlation Coefficient:
1. The correlation coefficient does not change the measurement scale. That is, if the
height is expressed in meters or feet, the correlation coefficient does not change.
2. The sign of the correlation coefficient is the same as the covariance.
3. The linear correlation coefficient is a real number between −1 and 1.
4. −1 ≤ r ≤ 1
5. If the linear correlation coefficient takes values closer to −1, the correlation is strong
and negative, and will become stronger the closer r approaches −1.
6. If the linear correlation coefficient takes values close to 1 the correlation is strong and
positive, and will become stronger the closer r approaches 1
7. If the linear correlation coefficient takes values close to 0, the correlation is weak.
8. If r = 1 or r = −1, there is perfect correlation and the line on the scatter plot is
increasing or decreasing respectively.
9. If r = 0, there is no linear correlation.
Example: The scores of 12 students in their mathematics and physics classes are:
Mathematics 2 3 4 4 5 6 6 7 7 8 10 10
Physics 1 3 2 4 4 4 6 4 6 7 9 10
Find the correlation coefficient and interpret it.
1º Find the arithmetic means.
2º Calculate the covariance.
3º Calculate the standard deviations.
4º Apply the formula for the linear correlation coefficient.
xi yi xi·yi xi² yi²
2 1 2 4 1
3 3 9 9 9
4 2 8 16 4
4 4 16 16 16
5 4 20 25 16
6 4 24 36 16
6 6 36 36 36
7 4 28 49 16
7 6 42 49 36
8 7 56 64 49
10 9 90 100 81
10 10 100 100 100
Σ 72 60 431 504 380
From the totals: x̄ = 72/12 = 6 and ȳ = 60/12 = 5; the covariance is
σxy = 431/12 − 6·5 ≈ 5.92; the standard deviations are σx = √(504/12 − 6²) ≈ 2.45 and
σy = √(380/12 − 5²) ≈ 2.58; hence r = 5.92 / (2.45 × 2.58) ≈ 0.94.
The correlation is positive.
As the correlation coefficient is very close to 1, the correlation is very strong.
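The worked example can be checked in Python, applying the same population (divide-by-n) covariance and standard deviations to the raw scores:

```python
# Verify the worked example: Pearson r for the 12 students' scores,
# using population (divide-by-n) moments as in the table above.
from math import sqrt

maths = [2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 10, 10]
physics = [1, 3, 2, 4, 4, 4, 6, 4, 6, 7, 9, 10]
n = len(maths)

mean_x = sum(maths) / n                       # 72/12 = 6
mean_y = sum(physics) / n                     # 60/12 = 5
cov = sum(x * y for x, y in zip(maths, physics)) / n - mean_x * mean_y
sd_x = sqrt(sum(x * x for x in maths) / n - mean_x ** 2)
sd_y = sqrt(sum(y * y for y in physics) / n - mean_y ** 2)

r = cov / (sd_x * sd_y)
print(f"r = {r:.4f}")    # close to 1: a strong positive correlation
```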
The values of the two variables X and Y are distributed according to the following table:
Y/X 0 2 4
1 2 1 3
2 1 4 2
3 2 5 0
Calculate the correlation coefficient.
Turn the double entry table into a single table.
xi yi fi xi·fi xi²·fi yi·fi yi²·fi xi·yi·fi
0 1 2 0 0 2 2 0
0 2 1 0 0 2 4 0
0 3 2 0 0 6 18 0
2 1 1 2 4 1 1 2
2 2 4 8 16 8 16 16
2 3 5 10 20 15 45 30
4 1 3 12 48 3 3 12
4 2 2 8 32 4 8 16
Σ 20 40 120 41 97 76
From the totals: x̄ = 40/20 = 2 and ȳ = 41/20 = 2.05; the covariance is
σxy = 76/20 − 2 × 2.05 = −0.30; the standard deviations are σx = √(120/20 − 2²) ≈ 1.41 and
σy = √(97/20 − 2.05²) ≈ 0.80; hence r = −0.30 / (1.41 × 0.80) ≈ −0.26.
The correlation is negative. As the correlation coefficient is very close to 0, the correlation is
very weak.
Pearson's product-moment coefficient
The most familiar measure of
dependence between two quantities is the Pearson product-moment correlation coefficient, or
"Pearson's correlation coefficient", commonly called simply "the correlation coefficient". It is
obtained by dividing the covariance of the two variables by the product of their standard
deviations. Karl Pearson developed the coefficient from a similar but slightly different idea
by Francis Galton.
The population correlation coefficient ρX,Y between two random variables X and Y with
expected values μX and μY and standard deviations σX and σY is defined as:

ρX,Y = corr(X, Y) = cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY),

where E is the expected value operator, cov means covariance, and corr is a widely used
alternative notation for the correlation coefficient.
The Pearson correlation is defined only if both of the standard deviations are finite and both
of them are nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation
cannot exceed 1 in absolute value. The correlation coefficient is symmetric:
corr(X, Y) = corr(Y, X). The Pearson correlation is +1 in the case of a perfect direct
(increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse)
linear relationship (anti-correlation), and some value between −1 and 1 in all other cases,
indicating
the degree of linear dependence between the variables. As it approaches zero there is less of a
relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the
stronger the correlation between the variables. If the variables are independent, Pearson's
correlation coefficient is 0, but the converse is not true because the correlation coefficient
detects only linear dependencies between two variables. For example, suppose the random
variable X is symmetrically distributed about zero, and Y = X². Then Y is completely
determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they
are uncorrelated. However, in the special case when X and Y are jointly normal,
uncorrelatedness is equivalent to independence.
If we have a series of n measurements of X and Y written as xi and yi where i = 1, 2, ..., n,
then the sample correlation coefficient can be used to estimate the population Pearson
correlation ρ between X and Y. The sample correlation coefficient is written

r = Σ(xi − x̄)(yi − ȳ) / ((n − 1) sx sy),

where x̄ and ȳ are the sample means of X and Y, and sx and sy are the sample standard
deviations of X and Y.

This can also be written as:

r = (n Σxiyi − Σxi Σyi) / (√(n Σxi² − (Σxi)²) √(n Σyi² − (Σyi)²)).

If x and y are results of measurements that contain measurement error, the realistic limits on
the correlation coefficient are not −1 to +1 but a smaller range. For the case of a linear model
with a single independent variable, the coefficient of determination (R squared) is the square
of r, Pearson's product-moment coefficient.
Rank correlation coefficients:
Rank correlation coefficients, such as Spearman's rank correlation coefficient (ρ) and
Kendall's rank correlation coefficient (τ), measure the extent to which, as one variable
increases, the other variable tends to increase, without requiring that increase to be
represented by a linear relationship. If, as the one variable increases, the other decreases, the
rank correlation coefficients will be negative. It is common to regard these rank correlation
coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of
calculation or to make the coefficient less sensitive to non-normality in distributions.
However, this view has little mathematical basis, as rank correlation coefficients measure a
different type of relationship than the Pearson product-moment correlation coefficient, and
are best seen as measures of a different type of association, rather than as an alternative
measure of the population correlation coefficient.
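The distinction can be seen in a few lines of Python using SciPy (assumed available): for a monotone but non-linear relationship such as y = x³, both rank coefficients equal 1 exactly, while Pearson's r does not.

```python
# Rank correlations measure monotone association, not linearity.
# For y = x**3 on distinct x values the ranks agree perfectly, so
# Spearman's rho and Kendall's tau are both 1, while Pearson's r is not.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

rho, _ = stats.spearmanr(x, y)
tau, _ = stats.kendalltau(x, y)
r, _ = stats.pearsonr(x, y)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}, Pearson r = {r:.3f}")
```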
Partial correlation
If a population or data-set is characterized by more than two variables, a partial correlation
coefficient measures the strength of dependence between a pair of variables that is not
accounted for by the way in which they both change in response to variations in a selected
subset of the other variables.
Correlation Coefficient
Statistics is a branch of mathematics that deals with numeric data. It is basically concerned
with the collection, manipulation, management, organization and analysis of numeric data. One
of the most important concepts that we come across in statistics is a correlation. Correlation
indicates the statistical dependence between two random variables. It is the measure of the
relationship between two sets of data.
For example - Correlation between height and weight of students of a particular standard
refers to the overall relationship between their height and weight that in what manner they
vary. Correlation is of three types -
(1) Positive Correlation - When the values of one variable increase as those of another
increase.
(2) Negative Correlation - When the values of one variable decrease as those of another
increase, or vice versa.
(3) No Correlation - When there is no impact on one variable with an increase or decrease in
the values of another variable.
The correlation between two variables is a number which is known as a correlation
coefficient. Correlation Coefficient is a statistical concept, which helps in establishing a
relation between the predicted and actual values obtained in a statistical experiment. The
calculated value of the correlation coefficient explains the exactness between the predicted
and actual values. Its value always lies between −1 and +1. If the value of the correlation
coefficient is positive, it indicates a direct relation between the two variables.
Correlation Coefficient
Correlation can be defined as the degree of relationship between two variables. It requires
pairs of values to be available for each observation on the two variables. In a two-
dimensional plot, the variables can be arbitrarily labelled as X and Y, where X usually
denotes the independent variable, which is used for prediction, and Y the dependent variable,
whose value is predicted. The correlation coefficient is sometimes also called the cross-
correlation coefficient. Correlation is a technique which shows whether, and how strongly,
pairs of variables are related.
Correlation Coefficient Formula
The formula for the simple correlation coefficient is given below. If x and y are the two
variables under discussion, the correlation coefficient r can be calculated as

r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² Σ(yi − ȳ)²)
Intra-class Correlation Coefficient
The intra-class correlation is commonly used to quantify the degree to which individuals
with a fixed degree of relatedness resemble each other. The correlation coefficient is a
measure that determines the degree to which the movements of two variables are associated.
Correlation coefficients are very sensitive to sample size, and a correlation coefficient should
be interpreted in relation to the size of the sample from which it was obtained. With a
sufficient increase in sample size, almost any observed correlation value will be statistically
significant, even if it is so small as to be a meaningless indicator of association. The intra-
class correlation coefficient is a reliability coefficient calculated with variance estimates
obtained through analysis of variance. The intra-class correlation coefficient can be used for
two or more ratings.
Multiple Correlation Coefficient
The sample multiple correlation coefficient, R, is a measure of the strength of the association
between the independent variables and the one dependent variable. Multiple correlation is a
measure of how well a given variable can be predicted using a linear function of a set of other
variables. It can be measured via the coefficient of determination, under particular
assumptions. R can take any value from 0 to +1. The multiple correlation coefficient measures
the correlation between a dependent variable and the combined effect of other designated
variables in the system.
Partial Correlation Coefficient
A partial correlation coefficient is a measure of the linear dependence of a pair of random
variables from a collection of random variables in the case where the influence of the
remaining variables is eliminated. A partial correlation between two variables can differ
substantially from their simple correlation. Sometimes the correlation between two variables
X and Y may be partly due to the correlation of third variables, Z with both X and Y. This
correlation is called the partial correlation and the correlation coefficient between X and Y
after the linear effect of Z on each of them has been eliminated is called the partial correlation
coefficient. A partial correlation coefficient can be written in terms of simple correlation
coefficients:

rXY.Z = (rXY − rXZ rYZ) / √((1 − rXZ²)(1 − rYZ²))
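This expression of a first-order partial correlation through the three simple correlations is easy to compute directly; in the Python sketch below, the input correlations are invented for illustration:

```python
# First-order partial correlation of X and Y, controlling for Z,
# computed from the three simple correlations. Inputs are invented.
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# X and Y appear moderately correlated, but much of that is due to Z:
r = partial_corr(r_xy=0.60, r_xz=0.70, r_yz=0.70)
print(f"partial correlation = {r:.3f}")   # noticeably smaller than 0.60
```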
Population Correlation Coefficient
The population correlation coefficient ρ measures the degree of association between two
variables in the population of interest. It is estimated by the sample correlation coefficient,
which is often reported together with a confidence interval.
Linear Regression Coefficient
Regression measures the amount of average relationship or mathematical relationship
between two variables in terms of the original units of the data. Linear regression can be
measured using lines of regression, and curvilinear regression can be measured using the
correlation ratio. In linear regression, the coefficient of determination, R², is equal to the
square of the correlation coefficient, i.e., R² = r².
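This identity is easy to confirm numerically. The sketch below, using invented data, fits the least-squares line, computes R² as 1 − SSres/SStot, and compares it with r²:

```python
# Numerical check that, in simple linear regression, R-squared = r**2.
# The five data points are invented for illustration.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)                 # Pearson correlation
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_res / syy              # coefficient of determination

print(f"r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")   # both 0.8100
```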
Chi-Square Test
Chi-square is a statistical test commonly used to compare observed data with data we would
expect to obtain according to a specific hypothesis. For example, if, according to Mendel's
laws, you expected 10 of 20 offspring from a cross to be male and the actual observed
number was 8 males, then you might want to know about the "goodness of fit" between the
observed and expected. Were the deviations (differences between observed and expected) the
result of chance, or were they due to other factors? How much deviation can occur before
you, the investigator, must conclude that something other than chance is at work, causing the
observed to differ from the expected? The chi-square test always tests what scientists call
the null hypothesis, which states that there is no significant difference between the expected
and observed result.
The test statistic is

χ² = Σ (o − e)² / e,

that is, chi-square is the sum of the squared difference between observed (o) and expected
(e) data (the deviation, d), divided by the expected data, over all possible categories.
For example, suppose that a cross between two pea plants yields a population of 880 plants,
639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of
the parents. Your hypothesis is that the allele for green is dominant to the allele for yellow
and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then
the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws) as
predicted from the results of the Punnett square.
Chi-square requires numerical values, not percentages or ratios:
1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the
number of categories in the problem minus 1. In our example, there are two categories
(green and yellow); therefore, there is 1 degree of freedom.
2. Determine a relative standard to serve as the basis for accepting or rejecting the
hypothesis. The relative standard commonly used in biological research is p >0.05.
The p value is the probability that the deviation of the observed from that expected is
due to chance alone (no other forces acting). In this case, using p>0.05, you would
expect any deviation to be due to chance alone 5% of the time or less.
3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of
freedom, locate the value closest to your calculated chi-square in the table. Determine
the closest p (probability) value associated with your chi-square and degrees of
freedom. In this case (χ² = 2.668), the p value is about 0.10, which means that there is a
10% probability that any deviation from expected results is due to chance only. Based
on our standard p > 0.05, this is within the range of acceptable deviation. In terms of
your hypothesis for this example, the observed chi-square is not significantly different
from expected. The observed numbers are consistent with those expected under
Mendel's law.
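The pea-plant example can be reproduced with `scipy.stats.chisquare` (a sketch, assuming SciPy is available); the exact statistic is about 2.673, slightly different from the 2.668 quoted in the worked example because of intermediate rounding:

```python
from scipy.stats import chisquare

observed = [639, 241]                      # green, yellow seeds
total = sum(observed)                      # 880 plants
expected = [total * 3 / 4, total / 4]      # 3:1 ratio -> 660 and 220

chi2, p = chisquare(observed, f_exp=expected)
print(round(chi2, 3), round(p, 3))         # 2.673 0.102
```

Since p ≈ 0.10 exceeds 0.05, the deviation from the 3:1 ratio is within the range attributable to chance alone, matching the conclusion above.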
Step-by-Step Procedure for Testing Hypothesis and Calculating Chi-Square:
1. State the hypothesis being tested and the predicted results. Gather the data by
conducting the proper experiment (or, if working genetics problems, use the data
provided in the problem).
2. Determine the expected numbers for each observational class. Remember to use
numbers, not percentages. Chi-square should not be calculated if the expected value in
any category is less than 5.
3. Calculate chi square using the formula. Complete all calculations to three significant
digits. Round off your answer to two significant digits.
4. Use the chi-square distribution table to determine significance of the value.
5. Determine degrees of freedom and locate the value in the appropriate column.
6. Locate the value closest to your calculated χ² on that degrees of freedom (df) row.
7. Move up the column to determine the p value.
8. State your conclusion in terms of your hypothesis.
9. If the p value for the calculated χ² is p > 0.05, accept your hypothesis. The deviation
is small enough that chance alone accounts for it. A p value of 0.6, for example,
means that there is a 60% probability that any deviation from expected is due to
chance only. This is within the range of acceptable deviation.
10. If the p value for the calculated χ² is p < 0.05, reject your hypothesis, and conclude
that some factor other than chance is operating for the deviation to be so great. For
example, a p value of 0.01 means that there is only a 1% chance that this deviation is
due to chance alone. Therefore, other factors must be involved.
11. The chi-square test will be used to test for the "goodness of fit" between observed and
expected data from several laboratory investigations in this lab manual.
Chi-Square Test of Homogeneity
This lesson explains how to conduct a chi-square test of homogeneity. The test is applied to a
single categorical variable from two different populations. It is used to determine whether
frequency counts are distributed identically across different populations. For example, in a
survey of TV viewing preferences, we might ask respondents to identify their favorite
program. We might ask the same question of two different populations, such as males and
females. We could use a chi-square test for homogeneity to determine whether male viewing
preferences differed significantly from female viewing preferences. The sample problem at
the end of the lesson considers this example. The test procedure described in this lesson is
appropriate when the following conditions are met:
1. For each population, the sampling method is simple random sampling.
2. Each population is at least 10 times as large as its respective sample.
3. The variable under study is categorical.
4. If sample data are displayed in a contingency table (Populations x Category
levels), the expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan,
(3) analyze sample data, and (4) interpret results.
State the Hypotheses: Every hypothesis test requires the analyst to state a null hypothesis
and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually
exclusive. That is, if one is true, the other must be false, and vice versa. Suppose that data
were sampled from r populations, and assume that the categorical variable had c levels. At
any specified level of the categorical variable, the null hypothesis states that each population
has the same proportion of observations. Thus, the alternative hypothesis (Ha) is that at least
one of the null hypothesis statements is false.
Analyze Sample Data
Using sample data from the contingency tables, find the degrees of freedom, expected
frequency counts, test statistic, and the P-value associated with the test statistic. The analysis
described in this section is illustrated in the sample problem at the end of this lesson.
1. Degrees of freedom. The degrees of freedom (DF) is equal to
DF = (r - 1) * (c - 1) , where r is the number of populations, and c is the number of
levels for the categorical variable.
2. Expected frequency counts. The expected frequency counts are computed
separately for each population at each level of the categorical variable, according to
the following formula. Er,c = (nr * nc) / n , where Er,c is the expected frequency count
for population r at level c of the categorical variable, nr is the total number of
observations from population r, nc is the total number of observations at treatment
level c, and n is the total sample size.
3. Test statistic. The test statistic is a chi-square random variable (χ²) defined by the
following equation. χ² = Σ [ (Or,c - Er,c)² / Er,c ] , where Or,c is the observed
frequency count in population r for level c of the categorical variable, and Er,c is the
expected frequency count in population r for level c of the categorical variable.
4. P-value. The P-value is the probability of observing a sample statistic as extreme as
the test statistic. Since the test statistic is a chi-square, use the Chi-Square
Distribution Calculator to assess the probability associated with the test statistic.
Use the degrees of freedom computed above.
Interpret Results: If the sample findings are unlikely, given the null hypothesis, the
researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the
significance level, and rejecting the null hypothesis when the P-value is less than the
significance level.
Test Your Understanding of This Lesson
Problem
In a study of the television viewing habits of children, a developmental psychologist selects a
random sample of 300 first graders - 100 boys and 200 girls. Each child is asked which of the
following TV programs they like best: The Lone Ranger, Sesame Street, or The Simpsons.
Results are shown in the contingency table below.
Viewing Preferences
                Lone Ranger   Sesame Street   The Simpsons   Row total
Boys                 50             30              20           100
Girls                50             80              70           200
Column total        100            110              90           300
Do the boys' preferences for these TV programs differ significantly from the girls'
preferences? Use a 0.05 level of significance.
Solution
The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an
analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps
below:
 State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
o Null hypothesis: The null hypothesis states that the proportion of boys who
prefer the Lone Ranger is identical to the proportion of girls. Similarly, for the
other programs.
o Alternative hypothesis: At least one of the null hypothesis statements is false.
 Formulate an analysis plan. For this analysis, the significance level is 0.05. Using
sample data, we will conduct a chi-square test for homogeneity.
 Analyze sample data. Applying the chi-square test for homogeneity to the sample data,
we compute the degrees of freedom, the expected frequency counts, and the chi-square
test statistic, and from these we determine the P-value:
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n, e.g., E(Boys, Lone Ranger) = (100 * 100) / 300 = 33.33
χ² = Σ [ (Or,c - Er,c)² / Er,c ] ≈ 19.32
where DF is the degrees of freedom, r is the number of populations, c is the number of
levels of the categorical variable, nr is the number of observations from population r,
nc is the number of observations from level c of the categorical variable, n is the
number of observations in the sample, Er,c is the expected frequency count in
population r for level c, and Or,c is the observed frequency count in population r for
level c. The P-value is the probability that a chi-square statistic having 2 degrees of
freedom is more extreme than 19.32; this probability is well below 0.05, so the null
hypothesis is rejected.
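As a sketch (assuming SciPy is available), the whole analysis reduces to one call; `scipy.stats.chi2_contingency` returns the statistic, the P-value, the degrees of freedom and the table of expected counts:

```python
from scipy.stats import chi2_contingency

# Rows: boys, girls; columns: Lone Ranger, Sesame Street, The Simpsons
observed = [[50, 30, 20],
            [50, 80, 70]]

chi2, p, df, expected = chi2_contingency(observed)
print(df)                 # 2 degrees of freedom
print(round(chi2, 2))     # 19.32
print(p < 0.05)           # True -> reject the null hypothesis
```

Note that `chi2_contingency` applies a continuity correction only for 2x2 tables, so here the uncorrected statistic is returned.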
Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to
evaluate how likely it is that any observed difference between the sets arose by chance. It is
suitable for unpaired data from large samples.[1] It is the most widely used of many
chi-squared tests (Yates, likelihood ratio, portmanteau test in time series, etc.), that is,
statistical procedures whose results are evaluated by reference to the chi-squared
distribution. Its properties were first investigated by Karl Pearson in 1900.[2] In contexts
where it is important to distinguish between the test statistic and its distribution, names
similar to Pearson χ-squared test or statistic are used. It tests a null hypothesis stating that the
frequency distribution of certain events observed in a sample is consistent with a particular
theoretical distribution. The events considered must be mutually exclusive and have total
probability 1. A common case for this is where the events each cover an outcome of a
categorical variable. A simple example is the hypothesis that an ordinary six-sided die is
"fair", i.e., all six outcomes are equally likely to occur.
Pearson's chi-squared test is used to assess two types of comparison: tests of goodness of fit
and tests of independence.
1. A test of goodness of fit establishes whether or not an observed frequency distribution
differs from a theoretical distribution.
2. A test of independence assesses whether paired observations on two variables,
expressed in a contingency table, are independent of each other (e.g. polling responses
from people of different nationalities to see if one's nationality is related to the
response).
The procedure of the test includes the following steps:
1. Calculate the chi-squared test statistic, χ², which resembles a normalized sum of
squared deviations between observed and theoretical frequencies (see below).
2. Determine the degrees of freedom, df, of that statistic, which is essentially the number
of frequencies reduced by the number of parameters of the fitted distribution.
3. Compare χ² to the critical value from the chi-squared distribution with df degrees of
freedom, which in many cases gives a good approximation of the distribution of χ².
Test of independence
In this case, an "observation" consists of the values of two outcomes and the null hypothesis
is that the occurrence of these outcomes is statistically independent. Each observation is
allocated to one cell of a two-dimensional array of cells (called a contingency table)
according to the values of the two outcomes. If there are r rows and c columns in the table,
the "theoretical frequency" for a cell, given the hypothesis of independence, is
Ei,j = (row total i × column total j) / N
where N is the total sample size (the sum of all cells in the table). The term "frequencies"
here does not refer to already normalised values.
The value of the test-statistic is
χ² = Σ Σ (Oi,j − Ei,j)² / Ei,j
Fitting the model of "independence" reduces the number of degrees of freedom by
p = r + c − 1. The number of degrees of freedom is equal to the number of cells rc, minus the
reduction in degrees of freedom, p, which reduces to (r − 1)(c − 1). For the test of
independence, also known as the test of homogeneity, a chi-squared probability of less than
or equal to 0.05 (or the chi-squared statistic being at or larger than the 0.05 critical point) is
commonly interpreted by applied workers as justification for rejecting the null hypothesis that
the row variable is independent of the column variable.[3]
The alternative hypothesis
corresponds to the variables having an association or relationship where the structure of this
relationship is not specified. A test that relies on different assumptions is Fisher's exact test; if
its assumption of fixed marginal distributions is met it is substantially more accurate in
obtaining a significance level, especially with few observations. In the vast majority of
applications this assumption will not be met, and Fisher's exact test will be over conservative
and not have correct coverage.
Goodness of fit
In this context, the frequencies of both theoretical and empirical distributions are un-
normalised counts, and for a chi-squared test the total sample sizes of both these
distributions (sums of all cells of the corresponding contingency tables) have to be the same.
For example, to test the hypothesis that a random sample of 100 people has been drawn from
a population in which men and women are equal in frequency, the observed number of men
and women would be compared to the theoretical frequencies of 50 men and 50 women. If
there were 44 men in the sample and 56 women, then
χ² = (44 − 50)²/50 + (56 − 50)²/50 = 0.72 + 0.72 = 1.44
If the null hypothesis is true (i.e., men and women are chosen with equal probability), the test
statistic will be drawn from a chi-squared distribution with one degree of freedom (because if
the male frequency is known, then the female frequency is determined).
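This arithmetic is easy to verify; the sketch below computes the statistic by hand and looks up the upper-tail probability (assuming SciPy is available):

```python
from scipy.stats import chi2 as chi2_dist

observed = [44, 56]   # men, women in the sample
expected = [50, 50]   # equal-frequency hypothesis

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = chi2_dist.sf(stat, df=1)     # upper-tail probability, 1 degree of freedom
print(stat, round(p, 2))         # 1.44 0.23
```

Since p ≈ 0.23 is well above 0.05, the observed 44/56 split is consistent with equal frequencies of men and women.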
Hypothesis Testing
In statistics, during a statistical survey or a research study, a hypothesis has to be set and
defined. It is termed a statistical hypothesis. It is actually an assumption about the population
parameter, though it is by no means certain that this hypothesis will prove to be true.
Hypothesis testing refers to the predefined formal procedures used by statisticians to decide
whether to accept or reject the hypotheses. Hypothesis testing is defined as the process of
choosing hypotheses for a particular probability distribution, on the basis of observed data.
Hypothesis testing is a core and important topic in statistics. In the research hypothesis
testing, a hypothesis is an optional but important detail of the phenomenon. The null
hypothesis is defined as a hypothesis that is aimed to challenge a researcher. Generally, the
null hypothesis represents the current explanation or the vision of a feature which the
researcher is going to test. Hypothesis testing includes the tests that are used to determine the
outcomes that would lead to the rejection of a null hypothesis in order to get a specified level
of significance.
Hypothesis testing is utilized in the context of a research study. A hypothesis test is used to
evaluate and analyze the results of the research study. Let us learn more about this topic.
Hypothesis Testing
Hypothesis testing is one of the most important concepts in statistics. A statistical hypothesis
is an assumption about a population parameter. This assumption may or may not be true. The
methodology employed by the analyst depends on the nature of the data used and the goals of
the analysis. The goal is to either accept or reject the null hypothesis.
Hypothesis Testing Terms
Given below are some of the terms used in hypothesis testing.
1. Test Statistic: The decision whether to accept or reject the null hypothesis is made based
on this value. The test statistic is a defined formula based on a distribution (t, z, F, etc.). If the
calculated test statistic value is less than the critical value, we accept the hypothesis;
otherwise, we reject the hypothesis.
Hypothesis Testing Formula: The z-test statistic is used for testing the mean of a large
sample. The test statistic is given by
z = (x̄ − μ) / (σ / √n)
where x̄ is the sample mean, μ is the population mean, σ is the population standard
deviation and n is the sample size.
2. Level of Significance
The confidence at which a null hypothesis is accepted or rejected is called the level of
significance. The level of significance is denoted by α.
3. Critical Value
Critical value is the value that divides the regions into two-Acceptance region and rejection
region. If the computed test statistic falls in the rejection region, we reject the hypothesis.
Otherwise, we accept the hypothesis. The critical value depends upon the level of
significance and alternative hypothesis.
4. One Sided or Two Sided Hypothesis
The alternative hypothesis is one sided if the parameter is larger or smaller than the null
hypothesis value. It is two sided when the parameter is different from the null hypothesis
value. The null hypothesis is usually tested against an alternative hypothesis (H1). The
alternative hypothesis can take one of three forms:
1. H1: B1>1, is one-sided alternative hypothesis.
2. H1: B1< 1, also a one-sided alternative hypothesis.
3. H1: B1≠1, is two-sided alternative hypothesis. That is, the true value is either greater
or less than 1.
5. P – Value: The P-value is the probability that the test statistic takes a value as extreme as,
or more extreme than, the observed value, assuming that the null hypothesis is true. In other
words, it is the probability of seeing the observed difference, or a greater one, just by chance
if the null hypothesis is true. The larger the P-value, the smaller the evidence against the null
hypothesis.
Hypothesis testing gives the following benefits:
1. They establish the focus and track for a research effort.
2. Their development helps the researcher shape the purpose of the research movement.
3. They establish which variables will be measured in a study and, similarly, which
will not be.
4. They require the researcher to provide operational definitions of the variables of
interest.
Process of Hypothesis Testing
1. State the hypotheses of interest
2. Select the suitable test statistic
3. State the level of statistical significance
4. State the decision rule for rejecting / not rejecting the null hypothesis
5. Collect the data and complete the needed calculations
6. Decide to reject / not reject the null hypothesis
Errors in Research Testing:
It is common to make two types of errors while drawing conclusions in research:
Type I: When we reject the null hypothesis even though it is actually correct.
Type II: When we fail to reject the null hypothesis even though it is actually incorrect.
Purpose of Hypothesis Testing
Hypothesis testing begins with a hypothesis made about the population parameter. Data are
then collected from an appropriate sample, and the information obtained from the sample is
used to decide how likely it is that the hypothesized population parameter is correct. The
purpose of hypothesis testing is not to question the computed value of the sample statistic,
but to make a judgement about the difference between the sample statistic and a
hypothesized population parameter.
Hypothesis Testing Steps: We illustrate the five steps to hypothesis testing in the context of
testing a specified value for a population proportion. The procedure for hypothesis testing is
given below:
1. Set up a null hypothesis and alternative hypothesis.
2. Decide about the test criterion to be used.
3. Calculate the test statistic using the given values from the sample
4. Find the critical value at the required level of significance and degrees of freedom.
5. Decide whether to accept or reject the hypothesis. If the calculated test statistic value
is less than the critical value, we accept the hypothesis; otherwise, we reject the
hypothesis.
Different Types of Hypothesis:
There are 5 different types of hypothesis as follows:
1. Simple Hypothesis: If a hypothesis specifies the population completely, that is, both the
functional form and the parameters, it is called a simple hypothesis. Example: The hypothesis
"Population is normal with mean 15 and standard deviation 5" is a simple hypothesis.
2. Composite Hypothesis or Multiple Hypothesis: If the hypothesis concerning the
population is not explicitly defined based on the parameters, then it is a composite or
multiple hypothesis. Example: The hypothesis "population is normal with mean 15" is a
composite or multiple hypothesis.
3. Parametric Hypothesis
A hypothesis, which specifies only the parameters of the probability density function, is
called parametric hypothesis. Example: The hypothesis “Mean of the population is 15" is
parametric hypothesis.
4. Non-Parametric Hypothesis: If a hypothesis specifies only the form of the density
function in the population, it is called a non-parametric hypothesis. Example: The hypothesis
"population is normal" is non-parametric.
Null and Alternative Hypothesis: A null hypothesis can be defined as a statistical
hypothesis, which is stated for acceptance. It is the original hypothesis. Any other hypothesis
other than null hypothesis is called Alternative hypothesis. When null hypothesis is rejected
we accept the alternative hypothesis. Null hypothesis is denoted by H0 and alternative
hypothesis is denoted by H1. Example: When we want to test if the population mean is 30,
then null hypothesis is “Population mean is 30'' and alternative Hypothesis is “Population
mean is not 30".
Logic of Hypothesis Testing
The logic underlying the hypothesis testing procedure is as follows:
1. The hypothesis concerns the value of a population parameter.
2. Before selecting a sample, we use the hypothesis to predict the characteristics that the
sample should have.
3. Obtain a random sample from the population.
4. Finally, compare the obtained sample data with the prediction made from the
hypothesis. The hypothesis is reasonable if the sample mean is consistent with the
prediction; otherwise the hypothesis is wrong.
Type I Error and Type II Error
The probability of rejecting the null hypothesis when it is true is called Type I error, whereas
the probability of accepting the null hypothesis when it is false is called Type II error.
Example: Suppose a toy manufacturer and its main supplier agreed that the quality of each
shipment will meet a particular benchmark. Our null hypothesis is that the quality is 90%. If
we reject the shipment, given that the quality is in fact 90%, then we have committed a
Type I error. If we accept the shipment, given that the quality is less than 90%, we have
committed a Type II error.
Power of the Test: Power of a test is defined as the probability that the test will reject the
null hypothesis when the alternative hypothesis is true.
For a fixed level of significance, if we increase the sample size, the probability of Type II
error decreases, which in turn increases the power. So to increase the power, the best method
is to increase the sample size.
1. Only one of the Type I error or the Type II error is possible at a time.
2. The power of a test is defined as 1 minus the probability of type II error. Power =
1−β.
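The effect of sample size on power can be illustrated with a small Monte Carlo sketch; all parameter values here (true mean 21, hypothesized mean 20, σ = 2) are hypothetical:

```python
import random
import statistics

def estimated_power(n, true_mu=21.0, mu0=20.0, sigma=2.0, trials=2000):
    # Fraction of simulated samples in which a two-sided z-test at the 5% level
    # rejects H0: mu = mu0 when the true mean is actually true_mu.
    random.seed(42)
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) >= 1.96:
            rejections += 1
    return rejections / trials

# Power increases with sample size for a fixed significance level
print(estimated_power(10), estimated_power(40))
```

With these made-up numbers the estimated power rises from roughly 0.35 at n = 10 to roughly 0.88 at n = 40, illustrating that increasing the sample size is the standard way to raise power.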
Hypothesis Testing Procedure
There are five important steps in the process of hypothesis testing:
Step 1: Identifying the null hypothesis and alternative hypothesis to be tested.
Step 2: Identifying the test criterion to be used
Step 3: Calculating the test criterion based on the values obtained from the sample
Step 4: Finding the critical value with required level of significance and degrees of freedom
Step 5: Concluding whether to accept or reject the null hypothesis.
Multiple Hypothesis Testing
The problem of multiple hypothesis testing arises when more than one hypothesis is to be
tested simultaneously for statistical significance. Multiple hypothesis testing occurs in a
vast variety of fields and for a variety of purposes. An alternative formulation of multiple
hypothesis testing is the multiple decision problem. When considering multiple testing
problems, the concern is with Type I errors when hypotheses are true and Type II errors
when they are false. The evaluation of the procedures is based on criteria involving a
balance between these errors.
Bayesian Hypothesis Testing
Bayesian hypothesis testing involves specifying a hypothesis and collecting evidence that
supports or does not support the statistical hypothesis. The amount of evidence can be used
to specify the degree of belief in a hypothesis in probabilistic terms. The probability of
supporting a hypothesis can become very high or low. Hypotheses with a high probability
are accepted as true, and those with a low probability are rejected as false. Bayesian
hypothesis testing works just like any other type of Bayesian inference. Consider, for
example, the case where only two hypotheses, H1 and H2, are under consideration.
Level of Significance in Hypothesis Testing
The hypothesis testing follows the following procedure:
 Specify the null and alternative hypotheses
 Specify a value for α
 Collect the sample data and determine the weight of evidence for rejecting the null
hypothesis.
This weight, given in terms of probability, is called the level of significance (p value) of
the statistical test. The level of significance is the probability of obtaining a value of the
test statistic as extreme as, or more extreme than, the actual observed value, assuming that
the null hypothesis is true. If the level of significance is a small value, then the sample data
fail to support the null hypothesis and we reject H0. If the level of significance is a large
value, then we fail to reject the null hypothesis.
Hypothesis Testing Example: Given below are some of the examples on hypothesis testing.
Solved Example
Question: XYZ Company, with a very small turnover, is taking feedback from permanent
employees. During the feedback process, it was found that the average age of XYZ
employees is 20 years. The relevance of the data was verified by taking a random sample of a
hundred workers, and the mean age turned out to be 19 years with a standard deviation of 2
years. Should XYZ continue to make its claim, or should it make changes?
Solution:
1. Specify the hypotheses:
H0: μ = 20 (twenty) years
H1: μ ≠ 20 (twenty) years
2. State the Significance Level: Since the company would like to maintain its present
message to new human resources, XYZ selects a fairly weak significance level (α =
0.05). Because this is a two-tailed analysis, half of the alpha will be assigned to each
tail of the distribution. In this condition the critical values are Z = +1.96 and −1.96.
3. Specify the decision rule: If the calculated value of Z ≥ 1.96 or Z ≤ −1.96, the null
hypothesis will be rejected.
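The remaining step, applying the z formula given earlier to the sample values, can be sketched as:

```python
import math

x_bar, mu0, sigma, n = 19.0, 20.0, 2.0, 100   # values from the example

z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(z)                  # -5.0
print(abs(z) >= 1.96)     # True -> reject H0; the claim should be revised
```

Since −5.0 falls far into the rejection region, the company's claim that the average age is 20 years is not supported by the sample.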
Design of Experiment
The principles of experimental design suggest how an experimenter can best arrange the
treatments and experimental units of a study in relation to the overall design and to each
other.
Principles of design: These are the principles applied to the elements of a design that bring
them together into one design; how they are applied determines how successful the design
will be. There are mainly three principles:
1. Randomization
2. Replication
3. Error control
Randomization: treatments are allocated to the experimental units purely by chance, so that
no treatment is systematically favoured and the estimate of experimental error is unbiased.
Replication: each treatment is applied to more than one experimental unit, which makes it
possible to estimate the experimental error and increases the precision of the treatment
comparisons.
Error control (local control): the experimental units are grouped into homogeneous blocks,
so that variation among units within a block is minimised and the experimental error is
reduced.
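A completely randomized allocation, honouring the randomization and replication principles, can be sketched as follows; the treatment names and replication count are hypothetical:

```python
import random

treatments = ["A", "B", "C"]      # hypothetical treatments
replications = 4                  # each treatment replicated 4 times

# Build the treatment list with equal replication, then randomize the
# allocation across the 12 experimental units.
layout = treatments * replications
random.seed(7)
random.shuffle(layout)
print(layout)                     # randomized order across the 12 units
```

For error control, the shuffle would instead be performed separately within each homogeneous block, giving a randomized complete block design.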
Proportional Hazard Model
Survival models can be viewed as consisting of two parts: the underlying hazard function,
often denoted λ0(t), describing how the risk of event per time unit changes over time at
baseline levels of covariates; and the effect parameters, describing how the hazard varies in
response to explanatory covariates. A typical medical example would include covariates such
as treatment assignment, as well as patient characteristics such as age at start of study,
gender, and the presence of other diseases at start of study, in order to reduce variability
and/or control for confounding. The proportional hazards condition[1] states that covariates
are multiplicatively related to the hazard. In the simplest case of stationary coefficients, for
example, a treatment with a drug may, say, halve a subject's hazard at any given time t, while
the baseline hazard may vary. Note, however, that this does not double the lifetime of the
subject; the precise effect of the covariates on the lifetime depends on the type of λ0(t). Of
course, the covariate is not restricted to binary predictors; in the case of a continuous
covariate x, it is typically assumed that the hazard responds exponentially: each unit
increase in x results in proportional scaling of the hazard. The Cox partial likelihood is
obtained by using Breslow's estimate of the baseline hazard function, plugging it
into the full likelihood and then observing that the result is a product of two factors. The first
factor is the partial likelihood, in which the baseline hazard has "canceled out".
The second factor is free of the regression coefficients and depends on the data only through
the censoring pattern. The effect of covariates estimated by any proportional hazards model
can thus be reported as hazard ratios.
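The multiplicative structure can be seen in a tiny numerical sketch; the baseline hazard and coefficient below are invented for illustration:

```python
import math

def hazard(t, x, beta=0.7):
    # h(t | x) = h0(t) * exp(beta * x), with a hypothetical baseline h0(t)
    baseline = 0.01 * t
    return baseline * math.exp(beta * x)

# The hazard ratio between x = 1 and x = 0 is exp(beta) at every time t,
# even though the baseline hazard itself changes with t.
for t in (1.0, 5.0, 10.0):
    print(round(hazard(t, 1) / hazard(t, 0), 4))   # always ~2.0138
```

This constancy of the ratio over time is exactly the proportional hazards assumption; the hazard ratio exp(β) can be reported without ever estimating the baseline hazard.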
Life Tables
A life table is a concise way of showing the probabilities of a member of a particular
population living to or dying at a particular age. In this study, the life tables are used to
examine the mortality changes in the Social Security population over time. An ideal
representation of human mortality would provide a measure of the rate of death occurring at
specified ages over specified periods of time. In the past, analytical methods (such as the
Gompertz, Makeham, or logistic curves) satisfied this criterion approximately over a broad
range of ages. However, as actual data have become more abundant and more reliable, the
use of approximate analytical methods has become less necessary and less acceptable. Today,
mortality is most commonly represented in the form of a life table, which gives probabilities
of death within one year at each exact integral age. These probabilities are generally based on
tabulations of deaths in a given population and estimates of the size of that population. For
this study, functions in the life table can be generated from the qx, where qx is the probability
of death within a year of a person aged x. Although a life table does not give mortality at non-
integral ages or for non-integral durations, as can be obtained from a mathematical formula,
acceptable methods for estimating such values are well known.
Two basic types of life tables are presented in this study, period-based tables and cohort-
based tables. Each type of table can be constructed either based on actual population data or
on expected future experience. A period life table is based on, or represents, the mortality
experience of an entire population during a relatively short period of time, usually one to
three years. Life tables based directly on population data are generally constructed as period
life tables because death and population data are most readily available on a time period
basis. Such tables are useful in analyzing changes in the mortality experienced by a
population through time. If the experience study is limited to short periods of time, the
resulting rates will be more uniformly representative of the entire period. A cohort, or
generation, life table is based on, or represents, mortality experience over the entire lifetime
of a cohort of persons born during a relatively short period of time, usually one year. Cohort
life tables based directly on population experience data are relatively rare, because of the
need for data of consistent quality over a very long period of time. Cohort tables can,
however, be readily produced, reflecting mortality rates from a series of period tables for past
years, projections of future mortality, or a combination of the two. Such tables are superior to
period tables for the purpose of projecting a population into the future when mortality is
expected to change over time, and for analyzing the generational trends in mortality. A life
table treats the mortality experience upon which it is based as though it represents the
experience of a single birth cohort consisting of 100,000 births who experience, at each age x
of their lives, the probability of death, denoted qx, shown in the table. The entry lx in the life
table shows the number of survivors of that birth cohort at each succeeding exact integral age.
Another entry, dx, shows the number of deaths that would occur between succeeding exact
integral ages among members of the cohort. The entry denoted Lx gives the number of
person-years lived between consecutive exact integral ages x and x+1 and Tx gives the total
number of person-years lived beyond each exact integral age x, by all members of the cohort.
The final entry in the life table, denoted ex, represents the average number of years of life
remaining for members of the cohort still alive at exact integral age x, and is called the life
expectancy.
The lx entry in the life table is also useful for determining the age corresponding to a
specified survival rate from birth, which is defined as the age at which the ratio of lx to
100,000 is equal to a specified value between 0 and 1.
A stationary population is what would result if for each past and future year:
1. The probabilities of death shown in the table are experienced
2. 100,000 births occur uniformly throughout each year
3. The population has no immigration and emigration
A population with these characteristics would have a constant number of persons from year to
year (in fact, at any time) both in their total number and in their number at each age. These
numbers of persons, by age last birthday, are provided in the life table as the Lx values. The lx
entry is interpreted as the number of persons who attain each exact integral age during any
year, and dx is the number of persons who die at each age last birthday during any year. The
entry Tx represents the number of persons who are alive at age last birthday x or older, at any
time.
Construction of Central Death Rates
A. Data Sources
Annual tabulations of numbers of deaths by age and sex are made by the National Center for
Health Statistics (NCHS) based on information supplied by States in the Death Registration
Area, and are published in the volumes of Vital Statistics of the United States. Deaths are
provided by five-year age groups for ages 5 through 84, in total for ages 85 and older, and by
single-year and smaller age intervals for ages 4 and under. One requirement for admission to
the Death Registration Area, which since 1933 has included all the States, the District of
Columbia and the independent registration area of New York City, was a demonstration of
ninety percent completeness of registration.
Life Table Functions
The following are definitions of the standard actuarial functions used in this study to develop
mortality rates based on mid-year population and annual death data.
Dx = the number of deaths at age x last birthday in a population during a year
Px = the number of persons who are age x last birthday in a population at midyear
yMx = the central death rate for the subset of a population that is between exact ages x and x+y
yqx = the probability that a person exact age x will die within y years
The following are the additional definitions of standard life table functions. The table
represents a hypothetical cohort of 100,000 persons born at the same instant who experience
the rate of mortality represented by 1qx, the probability that a person age x will die within one
year, for each age x throughout their lives. The stationary population definitions, that are
given in parentheses, refer to the population size and age distribution that would result if the
rates of mortality represented by 1qx were experienced each year, past and future, for persons
between exact ages x and x+1, and if 100,000 births were to occur uniformly throughout each
year.
The life table functions lx, dx, Lx, Tx, and ex were calculated as follows:
l0 = 100,000
dx = lx · 1qx        x = 0, 1, 2, 3, ...
lx = lx-1 · (1 - 1qx-1)        x = 1, 2, 3, ...
L0 = l0 - 1f0 · d0, where 1f0 is the average fraction of the year not lived by those who die in
their first year of life
Lx = lx - .5 · dx        x = 1, 2, 3, ...
Tx = Lx + Lx+1 + Lx+2 + ... + L148        x = 0, 1, 2, 3, ...
ex = Tx / lx        x = 0, 1, 2, 3, ...
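As a rough illustration, the recurrences above can be coded directly. The qx values and the separation factor f0 below are made-up placeholders, not actual population data, and the toy table is truncated after four ages.

```python
# Build life table columns from illustrative probabilities of death qx.
qx = [0.010, 0.002, 0.001, 1.0]   # toy 4-age table; final qx = 1 closes the cohort
f0 = 0.85                          # assumed fraction of the year not lived by infant deaths

lx = [100_000]                     # radix: l0 = 100,000 births
for q in qx[:-1]:
    lx.append(lx[-1] * (1 - q))            # lx = l(x-1) * (1 - q(x-1))
dx = [l * q for l, q in zip(lx, qx)]       # deaths between exact ages x and x+1
Lx = [lx[0] - f0 * dx[0]]                  # L0 uses the separation factor
Lx += [l - 0.5 * d for l, d in zip(lx[1:], dx[1:])]
Tx = [sum(Lx[i:]) for i in range(len(Lx))] # person-years lived beyond age x
ex = [t / l for t, l in zip(Tx, lx)]       # life expectancy ex = Tx / lx

print(round(ex[0], 2))
```

With these placeholder rates, the toy cohort's life expectancy at birth is short only because the table is cut off after age 3, not because of the rates themselves.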
The fundamental step in constructing a life table from population data is that of developing
probabilities of death, qx, that accurately reflect the underlying pattern of mortality
experienced by the population. The following sections describe the methods used for
developing the rates presented in this actuarial study. These methods, as will be seen, vary
significantly by age. Actual data permit the computation of central death rates, which are then
converted into probabilities of death. Exceptions to this procedure include direct calculation
of probabilities of death at young ages and geometric extrapolation of probabilities of death at
extreme old age, where data is sparse or of questionable quality.
where 5mx = (dx + dx+1 + dx+2 + dx+3 + dx+4) / (Lx + Lx+1 + Lx+2 + Lx+3 + Lx+4)
and 5Mx = (Dx + Dx+1 + Dx+2 + Dx+3 + Dx+4) / (Px + Px+1 + Px+2 + Px+3 + Px+4)
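A minimal sketch of the 5Mx calculation, assuming made-up death counts Dx and midyear populations Px for five consecutive single years of age:

```python
# Central death rate for a 5-year age band:
# 5Mx = (sum of deaths) / (sum of midyear population) over ages x..x+4.
# The counts below are illustrative, not real vital-statistics data.
deaths     = [120, 115, 130, 125, 140]                  # Dx ... D(x+4)
population = [50_000, 49_500, 49_000, 48_800, 48_500]   # Px ... P(x+4)

M5x = sum(deaths) / sum(population)
print(round(M5x, 5))
```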
A number of extremely important developments have contributed to the rapid average rate of
mortality improvement during the twentieth century. These developments include:
 Access to primary medical care for the general population
 Improved healthcare provided to mothers and babies
 Availability of immunizations
 Improvements in motor vehicle safety
 Clean water supply and waste removal
 Safer and more nutritious foods
 Rapid rate of growth in the general standard of living.
Each of these developments is expected to make a substantially smaller contribution to
annual rates of mortality improvement in the future.
Future reductions in mortality will depend upon such factors as:
 Development and application of new diagnostic, surgical and life sustaining
techniques
 Presence of environmental pollutants
 Improvements in exercise and nutrition
 Incidence of violence
 Isolation and treatment of causes of disease
 Emergence of new forms of disease
 Prevalence of cigarette smoking
 Misuse of drugs (including alcohol)
 Extent to which people assume responsibility for their own health
 Education regarding health
 Changes in our conception of the value of life
 Ability and willingness of our society to pay for the development of new treatments
and technologies, and to provide these to the population as a whole.
Taking Demographic Projections
Survival rates are used extensively in demographic projection techniques. Survival rates are
derived from life tables or census data, and are used to calculate the number of people that
will be alive in the future. In many cases, planners can obtain survival rates from a national or
regional statistics office, or from life tables. If survival rates or life tables are not available,
the rates may be computed from a model life table or census data.
Life tables: Life tables are used to measure mortality, survivorship, and the life expectancy
of a population at varying ages. There are several types of life tables. A generation or cohort
life table is a life history of the mortality experiences of an actual cohort of individuals. The
cohort begins at birth and their mortality experiences are recorded through the death of the
last member of that cohort. For example, demographers use the table to trace the mortality
experiences of a cohort or group of individuals born in 1910 and record the mortality
experiences of each member until the last one dies. In most cases, generation life tables are
used to study historic periods.
Current or period life tables
Period life tables are based on the mortality experience of a hypothetical cohort of
newborn babies, usually 100,000 newborns, who are subject to the age-specific
mortality rates on which the table is based. It traces the cohort of newborn babies
throughout their lifetime under the assumption that they are subject to the age-specific
mortality rates of a region or country.
There are two types of current life tables:
 Unabridged, for single years of life
 Abridged, for 5-year cohorts of life
In many countries, life tables are based on an average of age-specific death rates for a 3-year
time period, generally around a census taking. In many cases, the life tables are prepared
every 10 years. For example, a country or state would collect age-specific death rates for
1999, 2000, and 2001. The census for year 2000 would be used for the base population.
Calculating Survival Rates
Life tables are used to calculate survival rates. For population projections, 5-year survival
rates are computed. For estimates of net migration, 10-year survival rates are calculated.
Calculations of survival rates rely on two columns in the life table, Lx and Tx.
Using the abridged life table presented in Table 7-1, calculate 5-year survival rates as shown
in Equation 7-1.
Equation 7-1
5-year survival rate = Lx+5 / Lx
where Lx is the number of person-years lived in the 5-year age group beginning at exact age
x. To calculate a rate to survive women ages 25–29 into the next 5-year age cohort (30–34),
divide the Lx value for ages 30–34 by the Lx value for ages 25–29 in Table 7-1.
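Since Table 7-1 is not reproduced here, the following sketch uses hypothetical Lx person-year values to show the division; only the form of the calculation, not the numbers, comes from the text above.

```python
# 5-year survival rate from the Lx column of an abridged life table:
# rate = (person-years at ages 30-34) / (person-years at ages 25-29).
# These Lx values are illustrative placeholders, not entries from Table 7-1.
L_25_29 = 480_000   # person-years lived between exact ages 25 and 30
L_30_34 = 476_500   # person-years lived between exact ages 30 and 35

survival_rate = L_30_34 / L_25_29
print(round(survival_rate, 4))
```

Multiplying the number of women aged 25–29 by this rate projects how many survive into the 30–34 group five years later.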
Using Model Life Tables
If life tables are not available for a particular country, use model life tables to obtain survival
rates, preferably regional model life tables.
Model life table: A model life table is derived from life tables and mortality experiences of a
number of countries. They are primarily used to assist countries that do not have vital
statistics systems to record deaths. Using regression analysis, the United Nations published its
first set of model life tables in 1955. The tables were based on life tables from 158 countries.
In 1966, Coale and Demeny introduced regional model life tables. The authors used 326 life
tables to develop 200 regional model life tables.
Sample and Sampling
Sample: A sample is a representative part of a population.
Sampling: Sampling is the process of selecting a sample from a population. It is a process
used in statistical analysis in which a predetermined number of observations is taken from a
larger population. The methodology used to sample from a larger population depends on the
type of analysis being performed, but may include simple random sampling, systematic
sampling and observational sampling.
Types of Sampling
Probability Sampling: Laura is a psychologist who is interested in studying whether there is
bias against women in the workforce. So she decides to survey workers to see if they believe
that sexism plays a part at their company.
1. Simple Random Sampling,
2. Stratified Random Sampling,
3. Multi-Stage Sampling
Non-probability Sampling:
Simple Random Sampling: A simple random sample (SRS) of size n is produced by a
scheme which ensures that each subgroup of the population of size n has an equal probability
of being chosen as the sample. There are also a lot of computer programs that allow
researchers to easily generate a simple random sample from a population. For example,
maybe Laura uses a computer program to randomly select women in sales at the top 50
companies in the United States. She inputs all of their names and the number of people she
wants in her sample, and the computer will randomly pick names to be included in the
sample.
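A minimal sketch of drawing an SRS with a computer, much as Laura might; the employee names and frame size are hypothetical stand-ins for her list:

```python
import random

# Simple random sample: every subset of size n has the same chance of selection.
# The sampling frame below is a hypothetical list of 500 employees.
population = [f"employee_{i}" for i in range(1, 501)]

random.seed(42)                        # fixed seed so the draw is reproducible
sample = random.sample(population, 10) # drawn without replacement

print(len(sample), len(set(sample)))   # 10 distinct members
```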
Stratified Random Sampling: Divide the population into "strata". There can be any number
of these. Then choose a simple random sample from each stratum. Combine those into the
overall sample. That is a stratified random sample. (Example: Church A has 600 women and
400 men as members. One way to get a stratified random sample of size 30 is to take an
SRS of 18 women from the 600 women and another SRS of 12 men from the 400 men.) One
popular method of probability sampling is the systematic sampling method, which involves
ordering the population and then choosing every nth person. For example, maybe Laura
orders a list of possible women alphabetically or by height. Then she chooses every 10th
woman or every 8th woman or whatever number she decides on ahead of time. The
advantage of systematic sampling is that the sample is evenly spread across the population.
Imagine that Laura's computer-generated simple random sample picks all short women
and no tall women. And imagine that short women experience more bias than tall women
because people see them as being like little girls. The simple random sample has given
Laura a sample that could have a different outcome from the main population. But if Laura
orders the population by height and chooses every 10th woman, she has a sample that
includes women of all heights and thus is more representative of the population.
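The church example and the every-10th-unit systematic scheme can both be sketched in a few lines; the member lists below are hypothetical stand-ins:

```python
import random

# Stratified random sample matching the church example: an SRS of 18 from the
# 600 women and an SRS of 12 from the 400 men, combined into one sample of 30.
women = [f"woman_{i}" for i in range(600)]
men   = [f"man_{i}" for i in range(400)]

random.seed(1)
stratified = random.sample(women, 18) + random.sample(men, 12)

# Systematic sample: order the frame, pick a random start, then take every 10th unit.
frame = women + men
start = random.randrange(10)
systematic = frame[start::10]

print(len(stratified), len(systematic))
```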
Multi-Stage Sampling: Sometimes the population is too large and scattered for it to be
practical to make a list of the entire population from which to draw an SRS. For instance,
when a polling organization samples US voters, it does not do an SRS. Since voter lists
are compiled by counties, they might first do a sample of the counties and then sample within
the selected counties. This illustrates two stages. In some instances, they might use even more
stages. At each stage, they might do a stratified random sample on sex, race, income level, or
any other useful variable on which they could get information before sampling.
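A rough two-stage sketch of the county-then-voter idea; the county names, sizes, and sample counts are invented for illustration:

```python
import random

# Two-stage sample: first an SRS of counties, then an SRS of voters within each
# selected county. Twenty hypothetical counties of 200 voters each.
random.seed(7)
counties = {f"county_{c}": [f"voter_{c}_{i}" for i in range(200)] for c in range(20)}

stage1 = random.sample(sorted(counties), 5)            # stage 1: 5 of 20 counties
stage2 = [v for c in stage1                            # stage 2: 20 voters per
          for v in random.sample(counties[c], 20)]     # selected county

print(len(stage1), len(stage2))
```

In practice each stage could itself be stratified (by sex, race, income level, and so on), as the text notes.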
Non-probability sampling schemes
1. Voluntary response sampling,
2. Judgement sampling,
3. Convenience sampling
Convenience Sampling
Kiera wants to give her survey to a sample of people in order to learn why Americans feel the
way they do about capital punishment. She and her two research assistants go to a shopping
mall on a Tuesday morning and stop people to ask their opinion on the death penalty and why
they feel that way. Kiera is using the convenience sampling method, which is just what it
sounds like: a researcher selects the sample based on convenience. The subjects selected to be
part of the study's sample are there and are available to be tested. Convenience sampling has
a major problem: the people who are readily available are not necessarily representative of
the population at large. Think about Kiera's study; if she and her research assistants poll the
people at a shopping mall on a Tuesday morning, their sample is limited to subjects who are
at a shopping mall on a Tuesday morning. Anyone with a nine-to-five job (which includes
most adults in America) will be at work, not at the mall, which means that they won't be part
of Kiera's sample. With the problem of non-representativeness, you might be wondering why
researchers would ever use convenience sampling. The answer is in its name: convenience.
Psychologists do this a lot. If they teach at a university, they are most likely doing research
on university students. The truth is, it's not always practical to use a method other than
convenience sampling. Other sampling methods might yield a better sample, but they also
cost more in time and money, so many researchers end up using convenience sampling.
Quota Sampling
For a moment, though, let's say that Kiera and her research assistants are able to go to a mall
at a time when the entire population of American adults is represented. She still has to choose
which people to survey. One way to choose is to use the quota sampling method, which
involves setting quotas based on demographic information but not randomly selecting
subjects for each quota. For example, let's say that Kiera knows that approximately 51% of
U.S. adults are women. She might tell her research assistants to interview 51 women and 49
men, a quota that roughly corresponds to the demographics for the population. However, the
51 women and 49 men are not chosen randomly; her assistants can choose which women and
men to give the survey to. The good thing about quota sampling is that the demographics are
approximately correct for the population, especially if you make quotas for several different
demographic categories. For example, Kiera can set quotas not only for gender but for race,
age, income level, employment status, political party affiliation, or a host of other categories.
The more categories there are, the more likely she will have a sample that represents the
population. Of course, the more categories she specifies quotas for, the more complex, time-
consuming, and costly her study will be. Not to mention the fact that she might not be able to
meet all her quotas. So a researcher has to balance having many categories to create a
representative sample with having just a few to keep the research simple and practical. It's a
balance that each researcher has to decide on.
Judgmental Sampling
Kiera decides to use quota sampling. She gets a representative sample and looks at why
people believe the way they do about the death penalty. But then she decides to follow up
with another, related study that looks at the differences in the death penalty by electric chair
versus lethal injection. She wants to know what the differences are as far as cost, difficulty of
carrying out the execution, and also how the prisoners react to the mode of execution. The
problem is that there aren't all that many people who can answer those questions. The best
source of her answers would be professionals who have administered both methods of
execution. But there are very few of those in the country, so Kiera decides to give the survey
to them all. Kiera is utilizing her own judgment as far as who to include in her sample; in this
case, she is using the judgmental sampling method, also called the purposive sampling
method, which is when the sample is based on the judgment of who the researcher thinks
would be best for the sample. You can remember the names of this method because the
researcher is using her 'judgment' and picking subjects with a 'purpose.' Judgmental sampling
works best when there are a limited number of people with the expertise needed to be a part
of the sample. In Kiera's case, there are only a relatively small number of people who have
administered both the electric chair and lethal injection, so she uses her judgment to choose
them as opposed to Joe Schmoe on the street who can't answer her questions about cost,
difficulty of administration, and how the prisoners react.
Types of samples
The best sampling is probability sampling, because it increases the likelihood of obtaining
samples that are representative of the population.
Probability sample:
Probability samples are selected in such a way as to be representative of the population. They
provide the most valid or credible results because they reflect the characteristics of the
population from which they are selected (e.g., residents of a particular community, students at
an elementary school, etc.). There are two types of probability samples: random and
stratified. A random sample gives each member of the population an equal chance of being
selected. The assumption of an equal chance of selection means that sources such as a
telephone book or voter registration lists are not adequate for providing a random sample of a
community. In both these cases there will be a number of residents whose names are not
listed. Telephone surveys get around this problem by random digit dialing but that assumes
that everyone in the population has a telephone. The key to random selection is that there is
no bias involved in the selection of the sample. Any variation between the sample
characteristics and the population characteristics is only a matter of chance.
Stratified sample
Stratified samples are as good as or better than random samples, but they require a fairly
detailed advance knowledge of the population characteristics, and therefore are more difficult
to construct.
Nonprobability samples (Non-representative samples)
As they are not truly representative, non-probability samples are less desirable than
probability samples. However, a researcher may not be able to obtain a random or stratified
sample, or it may be too expensive. A researcher may not care about generalizing to a larger
population. The validity of non-probability samples can be increased by trying to
approximate random selection, and by eliminating as many sources of bias as possible.
Purposive sample
A subset of a purposive sample is a snowball sample so named because one picks up the
sample along the way, analogous to a snowball accumulating snow. A snowball sample is
achieved by asking a participant to suggest someone else who might be willing or appropriate
for the study. Snowball samples are particularly useful in hard-to-track populations, such as
truants, drug users, etc.
Convenience sample
Non-probability samples are limited with regard to generalization. Because they do not truly
represent a population, we cannot make valid inferences about the larger group from which
they are drawn. Validity can be increased by approximating random selection as much as
possible, and making every attempt to avoid introducing bias into sample selection.
Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability
of a data set. A further characterization of the data includes skewness and kurtosis.
Measure of skewness
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
or data set, is symmetric if it looks the same to the left and right of the center point.
Measures of kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather
rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean
rather than a sharp peak. A uniform distribution would be the extreme case.
Skewness
For univariate data Y1, Y2, ..., YN, the formula for skewness is:
skewness = ∑i=1..N (Yi − Ȳ)³ / [(N − 1)·s³]
where Ȳ is the mean, s is the standard deviation, and N is the number of data points. The
skewness for a normal distribution is zero, and any
symmetric data should have a skewness near zero. Negative values for the skewness indicate
data that are skewed left and positive values for the skewness indicate data that are skewed
right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly,
skewed right means that the right tail is long relative to the left tail. Some measurements have
a lower bound and are skewed right. For example, in reliability studies, failure times cannot
be negative.
Kurtosis
For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:
kurtosis = ∑i=1..N (Yi − Ȳ)⁴ / [(N − 1)·s⁴]
where Ȳ is the mean, s is the standard deviation, and N is the number of data points.
Alternative definition of kurtosis: because the kurtosis of a standard normal distribution is
three, some sources subtract 3 from the value above.
Which definition of kurtosis is used is a matter of convention (this handbook uses the original
definition). When using software to compute the sample kurtosis, you need to be aware of
which convention is being followed. Many sources use the term kurtosis when they are
actually computing "excess kurtosis", so it may not always be clear.
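The two formulas can be checked numerically. The data below are illustrative, and the denominators follow the (N−1)·s³ and (N−1)·s⁴ forms exactly as written above:

```python
import math

# Skewness and kurtosis from the formulas above, using an illustrative sample.
y = [2.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 9.0]
n = len(y)
mean = sum(y) / n
s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))  # sample standard deviation

skewness = sum((v - mean) ** 3 for v in y) / ((n - 1) * s ** 3)
kurtosis = sum((v - mean) ** 4 for v in y) / ((n - 1) * s ** 4)
excess_kurtosis = kurtosis - 3   # the "excess" convention subtracts 3

print(round(skewness, 3), round(kurtosis, 3))
```

The positive skewness here reflects the long right tail created by the value 9; as the text warns, check which kurtosis convention your software reports before comparing values.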
Interpreting skewness and kurtosis
Skewness quantifies how symmetrical the distribution is.
1. A symmetrical distribution has a skewness of zero.
2. An asymmetrical distribution with a long tail to the right (higher values) has a
positive skew.
3. An asymmetrical distribution with a long tail to the left (lower values) has a negative
skew.
4. The skewness is unitless.
5. Any threshold or rule of thumb is arbitrary, but here is one: If the skewness is greater
than 1.0 (or less than -1.0), the skewness is substantial and the distribution is far from
symmetrical.
Kurtosis quantifies whether the shape of the data distribution matches the Gaussian
distribution.
1. A Gaussian distribution has a kurtosis of 0.
2. A flatter distribution has a negative kurtosis,
3. A distribution more peaked than a Gaussian distribution has a positive kurtosis.
4. Kurtosis has no units.
5. The value that Prism reports is sometimes called the excess kurtosis since the
expected kurtosis for a Gaussian distribution is 0.0.
6. An alternative definition of kurtosis is computed by adding 3 to the value reported by
Prism. With this definition, a Gaussian distribution is expected to have a kurtosis of
3.0.
Computation of skewness
Skewness has been defined in multiple ways. The steps below explain the method used by
Prism, called g1 (the most common method).
1. We want to know about symmetry around the sample mean. So the first step is to
subtract the sample mean from each value. The result will be positive for values
greater than the mean, negative for values that are smaller than the mean, and zero for
values that exactly equal the mean.
2. To compute a unitless measure of skewness, divide each of the differences
computed in step 1 by the standard deviation of the values. These ratios (the
difference between each value and the mean divided by the standard deviation) are
called z ratios. By definition, the average of these values is zero and their standard
deviation is 1.
3. For each value, compute z³. Note that cubing values preserves the sign. The cube of a
positive value is still positive, and the cube of a negative value is still negative.
4. Average the list of z³ values by dividing the sum of those values by n-1, where n is the
number of values in the sample. If the distribution is symmetrical, the positive and
negative values will balance each other, and the average will be close to zero. If the
distribution is not symmetrical, the average will be positive if the distribution is
skewed to the right, and negative if skewed to the left. Why n-1 rather than n? For the
same reason that n-1 is used when computing the standard deviation.
5. Correct for bias. The average computed in step 4 is biased with small samples; its
absolute value is smaller than it should be. Correct for the bias by multiplying the
mean of z³ by the ratio n/(n-2). This correction
increases the value if the skewness is positive, and makes the value more negative if
the skewness is negative. With large samples, this correction is trivial. But with small
samples, the correction is substantial.
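The five steps can be sketched as follows; the sample values are invented, and the final bias-corrected result is the g1 statistic described above:

```python
import math

# Step-by-step g1 skewness: center, standardize, cube, average with n-1,
# then apply the small-sample bias correction n/(n-2).
values = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 8.0]
n = len(values)
mean = sum(values) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

z = [(v - mean) / sd for v in values]   # step 2: unitless z ratios
z3 = [zi ** 3 for zi in z]              # step 3: cubing preserves the sign
g1_uncorrected = sum(z3) / (n - 1)      # step 4: average using n-1
g1 = g1_uncorrected * n / (n - 2)       # step 5: bias correction

print(round(g1, 3))
```

With only nine values the correction factor n/(n-2) = 9/7 inflates the estimate noticeably, illustrating why the adjustment matters for small samples.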
Conclusion
Biostatistics is one of the most crucial subjects in the field of biological science. Statistics is
used in this field to judge the significance of experiments and test results. Statistics is also
indispensable to planning in the modern age, which has been termed "the age of planning":
almost all over the world, governments are resorting to planning for economic development,
and statistical data and techniques of statistical analysis are immensely useful for economic
problems such as wages, prices, time series analysis and demand analysis. Statistics is an
indispensable tool of production control; business executives rely more and more on
statistical techniques for studying the needs and desires of their valued customers, and in
industry statistics is widely used in quality control. Recent advancements in statistical
technique are the outcome of wide applications of mathematics. In medical science, statistical
tools for the collection, presentation and analysis of observed facts relating to the causes and
incidence of diseases, and to the results of applying various drugs and medicines, are of great
importance. In education and psychology, statistics has found wide application, for example
in determining the reliability and validity of a test, or in factor analysis. In war, the theory of
decision functions can be of great assistance to military personnel planning "maximum
destruction with minimum effort". From the above discussion it is very clear that the study of
biostatistics is very important in the modern age. To be a good researcher it is very important
to know statistics and biostatistics well. Anyone doing a thesis on biological data must know
how to use biostatistics to analyze the data correctly and to judge whether the findings are
significant or insignificant. Therefore we should study it and apply it in the field thoroughly.
References
Abramowitz, M. and Stegun, I. A. Handbook of Mathematical Functions with Formulas,
Graphs, andMathematical Tables, 9th printing.New York: Dover, p. 928, 1972.
Aitken, Alexander Craig (1957). Statistical Mathematics 8th Edition.Oliver & Boyd.ISBN
9780050013007 (Page 95)
Aldrich, John (1995). "Correlations Genuine and Spurious in Pearson and Yule".Statistical
Science10 (4): 364–376.
Andersen, P.; Gill, R. (1982). "Cox's regression model for counting processes, a large sample
study.".Annals of Statistics10 (4): 1100–1120.
Andrew M. Isserman, "The Right People, the Right Rates," Journal of the American Planning
Association 59.1(1993): 45–64.
Anscombe, Francis J. (1973). "Graphs in statistical analysis".The American Statistician27:
17–21.
Ansley J. Coale, Paul Demeny and Barbara Vaughan, 1983, "Uses of the Tables," Regional
Model Life Tables and Stable Populations, 2nd ed. (New York: Academic Press,
1983) 29–36.
Bagdonavicius, V.; Levuliene, R.; Nikulin, M. (2010). "Goodness-of-fit criteria for the Cox
model from left truncated and right censored data". Journal of Mathematical
Sciences167 (4): 436–443.
Bender, R., Augustin, T. and Blettner, M. (2006).Generating survival times to simulate Cox
proportional hazards models, Statistics in Medicine 2005; 24:1713–1723.
Breslow, N. E.(1975). "Analysis of Survival Data under the Proportional Hazards
Model".International Statistical Review / Revue Internationale de Statistique43 (1):
45–57.
Chernoff, H.; Lehmann, E. L. (1954)."The Use of Maximum Likelihood Estimates in
Tests for Goodness of Fit".The Annals of Mathematical Statistics25 (3): 579–586.
Collett, D. (2003). Modelling Survival Data in Medical Research (2nd ed.).
Cox, D. R. (1997). "Some remarks on the analysis of survival data". the First Seattle
Symposium of Biostatistics: Survival Analysis.
Cox, D. R. (1997). "Some remarks on the analysis of survival data". the First Seattle
Symposium of Biostatistics: Survival Analysis.
Cox, D. R.; Oakes, D. (1984).Analysis of Survival Data. New York: Chapman & Hall.
53
Cox, David R. (1972)."Regression Models and Life-Tables".Journal of the Royal Statistical
Society, Series B34 (2): 187–220.
Critical Values of the Chi-Squared Distribution".e-Handbook of Statistical Methods. National
Institute of Standards and Technology.
Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968) Applied
General Statistics, Pitman. ISBN 9780273403159 (page 625)
Dietrich, Cornelius Frank (1991). Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement (2nd ed.). A. Hilger. ISBN 9780750300605 (p. 331).
Donald J. Bogue, Kenneth Hinze and Michael White, Techniques of Estimating Net
Migration (Chicago: Community and Family Study Center, University of Chicago,
1982).
Dowdy, S. and Wearden, S. (1983). Statistics for Research. Wiley. ISBN 0-471-08602-9, p. 230.
Efron, Bradley (1977). "The Efficiency of Cox's Likelihood Function for Censored Data". Journal of the American Statistical Association 72 (359): 557–565.
Francis, DP; Coats, AJ; Gibson, D (1999). "How high can a correlation coefficient be?". Int J Cardiol 69 (2): 185–199.
George W. Barclay, "The study of mortality," Techniques of Population Analysis (New
York: John Wiley and Sons, 1958) 123–134.
Gosall, Narinder Kaur; Gosall, Gurpal Singh (2012). Doctor's Guide to Critical Appraisal (3rd ed.). Knutsford: PasTest. pp. 129–130.
Greenwood, P. E.; Nikulin, M. S. (1996). A Guide to Chi-Squared Testing. New York: Wiley.
Henry S. Shryock and Jacob S. Siegel, "The Life Table," The Methods and Materials of
Demography (Washington, D.C.: United States Bureau of the Census, 1973).
J. L. Rodgers and W. A. Nicewander. "Thirteen ways to look at the correlation coefficient". The American Statistician, 42 (1): 59–66, February 1988.
James C. Raymondo, "Survival Rates: Census and Life Table Methods," Population
Estimation and Projection (New York: Quorum Books, 1992) 43–60.
Kendall, M. G. (1955) "Rank Correlation Methods", Charles Griffin & Co.
Kendall, W. S.; Barndorff-Nielson, O.; van Lieshout, M. C. Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 100–101, 1962.
Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Pt. 2, 2nd ed. Princeton, NJ: Van
Nostrand, 1951.
Lopez-Paz, D.; Hennig, P.; Schölkopf, B. (2013). "The Randomized Dependence Coefficient". Conference on Neural Information Processing Systems.
Mahdavi Damghani, B. (2013). "The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model". Wilmott Magazine.
Mahdavi Damghani, Babak (2012). "The Misleading Value of Measured Correlation". Wilmott 2012 (1): 64–73.
Martinussen & Scheike (2006). Dynamic Regression Models for Survival Data. Springer.
Nan Laird and Donald Olivier (1981). "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques". Journal of the American Statistical Association 76 (374): 231–240.
Nikolić, D.; Muresan, R. C.; Feng, W.; Singer, W. (2012). "Scaled correlation analysis: a better way to compute a cross-correlogram". European Journal of Neuroscience: 1–21.
P. McCullagh and J. A. Nelder (1989). "Chapter 13: Models for Survival Data". Generalized Linear Models (2nd ed.). Boca Raton, Florida: Chapman & Hall/CRC. ISBN 0-412-31760-5 (first CRC reprint 1999).
Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine, Series 5, 50 (302): 157–175.
Plackett, R. L. (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review (International Statistical Institute) 51 (1): 59–72.
Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. "Moments of a
Distribution: Mean, Variance, Skewness, and So Forth." Numerical Recipes in
FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England:
Cambridge University Press, pp. 604-609, 1992.
Reid, N. (1994). "A Conversation with Sir David Cox". Statistical Science 9 (3): 439–455.
Steve H. Murdock and David R. Ellis, Applied Demography: An Introduction to Basic
Concepts, Methods, and Data (Boulder, CO: Westview Press, 1991).
INDEX

SL. NO.  CONTENTS                      PAGE NO.
01       Acknowledgement               i
02       Objectives                    ii
03       Introduction                  01-02
04       General Discussion            03-04
05       Research Methodology          05-06
06       Correlation and Regression    07-15
07       Chi Square Test               16-23
08       Hypothesis Testing            24-30
09       Design of Experiment          31-32
10       Life Table                    33-39
11       Sample and Sampling           40-44
12       Skewness and Kurtosis         45-47
13       Conclusion                    48
14       References                    49-52
Introduction

Statistics are measurements, enumerations or estimates of natural or social phenomena, systematically arranged to exhibit their inner relations. To produce financial estimates, projections of the population in the Social Security coverage area are needed. One of the essential components of population projections is a projection of mortality, which is the subject of this study. Biostatistics matters here because any data must be understood properly before it can be analyzed. Mortality rates are presented in this study in the context of life tables, which are commonly used by actuaries and demographers.

The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, commonly called simply "the correlation coefficient". It is obtained by dividing the covariance of the two variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.

A life table is a concise way of showing the probabilities of a member of a particular population living to, or dying at, a particular age. In this study, life tables are used to examine mortality changes in the Social Security population over time. As actual data have become more abundant and more reliable, the use of approximate analytical methods has become less necessary and less acceptable.

Hypothesis testing is a core topic in statistics. In research, a hypothesis is a proposed explanation of the phenomenon under study. The null hypothesis is the hypothesis that the researcher aims to challenge; generally, it represents the current explanation or view of the feature being tested. Hypothesis testing comprises the procedures used to determine which outcomes would lead to rejection of the null hypothesis at a specified level of significance.

Today, mortality is most commonly represented in the form of a life table, which gives the probability of death within one year at each exact integral age. These probabilities are generally based on tabulations of deaths in a given population and estimates of the size of that population. For this study, the functions in the life table can be generated from qx, the probability that a person aged x dies within a year. Although a life table does not give mortality at non-integral ages or for non-integral durations, as a mathematical formula would, acceptable methods for estimating such values are well known.

Two basic types of life table are presented in this study: period-based tables and cohort-based tables. Each type can be constructed either from actual population data or from expected future experience. A period life table is based on, or represents, the mortality experience of an entire population during a relatively short period of time, usually one to three years. Life tables based directly on population data are generally constructed as period life tables, because death and population data are most readily available on a time-period basis. Such tables are useful for analyzing changes in the mortality experienced by a population through time. Cohort life tables based directly on population experience data are relatively rare, because of the need for data of consistent quality over a very long period of time. Cohort tables can, however, be readily produced, reflecting mortality rates from a series of period tables for past years, projections of future mortality, or a combination of the two. Such tables are superior to period tables for projecting a population into the future when mortality is expected to change over time, and for analyzing generational trends in mortality.

The entry lx in the life table shows the number of survivors of a birth cohort at each succeeding exact integral age. The entry dx shows the number of deaths that occur between succeeding exact integral ages among members of the cohort. The entry Lx gives the number of person-years lived between consecutive exact integral ages x and x+1, and Tx gives the total number of person-years lived beyond exact integral age x by all members of the cohort. The final entry, ex, represents the average number of years of life remaining for members of the cohort still alive at exact integral age x, and is called the life expectancy.

Biostatistics is among the most crucial subjects in the biological sciences: statistics is used throughout the field to judge the significance of experimental and test results.
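The life-table columns just described can be generated mechanically from the qx values. Below is a minimal Python sketch of that recursion, using the common approximation that deaths fall, on average, at mid-year (Lx = lx − ½dx); the three qx values are invented purely for illustration and are not taken from any Social Security table.

```python
def build_life_table(qx, radix=100_000):
    """Build life-table columns from one-year death probabilities qx.

    Uses the approximation Lx = lx - 0.5 * dx (deaths spread evenly over
    the year); the last listed age closes the table.
    """
    n = len(qx)
    lx = [0.0] * n            # survivors to exact age x
    dx = [0.0] * n            # deaths between ages x and x+1
    lx[0] = float(radix)
    for x in range(n):
        dx[x] = lx[x] * qx[x]
        if x + 1 < n:
            lx[x + 1] = lx[x] - dx[x]
    Lx = [lx[x] - 0.5 * dx[x] for x in range(n)]   # person-years in [x, x+1)
    Tx = [0.0] * n            # person-years lived beyond age x
    running = 0.0
    for x in reversed(range(n)):
        running += Lx[x]
        Tx[x] = running
    ex = [Tx[x] / lx[x] for x in range(n)]         # life expectancy at age x
    return lx, dx, Lx, Tx, ex

# Illustrative three-age table (the final qx = 1.0 closes the table)
lx, dx, Lx, Tx, ex = build_life_table([0.01, 0.02, 1.0])
print(lx)   # [100000.0, 99000.0, 97020.0]
```

The backward accumulation of Tx mirrors its definition as the sum of Lx over all ages at or above x, so ex falls out directly as Tx/lx.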
Statistics is indispensable to planning in the modern age, which is termed "the age of planning". Almost all over the world, governments are resorting to planning for economic development, and statistical data and techniques of statistical analysis have proved immensely useful in addressing economic problems. Biostatistics is therefore an indispensable subject of study.

Objectives:
1. To know the uses and importance of biostatistics in the field of veterinary science.
2. To know the research methodology and the process of data analysis.
3. To study the correlation between variables.
4. To study the significance of different tests, such as the t-test, chi-square test and F-test.
5. To know the detailed procedure of sampling.
General Discussion

Definition of statistics:
Statistics can be defined as the collection, presentation and interpretation of numerical data. "Statistics are numerical statements of facts in any department of enquiry placed in relation to each other" (Bowley). "Statistics are measurements, enumerations or estimates of natural or social phenomena, systematically arranged so as to exhibit their inner relations" (Connor).

Biostatistics:
Biostatistics may be defined as the collection, organization, presentation, analysis and interpretation of numerical data related to biological science.

Scope and importance of statistics:
1. Statistics and planning: Statistics is indispensable to planning in the modern age, which is termed "the age of planning". Almost all over the world, governments are resorting to planning for economic development.
2. Statistics and economics: Statistical data and techniques of statistical analysis have proved immensely useful in addressing economic problems such as wages, prices, time series analysis and demand analysis.
3. Statistics and business: Statistics is an indispensable tool of production control. Business executives are relying more and more on statistical techniques for studying the needs and desires of their valued customers.
4. Statistics and industry: In industry, statistics is widely used in quality control.
5. Statistics and mathematics: Statistics and mathematics are intimately related; recent advancements in statistical techniques are the outcome of wide applications of mathematics.
6. Statistics and modern science: In medical science, the statistical tools for the collection, presentation and analysis of observed facts relating to the causes and incidence of diseases, and to the results of applying various drugs and medicines, are of great importance.
7. Statistics, psychology and education: In education and psychology, statistics has found wide application, such as in determining the reliability and validity of a test, factor analysis, etc.
8. Statistics and war: In war, the theory of decision functions can be of great assistance to military personnel in planning "maximum destruction with minimum effort".
Statistics in business and management:
1. Marketing: Statistical analyses are frequently used to provide information for marketing decisions. It is necessary first to find out what can be sold, and then to evolve a suitable strategy so that the goods reach the ultimate consumer. A skilful analysis of data on production, purchasing power, manpower, habits of competitors, habits of consumers and transportation costs should be considered in any attempt to establish a new market.
2. Production: In the field of production, statistical data and methods play a very important role.
3. Finance: Financial organizations, in discharging their finance function effectively, depend very heavily on statistical analysis of facts and figures.
4. Banking: Banking institutions have found it increasingly necessary to establish research departments within their organizations for the purpose of gathering and analysing information, not only regarding their own business but also regarding the general economic situation and every segment of business in which they may have an interest.
5. Investment: Statistics greatly assists investors in making clear and valued judgments in their investment decisions, in selecting securities which are safe and have the best prospects of yielding a good income.
6. Purchase: The purchase department, in discharging its function, makes use of statistical data to frame suitable purchase policies, such as what to buy, what quantity to buy, when to buy, where to buy and from whom to buy.
7. Accounting: Statistical data are also employed in accounting, particularly in the auditing function, where the techniques of sampling and estimation are frequently used.
8. Control: The management control process combines statistical and accounting methods in making the overall budget for the coming year, including sales, materials, labour and other costs, net profits and capital requirements.
Biostatistics and Statistical Programming:
Scope International covers all aspects of statistics. Our team is committed to providing the highest-quality services and always works with the sponsor's goal in mind; flexibility remains a key priority. Services include statistical advice on study design and protocol preparation, sample size calculations, statistical analysis plans (SAP), statistical analysis of clinical data including validation using SAS, and statistical listings and reports.
Thesis Title: "Prevalence of hookworm and threadworm infections in the human population of the tea garden areas of Sylhet region"

Research Methodology

Study area, time period and populations
The study will be conducted for a period of 12 months, from June 2014 to May 2015. To investigate helminth parasites, faeces will be randomly collected from people of different tea garden areas of Sylhet region during this period. The tea garden dwellers are predominantly tea garden workers and traders, but some of them engage in producing vegetables and rearing cattle, goats and chickens. Housing facilities are very poor. There is electricity, but potable water and sanitary toilet facilities are lacking in the tea garden areas of Sylhet region.

Collection of stool specimens
Fresh stool specimens will be collected from 360 tea garden workers after obtaining their informed consent. The stool samples will be collected into dry, clean, transparent, screw-cap universal bottles. The specimens will be processed on the same day of collection at the Parasitology Laboratory, Sylhet Agricultural University, Sylhet.

Macroscopy
Stool samples will be examined visually, within their transparent containers, for consistency and for the presence of blood, mucus or adult worms.

Microscopy
A pea-sized quantity of formed stool specimen will be placed in a clean universal bottle and homogenized with a few drops of normal saline. Direct saline and iodine smears will be made on clean slides and examined with 10x and 40x objective lenses for parasites. A sedimentation technique will be employed to concentrate all the stool specimens.

Culture of worm larvae
Stool samples previously confirmed by microscopy to carry hookworm eggs will be cultured to recover larvae using the Harada–Mori technique (Rai et al., 1997). Wear gloves when performing this procedure. Cut a narrow (3/8 by 5 inch) strip of filter paper, and taper it slightly at one end. Smear 0.5 to 1 g of faeces in the centre of the strip. Add 3 to 4 ml of
distilled water to a 15 ml conical centrifuge tube. Insert the filter paper strip into the tube so that the tapered end is near the bottom of the tube. The water level should be slightly (0.5 inch) below the faecal spot. It is not necessary to cap the tube; however, a cork stopper or a cotton plug may be used. The tube will be allowed to stand upright in a rack at 25 to 28°C. Distilled water will be added to maintain the original level (usually evaporation takes place over the first 2 days, but then the culture becomes stabilized). The tube will be kept for 10 days and checked daily by withdrawing a small amount of fluid from the bottom of the tube. Prepare a smear on a glass slide, cover with a cover slip, and examine with the 10x objective. Examine the larvae for motility and typical morphological features to determine whether hookworm or Strongyloides larvae are present. Filariform larvae of Ancylostoma duodenale and Necator americanus are characterized by a blunt head and tail, with no gap between the oesophagus and intestine. Whereas the oesophagus of the larvae of Ancylostoma duodenale does not end in a thistle-funnel shape, that of the larvae of Necator americanus does. This is the main feature used to differentiate between the two species (Okolie, 2007).

Statistical analysis:
The collected data will be entered, stored and coded accordingly using Microsoft Excel 2008. Appropriate statistical tests will be applied depending on the findings. Determination of association between the variables will be done using the statistical software SPSS.

Time Frame:
01. Collection of required instruments and laboratory set-up: 1 month
02. Sample collection, preparation, culture and examination: 10 months
03. Thesis writing and presentation: 1 month
Correlation and Regression

The goal of a correlation analysis is to see whether two measurement variables co-vary, and to quantify the strength of the relationship between the variables, whereas regression expresses the relationship in the form of an equation. For example, for students taking Maths and English tests, we could use correlation to determine whether students who are good at Maths tend to be good at English as well, and regression to determine whether the marks in English can be predicted from given marks in Maths.

Use of Correlation
We can use a correlation coefficient, such as the Pearson product-moment correlation coefficient, to test whether there is a linear relationship between the variables. To quantify the strength of the relationship, we calculate the correlation coefficient (r), whose numerical value ranges from +1.0 to -1.0: r > 0 indicates a positive linear relationship, r < 0 a negative linear relationship, and r = 0 no linear relationship.

Use of Regression
In regression analysis, the problem of interest is the nature of the relationship itself between the dependent variable (response) and the independent (explanatory) variable. The analysis consists of choosing and fitting an appropriate model, done by the method of least squares, with a view to exploiting the relationship between the variables to help estimate the expected response for a given value of the independent variable. For example, if we are interested in the effect of age on height, then by fitting a regression line we can predict the height for a given age.

Uses of Correlation and Regression
There are three main uses for correlation and regression.  One is to test hypotheses about cause-and-effect relationships. In this case, the experimenter determines the values of the X-variable and sees whether variation in X causes variation in Y; for example, giving people different amounts of a drug and measuring their blood pressure. 
 The second main use for correlation and regression is to see whether two variables are associated, without necessarily inferring a cause-and-effect relationship. In this case, neither variable is determined by the experimenter; both are naturally variable. If an
association is found, the inference is that variation in X may cause variation in Y, or variation in Y may cause variation in X, or variation in some other factor may affect both X and Y.  The third common use of linear regression is estimating the value of one variable corresponding to a particular value of the other variable.

The linear correlation coefficient is the ratio between the covariance and the product of the standard deviations of the two variables, and is denoted by the letter r.

Properties of the correlation coefficient:
1. The correlation coefficient does not depend on the measurement scale. That is, whether height is expressed in metres or feet, the correlation coefficient does not change.
2. The sign of the correlation coefficient is the same as that of the covariance.
3. The linear correlation coefficient is a real number between −1 and 1: −1 ≤ r ≤ 1.
4. If the linear correlation coefficient takes values closer to −1, the correlation is strong and negative, and becomes stronger the closer r approaches −1.
5. If the linear correlation coefficient takes values close to 1, the correlation is strong and positive, and becomes stronger the closer r approaches 1.
6. If the linear correlation coefficient takes values close to 0, the correlation is weak.
7. If r = 1 or r = −1, there is perfect correlation and the line on the scatter plot is increasing or decreasing, respectively.
8. If r = 0, there is no linear correlation.

Example: The scores of 12 students in their mathematics and physics classes are:
Mathematics: 2 3 4 4 5 6 6 7 7 8 10 10
Physics:     1 3 2 4 4 4 6 4 6 7 9 10
Find the correlation coefficient of the distribution and interpret it.
1º Find the arithmetic means.
2º Calculate the covariance.
3º Calculate the standard deviations.
4º Apply the formula for the linear correlation coefficient.

xi   yi   xi·yi   xi²   yi²
2    1    2       4     1
3    3    9       9     9
4    2    8       16    4
4    4    16      16    16
5    4    20      25    16
6    4    24      36    16
6    6    36      36    36
7    4    28      49    16
7    6    42      49    36
8    7    56      64    49
10   9    90      100   81
10   10   100     100   100
72   60   431     504   380

From the totals: x̄ = 72/12 = 6 and ȳ = 60/12 = 5; the covariance is σxy = 431/12 − 6·5 ≈ 5.92; the standard deviations are σx = √(504/12 − 6²) ≈ 2.45 and σy = √(380/12 − 5²) ≈ 2.58; hence r = 5.92/(2.45 · 2.58) ≈ 0.94.
The correlation is positive, and as the correlation coefficient is very close to 1, the correlation is very strong.

The values of the two variables X and Y are distributed according to the following table:

Y/X   0   2   4
1     2   1   3
2     1   4   2
3     2   5   0

Calculate the correlation coefficient. Turn the double-entry table into a single table:

xi   yi   fi   xi·fi   xi²·fi   yi·fi   yi²·fi   xi·yi·fi
0    1    2    0       0        2       2        0
0    2    1    0       0        2       4        0
0    3    2    0       0        6       18       0
2    1    1    2       4        1       1        2
2    2    4    8       16       8       16       16
2    3    5    10      20       15      45       30
4    1    3    12      48       3       3        12
4    2    2    8       32       4       8        16
          20   40      120      41      97       76

From the totals: x̄ = 40/20 = 2 and ȳ = 41/20 = 2.05; σxy = 76/20 − 2·2.05 = −0.3; σx = √(120/20 − 2²) ≈ 1.41 and σy = √(97/20 − 2.05²) ≈ 0.80; hence r = −0.3/(1.41 · 0.80) ≈ −0.26.
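The two worked examples above can be checked with a short script. The sketch below computes Pearson's r directly: for the first example each (x, y) pair has frequency 1, and for the second example the frequencies come from the double-entry table (X values 0, 2, 4; Y values 1, 2, 3).

```python
import math

def pearson_r(triples):
    """Pearson correlation from (x, y, frequency) triples."""
    n = sum(f for _, _, f in triples)
    mx = sum(x * f for x, _, f in triples) / n
    my = sum(y * f for _, y, f in triples) / n
    cov = sum(x * y * f for x, y, f in triples) / n - mx * my
    sx = math.sqrt(sum(x * x * f for x, _, f in triples) / n - mx ** 2)
    sy = math.sqrt(sum(y * y * f for _, y, f in triples) / n - my ** 2)
    return cov / (sx * sy)

# Example 1: the 12 students' mathematics and physics scores.
maths   = [2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 10, 10]
physics = [1, 3, 2, 4, 4, 4, 6, 4, 6, 7, 9, 10]
r1 = pearson_r([(x, y, 1) for x, y in zip(maths, physics)])

# Example 2: the double-entry frequency table.
freq = [(0, 1, 2), (0, 2, 1), (0, 3, 2),
        (2, 1, 1), (2, 2, 4), (2, 3, 5),
        (4, 1, 3), (4, 2, 2)]
r2 = pearson_r(freq)
```

Running this gives r1 ≈ 0.94 (strong positive) and r2 ≈ −0.26 (weak negative), matching the interpretations in the text.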
The correlation is negative, and as the correlation coefficient is very close to 0, the correlation is very weak.

Pearson's product-moment coefficient
Main article: Pearson product-moment correlation coefficient.
The most familiar measure of dependence between two quantities is the Pearson product-moment correlation coefficient, or "Pearson's correlation coefficient", commonly called simply "the correlation coefficient". It is obtained by dividing the covariance of the two variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton. The population correlation coefficient ρX,Y between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as:

ρX,Y = corr(X, Y) = cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY)

where E is the expected value operator, cov means covariance, and corr is a widely used alternative notation for the correlation coefficient. The Pearson correlation is defined only if both of the standard deviations are finite and both of them are nonzero. It is a corollary of the Cauchy–Schwarz inequality that the correlation cannot exceed 1 in absolute value. The correlation coefficient is symmetric: corr(X, Y) = corr(Y, X). The Pearson correlation is +1 in the case of a perfect direct (increasing) linear relationship (correlation), −1 in the case of a perfect decreasing (inverse) linear
relationship (anti-correlation), and some value between −1 and 1 in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables. If the variables are independent, Pearson's correlation coefficient is 0, but the converse is not true, because the correlation coefficient detects only linear dependencies between two variables. For example, suppose the random variable X is symmetrically distributed about zero, and Y = X². Then Y is completely determined by X, so that X and Y are perfectly dependent, but their correlation is zero; they are uncorrelated. However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.

If we have a series of n measurements of X and Y written as xi and yi, where i = 1, 2, ..., n, then the sample correlation coefficient r can be used to estimate the population Pearson correlation between X and Y. The sample correlation coefficient is written

r = Σ(xi − x̄)(yi − ȳ) / ((n − 1) sx sy)

where x̄ and ȳ are the sample means of X and Y, and sx and sy are the sample standard deviations of X and Y. This can also be written as:

r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² Σ(yi − ȳ)²]

If x and y are results of measurements that contain measurement error, the realistic limits on the correlation coefficient are not −1 to +1 but a smaller range. For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson's product-moment coefficient.

Rank correlation coefficients:
Main articles: Spearman's rank correlation coefficient and Kendall tau rank correlation coefficient.
Rank correlation coefficients, such as Spearman's rank correlation coefficient and
Kendall's rank correlation coefficient (τ), measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship. If, as the one variable increases, the other decreases, the rank correlation coefficients will be negative. It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the Pearson product-moment correlation coefficient, and are best seen as measures of a different type of association rather than as alternative measures of the population correlation coefficient.

Partial correlation
If a population or data set is characterized by more than two variables, a partial correlation coefficient measures the strength of dependence between a pair of variables that is not accounted for by the way in which they both change in response to variations in a selected subset of the other variables.

Correlation Coefficient
Statistics is a branch of mathematics that deals with numerical data. It is basically concerned with the collection, manipulation, management, organization and analysis of numerical data. One of the most important concepts that we come across in statistics is correlation. Correlation indicates the statistical dependence between two random variables; it is the measure of the relationship between two sets of data. For example, correlation between the height and weight of students of a particular standard refers to the overall relationship between their height and weight, i.e. the manner in which they vary together. Correlation is of the following types:
(1) Positive Correlation - The values of one variable increase as those of the other increase.
(2) Negative Correlation - The values of one variable decrease as those of the other increase, or vice versa.
(3) No Correlation - There is no effect on one variable of an increase or decrease in the values of the other variable.
The correlation between two variables is a number known as the correlation coefficient. The correlation coefficient is a statistical concept which helps in establishing a
relation between the predicted and actual values obtained in a statistical experiment. The calculated value of the correlation coefficient explains the exactness between the predicted and actual values. Its value always lies between −1 and +1. If the value of the correlation coefficient is positive, it indicates a direct relation between the two values.

Correlation Coefficient
Correlation can be defined as the degree of relationship between two variables. It needs pairs of points to be available for every set of values of each of the variables. In a two-dimensional plot, the variables can be arbitrarily labelled as X and Y, where X usually denotes the independent variable, which is used for prediction, and Y the dependent variable, whose value is predicted. The correlation coefficient is sometimes also called the cross-correlation coefficient. Correlation is a technique which shows whether, and how strongly, pairs of variables are related.

Correlation Coefficient Formula
If x and y are the two variables of discussion, then the correlation coefficient r can be calculated using the formula

r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}

Intra-class Correlation Coefficient
The intra-class correlation is commonly used to quantify the degree to which individuals with a fixed degree of relatedness resemble each other. The correlation coefficient is a measure that determines the degree to which the movements of two variables are associated. Correlation coefficients are very sensitive to sample size; a correlation coefficient should be interpreted in relation to the size of the sample from which it was obtained. With a sufficient increase in sample size, almost any observed correlation value will be statistically significant, even if it is so small as to be a meaningless indicator of association. The intra-class correlation coefficient is a reliability coefficient calculated with variance estimates obtained through analysis of variance.
Intra-class correlation coefficients can be used for two or more ratings.

Multiple Correlation Coefficient
The sample multiple correlation coefficient, R, is a measure of the strength of the association between the independent variables and the one dependent variable. Multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is measured by the coefficient of determination, but under the particular
assumption that an intercept is included in the regression. R can take any value from 0 to +1. The multiple correlation coefficient measures the correlation between a dependent variable and the combined effect of other designated variables in the system.

Partial Correlation Coefficient
A partial correlation coefficient is a measure of the linear dependence of a pair of random variables from a collection of random variables in the case where the influence of the remaining variables is eliminated. A partial correlation between two variables can differ substantially from their simple correlation. Sometimes the correlation between two variables X and Y may be partly due to the correlation of a third variable, Z, with both X and Y. The correlation coefficient between X and Y after the linear effect of Z on each of them has been eliminated is called the partial correlation coefficient. A partial correlation coefficient can be written in terms of simple correlation coefficients:

rXY·Z = (rXY − rXZ rYZ) / √[(1 − rXZ²)(1 − rYZ²)]

Population Correlation Coefficient
The population correlation coefficient ρ measures the degree of association between two variables in the population of interest. It is estimated from sample data, and a confidence interval quantifies the uncertainty of the estimate.

Linear Regression Coefficient
Regression measures the average or mathematical relationship between two variables in terms of the original units of the data. Linear regression can be measured using lines of regression, and curvilinear regression can be measured using the correlation ratio. In simple linear regression, the coefficient of determination, R², is equal to the square of the correlation coefficient, i.e., R² = r².
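The identity R² = r² for simple linear regression can be checked numerically. The sketch below fits a least-squares line to the mathematics and physics scores from the earlier correlation example and compares the coefficient of determination with the squared Pearson coefficient.

```python
import math

x = [2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 10, 10]   # mathematics scores
y = [1, 3, 2, 4, 4, 4, 6, 4, 6, 7, 9, 10]    # physics scores
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Centered sums of squares and cross-products.
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

# Least-squares regression of y on x.
slope = sxy / sxx
intercept = my - slope * mx

# Coefficient of determination: 1 - SSE/SST.
pred = [intercept + slope * a for a in x]
sse = sum((b - p) ** 2 for b, p in zip(y, pred))
r_squared = 1 - sse / syy

# Pearson r for the same data.
r = sxy / math.sqrt(sxx * syy)
```

For these data the fitted line is approximately ŷ = −0.92 + 0.99x, and r_squared agrees with r² to floating-point precision.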
Chi-Square Test

Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results. Chi-square is the sum of the squared differences between the observed (o) and the expected (e) data (the deviations, d), divided by the expected data, over all possible categories.

For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws), as predicted from the results of the Punnett square. Chi-square requires numerical values, not percentages or ratios.
1. Determine the degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is 1 degree of freedom.
2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis.
The relative standard commonly used in biological research is p > 0.05. The p value is the probability that the deviation of the observed from the expected is due to chance alone (no other forces acting). Using p > 0.05 as the standard means that a deviation at least this large would be expected by chance alone no more than 5% of the time.
3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of freedom, locate the value closest to your calculated chi-square in the table. Determine
the closest p (probability) value associated with your chi-square and degrees of freedom. In this case (χ² ≈ 2.67), the p value is about 0.10, which means that there is a 10% probability that a deviation this large from the expected results would arise by chance alone. Based on our standard of p > 0.05, this is within the range of acceptable deviation. In terms of your hypothesis for this example, the observed chi-square is not significantly different from expected; the observed numbers are consistent with those expected under Mendel's law.

Step-by-step procedure for testing a hypothesis and calculating chi-square:
1. State the hypothesis being tested and the predicted results. Gather the data by conducting the proper experiment (or, if working genetics problems, use the data provided in the problem).
2. Determine the expected numbers for each observational class. Remember to use numbers, not percentages. Chi-square should not be calculated if the expected value in any category is less than 5.
3. Calculate chi-square using the formula. Complete all calculations to three significant digits, then round off your answer to two significant digits.
4. Use the chi-square distribution table to determine the significance of the value.
5. Determine the degrees of freedom and locate the value in the appropriate column.
6. Locate the value closest to your calculated χ² on that degrees-of-freedom (df) row.
7. Move up the column to determine the p value.
8. State your conclusion in terms of your hypothesis.
9. If the p value for the calculated χ² is p > 0.05, accept your hypothesis: the deviation is small enough that chance alone accounts for it. A p value of 0.6, for example, means that there is a 60% probability that any deviation from expected is due to chance only; this is within the range of acceptable deviation.
10. If the p value for the calculated χ² is p < 0.05, reject your hypothesis, and conclude that some factor other than chance is operating for the deviation to be so great.
For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone; therefore, other factors must be involved.
11. The chi-square test will be used to test the "goodness of fit" between observed and expected data in several laboratory investigations in this lab manual.
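The pea-plant example above can be worked in a few lines. The sketch below computes the chi-square statistic for the observed 639:241 split against the expected 3:1 ratio, and compares it with 3.841, the standard 5% critical value of the chi-square distribution with 1 degree of freedom.

```python
# Chi-square goodness-of-fit for the pea-plant cross:
# 880 offspring, observed 639 green and 241 yellow, expected ratio 3:1.

observed = [639, 241]
total = sum(observed)                      # 880
expected = [total * 3 / 4, total * 1 / 4]  # 660 and 220 under the hypothesis

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One degree of freedom (two categories minus 1);
# the 5% critical value of chi-square with 1 df is 3.841.
CRITICAL_0_05 = 3.841
reject = chi_square > CRITICAL_0_05
```

Here chi_square ≈ 2.67, below the critical value, so the hypothesis is not rejected: the deviation is consistent with chance, in agreement with the text.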
Chi-Square Test of Homogeneity

This lesson explains how to conduct a chi-square test of homogeneity. The test is applied to a single categorical variable from two different populations. It is used to determine whether frequency counts are distributed identically across the different populations. For example, in a survey of TV viewing preferences, we might ask respondents to identify their favourite program. We might ask the same question of two different populations, such as males and females, and use a chi-square test for homogeneity to determine whether male viewing preferences differed significantly from female viewing preferences. The sample problem at the end of the lesson considers this example.

The test procedure described in this lesson is appropriate when the following conditions are met:
1. For each population, the sampling method is simple random sampling.
2. Each population is at least 10 times as large as its respective sample.
3. The variable under study is categorical.
4. If the sample data are displayed in a contingency table (populations x category levels), the expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze the sample data, and (4) interpret the results.

State the Hypotheses:
Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive: if one is true, the other must be false, and vice versa. Suppose that data were sampled from r populations, and assume that the categorical variable had c levels. At any specified level of the categorical variable, the null hypothesis states that each population has the same proportion of observations. The alternative hypothesis (Ha) is that at least one of the null hypothesis statements is false.
Analyze Sample Data Using sample data from the contingency tables, find the degrees of freedom, expected frequency counts, test statistic, and the P-value associated with the test statistic. The analysis described in this section is illustrated in the sample problem at the end of this lesson. 1. Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r − 1) × (c − 1), where r is the number of populations, and c is the number of levels of the categorical variable.
2. Expected frequency counts. The expected frequency counts are computed separately for each population at each level of the categorical variable, according to the following formula:
Er,c = (nr × nc) / n, where Er,c is the expected frequency count for population r at level c of the categorical variable, nr is the total number of observations from population r, nc is the total number of observations at level c, and n is the total sample size.
3. Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the following equation:
Χ² = Σ [ (Or,c − Er,c)² / Er,c ], where Or,c is the observed frequency count in population r for level c of the categorical variable, and Er,c is the expected frequency count in population r for level c of the categorical variable.
4. P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the chi-square distribution with the degrees of freedom computed above to assess the probability associated with the test statistic.

Interpret Results:
If the sample findings are unlikely given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding of This Lesson

Problem
In a study of the television viewing habits of children, a developmental psychologist selects a random sample of 300 first graders: 100 boys and 200 girls. Each child is asked which of the following TV programs they like best: The Lone Ranger, Sesame Street, or The Simpsons. Results are shown in the contingency table below.
Viewing Preferences:

                Lone Ranger   Sesame Street   The Simpsons   Row total
Boys                 50              30              20          100
Girls                50              80              70          200
Column total        100             110              90          300

Do the boys' preferences for these TV programs differ significantly from the girls' preferences? Use a 0.05 level of significance. Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below. State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis. o Null hypothesis: The null hypothesis states that the proportion of boys who prefer the Lone Ranger is identical to the proportion of girls; similarly for the other programs. o Alternative hypothesis: At least one of the null hypothesis statements is false. Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a chi-square test for homogeneity. Analyze sample data. Applying the chi-square test for homogeneity to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic; based on the chi-square statistic and the degrees of freedom, we determine the P-value. Here DF is the degrees of freedom, r is the number of populations, c is the number of levels of the categorical variable, nr is the number of observations from population r, nc is the number of observations from level c of the categorical variable, n is the number of observations in the sample, Er,c is the expected frequency count in population r for level c, and Or,c is the observed frequency count in population r for level c. For this table, DF = (2 - 1)(3 - 1) = 2 and Χ² ≈ 19.32. The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 19.32; this probability is far below 0.05, so the null hypothesis is rejected and we conclude that the boys' and girls' preferences differ.
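The worked example can be checked with SciPy (assumed available here); chi2_contingency carries out the same expected-count, statistic, and P-value computations.

```python
# Checking the boys/girls TV-preference table with SciPy.
from scipy.stats import chi2_contingency

observed = [[50, 30, 20],   # boys:  Lone Ranger, Sesame Street, Simpsons
            [50, 80, 70]]   # girls
chi2, p, dof, expected = chi2_contingency(observed)

print(round(chi2, 2), dof)  # 19.32 2
print(p < 0.05)             # True -> reject homogeneity at the 0.05 level
```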
Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is suitable for unpaired data from large samples.[1] It is the most widely used of many chi-squared tests (Yates, likelihood ratio, portmanteau test in time series, etc.) - statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900.[2] In contexts where it is important to make a distinction between the test statistic and its distribution, names similar to Pearson χ-squared test or statistic are used. It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. The events considered must be mutually exclusive and have total probability 1. A common case for this is where the events each cover an outcome of a categorical variable. A simple example is the hypothesis that an ordinary six-sided die is "fair", i.e., all six outcomes are equally likely to occur. Pearson's chi-squared test is used to assess two types of comparison: tests of goodness of fit and tests of independence. 1. A test of goodness of fit establishes whether or not an observed frequency distribution differs from a theoretical distribution. 2. A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other (e.g. polling responses from people of different nationalities to see if one's nationality is related to the response). The procedure of the test includes the following steps: 1. Calculate the chi-squared test statistic, χ², which resembles a normalized sum of squared deviations between observed and theoretical frequencies (see below). 2.
Determine the degrees of freedom, df, of that statistic, which is essentially the number of frequencies reduced by the number of parameters of the fitted distribution. 3. Compare the computed χ² to the critical value from the chi-squared distribution with df degrees of freedom, which in many cases gives a good approximation of the distribution of the test statistic. Test of independence
In this case, an "observation" consists of the values of two outcomes and the null hypothesis is that the occurrence of these outcomes is statistically independent. Each observation is allocated to one cell of a two-dimensional array of cells (called a contingency table) according to the values of the two outcomes. If there are r rows and c columns in the table, the "theoretical frequency" for a cell, given the hypothesis of independence, is Ei,j = (row i total × column j total) / N, where N is the total sample size (the sum of all cells in the table). With the term "frequencies" this page does not refer to already normalised values. The value of the test-statistic is χ² = Σ (Oi,j - Ei,j)² / Ei,j, summed over all cells. Fitting the model of "independence" reduces the number of degrees of freedom by p = r + c - 1. The number of degrees of freedom is equal to the number of cells rc, minus the reduction in degrees of freedom, p, which reduces to (r - 1)(c - 1). For the test of independence, also known as the test of homogeneity, a chi-squared probability of less than or equal to 0.05 (or the chi-squared statistic being at or larger than the 0.05 critical point) is commonly interpreted by applied workers as justification for rejecting the null hypothesis that the row variable is independent of the column variable.[3] The alternative hypothesis corresponds to the variables having an association or relationship where the structure of this relationship is not specified. A test that relies on different assumptions is Fisher's exact test; if its assumption of fixed marginal distributions is met it is substantially more accurate in obtaining a significance level, especially with few observations. In the vast majority of applications this assumption will not be met, and Fisher's exact test will be overly conservative and not have correct coverage.
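For small tables where the chi-squared large-sample approximation is doubtful, the Fisher's exact test mentioned above can be run directly. The 2x2 counts below are made up for illustration, and SciPy is assumed available.

```python
# Fisher's exact test on a small hypothetical 2x2 table, where the
# chi-squared approximation would be unreliable.
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 5]]
odds_ratio, p = fisher_exact(table)  # two-sided by default

print(odds_ratio)  # 20.0, the sample odds ratio (8*5)/(2*1)
print(p < 0.05)
```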
Goodness of fit. In this context, the frequencies of both theoretical and empirical distributions are un-normalised counts, and for a chi-squared test the total sample sizes of both these distributions (sums of all cells of the corresponding contingency tables) have to be the same.
For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then χ² = (44 - 50)²/50 + (56 - 50)²/50 = 1.44. If the null hypothesis is true (i.e., men and women are chosen with equal probability), the test statistic will be drawn from a chi-squared distribution with one degree of freedom (because if the male frequency is known, then the female frequency is determined).
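The 44 men / 56 women calculation can be checked with a one-line goodness-of-fit test (assuming SciPy is available):

```python
# Goodness-of-fit test for the 44 men / 56 women example above.
from scipy.stats import chisquare

stat, p = chisquare([44, 56], f_exp=[50, 50])

print(round(stat, 2))  # 1.44
print(p > 0.05)        # True -> no evidence against equal frequencies
```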
Hypothesis Testing. In statistics, during a statistical survey or a research study, a hypothesis has to be set and defined. It is termed a statistical hypothesis. It is actually an assumption about the population parameter, though there is no guarantee that this hypothesis will prove to be true. Hypothesis testing refers to the predefined formal procedures that statisticians use to decide whether to accept or reject the hypotheses. Hypothesis testing is defined as the process of choosing hypotheses for a particular probability distribution, on the basis of observed data. Hypothesis testing is a core and important topic in statistics. In research hypothesis testing, a hypothesis is a tentative but important statement about the phenomenon. The null hypothesis is defined as the hypothesis that the researcher aims to challenge. Generally, the null hypothesis represents the current explanation or the vision of a feature which the researcher is going to test. Hypothesis testing includes the tests that are used to determine the outcomes that would lead to the rejection of a null hypothesis at a specified level of significance. Hypothesis testing is utilized in the context of a research study: a hypothesis test is used to evaluate and analyze the results of the research study. Let us learn more about this topic. Hypothesis Testing: Hypothesis testing is one of the most important concepts in statistics. A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. The methodology employed by the analyst depends on the nature of the data used and the goals of the analysis. The goal is to either accept or reject the null hypothesis. Hypothesis Testing Terms: Given below are some of the terms used in hypothesis testing. 1. Test Statistic: The decision whether to accept or reject the null hypothesis is made based on this value. The test statistic is a defined formula based on the distribution (t, z, F, etc.).
If the calculated value of the test statistic is less than the critical value, we accept the null hypothesis; otherwise, we reject it.
Hypothesis Testing Formula: The z-test statistic is used for testing the mean of a large sample. The test statistic is given by z = (x̄ - μ) / (σ/√n), where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation and n is the sample size. 2. Level of Significance: The confidence at which a null hypothesis is accepted or rejected is called the level of significance. The level of significance is denoted by α. 3. Critical Value: The critical value is the value that divides the regions into two - the acceptance region and the rejection region. If the computed test statistic falls in the rejection region, we reject the hypothesis. Otherwise, we accept the hypothesis. The critical value depends upon the level of significance and the alternative hypothesis. 4. One Sided or Two Sided Hypothesis: The alternative hypothesis is one sided if the parameter is larger or smaller than the null hypothesis value. It is two sided when the parameter is different from the null hypothesis value. The null hypothesis is usually tested against an alternative hypothesis (H1). The alternative hypothesis can take one of three forms: 1. H1: B1 > 1, a one-sided alternative hypothesis. 2. H1: B1 < 1, also a one-sided alternative hypothesis. 3. H1: B1 ≠ 1, a two-sided alternative hypothesis; that is, the true value is either greater or less than 1. 5. P-Value: The probability that the statistic takes a value as extreme as, or more extreme than, the observed value, assuming that the null hypothesis is true, is called the P-value. Equivalently, the P-value is the probability of seeing the observed difference, or a greater one, just by chance if the null hypothesis is true. The larger the P-value, the smaller the evidence against the null hypothesis. Hypothesis testing gives the following benefits: 1. They establish the focus and track for a research effort. 2.
Their development helps the researcher shape the purpose of the research effort.
3. They establish which variables will not be measured in a study and similarly those which will be measured. 4. They require the researcher to provide operational definitions of the variables of interest. Process of Hypothesis Testing: 1. State the hypotheses of importance. 2. Choose the suitable test statistic. 3. State the level of statistical significance. 4. State the decision rule for rejecting / not rejecting the null hypothesis. 5. Collect the data and complete the needed calculations. 6. Decide to reject / not reject the null hypothesis. Errors in Research Testing: It is common to make two types of errors while drawing conclusions in research: Type I: When we reject the null hypothesis even though it is actually true. Type II: When we fail to reject the null hypothesis even though it is actually false. Purpose of Hypothesis Testing: Hypothesis testing begins with the hypothesis made about the population parameter. Then, data are collected from an appropriate sample, and the information obtained from the sample is used to decide how likely it is that the hypothesized population parameter is correct. The purpose of hypothesis testing is not to question the computed value of the sample statistic but to make a judgement about the difference between the sample statistic and a hypothesized population parameter. Hypothesis Testing Steps: We illustrate the five steps to hypothesis testing in the context of testing a specified value for a population proportion. The procedure for hypothesis testing is given below: 1. Set up a null hypothesis and alternative hypothesis. 2. Decide about the test criterion to be used. 3. Calculate the test statistic using the given values from the sample
4. Find the critical value at the required level of significance and degrees of freedom. 5. Decide whether to accept or reject the hypothesis: if the calculated test statistic value is less than the critical value, we accept the hypothesis; otherwise we reject the hypothesis. Different Types of Hypothesis: There are four different types of hypothesis, as follows: 1. Simple Hypothesis: If a hypothesis specifies the population completely, including the functional form and the parameters, it is called a simple hypothesis. Example: The hypothesis "Population is normal with mean 15 and standard deviation 5" is a simple hypothesis. 2. Composite Hypothesis or Multiple Hypothesis: If the hypothesis concerning the population is not explicitly defined in terms of all the parameters, then it is a composite or multiple hypothesis. Example: The hypothesis "Population is normal with mean 15" is a composite or multiple hypothesis. 3. Parametric Hypothesis: A hypothesis which specifies only the parameters of the probability density function is called a parametric hypothesis. Example: The hypothesis "Mean of the population is 15" is a parametric hypothesis. 4. Non-Parametric Hypothesis: If a hypothesis specifies only the form of the density function in the population, it is called a non-parametric hypothesis. Example: The hypothesis "Population is normal" is non-parametric. Null and Alternative Hypothesis: A null hypothesis can be defined as a statistical hypothesis which is stated for acceptance. It is the original hypothesis. Any hypothesis other than the null hypothesis is called an alternative hypothesis. When the null hypothesis is rejected, we accept the alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1. Example: When we want to test whether the population mean is 30, the null hypothesis is "Population mean is 30" and the alternative hypothesis is "Population mean is not 30".
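The z statistic defined earlier, z = (x̄ - μ)/(σ/√n), can be sketched for the "population mean is 30" example; the sample values below (x̄ = 28.5, σ = 5, n = 50) are invented for illustration.

```python
import math

# Two-sided z-test of H0: mu = 30 against H1: mu != 30, using
# hypothetical sample values (x_bar, sigma, n are illustrative).
mu0, x_bar, sigma, n = 30.0, 28.5, 5.0, 50

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)
reject = abs(z) > 1.96                          # 0.05 critical value

print(round(z, 2), round(p_two_sided, 3), reject)  # -2.12 0.034 True
```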
Logic of Hypothesis Testing: The logic underlying the hypothesis testing procedure is as follows: 1. The hypothesis concerns the value of a population parameter. 2. Before selecting a sample, we use the hypothesis to predict the characteristics that the sample should have.
3. Obtain a random sample from the population. 4. Finally, compare the obtained sample data with the prediction made from the hypothesis. The hypothesis is reasonable if the sample mean is consistent with the prediction; otherwise the hypothesis is wrong. Type I Error and Type II Error: The probability of rejecting the null hypothesis when it is true is called Type I error, whereas the probability of accepting the null hypothesis when it is false is called Type II error. Example: Suppose a toy manufacturer and its main supplier agreed that the quality of each shipment will meet a particular benchmark. Our null hypothesis is that the quality is 90%. If we reject the shipment, given that the quality actually meets the 90% benchmark, then we have committed a Type I error. If we accept the shipment, given that the quality is less than 90%, we have committed a Type II error. Power of the Test: The power of a test is defined as the probability that the test will reject the null hypothesis when the alternative hypothesis is true. For a fixed level of significance, if we increase the sample size, the probability of Type II error decreases, which in turn increases the power. So to increase the power, the best method is to increase the sample size. 1. Only one of the Type I error or the Type II error is possible at a time. 2. The power of a test is defined as 1 minus the probability of Type II error: Power = 1 - β. Hypothesis Testing Procedure: There are five important steps in the process of hypothesis testing: Step 1: Identifying the null hypothesis and alternative hypothesis to be tested. Step 2: Identifying the test criterion to be used. Step 3: Calculating the test criterion based on the values obtained from the sample. Step 4: Finding the critical value with the required level of significance and degrees of freedom. Step 5: Concluding whether to accept or reject the null hypothesis.
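The claim that power rises with sample size can be illustrated for a one-sided z-test; β is the Type II error probability and Power = 1 - β. The effect size and α below are invented for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def z_test_power(effect, sigma, n):
    """Power of a one-sided z-test of H0: mu = mu0 vs H1: mu = mu0 + effect
    at alpha = 0.05: Power = P(Z > z_alpha - effect * sqrt(n) / sigma)."""
    z_alpha = 1.645  # one-sided 0.05 critical value
    return normal_cdf(effect * math.sqrt(n) / sigma - z_alpha)

# Power grows with n for a fixed effect (here 0.5 sigma-units):
print(round(z_test_power(0.5, 1.0, 30), 3))  # ~0.863
print(round(z_test_power(0.5, 1.0, 50), 3))  # ~0.971
```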
Multiple Hypothesis Testing: The problem of multiple hypothesis testing arises when more than one hypothesis must be tested simultaneously for statistical significance. Multiple hypothesis testing occurs in a
vast variety of fields and for a variety of purposes. Multiple hypothesis testing can also be viewed as a multiple decision problem. When considering multiple testing problems, the concern is with Type I errors when hypotheses are true and Type II errors when they are false. The evaluation of the procedures is based on criteria involving a balance between these errors. Bayesian Hypothesis Testing: Bayesian hypothesis testing involves specifying a hypothesis and collecting evidence that supports or does not support the statistical hypothesis. The amount of evidence can be used to specify the degree of belief in a hypothesis in probabilistic terms. The probability of supporting a hypothesis can become very high or low. Hypotheses with a high probability are accepted as true, and those with a low probability are rejected as false. Bayesian hypothesis testing works just like any other type of Bayesian inference. Let us consider the case where we are considering only two hypotheses, H1 and H2. Level of Significance in Hypothesis Testing: The hypothesis testing follows this procedure:  Specify the null and alternative hypotheses  Specify a value for α  Collect the sample data and determine the weight of evidence for rejecting the null hypothesis. This weight, given in terms of probability, is called the level of significance (P-value) of the statistical test. The level of significance is the probability of obtaining a value of the statistic at least as extreme as the actual observed value of the test statistic, assuming that the null hypothesis is true. If the level of significance is a small value, then the sample data fail to support the null hypothesis and we reject H0. If the level of significance is a large value, then we fail to reject the null hypothesis. Hypothesis Testing Example: Given below is an example of hypothesis testing.
Solved Example Question: XYZ Company, with a very small turnover, is taking feedback on permanent employees. During the feedback process, it was claimed that the average age of XYZ employees is 20 years. The relevance of the claim was verified by taking a random sample of
one hundred workers, and the average age turns out to be 19 years with a standard deviation of 2 years. Should XYZ continue to make its claim, or should it make changes? Solution: 1. Specify the hypotheses: H0: μ = 20 years; H1: μ ≠ 20 years. 2. State the significance level: Since the company would like to maintain its present message to new human resources, XYZ selects a standard significance level (α = 0.05). Because this is a two-tailed test, half of the alpha will be assigned to each tail of the distribution. In this condition the critical values are Z = +1.96 and -1.96. 3. Specify the decision rule: If the calculated value of Z ≥ 1.96 or Z ≤ -1.96, the null hypothesis will be rejected. 4. Compute the test statistic and decide: Z = (19 - 20) / (2/√100) = -5. Since -5 ≤ -1.96, the null hypothesis is rejected; the company should revise its claim.
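The arithmetic of the solved example can be confirmed in a couple of lines:

```python
import math

# z-test for the employee-age example: H0: mu = 20, x_bar = 19,
# sigma = 2, n = 100, two-tailed critical values +/-1.96.
z = (19 - 20) / (2 / math.sqrt(100))
reject = z >= 1.96 or z <= -1.96

print(z, reject)  # -5.0 True -> the claimed mean of 20 is rejected
```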
Design of Experiment: The principles of design suggest how a designer can best arrange the various elements of a design in connection to the overall design and to each other. Principles of design: principles applied to the elements of design that bring them together into one design. How one applies these principles determines how successful a design may be. In experimental design there are mainly three principles: 1. Randomization 2. Replication 3. Error control. Methods: 1. Proximity: sense of distance between elements 2. Similarity: ability to seem repeatable with other elements 3. Continuation: the sense of having a line or pattern extend 4. Repetition: elements being copied or mimicked numerous times 5. Rhythm: achieved when recurring position, size, color, and use of a graphic element has a focal point interruption 6. Altering the basic theme achieves unity and helps keep interest. Data Design: 1. Data will be normalised across the organisation and its partner network 2. Data will be transferable and re-usable across the organisation and its partner network 3. Data entry will be avoided, by using data lookup, selection and confirmation approaches. Proportional Hazard Model: Survival models can be viewed as consisting of two parts: the underlying hazard function, often denoted λ0(t), describing how the risk of an event per time unit changes over time at baseline levels of covariates; and the effect parameters, describing how the hazard varies in response to explanatory covariates. A typical medical example would include covariates such as treatment assignment, as well as patient characteristics such as age at start of study,
gender, and the presence of other diseases at start of study, in order to reduce variability and/or control for confounding. The proportional hazards condition[1] states that covariates are multiplicatively related to the hazard. In the simplest case of stationary coefficients, for example, a treatment with a drug may, say, halve a subject's hazard at any given time t, while the baseline hazard may vary. Note however, that this does not double the lifetime of the subject; the precise effect of the covariates on the lifetime depends on the form of λ0(t). Of course, the covariate is not restricted to binary predictors; in the case of a continuous covariate x, it is typically assumed that the hazard responds logarithmically; each unit increase in x results in proportional scaling of the hazard. The Cox partial likelihood is obtained by using Breslow's estimate of the baseline hazard function, plugging it into the full likelihood and then observing that the result is a product of two factors. The first factor is the partial likelihood, in which the baseline hazard has "canceled out". The second factor is free of the regression coefficients and depends on the data only through the censoring pattern. The effect of covariates estimated by any proportional hazards model can thus be reported as hazard ratios.
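The proportionality condition can be sketched numerically: with hazard h(t|x) = h0(t)·exp(βx), a coefficient β = ln(0.5) halves the hazard at every t, whatever the baseline looks like. The baseline function below is invented for illustration.

```python
import math

# Proportional-hazards sketch: h(t | x) = h0(t) * exp(beta * x).
def baseline_hazard(t):
    return 0.01 + 0.002 * t       # arbitrary, time-varying baseline

def hazard(t, x, beta):
    return baseline_hazard(t) * math.exp(beta * x)

beta = math.log(0.5)              # treatment (x=1) halves the hazard
ratios = [hazard(t, 1, beta) / hazard(t, 0, beta) for t in range(0, 50, 10)]

print(ratios)  # every entry is 0.5: the hazard ratio is constant in t
```

Note that the baseline hazard changes with t, yet the ratio between treated and untreated subjects does not; that constancy is exactly what "proportional hazards" means.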
Life Tables: A life table is a concise way of showing the probabilities of a member of a particular population living to or dying at a particular age. In this study, the life tables are used to examine the mortality changes in the Social Security population over time. An ideal representation of human mortality would provide a measure of the rate of death occurring at specified ages over specified periods of time. In the past, analytical methods (such as the Gompertz, Makeham, or logistic curves) satisfied this criterion approximately over a broad range of ages. However, as actual data have become more abundant and more reliable, the use of approximate analytical methods has become less necessary and acceptable. Today, mortality is most commonly represented in the form of a life table, which gives probabilities of death within one year at each exact integral age. These probabilities are generally based on tabulations of deaths in a given population and estimates of the size of that population. For this study, functions in the life table can be generated from the qx, where qx is the probability of death within a year for a person aged x. Although a life table does not give mortality at non-integral ages or for non-integral durations, as can be obtained from a mathematical formula, acceptable methods for estimating such values are well known. Two basic types of life tables are presented in this study: period-based tables and cohort-based tables. Each type of table can be constructed either based on actual population data or on expected future experience. A period life table is based on, or represents, the mortality experience of an entire population during a relatively short period of time, usually one to three years. Life tables based directly on population data are generally constructed as period life tables because death and population data are most readily available on a time period basis.
Such tables are useful in analyzing changes in the mortality experienced by a population through time. If the experience study is limited to short periods of time, the resulting rates will be more uniformly representative of the entire period. A cohort, or generation, life table is based on, or represents, mortality experience over the entire lifetime of a cohort of persons born during a relatively short period of time, usually one year. Cohort life tables based directly on population experience data are relatively rare, because of the need for data of consistent quality over a very long period of time. Cohort tables can, however, be readily produced, reflecting mortality rates from a series of period tables for past years, projections of future mortality, or a combination of the two. Such tables are superior to period tables for the purpose of projecting a population into the future when mortality is
expected to change over time, and for analyzing the generational trends in mortality. A life table treats the mortality experience upon which it is based as though it represents the experience of a single birth cohort consisting of 100,000 births who experience, at each age x of their lives, the probability of death, denoted qx, shown in the table. The entry lx in the life table shows the number of survivors of that birth cohort at each succeeding exact integral age. Another entry, dx, shows the number of deaths that would occur between succeeding exact integral ages among members of the cohort. The entry denoted Lx gives the number of person-years lived between consecutive exact integral ages x and x+1, and Tx gives the total number of person-years lived beyond each exact integral age x, by all members of the cohort. The final entry in the life table, denoted e̊x, represents the average number of years of life remaining for members of the cohort still alive at exact integral age x, and is called the life expectancy. The lx entry in the life table is also useful for determining the age corresponding to a specified survival rate from birth, which is defined as the age at which the ratio of lx to 100,000 is equal to a specified value between 0 and 1. A stationary population is what would result if for each past and future year: 1. The probabilities of death shown in the table are experienced 2. 100,000 births occur uniformly throughout each year 3. The population has no immigration and emigration. A population with these characteristics would have a constant number of persons from year to year (in fact, at any time), both in their total number and in their number at each age. These numbers of persons, by age last birthday, are provided in the life table as the Lx values. The lx entry is interpreted as the number of persons who attain each exact integral age during any year, and dx is the number of persons who die at each age last birthday during any year.
The entry Tx represents the number of persons who are alive at age last birthday x or older, at any time. Construction of Central Death Rates A. Data Sources Annual tabulations of numbers of deaths by age and sex are made by the National Center for Health Statistics (NCHS) based on information supplied by States in the Death Registration Area, and are published in the volumes of Vital Statistics of the United States. Deaths are
provided by five-year age groups for ages 5 through 84, in total for ages 85 and older, and by single-year and smaller age intervals for ages 4 and under. One requirement for admission to the Death Registration Area, which since 1933 has included all the States, the District of Columbia and the independent registration area of New York City, was a demonstration of ninety percent completeness of registration. Life Table Functions: The following are definitions of the standard actuarial functions used in this study to develop mortality rates based on mid-year population and annual death data. Dx = the number of deaths at age x last birthday in a population during a year; Px = the number of persons who are age x last birthday in a population at midyear; yMx = the central death rate for the subset of a population that is between exact ages x and x+y; yqx = the probability that a person exact age x will die within y years. The following are the additional definitions of standard life table functions. The table represents a hypothetical cohort of 100,000 persons born at the same instant who experience the rate of mortality represented by 1qx, the probability that a person age x will die within one year, for each age x throughout their lives. The stationary population definitions, that are given in parentheses, refer to the population size and age distribution that would result if the rates of mortality represented by 1qx were experienced each year, past and future, for persons between exact ages x and x+1, and if 100,000 births were to occur uniformly throughout each year. The life table functions lx, dx, Lx, Tx, and e̊x were calculated as follows:
l0 = 100,000
dx = lx · 1qx, x = 0, 1, 2, ...
lx = lx-1 · (1 - 1qx-1), x = 1, 2, 3, ...
L0 = l0 - 1f0 · d0
Lx = lx - 0.5 · dx, x = 1, 2, 3, ...
Tx = Lx + Lx+1 + Lx+2 + ... + L148, x = 0, 1, 2, 3, ...
e̊x = Tx / lx, x = 0, 1, 2, 3, ...
The fundamental step in constructing a life table from population data is that of developing probabilities of death, qx, that accurately reflect the underlying pattern of mortality experienced by the population. The following sections describe the methods used for developing the rates presented in this actuarial study. These methods, as will be seen, vary significantly by age. Actual data permit the computation of central death rates, which are then converted into probabilities of death. Exceptions to this procedure include direct calculation of probabilities of death at young ages and geometric extrapolation of probabilities of death at extreme old age, where data are sparse or of questionable quality. Here 5mx = (dx + dx+1 + dx+2 + dx+3 + dx+4) / (Lx + Lx+1 + Lx+2 + Lx+3 + Lx+4) and 5Mx = (Dx + Dx+1 + Dx+2 + Dx+3 + Dx+4) / (Px + Px+1 + Px+2 + Px+3 + Px+4). A number of extremely important developments have contributed to the rapid average rate of mortality improvement during the twentieth century. These developments include:  Access to primary medical care for the general population  Improved healthcare provided to mothers and babies  Availability of immunizations  Improvements in motor vehicle safety  Clean water supply and waste removal  Safer and more nutritious foods  Rapid rate of growth in the general standard of living. Each of these developments is expected to make a substantially smaller contribution to annual rates of mortality improvement in the future. Future reductions in mortality will depend upon such factors as:
 Development and application of new diagnostic, surgical and life-sustaining techniques  Presence of environmental pollutants  Improvements in exercise and nutrition  Incidence of violence  Isolation and treatment of causes of disease  Emergence of new forms of disease  Prevalence of cigarette smoking  Misuse of drugs (including alcohol)  Extent to which people assume responsibility for their own health  Education regarding health  Changes in our conception of the value of life  Ability and willingness of our society to pay for the development of new treatments and technologies, and to provide these to the population as a whole. Making Demographic Projections: Survival rates are used extensively in demographic projection techniques. Survival rates are derived from life tables or census data, and are used to calculate the number of people that will be alive in the future. In many cases, planners can obtain survival rates from a national or regional statistics office, or from life tables. If survival rates or life tables are not available, the rates may be computed from a model life table or census data. Life tables: Life tables are used to measure mortality, survivorship, and the life expectancy of a population at varying ages. There are several types of life tables. A generation or cohort life table is a life history of the mortality experiences of an actual cohort of individuals. The cohort begins at birth and their mortality experiences are recorded through the death of the last member of that cohort. For example, demographers use the table to trace the mortality experiences of a cohort or group of individuals born in 1910 and record the mortality experiences of each member until the last one dies. In most cases, generation life tables are used to study historic periods.
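The life-table recursions given earlier (l0 = 100,000, dx = lx·qx, Lx, Tx, e̊x) can be sketched with a toy set of qx values. The qx list is invented and far shorter than a real table, and Lx = lx - 0.5·dx is applied at every age for simplicity, ignoring the separation factor 1f0 at age 0.

```python
# Toy life-table construction from qx, following l0 = 100,000,
# dx = lx * qx, lx+1 = lx * (1 - qx), Lx ~= lx - 0.5 * dx,
# Tx = sum of Lx from age x onward, ex = Tx / lx.
def build_life_table(qx):
    l = [100_000.0]
    d, L = [], []
    for q in qx:
        d.append(l[-1] * q)            # deaths between ages x and x+1
        L.append(l[-1] - 0.5 * d[-1])  # person-years lived in [x, x+1)
        l.append(l[-1] - d[-1])        # survivors to exact age x+1
    T = [sum(L[x:]) for x in range(len(qx))]   # person-years beyond age x
    e = [T[x] / l[x] for x in range(len(qx))]  # life expectancy at age x
    return l, d, L, T, e

qx = [0.01, 0.02, 0.05, 0.10, 1.00]   # final qx = 1 closes out the cohort
l, d, L, T, e = build_life_table(qx)

print(round(e[0], 2))  # 4.21, life expectancy at birth for this toy table
print(round(sum(d)))   # 100000: everyone in the cohort eventually dies
```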
Current or period life tables
Period life tables are based on the mortality experience of a hypothetical cohort of newborn babies, usually 100,000 newborns, who are subject to the age-specific
mortality rates on which the table is based. It traces the cohort of newborn babies throughout their lifetime under the assumption that they are subject to the age-specific mortality rates of a region or country. There are two types of current life tables:
 Unabridged, for single years of life
 Abridged, for 5-year cohorts of life
In many countries, life tables are based on an average of age-specific death rates for a 3-year time period, generally centered on a census taking. In many cases, the life tables are prepared every 10 years. For example, a country or state would collect age-specific death rates for 1999, 2000, and 2001; the census for year 2000 would be used for the base population.
Calculating Survival Rates
Life tables are used to calculate survival rates. For population projections, 5-year survival rates are computed. For estimates of net migration, 10-year survival rates are calculated. Calculations of survival rates rely on two columns in the life table, Lx and Tx. Using the abridged life table presented in Table 7-1, calculate 5-year survival rates as shown in Equation 7-1.
Equation 7-1: 5-year survival rate = 5Lx+5 / 5Lx
To calculate a rate to survive women ages 25–29 into the next 5-year age cohort (30–34), divide the Lx value for ages 30–34 by the Lx value for ages 25–29 in Table 7-1.
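The ratio of adjacent 5Lx values can be computed for the whole column at once. Since Table 7-1 is not reproduced here, the 5Lx figures below are hypothetical stand-ins used only to show the mechanics.

```python
# Sketch of the 5-year survival-rate calculation: the rate for surviving one
# 5-year age cohort into the next is the ratio of adjacent 5Lx values.
# The 5Lx figures below are hypothetical, not from Table 7-1.
def five_year_survival_rates(L5):
    """L5[i] = person-years lived in the i-th 5-year age group (the 5Lx column)."""
    return [L5[i + 1] / L5[i] for i in range(len(L5) - 1)]

# Hypothetical 5Lx values for ages 20-24, 25-29 and 30-34
L5 = [480000.0, 475000.0, 468000.0]
rates = five_year_survival_rates(L5)
# rates[1] is the rate that survives the 25-29 cohort into ages 30-34
```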
Using Model Life Tables
If life tables are not available for a particular country, use model life tables, preferably regional model life tables, to obtain survival rates.
Model life table: A model life table is derived from the life tables and mortality experiences of a number of countries. Model life tables are primarily used to assist countries that do not have vital statistics systems to record deaths. Using regression analysis, the United Nations published its first set of model life tables in 1955. The tables were based on life tables from 158 countries. In 1966, Coale and Demeny introduced regional model life tables. The authors used 326 life tables to develop 200 regional model life tables.
Sample and Sampling
Sample: A sample is a representative part of a population.
Sampling: Sampling is the process of selecting a sample from a population. It is a process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, but may include simple random sampling, systematic sampling and stratified sampling.
Types of Sampling
Probability Sampling:
1. Simple Random Sampling
2. Stratified Random Sampling
3. Multi-Stage Sampling
As a running example, consider Laura, a psychologist who is interested in studying whether there is bias against women in the workforce. She decides to survey workers to see if they believe that sexism plays a part at their company.
Simple Random Sampling: A simple random sample (SRS) of size n is produced by a scheme which ensures that each subgroup of the population of size n has an equal probability of being chosen as the sample. There are also many computer programs that allow researchers to easily generate a simple random sample from a population. For example, Laura might use a computer program to randomly select women in sales at the top 50 companies in the United States. She inputs all of their names and the number of people she wants in her sample, and the computer randomly picks names to be included in the sample.
Stratified Random Sampling: Divide the population into "strata". There can be any number of these. Then choose a simple random sample from each stratum and combine those into the overall sample. That is a stratified random sample. (Example: Church A has 600 women and 400 men as members. One way to get a stratified random sample of size 30 is to take an SRS of 18 women from the 600 women and another SRS of 12 men from the 400 men.) One
popular method of probability sampling is the systematic sampling method, which involves ordering the population and then choosing every nth person. For example, Laura might order a list of possible women alphabetically or by height. Then she chooses every 10th woman, or every 8th woman, or whatever number she decides on ahead of time. The advantage of systematic sampling is that the sample is evenly spread across the population. Imagine that Laura's computer-generated simple random sample picks all short women and no tall women, and that short women experience more bias than tall women because people see them as being like little girls. The simple random sample has given Laura a sample that could have a different outcome from the main population. But if Laura orders the population by height and chooses every 10th woman, she has a sample that includes women of all heights and thus is more representative of the population.
Multi-Stage Sampling: Sometimes the population is too large and scattered for it to be practical to make a list of the entire population from which to draw an SRS. For instance, when a polling organization samples US voters, it does not do an SRS. Since voter lists are compiled by counties, the organization might first sample the counties and then sample within the selected counties. This illustrates two stages. In some instances, even more stages might be used. At each stage, a stratified random sample might be drawn on sex, race, income level, or any other useful variable on which information is available before sampling.
Non-probability sampling schemes:
1. Voluntary response sampling
2. Judgement sampling
3. Convenience sampling
Convenience Sampling
Kiera wants to give her survey to a sample of people in order to learn why Americans feel the way they do about capital punishment.
She and her two research assistants go to a shopping mall on a Tuesday morning and stop people to ask their opinion on the death penalty and why they feel that way. Kiera is using the convenience sampling method, which is just what it sounds like: a researcher selects the sample based on convenience. The subjects selected to be part of the study's sample are there and are available to be tested. Convenience sampling has a major problem: the people who are readily available are not necessarily representative of
the population at large. Think about Kiera's study; if she and her research assistants poll the people at a shopping mall on a Tuesday morning, their sample is limited to subjects who are at a shopping mall on a Tuesday morning. Anyone with a nine-to-five job (which includes most adults in America) will be at work, not at the mall, which means that they won't be part of Kiera's sample. With the problem of non-representativeness, you might be wondering why researchers would ever use convenience sampling. The answer is in its name: convenience. Psychologists do this a lot. If they teach at a university, they are most likely doing research on university students. The truth is, it's not always practical to use a method other than convenience sampling. Other sampling methods might yield a better sample, but they also cost more in time and money, so many researchers end up using convenience sampling.
Quota Sampling
For a moment, though, let's say that Kiera and her research assistants are able to go to a mall at a time when the entire population of American adults is represented. She still has to choose which people to survey. One way to choose is to use the quota sampling method, which involves setting quotas based on demographic information but not randomly selecting subjects for each quota. For example, let's say that Kiera knows that approximately 51% of U.S. adults are women. She might tell her research assistants to interview 51 women and 49 men, a quota that roughly corresponds to the demographics for the population. However, the 51 women and 49 men are not chosen randomly; her assistants can choose which women and men to give the survey to. The good thing about quota sampling is that the demographics are approximately correct for the population, especially if you make quotas for several different demographic categories.
For example, Kiera can set quotas not only for gender but for race, age, income level, employment status, political party affiliation, or a host of other categories. The more categories there are, the more likely she will have a sample that represents the population. Of course, the more categories she specifies quotas for, the more complex, time-consuming, and costly her study will be. Not to mention the fact that she might not be able to meet all her quotas. So a researcher has to balance having many categories to create a representative sample with having just a few to keep the research simple and practical. It's a balance that each researcher has to decide on.
Judgmental Sampling
Kiera decides to use quota sampling. She gets a representative sample and looks at why people believe the way they do about the death penalty. But then she decides to follow up with another, related study that looks at the differences in the death penalty by electric chair versus lethal injection. She wants to know what the differences are as far as cost, difficulty of carrying out the execution, and also how the prisoners react to the mode of execution. The problem is that there aren't all that many people who can answer those questions. The best source of her answers would be professionals who have administered both methods of execution. But there are very few of those in the country, so Kiera decides to give the survey to them all. Kiera is utilizing her own judgment as far as who to include in her sample; in this case, she is using the judgmental sampling method, also called the purposive sampling method, which is when the sample is based on the judgment of who the researcher thinks would be best for the sample. You can remember the names of this method because the researcher is using her 'judgment' and picking subjects with a 'purpose.' Judgmental sampling works best when there are a limited number of people with the expertise needed to be a part of the sample. In Kiera's case, there are only a relatively small number of people who have administered both the electric chair and lethal injection, so she uses her judgment to choose them as opposed to Joe Schmoe on the street who can't answer her questions about cost, difficulty of administration, and how the prisoners react.
Types of samples
The best sampling is probability sampling, because it increases the likelihood of obtaining samples that are representative of the population.
Probability sample: Probability samples are selected in such a way as to be representative of the population.
They provide the most valid or credible results because they reflect the characteristics of the population from which they are selected (e.g., residents of a particular community, students at an elementary school, etc.). There are two types of probability samples: random and stratified. The assumption of an equal chance of selection means that sources such as a telephone book or voter registration lists are not adequate for providing a random sample of a community. In both these cases there will be a number of residents whose names are not listed. Telephone surveys get around this problem by random digit dialing but that assumes that everyone in the population has a telephone. The key to random selection is that there is
no bias involved in the selection of the sample. Any variation between the sample characteristics and the population characteristics is only a matter of chance.
Stratified sample
Stratified samples are as good as or better than random samples, but they require fairly detailed advance knowledge of the population characteristics, and therefore are more difficult to construct.
Nonprobability samples (non-representative samples)
As they are not truly representative, non-probability samples are less desirable than probability samples. However, a researcher may not be able to obtain a random or stratified sample, or it may be too expensive, or the researcher may not care about generalizing to a larger population. The validity of non-probability samples can be increased by trying to approximate random selection, and by eliminating as many sources of bias as possible.
Purposive sample
A purposive sample is selected deliberately, on the basis of the researcher's judgment about which subjects fit the purpose of the study. A subset of the purposive sample is the snowball sample, so named because one picks up the sample along the way, analogous to a snowball accumulating snow. A snowball sample is achieved by asking a participant to suggest someone else who might be willing or appropriate for the study. Snowball samples are particularly useful for hard-to-track populations, such as truants, drug users, etc.
Convenience sample
Non-probability samples are limited with regard to generalization. Because they do not truly represent a population, we cannot make valid inferences about the larger group from which they are drawn. Validity can be increased by approximating random selection as much as possible, and by making every attempt to avoid introducing bias into sample selection.
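The three probability-sampling schemes discussed above can be sketched with Python's standard random module. The population and strata below are hypothetical placeholders; the stratified example mirrors the church illustration (18 of 600 women, 12 of 400 men for a sample of 30).

```python
# Sketches of three probability-sampling schemes, using the standard library.
# The population and strata are hypothetical.
import random

random.seed(42)  # fixed seed so the illustration is reproducible

population = [f"person_{i}" for i in range(1000)]

# Simple random sample: every subset of size n is equally likely.
srs = random.sample(population, 30)

# Stratified random sample: an SRS from each stratum, sized in proportion
# to the stratum (cf. 18 of 600 women and 12 of 400 men for n = 30).
women = [f"woman_{i}" for i in range(600)]
men = [f"man_{i}" for i in range(400)]
stratified = random.sample(women, 18) + random.sample(men, 12)

# Systematic sample: order the population, then take every kth element
# after a random start.
k = len(population) // 30
start = random.randrange(k)
systematic = population[start::k]
```

Note that the systematic sample's size can come out one above the target depending on the random start, which is one practical difference from the other two schemes.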
Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis.
Measure of skewness
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Measures of kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case.
Skewness
For univariate data Y1, Y2, ..., YN, the formula for skewness is:
skewness = Σ (i = 1 to N) (Yi − Ȳ)³ / ((N − 1)s³)
where Ȳ is the mean, s is the standard deviation, and N is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left, and positive values indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
Kurtosis
For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:
kurtosis = Σ (i = 1 to N) (Yi − Ȳ)⁴ / ((N − 1)s⁴)
where Ȳ is the mean, s is the standard deviation, and N is the number of data points.
Alternative definition of kurtosis
The kurtosis for a standard normal distribution is three. Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition).
When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.
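The two moment formulas above translate directly into code. This is a minimal sketch following the (N − 1)s³ and (N − 1)s⁴ denominators exactly as written; the data set is hypothetical.

```python
# Skewness and kurtosis per the formulas above:
#   skewness = sum((Yi - Ybar)**3) / ((N - 1) * s**3)
#   kurtosis = sum((Yi - Ybar)**4) / ((N - 1) * s**4)
# with s the sample standard deviation. Excess kurtosis subtracts 3 so that
# a normal distribution scores about 0. The data below are hypothetical.
import statistics

def skewness(y):
    n, ybar = len(y), statistics.fmean(y)
    s = statistics.stdev(y)                      # sample std dev (n - 1 denominator)
    return sum((v - ybar) ** 3 for v in y) / ((n - 1) * s ** 3)

def kurtosis(y, excess=False):
    n, ybar = len(y), statistics.fmean(y)
    s = statistics.stdev(y)
    k = sum((v - ybar) ** 4 for v in y) / ((n - 1) * s ** 4)
    return k - 3 if excess else k

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # right-skewed example
sk = skewness(data)                              # positive: long right tail
ex = kurtosis(data, excess=True)
```

Because many software packages report excess kurtosis under the plain name "kurtosis", a helper like the `excess` flag here makes the convention explicit.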
Interpreting skewness and kurtosis
Skewness quantifies how symmetrical the distribution is.
1. A symmetrical distribution has a skewness of zero.
2. An asymmetrical distribution with a long tail to the right (higher values) has a positive skew.
3. An asymmetrical distribution with a long tail to the left (lower values) has a negative skew.
4. The skewness is unitless.
5. Any threshold or rule of thumb is arbitrary, but here is one: if the skewness is greater than 1.0 (or less than -1.0), the skewness is substantial and the distribution is far from symmetrical.
Kurtosis quantifies whether the shape of the data distribution matches the Gaussian distribution.
1. A Gaussian distribution has a kurtosis of 0.
2. A flatter distribution has a negative kurtosis.
3. A distribution more peaked than a Gaussian distribution has a positive kurtosis.
4. Kurtosis has no units.
5. The value that Prism reports is sometimes called the excess kurtosis, since the expected kurtosis for a Gaussian distribution is 0.0.
6. An alternative definition of kurtosis is computed by adding 3 to the value reported by Prism. With this definition, a Gaussian distribution is expected to have a kurtosis of 3.0.
Computation of skewness
Skewness has been defined in multiple ways. The steps below explain the method used by Prism, called g1 (the most common method).
1. We want to know about symmetry around the sample mean, so the first step is to subtract the sample mean from each value. The result will be positive for values greater than the mean, negative for values smaller than the mean, and zero for values that exactly equal the mean.
2. To compute a unitless measure of skewness, divide each of the differences computed in step 1 by the standard deviation of the values. These ratios (the difference between each value and the mean, divided by the standard deviation) are called z ratios. By definition, the average of these values is zero and their standard deviation is 1.
3. For each value, compute z³. Note that cubing preserves the sign: the cube of a positive value is still positive, and the cube of a negative value is still negative.
4. Average the list of z³ values by dividing their sum by n-1, where n is the number of values in the sample. If the distribution is symmetrical, the positive and negative values will balance each other, and the average will be close to zero. If the distribution is not symmetrical, the average will be positive if the distribution is skewed to the right, and negative if skewed to the left. Why n-1 rather than n? For the same reason that n-1 is used when computing the standard deviation.
5. Correct for bias. The average computed in step 4 is biased with small samples: its absolute value is smaller than it should be. Correct for the bias by multiplying the mean of z³ by the ratio n/(n-2). This correction increases the value if the skewness is positive, and makes the value more negative if the skewness is negative. With large samples, this correction is trivial, but with small samples it is substantial.
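The five steps above can be sketched as one short function. The sample values are hypothetical, and the standard deviation uses the same n-1 denominator as the steps describe.

```python
# A sketch of the five g1 steps: subtract the mean, divide by the sample
# standard deviation (n-1 denominator), cube the z ratios, average over n-1,
# then multiply by n/(n-2) as the small-sample bias correction.
import statistics

def g1_skewness(values):
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)                   # steps 1-2 use the n-1 std dev
    z = [(v - mean) / s for v in values]           # step 2: z ratios
    mean_z3 = sum(zi ** 3 for zi in z) / (n - 1)   # steps 3-4: average of z cubed
    return mean_z3 * n / (n - 2)                   # step 5: bias correction

sample = [1.0, 2.0, 2.0, 3.0, 10.0]                # hypothetical, right-skewed
g1 = g1_skewness(sample)
```

Algebraically this equals Σz³ · n / ((n−1)(n−2)), the usual adjusted Fisher–Pearson coefficient, so the step-by-step recipe and the textbook one-liner agree.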
Conclusion
Biostatistics is one of the most crucial subjects in the field of biological science. Statistics is used in this field to judge the significance of any experiment or test result. Statistics is indispensable to planning in the modern age, which is termed "the age of planning". Almost all over the world, governments are resorting to planning for economic development. Statistical data and techniques of statistical analysis are immensely useful for economic problems such as wages, prices, time series analysis and demand analysis. Statistics is an indispensable tool of production control. Business executives are relying more and more on statistical techniques for studying the needs and desires of their valued customers. In industry, statistics is widely used in quality control. Statistics and mathematics are intimately related; recent advancements in statistical techniques are the outcome of wide applications of mathematics. In medical science, the statistical tools for the collection, presentation and analysis of observed facts relating to the causes and incidence of diseases, and the results of applying various drugs and medicines, are of great importance. In education and psychology, statistics has found wide application, such as in determining the reliability and validity of a test, factor analysis, etc. In war, the theory of decision functions can be of great assistance to military personnel in planning "maximum destruction with minimum effort". So from the above discussion it is very clear that the study of biostatistics is very important in the modern age. To be a good researcher it is very important to know statistics and biostatistics well. Whenever anyone does a thesis on biological data, he must know how to use biostatistics to analyze the data in the right way and to comment on whether his findings are significant or insignificant. Therefore we should study it and apply it in the field thoroughly.
References
Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover, p. 928.
Aitken, Alexander Craig (1957). Statistical Mathematics, 8th Edition. Oliver & Boyd. ISBN 9780050013007 (page 95).
Aldrich, John (1995). "Correlations Genuine and Spurious in Pearson and Yule". Statistical Science 10 (4): 364–376.
Andersen, P.; Gill, R. (1982). "Cox's regression model for counting processes, a large sample study". Annals of Statistics 10 (4): 1100–1120.
Andrew M. Isserman (1993). "The Right People, the Right Rates". Journal of the American Planning Association 59 (1): 45–64.
Anscombe, Francis J. (1973). "Graphs in statistical analysis". The American Statistician 27: 17–21.
Ansley J. Coale, Paul Demeny and Barbara Vaughan (1983). "Uses of the Tables". Regional Model Life Tables and Stable Populations, 2nd ed. New York: Academic Press, 29–36.
Bagdonavicius, V.; Levuliene, R.; Nikulin, M. (2010). "Goodness-of-fit criteria for the Cox model from left truncated and right censored data". Journal of Mathematical Sciences 167 (4): 436–443.
Bender, R., Augustin, T. and Blettner, M. (2005). "Generating survival times to simulate Cox proportional hazards models". Statistics in Medicine 24: 1713–1723.
Breslow, N. E. (1975). "Analysis of Survival Data under the Proportional Hazards Model". International Statistical Review / Revue Internationale de Statistique 43 (1): 45–57.
Chernoff, H.; Lehmann, E. L. (1954). "The Use of Maximum Likelihood Estimates in Tests for Goodness of Fit". The Annals of Mathematical Statistics 25 (3): 579–586.
Collett, D. (2003). Modelling Survival Data in Medical Research, 2nd ed.
Cox, D. R. (1997). "Some remarks on the analysis of survival data". The First Seattle Symposium of Biostatistics: Survival Analysis.
Cox, D. R.; Oakes, D. (1984). Analysis of Survival Data. New York: Chapman & Hall.
Cox, David R. (1972). "Regression Models and Life-Tables". Journal of the Royal Statistical Society, Series B 34 (2): 187–220.
"Critical Values of the Chi-Squared Distribution". e-Handbook of Statistical Methods. National Institute of Standards and Technology.
Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968). Applied General Statistics. Pitman. ISBN 9780273403159 (page 625).
Dietrich, Cornelius Frank (1991). Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement, 2nd Edition. A. Higler. ISBN 9780750300605 (page 331).
Donald J. Bogue, Kenneth Hinze and Michael White (1982). Techniques of Estimating Net Migration. Chicago: Community and Family Study Center, University of Chicago.
Dowdy, S. and Wearden, S. (1983). Statistics for Research. Wiley. ISBN 0-471-08602-9, p. 230.
Efron, Bradley (1974). "The Efficiency of Cox's Likelihood Function for Censored Data". Journal of the American Statistical Association 72 (359): 557–565.
Francis, DP; Coats, AJ; Gibson, D (1999). "How high can a correlation coefficient be?". Int J Cardiol 69 (2): 185–199.
George W. Barclay (1958). "The study of mortality". Techniques of Population Analysis. New York: John Wiley and Sons, 123–134.
Gosall, Narinder Kaur; Gosall, Gurpal Singh (2012). Doctor's Guide to Critical Appraisal, 3rd ed. Knutsford: PasTest, pp. 129–130.
Greenwood, P. E.; Nikulin, M. S. (1996). A Guide to Chi-Squared Testing. New York: Wiley.
Henry S. Shryock and Jacob S. Siegel (1973). "The Life Table". The Methods and Materials of Demography. Washington, D.C.: United States Bureau of the Census.
J. L. Rodgers and W. A. Nicewander (1988). "Thirteen ways to look at the correlation coefficient". The American Statistician 42 (1): 59–66.
James C. Raymondo (1992). "Survival Rates: Census and Life Table Methods". Population Estimation and Projection. New York: Quorum Books, 43–60.
Kendall, M. G. (1955). Rank Correlation Methods. Charles Griffin & Co.
Kendall, W. S.; Barndorff-Nielson, O.; van Lieshout, M. C. (1962). Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 100–101.
Kenney, J. F. and Keeping, E. S. (1951). Mathematics of Statistics, Pt. 2, 2nd ed. Princeton, NJ: Van Nostrand.
Lopez-Paz, D.; Hennig, P.; Schölkopf, B. (2013). "The Randomized Dependence Coefficient". Conference on Neural Information Processing Systems.
Mahdavi Damghani, B. (2013). "The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model". Wilmott Magazine.
Mahdavi Damghani, Babak (2012). "The Misleading Value of Measured Correlation". Wilmott 2012 (1): 64–73.
Martinussen, T.; Scheike, T. (2006). Dynamic Regression Models for Survival Data. Springer.
Nan Laird and Donald Olivier (1981). "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques". Journal of the American Statistical Association 76 (374): 231–240.
Nikolić, D.; Muresan, R. C.; Feng, W.; Singer, W. (2012). "Scaled correlation analysis: a better way to compute a cross-correlogram". European Journal of Neuroscience, pp. 1–21.
P. McCullagh and J. A. Nelder (1989). "Chapter 13: Models for Survival Data". Generalized Linear Models, 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. ISBN 0-412-31760-5.
Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine Series 5, 50 (302): 157–175.
Plackett, R. L. (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review (International Statistical Institute) 51 (1): 59–72.
Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. (1992). "Moments of a Distribution: Mean, Variance, Skewness, and So Forth". Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England: Cambridge University Press, pp. 604–609.
Reid, N. (1994). "A Conversation with Sir David Cox". Statistical Science 9 (3): 439–455.
Steve H. Murdock and David R. Ellis (1991). Applied Demography: An Introduction to Basic Concepts, Methods, and Data. Boulder, CO: Westview Press.