This document provides an introduction to biostatistics. It defines statistics as the collection, organization, and analysis of data to draw inferences about a population when only a sample is observed. Biostatistics applies statistical methods to biological and medical data. The document discusses why biostatistics is studied, including that more aspects of medicine and public health are now quantified and biological processes have inherent variation. It also covers types of data, methods of data collection like questionnaires and observation, and considerations for designing questionnaires and conducting interviews.
2. • Statistics is a field of study concerned with:
1. the collection, organization, summarization,
and analysis of data; and
2. the drawing of inferences about a body of
data when only a part of the data is observed.
• Biostatistics: When the tools of statistics are
employed on the data derived from the biological
sciences and medicine or public health, we use
the term biostatistics
3. • Statistics versus statistic (field of study versus
numerical quantity computed from sample data)
• Roughly speaking, the field of statistics can be
divided into:
• Mathematical Statistics: the study &
development of statistical theory and methods in
the abstract and
• Applied Statistics: the application of statistical
methods to solve real problems involving
randomly generated data, and the development
of new statistical methodology motivated by real
problems
4. Rationale of studying Statistics
• Statistics provides a way of organizing information on
a wider and more formal basis than relying on the
exchange of anecdotes or biography and personal
experiences
• More and more things are now measured quantitatively
in medicine and public health
• There is a great deal of intrinsic (inherent) variation in
most biological processes
5. Rationale of studying Statistics
• The medical and public health literature is replete
with reports in which statistical techniques are
used extensively
• The planning, conduct and interpretation of much of
medical and public health research are becoming
increasingly reliant on statistical technology
6. Limitations of statistics
• It deals with only those subjects of inquiry that are
capable of being quantitatively measured and
numerically expressed.
• It deals with aggregates of facts and attaches no
importance to individual items; it is suited only when
group characteristics are to be studied.
• Statistical data is only approximately and not
mathematically correct.
7. Limitations of statistics
• It can be used to establish wrong conclusions and
should therefore be used only by experts.
• Remember the three kinds of lies: lies, damned lies,
and statistics
• Evan Esar’s Definition of Statistics and Quote:
“The science of producing unreliable facts from
reliable figures”
• “Statistics is the only science that enables
different experts using the same figures to draw
different conclusions”
8. Variable
• A variable is a characteristic that takes on
different values in different persons, places, or
things. The characteristic is not the same when
observed in different possessors of it.
• Quantitative variable: one that can be
measured in the usual sense. For example,
measurements of the heights of adults, the
weights of children, and the ages of patients.
• Qualitative variable: a characteristic that can
only be categorized, such as possessing or not
possessing some characteristic of interest, ethnic
group, etc.
9. • Random Variable: Whenever we determine the
height, weight, or age of an individual, the result is
frequently referred to as a value of the respective
variable.
• When the values obtained arise as a result of
chance factors, so that they cannot be exactly
predicted in advance, the variable is called a
random variable.
• When a child is born, we cannot predict exactly his
or her height at maturity. Attained adult height is
the result of numerous genetic and environmental
factors.
10. Scales of measurement
• Scales of measurement refer to ways in which
variables/numbers are defined and categorized.
Each scale of measurement determines the
appropriateness for use of certain statistical
analyses.
• There are four scales of measurement: nominal,
ordinal, interval, and ratio.
11. Scales of measurement
• Nominal: Categorical data and numbers that are simply
used as identifiers or names represent a nominal scale
of measurement.
• Example: gender, coding Female as 1 and Male as 2,
or vice versa
• Ordinal: An ordinal scale of measurement represents
an ordered series of relationships or rank order.
• Example: Likert-type scales; how much pain are you in
today? (on a scale of 1 to 10 with one being no pain
and ten being high pain), represent ordinal data.
12. Scales of measurement
• Interval: A scale which represents quantity and has
equal units but for which zero represents simply an
additional point of measurement is an interval scale.
• In interval scales zero does not represent the absolute
lowest value.
• Example: temperature measured on the Fahrenheit
scale, elevation measured relative to sea level
13. Scales of measurement
• Ratio: The ratio scale of measurement is similar to
the interval scale in that it also represents quantity
and has equality of units. However, this scale also has
an absolute zero (no numbers exist below the zero). A
negative length is not possible.
• Example: physical measures height and weight.
• Often, the distinction between interval and ratio
scales can be ignored in statistical analyses.
• The distinction between these two scales and the
ordinal and nominal scales is more important.
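The difference between interval and ratio scales can be illustrated with a short Python sketch. The temperatures and weights below are arbitrary illustrative values, and the helper names are made up for this example. On an interval scale (Fahrenheit or Celsius) the ratio of two readings changes when the unit changes, because zero is just another point on the scale; on a ratio scale (weight) the ratio is preserved.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9


def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate conversion factor)."""
    return kg * 2.20462


# Interval scale: "80 is twice 40" holds only in Fahrenheit.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: an 80 kg person is twice as heavy as a 40 kg person
# whichever unit is used, because zero means "no weight".
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print("temperature ratios:", ratio_f, ratio_c)
print("weight ratios:", ratio_kg, ratio_lb)
```

The two temperature ratios disagree (2 in Fahrenheit, about 6 in Celsius), so statements such as "twice as hot" are not meaningful on an interval scale, while the weight ratios agree.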
14. Data
• Data are observations of random variables
made on the elements of a population or sample
• Data are the quantities (numbers) or qualities
(attributes) measured or observed that are to be
collected and/or analyzed
• The word data is plural, datum is singular
• A collection of data is often called a data set
(singular)
15. Data and information
• Data are raw, unorganized facts that need to be
processed. Data can be something simple and
seemingly random and useless until they are
organized.
• Example: Each newborn’s birth weight
• When data are processed, organized, structured
or presented in a given context so as to make them
useful, they are called information.
• Example: Mean birth weight of newborns
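As a minimal sketch of the step from data to information, the Python snippet below summarizes a handful of birth weights as a mean. The weights are hypothetical values invented for illustration, not real measurements.

```python
from statistics import mean

# Hypothetical birth weights in grams (illustrative values, not real data).
# Each individual value is a piece of raw data.
birth_weights_g = [3200, 2950, 3400, 3100, 2850]

# Processing the raw data into a summary gives information.
mean_weight_g = mean(birth_weights_g)
print(f"Mean birth weight: {mean_weight_g} g")
```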
16. Types of data
1. Nominal data
• In statistics/biostatistics, we encounter many
different types of data.
• One of the simplest types of data is nominal data,
in which the values fall into unordered categories
or classes. Example: sex, marital status, ethnicity,
religion, etc.
• Numbers are often used to represent the
categories. In a certain study, for instance, males
might be assigned the value 1 and females the
value 0
17. 2. Ordinal data
• When the order among categories becomes
important, the observations are referred to as
ordinal data.
• For example injuries may be classified according
to their level of severity, so that
1= fatal, 2= severe, 3= moderate, and 4= minor.
• Here a natural order exists among the groupings:
a smaller number represents a more serious
injury. However we are still not concerned with
the magnitude of these numbers.
18. 3. Discrete data
• For discrete data both ordering and magnitude
are important.
• In this case, the numbers represent actual
measurable quantities or counts rather than
mere labels.
• Examples of discrete data include the number of
car accidents in a given month, the number of
times a woman has given birth.
19. 4. Continuous data
• Data that represent measurable quantities but
are not restricted to taking on certain specified
values.
• In this case the difference between any two
possible data values can be arbitrarily small.
• Examples of continuous data include time, the
serum cholesterol level of a patient, etc.
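The four types of data described above call for different summaries. The following Python sketch pairs each type with one reasonable summary; the records are hypothetical, the variable names are made up for this example, and the choice of summary is one common convention rather than the only valid one.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records, invented for illustration only.
marital_status = ["single", "married", "married", "widowed", "married"]  # nominal
injury_severity = [4, 2, 3, 4, 1]   # ordinal: 1 = fatal ... 4 = minor
births_per_woman = [0, 2, 1, 3, 2]  # discrete counts
cholesterol_mmol_l = [4.8, 5.6, 6.1, 5.2, 4.9]  # continuous measurements

# Nominal: only counting is meaningful, so report the most frequent category.
nominal_summary = Counter(marital_status).most_common(1)[0]

# Ordinal: order matters but magnitude does not, so use the median code.
ordinal_summary = median(injury_severity)

# Discrete: counts are real quantities, so totals (and means) make sense.
discrete_summary = sum(births_per_woman)

# Continuous: any value in a range is possible; the mean is a natural summary.
continuous_summary = mean(cholesterol_mmol_l)

print(nominal_summary, ordinal_summary, discrete_summary, continuous_summary)
```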
20. Types and Methods of Data Collection
• Statistical data may be classified
under two categories depending upon the
source:
- Primary Data: are those data which are
collected by the investigator himself for the
purpose of a specific inquiry or study.
- Secondary Data: when an investigator
uses data which have already been collected by
others.
21. Data collection methods
1. Observation
• It is a technique that involves systematically
selecting, watching, and recording behaviors of
people, measuring characteristics or other
phenomena.
• It includes all methods from simple visual
observations to the use of high level machines.
• Advantage: Gives relatively more accurate data
on behavior and activities.
• Disadvantages: The investigator’s or observer’s own
bias, prejudice, or desires may be reflected in the data,
and more resources and skilled personnel are needed
when high level machines are used.
22. 2. Self-administered Questionnaire & Interviews
• These are the most commonly used research data
collection techniques.
• Self-administered questionnaire is
– simpler and cheaper
– can be administered to many persons
simultaneously
– can be sent by post (unlike interviews)
• But requires a certain level of education and skill
on the part of the respondents
• People of a low socio-economic status are less
likely to respond
23. 3. Face-to-face and telephone interviews
– An interview is a conversation for gathering
information. A research interview involves an
interviewer, who coordinates the process of the
conversation and asks questions, and an
interviewee, who responds to those questions.
– A good interviewer can stimulate and maintain
the respondent’s interest, and can create a
rapport (understanding) and atmosphere
conducive to the answering of questions.
– If anxiety is aroused, the interviewer can allay it. If
a question is not understood, the interviewer can
repeat and explain it.
24. 4. Mailed Questionnaire Method
• The investigator prepares a questionnaire
pertaining to the field of inquiry and sends it by
post to the informants together with a polite
covering letter explaining in detail the aims and
objectives of collecting the information
• The letter requests the respondents to cooperate by
furnishing correct replies and returning the
questionnaire duly filled in
• Drawback: response rates tend to be relatively
low, and there may be under-representation of
less literate subjects
25. 5. Use of Documentary Sources
• Includes clinical and other personal records,
death certificates, published mortality statistics,
census publications, etc.
• Examples:
- Official publications of CSA
- Publication of MoH and other Ministries
- Newspapers and Journals
- International publications (WHO, UNICEF)
- Records of Hospitals or any HI
26. 6. Computer Direct Interviews
• These are interviews in which the Interviewees
enter their own answers directly into a computer.
• They can be used at malls, trade shows, offices,
and so on.
• The Survey System's optional Interviewing
Module and Interview Stations can easily create
computer-direct interviews. Some researchers
set up a Web page survey for this purpose.
27. Advantages
• The virtual elimination of data entry and editing
costs
• You will get more accurate answers to sensitive
questions
• Elimination of interviewer bias
• Ensuring skip patterns are accurately followed
• Response rates are usually higher
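One of the advantages listed above, accurate skip patterns, can be sketched in Python. The questionnaire, question ids, and the `run_interview` helper below are hypothetical and exist only for this illustration; a real survey package would have its own format. Each question names which question follows for a given answer, so the software, not the respondent, decides what is asked next.

```python
# Hypothetical questionnaire: "next" maps an answer to the following question,
# "default" is used when the answer has no specific branch (None ends the interview).
QUESTIONS = {
    "smoke": {"text": "Do you currently smoke?",
              "next": {"yes": "per_day", "no": "age"}},
    "per_day": {"text": "How many cigarettes per day?",
                "next": {}, "default": "age"},
    "age": {"text": "What is your age in years?",
            "next": {}, "default": None},
}


def run_interview(answers, start="smoke"):
    """Walk the questionnaire and return the ids of the questions actually asked."""
    asked = []
    current = start
    while current is not None:
        asked.append(current)
        question = QUESTIONS[current]
        answer = answers.get(current)
        # Follow the branch for this answer, else fall back to the default.
        current = question["next"].get(answer, question.get("default"))
    return asked


# A non-smoker is never shown the cigarettes-per-day question.
print(run_interview({"smoke": "no", "age": "30"}))
print(run_interview({"smoke": "yes", "per_day": "10", "age": "30"}))
```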
28. Disadvantages
• The Interviewees must have access to a
computer or one must be provided for them.
• As with mail surveys, computer direct
interviews may have serious response rate
problems in populations of lower
educational and literacy levels. This method
may grow in importance as computer use
increases.
29. Choosing Method of data collection
• Decision makers need information
that is relevant, timely, accurate
and useable
30. • The selection of the method of data collection
is also based on practical considerations,
such as:
- The need for personnel, skills, equipment, etc.
in relation to what is available, and the urgency
with which results are needed.
- The acceptability of the procedures to the
subjects: the absence of inconvenience,
unpleasantness, or untoward consequences.
- The probability that the method will provide
good coverage, i.e. will supply the required
information about all or almost all members of
the population or sample.
31. Choice of survey method will also depend
on several factors. These include:
- Speed: Email and Web page surveys are the fastest methods,
followed by telephone interviewing. Mail surveys are the
slowest.
- Cost: Personal interviews are the most expensive, followed by
telephone and then mail. Email and Web page surveys
are the least expensive for large samples.
- Computer and Internet usage: Web page and Email surveys offer
significant advantages, but you may not be able to generalize
their results to the population as a whole.
- Literacy levels: Illiterate and less-educated people rarely
respond to mail surveys.
- Sensitive questions: People are more likely to answer sensitive
questions when interviewed directly by a computer in one form or
another.
32. Designing Questionnaire
When designing a questionnaire the following
points should be taken into account
– Keep it (questions) short and simple (KISS)
– Questions should be unambiguous and not
double barreled
– Use simple and direct language. The
questions must be clearly understood by the
respondent.
– The wording of a question should be simple
and to the point.
– The best kinds of questions are those which
allow a pre-printed answer to be ticked
33. – Questions should be neither irrelevant nor too
personal
– Leading questions shouldn’t be asked. A “leading
question” is one that suggests the answer.
– The questionnaire should be designed so that the
questions fall into a logical sequence.
– After finalizing the questionnaire,
translate it into the local languages to be used for data
collection
– The last step in questionnaire design is to pilot-test the
questionnaire with a small number of interviews
before conducting the main interviews.
34. General Considerations
To be successful, involve other experts and
relevant decision-makers in the questionnaire
design process
Formulate a plan for doing the statistical
analysis during the design stage of the project
If you used one method in the past and need
to compare results, stick to that method,
unless there is a compelling reason to change
35. Types of questions
Open-ended Questions:
- Permit free responses that should be recorded
in the respondent’s own words.
It is used in
Facts with which the researcher is not very
familiar
Opinions, attitudes, and suggestions of
informants, or
Sensitive issues
36. Closed Questions:
Offer a list of possible options or answers
from which the respondents must choose.
Offer a list of options that are exhaustive
and mutually exclusive, and
Keep the number of options as few as
possible.
37. Interviewing technique
• Before the questionnaire is used for the data
collection, it should be pre-tested
• Manuals that explain each of the questions should
be prepared – question-by-question specification
• Enumerators and field supervisors should be
trained before they are deployed to the field
38. • Enumerators should create a good communication environment with the respondents.
• They should explain the questions in the questionnaire to the respondent precisely and should not lead the respondent.
• There should be strong supervision of the field work until it is completed.
39. Rules for asking questions
Read Qs as they are written
Do not change order of Qs
Read the Qs slowly and clearly
Read Qs in a pleasant voice
Maintain eye contact which is culturally
appropriate
Read the entire question to Respondent
Do not skip Qs
Verify information given by Respondent
40. Interviewing tactics of Sensitive
Questions
• Sensitive questions may offend the
respondents
–Expose the respondent’s ignorance
–Call for socially unacceptable answer
–Embarrassments
41. Possible tactics (Barton)
– The everybody approach – ‘As you know, many people have been arrested for being involved in theft. Do you happen to have been arrested for being involved in theft?’
– The other people approach – ‘Do you know anyone who has been arrested for theft? How about yourself?’
– The Kinsey technique – stare firmly into the respondent’s eyes and ask, in simple, clear-cut language such as that to which the respondent is accustomed, and with an air of assuming that everybody has done everything, ‘Have you ever been arrested for being involved in theft?’
42. Informed consents
Participation in a survey should be voluntary and a
respondent can refuse to be interviewed or
measured, etc.
The information given should be simple and clear
and adapted to the respondent’s level of
understanding.
Informed consents can be either signed or verbal
43. The interviewer is responsible for explaining:
– what the survey is about,
– providing all the necessary information, and
– making sure the respondent understands the
implications of his/her participation before
giving his/her consent.
44. • Consents must be documented by asking the
respondents to sign an Informed Consent Form
or give verbal consent before doing the
interview.
– These forms must mention:
• who will be doing the study,
• the types of questions that will be asked,
• why the study is being done, and
• who will have access to the information
provided.
47. Data cleaning and editing
• When the questionnaires are collected from the
field, they should be coded and edited
• Checks are basically of two sorts, range checks
and consistency checks.
Range checks: exclude, for example, the
erroneous occurrence of code 3 for sex,
which should only be code 1(male) or code
2(female).
Consistency checks: detect impossible
combinations of data
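The two kinds of checks can be sketched in Python as follows. This is only an illustration: the record layout, the field names (`sex`, `age`, `pregnant`) and the particular consistency rule are hypothetical, while the sex codes 1 = male and 2 = female follow the example above.

```python
# Hypothetical survey records; field names are assumptions for illustration.
records = [
    {"id": 1, "sex": 1, "age": 34, "pregnant": 0},
    {"id": 2, "sex": 3, "age": 27, "pregnant": 0},  # invalid sex code
    {"id": 3, "sex": 1, "age": 41, "pregnant": 1},  # male recorded as pregnant
]

VALID_SEX_CODES = {1, 2}  # 1 = male, 2 = female


def range_errors(rows):
    """Return ids of records whose sex code is outside the allowed codes."""
    return [r["id"] for r in rows if r["sex"] not in VALID_SEX_CODES]


def consistency_errors(rows):
    """Return ids of records holding an impossible combination of values."""
    return [r["id"] for r in rows if r["sex"] == 1 and r["pregnant"] == 1]


print(range_errors(records))        # ids failing the range check
print(consistency_errors(records))  # ids failing the consistency check
```

Flagged records would normally be traced back to the paper questionnaire rather than corrected by guesswork.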
48. Basic precautions recommended to
minimize errors during the handling of
data:
• Avoid any unnecessary copying of data from one
form to another
• Use a verification procedure during data entry -
range and skip rules, double data entry, etc.
• Check all calculations carefully, example – date
conversion, units of measurement, etc.
49. Data organization: Tables
The use of tables for presenting data involves grouping the data into mutually exclusive categories of the variable, and counting the number of occurrences in each category
Tables should be as simple as possible and self-explanatory
Numerical entries of zero should be explicitly written rather than indicated by a dash
Totals should be shown either in the top row and the first column or in the last row and last column
If data are not original, their source should be given in a footnote
50. Asthma versus sex and smoking
                          Presence of asthma
Sex and smoking status    No             Yes
                          n      %       n      %      Total
Sex
  Female                  459    91.6    42     8.4    501
  Male                    439    93.0    33     7.0    472
  Total                   898    92.3    75     7.7    973
Smoking
  Never smoker            480    91.4    45     8.6    525
  Ex-smoker               254    91.7    23     8.3    277
  Current smoker          164    95.9    7      4.1    171
  Total                   898    92.3    75     7.7    973
51. Data presentation: Diagrams
• Allows readers to obtain an overall grasp of the
data presented.
• The relationship can be seen more quickly and
easily from a graph than from a table.
• The choice of one graph over the other depends
on personal choices and/or the type of the data.
Bar chart and pie chart are commonly used for
quantitative discrete or qualitative data
Histograms, frequency polygon, and line graphs
are used for quantitative continuous data
52. Component Bar graph - Smoking status and presence of asthma
[Figure: component bar graph. X-axis: smoking status (Never smoker, Ex-smoker, Current smoker). Y-axis: number of individuals, scale 0 to 100. Legend: asthma No, Yes.]
53. Pie-chart – smoking status (%)
[Figure: pie chart. Never smoker 54%, Ex-smoker 28%, Current smoker 18%.]
55. Neonatal Mortality Rate by Sex
[Figure: neonatal mortality rate (NNMR per 1000 LB) by surveillance year, 2005 to 2011, with separate series for Female and Male. Y-axis scale 0.0 to 70.0. Data labels as extracted, not assignable to a series: 65.8, 34.2, 37.2, 46.3, 25.8, 29.0, 29.3, 50.2, 44.8, 49.0, 54.6, 41.4, 38.7, 34.3.]
56. General rules for constructing graphs
• Every graph should be self-explanatory and as
simple as possible
• Titles are usually placed below the graph
• Legends or keys should be used to differentiate
variables if more than one is shown
• Axis labels should be placed to read from the left side and from the bottom
• The units into which the scale is divided should
be clearly indicated
• The numerical scale representing frequency
must start at zero or a break in the line should
be shown
58. Data Exploration
• The exploration procedure produces summary
statistics and graphical displays
• The reasons for using the explore procedure are:
– data screening,
– outlier identification,
– description,
– assumption checking, and
– characterizing differences among
subpopulations (groups of cases).
60. • Data screening may show that you have
unusual values, extreme values, gaps in
the data, or other peculiarities.
• Exploring the data can help to determine
whether the statistical techniques that you
are considering for data analysis are
appropriate.
• The exploration may indicate that you need
to transform the data if the technique
requires some known distribution, say the
Normal distribution.
61. Measures of Central tendency
- The arithmetic mean, median and mode
- The arithmetic mean is unique, takes into account all data points and lends itself to further manipulation, but it is sensitive to extreme values
- The median is unique, does not depend on the exact value of every data point and is not affected by extreme values
- The mode might not exist or might not be unique; it can be determined for qualitative data
62. Exercise
• Calculate the mean, median and mode for
the whole sample and sex specific
summary values using the data in the
table below
• Sex – 1=Male, 2=Female
• Height is measured in cm, weight in kg, age in years and FEV in liters
64. Summary values
Sex Age Ht Wt FEV
Male Mean 54.85 173.54 80.27 3.42
Median 59.94 174.00 80.90 3.75
Mode 32.47 170.00 57.40 4.20
Sum 932.47 2950.10 1364.60 58.13
n 17 17 17 17
Female Mean 49.16 158.40 64.42 2.81
Median 47.40 159.00 62.00 2.70
Mode 34.43 156.00 60.00 2.45
Sum 639.04 2059.20 837.50 36.53
n 13 13 13 13
Both Mean 52.38 166.98 73.40 3.16
Median 50.96 166.75 71.25 3.15
Mode 32.47 156.00 60.00 2.45
Sum 1571.51 5009.30 2202.10 94.66
n 30 30 30 30
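Summary values like those in the table can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical list of heights, not the exercise data (which are not reproduced here), and also illustrates how one extreme value moves the mean more than the median.

```python
import statistics

# Hypothetical heights in cm (assumed values, not the exercise data).
heights = [156, 158, 160, 160, 165, 170, 174]

mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
modes = statistics.multimode(heights)  # a list, because the mode need not be unique

# Adding one extreme value shifts the mean far more than the median.
with_outlier = heights + [250]
mean_shift = statistics.mean(with_outlier) - mean_height
median_shift = statistics.median(with_outlier) - median_height

print(mean_height, median_height, modes)
print(mean_shift, median_shift)
```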
65. Measures of Variation/Dispersion
• Dispersion of a set of observations refers to the
scatteredness of observations around a measure
of central tendency
Commonly used measures of variation:
Range, Percentiles, and Standard deviation.
Of these measures, only the standard deviation assesses the scatteredness of all the observations around the mean
66. The Coefficient of Variation
To compare the variability of two or more sets of
data for same or different variables, standard
deviations may lead to fallacious results.
• The variables involved might be measured in
different units, or different characteristics
• Coefficient of Variation (CV) is the standard
deviation expressed as a percentage of the
mean.
67. Use the above data to determine standard deviation and Coefficient of variation
Sex Age Ht Wt FEV
Male Mean 54.85 173.54 80.27 3.42
Variance 160.7 49.53 157.22 1.15
Std dev 12.68 7.04 12.54 1.07
CV 23.1 4.1 15.6 31.3
Range 46.06 28 43.9 3.85
Female Mean 49.16 158.4 64.42 2.81
Variance 74.16 32.65 74.78 0.24
Std dev 8.61 5.71 8.65 0.49
CV 17.5 3.6 13.4 17.4
Range 28.98 20.5 28.5 1.55
Both Mean 52.38 166.98 73.40 3.16
Variance 127.58 99.03 181.48 0.83
Std dev 11.3 9.95 13.47 0.91
CV 21.6 6.0 18.4 28.8
Range 46.06 41 49.5 3.85
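As a rough check on the table, the CV is simply 100 × SD / mean. The sketch below recomputes one table entry from its published SD and mean, and shows a helper for raw data; the helper assumes the sample standard deviation (n − 1 divisor), and the short data list is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Recompute one table entry: male age has SD 12.68 and mean 54.85.
cv_male_age = 100 * 12.68 / 54.85

# Hypothetical raw data, used only to exercise the helper.
example_values = [2, 4, 4, 4, 5, 5, 7, 9]
cv_example = coefficient_of_variation(example_values)

print(round(cv_male_age, 1), round(cv_example, 2))
```

Because the CV is unit-free, it allows the variability of height (cm) and weight (kg) to be compared directly.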
68. Data transformations
• The assumptions underlying a statistical method
may not always be satisfied by a particular set of
data.
• For example, a distribution may be positively
skewed rather than normal. Such problems can
often be overcome simply by transforming the
data to a different scale of measurement
• The most common choice is the logarithmic
transformation
69. Logarithmic transformation
• When a logarithmic transformation is applied
to a variable, each individual value is replaced
by its logarithm.
y = log x
• Where x is the original value and y the
transformed value.
• The logarithm tends both to equalize the standard deviations and to remove positive skewness (absence of symmetry)
70. Choice of a transformation
• There are alternative transformations
• Reciprocal transformation:- is stronger than
the logarithmic, and would be appropriate if the
distribution were considerably more positively
skewed than lognormal.
y = 1/x
71. • Square root transformation:- is used when the constant variance assumption does not hold true.
\[ y = \sqrt{x} \]
• It is weaker than the logarithmic transformation.
• Negative skewness can be removed by using a power transformation, such as a square or a cubic transformation; the strength increases with the order of the power
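The relative strength of the square root and logarithmic transformations can be illustrated with a small sketch. The data are hypothetical (powers of 2, chosen so that the logarithms are exactly symmetric), and `skewness` is a helper defined here using the moment coefficient m3 / m2^1.5; it is not a library function.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


x = [1, 2, 4, 8, 16, 32, 64]  # hypothetical positively skewed data

skew_raw = skewness(x)
skew_sqrt = skewness([math.sqrt(v) for v in x])  # weaker transformation
skew_log = skewness([math.log(v) for v in x])    # stronger transformation

print(round(skew_raw, 2), round(skew_sqrt, 2), round(skew_log, 2))
```

For these data the square root reduces the skewness only partly, while the logarithm removes it.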
74. Probability Distributions
• Definition: A random variable is a numerical
quantity that takes different values with specified
probabilities.
• There are two types of random variables: discrete
and continuous.
• Definition: A random variable for which there exists a discrete set of values with specified probabilities is a discrete random variable.
75. Probability Distributions
• Example: Diarrhoea is one of the most frequent
reasons for visiting health institutions in the first 2
years of life in children.
• Let X be the random variable that represents the
number of episodes of diarrhoea in the first 2
years of life. Then X is a discrete random
variable, which takes on values 0,1,2, ....
• Definition: A random variable whose values form
a continuum (i.e., have no gaps) such that ranges
of values occur with specified probabilities is a
continuous random variable.
76. Probability Mass Function for a Discrete Random Variable
• The values taken by a discrete random variable and their associated probabilities can be expressed by a rule, or relationship, that is called a probability mass function (pmf).
• Definition: A pmf is a mathematical relationship, or rule, that assigns to any possible value r of a discrete random variable X the probability P(X = r). This assignment is made for all values r that have positive probability. The pmf is also referred to as the probability distribution.
77. General rules which apply to any
probability distribution
1. Since the values of a probability distribution are
probabilities, they must be numbers in the
interval from 0 to 1.
2. Since a random variable has to take on one of
its values, the sum of all the values of a
probability distribution must be equal to 1.
• Example: Check whether the following function
can serve as the probability distribution of an
appropriate random variable
78. General rules …
\[ f(x) = \frac{x + 2}{12} \qquad \text{for } x = 1, 2, \text{ and } 3 \]
Substituting the values of x, f(1)=3/12, f(2)=4/12, and f(3)=5/12
Since none of these values is negative or greater than one, and since their sum 3/12+4/12+5/12 = 1, the given function is a probability distribution
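The same check can be written as a short sketch using exact fractions, so that the sum is tested without rounding error. The function name `f` mirrors the example; everything else is illustrative.

```python
from fractions import Fraction


def f(x):
    """Candidate probability function f(x) = (x + 2) / 12."""
    return Fraction(x + 2, 12)


values = [f(x) for x in (1, 2, 3)]

# Rule 1: every value lies in [0, 1]. Rule 2: the values sum to 1.
is_valid = all(0 <= v <= 1 for v in values) and sum(values) == 1

print(values, is_valid)
```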
79. Example on Hypertension-control:
• Suppose a physician agrees to use a new anti-
hypertensive drug on a trial basis on the first 4
untreated hypertensives whom she encounters in
her practice before deciding whether to adopt the
drug for routine use.
• Let X = the number of patients out of 4 who are
brought under control. Suppose that from
previous experience with the drug, for any clinical
practice, the drug company expects the following
probabilities.
r 0 1 2 3 4
P(X=r) .008 .076 .265 .411 .240
80. Example:
• For the above table, for any clinical practice, the probability that between 0 and 4 hypertensives are brought under control = 1, i.e.,
• 0.008 + 0.076 + 0.265 + 0.411 + 0.240 = 1
• What is the probability that:
– At least two patients brought under control?
– At most three patients brought under control?
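Both questions reduce to summing the tabulated probabilities over the relevant values of r. A minimal sketch, with the table stored in a dictionary (the variable names are illustrative):

```python
# P(X = r) from the hypertension-control table.
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

# At least two patients under control: r = 2, 3, 4.
p_at_least_two = sum(p for r, p in pmf.items() if r >= 2)

# At most three patients under control: r = 0, 1, 2, 3.
p_at_most_three = sum(p for r, p in pmf.items() if r <= 3)

print(round(p_at_least_two, 3), round(p_at_most_three, 3))
```

The second probability can equally be obtained as 1 − P(X = 4).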
81. 1. Binomial distribution
• The Binomial distribution with parameters n and
p is a discrete probability distribution of the
number of successes in a sequence of n
independent binary (yes/no) experiments, each of
which yields success with probability p.
• A useful summary measure, used to describe
binary variables, is the proportion with which the
variable took one of its values, called success.
• The binomial distribution is used to model the
number of successes in a sample of size n drawn
with replacement from a population of size N.
82. The Binomial Distribution
• Definition: The distribution of the number of successes (r) in n statistically independent trials, where the probability of success on each trial is p, is known as the binomial distribution, and has a probability mass function given by:
\[ P(X = r) = \binom{n}{r} p^{r} (1 - p)^{n - r}, \qquad r = 0, 1, 2, \ldots, n \]
where
\[ \binom{n}{r} = \frac{n!}{r!\,(n - r)!} \]
• The mean is np and the variance is np(1-p)
84. Example:
• What is the probability of obtaining 2 boys out of 5 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered independent random variables?
• n=5, p=0.51, 1-p=0.49 and r=2
\[ P(X = 2) = \binom{5}{2} (0.51)^{2} (0.49)^{3} = \frac{5!}{2!\,3!} (0.51)^{2} (0.49)^{3} = 0.306 \]
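The calculation can be reproduced with a few lines of Python. `binomial_pmf` is a helper written for this sketch (it is not a library function); `math.comb` supplies the binomial coefficient.

```python
import math


def binomial_pmf(r, n, p):
    """P(X = r) for a binomial distribution with parameters n and p."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)


p_two_boys = binomial_pmf(2, 5, 0.51)

# The probabilities over r = 0..n should add up to 1.
total = sum(binomial_pmf(r, 5, 0.51) for r in range(6))

print(round(p_two_boys, 3), round(total, 6))
```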
85. Continuous Probability Distribution
• A continuous probability distribution is a smooth
density curve that models the distribution of a
continuous random variable.
• The area under the curve is 1 and the area within any interval is the probability that the value of the random variable is in that interval.
• Density function is a formula used to represent
the distribution of a continuous random variable.
86. Definition
• A nonnegative function f(x) is a probability density function for a continuous random variable if:
– The total area bounded by its curve and the x-axis is equal to one
– The subarea bounded by the curve, the x-axis and the perpendiculars erected at any two points a and b gives the probability that x is between a and b
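87. 2. Normal distribution
• The Normal distribution, also called the Gaussian distribution, is the most important distribution in statistics.
• The normal density is given by:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty \]
where μ is the mean, σ is the standard deviation, π = 3.141… and e = 2.718…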
88. Characteristics
1. It is symmetrical about its mean
2. Mean, median and mode are equal
3. The total area under the curve above the x-axis is one square unit
4. About 68% of the area lies within one SD of the mean in both directions
5. The height of the curve at the mean = \( 1/(\sigma\sqrt{2\pi}) \)
6. The normal distribution is determined by the parameters standard deviation and mean.
91. The standard Normal distribution
• Definition: A normal distribution with mean 0 and variance 1 will be referred to as a standard, or unit, normal distribution. This distribution is denoted by N(0,1).
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^{2}}, \qquad -\infty < z < +\infty \]
This distribution is symmetrical about 0 (the mean), since f(z)=f(-z). About 68% of the area under the normal density lies between +1 and -1, about 95% lies between +2 and -2, and about 99% lies between +2.5 and -2.5
92. Application of Normal distribution
• Example:
Suppose it is known that the heights of a population of individuals are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. What is the probability that a person picked at random from this group will be
a) between 65 and 74 inches tall?
b) greater than 75 inches tall?
c) less than 65 inches tall?
93. Solution
Step 1: Transform this to the standard normal distribution by using
\[ z = \frac{x - \mu}{\sigma} \]
Step 2: Determine the area bounded by the curve, the x-axis and the two points, P(a < z < b).
Step 3: Look up the z distribution table for the corresponding value of z.
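Instead of a printed z table, the three probabilities for the height example (mean 70 inches, SD 3 inches) can be obtained from the cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

height = NormalDist(mu=70, sigma=3)

# Step 1 of the solution: standardize the cut-off points.
z_65 = (65 - 70) / 3
z_74 = (74 - 70) / 3

# a) between 65 and 74 inches
p_between = height.cdf(74) - height.cdf(65)
# b) greater than 75 inches
p_above_75 = 1 - height.cdf(75)
# c) less than 65 inches
p_below_65 = height.cdf(65)

print(round(z_65, 2), round(z_74, 2))
print(round(p_between, 3), round(p_above_75, 3), round(p_below_65, 3))
```

Answers b) and c) coincide because 75 and 65 are equally far from the mean and the curve is symmetrical.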
94. 3. The t-distribution
• The t-distribution is a family of continuous
probability distributions that arise when
estimating the mean of a normally distributed
population in situations where the sample
size is small and population standard
deviation is unknown.
• Whereas a normal distribution describes a full
population, t-distributions describe samples
drawn from a full population; accordingly,
the t-distribution for each sample size is
different.
95. The t-distribution
• The t-distribution is similar in shape to the
Normal distribution but is more spread out with
longer tails than the standard Normal.
• It is symmetrical about zero, its mean. Its variance is σ² = k/(k-2) for k > 2, where k = df. The mean does not exist for k = 1, and the variance does not exist for k = 1 or 2
• The df increases with the sample size. As the
sample size increases, the shape of the t-
distribution becomes increasingly more like the
standard Normal distribution.
• It is used for estimation of means.
99. Why sample?
• It is usually not cost effective or practicable to
collect and examine all the data that might be
available.
• Instead it is often necessary to draw a sample of
information from the whole population to enable
the detailed examination required to take place.
• Sampling provides a means of gaining
information about the population without the
need to examine the population in its entirety.
100. • Purposes of sampling: Provides various
types of statistical information of a
qualitative or quantitative nature about the
whole by examining a few selected units.
• Advantages of sample based studies
– Cost effectiveness
– Timeliness
– Inaccessibility of some people
– Less destructive in data summarization
– Accuracy
101. Caveats
• Sampling can provide a valid, defensible
methodology but it is important to match
the type of sample needed to the type of
analysis required.
• The auditor should also take care to check
the quality of the information from which
the sample is to be drawn. If the quality is
poor, sampling may not be justified.
102. Sampling Designs
• Sample design covers the method of selection, the
sample structure and plans for analysing and
interpreting the results.
• Sample designs can vary from simple to complex
and depend on the type of information required and
the way the sample is selected.
• The design will impact upon the size of the sample
and the way in which analysis is carried out. In
simple terms, the tighter the required precision and
the more complex the design, the larger the sample
size.
103. Sampling Designs
• The design may make use of the characteristics
of the population, but it does not have to be
proportionally representative.
• It may be necessary to draw a larger sample
than would be expected from some parts of the
population;
• For example, to select more from a minority
grouping to ensure that we get sufficient data for
analysis on such groups.
104. Sampling Designs
• The aim of the design is to achieve a
balance between the required precision
and the available resources.
105. Definition of terms
• Sample – Subset of the population of interest
• Sampling – process of selecting units from
the population of interest so that by studying
the sample we generalize our result back to
population.
• Sampling can provide a valid, defensible
methodology but it is important to match the
type of sample needed to the type of analysis
required.
106. • Population - Finite or infinite set of objects
whose properties are to be studied.
• Study population/sample population –
subset of target population chosen so as to be
representative of the total population
• Sampling unit - unit of selection in the
sampling process.
• Study unit – subject on which information is
collected.
107. Conditions that need to be met
The sample must be well chosen – Representative
the method of choosing the sample matters
the best methods involve the planned
introduction of chance
A sampling procedure should be fair, selecting
people for inclusion in the sample in an impartial
way, so as to get a representative cross section of
the public – No selection bias
When a selection procedure is biased, taking a large
sample does not help. This just repeats the basic
mistake on a large scale
108. Conditions …
A sample chosen in a haphazard fashion, or because it is ‘handy’, is unlikely to be a representative one. This kind of sample may be used in exploratory surveys to get a ‘feel’ for the situation
The sample must be sufficiently large –
Sample size
There must be adequate coverage of the sample
– Response rate
Non-respondents can be very different from respondents. When there is a high non-response rate, look out for non-response bias.
109. Is a sample any good?
Some samples are really bad. To find out
whether a sample is any good, ask:
1. How was it chosen?
2. Was there selection bias?
3. Was there non-response bias?
These questions might not be answerable just by looking at the data
110. Sampling techniques/methods
• Sampling is the process of selecting a number of
study units from a defined study population.
• Clearly define study population and study unit
– Study population – individuals, households,
institutions, records, etc…
– Study units – an individual, a household, an
institution or a record
111. Sampling cont…
• Types: probability and non-probability
– Probability – quantitative studies
– Non-probability – qualitative studies
• Probability sampling technique:
– Involves using random selection procedures to ensure that each
unit of the sample is chosen on the basis of chance.
– All units of the study population should have an equal, or at
least a known non-zero chance of being included in the sample.
– Sample drawn in such a way that it is representative of the
population
– The type to be used depends on population composition and
availability of sampling frame
113. 1. Simple random sampling
• Selecting required number of sampling units
randomly from list of all units
– Up-to-date Sampling frame
– Random selection – manually using table of random
numbers or using computer programs
• E.g. 250 households from list of 9000 households
• Better representativeness but costly and
representativeness reduced in heterogeneous
population
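Drawing the 250 households by computer might look like the sketch below. The sampling frame is assumed to be a simple list of household IDs numbered 1 to 9000, and the fixed seed is there only so that the illustration is reproducible.

```python
import random

rng = random.Random(2024)  # fixed seed, only to make the sketch reproducible

# Hypothetical sampling frame: household IDs 1..9000.
sampling_frame = list(range(1, 9001))

# Simple random sample of 250 households, drawn without replacement.
sample = rng.sample(sampling_frame, k=250)

print(len(sample), sorted(sample)[:5])
```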
114. 2. Systematic sampling
• Sampling units are selected at regular intervals. The
starting unit is selected randomly
• Example: to select a sample of 100 students from 2500, first calculate the sampling interval = 2500/100 = 25. Then randomly select the first student from among the first 25 and finally pick every 25th student thereafter
• Easier and less time consuming
• Can be done without sampling frame – sequential
studies
• Risk of bias if there is cyclic repetition
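The student example can be sketched as follows, assuming the students are numbered 1 to 2500 in list order and that the random start is taken within the first sampling interval. `systematic_sample` is a helper written for this illustration.

```python
import random


def systematic_sample(population_size, sample_size, rng):
    """Pick a random start in the first interval, then every k-th unit."""
    interval = population_size // sample_size   # sampling interval k
    start = rng.randint(1, interval)            # random start between 1 and k
    return [start + i * interval for i in range(sample_size)]


students = systematic_sample(2500, 100, random.Random(7))

print(students[:4], students[-1])
```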
115. 3. Stratified sampling
• Used when the population structure consists of distinct subgroups/strata
• Ensures proportions of individuals with certain
characteristics in the sample will be the same as those
in the whole population
– Representation of groups with different characteristics
• The study population must be divided into strata of
the characteristic (Example: residence, age, sex,
profession) and then random or systematic samples
are obtained from each stratum
116. 3. Stratified sampling cont.
• Depending on the need, samples from each stratum
can be drawn either proportional to their size or non-
proportionally/equal size from each stratum
– Proportional - using the sampling fraction (n/N)
– Equal size – to represent small groups
• Improved representativeness
• Estimates can be obtained for each stratum and the
population
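Proportional allocation applies the same sampling fraction n/N to every stratum. A minimal sketch, with hypothetical strata names and sizes; note that rounding can make the allocated total differ slightly from n in less tidy cases.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate the sample to strata in proportion to stratum size."""
    population = sum(strata_sizes.values())
    fraction = total_sample / population  # sampling fraction n/N
    return {name: round(size * fraction) for name, size in strata_sizes.items()}


# Hypothetical strata (residence) and their population sizes.
strata = {"urban": 6000, "rural": 3000, "nomadic": 1000}

allocation = proportional_allocation(strata, 500)
print(allocation)
```

Within each stratum, the allocated number of units would then be drawn by simple random or systematic sampling.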
117. 4. Cluster sampling
• Groups of study units (clusters) instead of individual
study units are selected at a time
• Assumes homogeneity of the population with respect to the characteristic to be measured
• All the study units in the selected clusters are
included in the study
• Used in geographically scattered areas where visiting
dispersed study units is time consuming and costly
• Example: a simple random sample of 5 villages from
30 villages
• Easier but less representative
118. 5. Multistage sampling
• Carried out in stages – PSU, SSU…
• Used in very large and diverse populations
• The method used in most community-based big
studies
• E.g. In a study to be undertaken in a big town the
sampling may involve stages like selection of
kefetegnas, kebeles and finally houses
• Representativeness and reduced cost
119. 5. Multistage sampling
• The larger the number of clusters, the greater is
the likelihood that the sample will be
representative.
• Further, the sampling units at community level
should be selected randomly (avoid convenience
sampling!).
120. Bias in sampling
• Bias in sampling is a systematic error in
sampling procedures, which leads to a distortion
in the results of the study.
• Bias can be introduced as a consequence of
improper sampling procedures, which result in
the sample not being representative of the study
population.
121. Bias …
• There are several possible sources of bias that
may arise when sampling. The most well known
source is non-response.
• Non-response can occur in any interview
situation
• Respondents may refuse or forget to fill in the
questionnaire
• The problem lies in the fact that non-respondents
in a sample may exhibit characteristics that differ
systematically from the characteristics of
respondents.
122. Bias …
There are several ways to deal with this problem and
reduce the possibility of bias:
1. Data collection tools should be pre-tested.
2. If non-response is due to absence of the subjects,
follow-up of non-respondents may be considered.
3. If non-response is due to refusal to co-operate, an
extra, separate study of non-respondents may be
considered in order to identify to what extent they
differ from respondents.
4. Include additional people in the sample, so that non-
respondents can be replaced if their absence was
very unlikely to be related to the topic being studied.
123. Bias …
Other sources of bias in sampling:
Studying volunteers only – volunteers are
motivated to participate in the study.
Sampling of registered patients only –
Patients reporting to a clinic are likely to
differ systematically from people seeking
alternative treatments
Seasonal bias.
Tarmac bias – easily accessible by car.
124. Non-probability sampling methods
Quota Sampling: Each data collector is assigned
a fixed quota of subjects to interview; the number
falling into certain categories (like residence, sex,
age, etc.) are also fixed. On the other hand, the
interviewers are free to select anybody they like.
From a common-sense point of view, quota sampling
looks good. It seems to guarantee that the sample
will be like the population with respect to all the
important characteristics that affect the variable of
interest.
125. In quota sampling, the sample is hand-picked
to resemble the population with respect to
some key characteristics. The method
seems reasonable, but does not work very
well. The reason is unintentional bias on
the part of the interviewers.
126. Other non-probability sampling methods
• Purposive sampling
• Snowball or chain sampling
• Extreme case sampling
• Maximum variation sampling
• Homogeneous sampling
• Critical case sampling
127. Sample size estimation
• How many subjects are needed in the sample to enable conclusions to be drawn about the whole population?
– Depends on expected variation in the data and
number of units per cell for analysis
– The eventual sample size is a compromise between
what is desirable and what is feasible
128. Sample size cont…
• Minimum sample size can be calculated
depending on the objective of the study
– Estimation of population parameter with certain
precision
• Single variable estimation (single population mean,
proportion or rate)
• Descriptive studies - Prevalence, coverage and utilization
rate studies
– Test of significant difference between groups
• Analytic studies - comparative cross-sectional, case-
control, cohort and clinical trials
129. Sample size - single proportion
• For making a confidence limit statement (such as in a prevalence study), the following formula can be used to estimate the minimum sample size:
\[ n = \frac{Z_{1-\alpha/2}^{2}\, P(1 - P)}{d^{2}} \]
• For a population <10,000, use the finite population correction:
\[ n_f = \frac{N\, Z_{1-\alpha/2}^{2}\, P(1 - P)}{d^{2}(N - 1) + Z_{1-\alpha/2}^{2}\, P(1 - P)} \]
130. Single proportion cont…
• Parameters in the formula
– n is the minimum sample size
– P is an estimate of the prevalence rate for the population
• Taken from available data or a pilot study result; otherwise use 0.5, which gives the largest minimum sample size. If P is given as a range, take the value closest to 0.5.
– d is the margin of sampling error tolerated
– Z1-α/2 is the standard normal value at the 100(1-α)% confidence level; α is mostly taken to be 5%
• Usually the 95% confidence level is used, for which Z = 1.96
– N is the population size
131. Exercise
• What sample size do we need to estimate the
prevalence of HIV among residents of a town such
that the error of estimation is within 1% of its actual
parameter with 95% confidence?
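One way to work the exercise is sketched below. It assumes the single-proportion formula n = Z²P(1 − P)/d² with Z = 1.96 and, because no prior prevalence estimate is given, takes P = 0.5. The function name, the rounding guard and the town size of 8000 used for the finite population correction are assumptions made for illustration.

```python
import math


def sample_size_single_proportion(p, d, z=1.96, population=None):
    """Minimum sample size for estimating a proportion p within margin d."""
    z2pq = z ** 2 * p * (1 - p)
    if population is None:
        n = z2pq / d ** 2
    else:
        # Finite population correction, for small populations.
        n = population * z2pq / (d ** 2 * (population - 1) + z2pq)
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(n, 6))


n_hiv = sample_size_single_proportion(p=0.5, d=0.01)
n_hiv_small_town = sample_size_single_proportion(p=0.5, d=0.01, population=8000)

print(n_hiv, n_hiv_small_town)
```

If a credible prevalence estimate were available, using it in place of 0.5 would give a smaller required sample.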
132. Measuring prevalence for more than one
item in one group
• Take estimated prevalence of the most important item
to be measured or
• Determine sample size for each item/specific
objective and then
– Take estimated prevalence of the item that gives
the maximum sample size
133. Sample size-two proportion
For a test-of-significance study the following formula can be used:
\[ n = \frac{(Z_{\alpha/2} + Z_{\beta})^{2}\,\left[\, p_1(1 - p_1) + p_2(1 - p_2) \,\right]}{(p_1 - p_2)^{2}} \]
Parameters:
n - size of sample in each group
p1, p2 – estimated population prevalence in the comparison groups
β = 1 - power, where power is the probability that, if the two proportions differ, the test will produce a significant difference
– Usually a power of 80% or 90% is used
134. Exercise
A study is designed to assess the difference in the
proportion of physicians leaving health services in
urban and rural areas. From available literature 30% and
15% of physicians are estimated to leave services in
rural and urban areas within three years of graduation
respectively. What sample size is required for the study?
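A possible working of this exercise is sketched below. It assumes the usual two-proportion formula n = (Zα/2 + Zβ)² [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)², a two-sided 5% significance level and, since the exercise does not state it, a power of 80%. The function name is illustrative; the critical values come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for comparing two independent proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


n_per_group = sample_size_two_proportions(0.30, 0.15)               # 80% power
n_per_group_90 = sample_size_two_proportions(0.30, 0.15, power=0.90)

print(n_per_group, n_per_group_90)
```

The result is the number of physicians needed in each of the rural and urban groups, before any allowance for non-response.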
135. Sample size – case-control studies
• Formula –
\[ n = \frac{\left[ z_{\alpha}\sqrt{\left(1 + \frac{1}{m}\right) p'(1 - p')} + z_{\beta}\sqrt{\frac{p_0(1 - p_0)}{m} + p_1(1 - p_1)} \right]^{2}}{(p_1 - p_0)^{2}} \]
• Parameters:
– P1, P0 – estimated prevalence of exposure in the cases and controls respectively
– P0 can be estimated as the population prevalence of exposure
– P′ – derived from P1, P0, m and odds ratio
– OR : odds ratio of exposures between cases and controls
– m : number of control subjects per case subject
136. Exercise
• Example: Suppose you want to test for a difference in exposure status between cases and controls at the 95% confidence level and with a power of 80%, using a 1:1 ratio of cases to controls, while looking for an odds ratio of 2. You assume the prevalence of exposure among controls is 25%. What sample size do you need?
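The exercise can be worked as in the sketch below, under stated assumptions: the uncorrected unmatched case-control formula n = [zα√((1 + 1/m) p′(1 − p′)) + zβ√(p0(1 − p0)/m + p1(1 − p1))]² / (p1 − p0)², with p1 = p0·OR / (1 + p0(OR − 1)) and p′ = (p1 + m·p0)/(m + 1). These derivations of p1 and p′, the two-sided 5% level and the function name are assumptions of this illustration; software that applies a continuity correction will report a larger number.

```python
import math
from statistics import NormalDist


def sample_size_case_control(p0, odds_ratio, m=1, alpha=0.05, power=0.80):
    """Number of cases needed (controls = m * cases), without continuity correction."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Exposure prevalence among cases implied by p0 and the odds ratio.
    p1 = p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))
    # Weighted average exposure prevalence across cases and controls.
    p_bar = (p1 + m * p0) / (m + 1)
    term_null = z_alpha * math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
    return math.ceil((term_null + term_alt) ** 2 / (p1 - p0) ** 2)


n_cases = sample_size_case_control(p0=0.25, odds_ratio=2, m=1)
print(n_cases)  # same number of controls is needed when m = 1
```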
137. Sample size-two proportion
• More than one comparison variable – take the one
with the smallest estimated difference
– To get largest sample size
• Different formulae
– Case-control studies
– Matched studies
– Survival analysis
– Other cases
• Reference
– http://www.statsdirect.com/help/sample_size_and_methods/sms.htm
138. Five key factors
1. Confidence level: how certain you want to be that the
population figure is within the sample estimate and its
associated precision.
2. Variability in the population: the SD is the most usual
measure and often needs to be estimated.
3. Margin of error or precision: a measure of the possible
difference between the sample estimate and the actual
population value.
4. The population proportion: the proportion of items in
the population displaying the attributes that you are
seeking.
5. Population size: only important if the sample size is
greater than 5% of the population in which case the
sample size reduces.
139. Sample size – other considerations
• Non-response
– Add a contingency, say 10%
• More for sensitive topics or self-administered questionnaires
(up to 30%)
– Response rate for
• Cross-sectional survey: >85%
• Cohort: >60-80%
• Sampling technique
– In complex samples (cluster, multistage) increase the
sample size to account for design effect
140. Sample size – other considerations cont.
– Design effect: the ratio of the variance of an estimate derived from
a complex sampling design to the variance of the estimate
from a simple random sample
– Usually the sample size is multiplied by 2 (or 1.5) in cluster
sampling
• Increase it further for large PSUs, many stages, or a clustered variable
• Qualitative methods – sample size is estimated, not determined
• Beyond a certain point, it is better to have good quality data
than a large sample
• Better to have a representative sample than a large one
– Use representative sampling techniques
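The two adjustments described here (multiplying by a design effect and adding a non-response contingency) can be sketched as a small helper. The function name and the starting size of 118 are illustrative assumptions, not values given in the slides.

```python
from math import ceil


def adjust_sample_size(n_srs, design_effect=1.0, non_response=0.10):
    """Inflate a simple-random-sample size for clustering and non-response."""
    n = n_srs * design_effect       # account for the complex design
    n = n * (1 + non_response)      # add a contingency for non-response
    return ceil(round(n, 6))        # round() guards against floating-point noise


# Hypothetical: 118 per group from a simple-random-sample formula,
# cluster sampling with design effect 2, 10% non-response contingency
n_final = adjust_sample_size(118, design_effect=2, non_response=0.10)
print(n_final)
```

Here the adjusted size is 260: the design effect doubles 118 to 236, and the 10% contingency then adds 23.6, rounded up.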
141. Sampling distribution
Definition: A parameter is a numerical descriptive
measure of a population (e.g. μ). A statistic is a
numerical descriptive measure of a sample (e.g. x̄).
To each sample statistic there corresponds a
population parameter. We use x̄, S², S, p, etc. to
estimate μ, σ², σ, P (or π), etc.
142. Sampling distribution of Means
• The sampling distribution of means is one of the
most fundamental concepts of statistical
inference, and it has remarkable properties.
• Since it is a frequency distribution, it has its own
mean and standard deviation
Example: let a population of size 6 have the following
weights of individuals: 55.7, 66.7, 85.5, 79.7,
122.4 and 78.1. Select all possible samples of size
3 from this population, check whether the sample mean
is an unbiased estimate of the population mean, and
calculate the standard error of the sample mean.
143. Measurements of weight of individuals of
the population
Population values: 55.7 66.7 85.5 79.7 122.4 78.1
Sum of observations: 488.1
Population mean (µ): 81.35
Population SD (σ): 20.77
All possible unique samples: 20
$$\binom{N}{n} = \binom{6}{3} = 20, \qquad \mu = \frac{\sum X}{N}, \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
144. Sample Obs1 Obs2 Obs3 Mean
S1 55.7 66.7 85.5 69.30
S2 55.7 66.7 79.7 67.37
S3 55.7 66.7 122.4 81.60
S4 55.7 66.7 78.1 66.83
S5 55.7 85.5 79.7 73.63
S6 55.7 85.5 122.4 87.87
S7 55.7 85.5 78.1 73.10
S8 55.7 79.7 122.4 85.93
S9 55.7 79.7 78.1 71.17
S10 55.7 122.4 78.1 85.40
S11 66.7 85.5 79.7 77.30
S12 66.7 85.5 122.4 91.53
S13 66.7 85.5 78.1 76.77
S14 66.7 79.7 122.4 89.60
S15 66.7 79.7 78.1 74.83
S16 66.7 122.4 78.1 89.07
S17 85.5 79.7 122.4 95.87
S18 85.5 79.7 78.1 81.10
S19 85.5 122.4 78.1 95.33
S20 79.7 122.4 78.1 93.40
Sum of means 1627.00
Mean of means 81.35
Variance of means 86.27
SD of sample means 9.29
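Sample mean: $\bar{X} = \frac{\sum X}{n}$
Sample variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Mean of sample means: $\mu_{\bar{X}} = \frac{\sum \bar{X}}{\binom{N}{n}}$
Standard deviation of $\bar{X}$ (standard error of $\bar{X}$): $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$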
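The enumeration in this table can be reproduced with a short Python sketch. It lists every unique sample of size 3, averages the sample means, and compares the SD of those means with σ/√n multiplied by the finite population correction √((N-n)/(N-1)). Variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [55.7, 66.7, 85.5, 79.7, 122.4, 78.1]
N, n = len(population), 3

# Every unique sample of size n drawn without replacement
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

mu = mean(population)
sigma = pstdev(population)           # population SD (divides by N)

mean_of_means = mean(sample_means)   # should equal mu (unbiasedness)
sd_of_means = pstdev(sample_means)   # SD over all possible sample means

# Standard error with the finite population correction
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(samples), round(mu, 2), round(mean_of_means, 2))
print(round(sd_of_means, 2), round(se_formula, 2))
```

The mean of the 20 sample means equals the population mean (81.35), and the SD of the sample means (9.29) agrees with the formula, which illustrates both unbiasedness and the standard error.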
145. Properties
1. The mean of the sampling distribution of means
is the same as the population mean, μ
2. The SD of the sampling distribution of sample
means (the standard error) is
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
which is ≈ σ/√n when the population size N is large relative to n
3. The sampling distribution of sample means is
approximately normal, regardless of the shape
of the population distribution, provided n is large
enough (> 30) (Central limit theorem).
149. Cont…
Estimator: a method or rule for computing
an estimate from sample data.
An estimator needs to have the property of
unbiasedness.
• An estimator T of the parameter x is said to be an unbiased
estimator of x if E(T) = x.
150. Cont…
• Estimation is calculating, from sample data, some statistic
that offers an approximation for the corresponding
parameter of the population from which the sample is
drawn.
• Properties of good estimators
– Unbiased: An estimator is said to be unbiased if in
the long run it takes on the value of the population
parameter
– Efficiency: An estimator is said to be efficient if in the
class of unbiased estimators it has minimum variance
– Consistency: A sequence of estimators is said to be
consistent if it converges in probability to the true value
of the parameter
– Sufficiency: an estimator is sufficient if it uses all the
sample information
151. Estimation methods
• Point estimate:
a single numeric value used to estimate the
corresponding population parameter.
Frequently used point estimators (sample statistics):
Sample statistic                 Corresponding population parameter
sample mean (x̄)                  population mean (μ)
sample variance (s²)             population variance (σ²)
sample standard deviation (s)    population standard deviation (σ)
sample proportion (p)            population proportion (P)
152. Interval Estimate
• Interval estimate:
Two numerical values defining a range of
values that, with a specified degree of
confidence, we feel include the parameter
being estimated.
153. Cont…
• Even if the sample mean is a good estimator,
it is better to report an interval that conveys the
probable magnitude of the population mean.
• Confidence intervals are about putting some
bounds on how far away the truth might be from
your estimate.
• The sample mean is the best unbiased estimator of the population mean.
154. Cont…
• If the sample is drawn from a normally distributed
population, the sampling distribution of the mean will be normal.
• Even if the distribution of the population is
non-normal, the sampling distribution will be approximately
normal if the sample size is sufficiently large.
• Ninety-five percent (95%) of the possible values of
x̄ will lie within two standard deviations of the mean of
the sampling distribution, i.e. within $\mu \pm 2\sigma_{\bar{x}}$
155. Interval estimator component
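• Reliability coefficient: the value of Z or t that multiplies the
standard error:
$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}} \qquad \text{or} \qquad \bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
• Standard error – a measure of the variability of the sample mean
in repeated sampling.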
156. Standard Error of the Mean
• It helps us to quantify how good our
estimate of the mean is of the true, and unknown,
population mean, i.e. how large an error we might
be making
• The standard error of the sample mean is SD/√n; it
is:
• the error that arises from variability in the sample
means
• an indication of the variability of the distribution of
means of samples caused by sampling error
and measurement error.
157. Confidence interval
• The confidence interval provides a range that is
highly likely (often 95% or 99%) to contain the
true population value, or parameter that is being
estimated.
• The narrower the interval the more informative is
the result. It is usually calculated using the point
estimate and its standard error.
158. • Provide an interval around our estimate
showing how much error there might be
either side of the estimate
[Diagram: lower confidence limit, estimate, upper confidence limit]
159. Interval estimate for mean:
one sample situation
• Confidence interval of the mean with known
population standard deviation
$$\bar{x} \pm z_{(1-\alpha/2)}\,SE(\bar{x}) = \bar{x} \pm z_{(1-\alpha/2)}\,\frac{\sigma}{\sqrt{n}}$$
• Confidence interval of the mean with unknown
population standard deviation for small sample
size
$$\bar{x} \pm t_{(1-\alpha/2)}(df)\,se(\bar{x}) = \bar{x} \pm t_{(1-\alpha/2)}(n-1)\,\frac{s}{\sqrt{n}}$$
160. Cont…
Interpretation of confidence interval
• Probabilistic: in repeated sampling from a
normally distributed population with known SD, 100(1-α)% of
all intervals of the form x̄ ± z(1-α/2) σ/√n will in the long
run include the population mean
• Practical: when sampling from a normally
distributed population with known SD (σ), we are 100(1-α)%
confident that the single computed interval
contains the population mean.
161. Cont…
• Commonly used confidence coefficients are
0.90, 0.95 and 0.99, with associated reliability coefficients
of 1.645, 1.96 and 2.58 respectively for the
standard normal random variable (Z).
• Precision:
The quantity obtained by multiplying the reliability
coefficient by the SE of the mean is called the margin of
error.
162. Computing a 95 and 99% CI for μ
• Given x̄ = 19.26, σ = 2.52 and n = 117
• At 95% confidence level, α = 0.05 (α/2 = 0.025) and at 99%
α = 0.01 (α/2 = 0.005)
• Z0.975 = 1.96 and Z0.995 = 2.58
95% CI for μ becomes
• 19.26 ± 1.96*2.52/√117 = (18.80 ≤ μ ≤ 19.72)
99% CI for μ becomes
• 19.26 ± 2.58*2.52/√117 = (18.66 ≤ μ ≤ 19.86)
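The two intervals above (x̄ = 19.26, σ = 2.52, n = 117) can be checked with a minimal Python sketch. The helper name is illustrative; it takes the exact normal quantile from the standard library instead of the rounded table values 1.96 and 2.58, which does not change the result at two decimals.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)                        # z times the standard error
    return x_bar - margin, x_bar + margin


ci95 = z_interval(19.26, 2.52, 117, 0.95)
ci99 = z_interval(19.26, 2.52, 117, 0.99)
print([round(v, 2) for v in ci95])
print([round(v, 2) for v in ci99])
```

The wider 99% interval shows the trade-off between confidence level and precision for a fixed sample size.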
163. Computing CI for μ when σ is unknown
• When the population SD (σ) is unknown, it
should be estimated from the sample SD (s)
• Accordingly, the standard error of the sample
mean will be estimated by s/√n
• Therefore, a CI for μ (say 95%) with n < 30 will
be based on the t-statistic as:
$$\bar{x} \pm t_{0.975}(n-1)\,\frac{s}{\sqrt{n}}$$
where (n-1) is the degrees of freedom
164. Example
• Consider the following summary information
based on data on systolic blood pressure of a
random sample of 30 individuals selected from a
normal population: x̄ = 115.9, s = 16.3. Compute a 95% and 99% CI
for μ
• n = 30, df = 30-1 = 29; at 95% confidence level, t0.975(29) =
2.045 and at 99%, t0.995(29) = 2.756; se(x̄) = 16.3/√30 = 2.98
• 95% CI for μ: 115.9 ± 2.045*2.98 = (109.8 ≤ μ ≤ 122.0)
• 99% CI for μ: 115.9 ± 2.756*2.98 = (107.7 ≤ μ ≤ 124.1)
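The systolic blood pressure example (x̄ = 115.9, s = 16.3, n = 30) can be sketched in Python. The standard library has no t quantile function, so this sketch simply passes in the tabulated critical values quoted in the example (2.045 and 2.756 for 29 df); the helper name is illustrative.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """CI for the mean using a tabulated t critical value with n - 1 df."""
    se = s / sqrt(n)        # estimated standard error of the mean
    margin = t_crit * se    # reliability coefficient times standard error
    return x_bar - margin, x_bar + margin


# Critical values for 29 df, as quoted in the example
ci95 = t_interval(115.9, 16.3, 30, t_crit=2.045)
ci99 = t_interval(115.9, 16.3, 30, t_crit=2.756)
print([round(v, 1) for v in ci95])
print([round(v, 1) for v in ci99])
```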
165. Standard Error of the difference between
two sample means
• Most medical research is comparative; as a
result we are more often concerned with two or
more samples rather than a single sample, i.e.,
with comparing the difference between two samples.
• A confidence interval for the difference helps in deciding
whether or not it is likely that the two means are equal
• When the interval includes 0, the two means
might be equal.
• When the interval does not include 0, there is evidence that
the two means are different.
166. Cont….
The Z statistic can be used in a confidence
interval to estimate the difference between two means
if the variances of the populations are known.
A 95% confidence interval for the difference of the
two means is given by:
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{0.975}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
167. Unknown Variance
The t statistic is used when the
population standard deviations are unknown
and the sample sizes are small, under one of two sets of
conditions:
1. When equal variances are assumed
2. When the variances are unequal
168. Cont…
• When the variances are equal, they are
pooled to estimate the common variance.
• The pooled estimate is obtained as a weighted
average of the two sample variances.
• Each sample variance is weighted by its degrees
of freedom (n-1).
• If the sample sizes are equal, the weighted
average equals the arithmetic mean of the two
sample variances.
• If the sample sizes are different, the weighted average
takes advantage of the additional information
provided by the larger sample.
169. Unknown but equal variances
• The pooled standard deviation (Sp) is
calculated using the following formula:
$$S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}}$$
• Then the standard error of the difference
of the two sample means is:
$$se(\bar{X}_1 - \bar{X}_2) = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
170. Example: Was there a difference in the mean
fasting blood glucose level between men and
women, given data from normal populations?
Sex      Mean    SD      n
Men      98.14   19.59   57
Women    95.19   14.03   59
Total    96.64   16.98   116
• Compute a 95% CI for the population mean
difference
– Assuming the standard deviations (SD) are
population SD
– Assuming the population variances are unknown but
assumed to be equal
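Both parts of the fasting blood glucose example can be sketched in Python from the summary table. Part (a) treats the SDs as known and uses 1.96; part (b) pools the variances and uses a t critical value. The value 1.981 for 114 df is an assumed approximate table value, not a figure from the slides.

```python
from math import sqrt

mean_m, sd_m, n_m = 98.14, 19.59, 57   # men
mean_w, sd_w, n_w = 95.19, 14.03, 59   # women
diff = mean_m - mean_w

# (a) SDs treated as known population values: z-based interval
se_z = sqrt(sd_m**2 / n_m + sd_w**2 / n_w)
ci_z = (diff - 1.96 * se_z, diff + 1.96 * se_z)

# (b) Unknown but equal variances: pooled SD and t-based interval
sp = sqrt(((n_m - 1) * sd_m**2 + (n_w - 1) * sd_w**2) / (n_m + n_w - 2))
se_t = sp * sqrt(1 / n_m + 1 / n_w)
t_crit = 1.981  # assumed approximate t(0.975) for 114 df, read from a table
ci_t = (diff - t_crit * se_t, diff + t_crit * se_t)

print(round(diff, 2), round(se_z, 2), [round(v, 2) for v in ci_z])
print(round(sp, 2), round(se_t, 2), [round(v, 2) for v in ci_t])
```

Both intervals include 0, so on this sketch the data do not rule out equal mean glucose levels in men and women.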
171. Factors affecting the length of a
confidence interval (CI)
– Sample size (n)
– Standard deviation (σ)
– Confidence level (1-α)
172. Hypothesis Testing
Why is hypothesis testing so important?
• Hypothesis testing provides an objective
framework for making decisions using
probabilistic method, rather than relying on
subjective impressions.
• The Null hypothesis, denoted by Ho, is the
hypothesis that is to be tested.
• The alternative hypothesis H1 is the hypothesis
that in some sense contradicts the null
hypothesis.
173. Cont…
• While making decision on the null and
alternative hypothesis, we have four
possible outcomes:
1. We accept Ho, and Ho is in fact true – confidence level
(1-α).
2. We accept Ho, and H1 is in fact true – Type II error (β).
3. We reject Ho, and Ho is in fact true – Type I error (α).
4. We reject Ho, and H1 is in fact true – Power of the test
(1-β).
174. One Sample Test for the Mean from a
Normal population
1. One Sided Alternative (One-tailed)
Unknown Variance
• A one tailed test is a test in which the values of
the parameter being studied (in this case mean)
under the alternative hypothesis are allowed to be
either greater than or less than the values of the
parameter under the null hypothesis, but not both
175. Cont…
I. Alternative mean < Null mean
• One sample t-test for the mean of a normal
distribution with unknown variance to test the
hypothesis Ho: μ = μo vs H1: μ < μo:
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
If t < tn-1, α (= -tn-1, 1-α), then reject Ho
If t ≥ tn-1, α, then do not reject Ho
176. Cont…
Two ways to determine statistical significance:
1. Critical value method – comparing the tabulated
value of the test statistic to the calculated value
for a given level of significance
2. P-value method
177. Cont…
The p-value is the α level at which the given
value of the test statistic (such as t) would be on
the borderline between the acceptance and
rejection regions.
p = P(tn-1 ≤ t)
where p is the area to the left of t under a tn-1
distribution.
178. Guidelines to judge p-value
1. If 0.01 <= p < 0.05, statistically significant
2. If 0.001 <= p < 0.01, statistically highly
significant
3. If p < 0.001, very highly statistically
significant
4. If p > 0.05, not statistically significant
179. II. Alternative mean >Null mean
• To test the hypothesis:
Ho: μ = μo vs H1: μ > μo, variance unknown
With a significance level α, the test is based on t,
where:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
• If t > tn-1, 1-α, Ho is rejected
• If t ≤ tn-1, 1-α, Ho is accepted
180. Cont…
2. Two-sided alternatives (two tailed)
It is a test in which the values of the parameter
being studied under the alternate hypothesis are
allowed to be either greater than or less than the
values of the parameter under the null hypothesis,
Ho.
181. Cont…
• To test the hypothesis:
Ho: μ = μo versus H1: μ ≠ μo with a significance
level of α
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
|t| > tn-1, 1-α/2: Ho rejected
|t| ≤ tn-1, 1-α/2: Ho accepted
182. Cont…
• P-value for two tailed t-test
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
$$P = \begin{cases} 2\,P(t_{n-1} \le t) & \text{if } t \le 0 \\ 2\,[1 - P(t_{n-1} \le t)] & \text{if } t > 0 \end{cases}$$
183. Cont…
One sample Z-test - Two Tailed
• The critical values and p-values for the one
sample t-test have been specified in terms of
percentiles of the t distribution, assuming that the
underlying variance is unknown.
• In some applications, the variance may be
assumed known from prior studies. In this case,
the t statistic is replaced by the test
statistic Z
184. Cont…
To test the hypothesis, we use
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Z < Zα/2 or Z > Z1-α/2: reject Ho
Zα/2 ≤ Z ≤ Z1-α/2: don't reject Ho
185. Cont…
• One Tail
• Alternative mean < Null mean (Variance
Known)
Z < Zα, then Ho rejected
Z ≥ Zα, Ho accepted
• Alternative mean > Null mean (Variance
Known)
Z > Z1-α, then Ho rejected
Z ≤ Z1-α, Ho accepted
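The one-sample Z-test rules (two-tailed and one-tailed) can be sketched in Python using z = (x̄ - μo)/(σ/√n). The data reuse x̄ = 19.26, σ = 2.52 and n = 117 from the earlier confidence interval example; the null value μo = 19.0 and the function name are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for a one-sample z-test with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf(z)          # area to the left of z
    if alternative == "less":          # H1: mean < mu0
        p = phi
    elif alternative == "greater":     # H1: mean > mu0
        p = 1 - phi
    else:                              # H1: mean != mu0
        p = 2 * min(phi, 1 - phi)
    return z, p


z, p = one_sample_z_test(19.26, 19.0, 2.52, 117)  # mu0 = 19.0 is hypothetical
reject = p < 0.05
print(round(z, 3), round(p, 3), reject)
```

Comparing the p-value with α is equivalent to comparing Z with the critical values: here |Z| is below 1.96, so Ho is not rejected at the 5% level.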
186. Relationship between Hypothesis
Testing and confidence interval
–Two sided case
• Suppose we are testing Ho: μ = μo versus
H1: μ ≠ μo. Ho is rejected with a two-sided level
α test if and only if the two-sided 100(1-α)% confidence
interval for μ does not contain μo; otherwise
accept Ho.
187. Hypothesis Testing Two Sample
Inference
• In two-sample hypothesis testing, the
underlying parameters of two different
populations, neither of whose values is
assumed known, are compared.
• Two samples are said to be paired when
each data point of the first sample is
matched and is related to a unique data
point of the second sample.
188. Cont…
• Two samples are said to be independent
if the data points in one sample are
unrelated to the data points in the second
sample
189. The paired t- test
• The test statistic is given by
$$t = \frac{\bar{d}}{SD(d)/\sqrt{n}}$$
where d̄ is the mean of the observed differences, SD(d) is the
sample standard deviation of
the observed differences and n is the number of
differences
190. Cont…
• Degrees of freedom: n-1
– If t > tn-1, 1-α/2 or t < -tn-1, 1-α/2 then Ho is
rejected.
– If -tn-1, 1-α/2 ≤ t ≤ tn-1, 1-α/2 then Ho is not rejected.
• The p-value is 2 × the tail area beyond |t| under the tn-1 distribution
191. • Example:
• Suppose a sample of 20 students were
given a test before studying a particular
module and then again after completing
the module.
• We want to find out if, in general, our
teaching leads to improvements in
students’ knowledge/skills (i.e. test
scores).
193. Cont…
• Hypothesis: Ho: Δ = 0 and HA: Δ ≠ 0
• Calculating the mean and standard deviation of
the differences: d̄ = 2.05 and sd(d) = 2.837.
Therefore, se(d̄) = 2.837/√20 = 0.634
• So, we have: t = 2.05/0.634 = 3.231 on 19 df with
p = 0.004.
• Therefore, there is strong evidence that, on
average, the module does lead to improvements.
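The arithmetic of this paired t-test can be checked with a short Python sketch working from the summary values of the differences (mean 2.05, SD 2.837, n = 20). The critical value 2.093 is the usual tabulated t(0.975) for 19 df; it is supplied by hand because the standard library has no t distribution, and the p-value is therefore not recomputed here.

```python
from math import sqrt

n = 20
d_bar, sd_d = 2.05, 2.837      # mean and SD of the before/after differences

se_d = sd_d / sqrt(n)          # standard error of the mean difference
t_stat = d_bar / se_d          # paired t statistic on n - 1 = 19 df

t_crit = 2.093                 # tabulated t(0.975) for 19 df
reject_h0 = abs(t_stat) > t_crit
print(round(se_d, 3), round(t_stat, 2), reject_h0)
```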
194. Two sample t – test for independent
sample with equal variance
• The equation is given by:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where the weighted average of variance1 and variance2
is used as the pooled estimate $S_p^2$ of the common variance
• The degrees of freedom will be the sum of the degrees of
freedom of the two samples, i.e., (n1-1) + (n2-1)
196. Sampling distribution of proportions
Construction
• It is done in the same manner as that of
the mean
• take all possible samples of a given size
• Compute the sample proportion for each
• Prepare a frequency distribution of the
proportions
197. Cont…
Characteristics:
– When the sample size is large the distribution is
approximately normal
– The mean of the distribution, μp̂, will be equal
to the true proportion P.
– The variance of the distribution, σ²p̂, will be
equal to
$$\sigma^2_{\hat{p}} = \frac{p(1-p)}{n}$$
198. Sampling distribution of difference
between two proportions
• For independent random samples of size n1 and n2 drawn
from two populations of dichotomous variables, where
P1 and P2 are the population proportions of
the characteristic
• The distribution of p̂1 - p̂2 is approximately normal with
mean:
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$
• And variance:
$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$
199. Estimation of single proportions
• Confidence intervals of proportions are obtained by
approximation to the normal distribution, using the
sample standard deviation.
• The confidence interval for the population
proportion:
$$p \pm Z\sqrt{\frac{p(1-p)}{n}}$$
where p is the proportion of successes (event),
q=(1 - p) is the proportion of failures,
n is the sample size and z denotes the z value
relating to a defined probability level.
200. Estimation of difference between
two proportions
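• Unbiased point estimators of P1 and P2 are p̂1 and p̂2, so
P1 - P2 is estimated by p̂1 - p̂2
• Standard error of the estimate when n1 and n2 are
large enough and the proportions are not close to 1 or 0
• Since population proportions are not known, the sample
proportions are used:
$$\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$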
202. Hypothesis testing on single
population proportions
• Follows from the properties of the sampling
distribution of the sample proportion
• The null hypothesis
$$H_0: P = P_0$$
and
• The alternate hypothesis
$$H_A: P \ne P_0$$
203. Cont…
• Test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
• When Ho is true, the test statistic is
approximately distributed as a standard normal
distribution
204. Testing differences between two
sample proportions
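• The most commonly used test
Ho: P1-P2 = 0 or P1=P2
• Under Ho, the pooled estimate of the common proportion will be
$$\bar{P} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$
• Standard error
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\bar{P}(1-\bar{P})}{n_1} + \frac{\bar{P}(1-\bar{P})}{n_2}}$$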
205. Cont…
• The test statistic will be:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$
206. Example: Comparison of number of swimming
hours by swimmers with or without erosion of
dental enamel
Number of swimming    Erosion of dental enamel (EDE)
hours per week        Yes     No      Total
≥ 6 hours             32      118     150
< 6 hours             17      127     144
Total                 49      245     294
Prevalence of EDE (P)    0.167
Standard error           0.022
95% CI for P: Lower      0.124
              Upper      0.209
207. 1. Estimate the prevalence of erosion of
dental enamel and calculate a 95% CI
2. From previous studies among
swimmers it is claimed that the
prevalence of erosion of dental enamel
was 14%. Is the claim justified? Give
your p-value
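Questions 1 and 2 can be sketched in Python from the table totals (49 swimmers with erosion out of 294). The sketch uses the normal approximation: the confidence interval uses the sample proportion in the standard error, while the test of Ho: P = 0.14 uses the null value. A two-sided alternative is assumed, since the question does not state one.

```python
from math import sqrt
from statistics import NormalDist

x, n = 49, 294                 # swimmers with erosion, total swimmers
p_hat = x / n

# 95% confidence interval (normal approximation)
se = sqrt(p_hat * (1 - p_hat) / n)
z975 = NormalDist().inv_cdf(0.975)
ci = (p_hat - z975 * se, p_hat + z975 * se)

# Two-sided test of Ho: P = 0.14, using the null value in the standard error
p0 = 0.14
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_hat, 3), round(se, 3), [round(v, 3) for v in ci])
print(round(z, 2), round(p_value, 3))
```

The interval reproduces the tabulated 0.124 to 0.209, and the p-value is above 0.05, so on this sketch the data are compatible with the claimed 14% prevalence.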
208. 3. Compute the respective prevalence of erosion
of dental enamel for those who had ≥ 6 hours
and < 6 hours of swimming time and calculate a
95% CI for the difference in the prevalence.
4. Is there a difference in the prevalence of erosion
of dental enamel between the two swimming
times? Give your p-value
209. Amount of swimming time per week    P
≥ 6 hours      0.213
< 6 hours      0.118
Total          0.167
p1 – p2        0.095
Test of Ho: P1=P2, HA: P1≠P2
se(p1-p2)      0.044
Z              2.174
95% CI for P1-P2
se(p1-p2)      0.042
Lower 95%      0.013
Upper 95%      0.177
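Questions 3 and 4 can be sketched in Python from the counts in the swimming table (32 of 150 versus 17 of 144). The test uses the pooled proportion in the standard error and the confidence interval uses the unpooled standard error. Because the sketch keeps full precision throughout, its output can differ slightly from the tabulated figures, which appear to be based on rounded intermediate values (for example, Z comes out near 2.19 here).

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 32, 150    # >= 6 hours per week: with erosion, total
x2, n2 = 17, 144    # < 6 hours per week: with erosion, total
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Test of Ho: P1 = P2 uses the pooled proportion in the standard error
p_pool = (x1 + x2) / (n1 + n2)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% CI for P1 - P2 uses the unpooled standard error
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z975 = NormalDist().inv_cdf(0.975)
ci = (diff - z975 * se_diff, diff + z975 * se_diff)

print(round(p1, 3), round(p2, 3), round(diff, 3))
print(round(z, 2), round(p_value, 3), [round(v, 3) for v in ci])
```

The interval excludes 0 and the p-value is below 0.05, which agrees with the conclusion that prevalence differs between the two swimming-time groups. The same steps apply to the oral contraceptive exercise by substituting its counts.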
210. Exercise: A study was conducted to look at the
effect of oral contraceptives (OC) on heart disease
in women 40-44 years of age over 3 years. Given
the following data, is there a difference in the rate of
MI between OC-users and non-users? Compute a
95% CI for the difference.
OC-use group      MI status over 3 years
                  Yes     No        Total
OC-users          13      4,987     5,000
Non-OC-users      7       9,993     10,000
Total             20      14,980    15,000
213. Statistical errors related to study design
• Study aims and primary outcome measures
not stated or unclear
• Inadequate sample size
• Choice of an inappropriate high-risk sample to
make inferences about the general population
• Failure to report number of participants or
observations
• Use of an inappropriate control group
214. Errors in execution
• Failure to adhere to the study protocol
– Misuse of sample selection procedures
– Exclusion and inclusion criteria not strictly
followed
– Failure to follow randomization procedures
215. Statistical errors in presentation
• Inadequate graphical or numerical description of
basic data
– Presenting or plotting mean but no indication of
variability
– Giving SE instead of SD to describe data
– Failure to define ± notation for describing variability
– Numerical information given to an unrealistic level
of precision to present data and results
– Inappropriate graph selection that doesn’t reflect
characteristics of variables, and use of three-dimensional
graphs for two-dimensional data
217. Statistical errors in analysis
• Using methods of analysis when assumptions are
not met
• Analyzing paired data ignoring the pairing
• Failing to take account of ordered categories
• Treating multiple observations on one subject as
independent
o Improper multiple pair-wise comparisons of more than
two groups
o Quoting confidence intervals that include impossible
values
• Failure to use multivariate techniques to adjust
for confounding factors
218. Statistical errors in interpretation of
study findings
• Wrong interpretation of results
– “Non-significant” interpreted as “no effect” or
“no difference”
– Drawing conclusions not supported by the
study data
– Significance claimed without data analysis
or statistical test mentioned
• Failure to discuss sources of potential bias and
confounding factors
219. Consequences of statistical errors
• It may be impossible to get ethical approval to conduct the
study
• Other researchers may be led to follow a false line
of investigation
• Patients may receive an inferior treatment, either
as a direct consequence of the result of the study
or possibly by the delay in the introduction of a
truly effective treatment
• If the results go unchallenged, the researchers
may use the same inferior statistical methods in
future research, and others may copy them due to
inappropriate conclusions