The document discusses sampling methods and statistical inference. It defines key terms like population, sample, sampling frame. It describes different sampling techniques including random sampling methods like simple random sampling and systematic sampling. It also covers non-random sampling techniques like quota sampling and convenience sampling. The minimum sample size is calculated using a standard formula. Statistical inference is defined as using a sample to make conclusions about the larger population. The key difference between a sample and population is also highlighted.
Concept of sample and sampling; sampling process and problems; types of samples: probability and non-probability sampling; determination of sample size; sampling and non-sampling errors
PUH 5302, Applied Biostatistics 1
Course Learning Outcomes for Unit III
Upon completion of this unit, students should be able to:
4. Recommend solutions to public health problems using biostatistical methods.
4.1 Compute and interpret probability for biostatistical analysis.
4.2 Draw conclusions about public health problems based on biostatistical methods.
5. Analyze public health information to interpret results of biostatistical analysis.
5.1 Analyze literature related to biostatistical analysis in the public health field.
5.2 Prepare an annotated bibliography that explores a topic related to public health issues.
Course/Unit Learning Outcomes: Learning Activity
4.1: Unit Lesson; Chapter 5; Unit III Problem Solving
4.2: Unit Lesson; Chapter 5; Unit III Problem Solving
5.1: Chapter 5; Unit III Annotated Bibliography
5.2: Chapter 5; Unit III Annotated Bibliography
Reading Assignment
Chapter 5: The Role of Probability
Unit Lesson
Welcome to Unit III. In previous units, we discussed some fundamentals of biostatistics and their application
to solving public health problems. In Unit III, we will compute, interpret, and apply probability, especially in
relation to different populations.
Computing and Interpreting Probabilities
Probability means using a number (or numbers) to demonstrate how likely something is to occur. For
example, if a coin is tossed, the probability of getting heads (or, equally, tails) is one chance out of two; that is, ½.
Researchers have used probability studies to predict weather and other events and have been successful to
some extent. Public health professionals have used statistical methods to predict the chances of health-
related events, thereby providing arguments in favor of taking precautionary measures and warning the
general public on important health issues.
In biostatistics, we use both descriptive statistics and inferential statistics to address public health issues
within a population. In most cases, researchers are not able to study the entire population; they try to get a
sample from the population from which they can generalize their findings.
Descriptive Statistics
Aside from the use of probability sampling methods, there are other methods used for the computation and
interpretation of data; these are generally known as descriptive statistics. With descriptive statistics, we
normally compute the mean, mode, median, variance, and standard deviation. Information obtained using
such computation methods is used for descriptive purposes, as opposed to information obtained from
inferential statistics.
Let’s examine this example using the numbers 5, 10, 2, 4, 6, 10, 2, 3, and 2.
The mean is the sum of all the numbers ÷ the number of cases
= 44 ÷ 9
≈ 4.89
The median is the middle number after the numbers have been arranged in an ascending or descend ...
4. Sampling
SAMPLE
[Q:
1. What do you mean by sampling? (BSMMU, January, 2010)
2. Define population and sample. (BSMMU, July 2011, July 2010, July 2009)]
A sample is a part drawn from the whole. A sample must be
representative and unbiased.
Example: Insulin is given to 12 persons and their blood sugar is
found to decrease. From this we infer that insulin will lower the
blood sugar of all persons. Here the 12 persons form the sample,
which represents the whole population.
A population is the entire group of persons or study elements.
Target population → study population → study sample
Target Population
The target population is the entire group a researcher is
interested in; the group about which the researcher wishes to
draw conclusions.
A sample unit is each individual member of the population.
Sample size
One of the problems we often have in epidemiological
investigations is figuring out how large a sample we need to
answer a specific question. Our sample size must be big
enough for the study to have appropriate statistical power. We
base sample size calculations on a number of study design
factors:
prevalence
acceptable error
the detectable difference.
[Bonita R, Beaglehole R, Kjellström T 2006. Basic epidemiology, 2nd edition, WHO.]
Definition: The sample size of a statistical sample is the number
of observations that constitute it. It is typically denoted by n. It
should not be less than 30; a small sample lacks precision.
Sample Size Formula
There are numerous formulae and computer programs that
simplify the task considerably.
Here is a common formula used to calculate sample size:
Sample size:
\[ n = \frac{Z^{2}\, p\,(1 - p)}{d^{2}} \]
Where:
Z = Z value (e.g. 1.96 for 95% confidence level)
p = prevalence
d = margin of error (usually taken as 5%, that is, 0.05 when
expressed as a proportion)
Example
It has been estimated that roughly 30% (0.3) of the children in
the project area suffer from chronic malnutrition. This figure
has been taken from national statistics on malnutrition in rural
areas. Use of the standard values listed above provides the
following calculation.
Calculation:
\[ n = \frac{1.96^{2} \times 0.3\,(1 - 0.3)}{0.05^{2}} = \frac{3.8416 \times 0.21}{0.0025} = \frac{0.8067}{0.0025} \approx 322.7 \]
That is, rounding up, the sample size will be 323.
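The formula n = Z² p(1 − p) / d² can be checked with a few lines of Python. This is only a minimal sketch: the function name `min_sample_size` and its default arguments (d = 0.05, Z = 1.96) are illustrative assumptions, and the result is rounded up because a fraction of a subject cannot be sampled.

```python
import math


def min_sample_size(p, d=0.05, z=1.96):
    """Minimum sample size for estimating a prevalence.

    p: expected prevalence as a proportion (e.g. 0.3 for 30%)
    d: margin of error as a proportion (assumed default 0.05)
    z: Z value for the confidence level (assumed default 1.96 for 95%)
    """
    if not 0 < p < 1:
        raise ValueError("p must be a proportion strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    return math.ceil(n)  # round up to the next whole subject


# Malnutrition example: prevalence 30%, 95% confidence, 5% margin of error
print(min_sample_size(0.30))
```

A smaller margin of error raises the required sample size sharply, because d is squared in the denominator.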
In reality, sample size is often determined by logistical and
financial considerations, and a compromise always has to be
made between sample size and costs. A practical
guide to determining sample size in health studies has been
published by WHO.
[Bonita R, Beaglehole R, Kjellström T 2006. Basic epidemiology, 2nd edition, WHO.]
[Q:
1. Determine the minimum sample size to study the
prevalence of disease in community where it was 12%,
with 95% confidence level and acceptable error of 1.0%
(BSMMU, January, 2009)
2. Determine minimum sample size to study a disease in a
community in which no prevalence of it is known to
you. (BSMMU, January, 2011)
3. Determine the sample size to conduct a study on a
disease having prevalence 10% with 95% confidence
and acceptable error of 1.0% (BSMMU, July, 2009)
4. How can you make sampling for a study on the
nutritional status of 5 to 10 years children of
Bangladesh? (BSMMU, July, 2011)]
Sampling frame
The listing of the accessible population from which you'll draw
your sample is called the sampling frame. If you were doing a
phone survey and selecting names from the telephone book,
the book would be your sampling frame.
Population and Sample
A population is any finite collection of elements, that is,
individuals, items, observations, etc., under consideration in a
given problem. A sample is a part, or a subset, of a population.
The basic problem of statistical inference is to arrive at
generalisations concerning populations on the basis of
samples. For reasons of validity in making statistical inferences,
statistical methods assume that all samples are selected at
random with the use of sampling procedures.
Characteristics of a representative sample
There are two main characteristics of a representative sample.
1. Precision, which depends on the size of the sample
2. Unbiased character
[Example of biased sampling: Some items from a Bangladeshi company
may be better than those of a multinational company. But because of bias towards
the multinational company, somebody may be hesitant to accept them even if
they are proved better. This is bias.]
SAMPLING
Definition
[Q: Define sampling. (BSMMU, January, 2009)]
Sampling is the technique of obtaining information about the
whole group by examining only a part of the whole group.
Objectives of sampling
Two main objectives of sampling are:
1. To estimate population parameters (mean, proportion,
etc.) from the sample statistics.
2. To test hypotheses about the population from which the
sample or samples are drawn.
Types of sampling
[Q:
1. Discuss different methods of sampling. (BSMMU, January, 2010)
2. Discuss different types of sampling techniques. (BSMMU, July, 2009)
3. Discuss the common sampling techniques. (BSMMU, July, 2010)
4. Discuss in short the sampling techniques with their application. (BSMMU, July, 2011)
5. Enumerate different types of sampling. (BSMMU, July, 2011)]
A. Random sampling or probability sampling or non-
purposive sampling - A probability sampling method is any
method of sampling that utilizes some form of random
selection. In order to have a random selection method, you
must set up some process or procedure that assures that
the different units in your population have equal
probabilities of being chosen.
Criteria:
The sample value is close to the population value
The sample is unbiased
B. Nonrandom sampling or non-probability sampling or
purposive sampling – A sampling technique in which the units of the
sample are selected on the basis of personal judgment or
convenience. Nonprobability sampling does not involve
random selection. With nonprobability samples, we may or
may not represent the population well.
In general, researchers prefer probabilistic or random sampling
methods over nonprobabilistic ones, and consider them to be
more accurate and rigorous. However, in applied social
research there may be circumstances where it is not feasible,
practical or theoretically sensible to do random sampling. Here,
we consider a wide range of nonprobabilistic alternatives.
A. Probability sampling
[Q:
1. In short, discuss the probability sampling. (BSMMU, January, 2011)
2. Describe simple random sampling techniques. (BSMMU, January, 2009)]
1. Simple Random Sampling (SRS)
[Q: Write short notes on: SRS (BSMMU, January, 2010)]
The method is applicable when the population is small,
homogeneous and readily available, such as patients attending a
hospital or lying in its beds.
Two methods:
Lottery method
Table of random numbers.
Lottery Method
Suppose 10 patients are to be put on a trial out of the 100
available patients. Write the serial numbers of the patients on 100
cards and shuffle them well. Draw one card and note its
number. Replace the card drawn, reshuffle and draw the
second card. Repeat the process until 10 numbers are drawn,
rejecting any card that is drawn a second time.
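The lottery procedure (draw, replace, reshuffle, reject repeats) can be imitated in code. The sketch below is illustrative only: the function name `lottery_sample` and the optional `seed` argument are assumptions added so that a run can be repeated, and Python's pseudo-random generator stands in for physical shuffling.

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Draw serial numbers as in the lottery method described above."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    cards = list(range(1, population_size + 1))  # one card per patient
    chosen = []
    while len(chosen) < sample_size:
        card = rng.choice(cards)  # draw a card; it stays in the pack (replaced)
        if card not in chosen:    # reject a card drawn for the second time
            chosen.append(card)
    return chosen


# 10 patients out of 100; the seed value here is arbitrary
print(lottery_sample(100, 10, seed=1))
```

Because repeated cards are rejected, the result is the same kind of sample that `random.sample(range(1, 101), 10)` would give: 10 distinct serial numbers, each patient having the same chance of selection.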
Table of Random Number Method
The required numbers are selected from the table, reading either
vertically or horizontally. [For the table, please see the appendix.]
2. Systematic Sampling
Here we proceed methodically. We follow a
particular system to get the desired sample, which
should be representative and unbiased.
Example: Say that in an outpatient department there is a register
of 10000 pages in which cases of URTI are recorded.
If we want the real picture throughout the year, every
nth page should be noted. If instead we look only at certain pages,
some facts may be missed [e.g. seasonal variation may be
missed].
Another type of systematic sampling is as follows.
If we want to visit 30 health complexes out of the 360 health
complexes of Bangladesh, then we first find the sample
interval.
Sample interval = Total number ÷ Desired number
Here the total number of health complexes = 360
Desired number to visit = 30
So, sample interval = 360 ÷ 30 = 12.
[If it is a fraction, round it to a whole number, e.g. 12.1 may be
taken as 12 and 12.9 as 13.]
After finding the sample interval, all health complexes are numbered.
If we then draw a lottery from 1 to 12, we get
the first number. Suppose it is 9. If health complex number 9
is Tongi, then that one is visited first.
Second health complex: 9 + 12 = 21st.
Third health complex: 21 + 12 = 33rd.
In this way the desired number of sample units is taken.
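The health complex example (360 units, 30 to visit, first number 9) can be reproduced as follows. This is a minimal sketch under stated assumptions: the function name `systematic_sample` is hypothetical, the interval is rounded with Python's built-in `round`, and any position that would fall beyond the last unit is dropped.

```python
def systematic_sample(total, desired, start):
    """Return the serial numbers picked by systematic sampling.

    total:   number of units in the numbered list (e.g. 360)
    desired: number of units wanted (e.g. 30)
    start:   first number, drawn by lottery from 1..interval (e.g. 9)
    """
    interval = round(total / desired)  # sample interval, e.g. 360 / 30 = 12
    if interval < 1:
        raise ValueError("desired sample is larger than the population")
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sample interval")
    picks = [start + i * interval for i in range(desired)]
    return [p for p in picks if p <= total]  # stay inside the numbered list


picks = systematic_sample(360, 30, start=9)
print(picks[:3])  # first three health complexes to visit
```

In practice the starting number would itself be drawn at random, for example with `random.randint(1, 12)`.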
Merits
The systematic design is simple and convenient to adopt.
The time and labour involved in the collection of sample is
relatively small.
If the population is sufficiently large, homogeneous and each
unit is numbered, this method can yield accurate results.
3. Stratified Sampling
This method is followed when the population is not
homogeneous. The population under study is first divided into
homogeneous groups or classes called strata and the sample is
drawn from each stratum at random in proportion to its size.
It is a method of sampling to give representation to all strata
of society or population such as selecting sample from defined
areas, classes, ages, sexes etc.
Merits
1. Proportionate representation of each stratum is secured.
2. It gives greater accuracy.
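Drawing from each stratum "at random in proportion to its size" might look like the sketch below. Everything named here is hypothetical: the function `stratified_sample`, the three strata and their sizes are invented for illustration. Note also that rounding each stratum's share can make the total differ slightly from the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Proportionate stratified random sample.

    strata: dict mapping a stratum name to the list of its units
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # share of the sample proportional to the stratum's size
        share = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, share)  # random draw within stratum
    return sample


# Hypothetical population of 100 units in three strata (60, 30 and 10 units)
strata = {
    "urban": list(range(0, 60)),
    "rural": list(range(60, 90)),
    "slum": list(range(90, 100)),
}
result = stratified_sample(strata, 10, seed=0)
print({name: len(units) for name, units in result.items()})
```

With these sizes a sample of 10 is split 6, 3 and 1, so every stratum is represented in proportion to its share of the population.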
4. Multistage Sampling
This method refers to the sampling procedures carried out in
several stages using random sampling techniques.
This is employed in large country surveys.
We proceed according to the characteristics of the population.
Example: By random sampling, select
100 from village
100 from Thana
100 from Districts
100 from Division
Merits
It introduces flexibility in sampling, which is lacking in other
techniques.
It enables the use of existing division and subdivision which
saves extra labour.
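A two-stage version of the idea, random selection of areas followed by random selection of units within each chosen area, is sketched below. The function name `two_stage_sample` and the district and village labels are hypothetical; a real survey would add further stages (division, district, thana, village) in the same way.

```python
import random


def two_stage_sample(areas, n_areas, n_units, seed=None):
    """Stage 1: pick areas at random. Stage 2: pick units within each area.

    areas: dict mapping an area name to the list of units inside it
    """
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(areas), n_areas)  # stage 1
    return {
        area: rng.sample(areas[area], min(n_units, len(areas[area])))  # stage 2
        for area in chosen_areas
    }


# Hypothetical frame: 8 districts with 20 villages each
areas = {
    f"district_{i}": [f"village_{i}_{j}" for j in range(20)]
    for i in range(8)
}
sample = two_stage_sample(areas, n_areas=3, n_units=5, seed=0)
for district, villages in sample.items():
    print(district, villages)
```

Only the selected districts need a village list, which is one reason the method saves labour in large surveys.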
5. Multiphase Sampling
In this method part of the information is collected from the
whole sample and part from the sub-sample.
Example: In a tuberculosis survey,
In the first phase, physical examination or Mantoux test may be done in all
cases of the sample.
In the second phase, X-ray of the chest may be done in
Mantoux-positive cases and in those with clinical symptoms.
In the third phase, sputum may be examined in X-ray-positive cases
only.
The numbers in the sub-samples in the 2nd and 3rd phases will become
successively smaller.
Merit
Survey by such procedure will be less costly, less laborious and
more purposeful.
6. Cluster Sampling
A cluster is a randomly selected group. This method is used
when units of population are natural groups or clusters such as
villages, wards, blocks, slums of a town, factories, workshops or
children of a school etc.
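One common form of cluster sampling selects whole clusters at random and then studies every unit inside them. The text does not say whether all units or only some are taken, so the sketch below assumes the one-stage form; the function name `cluster_sample` and the school data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and return every unit in them.

    clusters: dict mapping a cluster name to the list of its units
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of clusters
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical example: 4 schools with 5 children each; 2 schools are selected
schools = {
    f"school_{i}": [f"child_{i}_{j}" for j in range(5)]
    for i in range(4)
}
children = cluster_sample(schools, n_clusters=2, seed=0)
print(len(children), "children selected")
```

The sampling unit here is the school, not the child, which is what distinguishes this design from simple random sampling of children.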
B. Non-probability sampling
[Q: Write short notes on: Non probability sampling. (BSMMU,
January 2009, January 2010)]
1. Quota sampling
It is the nonprobability equivalent of stratified sampling. As in
stratified sampling, the researcher first identifies the strata
and their proportions as they are represented in the
population. Then convenience or judgment sampling is used to
select the required number of subjects from each stratum. This
use of convenience or judgment distinguishes quota sampling from
stratified sampling, where the strata are filled by random
sampling.
Advantages
Obvious advantages of quota sampling are
the speed with which information can be collected,
the lower cost of doing so and
the convenience it represents.
Disadvantages
The non-random element is its greatest weakness, and quota
versus probability sampling has been a matter of controversy for many
years. Within a quota the sample may be unrepresentative (e.g.
all young, attractive females).
2. Convenience sampling
In this method the units of sample that are convenient to
collect at the time of data collection are included in the study.
It is also known as accidental sampling, haphazard sampling or
chunk sampling.
Clinic- or hospital-based studies commonly use convenience sampling.
The sample may be quite unrepresentative of the population.
3. Judgment sampling
It is a process of sampling in which personal judgment
determines the selection of the sample according to the purpose of
the sample.
In this method of sampling the choice of sample items depends
exclusively on the judgment of the investigator.
The investigator exercises his or her judgment and includes
those items in the sample which he or she thinks are most
typical of the population with regard to the characteristics under
investigation.
The success of this method depends upon excellence in
judgment; still, the sample may not be representative because it may
be affected by the personal prejudice or bias of the
investigator.
Statistical inference
Statistical inference is to determine what can reasonably be
concluded about an entire population on the basis of having
examined only a limited sample of instances drawn from that
population.
So inferences about the characteristics of a population are
based on data from a sample.
Difference between Sample and Population
Sample
1. A sample is the part or subset of the whole group which represents the whole group.
2. A sample is always fixed in number.
3. The summary values derived from a sample are called statistics, e.g. sample mean (x̄), sample variance (s²), sample standard deviation (s), etc.
Population
1. A population is the entire group of subjects, persons or study elements about which information is desired.
2. A population may be finite or infinite in number.
3. The summary values derived from a population are called parameters, e.g. population mean (μ), population variance (σ²), population standard deviation (σ), etc.
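The distinction between sample statistics and population parameters can be seen in how the variance is computed. The sketch below uses Python's standard `statistics` module on five made-up values; the data are hypothetical. The only point is that the sample variance s² divides by n − 1, whereas the population variance σ² divides by n.

```python
import statistics

# Hypothetical sample of five observations
sample = [4, 8, 6, 5, 7]

mean = statistics.mean(sample)    # sample mean
s2 = statistics.variance(sample)  # sample variance, divisor n - 1
s = statistics.stdev(sample)      # sample standard deviation

# If the same five values were the entire population instead:
sigma2 = statistics.pvariance(sample)  # population variance, divisor n

print(mean, s2, round(s, 2), sigma2)
```

The sample statistics are estimates of the corresponding parameters of the population from which the sample was drawn.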