SlideShare a Scribd company logo
1 of 23
Data Visualization Intro II
Sampling – Sampling Methods – Experiments - Examining
numerical data - Considering categorical data -
Segmented bar and mosaic plots
“Sample should be representative of whole population”
Why Sampling is Random?
Almost all statistical methods are based on the notion of
implied randomness. If observational data are not
collected in a random framework from a population,
these statistical methods - the estimates and errors
associated with the estimates - are not reliable.
Sampling
A prospective study identies
individuals and collects information
as events unfold.
For instance, medical researchers
may identify and follow a group of
similar individuals over many years
to assess the possible inuences of
behavior on cancer risk.
Observational studies and sampling
Sampling Methods
Retrospective studies collect data
after events have taken place, e.g.
researchers may review past events
in medical records. For instance,
central government may have
collected data during the 2010 census
(e.g. county population acounts).
∗ Some data sets, may contain both prospectively- and
retrospectively-collected variables. Local
governments prospectively collect some variables as
events unfolded (e.g. retails sales) while the federal
government retrospectively
∗ Observational studies and sampling strategies:
∗ Generally, data in observational studies are collected
only by monitoring what occurs.
∗ observational studies are generally only sufficient to
show associations/coorelation.
∗ Suppose an observational study tracked sunscreen
use and skin cancer, and it was found that the more
sunscreen someone used, the more likely the person
was to have skin cancer. Does this mean sunscreen
causes skin cancer?
∗ One important piece of information that is absent is
sun exposure. Exposure to the sun is unaccounted for
in the simple investigation.
Exercise
∗ Sun exposure is a variable that is correlated with both
the explanatory and response variables.
∗ This is called as confounding variable.
Meaning of Confounding: cause surprise or confusion in (someone), especially
by not according with their expectations.
"the inflation figure confounded economic analysts"
Sampling Methods
Simple random sampling is probably the most intuitive
form of random sampling.
In general, a sample is referred to as “simple random" if
each case in the population has an equal chance of being
included in the final sample and knowing that a case is
included in a sample does not provide useful
information about which other cases are included.
Eg. Lottery system
Simple random sampling
Simple random sampling was used to
randomly select the 18 cases.
Stratified sampling is a divide-and-conquer sampling strategy.
The population is divided into groups called strata.
The strata are chosen so that similar cases are grouped together, then a second
sampling method, usually simple random sampling, is employed within each
stratum.
In the cricketers salary example, the teams could represent the strata, since
some teams have a lot more money (up to 4 times as much!). Then we might
randomly sample 4 players from each team for a total of 120 players.
Stratied sampling is especially useful when the cases in each stratum are very
similar with respect to the outcome of interest. The downside is that analyzing
data from a stratied sample is a more complex task than analyzing data from a
simple random sample.
Stratified sampling
Stratified sampling was used: cases were grouped
into strata, then simple random sampling was
employed within each stratum.
∗ We break up the population into many groups, called
clusters.
∗ Then we sample a fixed number of clusters and
include all observations from each of those clusters in
the sample.
Cluster Sample
Cluster sampling was used. Here, data were binned into nine clusters,three
of these clusters were sampled, and all observations within these three
clusters were included in the sample.
∗ A multistage sample is like a cluster sample, but
rather than keeping all observations in each cluster,
we collect a random sample within each selected
cluster.
Multistage Sample
Multistage sampling was used. It diers from cluster sampling
in that of the clusters selected, we randomly select a subset of
each cluster to be included in the sample.
∗ Sometimes cluster or multistage sampling can be
more economical than the alternative sampling
techniques.
∗ Also, unlike stratified sampling, these approaches are
most helpful when there is a lot of case-to-case
variability within a cluster but the clusters themselves
don't look very different from one another.
∗ For example, if neighborhoods represented clusters,
then cluster or multistage sampling work best when
the neighborhoods are very diverse.
∗ A downside of these methods is that more advanced
analysis techniques are typically required.
Points to remember
Randomized experiments are generally built on four
principles.
∗Controlling
∗Randomization
∗Replication.
∗Blocking
Principles of experimental Study
Controlling
Researchers assign treatments to cases, and they do their best
to control any other differences in the groups. For example,
when patients take a drug in pill form, some patients take the
pill with only a sip of water while others may have it with an
entire glass of water. To control for the effect of water
consumption, a doctor may ask all patients to drink a 12 ounce
glass of water with the pill.
Randomization
Researchers randomize patients into treatment groups to
account for variables that cannot be controlled. For example,
some patients may be more susceptible to a disease than others
due to their dietary habits. Randomizing patients into the
treatment or control group helps even out such differences,
and it also prevents accidental bias from entering the study.
Replication
The more cases researchers observe, the more accurately they
can estimate the effect of the explanatory variable on the
response. In a single study, we replicate by collecting a
sufficiently large sample. Additionally, a group of scientists may
replicate an entire study to verify an earlier finding.
Blocking
Researchers sometimes know or suspect that variables, other
than the treatment, inuence the response. Under these
circumstances, they may rst group individuals based on this
variable into blocks and then randomize cases within each
block to the treatment groups. This strategy is often referred
to as blocking.
Blocking using a
variable depicting
patient risk. Patients are
rest divided into low-
risk and high-risk blocks,
then each block is
evenly separated into
the treatment groups
using randomization.
This strategy ensures an
equal representation of
patients in each
treatment group from
both the low-risk and
high-risk categories.
Researchers aren't usually interested in the emotional effect,
which might bias the study. To circumvent this problem,
researchers do not want patients to know which group they are
in. When researchers keep the patients uninformed about their
treatment, the study is said to be blind.
If you are in the treatment group, you are given a fancy new
drug that you anticipate will help you. On the other hand, a
person in the other group doesn't receive the drug and sits idly,
hoping her participation doesn't increase her risk of death.
∗The solution to this problem is to give fake treatments to
patients in the control group. A fake treatment is called a
placebo, and an effective placebo is the key to making a study
truly blind. This effect has been dubbed the placebo effect.
The patients are not the only ones who should be blinded:
doctors and researchers can accidentally bias a study.
To guard against this bias, which again has been found to
have a measurable eect in some instances, most modern
studies employ a double-blind setup where doctors or
researchers who interact with patients are, just like the
patients, unaware of who is or is not receiving the
treatment.
It is important to incorporate the first three experimental design
principles into any study. Blocking is a slightly more advanced
technique.

More Related Content

What's hot

Meta-analysis when the normality assumptions are violated (2008)
Meta-analysis when the normality assumptions are violated (2008)Meta-analysis when the normality assumptions are violated (2008)
Meta-analysis when the normality assumptions are violated (2008)Evangelos Kontopantelis
 
study design of clinical research
study design of clinical researchstudy design of clinical research
study design of clinical researchMD Jahidul Islam
 
Allocation Concealment
Allocation ConcealmentAllocation Concealment
Allocation ConcealmentTulasi Raman
 
What Are the Different Types of Clinical Research or who can participate in it?
What Are the Different Types of Clinical Research or who can participate in it?What Are the Different Types of Clinical Research or who can participate in it?
What Are the Different Types of Clinical Research or who can participate in it?Vial Trials
 
Designs and sample size in medical resarch
Designs and sample size in medical resarchDesigns and sample size in medical resarch
Designs and sample size in medical resarchAbhaya Indrayan
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimationHanaaBayomy
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimationMonali2011
 
RSS 2013 - A re-analysis of the Cochrane Library data]
RSS 2013 - A re-analysis of the Cochrane Library data]RSS 2013 - A re-analysis of the Cochrane Library data]
RSS 2013 - A re-analysis of the Cochrane Library data]Evangelos Kontopantelis
 
Parmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trailsParmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trailsVinod Pagidipalli
 
Research methodology3
Research methodology3Research methodology3
Research methodology3Tosif Ahmad
 
Randomized Controlled Trials (RCTs)
Randomized Controlled Trials (RCTs)Randomized Controlled Trials (RCTs)
Randomized Controlled Trials (RCTs)Amit Agrawal
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecturedrmomusa
 
Evidence based surgery
Evidence based surgeryEvidence based surgery
Evidence based surgeryDeepak Kumar
 
An overview of clinical research the lay of the land
An overview of clinical research   the lay of the  landAn overview of clinical research   the lay of the  land
An overview of clinical research the lay of the landEfrain Ariel Romero Zepeda
 

What's hot (20)

Basics in Research
Basics in Research Basics in Research
Basics in Research
 
Meta-analysis when the normality assumptions are violated (2008)
Meta-analysis when the normality assumptions are violated (2008)Meta-analysis when the normality assumptions are violated (2008)
Meta-analysis when the normality assumptions are violated (2008)
 
study design of clinical research
study design of clinical researchstudy design of clinical research
study design of clinical research
 
Allocation Concealment
Allocation ConcealmentAllocation Concealment
Allocation Concealment
 
What Are the Different Types of Clinical Research or who can participate in it?
What Are the Different Types of Clinical Research or who can participate in it?What Are the Different Types of Clinical Research or who can participate in it?
What Are the Different Types of Clinical Research or who can participate in it?
 
Standard error
Standard error Standard error
Standard error
 
Designs and sample size in medical resarch
Designs and sample size in medical resarchDesigns and sample size in medical resarch
Designs and sample size in medical resarch
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimation
 
Sample size estimation
Sample size estimationSample size estimation
Sample size estimation
 
RSS 2013 - A re-analysis of the Cochrane Library data]
RSS 2013 - A re-analysis of the Cochrane Library data]RSS 2013 - A re-analysis of the Cochrane Library data]
RSS 2013 - A re-analysis of the Cochrane Library data]
 
Parmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trailsParmetric and non parametric statistical test in clinical trails
Parmetric and non parametric statistical test in clinical trails
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Research methodology3
Research methodology3Research methodology3
Research methodology3
 
Randomized Controlled Trials (RCTs)
Randomized Controlled Trials (RCTs)Randomized Controlled Trials (RCTs)
Randomized Controlled Trials (RCTs)
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecture
 
Evidence based surgery
Evidence based surgeryEvidence based surgery
Evidence based surgery
 
Analysis and Interpretation
Analysis and InterpretationAnalysis and Interpretation
Analysis and Interpretation
 
An overview of clinical research the lay of the land
An overview of clinical research   the lay of the  landAn overview of clinical research   the lay of the  land
An overview of clinical research the lay of the land
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 

Similar to Data visualization intro2

Clinical application of statistical analysis
Clinical application of statistical analysisClinical application of statistical analysis
Clinical application of statistical analysisUmair hanif
 
Sampling techniques and sample size calculations.pptx
Sampling techniques and sample size calculations.pptxSampling techniques and sample size calculations.pptx
Sampling techniques and sample size calculations.pptxDr Debasish Mohapatra
 
Sampling in Research
Sampling in ResearchSampling in Research
Sampling in ResearchEnzo Engada
 
XNN001 Introductory epidemiological concepts - sampling, bias and error
XNN001 Introductory epidemiological concepts - sampling, bias and errorXNN001 Introductory epidemiological concepts - sampling, bias and error
XNN001 Introductory epidemiological concepts - sampling, bias and errorramseyr
 
Sampling research method
Sampling research methodSampling research method
Sampling research methodM.S. SaHiR
 
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...Dr. Khaled OUANES
 
Biostats epidemiological studies
Biostats epidemiological studiesBiostats epidemiological studies
Biostats epidemiological studiesJagdish Dukre
 
Clinical research (study designs)
Clinical research  (study designs)Clinical research  (study designs)
Clinical research (study designs)Mohamed Fahmy Dehim
 

Similar to Data visualization intro2 (20)

Research
ResearchResearch
Research
 
Clinical application of statistical analysis
Clinical application of statistical analysisClinical application of statistical analysis
Clinical application of statistical analysis
 
Chapter02
Chapter02Chapter02
Chapter02
 
Chapter02
Chapter02Chapter02
Chapter02
 
3 cross sectional study
3 cross sectional study3 cross sectional study
3 cross sectional study
 
3 cross sectional study
3 cross sectional study3 cross sectional study
3 cross sectional study
 
Sampling techniques and sample size calculations.pptx
Sampling techniques and sample size calculations.pptxSampling techniques and sample size calculations.pptx
Sampling techniques and sample size calculations.pptx
 
Statistics five
Statistics fiveStatistics five
Statistics five
 
Sec 1.3 collecting sample data
Sec 1.3 collecting sample data  Sec 1.3 collecting sample data
Sec 1.3 collecting sample data
 
Sampling
Sampling Sampling
Sampling
 
Sampling in Research
Sampling in ResearchSampling in Research
Sampling in Research
 
XNN001 Introductory epidemiological concepts - sampling, bias and error
XNN001 Introductory epidemiological concepts - sampling, bias and errorXNN001 Introductory epidemiological concepts - sampling, bias and error
XNN001 Introductory epidemiological concepts - sampling, bias and error
 
Sampling research method
Sampling research methodSampling research method
Sampling research method
 
Clinical study designs
Clinical study designsClinical study designs
Clinical study designs
 
Randomization, Bias, Blinding
Randomization, Bias, Blinding Randomization, Bias, Blinding
Randomization, Bias, Blinding
 
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...
HEALTHCARE RESEARCH METHODS: Primary Studies: Selecting a Sample Population a...
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Biostats epidemiological studies
Biostats epidemiological studiesBiostats epidemiological studies
Biostats epidemiological studies
 
OBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptxOBSERVATIONAL STUDIES PPT.pptx
OBSERVATIONAL STUDIES PPT.pptx
 
Clinical research (study designs)
Clinical research  (study designs)Clinical research  (study designs)
Clinical research (study designs)
 

More from Shruti Nigam (CWM, AFP) (11)

Morph transition 1.pptx
Morph transition 1.pptxMorph transition 1.pptx
Morph transition 1.pptx
 
Morph transition 2.pptx
Morph transition 2.pptxMorph transition 2.pptx
Morph transition 2.pptx
 
Statistics.pdf
Statistics.pdfStatistics.pdf
Statistics.pdf
 
Business forecasting project border
Business forecasting project borderBusiness forecasting project border
Business forecasting project border
 
Data analysis property area analysis via powerbi
Data analysis property area analysis via powerbiData analysis property area analysis via powerbi
Data analysis property area analysis via powerbi
 
Actionable results to enhance Employee satisfaction score analysis via Tableau
Actionable results to enhance Employee satisfaction score analysis via TableauActionable results to enhance Employee satisfaction score analysis via Tableau
Actionable results to enhance Employee satisfaction score analysis via Tableau
 
Data visualization intro
Data visualization introData visualization intro
Data visualization intro
 
Finanacial institutions nature and role
Finanacial institutions nature and roleFinanacial institutions nature and role
Finanacial institutions nature and role
 
Mutual funds
Mutual fundsMutual funds
Mutual funds
 
Fs unit-i nbfc
Fs unit-i nbfcFs unit-i nbfc
Fs unit-i nbfc
 
NBFC MBA I SEMESTER
NBFC MBA I SEMESTERNBFC MBA I SEMESTER
NBFC MBA I SEMESTER
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

Data visualization intro2

  • 1. Data Visualization Intro II Sampling – Sampling Methods – Experiments - Examining numerical data - Considering categorical data - Segmented bar and mosaic plots
  • 2. “Sample should be representative of whole population” Why Sampling is Random? Almost all statistical methods are based on the notion of implied randomness. If observational data are not collected in a random framework from a population, these statistical methods - the estimates and errors associated with the estimates - are not reliable. Sampling
  • 3. A prospective study identies individuals and collects information as events unfold. For instance, medical researchers may identify and follow a group of similar individuals over many years to assess the possible inuences of behavior on cancer risk. Observational studies and sampling Sampling Methods Retrospective studies collect data after events have taken place, e.g. researchers may review past events in medical records. For instance, central government may have collected data during the 2010 census (e.g. county population acounts).
  • 4. ∗ Some data sets, may contain both prospectively- and retrospectively-collected variables. Local governments prospectively collect some variables as events unfolded (e.g. retails sales) while the federal government retrospectively ∗ Observational studies and sampling strategies: ∗ Generally, data in observational studies are collected only by monitoring what occurs. ∗ observational studies are generally only sufficient to show associations/coorelation.
  • 5. ∗ Suppose an observational study tracked sunscreen use and skin cancer, and it was found that the more sunscreen someone used, the more likely the person was to have skin cancer. Does this mean sunscreen causes skin cancer? ∗ One important piece of information that is absent is sun exposure. Exposure to the sun is unaccounted for in the simple investigation. Exercise
  • 6. ∗ Sun exposure is a variable that is correlated with both the explanatory and response variables. ∗ This is called as confounding variable. Meaning of Confounding: cause surprise or confusion in (someone), especially by not according with their expectations. "the inflation figure confounded economic analysts"
  • 8. Simple random sampling is probably the most intuitive form of random sampling. In general, a sample is referred to as “simple random" if each case in the population has an equal chance of being included in the final sample and knowing that a case is included in a sample does not provide useful information about which other cases are included. Eg. Lottery system Simple random sampling
  • 9. Simple random sampling was used to randomly select the 18 cases.
  • 10. Stratified sampling is a divide-and-conquer sampling strategy. The population is divided into groups called strata. The strata are chosen so that similar cases are grouped together, then a second sampling method, usually simple random sampling, is employed within each stratum. In the cricketers salary example, the teams could represent the strata, since some teams have a lot more money (up to 4 times as much!). Then we might randomly sample 4 players from each team for a total of 120 players. Stratied sampling is especially useful when the cases in each stratum are very similar with respect to the outcome of interest. The downside is that analyzing data from a stratied sample is a more complex task than analyzing data from a simple random sample. Stratified sampling
  • 11. Stratified sampling was used: cases were grouped into strata, then simple random sampling was employed within each stratum.
  • 12. ∗ We break up the population into many groups, called clusters. ∗ Then we sample a fixed number of clusters and include all observations from each of those clusters in the sample. Cluster Sample
  • 13. Cluster sampling was used. Here, data were binned into nine clusters,three of these clusters were sampled, and all observations within these three clusters were included in the sample.
  • 14. ∗ A multistage sample is like a cluster sample, but rather than keeping all observations in each cluster, we collect a random sample within each selected cluster. Multistage Sample
  • 15. Multistage sampling was used. It diers from cluster sampling in that of the clusters selected, we randomly select a subset of each cluster to be included in the sample.
  • 16. ∗ Sometimes cluster or multistage sampling can be more economical than the alternative sampling techniques. ∗ Also, unlike stratified sampling, these approaches are most helpful when there is a lot of case-to-case variability within a cluster but the clusters themselves don't look very different from one another. ∗ For example, if neighborhoods represented clusters, then cluster or multistage sampling work best when the neighborhoods are very diverse. ∗ A downside of these methods is that more advanced analysis techniques are typically required. Points to remember
  • 17. Randomized experiments are generally built on four principles. ∗Controlling ∗Randomization ∗Replication. ∗Blocking Principles of experimental Study
  • 18. Controlling Researchers assign treatments to cases, and they do their best to control any other differences in the groups. For example, when patients take a drug in pill form, some patients take the pill with only a sip of water while others may have it with an entire glass of water. To control for the effect of water consumption, a doctor may ask all patients to drink a 12 ounce glass of water with the pill. Randomization Researchers randomize patients into treatment groups to account for variables that cannot be controlled. For example, some patients may be more susceptible to a disease than others due to their dietary habits. Randomizing patients into the treatment or control group helps even out such differences, and it also prevents accidental bias from entering the study.
  • 19. Replication The more cases researchers observe, the more accurately they can estimate the effect of the explanatory variable on the response. In a single study, we replicate by collecting a sufficiently large sample. Additionally, a group of scientists may replicate an entire study to verify an earlier finding. Blocking Researchers sometimes know or suspect that variables, other than the treatment, inuence the response. Under these circumstances, they may rst group individuals based on this variable into blocks and then randomize cases within each block to the treatment groups. This strategy is often referred to as blocking.
  • 20. Blocking using a variable depicting patient risk. Patients are rest divided into low- risk and high-risk blocks, then each block is evenly separated into the treatment groups using randomization. This strategy ensures an equal representation of patients in each treatment group from both the low-risk and high-risk categories.
  • 21. Researchers aren't usually interested in the emotional effect, which might bias the study. To circumvent this problem, researchers do not want patients to know which group they are in. When researchers keep the patients uninformed about their treatment, the study is said to be blind. If you are in the treatment group, you are given a fancy new drug that you anticipate will help you. On the other hand, a person in the other group doesn't receive the drug and sits idly, hoping her participation doesn't increase her risk of death. ∗The solution to this problem is to give fake treatments to patients in the control group. A fake treatment is called a placebo, and an effective placebo is the key to making a study truly blind. This effect has been dubbed the placebo effect.
  • 22. The patients are not the only ones who should be blinded: doctors and researchers can accidentally bias a study. To guard against this bias, which again has been found to have a measurable eect in some instances, most modern studies employ a double-blind setup where doctors or researchers who interact with patients are, just like the patients, unaware of who is or is not receiving the treatment.
  • 23. It is important to incorporate the first three experimental design principles into any study. Blocking is a slightly more advanced technique.

Editor's Notes

  1. Generally, data in observational studies are collected only by monitoring what occurs, while experiments require the primary explanatory variable in a study be assigned for each subject by the researchers. Making causal conclusions based on experiments is often reasonable. However, making the same causal conclusions based on observational data can be treacherous and is not rec- ommended. Thus, observational studies are generally only sucient to show associations.
  2. Some previous research tells us that using sunscreen actually reduces skin cancer risk, so maybe there is another variable that can explain this hypothetical association between sunscreen usage and skin cancer. If someone is out in the sun all day, she is more likely to use sunscreen and more likely to get skin cancer.
  3. While one method to justify making causal conclusions from observational studies is to exhaust the search for confounding variables, there is no guarantee that all confounding variables can be examined or measured.