52-Week Biotech Stock Price
100 Years of “Emma”
17 Years of Super Bowl Viewership
What is common among these time series data?
All wrong! All these time series are fabrications: every one of them is a “random walk”.
Welcome to secondary data analysis.
Secondary Data Analysis
B. Rey de Castro, Sc.D.
Guest Researcher, CDC National Center for Health Statistics
University of Maryland College Park, School of Public Health
FMSC 720: Study Design in MCH Epidemiology
November 30, 2010
Secondary Data Analysis
- Data that you did not collect yourself
- Both the data and the study design are givens
- The statistical analysis is up to you
Uses for Secondary Data
- Hypothesis generation and testing
- Pilot data for grant proposals
- Expanding knowledge
- Publications
National Health and Nutrition Examination Survey (NHANES)
http://www.cdc.gov/nchs/nhanes.htm
- Population: children and adults nationwide
- Method: face-to-face interview; physical exams
- Content: chronic and infectious disease; mental health and cognitive functioning; energy balance; reproductive history and sexual behavior; respiratory disease
- Data: N ~ 5,000 annually; initiated in the 1960s; annual since 1999
- Online tutorial
National Health Interview Survey (NHIS)
http://www.cdc.gov/nchs/nhis.htm
- Population: households, families, adults, and children nationwide
- Method: face-to-face interview
- Content: health conditions and behaviors; access to and use of health services
  - Cancer Control Module (1987, 1992, 2000, 2003, and 2005): energy balance, cancer screening, sun avoidance, tobacco use and control, genetic testing
- Data: N ~ 40,000 households (~87,000 individuals) annually; initiated in 1957
Other Federal Surveys
- National Longitudinal Mortality Study: http://www.census.gov/nlms/
- National Health Care Survey: http://www.cdc.gov/nchs/nhcs.htm
- National Ambulatory Medical Care Survey: http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm
- Medical Expenditure Panel Survey: http://www.meps.ahrq.gov/
- Medicare Current Beneficiary Survey: http://www.cms.hhs.gov/MCBS/
- Medicare Health Outcomes Survey: http://www.hosonline.org/
- National Survey on Drug Use and Health: http://www.oas.samhsa.gov/nhsda.htm
- National Survey of Family Growth: http://www.cdc.gov/nchs/about/major/nsfg/nsfgbiblio.htm
Strengths
- Low data collection and design costs
- More statistical power: larger samples
- Broader geographic area
- Generalizable to the national population
- Improves understanding of the hypothesis
- Tests of trends over time
- Potential for linkage, by person or geographically
Limitations (1)
- Substantial time spent on statistical analysis
- Cross-sectional design
- Recall bias
- Mismatch between the ideal and the feasible hypothesis
- Mismatch between the hypothesis and the survey's original purpose
- Generalizability to small areas is not possible
- Specialized statistical techniques required
Limitations (2)
- Quality: validity and reliability
- Changes to the survey over time
- Poor documentation
- Restricted/conditional access
- Confidentiality
Recap
- Just a few examples of publicly available data
- Most are cross-sectional
- All employ a complex sampling design
  - Many use multi-stage sampling
- Analysis requires special software, e.g., SUDAAN
  - Use of weighting, clustering, and stratification
  - Differences in variance estimation methods
Complex Surveys
Statistical Weight
- The statistical weight of a sampled person is the number of people in the population that the person represents
- Weights are derived from:
  - Selection probabilities
  - Response rates
  - Post-stratification adjustment (e.g., gender, education, income, region)
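A minimal Python sketch of how the weights enter an estimate. The records and weights below are hypothetical toy values, not survey data; in practice you would use the weight variable distributed with the survey.

```python
def weighted_total(values, weights):
    """Estimate a population total: each value counts once per person it represents."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean as weighted total / estimated population size."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical sample: 1 = has the condition, 0 = does not.
has_condition = [1, 0, 1, 0]
# Hypothetical weights: number of people each sampled person represents.
weights = [1500.0, 2500.0, 500.0, 500.0]

print(sum(weights))                             # 5000.0 people represented
print(weighted_total(has_condition, weights))   # 2000.0 people with the condition
print(weighted_mean(has_condition, weights))    # 0.4 weighted prevalence
print(sum(has_condition) / len(has_condition))  # 0.5 prevalence ignoring the weights
```

The unweighted figure differs because the two sampled people with the condition stand for different numbers of people in the population; ignoring the weights treats them as equally representative.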
Stratification
- Before sampling, the population is divided into disjoint, exhaustive groups (strata)
- The units within each stratum are termed primary sampling units (PSUs)
- Independent samples are taken in each stratum
- Strata are formed from similar geographic areas
  - e.g., NHANES partitions US counties into 49 strata based on region and economic/racial characteristics, then samples 2 counties (PSUs) from each stratum
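The selection step above can be sketched as independent selection of 2 PSUs within each stratum, plus the base weight each selected PSU carries. The frame and county names are hypothetical, and simple random sampling within strata is a simplification; large surveys typically select PSUs with unequal probabilities (for example, proportional to size).

```python
import random

# Hypothetical frame: stratum label -> county (PSU) identifiers in that stratum.
FRAME = {
    "A": ["county_01", "county_02", "county_03", "county_04"],
    "B": ["county_05", "county_06", "county_07"],
    "C": ["county_08", "county_09", "county_10", "county_11", "county_12"],
}


def select_psus(frame, n_per_stratum=2, seed=None):
    """Sample PSUs separately in every stratum (simple random sampling within stratum)."""
    rng = random.Random(seed)
    return {stratum: rng.sample(psus, n_per_stratum) for stratum, psus in frame.items()}


def psu_base_weights(frame, n_per_stratum=2):
    """Under SRS within stratum h, each selected PSU stands for N_h / n_h PSUs."""
    return {stratum: len(psus) / n_per_stratum for stratum, psus in frame.items()}


selected = select_psus(FRAME, seed=720)
for stratum, psus in selected.items():
    print(stratum, psus)
print(psu_base_weights(FRAME))  # {'A': 2.0, 'B': 1.5, 'C': 2.5}
```

Because every stratum contributes PSUs by design, no region can be missed by chance, which is one reason stratification tends to reduce variance.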
Clustering
- Persons residing in a small area may have similar characteristics
- Thus, responses of subjects in a small area are potentially correlated
- This correlation must be accounted for in the analysis
- Survey analysis programs do this using the strata/PSU information
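One common way to quantify the cost of clustering is Kish's approximate design effect, deff = 1 + (m - 1)ρ, where m is the average number of sampled persons per PSU and ρ is the intraclass correlation. This is a rough approximation that assumes equal cluster sizes and ignores weighting; the inputs below (5,000 persons, 25 per PSU, ρ = 0.05) are hypothetical.

```python
import math


def kish_design_effect(mean_cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n, deff):
    """Number of independent observations carrying the same information."""
    return n / deff


n = 5000
deff = kish_design_effect(mean_cluster_size=25, icc=0.05)
print(round(deff, 2))                         # 2.2
print(round(effective_sample_size(n, deff)))  # 2273
print(round(math.sqrt(deff), 2))              # 1.48 (factor by which the SE grows)
```

Even a small within-area correlation can roughly halve the effective sample size when many people are interviewed per PSU, which is why ignoring clustering understates standard errors.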
Variance Estimation for Surveys
- Linearization: uses a Taylor series expansion to estimate the variance of non-linear estimators
  - Default method for most programs
  - Requires stratification and PSU information
- Replication: calculates parameter estimates for each replicate and combines them to estimate the variance
  - Jackknife with replicate weights available in SUDAAN, STATA, SAS, and WesVar
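A minimal sketch of linearization for the simplest non-linear estimator, a weighted mean (a ratio of two weighted totals). It uses the common with-replacement ("ultimate cluster") approximation: linearized residuals are summed within each PSU, and the variance comes from how much those PSU totals differ within each stratum. The toy design is hypothetical, and survey software also handles cases this sketch ignores, such as single-PSU strata, subpopulations, and finite population corrections.

```python
import math
from collections import defaultdict


def linearized_mean_se(y, w, strata, psus):
    """Weighted mean and its Taylor-linearized standard error (with-replacement PSUs)."""
    total_weight = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_weight

    # Linearized residual of a ratio estimator: w_i * (y_i - mean) / sum(w),
    # accumulated into one total per (stratum, PSU).
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psus):
        psu_totals[(h, c)] += wi * (yi - mean) / total_weight

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), t in psu_totals.items():
        by_stratum[h].append(t)

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for h, totals in by_stratum.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        t_bar = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((t - t_bar) ** 2 for t in totals)
    return mean, math.sqrt(variance)


# Hypothetical toy design: 2 strata x 2 PSUs x 2 persons, equal weights.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [1.0] * 8
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psus = [1, 1, 2, 2, 1, 1, 2, 2]

mean, se = linearized_mean_se(y, w, strata, psus)
print(mean, round(se, 4))  # 0.5 0.1768
```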
Replication vs. Linearization
- If the survey doesn't have replicate weights, use the full-sample weights and linearization
- If the survey has replicate weights, use them with the jackknife procedure
- Most software uses the linearization method
- Of the packages covered here, SUDAAN, STATA, SAS, and WesVar can incorporate replicate weights
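For comparison, a sketch of a delete-one-PSU stratified jackknife (often called JKn) for a weighted mean. Surveys that release replicate weights also document the variance multipliers to use with them, and those should take precedence; here the replicate weights are built from hypothetical strata and PSU labels purely for illustration.

```python
import math
from collections import defaultdict


def jackknife_mean_se(y, w, strata, psus):
    """Weighted mean and its delete-one-PSU stratified (JKn-style) jackknife SE."""

    def weighted_mean(weights):
        return sum(wi * yi for wi, yi in zip(weights, y)) / sum(weights)

    def replicate_weights(h, dropped, n_h):
        # Drop one PSU, inflate the rest of its stratum, leave other strata alone.
        rep = []
        for wi, hi, ci in zip(w, strata, psus):
            if hi == h and ci == dropped:
                rep.append(0.0)
            elif hi == h:
                rep.append(wi * n_h / (n_h - 1))
            else:
                rep.append(wi)
        return rep

    full = weighted_mean(w)

    psus_by_stratum = defaultdict(set)
    for h, c in zip(strata, psus):
        psus_by_stratum[h].add(c)

    # Each replicate's squared deviation from the full-sample estimate,
    # scaled by (n_h - 1) / n_h for its stratum.
    variance = 0.0
    for h, members in psus_by_stratum.items():
        n_h = len(members)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        for dropped in members:
            rep_estimate = weighted_mean(replicate_weights(h, dropped, n_h))
            variance += (n_h - 1) / n_h * (rep_estimate - full) ** 2
    return full, math.sqrt(variance)


# Hypothetical toy design: 2 strata x 2 PSUs x 2 persons, equal weights.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [1.0] * 8
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psus = [1, 1, 2, 2, 1, 1, 2, 2]

mean, se = jackknife_mean_se(y, w, strata, psus)
print(mean, round(se, 4))  # 0.5 0.1768
```

For a simple weighted mean the jackknife and linearization standard errors are usually close; replication becomes attractive for complicated statistics, and because replicate weights can be released without exposing strata and PSU codes.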
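Complex Survey Design
Accounting for the complex design gives:
- Correct variance estimates
- Proper hypothesis testing
- Standard errors that tend to be larger than those from simple-random-sample formulas, so you are less likely to make a Type I error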
Statistical Software for Analyzing Health Surveys
- Specifically designed for analyzing data from complex sampling designs: SUDAAN, WesVar
- Others that can be used: SAS, STATA, SPSS, Mplus
Data/Research Resources
- Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan: http://www.icpsr.umich.edu/
- UCLA Statistical Computing: http://www.ats.ucla.edu/stat/
- BRFSS Maps: http://apps.nccd.cdc.gov/gisbrfss/default.aspx
- State Cancer Profiles: http://statecancerprofiles.cancer.gov/
References
- Korn, E.L. and Graubard, B.I. (1999). Analysis of Health Surveys. New York: John Wiley.
- State Cancer Profiles: http://statecancerprofiles.cancer.gov/
- SUDAAN: http://www.rti.org/SUDAAN/
- SAS: http://www.sas.com/
- SPSS: http://www.spss.com/
- STATA: http://www.stata.com/
- WesVar: http://www.westat.com/wesvar/
- Mplus: http://www.statmodel.com/
Other Data Sources
- State registries: birth, death, cancer
- Emergency room admissions: acute outcomes
Intermission
Lesson One: Integrity
Dirty Data
- Key-punch errors
- Invalid data
- Missing data
- Mislabeled variables
- Unknown variables
Preparing Data
Processing Data
- Recode data
- Label variables
- Format data
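As an example of recoding, a sketch that maps raw yes/no codes to 1/0 and special codes to missing. The codes 1 = Yes, 2 = No, 7/8/9 = refused/not ascertained/don't know are an assumption modeled on a scheme common in NHIS public-use files; check the codebook for each year and variable. The variable name and responses are hypothetical.

```python
# Assumed coding scheme for a yes/no item; verify against the actual codebook.
YES, NO = 1, 2
MISSING_CODES = {7, 8, 9}  # refused, not ascertained, don't know
LABELS = {1: "Yes", 0: "No", None: "Missing"}


def recode_yes_no(raw):
    """Map raw codes to 1/0/None and fail loudly on anything undocumented."""
    if raw == YES:
        return 1
    if raw == NO:
        return 0
    if raw in MISSING_CODES:
        return None
    raise ValueError(f"unexpected code {raw!r}; check the codebook")


raw_asthma_episode = [1, 2, 2, 9, 1, 7]  # hypothetical raw responses
recoded = [recode_yes_no(code) for code in raw_asthma_episode]
print(recoded)                               # [1, 0, 0, None, 1, None]
print([LABELS[value] for value in recoded])  # ['Yes', 'No', 'No', 'Missing', 'Yes', 'Missing']
```

Raising on unexpected codes, rather than silently passing them through, turns an undocumented value into an immediate reality check instead of a hidden analysis error.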
Investigation
- Reality checks
  - Out-of-range values
- Descriptive statistics
  - Ranges: out-of-range or improbable values
  - Frequencies: missing values or classes
- Simple graphical displays
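The reality checks above can be scripted. This sketch flags out-of-range values, summarizes a variable, and tabulates frequencies; the ages (for a file that should contain only 3- to 17-year-olds) and the sex codes are hypothetical.

```python
from collections import Counter


def out_of_range(values, low, high):
    """Return (index, value) pairs outside [low, high]; missing (None) is skipped."""
    return [(i, v) for i, v in enumerate(values) if v is not None and not (low <= v <= high)]


def describe(values):
    """Count, missing count, min, max, and mean of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": sum(present) / len(present),
    }


# Hypothetical ages for a file that should contain children aged 3 to 17.
ages = [3, 7, 17, 45, None, 12, -1]
print(out_of_range(ages, 3, 17))  # [(3, 45), (6, -1)]
print(describe(ages))

# Frequencies expose missing values and unexpected classes (here the code "9").
sex = ["M", "F", "F", "9", "M", None]
print(Counter(sex))
```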
Normal Ranges
Imputing Missing Values
- Increases available data
- Statistically more complex
- Defensibility
- Useful
Lesson Two: Spend time up front being sure about your data
- Foundation of sand or stone?
- Crystal-clear case definition and recodes
- More time preparing than analyzing
  - Prevents problems
  - Simplifies analysis
Statistical Analysis Plan
Outcome
Design
Clustered Data
Longitudinal
Hierarchical
Diagnostics
- Independence
- Homoskedasticity
- Skewness
- Influential observations
Lesson Three: Plan, then execute the plan
- Conform the statistical technique to the outcome and design
- Diagnostics
Case Study
- Ongoing spatial epidemiology project
- Complex survey
- Cross-sectional
- Data linkage: childhood asthma episodes and air pollution exposure
Case Study
- Air pollutant: acrolein
- EPA attributes >90% of non-cancer respiratory health effects to acrolein
- No epidemiologic studies to date
Data Linkage
National Health Interview Survey
- Health outcome: asthma episode in the last 12 months
- 2000 to 2004
- Children 3 to 17 years old
- Parents of ~66,000 children surveyed
- Nationally representative sample
- Complex survey weighting
National Health Interview Survey
Potential confounders:
- Smoking household
- Acrolein industry household
- Age, sex, race
- Education, income, single-parent family
- Access to care, insurance
- Urban/rural
- Census regional division
National Air Toxics Assessment (NATA)
- Air pollutant: acrolein
  - Strong respiratory irritant
  - Sources: cigarette smoke, industrial emissions
- 2002 modeled exposure assessment
- Census tracts nationwide
How would you link these two databases?
Geographic Linkage
But geographic linkage requires access to confidential NHIS data.
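Conceptually, the geographic linkage is a merge on a census tract key. A sketch, assuming each record carries state, county, and tract FIPS codes, which are concatenated into the standard 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract). All records and concentrations are hypothetical; real NHIS geography is confidential, and both files should use the same census vintage of tract boundaries.

```python
def tract_geoid(state, county, tract):
    """11-character census tract key: 2-digit state + 3-digit county + 6-digit tract FIPS."""
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"


def link_by_tract(records, exposure_by_tract):
    """Attach tract-level exposure to each person record; report unmatched records."""
    linked, unmatched = [], []
    for rec in records:
        key = tract_geoid(rec["state"], rec["county"], rec["tract"])
        if key in exposure_by_tract:
            linked.append({**rec, "geoid": key, "acrolein": exposure_by_tract[key]})
        else:
            unmatched.append(rec["id"])
    return linked, unmatched


# Hypothetical survey records.
children = [
    {"id": "c1", "state": 24, "county": 33, "tract": 801100, "asthma_episode": 1},
    {"id": "c2", "state": 24, "county": 31, "tract": 700100, "asthma_episode": 0},
    {"id": "c3", "state": 6, "county": 37, "tract": 101110, "asthma_episode": 0},
]
# Hypothetical modeled tract-level concentrations keyed by tract GEOID.
acrolein = {"24033801100": 0.031, "06037101110": 0.058}

linked, unmatched = link_by_tract(children, acrolein)
print([(r["id"], r["acrolein"]) for r in linked])  # [('c1', 0.031), ('c3', 0.058)]
print(unmatched)                                   # ['c2']
```

Zero-padding matters: storing codes as integers silently drops leading zeros (state 06 becomes 6), so the key is rebuilt as a fixed-width string before matching, and unmatched records are reported rather than dropped quietly.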
NCHS Says
- Orient to the data structure and contents
- Locate variables
- Download data
- Append and merge data
- Clean and recode data
- Format and label variables
NHIS Data Processing
- Extract and compile data by year
  - Multiple files
  - 2004 redesign
- Compile data for 2000 to 2004
  - Formatting and variable names were a pain
- Identify records with complete data
- Link to NATA
  - Done confidentially by NCHS staff
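The append-and-harmonize step can be sketched like this: rename each year's variables onto one common set, stack the years, and keep records complete on the analysis variables. The years, variable names, and renames are hypothetical stand-ins for the real per-year NHIS files.

```python
# Hypothetical per-year extracts; the variable names are invented for illustration.
FILES_BY_YEAR = {
    2003: [{"ASTHEP": 1, "AGE": 8}, {"ASTHEP": 2, "AGE": None}],
    2004: [{"asthma_ep": 2, "age_p": 12}],
}
# Hypothetical per-year renames mapping each year's names onto one common set.
RENAMES = {
    2003: {"ASTHEP": "asthma_ep", "AGE": "age"},
    2004: {"age_p": "age"},
}


def harmonize(rows, rename):
    """Rename variables in every row; names without a mapping are kept as-is."""
    return [{rename.get(k, k): v for k, v in row.items()} for row in rows]


def append_years(files_by_year, renames):
    """Stack harmonized rows from all years, tagging each row with its survey year."""
    combined = []
    for year in sorted(files_by_year):
        for row in harmonize(files_by_year[year], renames.get(year, {})):
            combined.append({**row, "year": year})
    return combined


def complete_cases(rows, required):
    """Keep rows with a non-missing value for every required variable."""
    return [r for r in rows if all(r.get(var) is not None for var in required)]


combined = append_years(FILES_BY_YEAR, RENAMES)
analytic = complete_cases(combined, ["asthma_ep", "age"])
print(len(combined), len(analytic))  # 3 2
```

Keeping the renames in one explicit table per year makes the harmonization auditable, which helps when a redesign changes names or file structure partway through the study period.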
Analysis Plan
- Hypothesis: “Childhood asthma episodes are associated with census-tract-level estimates of acrolein exposure”
- Descriptive statistics
- Logistic regression
- Complex weighted variance estimation
- SAS-callable SUDAAN
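For a single binary exposure, the odds ratio from a weighted logistic regression with an intercept equals the cross-product ratio of the weighted 2x2 table, so the point estimate can be sketched without a model fit. Dichotomizing acrolein into high/low tracts and all values below are hypothetical; the standard error still requires design-based variance estimation (linearization or replication), which is what SUDAAN supplies.

```python
def weighted_2x2(outcome, exposed, weights):
    """Weighted cell totals a, b, c, d of the exposure-by-outcome table."""
    a = b = c = d = 0.0
    for y, x, w in zip(outcome, exposed, weights):
        if x and y:
            a += w  # exposed, outcome present
        elif x:
            b += w  # exposed, outcome absent
        elif y:
            c += w  # unexposed, outcome present
        else:
            d += w  # unexposed, outcome absent
    return a, b, c, d


def weighted_odds_ratio(outcome, exposed, weights):
    """Cross-product ratio (a * d) / (b * c) of the weighted table."""
    a, b, c, d = weighted_2x2(outcome, exposed, weights)
    return (a * d) / (b * c)


# Hypothetical children: asthma episode (1/0), high-acrolein tract (1/0), survey weight.
episode = [1, 0, 1, 0, 0, 1, 0, 0]
high_acrolein = [1, 1, 1, 1, 0, 0, 0, 0]
weight = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0]

print(weighted_odds_ratio(episode, high_acrolein, [1.0] * 8))  # 3.0 (weights ignored)
print(weighted_odds_ratio(episode, high_acrolein, weight))     # 6.0 (weighted)
```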
Wisdom Network
- Cultivate relationships
  - Front-line staff
  - Principal investigators

 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Secondary Data Analysis

  • 2. 100 Years of “Emma”
  • 3. 17 Years of Super Bowl Viewership
  • 4. What is common among these time series data?
  • 5. All wrong! All of these time series are fabrications: every one is a “random walk.”
  • 6. Welcome to secondary data analysis.
  • 7. Secondary Data Analysis. B. Rey de Castro, Sc.D., Guest Researcher, CDC National Center for Health Statistics; University of Maryland College Park School of Public Health. FMSC 720: Study Design in MCH Epidemiology, November 30, 2010.
  • 8. Secondary Data Analysis: data that you did not collect yourself. Both the data and the study design are givens; the statistical analysis is up to you.
  • 9. Uses for Secondary Data: hypothesis generation and testing; pilot data for grant proposals; expanding knowledge; publications.
  • 10. National Health and Nutrition Examination Survey (NHANES), http://www.cdc.gov/nchs/nhanes.htm. Population: children and adults nationwide. Method: face-to-face interview; physical exams. Content: chronic and infectious disease; mental health and cognitive functioning; energy balance; reproductive history and sexual behavior; respiratory disease. Data: N ~ 5,000 annually; initiated in the 1960s; annual since 1999; online tutorial.
  • 11. National Health Interview Survey (NHIS), http://www.cdc.gov/nchs/nhis.htm. Population: households, families, adults, and children nationwide. Method: face-to-face interview. Content: health conditions and behaviors; access to and use of health services; Cancer Control Module (1987, 1992, 2000, 2003, and 2005): energy balance, cancer screening, sun avoidance, tobacco use and control, genetic testing. Data: N ~ 40,000 households (~87,000 individuals) annually; initiated in 1957.
  • 12. Other Federal Surveys: National Longitudinal Mortality Study, http://www.census.gov/nlms/ ; National Health Care Survey, http://www.cdc.gov/nchs/nhcs.htm ; National Ambulatory Medical Care Survey, http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm ; Medical Expenditure Panel Survey, http://www.meps.ahrq.gov/ ; Medicare Current Beneficiary Survey, http://www.cms.hhs.gov/MCBS/ ; Medicare Health Outcomes Survey, http://www.hosonline.org/ ; National Survey on Drug Use and Health, http://www.oas.samhsa.gov/nhsda.htm ; National Survey of Family Growth, http://www.cdc.gov/nchs/about/major/nsfg/nsfgbiblio.htm
  • 13. Strengths: inexpensive data collection and design costs; more statistical power from larger samples; broader geographic area; generalizable to the national population; improves understanding of the hypothesis; can test trends over time; potential for linkage, by person or geographically.
  • 14. Limitations (1): substantial time spent on statistical analysis; cross-sectional design; recall bias; mismatch between the ideal and the feasible hypothesis; mismatch between the hypothesis and the survey's original purpose; generalizability to small areas is impossible; specialized statistical techniques are required.
  • 15. Limitations (2): quality (validity and reliability); changes to the survey over time; poor documentation; restricted or conditional access; confidentiality.
  • 16. Recap: these are just a few examples of publicly available data. Most are cross-sectional. All employ a complex sampling design, and many use multi-stage sampling. Analysis requires special software (e.g., SUDAAN) that uses weighting, clustering, and stratification, and variance estimation methods differ.
  • 18. Statistical Weight: the statistical weight of a sampled person is the number of people in the population that the person represents. Weights are derived from selection probabilities, response rates, and post-stratification adjustment (e.g., by gender, education, income, region).
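The slide lists three ingredients of a weight. A minimal sketch of how they might combine, assuming a simple scheme: a base weight of 1/(selection probability), a nonresponse adjustment that moves nonrespondents' weight onto respondents in the same adjustment cell, and a post-stratification step that scales weights to known population totals. The function names, adjustment cells, and control totals are hypothetical; real surveys such as NHANES document their own, more elaborate procedures.

```python
from collections import defaultdict


def base_weight(selection_prob):
    """Base weight: a person sampled with probability p represents 1/p people."""
    return 1.0 / selection_prob


def nonresponse_adjust(weights, responded, cells):
    """Within each adjustment cell, move nonrespondents' weight onto the
    respondents in that cell; nonrespondents end with weight 0."""
    cell_total = defaultdict(float)
    resp_total = defaultdict(float)
    for w, r, c in zip(weights, responded, cells):
        cell_total[c] += w
        if r:
            resp_total[c] += w
    return [w * cell_total[c] / resp_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, cells)]


def poststratify(weights, groups, control_totals):
    """Scale weights within each group so they sum to a known population
    total for that group (e.g., a census count by gender or region)."""
    group_sum = defaultdict(float)
    for w, g in zip(weights, groups):
        group_sum[g] += w
    return [w * control_totals[g] / group_sum[g] if group_sum[g] > 0 else 0.0
            for w, g in zip(weights, groups)]


if __name__ == "__main__":
    # Hypothetical sample of four people with known selection probabilities.
    w = [base_weight(p) for p in (0.001, 0.001, 0.002, 0.002)]
    w = nonresponse_adjust(w, [True, False, True, True], ["A", "A", "B", "B"])
    w = poststratify(w, ["F", "M", "F", "M"], {"F": 3000, "M": 1000})
    print(w, sum(w))
```

Because post-stratification runs last in this sketch, the final weights reproduce the hypothetical control totals.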
  • 19. Stratification: before sampling, the population is divided into disjoint, exhaustive groups (strata) whose members are termed primary sampling units (PSUs). Independent samples are taken in each stratum. Strata are formed from similar geographic areas; e.g., NHANES partitions US counties into 49 strata based on region and economic/racial characteristics, then samples 2 counties (PSUs) from each stratum.
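A minimal sketch of the selection step on this slide: an independent sample of PSUs drawn within each stratum, two per stratum as in the NHANES example. For simplicity it uses equal-probability sampling within strata; the editor's notes indicate NHANES actually selects PSUs with probability proportional to size. The frame, identifiers, and function names are hypothetical.

```python
import random


def select_psus(frame, n_per_stratum=2, seed=None):
    """Draw an independent equal-probability sample of PSUs in each stratum.

    frame maps a stratum id to the list of PSU ids (e.g., counties) in it.
    Returns a dict mapping each stratum id to its selected PSUs.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, psus in frame.items():
        if len(psus) < n_per_stratum:
            raise ValueError(f"stratum {stratum!r} has fewer than {n_per_stratum} PSUs")
        selected[stratum] = rng.sample(psus, n_per_stratum)
    return selected


def psu_selection_probs(frame, n_per_stratum=2):
    """Under equal-probability sampling, each PSU in stratum h is selected
    with probability n_h / N_h."""
    return {h: n_per_stratum / len(psus) for h, psus in frame.items()}


if __name__ == "__main__":
    # Hypothetical frame: two strata of counties.
    frame = {1: ["county_a", "county_b", "county_c"],
             2: ["county_d", "county_e", "county_f", "county_g"]}
    print(select_psus(frame, seed=2010))
    print(psu_selection_probs(frame))
```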
  • 20. Clustering: persons residing in a small area may have similar characteristics, so the responses of subjects in a small area are potentially correlated. This correlation must be accounted for in the analysis; survey analysis programs do this through the strata/PSU information.
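One standard way to quantify what clustering costs is Kish's approximate design effect for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the number of sampled persons per cluster and ρ is the intraclass correlation. A sketch, with illustrative numbers that are not taken from any survey:

```python
import math


def design_effect(cluster_size, icc):
    """Kish's approximate design effect for equal-sized clusters:
    deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)


def se_inflation(cluster_size, icc):
    """Factor by which clustering inflates standard errors relative to SRS."""
    return math.sqrt(design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Illustrative values only: 50 persons per cluster, rho = 0.02.
    print(design_effect(50, 0.02),
          effective_sample_size(5000, 50, 0.02),
          se_inflation(50, 0.02))
```

With 50 persons per cluster and ρ = 0.02, deff = 1.98, so 5,000 respondents carry about as much information as ~2,525 independent ones, and standard errors grow by a factor of about 1.41.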
  • 21. Variance Estimation for Surveys. Linearization: uses a Taylor series expansion to estimate the variance of nonlinear estimators; the default method in most programs; requires stratification and PSU information. Replication: calculates parameter estimates for each replicate and combines them to estimate the variance; the jackknife with replicate weights is available in SUDAAN, STATA, SAS, and WesVar.
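A sketch of linearization for the simplest nonlinear estimator, a weighted mean (a ratio of two weighted totals). It assumes PSUs were sampled with replacement within strata, the usual approximation, and ignores finite population corrections. Each record's linearized value w_i(y_i - ȳ)/Σw is summed to a PSU total, and the variance is the between-PSU spread of those totals within strata. The function name and toy data are illustrative; survey packages implement this (and much more) internally.

```python
from collections import defaultdict


def weighted_mean_linearized(y, w, strata, psus):
    """Weighted mean and its Taylor-linearized variance, assuming PSUs were
    sampled with replacement within strata (no finite population correction).
    PSU ids only need to be unique within a stratum."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized values w_i * (y_i - mean) / sum(w), summed to PSU totals.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, psu in zip(y, w, strata, psus):
        psu_totals[h][psu] += wi * (yi - mean) / total_w
    var = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        zbar = sum(totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return mean, var


if __name__ == "__main__":
    # Toy design: 2 strata x 2 PSUs x 2 persons, equal weights.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    w = [1.0] * 8
    strata = [1, 1, 1, 1, 2, 2, 2, 2]
    psus = [1, 1, 2, 2, 1, 1, 2, 2]
    mean, var = weighted_mean_linearized(y, w, strata, psus)
    print(mean, var ** 0.5)
```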
  • 22. Replication vs. Linearization: if a survey does not have replicate weights, use the full-sample weights and linearization; if it has replicate weights, use them with the jackknife procedure. Most software uses the linearization method; only SUDAAN, STATA, SAS, and WesVar can incorporate replicate weights.
  • 23. Complex Survey Design: accounting for the design gives correct variance estimates and proper hypothesis testing. Standard errors will tend to be larger, so you are less likely to make a Type I error.
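A sketch of the replication approach using delete-one-PSU jackknife (JKn) replicates: each replicate drops one PSU, reweights the remaining PSUs in its stratum by n_h/(n_h - 1), and the variance combines replicate estimates with multiplier (n_h - 1)/n_h. When a survey publishes its own replicate weights, you would use those, with the multipliers given in its documentation, instead of building them. The helper names and toy data are hypothetical.

```python
from collections import defaultdict


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def jkn_replicates(w, strata, psus):
    """Yield (replicate_weights, multiplier) for each delete-one-PSU replicate."""
    members = defaultdict(set)
    for h, p in zip(strata, psus):
        members[h].add(p)
    for h, stratum_psus in members.items():
        n_h = len(stratum_psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        for dropped in sorted(stratum_psus):
            rep = [0.0 if (sh == h and sp == dropped)      # drop this PSU
                   else wi * n_h / (n_h - 1) if sh == h     # reweight its stratum
                   else wi                                  # other strata unchanged
                   for wi, sh, sp in zip(w, strata, psus)]
            yield rep, (n_h - 1) / n_h


def jackknife_variance(estimator, y, full_weights, replicates):
    """Combine replicate estimates: v = sum_r c_r * (theta_r - theta)^2."""
    theta = estimator(y, full_weights)
    var = sum(c * (estimator(y, rw) - theta) ** 2 for rw, c in replicates)
    return theta, var


if __name__ == "__main__":
    # Toy design: 2 strata x 2 PSUs x 2 persons, equal weights.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    w = [1.0] * 8
    strata = [1, 1, 1, 1, 2, 2, 2, 2]
    psus = [1, 1, 2, 2, 1, 1, 2, 2]
    print(jackknife_variance(weighted_mean, y, w, jkn_replicates(w, strata, psus)))
```

For the toy data in the block, the jackknife variance of the weighted mean is 0.03125.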
  • 24. Statistical Software for Analyzing Health Surveys. Specifically designed for analyzing data from complex sampling designs: SUDAAN, WesVar. Others that can be used: SAS, STATA, SPSS, Mplus.
  • 25. Data/Research Resources: University of Michigan Inter-university Consortium for Political and Social Research (ICPSR), http://www.icpsr.umich.edu/ ; UCLA Statistical Computing, http://www.ats.ucla.edu/stat/ ; BRFSS Maps, http://apps.nccd.cdc.gov/gisbrfss/default.aspx ; State Cancer Profiles, http://statecancerprofiles.cancer.gov/
  • 26. References: Korn, E.L. and Graubard, B.I. (1999). Analysis of Health Surveys. New York: John Wiley. State Cancer Profiles: http://statecancerprofiles.cancer.gov/ ; SUDAAN: http://www.rti.org/SUDAAN/ ; SAS: http://www.sas.com/ ; SPSS: http://www.spss.com/ ; STATA: http://www.stata.com/ ; WesVar: http://www.westat.com/wesvar/ ; Mplus: http://www.statmodel.com/
  • 27. Other Data Sources: state registries (birth, death, cancer); emergency room admissions; acute outcomes.
  • 29. Secondary Data Analysis: data that you did not collect yourself. Both the data and the study design are givens; the statistical analysis is up to you.
  • 30. Lesson One: Integrity
  • 31. Dirty Data: key-punch errors; invalid data; missing data; mislabeled variables; unknown variables.
  • 33. Processing Data: recode data; label variables; format data.
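A minimal recoding and labeling sketch. It assumes a yes/no item coded 1 = Yes, 2 = No, with reserve codes 7, 8, and 9 for refused, not ascertained, and don't know; treat that scheme and the function names as assumptions and check each variable's codebook.

```python
from typing import Optional

# Assumed codebook for one yes/no item; verify against the survey's codebook.
YES_NO_LABELS = {1: "Yes", 2: "No"}
RESERVE_CODES = {7, 8, 9}  # assumed: refused / not ascertained / don't know


def recode_yes_no(raw: int) -> Optional[int]:
    """Recode 1/2 to an analysis-ready 1/0 indicator; reserve codes -> None."""
    if raw in RESERVE_CODES:
        return None
    if raw == 1:
        return 1
    if raw == 2:
        return 0
    # Anything else is an undocumented code and should be investigated.
    raise ValueError(f"unexpected code {raw!r}")


def value_label(raw: int) -> str:
    """Human-readable label for tables; reserve codes print as 'Missing'."""
    return YES_NO_LABELS.get(raw, "Missing")


if __name__ == "__main__":
    raw_codes = [1, 2, 7, 1, 9]
    print([recode_yes_no(c) for c in raw_codes])
    print([value_label(c) for c in raw_codes])
```

Raising on unexpected codes, rather than silently setting them to missing, surfaces the invalid and unknown values listed on the Dirty Data slide.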
  • 34. Investigation: reality checks for out-of-range values; descriptive statistics, including ranges (out-of-range or improbable values) and frequencies (missing values or classes); simple graphical displays.
  • 36. Imputing Missing Values: increases the available data; is statistically more complex; must be defensible; can be useful.
  • 37. Lesson Two: Spend time up front being sure about your data. Is your foundation sand or stone? Use a crystal-clear case definition and recodes. Expect to spend more time preparing than analyzing; this prevents problems and simplifies the analysis.
  • 44. Diagnostics: independence; homoskedasticity; skewness; influential observations.
  • 45. Lesson Three: Plan, then execute the plan. Conform the statistical technique to the outcome and the design. Run diagnostics.
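The reality checks on this slide can be scripted so they rerun every time the data are rebuilt. A sketch with hypothetical values, using the case study's 3–17-year age range for children as the valid range; the function names are illustrative.

```python
from collections import Counter


def range_check(values, low, high):
    """Return (index, value) pairs outside [low, high]; None means missing."""
    return [(i, v) for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]


def frequencies(values):
    """Frequency table that keeps missing values (None) as their own class."""
    return Counter(values)


def describe(values):
    """Basic descriptive statistics over the non-missing values."""
    present = [v for v in values if v is not None]
    return {"n": len(values), "missing": len(values) - len(present),
            "min": min(present), "max": max(present),
            "mean": sum(present) / len(present)}


if __name__ == "__main__":
    ages = [5, 12, 17, 2, None, 45, 9]  # hypothetical child ages
    print(range_check(ages, 3, 17))
    print(describe(ages))
    print(frequencies(["M", "F", "F", None]))
```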
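A sketch of one simple approach, class-mean imputation: each missing value is replaced by the mean of the observed values in the same imputation class, and a flag records which values were imputed so the choice stays defensible. This is a swapped-in illustration, not the method used in the talk; single imputation like this understates variance, which is part of why more complex methods such as multiple imputation exist. The function name and classes are hypothetical.

```python
from collections import defaultdict


def impute_class_mean(values, classes):
    """Fill missing values (None) with the mean of observed values in the same
    imputation class; also return flags marking which entries were imputed."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, classes):
        if v is not None:
            sums[c] += v
            counts[c] += 1
    filled, flags = [], []
    for v, c in zip(values, classes):
        if v is None:
            if counts[c] == 0:
                raise ValueError(f"no observed values in class {c!r}")
            filled.append(sums[c] / counts[c])
            flags.append(True)
        else:
            filled.append(v)
            flags.append(False)
    return filled, flags


if __name__ == "__main__":
    income = [10, None, 20, 30, None]       # hypothetical values
    region = ["a", "a", "a", "b", "b"]      # hypothetical imputation classes
    print(impute_class_mean(income, region))
```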
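A sketch of two of these checks for a simple linear regression: the skewness of the residuals, and the leverage of each observation flagged against the common 2p/n rule of thumb. Independence and homoskedasticity usually call for residual plots or design information and are not covered here. Function names and data are illustrative.

```python
def simple_ols(x, y):
    """Intercept and slope of a least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def high_leverage(x, factor=2.0):
    """Indices whose leverage h_i = 1/n + (x_i - xbar)^2 / Sxx exceeds
    factor * p / n, with p = 2 parameters (intercept and slope)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    p = 2
    return [i for i, xi in enumerate(x)
            if 1 / n + (xi - xbar) ** 2 / sxx > factor * p / n]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
    y = [3.2, 4.8, 7.1, 9.3, 10.9, 13.2, 14.8, 17.1, 19.0, 58.0]
    b0, b1 = simple_ols(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    print("residual skewness:", skewness(residuals))
    print("high-leverage rows:", high_leverage(x))
```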
  • 46. Case Study: an ongoing spatial epidemiology project using a complex, cross-sectional survey and data linkage to study childhood asthma episodes and air pollution exposure.
  • 47. Case Study: the air pollutant acrolein. EPA attributes >90% of non-cancer respiratory health effects to acrolein, yet there is no epidemiology to date.
  • 49. National Health Interview Survey. Health outcome: asthma episode in the last 12 months. Years 2000–2004; children 3–17 years old; parents of ~66,000 kids surveyed; nationally representative sample; complex survey weighting.
  • 50. National Health Interview Survey, potential confounders: smoking household; acrolein-industry household; age, sex, race; education, income, single-parent family; access to care, insurance; urban/rural; Census regional division.
  • 51. National Air Toxics Assessment (NATA). Air pollutant: acrolein, a strong respiratory irritant from cigarette smoke and industrial emissions. 2002 modeled exposure assessment for census tracts nationwide.
  • 52. How would you link these two databases?
  • 54. But linking them requires access to confidential NHIS data.
  • 55. NCHS says: orient yourself to the data structure and contents; locate variables; download data; append and merge data; clean and recode data; format and label variables.
  • 56. NHIS Data Processing: extract and compile data by year (multiple files; 2004 redesign); compile data for 2000–2004 (formatting and variable names are a pain); identify records with complete data; link to NATA (done confidentially by NCHS staff).
  • 57. Analysis Plan. Hypothesis: “Childhood asthma episodes are associated with census-tract-level estimates of acrolein exposure.” Descriptive statistics; logistic regression; complex weighted variance estimation with SAS-callable SUDAAN.
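A sketch of the append, complete-case, and link steps in plain Python. The field names (child_id, asthma, tract_fips), the per-year rename map standing in for the 2004 redesign, and the function names are hypothetical; in the actual project the linkage to NATA was done confidentially by NCHS staff, so this only illustrates the join logic.

```python
def append_years(files_by_year, renames_by_year=None):
    """Stack per-year record lists, harmonizing variable names and tagging
    each record with its survey year."""
    renames_by_year = renames_by_year or {}
    stacked = []
    for year, records in sorted(files_by_year.items()):
        rename = renames_by_year.get(year, {})
        for rec in records:
            row = {rename.get(k, k): v for k, v in rec.items()}
            row["year"] = year
            stacked.append(row)
    return stacked


def complete_cases(records, required):
    """Keep records with a non-missing value for every required variable."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


def link_to_tract_exposure(records, exposure_by_tract, key="tract_fips"):
    """Inner join: attach the tract-level exposure and drop unmatched records."""
    linked = []
    for r in records:
        exposure = exposure_by_tract.get(r.get(key))
        if exposure is not None:
            linked.append({**r, "acrolein": exposure})
    return linked


if __name__ == "__main__":
    files = {2000: [{"child_id": "b", "asthma": None, "tract_fips": "T2"}],
             2001: [{"child_id": "a", "asthma": 1, "tract_fips": "T1"}],
             2004: [{"childid": "c", "asthma": 0, "tract_fips": "T9"}]}
    recs = append_years(files, renames_by_year={2004: {"childid": "child_id"}})
    recs = complete_cases(recs, ["asthma", "tract_fips"])
    print(link_to_tract_exposure(recs, {"T1": 0.05, "T2": 0.02}))
```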
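In the plan, SUDAAN fits the weighted logistic regression and its design-based variance. As a rough illustration of the point estimate only, here is a survey-weighted (pseudo-maximum-likelihood) logistic regression with one predictor, fit by Newton-Raphson. It does not compute standard errors; those need linearization or replication as described on the variance-estimation slides. The function name and data are hypothetical.

```python
import math


def weighted_logistic(x, y, w, iters=25, tol=1e-10):
    """Survey-weighted logistic regression with one predictor, fit by
    Newton-Raphson on the weighted log-likelihood. Returns (intercept, slope)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)          # weighted score contribution
            g0 += r
            g1 += r * xi
            v = wi * p * (1.0 - p)     # weighted information contribution
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical: x = exposed (1) or not (0), y = asthma episode, w = weight.
    print(weighted_logistic([0, 0, 1, 1], [1, 0, 1, 0], [1, 3, 3, 1]))
```

With a binary predictor the estimates reduce to weighted log-odds: the intercept is logit(p0) and the slope is logit(p1) - logit(p0), which makes a convenient check.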
  • 58. Wisdom: Network. Cultivate relationships with front-line staff and principal investigators.
  • 59. Wisdom: No one cares more about your problem than you. Or, you should.
  • 60. Wisdom: Teach yourself. Learn to learn.
  • 61. Contact: B. Rey de Castro, Sc.D., jsq7@cdc.gov, http://www.slideshare.net/intelligo/secondary-data-analysis-5972949

Editor's Notes

  1. Stage 1: Primary sampling units (PSUs) are selected. These are mostly single counties or, in a few cases, groups of contiguous counties, selected with probability proportional to a measure of size (PPS).
     Stage 2: The PSUs are divided into segments (generally city blocks or their equivalent). As with the PSUs, sample segments are selected with PPS.
     Stage 3: Households within each segment are listed, and a sample is randomly drawn. In geographic areas where the proportion of the age, ethnic, or income groups selected for oversampling is high, the probability of selection for those groups is greater than in other areas.
     Stage 4: Individuals are chosen to participate in NHANES from a list of all persons residing in selected households. Individuals are drawn at random within designated age-sex-race/ethnicity screening subdomains. On average, 1.6 persons are selected per household.
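The PPS selection in Stages 1 and 2 can be illustrated with systematic PPS sampling, one common textbook method and not necessarily the exact NHANES procedure. Units are laid end to end in proportion to their size, a random start is chosen, and a selection point falls every step = total/n; each unit's inclusion probability is n × size / total. Unit names, sizes, and the function name are hypothetical.

```python
import random


def systematic_pps(units, sizes, n, seed=None):
    """Systematic probability-proportional-to-size (PPS) selection of n units.

    Returns (unit, inclusion_probability) pairs, where the probability is
    n * size / total. Assumes no unit is larger than the sampling interval
    total / n; such units would have to be taken with certainty.
    """
    total = float(sum(sizes))
    step = total / n
    if any(s > step for s in sizes):
        raise ValueError("a unit exceeds the sampling interval; take it with certainty")
    rng = random.Random(seed)
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, j = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cumulative += size
        # A unit is hit by each selection point falling in its size interval.
        while j < n and points[j] < cumulative:
            selected.append((unit, n * size / total))
            j += 1
    return selected


if __name__ == "__main__":
    counties = ["c1", "c2", "c3", "c4", "c5"]   # hypothetical PSUs
    population = [10, 20, 30, 40, 50]           # hypothetical measures of size
    print(systematic_pps(counties, population, n=2, seed=1))
```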