Analysis of Medical Data: Research Perspective. Nancy B. Clark, M.Ed., Director of Medical Informatics Education, FSU College of Medicine. Spring 2004. http://www.med.fsu.edu/informatics
Objectives. Review statistical concepts that will be on Step 1. Determine what data exist relative to a clinical question or formal hypothesis: use IT to locate existing data sources; identify and locate existing data sets (within the institution, outside the institution). Analyze, interpret, and report findings: select and use appropriate computer software (Excel, SPSS); use software to perform simple statistical analysis and portray results graphically; interpret reports.
Prerequisite Skills (Step 1 USMLE). Fundamental concepts of measurement: scales of measurement; distribution, central tendency, variability, probability; disease prevalence and incidence; disease outcomes (e.g., fatality rates); associations (correlation or covariance); health impact (e.g., risk differences and ratios); sensitivity, specificity, predictive values.
More Prerequisite Skills (Step 1 USMLE). Fundamental concepts of hypothesis testing and statistical inference: confidence intervals; statistical significance and type I error; statistical power and type II error.
More Step 1 Topics. Fundamental concepts of study design: types of experimental studies (e.g., clinical trials, community intervention trials); types of observational studies (e.g., cohort, case-control, cross-sectional, case series, community surveys); sampling and sample size; subject selection and exposure allocation (e.g., randomization, stratification, self-selection, systematic assignment); outcome assessment; internal and external validity.
Scales of Measure
  Nominal: qualitative classification of equal value (gender, race, color, city)
  Ordinal: qualitative classification that can be rank ordered (socioeconomic status of families)
  Interval: numerical or quantitative data that can be rank ordered and whose sizes can be compared (temperature)
  Ratio: interval data with an absolute zero value (time or space)
Distribution, Central Tendency… [Figure label: mean]
…Variability, Probability… [Figure labels: mean, median, mode, standard deviation; statistical significance, p < .01]
Confidence Interval
Statistical Significance: Type I and Type II Errors
Null hypothesis = H0

                     H0 true            H0 false
Reject H0            Type I error       Correct decision
Do not reject H0     Correct decision   Type II error
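The decision table above can be restated as a small lookup. This is only an illustrative sketch; the function name `classify_decision` and its parameters are made up for this example and do not come from the slides.

```python
def classify_decision(h0_is_true: bool, h0_rejected: bool) -> str:
    """Return the cell of the decision table for one test outcome."""
    if h0_rejected:
        # Rejecting a true null hypothesis is a Type I error.
        return "Type I error" if h0_is_true else "Correct decision"
    # Failing to reject a false null hypothesis is a Type II error.
    return "Correct decision" if h0_is_true else "Type II error"


# Print all four cells of the table.
for h0_is_true in (True, False):
    for h0_rejected in (True, False):
        outcome = classify_decision(h0_is_true, h0_rejected)
        print(f"H0 true={h0_is_true}, rejected={h0_rejected}: {outcome}")
```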
Statistics Online Textbook The Statistics Homepage  http://www.statsoftinc.com/textbook/stathome.html
Disease Prevalence and Incidence
  Prevalence: probability of disease in the entire population at any point in time. Example: 2% of the population has diabetes.
  Incidence: probability that a patient without disease develops the disease during an interval. Example: 0.2% per year, or 2 new cases per 1000 people per year.
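As a rough illustration of the two definitions, the sketch below computes both as simple proportions. The counts are hypothetical, chosen only so the results match the 2% and 0.2% figures on the slide; the function names are assumptions for this example.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the whole population that has the disease at one point in time."""
    return existing_cases / population


def incidence(new_cases: int, at_risk: int) -> float:
    """Proportion of disease-free people who develop the disease during the interval."""
    return new_cases / at_risk


# Hypothetical town of 100,000 people, 2,000 of whom have diabetes today.
population = 100_000
existing_cases = 2_000
# Only people without the disease are at risk of becoming new cases.
at_risk = population - existing_cases
new_cases_this_year = 196  # hypothetical count

print(f"Prevalence: {prevalence(existing_cases, population):.1%}")
print(f"Incidence:  {incidence(new_cases_this_year, at_risk) * 1000:.1f} per 1000 per year")
```

Note that the denominator differs: prevalence uses everyone, while incidence uses only those who did not already have the disease.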
Sensitivity, Specificity

                    Patients with disease   Patients without disease
Test is positive    a                       b
Test is negative    c                       d

sensitivity = a / (a+c)
specificity = d / (b+d)
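A minimal sketch of the two formulas, using the same cell letters as the 2x2 table (a = true positives, b = false positives, c = false negatives, d = true negatives). The counts below are invented for illustration and are not from the lecture.

```python
def sensitivity(a: int, c: int) -> float:
    """a / (a + c): share of patients WITH disease whose test is positive."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """d / (b + d): share of patients WITHOUT disease whose test is negative."""
    return d / (b + d)


# Hypothetical 2x2 table: 100 patients with disease, 900 without.
a, b = 90, 30    # test positive: with disease, without disease
c, d = 10, 870   # test negative: with disease, without disease

print(f"Sensitivity: {sensitivity(a, c):.3f}")
print(f"Specificity: {specificity(b, d):.3f}")
```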
Predictive Value

                    Patients with disease   Patients without disease
Test is positive    a                       b
Test is negative    c                       d

Positive predictive value = a / (a+b)
Negative predictive value = d / (c+d)
Post-test probability of disease given positive test = a / (a+b)
Post-test probability of disease given negative test = c / (c+d)
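The predictive-value formulas can be sketched the same way. The counts are hypothetical and the function names are assumptions for this example; the last line simply checks that the post-test probability after a negative test is the complement of the negative predictive value.

```python
def positive_predictive_value(a: int, b: int) -> float:
    """a / (a + b): probability of disease given a positive test."""
    return a / (a + b)


def negative_predictive_value(c: int, d: int) -> float:
    """d / (c + d): probability of no disease given a negative test."""
    return d / (c + d)


def post_test_probability_negative(c: int, d: int) -> float:
    """c / (c + d): probability of disease despite a negative test."""
    return c / (c + d)


# Hypothetical 2x2 table (same layout as the slide).
a, b = 90, 30    # test positive: with disease, without disease
c, d = 10, 870   # test negative: with disease, without disease

ppv = positive_predictive_value(a, b)
npv = negative_predictive_value(c, d)
print(f"PPV: {ppv:.3f}")
print(f"NPV: {npv:.3f}")
print(f"P(disease | negative test): {post_test_probability_negative(c, d):.3f}")
print(f"1 - NPV:                    {1 - npv:.3f}")
```

Unlike sensitivity and specificity, these values change with the prevalence of disease in the tested group, so the same test gives different predictive values in different populations.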
Good Resource for Sensitivity, Specificity, Predictive Values. An Introduction to Information Mastery: http://www.poems.msu.edu/InfoMastery/default.htm (Diagnosis: sensitivity and specificity, predictive values, likelihood ratios). InfoRetriever calculators: Epidemiology, Diagnostic test.
Fundamental Concepts of Study Design Good Resource  Epidemiology for the Uninitiated  BMJ Online Textbook http://bmj.com/collections/epidem/epid.shtml
Finding Health Statistics
Types of Health Statistics Questions Fact lookups Research Presentations Social and Policy indicators
Strategies for Finding Health Stats Use Portal Start at Internet site Start with book or article
Internet Portals of Health Stats. Lists of links that provide starting points for browsing or searching. Keyword search in a portal vs. Google. Useful when you have a general idea of what you want. The Related Health Services Research Web Sites: http://www.nlm.nih.gov/nichsr/hsrsites.html The NCHS portal: http://www.cdc.gov/nchs/
Other Statistical Web Sites. CDC Data and Statistics: http://www.cdc.gov/scientific.htm FedStats Home Page: http://www.fedstats.gov/ Compare these two. U Michigan's Statistical Resources on the Web: Health. What type of stats?
Lexis-Nexis Statistical Universe. Subscription resource; searches statistical data; subject list; limit search; reports or tables. http://web.lexis-nexis.com/statuniv?B1=Connect+to+Statistical+Universe
MMWR (Morbidity and Mortality Weekly Report). Morbidity: illness. Mortality: death. http://www.cdc.gov/mmwr/ Disease Trends; Tables (searchable).
Health Care Data. Healthcare Cost and Utilization Project (HCUPnet): hospital discharges; ambulatory service; costs; amount of care; by diagnosis and procedure. Surveys of hospitals, physicians, nursing homes.
Health Consequences. Costs to society and individuals: cost from care; costs of illness; impact on infrastructure. HCFA (now CMS) Health Accounts: http://www.cms.hhs.gov/statistics/nhe/default.asp
State and International Data. Floridahealthstat.com: where Florida health data resides. DOH Epidemiology. KFF State Health Facts Online. United Nations Statistics Division. World Health Organization Research Tools.
Individual Datasets. EMR; billing; CDCS; customized data collection tools.
Data Analysis
Selecting the Appropriate Software
  Spreadsheet: numerical (interval or ratio) data; sums; averages; standard deviations; simple charts and graphs
  Statistical software: nominal or ordinal data; comparisons of two or more groups; frequency tables; complicated charts and graphs; normal curves; class intervals; statistical significance
Spreadsheets Excel Pocket Excel
Data Tables. Field names at top; each row is a record (sample). Sorting the whole table: by one column, or by more than one column. Sorting individual sections.
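Outside a spreadsheet, the same table layout (field names on top, one record per row) and the two kinds of sort can be sketched in Python. The field names and values below are invented for illustration.

```python
from operator import itemgetter

# Each dict is one record (row); the keys play the role of the field names.
records = [
    {"patient": "A", "age": 52, "nodes": 3},
    {"patient": "B", "age": 47, "nodes": 0},
    {"patient": "C", "age": 45, "nodes": 3},
    {"patient": "D", "age": 47, "nodes": 1},
]

# Sort the whole table by one column.
by_age = sorted(records, key=itemgetter("age"))

# Sort by more than one column: first by nodes, then by age within equal nodes.
by_nodes_then_age = sorted(records, key=itemgetter("nodes", "age"))

print([r["patient"] for r in by_age])
print([r["patient"] for r in by_nodes_then_age])
```

Sorting whole records, rather than a single column on its own, keeps each patient's values together, which is the point of sorting the whole table.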
Descriptive Statistics. Distribution: frequency distribution, histogram. Central tendency: mean, median, mode. Dispersion: range, standard deviation, variance. N. Not p (inferential stats).
Central Tendency
  Mean: =AVERAGE(B2:B1500)
  Median: =MEDIAN(A2:A7)
  Mode: =MODE(A2:A7)
  N (count of numeric cells): =COUNT(A2:A1500)
  Count of blank cells: =COUNTBLANK(A2:B5)
Dispersion
  Range: =MAX(A2:A60)-MIN(A2:A60)
  Standard deviation: =STDEV(A2:A110)
  Variance: =VAR(A2:A110)
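For comparison with the Excel formulas for central tendency and dispersion, here is a rough Python equivalent using the standard `statistics` module. The data values are made up. `statistics.stdev` and `statistics.variance` use the sample (n-1) formulas, as Excel's STDEV and VAR do.

```python
import statistics

# Hypothetical column of measurements (stands in for a spreadsheet range).
values = [1, 2, 2, 3, 4, 6]

summary = {
    "n": len(values),                        # like =COUNT(...)
    "mean": statistics.mean(values),         # like =AVERAGE(...)
    "median": statistics.median(values),     # like =MEDIAN(...)
    "mode": statistics.mode(values),         # like =MODE(...)
    "range": max(values) - min(values),      # like =MAX(...)-MIN(...)
    "stdev": statistics.stdev(values),       # sample SD, like =STDEV(...)
    "variance": statistics.variance(values), # sample variance, like =VAR(...)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```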
Distribution. Frequency distribution: not easy in Excel, so use SPSS; otherwise use =FREQUENCY(data_array,bins_array) and consult Help. Histogram: bar chart of the frequency table.
Hands-on experience: analyze data in examples2.xls
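To show what a frequency table holds, the sketch below counts values into bins the way Excel's FREQUENCY is documented to: each bin value is an upper limit (inclusive), and one extra count at the end collects values above the last bin. The ages and bin limits are hypothetical, and the function name `frequency` is an assumption for this example.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin; bins are inclusive upper limits, plus one overflow bin."""
    upper_limits = sorted(bins)
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        # Index of the first upper limit that is >= value;
        # equals len(upper_limits) when value exceeds every limit.
        counts[bisect_left(upper_limits, value)] += 1
    return counts


# Hypothetical ages and class-interval upper limits.
ages = [25, 30, 31, 45, 60, 72]
limits = [30, 50, 70]

counts = frequency(ages, limits)
labels = [f"<= {limit}" for limit in limits] + [f"> {limits[-1]}"]

# A text histogram: one bar per row of the frequency table.
for label, count in zip(labels, counts):
    print(f"{label:>6} | {'#' * count} ({count})")
```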
Statistical Software Intro to SPSS
Statistical Software: SPSS. Provided by request/justification. On lab computers: Start => Programs => SPSS for Windows => SPSS 11.0 for Windows
Start Screen: check "Don't show this dialog in the future," then click OK.
Open Breast Cancer Survival Data View
Views Variables  View
File Information. Utilities menu => File Info… Results appear in the Output window.
Descriptive Statistics. Analyze menu => Descriptive Statistics => Frequencies. Select Age ►. Click the Statistics button. In Central Tendency: Mean, Median, Mode. In Dispersion: Standard deviation, Variance. In Percentile Values: Quartiles. Continue, OK.
Graphing. Graphs menu => Pie… => Summary for groups of cases. Lymph Nodes ►. OK.
Histogram with Normal Curve. Graphs menu => Histogram… Select Age ►. Check Display normal curve. OK.
Simple Correlation Analysis: Age and Tumor Size. Analyze menu => Correlate… => Bivariate. Select Age ►. Select Pathological Tumor Size ►. Check Pearson and Spearman; two-tailed. OK. Is there a correlation? Negative or positive? Is it statistically significant?
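For readers who want to see what the Pearson coefficient measures, here is a hand-rolled sketch. It is not how SPSS computes its output, it does not produce a p-value, and the ages and tumor sizes are invented, not taken from the breast cancer survival file.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: age in years and tumor size in cm.
ages = [35, 42, 50, 58, 66, 71]
tumor_sizes = [3.1, 2.8, 2.9, 2.2, 2.0, 1.6]

r = pearson_r(ages, tumor_sizes)
direction = "negative" if r < 0 else "positive"
print(f"r = {r:.2f} ({direction} correlation)")
```

The sign of r answers "negative or positive?"; whether it is statistically significant still has to be read from the significance value in the SPSS output.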
Save Output. Save on the All Users drive, under Nancy.clark => SPSS Output Files. Name it with your name, e.g., KerryBachista.spo
Importing Data. From Excel, SAS, dBase, etc.; variable names in the first row. File menu => Open Data… => Files of Type: Excel. Tutorial, Samples: Demo.exe. Type in labels; pick the type of variable; enter value labels; etc.
SPSS Tutorials. In the Help menu; on the Informatics Web page. Books: Argyrous, George. Statistics for Social & Health Research (Sage). Cleophas, Ton J., et al. Statistics Applied to Clinical Trials (Kluwer Academic Publishers).
Objectives. Determine what data exist relative to a clinical question or formal hypothesis: use IT to locate existing data sources; identify and locate existing data sets (within the institution, outside the institution). Analyze, interpret, and report findings: select appropriate computer software (Excel, SPSS); use software to perform simple statistical analysis and portray results graphically; interpret reports.
Questions?
