Multivariate statistical


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • General overview of alcohol consumption – implications for prevention policy. At any given time, approximately 1/3 of the U.S. population abstains;1/3 are light, infrequent drinkers; and l/3 are moderate or heavy drinkers. A plot of the volume drunk against the proportion of the population reporting this volume closely resembles a distribution that can be characterized as a log normal curve, and, indeed, a log normal curve fits the distribution of drinking volumes in many populations. This observation led a Frenchman named Ledermann in the 1950s to develop a formula which was based on the theory that alcohol consumption in a population fit a log normal curve, and the shape of the curve was determined by the average intake of absolute alcohol in liters of the drinking population. Ledermann ’ s theory was relevant to alcohol policy makers because it predicted the number of people likely to be drinking at alcoholic levels. Moreover, it suggested that the number of people drinking at alcoholic levels could be decreased by policies that lowered the average intake of alcohol in all drinkers, for example, by increasing cost, reducing the number of sales outlets, restricting hours during which alcohol is sold, and raising the age limit for buying alcohol. This is sometimes characterized as a “ public health ” model of prevention, in which alcohol consumption is viewed as a problem, and reducing consumption is the goal. An alternative view is suggested by another representation of survey data on consumption — see slide 10.
  • Epidemiology of Alcoholic Liver Disease (slides 1- 3) Slide 1. Alcohol is one of the best known hepatotoxins. In many different societies, death rates from liver disease correlate well with the per capita consumption of alcoholic beverages.
  • Multivariate statistical

    1. 1. Multivariate StatisticalAnalysisRaul Caetano, M.D., Ph.D.
    2. 2. Types of AnalysisIn most cases data analysis is done to test the association between two or more variables (bivariate).• In health research very frequently the aimis to establish the association between riskfactor and disease or between therapyand outcome (patient oriented research).
    3. 3. 12-Month Prevalence of DSM-IVAlcohol Dependence by Age: NLAES199218-29 yrs 30-34 yrs 45-64 yrs 65+ yrs AllWomen 6 3 1 .2 1Men 13 6 3 1 6From Grant et al., 1994
    4. 4. Life is Not SimpleIn reality most statistical analysis test complex relationships in which more than two variables are considered.In health research either multiple risk factors are under consideration or many subject variables must be considered in ascertainingthe outcome of a clinical trial.The ultimate goal of these analysis is either explanation or prediction, i.e., more than just establishing an association.
    5. 5. Examples of Multivariate Analyses• Postulating progression of a specific type of lungcancer as a function of baseline stage ofdiagnostic, prior therapeutic regiments, age, sexamong other clinical and /or demographicfactors.• Evaluating the likelihood of domestic violencetaking into account age of the individuals,whether or not they consume alcohol, ethnicbackground and level of education.
    6. 6. Multivariate Analyses• The idea (model) for the first example(progression of lung cancer) in this casewill be:• Progression of cancer = overall effect(irrespective of the effect of other factors)+ effect of baseline stage of dx + effect ofprior therapeutic regiments + effect of ageand sex.
    7. 7. Multivariate Analyses: How• The model underlying the different effectsis quantifiably measured for eachindividual.• Then the different effects are solved forusing mathematical techniques (calculusand numerical methods) to quantify theireffects and tests for their significance.
    8. 8. Advantages of MultivariateAnalysesIt assures that the results are not biased and influenced by other factors that are not accounted for.• Case in point: if difference in cancer progressiondue to different therapies is not a mere reflectionof differences in prior therapies and or stage ofdiagnostic and or demographic factors.
    9. 9. Which Technique?Statistical techniques are like tools: the task to be accomplished determines the selection of the tool.The selection of a statistical technique is greatly based on the type of data under analysis (e.g., measurement level, type ofdistribution) and the reason (research question) for the analysis.
    10. 10. Normal Curve00. 5 10 15 20Choice of Technique and Data Distribution
    11. 11. Choice of Technique and DataDistribution: Log Normal Curve
    12. 12. Choice of Technique: Levels(Types) of Data• Nominal (Categorical) Measures: Areexhaustive and mutually exclusive (e.g.,religion).• Ordinal Measures: All of the above plus can berank-ordered (e.g., social class).• Interval Measures: All of the above plus equaldifferences between measurement points(temperature).• Ratio Measures: All of the above plus a truezero point (weight).
    13. 13. The Link Between Measurementand Statistical Analysis• A chi square can be used to test theassociation between two nominal levelvariables (e.g., gender and liver cirrhosis).• A correlation can be used to assess theassociation between two interval or ratiolevel measures (e.g., height and weight).
    14. 14. What Type of Measure Should IUse?• Whenever possible use interval or ratiomeasures.• However, many attributes of individuals(e.g., gender, religion) or health outcomes(cured/not cured) cannot be easilymeasured by interval or ratio data.• Thus, nominal or ordinal measures arealso frequently used.
    15. 15. What Level for MultivariateAnalysis?• In the past most techniques required at leastinterval data, that is, all measures in the analysis(including independent and dependentvariables) should have at least equal differencesbetween points on any part of the scale.• Nowadays many techniques (e.g., logisticregression) accept categorical, ordinal, intervalor ratio level variables. These techniques arevery popular in health research because of thedifficulty of implementing interval or ratiomeasurements.
    16. 16. Level of Measurement andMultivariate Statistical TechniqueIndependent Variable Dependent Variable TechniqueNumerical Numerical Multiple RegressionNominal or Numerical Nominal Logistic RegressionNominal or Numerical Numerical (censored) Cox RegressionNominal or Numerical Numerical ANOVA, MANOVANominal or Numerical Nominal (2 or morevalues)Discriminant AnalysisNumerical No DependentVariableFactor and ClusterAnalysis
    17. 17. Assumptions in Linear Regression1) The relationship under analysis is linear.2) The values of the independent variable arefixed (not random variable).3) The values of the dependent variable arerandom.4) For each value of the independent variable thevalues of the dependent variable are normallydistributed.5) The variance of the dependent variable is thesame for all values of the independentvariable.
    18. 18. The Basic Regression ModelSimple Linear RegressionY = a + bXY = Dependent variablea = interceptb = slope/regression coefficient (change in Y with a one unit change inX)X = predictor valueMultiple Linear RegressionY = a + b1X1 + b2X2 +b3X3 + …
    19. 19. The Regression LineYXY=a+bXab
    20. 20. Prevalence of Alcohol Dependence byFrequency of Drinking 5 or More/Day01020304050600 2 4 6 8 10 12MenWomenFrequency of drinking 5 or more per dayFrom Caetano et al., 1997
    21. 21. Liver-relatedDeathsAlcohol Consumption(per 106people)USAFranceGermanyMexicoUSSRAustraliaGreat BritainCanadaJapan
    22. 22. Measurements of Total Lifetime Dose of Alcoholand Muscular Strength for 50 Alcoholic MenUrbano Marquez etal., 1989
    23. 23. Scatterplot of the Total Lifetime Dose of Alcoholvs. Muscular Strength for the Observations
    24. 24. The Least Squares Regression LineAdded to the ScatterplotRegression Equation: Y = 26.4 - .296X
    25. 25. Replicating Life’s Complex Relationships inData Analysis: Developing “The Model”• A set of theoretically and or evidencebased associations between variables is a“model”.• Data analysis tests the extent to which theproposed theoretical associations areobserved in the data.
    26. 26. Model Development• Existing knowledge in the health fieldindicates that most, if not all disease-related outcomes, have multiple andvaried risks. This includes gene-environment interactions.• Thus, in health research the associationsbeing tested usually involve a series ofindependent variables (“causes”) and adependent variable (disease).
    27. 27. Model Development• Model development is based on previousknowledge (previous research evidenceabout associations) and new andhypothetical ones.• It is mostly about deciding whichindependent variables (risk factors) shouldbe in the multivariate analysis.
    28. 28. Example of Multiple Risk Factors
    29. 29. Structural Equation Model of 1991 to 1996 SuicidalIdeation (Women Drinkers only)From Wilsnack et al., 2004
    30. 30. Path Analytic Model of Risk Factors that Predict SuicidalBehaviorsFrom Windle, 2004
    31. 31. Points to Remember• Most data analysis tries to answer complexquestions involving more than 2 variables• These questions can only be adequatelyaddressed by multivariate statistical techniques.• There are a variety of multivariate techniques allof which are based on assumptions about thenature of the data and the type of associationunder analysis.
    32. 32. Points to Remember• Multivariate statistical techniques testtheoretical models (research question)about associations against observed data.• These theoretical models are based onprevious knowledge and on newhypotheses about plausible associationsbetween variables.
    33. 33. Bibliography• R. D. De Veaux, P.F. Velleman “Intro doStats”, Pearson Education Inc., Boston,2004.• B. Dawson, R.G. Trapp “Basic and ClinicalBiostatistics”, 3rdedition, McGraw-Hill, NewYork, 2001.• S. Glantz “Primer of Biostatistics”, 5thEdition, McGraw-Hill, New York, 2002.