A Compact Guide to Biostatistics
1. A COMPACT GUIDE TO BIOSTATISTICS
Naira R. Matevosyan, MD, MSJ, PhD
Legal Clinic: nairarenault.wix.com/panther-law
Authored Books: nairarenault1.wix.com/nairamatevosyan
2. IN THIS ISSUE:
Main Strains in Biostatistics: Descriptive, Inferential, Euclid (3-7)
Data, Variables, Vectors, Valence (8-16)
Matching and Manipulation: Mediator and Moderator (17)
Mode Merits and Demerits (18)
Confounding by Indication: Severity, Protopathy, Selection (19-20)
Confounding by Indication and Contraindication (21)
Collider, Residual Confounding, Reverse Causality (22-23)
Prevalence, Incidence, Duration (24)
Reduction and Stratification (25-26)
Diagnostic Tests (S.N.N.O.U.T., S.P.P.I.N.), Predictive Value (27-29)
Reliability, Validity, Accuracy, Precision, Recall (30-32)
Stratum-Specific Hyper-prior Distributions (33)
Propensity Score, Matching, and Causal Pretzel (34-35)
Level of Evidence: Causal Description v. Causal Explanation (36)
Measuring Risk (37-38)
Types of Biases (39-40)
Review Questions and Answers (41-42)
3. MAIN STRAINS IN BIOSTATISTICS
Based on the degree of abstraction:
DESCRIPTIVE
Characterizes a sample or data set by actual measurements.
INFERENTIAL
Treats each replication in a condition as entirely independent, which creates countless challenges. Calculates a test-statistic value, degrees of freedom, or rejection criteria through a particular formula (based on the study design or its specifics) to determine whether or not there are differences between the treatment groups. Extrapolates and generalizes the outcomes to make predictions.
EUCLID
Assumes a set of intuitively appealing axioms to assess a sample through a two-, three-, four-, or n-dimensional geometrical canon (detaching time from space) for causality, casualty, and prediction.
4. SCENARIOS, EXAMPLES
● Hypothesis: Copper (Cu), Iron (Fe), Manganese (Mn), and Zinc (Zn) insufficiency contributes to suboptimal levels of luteinizing hormone (LH) in infertile women of reproductive age.
● Hypothetical Sample: Plasma levels of Cu, Fe, Mn, and Zn from 210 women (age 19-49 y) with primary infertility, drawn 36 hours prior to the proposed ovulation day over four consecutive menstrual cycles.
● Descriptive Statistics: Measures the means and standard deviations of the plasma trace elements, the covariance given by the diagonal and off-diagonal elements, and the correlations between each pair of variables. A correlation coefficient (r) larger than 0.7 indicates a strong association.
● Inferential Statistics: Stratifies the women by ethnicity (for variations in the duration of menstrual cycles), age (19-29, 39-40), BMI, preexisting medical conditions, pelvic inflammatory disease, tubal TB, uterine anomalies, thyroid disorders, anemia, etc. Models the stratified random samples for probability, measures the posterior kernel densities of the parameters, and runs predictive inferences in the fitted model.
● Euclid Statistics: Assesses the data through holomorphic operations, treating each domain as a complex-valued function of differentiable variables that are manifolds in a spatial unit where the tangent sits with the n-root of the differential expression. Data triangulation and inferences at the infinitesimal points help with predictions (see scholar.google.com for Matevosyan N.R., Articles, 2011-2021).
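To make the descriptive pass concrete, here is a minimal Python sketch, not part of the original slides, that computes the quantities the Descriptive Statistics bullet names. The data are simulated stand-ins for the 210 plasma samples; every value, and the LH relation, is fabricated purely for illustration.

```python
# Descriptive pass over hypothetical trace-element data: means, SDs,
# the covariance matrix (variances on the diagonal, covariances off it),
# and pairwise correlations flagged at |r| > 0.7.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Simulated stand-in for the 210 plasma measurements (no real data).
df = pd.DataFrame(
    rng.normal(loc=[15.0, 90.0, 1.2, 80.0], scale=[3.0, 20.0, 0.3, 15.0], size=(210, 4)),
    columns=["Cu", "Fe", "Mn", "Zn"],
)
df["LH"] = 5.0 + 0.02 * df["Zn"] + rng.normal(0, 1.0, 210)

print(df.mean())          # means
print(df.std(ddof=1))     # sample standard deviations
print(df.cov())           # diagonal = variances, off-diagonal = covariances
corr = df.corr()          # Pearson correlation matrix
strong = corr.where((corr.abs() > 0.7) & (corr.abs() < 1.0))
print(strong.dropna(how="all").dropna(axis=1, how="all"))  # pairs with |r| > 0.7
```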
5. SOME OF MY BOOKS ON THIS SUBJECT
● “Advanced Research in Comorbidity”
ISBN: 9781514787410; ISBN: 9781493553013
8. DATA v. VARIABLES
●
Data include two sets of values: variables (qualitative,
quantitative), and observational units from samples or populations.
●
Data and variables are not synonymous. Variables are the data
modeled (measured, manipulated, linked, controlled, correlated,
compared, indexed) into a function.
●
Independent (experimental) variable is manipulated and its effect
on the dependent (outcome) variable is measured. The role of a
variable depends on a study design: a dependent variable may
become independent, a process variable may become a predictor.
For example, by changing the temporal order between two
variables in causal inference (rates of abortion and unipolar
depression in the same community), abortion can be modeled as an
independent variable to measure the depression rate (dependent
variable) or vice versa. Put simply, depression can be viewed as the
outcome of abortion, and abortion can be viewed as the outcome of
depression.
9. TYPES OF VARIABLES INCLUDED IN THIS
PRESENTATION
Independent (experimental) variable
Dependent (outcome) variable
Categorical variable
Numerical variable
Continuous variable
Predictor
Process (mediator, intervening) variable
Moderator (affector)
Latent variable
Omitted variable
Symbolic variable
Hidden variable
Hypothetical variable.
10. VARIABLES: MODERATOR V. MEDIATOR
MEDIATOR explains the relationship between independent and
dependent variables (predictor and outcome). For example, in a
sedative pill trial on women with anxiety disorder, women's body-
mass index (BMI) is modeled as a process variable (mediator) that
shows the relation between the independent variable (drug dosage)
and the dependent variable (symptoms of anxiety). In the case of total mediation, the relationship between predictor and criterion is reduced to zero after controlling for the mediator-criterion relation.
MODERATOR is a third variable (in a zero-order correlation) on which the relationship between the other two depends. While the mediator explains the causal chain, the moderator affects the strength and direction of that chain. A mediator intervenes; a moderator interacts. Interaction can be categorical (qualitative) or quantitative. In the same sedative-pill study, the moderator is heavy caffeine intake, which worsens the anxiety in women and interacts with the study results. A moderator can also explain variations between studies.
13. MODERATOR V. CONFOUNDER
●
Confounder distorts the association between the
predictor and the outcome.
●
Moderator differentiates the association between the
predictor and the outcome.
●
Mediator explains the association between the predictor
and the outcome.
We typically “adjust” for confounders and “report” the different effects seen across effect modifiers.
14. LATENT VARIABLES
Latent variables are inferred via mathematical models from other
variables that are observed (actually measured). Latent variables (LV)
are used in psychology, economics, medicine, artificial intelligence,
bioinformatics, speech science, management, or social sciences.
Examples are quality of life, confidence, morale, happiness, or liberty -
concepts that cannot be measured directly.
Sometimes, LV may correspond to aspects of physical reality that could in principle be measured but are not for practical reasons; these are “hidden variables”. LV may also correspond to abstract concepts (categories, behavioral clusters) and be modeled as “hypothetical variables”.
An advantage of using LV is that they reduce data dimensionality (valence). Representing a “shared variance” (the degree to which variables “move together”), LV link observable (real) data to symbolic (modeled) data. Variables that have no correlation cannot yield a latent construct under the common factor model. (3)
(3) Tabachnick, B.G., Fidell, L.S. (2001). Using Multivariate Statistics. Boston: Allyn and Bacon.
15. OMITTED VARIABLES
Omitted variables are values that can be both cause and result, i.e., independent and dependent variables in the same model. For example, anxiety can be both the cause and the result of unemployment; abortion can be both the cause and the result of depression.
Omitted variable bias (OVB) occurs when a model is created by
incorrectly leaving out one or more important causal factors, or
compensating for the missing factor by underestimating one of the
other important factors.
Two conditions must hold true for OVB to exist in linear regression:
the omitted variable must be:
●
a determinant of the dependent variable (its true regression coefficient is not equal to zero), and
●
correlated with one or more of the included independent variables
(the covariance of the omitted variable and the independent
variable is not equal to zero).
16. VALENCE, VECTORS
●
Valence is the dimensionality of the data which can be reduced by the
latent variables (see slide 14).
●
Vectors of values are implied in regression models toward the matrix. An example: in the modeled formula

yi = xi β + zi δ + ui,  i = 1, ..., n

xi is a 1 × p row vector of values of p independent variables observed at time i or for the ith study participant; β is a p × 1 column vector of unobservable parameters to be estimated; zi is a scalar, the value of another independent variable observed at time i or for the ith study participant; δ is a scalar, an unobservable parameter (the response coefficient of the dependent variable to zi); ui is an unobservable error occurring at time i or for the ith study participant, an unobserved realization of a random variable having expected value 0 (conditionally on xi and zi); yi is the observation of the dependent variable at time i or for the ith study participant. If zi is omitted from the regression, the estimated values of the response parameters are given by the usual least squares, β̂ = (X′X)⁻¹X′Y, where the “prime” notation denotes the matrix transpose and the −1 superscript matrix inversion. Substituting for Y based on the assumed linear model shows that the omitted-variable bias is non-zero if z is correlated with any variable in the matrix.
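To make this concrete, here is a minimal simulation sketch in Python (all coefficients and the sample size are illustrative assumptions): a regressor x is correlated with an omitted determinant z of y, so the short regression's estimate of β absorbs part of δ.

import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# z is the omitted variable: (a) a determinant of y and (b) correlated with x
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)             # Cov(x, z) != 0
y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # true beta = 2.0, delta = 1.5

# Full model: beta estimated by least squares with both x and z included
beta_full = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)[0]

# Short model: z omitted, so part of delta's effect loads onto x
beta_short = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0]

print(f"beta with z included: {beta_full[0]:.2f}")   # close to 2.0
print(f"beta with z omitted:  {beta_short[0]:.2f}")  # biased upward (~2.7)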
17. MATCHING & MANIPULATION: MEDIATION V. MODERATION
●
Matching is used to reduce bias when evaluating the effect of a treatment by comparing treated and non-treated units in an observational study or quasi-experiment (without random assignment).
●
Experiments explore the effects of things, events, or behaviors that can be manipulated (dose of a medicine, salary, treatment modality). It is harder to measure non-manipulable causes (raw genetic material, age, gender). Those are assessed indirectly in non-experimental studies, using whatever means are available or fit. Finding manipulable agents helps ameliorate the problem. For example, phenylketonuria (PKU) treatment wasn't discovered by first trying different diets in children with intellectual disability. Initially, non-manipulable variables were used to find the increased levels of phenylalanine in those kids. Such findings informed the scientific directions leading to diets with varying degrees of phenylalanine restriction. Some were experimental, others were not.
●
Analogue experiments can be used on non-manipulable causes by manipulating an agent that is similar to the cause of interest. We cannot change a person's race, but we can chemically alter the skin pigmentation. Further, past events (which usually are non-manipulable) may constitute a natural experiment that once was even randomized. Stronger solutions to causality can be achieved through mediators.
●
Mediator v. Moderator: See details on slides 10-13.
18. MODE MERITS & DEMERITS
●
Mode: Of the mean, median, and mode of a series, the mode is the most frequent value. It can't be determined from a series of individual observations unless the series is converted into a discrete or continuous series. In a discrete series, the value of the variable with the largest frequency is the modal value. In a continuous series, the mode is measured by

Mode = i1 + [Δ1 / (Δ1 + Δ2)] × i

where i is the class interval, i1 is the lower limit of the modal class, Δ1 is the difference in frequency between the modal class and the preceding class, and Δ2 is the difference in frequency between the modal class and the post-modal class.
●
Mode Merits: The mode is not affected by the values of extreme items, and its determination does not require every value in the series.
●
Mode Demerits: The mode is incapable of further mathematical treatment. Because it is not based on all observations of a series, it is not rigidly defined. It may also be unrepresentative, since it may not have a definite value: a set of observations can contain two, three, or more modal values (see the sketch below).
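A minimal Python sketch of the grouped-data mode formula above, with hypothetical frequencies:

def grouped_mode(i1, i, f_modal, f_pre, f_post):
    """Mode = i1 + [d1/(d1 + d2)] * i, where d1 and d2 are the frequency
    differences with the preceding and post-modal classes."""
    d1 = f_modal - f_pre
    d2 = f_modal - f_post
    return i1 + d1 / (d1 + d2) * i

# Hypothetical continuous series: modal class 20-30 (frequency 24),
# preceding class frequency 15, post-modal class frequency 18
print(grouped_mode(i1=20, i=10, f_modal=24, f_pre=15, f_post=18))  # -> 26.0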
19. Confounding by Indication
●
As noted in my books, epidemiology is about mastering the
concept of confounding. Yet, confounding is not always the “elixir” of
causality.
●
A confounding variable (hidden or lurking variable) extraneously
correlates (directly or inversely) with both dependent and
independent variables. A perceived relationship between independent
and dependent variables that has been misestimated due to the
failure to adjust for confounders is termed a spurious relationship,
and the misestimation is known as an omitted variable bias.
●
How do we prove confounding? We compute the degree of association between the independent and dependent variables before and after adjusting for a possible confounder. If the difference between the two degrees of association is >10%, confounding is present and the effect is modified.
●
Confounding by indication is when a variable itself is a risk factor
(in the non-exposed control group) associated with the exposure of
interest – without being an intermediate step in the causal pathway.
20. Types of Confounding by Indication
Confounding by Indication (CBI) is typical of observational, pharmaco-epidemiologic studies in which the exposure is associated with the outcome, and the outcome is caused by the indication for which the exposure was used, or by another factor associated with that indication. Confusion about CBI mostly stems from three different situations:
(a) CBI as a protopathic bias
(b) CBI by severity
(c) CBI as a form of selection bias.
CBI matters when the severity (or stage) of a disease, or the degree of exposure to an agent, acts as an independent variable at the random intercept for each confounder. The degree of confounding depends on the prevalence of the putative confounding factors and their levels of association with the disease and the exposure. Where the disease responsible for the indication acts as a categorical confounder irrespective of symptom severity, CBI is due to protopathic or selection biases.
Solution: Including a range of different indications for the same exposure allows the exposure-outcome relationship to be triangulated, with each individual indication analyzed separately.
21. Confounding by Indication & Contraindication
Confounding by contraindication (CBCI) is a rarer bias and concerns non-experimental (observational) studies that examine predictable side effects.
Hypothesis: Hypochromic anemia is a side effect of SSRI/SNRI antidepressants. SSRI/SNRI intake during pregnancy contributes to intrauterine growth retardation (IUGR).
CBI Scenario: In women with depression and singleton pregnancies, antidepressants are modeled as independent variables, IUGR as the outcome, and anemia as a confounder.
CBCI Scenario: The SSRI/SNRI-IUGR relationship is distorted because the index group of SSRI/SNRI users will exclude women with prior IUGR. Ignoring the CBCI will result in a reference group of SSRI/SNRI non-users having falsely “higher rates” of IUGR. The CBCI bias can be addressed by excluding multigravida women.
22. Confounding v. Colliding
●
Confounding is when exposure and outcome have a
shared common cause that is not controlled by design.
●
Collider bias occurs when exposure and outcome (or
factors causing these) each concurrently influence a
common third variable and that variable (or collider) is
controlled by design.
23. Residual Confounding, Reverse Causality
Residual confounding is the distortion that remains after controlling for confounding in a study design or analysis. There are three reasons for residual confounding: (1) no effort is made to consider, collect, and adjust for additional factors; (2) there are errors in grouping the subjects for a confounder analysis; (3) control of confounding is not rigorous enough. For example, in a randomized trial on women with osteoporosis (where age is a confounder), the sample size is too small and the confounding variable is imprecise (while matching or stratifying age groups, the age distinction is not scored but scaled as “younger,” “young,” “old,” “older”), resulting in residual confounding.
Reverse causality occurs when the outcome influences the exposure being studied, rather than the other way around. Put simply, you may think that X causes Y, while in reality Y causes X. For example, it is hard to prove whether a miscarriage resulted from depression or the depression resulted from a miscarriage. To prevent confusion, the nine (Bradford Hill) criteria must be followed: (1) strength of the association, (2) consistency of findings, (3) specificity, (4) temporal order, (5) exposure gradient, (6) plausible mechanisms, (7) coherence between the observational, epidemiological, and lab data, (8) experimental evidence, (9) analogy.
24. Prevalence, Incidence, Duration
●
Prevalence: A cross-sectional measure of the total number of people in a population (or subjects in a sample) affected by a condition at one point in time. Cannot be used in a prediction analysis (excluding meta-studies with logistic prediction, stochastic compartmental modeling, or the Euclid infinitesimal manifolds of Matevosyan 2013, 2015, 2021).
●
Incidence: A longitudinal measure showing the number of new cases of a disease (or an event) in a population over a specific period of time. Can be used in a prediction analysis (examples are the S.I.R. model (4) or the reinfection-proportion model (5)).
●
Duration: Relates incidence to prevalence (see the sketch after the references below). For example, upper respiratory infections (URI) have a high (seasonal) incidence but a low prevalence because most URI resolve quickly. Multiple sclerosis (MS) has a relatively low incidence but a high prevalence because the disease is lifelong.
(4) Kermack, W.O., McKendrick, A.G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London; 115(772): 700-721.
(5) Wang, J.Y., Lee, L.N., Lai, H.C., et al. (2007). Prediction of the tuberculosis reinfection proportion from the local incidence. The Journal of Infectious Diseases; 196(2): 281-288.
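As a rough illustration of how duration links the two measures, the steady-state approximation Prevalence ≈ Incidence × Mean Duration can be computed directly; the incidence and duration figures below are illustrative assumptions, not measured values.

# Steady-state approximation for point prevalence: P ~ incidence x mean duration
uri_incidence = 2.0            # assumed ~2 URI episodes per person-year
uri_duration = 10 / 365        # assumed ~10-day episode, expressed in years
ms_incidence = 7 / 100_000     # assumed ~7 new MS cases per 100,000 person-years
ms_duration = 35               # assumed ~35-year (lifelong) course

print(f"URI prevalence ~ {uri_incidence * uri_duration:.3f}")  # high incidence, low prevalence
print(f"MS prevalence  ~ {ms_incidence * ms_duration:.5f}")    # low incidence, higher prevalence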
25. Reduction
Reduction is the transformation of numerical data (empirical, trial, lab,
digital) into a corrected and simplified form for three reasons: (1) to reduce
the number of data records by eliminating invalid or dubious data, (2) to
produce summary or aggregate data for various applications, (3) to reduce
the occurrence and effect of confounders by comparative analysis.
Depending on a study design, reduction controls confounders differently:
●
Cross-section - assigns confounders to both (clinical, control) groups equally.
●
Cohort - creates (via over-exclusion) comparable cohorts with similar
features for possible confounders (age, gender, income, menarche, BMI, etc)
●
Double-blind - conceals the experiment-group membership. By preventing the participants from knowing whether they are receiving the treatment, the placebo effect should be the same for the control and treatment groups. By preventing the observers from knowing the group membership, there should be no treatment or interpretation bias by the researchers.
●
Randomized - the study sample (or population) is divided randomly in order to mitigate the chances of self-selection (by participants) or bias (by researchers). Prior to the trial, a random number generator is used to assign participants to the intended groups (control, intervention, parallel).
26. Stratification
Stratification is about dividing the population into distinct groups or subsets (strata) within each independent sample.
●
For example, protected sex may prevent prostate cancer and in
this equation, age is assumed to be a confounder. Therefore,
the sampled data are stratified by age groups to analyze the
degree of association between safe sex practices and prostate
cancer. If different age groups (strata) yield substantially
diverse risk ratios, age must be viewed as a confounding
variable.
There are statistical tools, among them the Mantel-Haenszel estimators, that control confounding effects by measuring the known confounders and including them as covariates in multivariate analyses. However, multivariate analyses reveal much less information about the strength of the confounding effects than stratification methods do (see the sketch below).
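A minimal Python sketch of stratification with hypothetical counts: exposure and risk both vary with age, so the crude risk ratio is inflated, while the stratum-specific and Mantel-Haenszel risk ratios control for age.

# Hypothetical stratum tables: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "19-44": (5, 400, 10, 800),
    "45-70": (60, 600, 20, 200),
}

# Crude (pooled) risk ratio mixes the age effect into the exposure effect
ca = sum(v[0] for v in strata.values()); cn1 = sum(v[1] for v in strata.values())
cc = sum(v[2] for v in strata.values()); cn0 = sum(v[3] for v in strata.values())
print(f"crude RR: {(ca / cn1) / (cc / cn0):.2f}")   # ~2.17, spuriously elevated

# Stratum-specific risk ratios differ substantially from the crude RR,
# so age must be viewed as a confounding variable
for name, (a, n1, c, n0) in strata.items():
    print(f"RR in stratum {name}: {(a / n1) / (c / n0):.2f}")   # 1.00 in each stratum

# Mantel-Haenszel summary RR pools the strata while controlling for age
num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
print(f"Mantel-Haenszel RR: {num / den:.2f}")   # ~1.00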
27. Diagnostic Tests
●
True positive (Tp): Disease is present and diagnostic test is positive
(a correct result).
●
True negative (Tn): Disease is absent and diagnostic test is negative
(a correct result).
●
False positive (Fp): Disease is absent and diagnostic test is positive
(an incorrect result).
●
False negative (Fn): Disease is present and diagnostic test is
negative (an incorrect result). It is also known as type-2 error.
●
PREVALENCE: The number of affected persons out of the total sample (or population) = (Tp + Fn)/(Tp + Tn + Fp + Fn)
●
SENSITIVITY: Assuming the disease is present, the probability that the test will be positive: Tp/(Tp + Fn). Used in imaging or screening tests that have few false negatives. A highly sensitive test rules out the disease: SNNOUT (sensitive, negative result rules out a disease).
●
SPECIFICITY: Assuming the disease is absent, the probability that the test will be negative: Tn/(Tn + Fp). Used in confirming clinical diagnoses, as there are few false positives. A highly specific test rules in the disease: SPPIN (specific, positive result rules in a disease).
28. Sensitivity v. Specificity (continued)
There is a tradeoff between sensitivity and specificity. Changing the cutoff value for serum psychosine (to < 15 ng/mL) will change the test's ability to detect the affected newborns with Krabbe disease. Likewise, if the serum copper cutoff for diagnosing Wilson's disease were moved from 20 μg/dL to 15 μg/dL, the test would be very specific, because any child with a Cu level of 15 μg/dL would almost certainly have Wilson's disease (with very few Fp results). However, the results would be insensitive, because patients with serum Cu readings above 15 μg/dL would have Fn results (when the normal is > 20 μg/dL).
29. Predictive Value
●
A reminder: Sensitivity = Tp/(Tp + Fn); Specificity = Tn/(Tn +Fp).
●
Positive Predictive Value (PPV): Given the test is positive, it is the probability that the disease is present.
PPV = Tp/(Tp + Fp)
If MRI of a spinal cord tumor has a 95% PPV, then based on positive findings the patient will truly have the tumor 95% of the time.
●
Negative Predictive Value (NPV): Given the test is negative, it is the probability that the disease is absent.
NPV = Tn/(Tn + Fn)
If an Epstein-Barr Virus (EBV) test has a 99% NPV, then given a negative test the patient will truly be EBV-negative 99% of the time.
●
Note: PPV and NPV vary depending on disease prevalence in a population. Yet sensitivity will not be affected, because the Tp/(total number of people with disease) ratio will not change for a given test; the Tp/(total number of positive tests) ratio will vary, because an area with a higher prevalence will have a higher number of positive tests (see the sketch below).
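The following minimal Python sketch collects the formulas from slides 27-29, using hypothetical counts; note how the PPV stays well below the specificity when prevalence is low.

def diagnostic_metrics(tp, tn, fp, fn):
    """2 x 2 diagnostic-test metrics, as defined on slides 27-29."""
    total = tp + tn + fp + fn
    return {
        "prevalence":  (tp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV":         tp / (tp + fp),
        "NPV":         tn / (tn + fn),
    }

# Hypothetical screening results in 1,000 subjects (prevalence 10%)
for name, value in diagnostic_metrics(tp=90, tn=850, fp=50, fn=10).items():
    print(f"{name}: {value:.3f}")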
30. Reliability, Validity, Accuracy, Precision
●
RELIABILITY – the measure of consistency of a test; the likelihood that
upon repetition the test will deliver the same results in the same
situation.
●
VALIDITY – the ability of a test to measure what it intends to measure.
●
A test may be reliable but not valid. It may reliably measure the serum level of selenium; yet this doesn't inherently mean that the reliably measured level of selenium is a valid predictor of Graves' disease.
●
ACCURACY - is analogous to validity, relates to constant error, and
measures a test's ability to obtain true results. In a binary case,
Ac = (Tp +Tn)/(Tp +Tn + Fp+ Fn).
●
PRECISION - is analogous to reliability, relates to variable error, and
measures a test's ability to replicate results. In a binary model,
Pr = Tp/(Tp +Fp).
●
Accuracy is the degree of closeness to the true value. Precision is the
degree to which repeated measurements under unchanged conditions
show the same results. The precision value lies between 0 and 1.
31. Precision & Recall
●
A measurement system is considered valid if it is both
accurate and precise.
●
RECALL – Measures a test's accuracy in a binary model, i.e., out of the total positives, what percentage is predicted positive? It is the same as the TPR (true positive rate):
R = Tp/(Tp + Fn)
●
F1 SCORE: The harmonic mean of precision and recall, which takes both false positives (Fp) and false negatives (Fn) into account. It performs well on imbalanced datasets by giving the same weight to recall (Rc) and precision (Pr):
F1 score = 2/(1/Pr + 1/Rc) = 2(Pr × Rc)/(Pr + Rc) = Tp/(Tp + (Fp + Fn)/2).
●
Different problems give different weights to recall or precision. The weighted F score expresses this:
Fβ = (1 + β²) × (Pr × Rc)/((β² × Pr) + Rc), where β represents the number of times recall is more important than precision (see the sketch below).
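A minimal Python sketch of the precision, recall, and F-score formulas above, with hypothetical confusion-matrix counts:

def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    pr = tp / (tp + fp)
    rc = tp / (tp + fn)
    return (1 + beta**2) * pr * rc / (beta**2 * pr + rc)

tp, fp, fn = 80, 20, 40   # hypothetical counts
print(f"precision: {tp / (tp + fp):.2f}")              # 0.80
print(f"recall:    {tp / (tp + fn):.2f}")              # 0.67
print(f"F1:        {f_beta(tp, fp, fn):.2f}")          # harmonic mean, 0.73
print(f"F2:        {f_beta(tp, fp, fn, beta=2):.2f}")  # recall-weighted, 0.69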
33. Stratum Specific Hyper-prior Distributions
●
Problem Definition: Bias models in biostatistics are often used for a sensitivity analysis in which bias is a function (although occasionally it becomes part of a Bayesian analysis). Conventional analysis of observational data looks like a stratum-specific process that quantifies only random errors, leaving scholars to rely on informal judgments as to the bias effects.
●
Conventional Solutions: Assessment of uncertainty is an essential part of inference and requires a model with parameters that measure departures from the dubious assumptions. The most notable models are the confidence profile method, which incorporates bias models into the likelihood function, and Monte Carlo sensitivity analysis (MCSA), which samples bias parameters and then inverts the bias model to provide a distribution of ‘bias-corrected’ estimates (see the sketch below).
●
Hyperprior Approximation: Bayesian and MCSA outputs depend entirely on the prior distributions p(η), which reintroduce the problem of basic sensitivity analysis. Given the limitless possibilities for p(η), a thorough sensitivity analysis would only illustrate how various conclusions can be reached. A conclusion about the target would require constraints on p(η). These limits would constitute a subjective prior on priors (a hyperprior); incorporating them into the analysis would produce a subjective average of results over the hyperprior. This result would itself be subject to concerns about sensitivity to the hyperprior, continuing into an infinite regress, which is impractical. Still, there is nothing spurious about the quantification if the hyperprior approximates the views of the analyst, as the output then gives the analyst an idea of what his or her posterior bets about the value of the target should be.
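As a concrete illustration of the MCSA step described above, here is a minimal Python sketch: it samples the bias parameters of a hypothetical unmeasured binary confounder from assumed priors p(η), inverts a standard external-adjustment bias model, and returns a distribution of ‘bias-corrected’ risk ratios. All numbers and prior ranges are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(42)
rr_obs = 1.8        # hypothetical observed (confounded) risk ratio
n_draws = 10_000

# Assumed priors p(eta) on the bias parameters of an unmeasured binary confounder
p1 = rng.uniform(0.3, 0.6, n_draws)      # confounder prevalence among the exposed
p0 = rng.uniform(0.1, 0.3, n_draws)      # confounder prevalence among the unexposed
rr_cd = rng.uniform(1.5, 3.0, n_draws)   # confounder-disease risk ratio

# Invert the bias model: divide the observed RR by the sampled bias factor
bias = (1 + (rr_cd - 1) * p1) / (1 + (rr_cd - 1) * p0)
rr_corrected = rr_obs / bias

print(f"median bias-corrected RR: {np.median(rr_corrected):.2f}")
print(f"95% simulation interval: {np.percentile(rr_corrected, [2.5, 97.5]).round(2)}")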
34. Propensity Score
Propensity score (PS) is the probability of treatment assignment conditional on observed baseline characteristics. It allows one to design and analyze an observational (non-randomized) study so that it mimics some of the characteristics of a randomized controlled trial. It is a balancing score: conditional on it, the distribution of observed baseline covariates is similar between treated and untreated subjects.
ei = Pr(Zi = 1 | Xi)
where ei is the PS, Zi denotes the binary treatment condition (Zi = 1 if patient i is in the treatment group, Zi = 0 if patient i is in the control group), Pr is the conditional probability of treatment, and Xi is the vector of covariates. There are four different applications of the PS (a sketch of the first follows the list below):
●
matching on the propensity score
●
stratification on the propensity score
●
inverse probability of treatment weighting using the propensity score
●
covariate adjustment using the propensity score.
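A minimal Python sketch (using simulated covariates and assumed coefficients) of estimating ei = Pr(Zi = 1 | Xi) by logistic regression and then matching on the score with greedy 1:1 nearest-neighbor pairing, one of the methods listed on the next slide:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
age = rng.normal(50, 10, n)
bmi = rng.normal(27, 4, n)
X = np.column_stack([age, bmi])

# Treatment assignment depends on the baseline covariates (confounded by design)
logit = -12 + 0.12 * age + 0.15 * bmi
z = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Estimate the propensity score e_i = Pr(Z_i = 1 | X_i)
ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]

# Greedy 1:1 nearest-neighbor matching on the score, without replacement
controls = list(np.where(z == 0)[0])
pairs = []
for t in np.where(z == 1)[0]:
    nearest = min(controls, key=lambda c: abs(ps[c] - ps[t]))
    pairs.append((t, nearest))
    controls.remove(nearest)
print(f"{len(pairs)} matched pairs formed")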
35. Matching & Causal Pretzel
There are several methods of forming matched pairs of treated and comparison subjects on the propensity score:
(1) Matching with and without replacement
(2) Greedy matching - where the first treated subject is selected randomly
(3) Caliper matching - using a proportion of the standard deviation of the logit of the propensity score.
Causal Pretzel: Experiments test the influence of descriptive causes or INUS conditions. They do not completely explain a phenomenon; rather, they aim to identify whether a variable (or a set of variables) makes a marginal difference in an outcome among the other factors affecting that outcome. Many costly scientific studies (including randomized trials) do not necessarily bring home results. In part to limit the cost of contingencies, researchers undergo extensive training to be able to make smart inclusions and matching. Even then, substantial judgment is still required, as the exact choice may depend on the diagnosis, lab results, insurance resources, and ethics constraints, and the cost of such arrangements still remains high. In this respect, meta-studies are a great asset for measuring moderators (that once were experimental) for propensity scores, and for further testing invariance or reduction. This framework resembles a pretzel.
36. Level of Evidence:
Causal Description v. Causal Explanation
The level of evidence outlined by Sackett (2000) [5]:
1A = Systematic review of randomized controlled trials (RCT)
1B = RCT with narrow confidence interval
1C = All or none case series
2A = Systematic review of cohort studies
2B = Cohort study
2C = Outcomes research
3A = Systematic review of case-controlled studies
3B = Case-controlled study
4 = Case series, poor cohort or case-control study
5 = Expert opinion.
(5) Sackett, D.L., Straus, S.E., Richardson, W.S., et al. (2000). Evidence-Based Medicine: How to Practice and Teach EBM. Philadelphia (PA): Churchill Livingstone.
The strength of an experiment or observation is in describing outcomes attributable to varying treatments (causal description). Yet trials or observations do less when clarifying causal chains or confounding (causal explanation). Meta-studies pool the results of similar studies to increase statistical power (rejecting Fn). This depends on how a meta-study preserves or changes the provided causal description into causal explanation.
37. Measuring Risk
EXPOSURE: From the 2 × 2 table, the probability that the event will occur in the exposed group is given by risk in exposed = a/(a+b), and in the unexposed (control) group it is given by risk in unexposed = c/(c+d).
RISK DIFFERENCE (RD) = risk in exposed - risk in unexposed,
or vice versa. There are several ways to express RD.
– Absolute Risk Reduction (ARR): The reduction of incidence
associated with treatment. ARR = risk in the control group –
risk in the treatment group.
– Attributable Risk (AR): Increase in disease incidence associated
with an exposure. AR = risk in exposed - risk in unexposed.
– Number Needed to Treat (NNT): Number of patients required
to receive an intervention before an adverse outcome is
prevented. NNT = 1/ARR.
– Number Needed to Harm (NNH): Used for interventions or
exposures that may be detrimental. NNH = 1/AR.
38. Measuring Risk (continued)
●
Relative Risk or Risk Ratio (RR): The ratio of incidence in two groups. RR = risk in exposed / risk in unexposed. RR > 1 indicates harm, RR < 1 indicates a protective (treatment) effect, and RR = 1 indicates a null effect.
●
Relative Risk Reduction (RRR): The percentage of a disease
prevented by treatment. RRR = (risk in unexposed – risk in
exposed)/risk in unexposed = ARR/baseline risk.
●
Excess Relative Risk (ERR): For a harmful exposure, ERR = (risk in exposed – risk in unexposed)/risk in unexposed.
●
Odds: The ratio of the probability of an outcome to the probability of not having the outcome. Odds = p/(1-p).
●
Odds Ratio (OR): The odds of an event in the exposed group divided by the odds of the event in an unexposed group. In a 2 × 2 table, OR = (a/b)/(c/d) = ad/bc. In case-control studies, OR is used instead of RR because RR can't be calculated from the study data owing to the purposeful oversampling of cases in the study design. OR approximates RR if the outcome is rare (a worked sketch follows below).
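The following minimal Python sketch computes the measures from slides 37-38 out of a 2 × 2 table; fed the counts from Review Question 1 (slide 41), it reproduces the values given in the Review Answers.

def risk_measures(a, b, c, d):
    """a/b = exposed with/without the outcome; c/d = unexposed with/without."""
    risk_exp, risk_unexp = a / (a + b), c / (c + d)
    arr = risk_unexp - risk_exp          # absolute risk reduction (treatment framing)
    return {
        "risk in exposed":   risk_exp,
        "risk in unexposed": risk_unexp,
        "ARR":               arr,
        "NNT":               1 / arr,
        "RR":                risk_exp / risk_unexp,
        "RRR":               arr / risk_unexp,
        "OR":                (a * d) / (b * c),
    }

# Review Question 1: 10 of 80 treated vs 25 of 80 controls developed hematuria
for name, value in risk_measures(a=10, b=70, c=25, d=55).items():
    print(f"{name}: {value:.3f}")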
39. Types of Biases
●
Confounding: A third variable relates to both exposure and
outcome and distorts the association of interest. Solution: Matching.
●
Selection Bias: Non-random assignment yields dissimilar baseline groups.
– Sampling (Ascertainment) Bias: The sample doesn't accurately
represent the population of interest. These studies have internal
validity but lack external validity (generalization). Solution:
Random sampling.
– Susceptibility Bias: Sicker patients are selected for more
invasive treatment. Solution: Randomization.
– Attrition Bias: If loss to follow-up is uneven between the groups, it can make an intervention group seem more effective than it is. Solution: Gathering as much data as possible from dropouts.
●
Measurement Bias (Hawthorne Effect): During the study,
participants change their behaviors. Solution: Placebo group.
●
Recall Bias: The memory of exposure may be affected by the patient's knowledge of the current disorder. Solution: Prospective study, or data triangulation with confirmatory and objective sources.
40. Types of Biases (continued)
●
Lead-time Bias: Early detection of a disease may be misinterpreted as
improving survival. Solution: Adjusting survival rates according to the
severity of disease, not from the detection date.
●
Late-look Bias: Data are collected too late for useful conclusions
because subjects with terminal diseases are either dead or incapable of
timely responding. Solution: Stratify by severity.
●
Omission Bias: Removing or omitting certain variables renders the model unfit for regression analysis. Solution: Reiterative truncated projected least squares (BP-RTPLS).
●
Procedural Bias: Subjects are treated differently depending on the
arm of the study. Solution: Double-blind study.
●
Experimenter Expectancy Bias (Pygmalion Effect): The
researchers' ambitions influence the outcome of the study. Solution:
Double-blind study will prevent researchers and subjects from knowing
to which arm of the study the subjects are assigned.
●
Funding (Sponsorship) Bias: The tendency to skew study results to
support the sponsor's goal or mission. Solution: Independent audit.
41. Review Questions
1) A randomized controlled trial studied the benefits of a new Lupus Nephritis medication. Of 80 subjects on the medication, only 10 had hematuria. Twenty-five (25) of the 80 participants in the control group developed hematuria. Make a 2 × 2 table to calculate the incidence in the exposed and unexposed groups, as well as the ARR, NNT, RR, and RRR for the medication.
2) A case-control study examines risk factors for oral cancer. Sixteen (16) subjects with oral cancer are sampled as cases and 16 participants are selected as controls. Ten subjects with oral cancer are heavy smokers, and four without oral cancer smoke too. Construct a 2 × 2 table to calculate the odds ratio (OR). Given the data above, can we compute the prevalence of oral cancer? Why should we calculate the OR and not the RR?
42. Review Answers
1) Risk in exposed = A/(A + B) = 10/80 = 0.125 = 12.5%
Risk in unexposed = C/(C + D) = 25/80 = 0.313 = 31.3%
ARR = 31.3% - 12.5% = 18.8%
NNT = 1/ARR = 1/0.188 = 5.3
RR = Risk in exposed/Risk in unexposed = 12.5%/31.3% = 0.4 = 40%
RRR = (Risk in unexposed – Risk in exposed)/Risk in unexposed = (31.3% - 12.5%)/31.3% = 0.6 = 60%
2) OR = (A/B)/(C/D) = AD/BC = 120/24 = 5. The odds of having oral cancer in smokers are 5 times those of non-smokers. We can't calculate prevalence, as we sampled two equal-size groups (cases, controls). In a case-control study like this, the OR is measured, not the RR.