- A study analyzed factors that can cause babies to be small for gestational age (SGA), including mothers' body mass index (BMI).
- The document discusses computing BMI from height and weight data, classifying BMI into underweight, normal, and overweight categories, and performing statistical tests to analyze associations between these factors and birthweight and SGA outcomes.
- Statistical tests discussed include chi-square tests, t-tests, ANOVA, and linear regression to identify relationships between maternal BMI, weight classification, and baby's birthweight and risk of SGA.
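The BMI computation and classification described above can be sketched in a few lines of Python. The cutoffs (18.5 and 25) are the conventional WHO-style thresholds, assumed here for illustration rather than taken from the study itself:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_category(b):
    """Classify a BMI value using conventional WHO-style cutoffs (an assumption here)."""
    if b < 18.5:
        return "underweight"
    elif b < 25:
        return "normal"
    else:
        return "overweight"
```

For example, a mother weighing 70 kg at 1.75 m has a BMI of about 22.9 and falls in the "normal" category.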
This document provides an introduction to using R for data science and analytics. It discusses what R is, how to install R and RStudio, statistical software options, and how R can be used with other tools like Tableau, Qlik, and SAS. Examples are given of how R is used in government, telecom, insurance, finance, pharma, and by companies like ANZ bank, Bank of America, Facebook, and the Consumer Financial Protection Bureau. Key statistical concepts are also refreshed.
Factor analysis is a statistical technique used to reduce a large set of variables into a smaller set of underlying factors or dimensions. It examines the interrelationships among variables to define common dimensions called factors that can help explain correlations. Factor analysis is used to identify the underlying structure in a data set and reduce many variables into a smaller number of factors for subsequent analysis like regression or discriminant analysis.
This document provides an introduction to R, including what R is, how it compares to other statistical software packages, its advantages and disadvantages, how to install R, and options for R editors and graphical user interfaces (GUIs). It discusses R as a language for statistical computing and graphics, compares it to packages like SAS, Stata, and SPSS in terms of cost, usage mode, and prevalence. It outlines some of R's advantages like being free and open-source software with an active user community contributing packages, and some disadvantages like the learning curve and lack of a standard GUI.
The document discusses mixed models, which contain both fixed and random effects. Fixed effects have all possible levels included in the study, while random effects are a random sample from the total population. The mixed model is represented as Y = Xβ + Zγ + ε, where β are fixed effects, X are fixed effect variables, Z are random effects, γ are random effect parameters, and ε is the error term. Mixed models can model both fixed and random effects, account for correlation in errors, and handle missing data. They provide correct standard errors compared to general linear models (GLMs). Model fitting involves likelihood ratio tests and information criteria to select the best fitting model.
Cronbach's alpha is a measure of internal consistency, which is used to determine if the items in a survey or questionnaire reliably measure the same concept. It ranges from 0 to 1, with higher numbers indicating greater reliability. An acceptable alpha is between 0.7-0.95. Cronbach's alpha measures how well items correlate with each other and with the total test. It is reported along with the mean to indicate the reliability of a scale. The reliability statistics table in SPSS shows the actual alpha value and whether removing any items would increase or decrease the value.
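The formula behind Cronbach's alpha can be illustrated with a minimal, pure-Python sketch; the item scores in the example are hypothetical:

```python
def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
    items: one list of scores per questionnaire item, all of equal length."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per respondent across all items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))
```

With two perfectly correlated items, `cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])` returns about 0.89, inside the acceptable 0.7-0.95 band mentioned above.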
This document provides an introduction and overview of SPSS (Statistical Package for the Social Sciences). It discusses what SPSS is, the research process it supports, how questionnaires are translated into SPSS, different question and response formats, and levels of measurement. It also briefly outlines some of SPSS's data editing, analysis, and output features.
SPSS (Statistical Package for the Social Sciences) is software used for data analysis. It can process questionnaires, report data in tables and graphs, and analyze means, chi-squares, regression, and more. Originally its own company, SPSS is now owned by IBM and integrated into their software portfolio. The document provides an overview of using SPSS, including entering data from questionnaires, different question/response formats, and descriptive statistical analysis functions in SPSS like frequencies, cross-tabs, and graphs.

This document provides an overview of using SPSS (Statistical Package for the Social Sciences) software. It introduces the main interfaces for working with data in SPSS, including the data view, variable view, output view, draft view, and syntax view. It also provides instructions for installing sample data files and demonstrates how to generate a basic cross-tabulation output of employment by gender using the automated features.
This document provides an overview of various statistical analysis techniques used in inferential statistics, including t-tests, ANOVA, ANCOVA, chi-square, regression analysis, and interpreting null hypotheses. It defines key terms like alpha levels, effect sizes, and interpreting graphs. The overall purpose is to explain common statistical methods for analyzing data and determining the probability that results occurred by chance or were statistically significant.
The document discusses multiple regression analysis, one of the most popular statistical methods, along with its applications and purposes.
Research method ch08 statistical methods 2 anova
1) The document discusses various statistical methods including one-way ANOVA, repeated measures ANOVA, and ANCOVA.
2) One-way ANOVA is used to compare the means of three or more independent groups when you have one independent variable with three or more categories and one continuous dependent variable.
3) Repeated measures ANOVA is used when the same subjects are measured under different conditions to assess for main effects and interactions while accounting for the dependency of measurements within subjects.
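The one-way ANOVA described in point 2 reduces to a ratio of between-group to within-group variance; a minimal sketch with made-up group data:

```python
def one_way_anova_F(groups):
    """F statistic: (between-group SS / df_between) / (within-group SS / df_within)."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, N = len(groups), len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))
```

A large F indicates that group means differ by more than within-group noise would suggest; the p-value lookup against the F distribution is omitted here.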
The document discusses exploratory factor analysis (EFA). EFA is used to identify patterns of correlations among observed variables and group them into fewer unobserved variables called factors. The key steps of EFA include data screening, factor extraction to identify factors, factor rotation for interpretability, and interpretation of results. The document also provides examples of important EFA concepts like communalities, eigenvalues, scree plot, factor loadings, and reliability. It summarizes an EFA conducted on variables related to consumer mobile phone purchasing behavior, which identified 4 factors: after sales services, looks and ranges, availability of parts and add-on technology, and brand and features.
This document provides an overview and introduction to using the statistical software R. It outlines R's interface, workspace, help system, packages, input/output functions, and how to reuse results. It also discusses downloading and installing R, basic functions and syntax, data manipulation techniques like sorting and merging, creating graphs, and performing statistical analyses such as t-tests, regression, ANOVA, and multiple comparisons. The document recommends several tutorials that provide more in-depth information on using R for statistical modeling, data analysis, and graphics.
Introduction to Statistical Analysis Using Graphpad Prism 6
This document outlines different statistical tests used for different types of variables and data distributions. For quantitative-quantitative data that is normally distributed, Pearson correlation or linear regression is used. For qualitative-quantitative data that is normally distributed, a Student's t-test is used. For repeated measurements on the same individual, a paired t-test is used if the data is normally distributed. Non-parametric tests like Wilcoxon rank sum are used for data that is not normally distributed.
Analysis of covariance (ANCOVA) is a statistical test that assesses whether the means of a dependent variable are equal across levels of a categorical independent variable while statistically controlling for the effects of other continuous variables known as covariates. ANCOVA works by adjusting the sums of squares for the independent variable to remove the influence of the covariate. This allows ANCOVA to test for differences between groups while controlling for the influence of other continuous variables. The assumptions of ANCOVA include those of ANOVA as well as the assumptions that the relationship between the dependent variable and covariate is linear and the same across all groups.
Introduction to Computational Statistics
This document outlines courses in computational statistics that utilize various statistical software packages like R, SPSS, and Excel. The courses cover topics ranging from data preparation and visualization to statistical modeling techniques like linear regression, resampling methods, and hypothesis testing. They emphasize hands-on practice over theory, involve group projects, and provide reference materials for further learning.
This document provides an overview of logistic regression, including when and why it is used, the theory behind it, and how to assess logistic regression models. Logistic regression predicts the probability of categorical outcomes given categorical or continuous predictor variables. It relaxes the normality and linearity assumptions of linear regression. The relationship between predictors and outcomes is modeled using an S-shaped logistic function. Model fit, predictors, and interpretations of coefficients are discussed.
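The S-shaped logistic function mentioned here is easy to state directly; the coefficients in the example are hypothetical, purely for illustration:

```python
import math

def logistic(z):
    """Map any real-valued linear predictor z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_prob(x, b0, b1):
    """Predicted probability of the outcome for predictor value x,
    given (hypothetical) intercept b0 and slope b1."""
    return logistic(b0 + b1 * x)
```

At z = 0 the predicted probability is exactly 0.5; large positive or negative predictors push it toward 1 or 0, which is what gives the curve its S shape.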
This document provides an overview of data analysis and statistics concepts for a training session. It begins with an agenda outlining topics like descriptive statistics, inferential statistics, and independent vs dependent samples. Descriptive statistics concepts covered include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation), and charts. Inferential statistics discusses estimating population parameters, hypothesis testing, and statistical tests like t-tests, ANOVA, and chi-squared. The document provides examples and online simulation tools. It concludes with some practical tips for data analysis like checking for errors, reviewing findings early, and consulting a statistician on analysis plans.
SEM is not a single statistical technique but rather integrates multiple multivariate techniques like factor analysis, path analysis, and regression into a unified framework. It allows modeling of complex latent constructs that are measured with error through multiple observed variables. Path analysis using latent variables can partition variance between true scores on latent variables and measurement error, and examine direct, indirect, and total effects in a system of relationships between variables.
Distinguish between parametric vs nonparametric tests
This document summarizes parametric and nonparametric tests. Parametric tests make assumptions about the population based on known parameters, while nonparametric tests make no assumptions about the population. Some examples of parametric tests provided are t-test, F-test, z-test, and ANOVA, while examples of nonparametric tests include Mann-Whitney, rank sum test, and Kruskal-Wallis test. The key differences between parametric and nonparametric tests are that parametric tests are based on population parameters and distributions while nonparametric tests are not, and parametric tests can only be applied to variable data while nonparametric tests can be used for variable or attribute data.
- Simple linear regression is used to predict values of one variable (dependent variable) given known values of another variable (independent variable).
- A regression line is fitted through the data points to minimize the deviations between the observed and predicted dependent variable values. The equation of this line allows predicting dependent variable values for given independent variable values.
- The coefficient of determination (R2) indicates how much of the total variation in the dependent variable is explained by the regression line. The standard error of estimate provides a measure of how far the observed data points deviate from the regression line on average.
- Prediction intervals can be constructed around predicted dependent variable values to indicate the uncertainty in predictions for a given confidence level, based on the standard error of the estimate.
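The quantities in these bullets can be computed directly by least squares; a pure-Python sketch on made-up points:

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariance of x and y over variance of x.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot  # R^2: share of variation explained
```

On perfectly linear data R² is 1; on noisy data it drops toward 0 as the residual deviations around the line grow.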
This document provides an introduction to using R Studio for statistical analysis. It discusses how to install both R and R Studio on Windows and Mac systems. It then covers creating scripts and files in R Studio, basic R syntax including assigning values to variables, vectors, and strings. The document also demonstrates how to install and load packages to access additional functions, and how to access built-in datasets to practice working with data in R.
UNIVARIATE & BIVARIATE ANALYSIS
UNIVARIATE, BIVARIATE & MULTIVARIATE
UNIVARIATE ANALYSIS
- One variable analysed at a time
BIVARIATE ANALYSIS
- Two variables analysed at a time
MULTIVARIATE ANALYSIS
- More than two variables analysed at a time
TYPES OF ANALYSIS
DESCRIPTIVE ANALYSIS
INFERENTIAL ANALYSIS
DESCRIPTIVE ANALYSIS
Transformation of raw data
Facilitate easy understanding and interpretation
Deals with summary measures relating to sample data
E.g., what is the average age of the sample?
INFERENTIAL ANALYSIS
Carried out after descriptive analysis
Inferences drawn on population parameters based on sample results
Generalizes results to the population based on sample results
E.g., is the average age of the population different from 35?
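The example question ("is the average age of the population different from 35?") is exactly what a one-sample t-test answers; a minimal sketch with hypothetical ages:

```python
import math

def one_sample_t(data, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))
```

The resulting t value is compared against the t distribution with n-1 degrees of freedom to decide whether the sample mean differs significantly from 35; that lookup step is omitted here.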
DESCRIPTIVE ANALYSIS OF UNIVARIATE DATA
1. Prepare a frequency distribution of each variable
Missing Data
Situation where certain questions are left unanswered
Analysis of multiple responses
Measures of central tendency
3 measures of central tendency:
1. Mean
2. Median
3. Mode
MEAN
Arithmetic average of a variable
Appropriate for interval and ratio scale data
x̄ = Σxᵢ / n
MEDIAN
Calculates the middle value of the data
Computed for ratio, interval or ordinal scale.
Data needs to be arranged in ascending or descending order
MODE
Point of maximum frequency
Should not be computed for ordinal or interval data unless grouped.
Widely used in business
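Python's standard library computes all three measures of central tendency directly; the ages below are made up:

```python
from statistics import mean, median, mode

ages = [23, 25, 25, 28, 30, 31, 25, 40]
avg = mean(ages)    # arithmetic average: interval/ratio data
mid = median(ages)  # middle value of the sorted data: ordinal scale and above
top = mode(ages)    # most frequent value
```

Note that `median` handles the sorting internally, and `mode` returns the point of maximum frequency (25 here, which occurs three times).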
MEASURE OF DISPERSION
Measures of central tendency do not explain distribution of variables
4 measures of dispersion:
1. Range
2. Variance and standard deviation
3. Coefficient of variation
4. Relative and absolute frequencies
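The first three measures of dispersion can be computed in a few lines of plain Python; the sample data is made up:

```python
data = [4, 8, 6, 5, 3, 7]

rng = max(data) - min(data)                              # range
m = sum(data) / len(data)
var = sum((x - m) ** 2 for x in data) / (len(data) - 1)  # sample variance
sd = var ** 0.5                                          # standard deviation
cv = sd / m                                              # coefficient of variation
```

The coefficient of variation expresses the standard deviation relative to the mean, which makes spread comparable across variables measured on different scales.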
DESCRIPTIVE ANALYSIS OF BIVARIATE DATA
Three measures are used:
1. Cross tabulation
2. Spearman's rank correlation coefficient
3. Pearson's linear correlation coefficient
Cross Tabulation
Responses of two questions are combined
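Combining the responses of two questions into a cross tabulation is a counting exercise; a sketch with hypothetical survey responses:

```python
from collections import Counter

gender = ["M", "F", "F", "M", "F", "M"]
bought = ["yes", "yes", "no", "no", "yes", "yes"]

# Each cell of the cross tabulation counts one (gender, response) combination.
crosstab = Counter(zip(gender, bought))
```

Each key of `crosstab` is one cell of the two-way table, e.g. `("M", "yes")`, and the cell counts sum to the number of respondents.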
Spearman’s rank order correlation coefficient.
Used in case of ordinal data
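For ordinal data, Spearman's coefficient correlates ranks rather than raw values; a sketch using the no-ties shortcut formula (tied values would need midranks, which are omitted here):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation via 1 - 6*sum(d^2)/(n*(n^2-1)), assuming no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Perfectly monotone data gives rho = 1, and reversed orderings give rho = -1, regardless of the raw values.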
Measures of association like the relative risk (RR) and odds ratio (OR) quantify the strength of association between an exposure and a disease. An RR or OR of 1 means no association, above 1 a positive association, and below 1 a negative association. The RR compares outcomes between exposed and unexposed groups in cohort studies, while the OR provides an estimate of the RR in case-control studies. Confidence intervals describe the precision of a point estimate, with a narrower interval indicating a more precise estimate. Whether a 95% CI includes 1 determines whether the association is statistically significant.
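Both measures come straight from a 2×2 exposure-by-outcome table; a minimal sketch with made-up counts:

```python
def relative_risk(a, b, c, d):
    """2x2 table: a = exposed with disease, b = exposed without,
    c = unexposed with disease, d = unexposed without."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)
```

With 20/100 exposed and 10/100 unexposed developing disease, the RR is 2.0 and the OR is 2.25; the OR approximates the RR when the disease is rare.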
SPSS is a statistical software package used for data management and analysis. It can import data from various file formats, perform complex statistical analyses and generate reports, tables, and graphs. Some key features include an easy to use interface, robust statistical procedures, and the ability to work with different operating systems. While powerful and popular, SPSS is also expensive and less flexible than open-source alternatives like R for advanced or custom analyses.
- Reliability is a measure of reproducibility of a test when repeated, quantifying random error. Validity is how well a test measures what it intends to, requiring comparison to a criterion.
- Reliability is typically quantified by the typical error or intraclass correlation. Validity uses correlation and error of estimate from regression of the test on a criterion.
- Both reliability and validity should be high for a test to accurately track small individual changes over time and distinguish individuals. Ideal values are >0.96 for reliability and validity correlations and typical/estimate errors <20% of between-subject standard deviation.
Non parametric study; Statistical approach for med student Dr. Rupendra Bharti
Non-parametric statistics are statistical methods that do not rely on assumptions about the probability distributions of the variables being assessed. They make fewer assumptions than parametric tests and can be used with ordinal or nominal data. Some common non-parametric tests include the chi-square test, McNemar's test, sign test, Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis test, and Spearman's rank correlation test. Non-parametric tests are useful when the data is ranked or does not meet the assumptions of parametric tests, as they provide a distribution-free way to perform statistical hypothesis testing.
Standardized tests are designed and administered consistently to allow for comparison of student performance. Tests are given to a sample group to determine average scores and the spread of scores. This establishes norms that individual students can be compared to. There are two main types of standardized tests - norm-referenced tests which compare students to peers, and criterion-referenced tests which assess knowledge of a defined subject area. Tests go through a process of development that includes trying out drafts, analyzing results, revising weak questions, and further testing to establish reliability and validity.
This document discusses several common problems with data handling and quality including building and testing models with the same data, confusion between biological and technical replicates, and identification and handling of outliers. It provides examples and explanations of key concepts such as experimental and sampling units, pseudo-replication, outliers versus high influence points, and leverage plots. The importance of proper data handling techniques like dividing data into training, test, and confirmation sets and using cross-validation is emphasized to avoid overfitting models and generating spurious findings.
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
This document discusses statistical learning and model selection. It introduces statistical learning problems, statistical models, the need for statistical modeling, and issues around evaluating models. Key points include: statistical learning involves using data to build a predictive model; a good model balances bias and variance to minimize prediction error; cross-validation is described as the ideal procedure for evaluating models without overfitting to the test data.
This document provides an overview of key statistical concepts for medical research including:
- Common measures like mean, standard deviation, confidence intervals, and p-values.
- Study designs such as randomized controlled trials.
- Tests for comparing groups like t-tests, ANOVA, and chi-square tests.
- Measures of disease frequency and test accuracy like sensitivity and specificity.
- The importance of understanding statistics for medical research and exams.
- Examples of choosing the appropriate statistical tests based on the study design and variables.
In 3 sentences or less, it orients the reader to fundamental epidemiological and biostatistical concepts for medical research and exam preparation.
Vital QMS Process Validation Statistics - OMTEC 2018April Bright
According to 21 CFR, Part 820, medical device manufacturers are required to validate as well as monitor and control parameters for their processes. The guideline on Quality Management Systems does not specify how this is accomplished; only that “a process is established that can consistently conform to requirements” and “studies are conducted demonstrating” this. Thorough process development, optimization and control using appropriate statistical methods and tools is recommended for demonstrating that your process is both stable and capable. This session will demonstrate ways to efficiently and effectively apply recommended statistical methods and tools to process validation—with no statistical expertise needed. Using realistic process data, participants will learn how to apply tools, interpret results and draw meaningful conclusions throughout Installation Qualification (IQ), Operational Qualification (OQ) and Performance Qualification (PQ).
A researcher conducted a study collecting data from a sample of 100 individuals for a heart study. Variables collected included education, gender, weight, height, smoking status, physical activity level, and various medical measurements. The document discusses several non-parametric statistical tests that can be used to analyze this type of data including tests for one sample and two samples. It provides examples of how to conduct one sample binomial tests for proportions and numeric variables, chi-square tests for more than two proportions, Kolmogorov-Smirnov tests for normality, Wilcoxon signed-rank tests, Mann-Whitney tests, and runs tests using SPSS.
This document provides information about non-parametric statistical tests. It discusses the Mann-Whitney U test, chi-square test, and how to perform chi-square tests in SPSS. Key points include:
- Non-parametric tests do not assume a specific data distribution and can be used for small sample sizes, ordinal data, or outliers. Examples include Mann-Whitney U, Kruskal-Wallis, and chi-square tests.
- Chi-square tests independence between two categorical variables. Assumptions include frequencies data and expected counts over 5 in 80% of cells.
- To perform a chi-square test in SPSS, select two categorical variables, choose crosstabs
Introduction to Business Analytics Course Part 10Beamsync
Are you looking for Business Analytics training courses in Bangalore? then consult Beamsync.
Beamsync is providing business analytics training in Bengaluru / Bangalore with experience trainers. For schedules visit: http://beamsync.com/business-analytics-training-bangalore/
This document provides information about non-parametric tests. It begins by explaining that non-parametric tests do not assume a specific distribution or make assumptions about the population. It then discusses tests for normality like the Kolmogorov-Smirnov test and Shapiro-Wilk test. Commonly used non-parametric tests like Spearman's rank correlation, Mann-Whitney U test, and Kruskal-Wallis H test are explained. The chi-square test and assumptions are also covered in detail. Advantages of non-parametric tests include fewer assumptions and applicability to small sample sizes. A disadvantage is they are less powerful than parametric tests.
This document provides guidance on data analysis techniques for a research project. It outlines key steps like understanding objectives, cleaning data, and ensuring the chosen analytical approaches are appropriate. It also details specific tests and criteria for assessing normality, reliability, validity, factor analysis, and collinearity. Normality can be checked using graphs, Shapiro-Wilk, Kolmogorov-Smirnov tests and normal Q-Q plots. Reliability is assessed using Cronbach's alpha above 0.7 and composite reliability above 0.7. Validity involves factor analysis p-values, average variance extracted above 0.5, and HTMT tests. Collinearity is checked using variance inflation factor values.
Quality Control for Quantitative Tests by Prof Aamir Ijaz (Pakistan)Aamir Ijaz Brig
This document provides an overview of quality control and quality assurance processes in a chemical pathology laboratory. It discusses key terms like quality control, quality assurance, internal quality control, external quality assurance. It also describes different types of errors like random error and systematic error. The document explains statistical concepts like measures of central tendency, standard deviation, coefficient of variation. It discusses the Westgard rules for evaluating quality control results and triggering investigations into potential errors. The goal of the lecture is to describe the processes involved in quality management for chemical pathology laboratories.
This document summarizes quantitative data analysis techniques for summarizing data from samples and generalizing to populations. It discusses variables, simple and effect statistics, statistical models, and precision of estimates. Key points covered include describing data distribution through plots and statistics, common effect statistics for different variable types and models, ensuring model fit, and interpreting precision, significance, and probability to generalize from samples.
Here are the answers to the measurement scale questions:
1) Number of absences - Ordinal
2) Letter grade on exam - Ordinal
3) Room temperature - Interval
4) Productivity - Ratio
5) Bicycle models - Nominal
6) Educational level - Ordinal
7) Number of questions - Ratio
8) Likert scale categories - Ordinal
The key things to remember are:
- Nominal has no quantitative properties
- Ordinal allows ranking but not equal intervals
- Interval has equal intervals but no absolute zero
- Ratio has all properties - magnitude, equal intervals, absolute zero
Let me know if any of these need more explanation!
This document provides an overview of non-parametric statistics. It defines non-parametric tests as those that make fewer assumptions than parametric tests, such as not assuming a normal distribution. The document compares and contrasts parametric and non-parametric tests. It then explains several common non-parametric tests - the Mann-Whitney U test, Wilcoxon signed-rank test, sign test, and Kruskal-Wallis test - and provides examples of how to perform and interpret each test.
The document discusses method validation requirements in clinical laboratories. It defines validation as testing a measurement procedure to assess its performance and determine acceptability. Method validation involves characterizing six key elements: reportable range, precision, accuracy, reference intervals, sensitivity, and specificity. The degree of validation depends on whether a test is FDA-approved, modified, or non-FDA approved. Common validation studies include precision, accuracy, method comparison, linearity, reference intervals, and sensitivity testing. Validation ensures a test method is fit for its intended use and identifies potential sources of error.
Biological variation as an uncertainty componentGH Yeoh
To assist the clinical interpretation of a test result, there is a necessity to have an additional non-analytical component in the overall estimation of UM, namely the biological variation.
This document discusses key concepts related to sampling theory and measurement in research studies. It defines important sampling terms like population, sampling criteria, sampling methods, sampling error and bias. It also covers levels of measurement, reliability, validity and various measurement strategies like physiological measures, observations, interviews, questionnaires and scales. Finally, it provides an overview of statistical analysis techniques including descriptive statistics, inferential statistics, the normal curve and common tests like t-tests, ANOVA, and regression analysis.
2. Download & Install
• You can download and install R for free from https://r-project.org/
• Upon installation, download and install the free version of RStudio Desktop from https://rstudio.com
• Instructions at https://youtu.be/hXb47dmPCR8
drtamil@gmail.com
3. Uniqueness of R & R Studio
• R is one of the programming languages that provide an extensive environment for you to analyze, process, transform and visualize information.
• It is the primary choice for many statisticians who want to involve themselves in designing statistical models to solve complex problems.
• Data are usually entered and manipulated using a spreadsheet such as Microsoft Excel.
• Specific analyses require specific commands, so you must know exactly which command is required for each analysis.
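For instance, each descriptive statistic has its own command. A tiny sketch using a made-up vector (not the course data):

```r
# Hypothetical birthweights in kg; each analysis is a specific command.
x <- c(3.1, 2.8, 3.5, 2.9, 3.2)
mean(x)     # arithmetic mean -> 3.1
sd(x)       # standard deviation
summary(x)  # five-number summary plus the mean
```
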
5. Parametric Statistical Tests
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-square test (X²)
Qualitative dichotomous | Qualitative dichotomous | Sample size > 30 | Proportionate test
Qualitative dichotomous | Qualitative dichotomous | Sample size > 40 but with at least one expected value < 5 | X² test with Yates correction
Qualitative dichotomous | Quantitative | Normally distributed data | Student's t test
Qualitative polynomial | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t test
Quantitative continuous | Quantitative continuous | Normally distributed data | Pearson correlation & linear regression
6. Non-parametric Statistical Tests
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative dichotomous | Qualitative dichotomous | Sample size < 20, or < 40 with at least one expected value < 5 | Fisher's exact test
Qualitative dichotomous | Quantitative | Data not normally distributed | Wilcoxon rank-sum test (Mann-Whitney U test)
Qualitative polynomial | Quantitative | Data not normally distributed | Kruskal-Wallis one-way ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item | Wilcoxon signed-rank test
Quantitative continuous/ordinal | Quantitative continuous | Data not normally distributed | Spearman/Kendall rank correlation
7. Statistical Tests for Qualitative Data
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-square test (X²)
Qualitative dichotomous | Qualitative dichotomous | Sample size > 30 | Proportionate test
Qualitative dichotomous | Qualitative dichotomous | Sample size > 40 but with at least one expected value < 5 | X² test with Yates correction
Qualitative dichotomous | Quantitative | Normally distributed data | Student's t test
Qualitative polynomial | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t test
Qualitative dichotomous | Qualitative dichotomous | Sample size < 20, or < 40 with at least one expected value < 5 | Fisher's exact test
Qualitative dichotomous | Quantitative | Data not normally distributed | Wilcoxon rank-sum test (Mann-Whitney U test)
Qualitative | Quantitative | Data not normally distributed | Kruskal-Wallis one-way ANOVA
8. R Hands-on Exercise
Text in this blue colour shows the commands to be typed in the Console window.
9. URL for data & submit answers
• Data - https://drive.google.com/file/d/1PzcqCzm5t9KQkkXAtlvO56bZlMojM8-b/view?usp=sharing
• The analysis required - https://wp.me/p4mYLF-vA
• Submit answers at this link - https://docs.google.com/forms/d/1o_L7ZjXF9Q1PON2zDs_VwkKsLCHT4v-8WruXhCiVq2Q/viewform
11. A study to identify factors that can cause small for gestational age (SGA) babies was conducted. Among the factors studied was the mothers' body mass index (BMI). It is believed that mothers with a lower BMI are at higher risk of having SGA babies.
• 1. Create a new variable mBMI (Mothers' Body Mass Index) from the mothers' HEIGHT (in metres) & WEIGHT (first-trimester weight in kg): mBMI = weight in kg / (height in metres)². Calculate the following for mBMI:
– Mean
– Standard deviation
• 2. Create a new variable OBESCLAS (Classification of Obesity) from mBMI. Use the following cutoff points:
– < 20 = Underweight
– 20 – 24.99 = Normal
– 25 or larger = Overweight
– Create a frequency table for OBESCLAS.
• 3. Conduct the appropriate statistical test of whether there is any association between OBESCLAS (Underweight/Normal/Overweight) and OUTCOME.
• 4. Conduct the appropriate statistical test of whether there is any association between BMI and OUTCOME.
• 5. Conduct the appropriate statistical test to find any association between OBESCLAS (Underweight/Normal/Overweight) and BIRTHWGT.
• 6. Assuming that both variables mBMI & BIRTHWGT are normally distributed, conduct an appropriate statistical test to demonstrate the association between the two variables.
– Demonstrate the association using the appropriate chart. Determine the coefficient of determination.
• 7. Conduct simple linear regression using BIRTHWGT as the dependent variable. Try to come up with a formula that will predict the baby's birthweight based on the mother's BMI:
– y = a + bx
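Questions 6 and 7 can be sketched in R as follows. The data here are made up to stand in for the course's mBMI and BIRTHWGT columns; the relationship coefficients are illustrative, not results from the study:

```r
# Hypothetical data: birthweight loosely increasing with maternal BMI.
set.seed(42)
mBMI     <- runif(100, min = 17, max = 32)                 # made-up maternal BMI
BIRTHWGT <- 1.5 + 0.06 * mBMI + rnorm(100, sd = 0.3)       # made-up birthweight, kg
cor(mBMI, BIRTHWGT)        # Pearson correlation (question 6)
cor(mBMI, BIRTHWGT)^2      # coefficient of determination, r-squared
fit <- lm(BIRTHWGT ~ mBMI) # simple linear regression (question 7)
coef(fit)                  # a (intercept) and b (slope): y = a + b*x
plot(mBMI, BIRTHWGT)       # scatter plot of the association
abline(fit)                # overlay the fitted regression line
```
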
12. Online form for answers
https://docs.google.com/forms/d/1o_L7ZjXF9Q1PON2zDs_VwkKsLCHT4v-8WruXhCiVq2Q/viewform
14. Import Excel into R Studio
• Select the Excel file you downloaded earlier: "SGA.xls"
15. Import Excel into R Studio
• Click "Import" and the following commands are executed:
– library(readxl)
– sga <- read_excel("C:/…./sga.xls")
– View(sga)
16. R Studio - compute
A study to identify factors that can cause small for gestational age (SGA) babies was conducted. Among the factors studied was the mothers' body mass index (BMI). It is believed that mothers with a lower BMI are at higher risk of having SGA babies.
1. Create a new variable mBMI (Mothers' Body Mass Index) from the mothers' HEIGHT (in metres) & WEIGHT (first-trimester weight in kg): mBMI = weight in kg / (height in metres)². Calculate the following for mBMI:
– Mean
– Standard deviation
Copy and paste your answers into your Word file.
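A minimal sketch of the compute step, using a small made-up data frame in place of the imported sga data (the column names weight and height are assumptions):

```r
# Hypothetical stand-in for the imported SGA data (weight in kg, height in m).
sga <- data.frame(weight = c(48, 55, 62, 70, 80),
                  height = c(1.50, 1.55, 1.60, 1.65, 1.70))
# mBMI = weight in kg / (height in m)^2
sga$mBMI <- sga$weight / sga$height^2
mean(sga$mBMI)  # mean of the new variable
sd(sga$mBMI)    # standard deviation of the new variable
```
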
19. Recode
• 2. Create a new variable OBESCLAS (Classification of Obesity) from mBMI. Use the following cutoff points:
– < 20 = Underweight
– 20 – 24.99 = Normal
– 25 or larger = Overweight
– Create a frequency table for OBESCLAS.
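One way to sketch this recode in base R is with cut(), using the slide's cutoffs; the mBMI values below are made up for illustration:

```r
# Hypothetical mBMI values; recode into <20 / 20-24.99 / >=25 classes.
mBMI <- c(18.5, 21.3, 26.0, 30.2, 19.9)
obesclas <- cut(mBMI,
                breaks = c(-Inf, 20, 25, Inf),   # intervals [-Inf,20), [20,25), [25,Inf)
                labels = c("Under", "Normal", "Over"),
                right = FALSE)                   # left-closed so 20 falls in "Normal"
table(obesclas)  # frequency table for the new variable
```
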
21. Frequency table for OBESCLAS
• table(sga$obesclas)
– Under Normal Over
– 17 40 43
• prop.table(table(sga$obesclas))
– Under Normal Over
– 0.17 0.40 0.43
– 17% 40% 43%
23. Exercise 3
• 3. Conduct the appropriate statistical test of whether there is any association between OBESCLAS (Underweight/Normal/Overweight) and OUTCOME.
• Therefore the most suitable analysis is the Pearson chi-square test.
       | SGA | Normal | TOTAL
UnderW |     |        |
Normal |     |        |
OverW  |     |        |
TOTAL  | 50  | 50     | 100
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-Square Test (X²)
Qualitative Dichotomous | Qualitative Dichotomous | Sample size > 30 | Proportionate Test
Qualitative Dichotomous | Qualitative Dichotomous | Sample size > 40 but with at least one expected value < 5 | X² Test with Yates Correction
Qualitative Dichotomous | Quantitative | Normally distributed data | Student's t Test
Qualitative Polynomial | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t Test

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative Dichotomous | Qualitative Dichotomous | Sample size < 20, or < 40 with at least one expected value < 5 | Fisher Test
Qualitative Dichotomous | Quantitative | Data not normally distributed | Wilcoxon Rank Sum Test or Mann-Whitney U Test
Qualitative Polynomial | Quantitative | Data not normally distributed | Kruskal-Wallis One-Way ANOVA Test
25. Chi-Square Results from R Studio
• R not only states that there is a significant association
(p = 5 × 10⁻⁶) between the mother’s weight classification and
small for gestational age,
• but it also shows which group has the higher rate of SGA.
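A sketch of the test itself. The cell counts below are reconstructed from the reported group sizes (17/40/43) and SGA rates (94%/58%/26%), so treat them as approximate rather than the actual data:

```r
# Chi-square test on a 3x2 table of OBESCLAS vs OUTCOME.
# Counts reconstructed from the reported percentages (approximate).
tab <- matrix(c(16,  1,   # Underweight: SGA, Normal
                23, 17,   # Normal weight
                11, 32),  # Overweight
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Under", "Normal", "Over"),
                              c("SGA", "Normal")))
chisq.test(tab)  # p-value on the order of 10^-6
```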
26. Results From R Studio
• Underweight mothers
have the highest rate (94%)
of SGA, compared to
normal-weight mothers (58%)
and overweight
mothers (26%).
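These rates can be reproduced with `prop.table()` using `margin = 1` (row proportions). As before, the counts are reconstructed from the reported figures, so they are approximate:

```r
# Row proportions give the SGA rate within each weight class.
tab <- matrix(c(16, 1, 23, 17, 11, 32), nrow = 3, byrow = TRUE,
              dimnames = list(c("Under", "Normal", "Over"),
                              c("SGA", "Normal")))
prop.table(tab, margin = 1)  # SGA column: roughly 0.94, 0.58, 0.26
```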
29. Exercise 4
• 4. Conduct the
appropriate statistical test
to test whether there is
any association between
BMI and OUTCOME.
• Basically we are
comparing the mean BMI
of SGA babies’ mothers
against the mean BMI of
Normal babies’ mothers.
• Therefore the appropriate
test is Student’s t-test.
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi-Square Test (X²)
Qualitative Dichotomous | Qualitative Dichotomous | Sample size > 30 | Proportionate Test
Qualitative Dichotomous | Qualitative Dichotomous | Sample size > 40 but with at least one expected value < 5 | X² Test with Yates Correction
Qualitative Dichotomous | Quantitative | Normally distributed data | Student's t Test
Qualitative Polynomial | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t Test
Quantitative continuous | Quantitative continuous | Normally distributed data | Pearson Correlation & Linear Regression
30. Student’s T-Test
• library("car")
• leveneTest(sga$mBMI, sga$outcome)
– Levene's Test for Homogeneity of Variance (center = median)
– Df F value Pr(>F)
– group 1 0.0827 0.7743
– 98
• Levene's test reveals that the variances are not
significantly different (P = 0.7743).
• Therefore when we run the t-test, it is for equal
variances.
31. T-Test Results from R Studio
• t.test(sga$mBMI ~
sga$outcome, var.equal=TRUE)
– Two Sample t-test
– data: sga$mBMI by sga$outcome
– t = 4.5164, df = 98, p-value =
1.756e-05
– alternative hypothesis: true
difference in means is not equal
to 0
– 95 percent confidence interval:
2.207433 5.667658
– sample estimates:
– mean in group Normal 26.46453
– mean in group SGA 22.52699
• R Studio states that there is
a significant difference in
mean BMI (p = 1.756 × 10⁻⁵)
between SGA babies’
mothers (22.53) and
normal babies’ mothers
(26.46).
• Therefore the mean BMI of
SGA babies’ mothers is
significantly lower than that
of normal babies’ mothers.
34. Exercise 5
• 5. Conduct the appropriate statistical test to find
any association between OBESCLAS
(Underweight/Normal/Overweight) and
BIRTHWGT.
• Basically we are comparing the mean
BIRTHWEIGHT of underweight mothers, normal
weight mothers and overweight mothers.
• Therefore the appropriate test is Analysis of
Variance (ANOVA).
35. ANOVA
• library("car")
• leveneTest(sga$birthwgt,
sga$obesclas)
– Levene's Test for
Homogeneity of Variance
(center = median)
• Df F value Pr(>F)
– group 2 3.1702 0.04638 *
– 97
– ---
– Signif. codes: 0 ‘***’ 0.001
‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• Variances of birthwgt are
significantly different
between the obesclas
groups.
• Therefore when we run
the ANOVA, it must
allow for unequal
variances.
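When the homogeneity assumption fails, base R's `oneway.test()` with `var.equal = FALSE` (Welch's ANOVA) is one option. A minimal sketch, with toy data standing in for `sga$birthwgt` and `sga$obesclas`:

```r
# Welch's one-way test: compares group means without assuming
# equal variances. Toy data; the real call would use the sga columns.
birthwgt <- c(2.1, 2.3, 2.2, 2.7, 2.9, 2.6, 3.1, 3.4, 3.2)
obesclas <- factor(rep(c("Under", "Normal", "Over"), each = 3))
oneway.test(birthwgt ~ obesclas, var.equal = FALSE)
```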
36. ANOVA – command
• tapply(sga$birthwgt, sga$obesclas, mean)
– Under Normal Over
– 2.187059 2.768250 3.245116
• tapply(sga$birthwgt, sga$obesclas, sd)
– Under Normal Over
– 0.3403999 0.6712861 0.6606179
• levels(sga$obesclas)
• summary(aov(sga$birthwgt ~ sga$obesclas))
38. ANOVA Results from R Studio
• R Studio states that there is a significant difference in mean
birth weight (p < 0.0001) between underweight mothers (2.187),
normal-weight mothers (2.768) & overweight mothers (3.245).
• Unfortunately it also shows that the variances of the three
groups are unequal, so the homogeneity of variances assumption fails.
39. ANOVA Results – post hoc
• Post-hoc tests indicate a significant
difference in birth weight between ALL
three groups.
pairwise.t.test(sga$birthwgt, sga$obesclas, p.adjust.method ="bonferroni")
42. Exercise 6
• 6. Assuming that both variables
mBMI & BIRTHWGT are normally
distributed, conduct an appropriate
statistical test to prove the
association between the two
variables.
–Demonstrate the association using the
appropriate chart. Determine the
coefficient of determination.
43. Pearson Correlation
• mBMI and birth weight are both normally distributed
continuous data. Since the aim is to measure the
strength and direction of the association between
these two continuous variables, Pearson
correlation is the most appropriate test.
44. Pearson’s Correlation
Command
• cor.test(sga$mBMI, sga$birthwgt,
method="pearson")
– Pearson's product-moment
correlation
– data: sga$mBMI and sga$birthwgt
– t = 5.4379, df = 98, p-value =
3.959e-07
– alternative hypothesis: true
correlation is not equal to 0
– 95 percent confidence interval:
– 0.3148037 0.6193051
– sample estimates:
– cor
– 0.4814521
Discussion
• r = 0.4814521
• p-value = 3.959 × 10⁻⁷
• Fair & positive correlation
between mBMI and birth weight.
• Therefore as the mothers’ BMI
increases, the birth weight also
increases.
• r² = 0.4814521² = 0.2318
• 23.18% (r² = 0.2318) of the variability in
the birth weight is explained by
the variability of the mothers’
BMI.
48. Exercise 7
• 7. Conduct Simple Linear Regression
using BIRTHWGT as the dependent
variable. Try to derive a
formula that will predict the baby’s
birth weight based on the mother’s
BMI.
– y = a + bx
49. Simple Linear Regression
• mBMI and birth weight are both normally distributed
continuous data. Since the aim is to derive a
regression formula between these two continuous
variables, simple linear regression is the
most appropriate test.
50. plot(x = sga$mBMI, y = sga$birthwgt, type = 'p')
abline(lm(sga$birthwgt ~ sga$mBMI), col = 'red', lty = 2)
52. SLR Results from R Studio
• R Studio states that there is a significant regression
coefficient (b = 0.07330).
• The constant (a) is 1.07895.
• 23.18% (r² = 0.2318) of the variability in the birth weight is
explained by the variability of the mothers’ BMI.
• BW = 1.079 + 0.073 × BMI
• For every 1-unit increase in BMI, BW increases by 0.07 kg.
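A sketch of the `lm()` call that produces such output; the toy data below stand in for the real sga columns, so the fitted coefficients will differ from those quoted above.

```r
# Simple linear regression: birthwgt ~ mBMI.
# Toy data with assumed column names.
sga <- data.frame(mBMI     = c(18, 20, 22, 25, 28, 31),
                  birthwgt = c(2.1, 2.4, 2.6, 2.9, 3.1, 3.3))
fit <- lm(birthwgt ~ mBMI, data = sga)
summary(fit)            # intercept (a), slope (b), and R-squared
coef(fit)               # a and b for BW = a + b * BMI
summary(fit)$r.squared  # coefficient of determination
```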