The document discusses power analysis and statistical power. It notes that power analysis allows researchers to determine whether a planned sample size is large enough to detect the effects of interest without being larger than necessary. Conducting power analyses helps ensure efficient use of resources and avoid practices like p-hacking. Understanding power is also important for interpreting null results and replication failures, because low-powered studies are unlikely to detect small effects.
1. Course Business
• Final papers due on Canvas on Monday, Nov. 23rd (2 weeks from Monday)
• Meteyard & Davies (2020) paper provides some useful guidelines on reporting
2. Course Business
• Last lecture & lab today!
• Lab materials and data on Canvas
• Package to install: simr
• “Bonus” lecture on Canvas on signal detection
  • Relevant for datasets where participants are making a binary decision (e.g., recognition memory, grammaticality judgments, moral dilemmas, etc.)
• Teaching survey (OMET) begins Monday
• Thanks for working with the challenges of this term!
3. Week 12.2: Power
• Statistical Power
• Intro to Power Analysis
• Estimating Effect Size
• Why Do We Care About Power?
• Determining Power
• Power Simulations in R
• Influences on Power
• Lab
4. Recap of Null Hypothesis Significance Testing
• Does “brain training” affect general cognition?
• H0: There is no effect of brain training on cognition
• HA: There is an effect of brain training on cognition
5. Recap of Null Hypothesis Significance Testing
• Does “brain training” affect general cognition?
• H0: There is no effect of brain training on cognition
• HA: There is an effect of brain training on cognition
(Cartoon caption: “These two books contain the sum total of human knowledge…”)
6. Recap of Null Hypothesis Significance Testing
• Let’s consider a world where H0 is true—there is no effect of brain training on general cognition
7. Recap of Null Hypothesis Significance Testing
• Let’s consider a world where H0 is true—there is no effect of brain training on general cognition
• Two possible outcomes…

  ACTUAL STATE      WHAT WE DID:
  OF THE WORLD      Retain H0               Reject H0
  H0 is true        GOOD!                   OOPS! Type I error
                    Probability: 1-α        Probability: α
8. Recap of Null Hypothesis Significance Testing
• What about a world where HA is true?
9. Recap of Null Hypothesis Significance Testing
• Another mistake we could make: There really is an effect, but we retained H0
  • False negative / Type II error
  • Historically, not considered as “bad” as Type I
  • Probability: β

  ACTUAL STATE      WHAT WE DID:
  OF THE WORLD      Retain H0               Reject H0
  H0 is true        GOOD!                   OOPS! Type I error
                    Probability: 1-α        Probability: α
  HA is true        OOPS! Type II error
                    Probability: β
11. Recap of Null Hypothesis Significance Testing
• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
  • If our hypothesis (HA) is right, what probability is there of obtaining significant evidence for it?

  ACTUAL STATE      WHAT WE DID:
  OF THE WORLD      Retain H0               Reject H0
  H0 is true        GOOD!                   OOPS! Type I error
                    Probability: 1-α        Probability: α
  HA is true        OOPS! Type II error     GOOD!
                    Probability: β          Probability: 1-β
12. Recap of Null Hypothesis Significance Testing
• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
• Can we find the thing we’re looking for?
13. Recap of Null Hypothesis Significance Testing
• POWER (1-β): Probability of correct rejection of H0: detecting the effect when it really exists
• Can we find the thing we’re looking for?
• If our hypothesis is true, what is the probability we’ll get p < .05?
• We compare retrieval practice to re-reading with power = .75
  • If retrieval practice is actually beneficial, there is a 75% chance we’ll get a significant result
• We compare bilinguals to monolinguals on a test of non-verbal cognition with power = .35
  • If there is a difference between monolinguals & bilinguals, there is a 35% chance we’ll get p < .05
14. Power Analysis
• Power analysis: Do we have the power to detect the effect we’re interested in?
  • Depends on effect size, α (Type I error rate), and sample size
• In practice:
  • We can’t control effect size; it’s a property of nature
  • α is usually fixed (e.g., at .05) by convention
  • But, we can control our sample size n!
• So:
  • Determine desired power (often .80)
  • Estimate the effect size(s)
  • Calculate the necessary sample size n
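For a simple two-group design, this workflow can be sketched in R with the pwr package (not part of these slides; the effect size of d = 0.5 is an assumed placeholder, not an estimate from any study):

```r
# A minimal sketch, assuming the pwr package is installed.
# Leave n unspecified and supply the other three quantities;
# pwr.t.test() then solves for the required sample size.
library(pwr)

pwr.t.test(d = 0.5,             # assumed standardized effect size (Cohen's d)
           sig.level = .05,     # alpha (Type I error rate)
           power = .80,         # desired power (1 - beta)
           type = "two.sample")
# For these values, the answer is roughly n = 64 per group
```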
15. Week 12.2: Power
• Statistical Power
• Intro to Power Analysis
• Estimating Effect Size
• Why Do We Care About Power?
• Determining Power
• Power Simulations in R
• Influences on Power
• Lab
16. Estimating Effect Size
• One reason we haven’t always calculated power is that it requires the effect size of the effect we’re looking for
• But, two ways to estimate effect size:
  1. Prior literature
     • What is the effect size in other studies in this domain or with a similar manipulation?
17. Estimating Effect Size
• One reason we haven’t always calculated power is that it requires the effect size of the effect we’re looking for
• But, two ways to estimate effect size:
  1. Prior literature
  2. Smallest Effect Size Of Interest (SESOI)
     • Decide the smallest effect size we’d care about
       • e.g., we want our educational intervention to have an effect size of at least .05 GPA
     • Calculate power based on that effect size
     • True that if the actual effect is smaller than .05 GPA, our power would be lower, but the idea is we no longer care about the intervention if its effect is that small
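To make the SESOI approach concrete, here is a hypothetical sketch; the GPA standard deviation of 0.4 is an invented assumption used only to convert the raw SESOI into a standardized effect size:

```r
# Convert a raw SESOI (.05 GPA points) to Cohen's d, then solve for n.
library(pwr)

d_sesoi <- 0.05 / 0.4   # assumed SD of GPA = 0.4, so d = 0.125
pwr.t.test(d = d_sesoi, sig.level = .05, power = .80, type = "two.sample")
# A small SESOI implies a large sample: roughly 1,000 per group here
```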
18. Week 12.2: Power
• Statistical Power
• Intro to Power Analysis
• Estimating Effect Size
• Why Do We Care About Power?
• Determining Power
• Power Simulations in R
• Influences on Power
• Lab
19. Why Do We Care About Power?
1. Efficient use of resources
   • A major determinant of power is sample size (larger = more power)
   • Power analyses tell us if our planned sample size (n) is:
     • Large enough to be able to find what we’re looking for
     • Not so large that we’re collecting more data than necessary
20. Why Do We Care About Power?
1. Efficient use of resources
   • A major determinant of power is sample size (larger = more power)
   • Power analyses tell us if our planned sample size (n) is:
     • Large enough to be able to find what we’re looking for
     • Not so large that we’re collecting more data than necessary
   • This is about good use of our resources
     • Societal resources: Money, participant hours
     • Your resources: Time!!
21. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
   • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
     • In the limit, ensures a significant result
   • Random sampling means that the p-value is likely to differ in each sample
   (Plot annotation: p-value happens to be higher in this slightly larger sample)
22. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
   • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
     • In the limit, ensures a significant result
   • Random sampling means that the p-value is likely to differ in each sample
   (Plot annotation: Now, p-value happens to be lower)
23. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
   • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
     • In the limit, ensures a significant result
   • Random sampling means that the p-value is likely to differ in each sample
   • At some point, p < .05 by chance
   (Plot annotation: SIGNIFICANT!! PUBLISH NOW!!)
24. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
   • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
     • In the limit, ensures a significant result
   • Random sampling means that the p-value is likely to differ in each sample
   • At some point, p < .05 by chance
   • Bias to get positive results if we stop if and only if p < .05
   (Plot annotation: But not significant in this even larger sample)
25. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
   • Rate of false positive results increases if we keep collecting data whenever our effect is non-sig.
   • We can avoid this if we use a power analysis to decide our sample size in advance
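A small simulation (not from the slides; all parameter values are illustrative) shows why optional stopping inflates false positives. Both groups are drawn from the same population, so H0 is true by construction, yet stopping as soon as p < .05 rejects far more often than the nominal 5%:

```r
# Simulate the "collect more data whenever p >= .05" strategy under a true H0.
set.seed(1)
one_run <- function() {
  a <- rnorm(10); b <- rnorm(10)                  # start with n = 10 per group
  repeat {
    if (t.test(a, b)$p.value < .05) return(TRUE)  # stop and "publish" if significant
    if (length(a) >= 50) return(FALSE)            # give up at n = 50 per group
    a <- c(a, rnorm(5)); b <- c(b, rnorm(5))      # otherwise collect 5 more per group
  }
}
mean(replicate(2000, one_run()))  # false-positive rate, well above .05
```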
26. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
   • Even if an effect exists in the population, we’d expect some non-significant results
     • Power is almost never 100%
   • In fact, many common designs in psychology have low power (Etz & Vandekerckhove, 2016; Maxwell et al., 2015)
     • Small to moderate sample sizes
     • Small effect sizes
27. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
28. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
   • Even if an effect exists in the population, we’d expect some non-significant results
     • Power is almost never 100%
   • In fact, many common designs in psychology have low power (Etz & Vandekerckhove, 2016; Maxwell et al., 2015)
     • Small effect sizes
     • Small to moderate sample sizes
   • Failures to replicate might be a sign of low power, rather than a non-existent effect
29. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
     • We “fail to reject H0” rather than “accept H0”
30. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
     • We “fail to reject H0” rather than “accept H0”
   • “Absence of evidence is not evidence of absence.”
   (Cartoon: “I looked around Schenley Park for 15 minutes and didn’t see any giraffes. Therefore, giraffes don’t exist.”)
31. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
     • We “fail to reject H0” rather than “accept H0”
   • “Absence of evidence is not evidence of absence.”
   • “We didn’t find enough evidence to conclude there is a significant effect” DOES NOT MEAN “no significant effect exists”
32. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
     • We “fail to reject H0” rather than “accept H0”
   • “Absence of evidence is not evidence of absence.”
   • Major criticism of null hypothesis significance testing!
33. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
   • But, with high power, a null result is more informative
34. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
   • But, with high power, a null result is more informative
     • e.g., null effect of working memory training on intelligence with 20% power
       • Maybe brain training works & we just couldn’t detect the effect
     • But: null effect of WM training on intelligence with 90% power
       • Unlikely that we just missed the effect!
35. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
   • A non-significant result, by itself, doesn’t prove an effect doesn’t exist
   • But, with high power, a null result is more informative
36. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science Collaboration, 2015)
4. Understand null results
5. Granting agencies now want to see it
   • Don’t want to fund a study with a low probability of showing anything
   • e.g., Our theory predicts greater activity in Broca’s area in condition A than condition B. But our experiment has only a 16% probability of detecting the difference. Not good!
37. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science
Collaboration, 2015)
4. Understand null results
5. Granting agencies now want to see it
• e.g., NIH and IES [agency guideline screenshots not reproduced here]
38. Why Do We Care About Power?
1. Efficient use of resources
2. Avoid p-hacking (Simmons et al., 2011)
3. Understand non-replication (Open Science
Collaboration, 2015)
4. Understand null results
5. Granting agencies now want to see it
6. Scientific accuracy!
• If there is an effect, we want to know about it!
39. Week 12.2: Power
! Statistical Power
! Intro to Power Analysis
! Estimating Effect Size
! Why Do We Care About Power?
! Determining Power
! Power Simulations in R
! Influences on Power
! Lab
40. Determining Power
• Power for simpler tests like t-tests or ANOVAs can be looked up in tables
• Simpler designs, with at most one random effect
• But it’s more complicated for mixed-effects models
• Varying numbers of fixed effects, random intercepts, and random slopes
• Not possible to have a table for every possible design
• We need a different approach
41. Monte Carlo Methods
• Remember the definition of power?
• The probability of observing a significant effect in
our sample if the effect truly exists in the
population
• What if we knew for a fact that the effect existed in
a particular population?
• Then, a measure of power is how often we get a
significant result in a sample (of our intended n)
• Observe a significant effect 10 samples out of
20 = 50% of the time = power of .50
• Observe a significant effect 300 samples out of
1000 = 30% of the time = power of .30
• Observe a significant effect 800 samples out of
1000 = 80% of the time = power of .80
42. Monte Carlo Methods
• Remember the definition of power?
• The probability of observing a significant effect in
our sample if the effect truly exists in the
population
• What if we knew for a fact that the effect existed in
a particular population?
• Then, a measure of power is how often we get a
significant result in a sample (of our intended n)
Great, but where am I ever going to
find data where I know exactly what
the population parameters are?
43. Monte Carlo Methods
• Remember the definition of power?
• The probability of observing a significant effect in
our sample if the effect truly exists in the
population
• What if we knew for a fact that the effect existed in
a particular population?
• Then, a measure of power is how often we get a
significant result in a sample (of our intended n)
• Solution: We create (“simulate”) the data.
44. Data Simulation
• Set some plausible population parameters
(effect size, subject variance, item var., etc.)
• Since we are creating the data…
• We can choose the population parameters
• We know exactly what they are
[Pipeline: Set population parameters (mean = 723 ms, group difference = 100 ms, subject var = 30)]
45. Data Simulation
• Create (“simulate”) a random sample drawn
from this population
• Like most samples, the sample statistics will not
exactly match the population parameters
• It’s randomly generated
• But, the difference is we know what the
population is like & that there IS an effect
[Pipeline: Set population parameters (mean = 723 ms, group difference = 100 ms, subject var = 30) → Create a random sample (N subjects = 20, N items = 40)]
46. Data Simulation
• Now, fit our planned mixed-effects model to this
sample of simulated data to get one result
• Might get a significant result
• Correctly detected the effect in the population
• Might get a non-significant result
• Type II error – missed an effect that really exists in
the population
[Pipeline: Set population parameters (mean = 723 ms, group difference = 100 ms, subject var = 30) → Create a random sample (N subjects = 20, N items = 40) → Run our planned model and see if we get a significant result]
47. Monte Carlo Methods
• If we do this repeatedly, we will get multiple
significance tests, each on a different sample
• Outcomes:
• Sample 1: p < .05 (Yes)
• Sample 2: p = .23 (No)
• Sample 3: p < .05 (Yes)
• Sample 4: p = .14 (No)
• Detected the effect ½ of the time: Power = .50
[Pipeline: Set population parameters (mean = 723 ms, group difference = 100 ms, subject var = 30) → Create a random sample (N subjects = 20, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population]
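This loop is straightforward to sketch by hand in R. Below is a minimal illustration of the idea using the running example's parameters; the residual SD, the between-subjects design, and all variable names are assumptions for illustration only (the simr package, introduced later, automates all of this):

# Hand-rolled Monte Carlo power sketch. Parameters follow the running
# example (mean = 723 ms, group difference = 100 ms, subject var = 30,
# treated here as an SD); the residual SD of 150 is an assumed value.
library(lme4)
library(lmerTest)  # adds p-values for fixed effects in lmer summaries

simulate_once <- function(n_subj = 20, n_item = 40) {
  d <- expand.grid(Subject = factor(1:n_subj), Item = factor(1:n_item))
  # Between-subjects group, effect-coded -0.5 / +0.5:
  d$Group <- ifelse(as.integer(d$Subject) <= n_subj / 2, -0.5, 0.5)
  subj_int <- rnorm(n_subj, mean = 0, sd = 30)   # subject variability
  d$RT <- 723 + 100 * d$Group +                  # fixed effects
    subj_int[as.integer(d$Subject)] +            # random subject intercepts
    rnorm(nrow(d), mean = 0, sd = 150)           # residual noise (assumed)
  m <- lmer(RT ~ Group + (1 | Subject), data = d)
  coef(summary(m))["Group", "Pr(>|t|)"] < .05    # TRUE if significant
}

# Power = proportion of simulated samples yielding a significant result:
mean(replicate(200, simulate_once()))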
48. Monte Carlo Methods
• If we do this repeatedly, we will get multiple
significance tests, each on a different sample
• Hmm, that power wasn’t very good.
49. Monte Carlo Methods
• If we do this repeatedly, we will get multiple
significance tests, each on a different sample
• Hmm, that power wasn’t very good.
• Let’s increase the number of subjects and run a
new simulation to see what our power is like
now
[Pipeline: Set population parameters (mean = 723 ms, group difference = 100 ms, subject var = 30) → Create a random sample (N subjects = 60, N items = 40) → Run our planned model and see if we get a significant result → Repeat with a new sample from the same population]
50. Monte Carlo Methods
• If we do this repeatedly, we will get multiple
significance tests, each on a different sample
• Goal: Find the sample
size(s) that let you detect the effect at least 80%
of the time (or whatever your desired power is)
• Will 40 subjects in each of 5 schools suffice?
• What about 50 subjects in each of 10 schools?
51. Week 12.2: Power
! Statistical Power
! Intro to Power Analysis
! Estimating Effect Size
! Why Do We Care About Power?
! Determining Power
! Power Simulations in R
! Influences on Power
! Lab
52. Power Simulations in R
• We can do these Monte Carlo
simulations in R with the simr
package
• If we already have a model &
dataset (“observed power”):
• powerSim(model1,
test=fixed('VariableName'), nsim=200)
• Tells us our power to detect a significant effect of
the fixed effect VariableName (e.g., 83%)
• nsim is the number of Monte Carlo simulations. More simulations give a more precise power estimate, but take longer to run
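A minimal end-to-end call, for concreteness (model1 and VariableName are placeholders for your own fitted lmer model and one of its fixed effects):

library(simr)
set.seed(123)  # optional: makes the simulated power estimate reproducible
ps <- powerSim(model1, test = fixed("VariableName"), nsim = 200)
ps  # prints the estimated power along with a confidence interval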
53. Power Simulations in R
• We can do these Monte Carlo
simulations in R with the simr
package
• If we already have a model &
dataset (“observed power”):
• powerSim(model1,
test=fixed('VariableName'), nsim=200)
• Could also do test=random('RandomEffectName')
to find our power to detect a significant random
effect
• Or test=fcompare(...), with a formula for the reduced fixed effects, to compare against a specific alternative (nested) model
54. Power Simulations in R
• What if I don’t have enough power?
• We may need to add more observations—more
subjects, items, classrooms, etc.
• plot(powerCurve(model1,
test=fixed('VariableName'),
along='Subject',
breaks=c(40,60,80,100)))
• Varies the number of Subjects and runs the power analysis at each sample size
• Specifically, tests power with 40, 60, 80, and 100 subjects
• Where do we hit 80% power?
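One practical note, sketched below: powerCurve can only test sample sizes up to what the dataset contains, so to explore larger samples you can first enlarge the design with simr's extend(). model1 and VariableName are again placeholders:

library(simr)
# Enlarge the design to 100 subjects so the larger break points exist:
model_big <- extend(model1, along = "Subject", n = 100)
pc <- powerCurve(model_big, test = fixed("VariableName"),
                 along = "Subject", breaks = c(40, 60, 80, 100),
                 nsim = 200)
plot(pc)  # look for where the curve crosses your target power (e.g., 80%)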
56. Power Simulations in R
• What if I don’t have my data yet?
• A priori power
• Can first build a simulated dataset & model with makeLmer(), then run powerSim() on it (see the sketch below)
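A minimal a priori sketch with makeLmer(), for concreteness. Every parameter value here is an assumption you would justify from pilot data or the literature, and all names are illustrative:

library(simr)

# Design skeleton: 20 subjects crossed with 40 items, two groups (assumed)
design <- expand.grid(Subject = factor(1:20), Item = factor(1:40))
design$Group <- ifelse(as.integer(design$Subject) <= 10, -0.5, 0.5)

fixef_vals <- c(723, 100)  # assumed intercept (ms) and group effect (ms)
subj_var   <- 30^2         # assumed Subject intercept variance (SD = 30)
resid_sd   <- 150          # assumed residual SD

model0 <- makeLmer(RT ~ Group + (1 | Subject),
                   fixef = fixef_vals, VarCorr = subj_var,
                   sigma = resid_sd, data = design)
powerSim(model0, test = fixed("Group"), nsim = 200)  # a priori power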
57. Week 12.2: Power
! Statistical Power
! Intro to Power Analysis
! Estimating Effect Size
! Why Do We Care About Power?
! Determining Power
! Power Simulations in R
! Influences on Power
! Lab
58. Influences on Power
• So what makes for a powerful design?
• Things that increase power:
• Larger effect size estimates for the fixed effects
• Bigger things are easier to find
• Larger sample size (at any level)
• More data = more confidence
• This one we can control
• Increasing sample size at a higher level (e.g., subjects
rather than time points within subjects) is more effective
• Variance of independent variables
• Easier to see an effect of income on happiness if people
vary in their income
• Hard to test effect of “number of fingers on your hand”
• With a categorical variable, an equal number of observations in each condition gives the most information
59. Influences on Power
• So what makes for a powerful design?
• Things that decrease power:
• Larger variance of random effects
• More differences between people (noise) make it harder
to see what’s consistent
• Larger error variance
• Again, more noise = harder to see consistent effects
• May be able to reduce both of these if you can add
covariates / control variables
60. Week 12.2: Power
! Statistical Power
! Intro to Power Analysis
! Estimating Effect Size
! Why Do We Care About Power?
! Determining Power
! Power Simulations in R
! Influences on Power
! Lab