In this talk I discuss our recent Bayesian reanalysis of the Reproducibility Project: Psychology.
The slides at the end include the technical details underlying the Bayesian model averaging method we employ.
Bayesian Bias Correction: Critically evaluating sets of studies in the presen... (Alexander Etz)
Slides from my recent talks on a new method called Bayesian Bias Correction. We attempt to mitigate the effects of publication bias on our estimates of effect size by trying to model the publication process itself.
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., pre-registration), while others are quite radical. The American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Crowd-sourced reproducibility research in psychology is helping to change the reward structure but has its own shortcomings. Focusing on purely statistical considerations, it tends to overlook problems with artificial experiments. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
Stephen Senn slides: "'Repligate': reproducibility in statistical studies. What does it mean and in what sense does it matter?" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," at the 2015 APS Annual Convention in NYC.
Exploratory Research is More Reliable Than Confirmatory Research (jemille6)
PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Presenter: Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania)
ABSTRACT: Ioannidis (2005) argued that most published research is false, and that “exploratory” research in which many hypotheses are assessed automatically is especially likely to produce false positive relations. Using simulations, Colquhoun (2014) estimates that 30 to 40% of positive results obtained with the conventional .05 cutoff for rejection of a null hypothesis are false. Their explanation is that true relationships in a domain are rare and the selection of hypotheses to test is roughly independent of their truth, so most relationships tested will in fact be false. Conventional use of hypothesis tests, in other words, suffers from a base rate fallacy. I will show that the reverse is true for modern search methods for causal relations because: a. each hypothesis is tested or assessed multiple times; b. the methods are biased against positive results; c. systems in which true relationships are rare are an advantage for these methods. I will substantiate the claim with both empirical data and simulations of data from systems with a thousand to a million variables that result in fewer than 5% false positive relationships and in which 90% or more of the true relationships are recovered.
"The Statistical Replication Crisis: Paradoxes and Scapegoats”jemille6
D. G. Mayo LSE Popper talk, May 10, 2016.
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when Big Data methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., preregistration), while others are quite radical. Recently, the American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Popular appeals to “diagnostic testing” that aim to improve replication rates may (unintentionally) permit the howlers and cookbook statistics we are at pains to root out. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
Statistical skepticism: How to use significance tests effectively (jemille6)
Prof. D. Mayo, presentation Oct. 12, 2017 at the ASA Symposium on Statistical Inference: “A World Beyond p < .05” in the session: “What are the best uses for P-values?”
D. Mayo: Replication Research Under an Error Statistical Philosophy (jemille6)
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
D. G. Mayo (Virginia Tech) "Error Statistical Control: Forfeit at your Peril" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
Controversy Over the Significance Test Controversy (jemille6)
Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) in PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
Deborah G. Mayo: Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
Presentation slides for: Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge[*] at the Boston Colloquium for Philosophy of Science (Feb 21, 2014).
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance (jemille6)
Slides from Rutgers Seminar talk by Deborah G Mayo
December 3, 2014
Rutgers, Department of Statistics and Biostatistics
Abstract: Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.
A. Gelman "50 shades of gray: A research story," presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
There are many questions one might ask of a clinical trial, ranging from "what was the effect in the patients studied?" to "what might the effect be in future patients?" via "what was the effect in individual patients?" The extent to which the answers to these questions are similar depends on various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear modelling and true multivariate-based modelling. These distinctions don't matter much for some purposes and under some circumstances, but for others they do.
There are many valid criticisms of P-values but the criticism that they are largely responsible for the reproducibility crisis has been accepted rather lightly in some quarters. Whatever the inferential statistic that is used, it is quite illogical to assume that as the sample size increases it will tend to show more evidence against the null hypothesis. This applies to Bayesian posterior probabilities as much as it does to P-values. In the context of P-values it can be referred to as the trend towards significance fallacy but more generally, for reasons I shall explain, it could be referred to as the anticipated evidence fallacy.
The anticipated evidence fallacy is itself an example of the overstated evidence fallacy. I shall also discuss this fallacy and other relevant matters affecting reproducible science including the problem of false negatives.
Replication Crises and the Statistics Wars: Hidden Controversies (jemille6)
D. Mayo presentation at the X-Phil conference on "Reproducibility and Replicability in Psychology and Experimental Philosophy", University College London (June 14, 2018)
Severe Testing: The Key to Error Correction (jemille6)
D. G. Mayo's slides for her presentation given March 17, 2017 at the Boston Colloquium for Philosophy of Science, Alfred I. Taub forum: "Understanding Reproducibility & Error Correction in Science"
D. G. Mayo: Your data-driven claims must still be probed severely (jemille6)
In the session "Philosophy of Science and the New Paradigm of Data-Driven Science at the American Statistical Association Conference on Statistical Learning and Data Science/Nonparametric Statistics
It is argued that when it comes to nuisance parameters an assumption of ignorance is harmful. On the other hand this raises problems as to how far one should go in searching for further data when combining evidence.
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int... (jemille6)
Gerd Gigerenzer (Director of the Max Planck Institute for Human Development, Berlin, Germany) in the PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
What should we expect from reproducibility (Stephen Senn)
Is there really a reproducibility crisis and, if so, are P-values to blame? Choose any statistic you like and carry out two identical independent studies and report this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple, it is because you have other information in some form. However, it is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a Martingale property. On average you should be in the future where you are now but, of course, your inferential random walk may lead to some peregrination before it homes in on “the truth”. But you certainly can't generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position, not a movement. Although often claimed, there is no such thing as a trend towards significance.
Using these and other philosophical considerations I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are being calculated appropriately and rather less to the inferential framework.
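Senn's Martingale point can be verified directly. A minimal sketch (not from the talk), assuming a toy comparison of H1 (a coin lands heads with probability 0.7) against H0 (probability 0.5):

```python
# Minimal sketch (not from Senn's slides) of the Martingale property of Bayesian
# forecasts: the expected posterior probability of a hypothesis, averaged over the
# predictive distribution of the next observation, equals the current posterior.
# Toy setup: H1 says the coin lands heads with probability 0.7, H0 says 0.5.

def posterior_h1(heads, tails, prior_h1=0.5):
    """Posterior probability of H1 after observing `heads` heads and `tails` tails."""
    lik_h1 = 0.7 ** heads * 0.3 ** tails
    lik_h0 = 0.5 ** (heads + tails)
    return prior_h1 * lik_h1 / (prior_h1 * lik_h1 + (1 - prior_h1) * lik_h0)

heads, tails = 6, 4
current = posterior_h1(heads, tails)

# Model-averaged predictive probability that the next flip is heads
p_heads_next = current * 0.7 + (1 - current) * 0.5

# Expected posterior after one more flip, averaged over that prediction
expected_next = (p_heads_next * posterior_h1(heads + 1, tails)
                 + (1 - p_heads_next) * posterior_h1(heads, tails + 1))

print(current, expected_next)  # the two numbers agree: a Martingale
```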
The Seven Habits of Highly Effective Statisticians (Stephen Senn)
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws at you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn't.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself 'I love the smell of data in the morning'.
The replication crisis: are P-values the problem and are Bayes factors the so... (StephenSenn2)
Today’s posterior is tomorrow’s prior. Dennis Lindley [1] (p. 2)
It has been claimed that science is undergoing a replication crisis and that when looking for culprits, the cult of significance is the chief suspect. It has also been claimed that Bayes factors might provide a solution.
In my opinion, these claims are misleading, and part of the problem is our understanding of the purpose and nature of replication, which has only recently been subject to formal analysis [2]. What we are or should be interested in is truth. Replication is a coherence, not a correspondence, requirement [3] and one that has a strong dependence on the size of the replication study [4].
Consideration of Bayes factors raises a puzzling question. Should the Bayes factor for a replication study be calculated as if it were the initial study? If the answer is yes, the approach is not fully Bayesian and furthermore the Bayes factors will be subject to exactly the same replication ‘paradox’ as P-values. If the answer is no, then in what sense can an initially found Bayes factor be replicated and what are the implications for how we should view replication of P-values?
A further issue is that little attention has been paid to false negatives and, by extension, to true negatives. Yet, as is well known from the theory of diagnostic tests, it is meaningless to consider the performance of a test in terms of false positives alone.
I shall argue that we are in danger of confusing evidence with the conclusions we draw and that any reforms of scientific practice should concentrate on producing evidence that is as reliable as it can be qua evidence. There are many basic scientific practices in need of reform. Pseudoreplication [5], for example, and the routine destruction of information through dichotomisation [6] are far more serious problems than many matters of inferential framing that seem to have excited statisticians.
References
1. Lindley DV. Bayesian statistics: A review. SIAM; 1972.
2. Devezer B, Navarro DJ, Vandekerckhove J, Ozge Buzbas E. The case for formal methodology in scientific reform. R Soc Open Sci. Mar 31 2021;8(3):200805. doi:10.1098/rsos.200805
3. Walker RCS. Theories of Truth. In: Hale B, Wright C, Miller A, eds. A Companion to the Philosophy of Language. John Wiley & Sons; 2017:532-553 (chap. 21).
4. Senn SJ. A comment on replication, p-values and evidence, by S. N. Goodman, Statistics in Medicine 1992;11:875-879 [Letter]. Statistics in Medicine. 2002;21(16):2437-44.
5. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological monographs. 1984;54(2):187-211.
6. Senn SJ. Being Efficient About Efficacy Estimation. Statistics in Biopharmaceutical Research. 2013;5(3):204-210. doi:10.1080/19466315.2012.754726
1. A Bayesian Perspective On The Reproducibility Project: Psychology
Alexander Etz & Joachim Vandekerckhove
@alxetz ← My Twitter (no 'e' in alex)
alexanderetz.com ← My website/blog
2. Purpose
• Revisit Reproducibility Project: Psychology (RPP)
• Compute Bayes factors
• Account for publication bias in original studies
• Evaluate and compare levels of statistical evidence
3. TLDR: Conclusions first
• 75% of studies find qualitatively similar levels of evidence in original and replication
• 64% find weak evidence (BF < 10) in both attempts
• 11% of studies find strong evidence (BF > 10) in both attempts
4. TLDR: Conclusions first
• 75% of studies find qualitatively similar levels of evidence in original and replication
• 64% find weak evidence (BF < 10) in both attempts
• 11% of studies find strong evidence (BF > 10) in both attempts
• 10% find strong evidence in replication but not original
• 15% find strong evidence in original but not replication
5. The RPP
• 270 scientists attempt to closely replicate 100 psychology studies
• Use original materials (when possible)
• Work with original authors
• Pre-registered to avoid bias
• Analysis plan specified in advance
• Guaranteed to be published regardless of outcome
6. The RPP
• 2 main criteria for grading replication
7. The RPP
• 2 main criteria for grading replication
• Is the replication result statistically significant (p < .05) in the same direction as original?
• 39% success rate
8. The RPP
• 2 main criteria for grading replication
• Is the replication result statistically significant (p < .05) in the same direction as original?
• 39% success rate
• Does the replication’s confidence interval capture the original reported effect?
• 47% success rate
9. The RPP
• Neither of these metrics is any good
• (at least not as used)
• Neither makes predictions about out-of-sample data
• Comparing significance levels is bad
• “The difference between significant and not significant is not necessarily itself significant”
• -Gelman & Stern (2006)
10. The RPP
• Nevertheless, .51 correlation between original & replication effect sizes
• Indicates at least some level of robustness
11. [Figure: scatter plot of original effect size (x-axis, 0.00 to 1.00) against replication effect size (y-axis, −0.50 to 1.00), with points marked by replication p-value (significant vs. not significant) and scaled by replication power (0.6 to 0.9). Source: Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.]
13. Moderators
• Two study attempts are in different contexts
• Texas vs. California
• Different context = different results?
• Conservative vs. Liberal sample
14. Low power in original studies
• Statistical power:
• The frequency with which a study will yield a statistically significant effect in repeated sampling, assuming that the underlying effect is of a given size.
• Low powered designs undermine credibility of statistically significant results
• Button et al. (2013)
• Type M / Type S errors (Gelman & Carlin, 2014)
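For reference, power can be computed from the noncentral t distribution. A minimal sketch (not from the slides), assuming a two-sided, two-sample t-test with equal group sizes:

```python
# Minimal sketch (not from the slides) of a power calculation for a two-sided,
# two-sample t-test with equal group sizes, using the noncentral t distribution.
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power to detect standardized effect size d with n_per_group per arm."""
    df = 2 * n_per_group - 2
    nc = d * (n_per_group / 2) ** 0.5          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(|T| > t_crit) when T follows a noncentral t with noncentrality nc
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

print(power_two_sample_t(d=0.5, n_per_group=20))   # roughly 0.33
print(power_two_sample_t(d=0.5, n_per_group=64))   # roughly 0.80
```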
15. Low power in original studies
• Replications planned to have minimum 80% power
• Report average power of 92%
16. Publication bias
• Most published results are “positive” findings
• Statistically significant results
• Most studies designed to reject H0
• Most published studies succeed
• Selective preference = bias
• “Statistical significance filter”
17. Statistical significance filter
• Incentive to have results that reach p < .05
• “Statistically significant”
• Evidential standard
• Studies with large effect size achieve significance
• Get published
18. Statistical significance filter
• Studies with smaller effect size don’t reach significance
• Get suppressed
• Average effect size inevitably inflates
• Replication power calculations meaningless
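The inflation is easy to simulate. A minimal sketch (not from the slides), assuming a true standardized effect of d = 0.2 with n = 20 per group, where only significant positive results are published:

```python
# Simulation sketch (not from the slides) of the statistical significance filter:
# when only significant results are published, the average published effect size
# is inflated relative to the true effect. Assumed true d = 0.2, n = 20 per group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, reps = 0.2, 20, 20_000

published = []
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treatment, control)
    if p < 0.05 and t > 0:                     # the filter: only "positive" findings
        pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
        published.append((treatment.mean() - control.mean()) / pooled_sd)

print(np.mean(published))   # typically around 0.8, far above the true 0.2
```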
19. Can we account for this bias?
• Consider publication as part of data collection process
• This enters through likelihood function
• Data generating process
• Sampling distribution
20. Mitigation of publication bias
• Remember the statistical significance filter
• We try to build a statistical model of it
21. Mitigation of publication bias
• We formally model 4 possible significance filters
• 4 models comprise overall H0
• 4 models comprise overall H1
• If result consistent with bias, then Bayes factor penalized
• Raise the evidence bar
22. Mitigation of publication bias
• Expected distribution of test statistics that make it to the literature.
23. Mitigation of publication bias
• None of these are probably right
• (Definitely all wrong)
• But it is a reasonable start
• Doesn’t matter really
• We’re going to mix and mash them all together
• “Bayesian Model Averaging”
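In symbols (anticipating the technical slides at the end), the bias-corrected Bayes factor is a ratio of marginal likelihoods averaged over the four bias models, where Mw+ and Mw− are the marginal likelihoods of the data under H1 and H0 for bias model w:

```latex
BF_{10} \;=\; \frac{\sum_{w} p(w)\, M_{w+}}{\sum_{w} p(w)\, M_{w-}}
```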
24. The Bayes Factor
• How the data shift the balance of evidence
• Ratio of predictive success of the models
BF10 = p(data | H1) / p(data | H0)
26. The Bayes Factor
• BF10 > 1 means evidence favors H1
• BF10 < 1 means evidence favors H0
• Need to be clear what H0 and H1 represent
BF10 = p(data | H1) / p(data | H0)
28. The Bayes Factor
• H0: d = 0
• H1: d ≠ 0
BF10 = p(data | H1) / p(data | H0)
29. The Bayes Factor
• H0: d = 0
• H1: d ≠ 0 (BAD)
• Too vague
• Doesn’t make predictions
BF10 = p(data | H1) / p(data | H0)
30. The Bayes Factor
• H0: d = 0
• H1: d ~ Normal(0, 1)
• The effect is probably small
• Almost certainly -2 < d < 2”
BF10 =
p(data | H1)
p(data | H0 )
32. Statistical evidence
• Do independent study attempts obtain similar amounts of evidence?
• Same prior distribution for both attempts
• Measuring general evidential content
• We want to evaluate evidence from outsider perspective
33. Interpreting evidence
• “How convincing would these data be to a neutral observer?”
• 1:1 prior odds for H1 vs. H0
• 50% prior probability for each
34. Interpreting evidence
• BF > 10 is sufficiently evidential
• 10:1 posterior odds for H1 vs. H0 (or vice versa)
• 91% posterior probability for H1 (or vice versa)
35. Interpreting evidence
• BF > 10 is sufficiently evidential
• 10:1 posterior odds for H1 vs. H0 (or vice versa)
• 91% posterior probability for H1 (or vice versa)
• BF of 3 is too weak
• 3:1 posterior odds for H1 vs. H0 (or vice versa)
• Only 75% posterior probability for H1 (or vice versa)
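The arithmetic behind these numbers is just the conversion from prior odds and Bayes factor to posterior probability; a minimal sketch:

```python
# Sketch (not from the slides) of the arithmetic behind slides 33-35: with 1:1
# prior odds, the posterior probability of H1 is BF10 / (1 + BF10).
def posterior_prob_h1(bf10, prior_odds=1.0):
    post_odds = prior_odds * bf10
    return post_odds / (1 + post_odds)

print(posterior_prob_h1(10))   # 0.909... -> the "91%" on the slide
print(posterior_prob_h1(3))    # 0.75     -> the "only 75%" on the slide
```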
36. Interpreting evidence
• It depends on context (of course)
• You can have higher or lower standards of evidence
37. Interpreting evidence
• How do p values stack up?
• American Statistical Association:
• “Researchers should recognize that a p-value … near 0.05 taken by itself offers only weak evidence against the null hypothesis.”
38. Interpreting evidence
• How do p values stack up?
• p < .05 is weak standard
• p = .05 corresponds to BF ≤ 2.5 (at BEST)
• p = .01 corresponds to BF ≤ 8 (at BEST)
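The slides do not say which calibration yields these bounds; the −e·p·ln(p) bound of Sellke, Bayarri & Berger (2001) reproduces them, so here is a sketch under that assumption:

```python
# Sketch of the -e * p * ln(p) calibration (Sellke, Bayarri & Berger, 2001), which
# bounds how strongly a p-value can ever favor H1 over H0. The slides do not name
# their calibration; this one reproduces the quoted numbers.
import math

def max_bf10(p):
    """Upper bound on BF10 implied by a p-value (valid for p < 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

print(max_bf10(0.05))   # about 2.46 -> "p = .05 corresponds to BF <= 2.5 at best"
print(max_bf10(0.01))   # about 8.0  -> "p = .01 corresponds to BF <= 8 at best"
```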
50. Results
• Replication studies, face value
• No chance for bias, no need for correction
• 21% obtain BF10 > 10
• 79% obtain 1/10 < BF10 < 10
• 0 obtain BF10 < 1/10
52. Consistency of results
• No alarming inconsistencies
• 46 cases where both original and replication show only weak evidence
• Only 8 cases where both show BF10 > 10
53. Consistency of results
• 11 cases where original BF10 > 10, but not replication
• 7 cases where replication BF10 > 10, but not original
• In every case, the study obtaining strong evidence had the larger sample size
55. Moderators?
• As Laplace would say, we have no need for that hypothesis
• Results adequately explained by:
• Publication bias in original studies
• Generally weak standards of evidence
56. Take home message
• Recalibrate our intuitions about statistical evidence
• Reevaluate expectations for replications
• Given weak evidence in original studies
58. Thank you
@alxetz ← My Twitter (no 'e' in alex)
alexanderetz.com ← My website/blog
@VandekerckhoveJ ← Joachim’s Twitter
joachim.cidlab.com ← Joachim’s website
59. Want to learn more Bayes?
• My blog:
• alexanderetz.com/understanding-bayes
• “How to become a Bayesian in eight easy steps”
• http://tinyurl.com/eightstepsdraft
• JASP summer workshop
• https://jasp-stats.org/
• Amsterdam, August 22-23, 2016
61. Mitigation of publication bias
• Model 1: No bias
• Every study has same chance of publication
• Regular t distributions
• H0 true: Central t
• H0 false: Noncentral t
• These are used in standard BFs
62. Mitigation of publication bias
• Model 2: Extreme bias
• Only statistically significant results published
• t distributions but…
• Zero density in the middle
• Spikes in significant regions
63. Mitigation of publication bias
• Model 3: Constant-bias
• Nonsignificant results published x% as often as significant results
• t distributions but…
• Central regions downweighted
• Large spikes over significance regions
64. Mitigation of publication bias
• Model 4: Exponential bias
• “Marginally significant” results have a chance to be published
• Harder as p gets larger
• t likelihoods but…
• Spikes over significance regions
• Quick decay to zero density as (p – α) increases
65. Calculate face-value BF
• Take the likelihood: tn(x | δ)
• Integrate with respect to prior distribution, p(δ)
66. Calculate face-value BF
• Take the likelihood: tn(x | δ)
• Integrate with respect to prior distribution, p(δ)
• “What is the average likelihood of the data given this model?”
• Result is the marginal likelihood, M
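A minimal numerical sketch of this face-value Bayes factor (my own, assuming a one-sample design so that the noncentrality parameter is δ√n, with the H1 prior δ ~ Normal(0, 1) from slide 30):

```python
# Sketch (not from the slides) of a face-value Bayes factor for an observed t
# statistic, assuming a one-sample design (noncentrality = delta * sqrt(n)) and
# the H1 prior delta ~ Normal(0, 1) from slide 30.
from scipy import stats, integrate

def face_value_bf10(t_obs, n):
    df = n - 1

    def integrand(delta):
        # likelihood of the observed t under effect size delta, times the prior
        return stats.nct.pdf(t_obs, df, delta * n ** 0.5) * stats.norm.pdf(delta)

    m1, _ = integrate.quad(integrand, -8, 8)   # marginal likelihood under H1
    m0 = stats.t.pdf(t_obs, df)                # likelihood under H0 (delta = 0)
    return m1 / m0

print(face_value_bf10(t_obs=2.5, n=30))   # modest evidence for H1
print(face_value_bf10(t_obs=0.3, n=30))   # evidence leaning toward H0
```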
69. Calculate mitigated BF
• Start with regular t likelihood function: tn(x | δ)
• Multiply it by bias function: w = {1, 2, 3, 4}
• Where w=1 is no bias, w=2 is extreme bias, etc.
70. Calculate mitigated BF
• Start with regular t likelihood function: tn(x | δ)
• Multiply it by bias function: w = {1, 2, 3, 4}
• Where w=1 is no bias, w=2 is extreme bias, etc.
• E.g., when w=3 (constant bias): tn(x | δ) × w(x | θ)
72. Calculate mitigated BF
• Too messy
• Rewrite as a new function
• H1: δ ~ Normal(0, 1): pw+(x | n, δ, θ) ∝ tn(x | δ) × w(x | θ)
73. Calculate mitigated BF
• Too messy
• Rewrite as a new function
• H1: δ ~ Normal(0, 1): pw+(x | n, δ, θ) ∝ tn(x | δ) × w(x | θ)
• H0: δ = 0: pw−(x | n, θ) = pw+(x | n, δ = 0, θ)
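To make the bias function concrete, here is a sketch of illustrative weight functions for the four models on slides 61-64 (my own simplified forms; the forms used in the actual analysis may differ):

```python
# Illustrative weight functions w(x | theta) for the four bias models on slides
# 61-64 (my own simplified forms; the exact forms used in the analysis may differ).
# x is the observed t statistic, t_crit the two-sided significance cutoff.
import math

def w_no_bias(x, t_crit, theta=None):
    return 1.0                                   # every result equally publishable

def w_extreme_bias(x, t_crit, theta=None):
    return 1.0 if abs(x) > t_crit else 0.0       # only significant results survive

def w_constant_bias(x, t_crit, theta=0.2):
    return 1.0 if abs(x) > t_crit else theta     # nonsignificant published theta as often

def w_exponential_bias(x, t_crit, theta=5.0):
    if abs(x) > t_crit:
        return 1.0
    return math.exp(-theta * (t_crit - abs(x)))  # decays as the result gets less significant

# The biased sampling density is proportional to the t likelihood times the weight:
#   pw+(x | n, delta, theta) ∝ tn(x | delta) * w(x | theta)
# (a full implementation renormalizes this product over x so it integrates to 1).
```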
74. Calculate mitigated BF
• Integrate w.r.t. p(θ) and p(δ)
• “What is the average likelihood of the data given each bias model?”
• p(data | bias model w):
75. Calculate mitigated BF
• Integrate w.r.t. p(θ) and p(δ)
• “What is the average likelihood of the data given each bias model?”
• p(data | bias model w):
Mw+ = ∫Δ ∫Θ pw+(x | n, δ, θ) p(δ) p(θ) dδ dθ
76. Calculate mitigated BF
• Integrate w.r.t. p(θ) and p(δ)
• “What is the average likelihood of the data given each bias model?”
• p(data | bias model w):
Mw+ = ∫Δ ∫Θ pw+(x | n, δ, θ) p(δ) p(θ) dδ dθ
Mw− = ∫Θ pw−(x | n, θ) p(θ) dθ
77. Calculate mitigated BF
• Take Mw+ and Mw- and multiply by the weights of the corresponding bias model, then sum within each hypothesis
78. Calculate mitigated BF
• Take Mw+ and Mw- and multiply by the weights of the corresponding bias model, then sum within each hypothesis
• For H1: p(w=1)M1+ + p(w=2)M2+ + p(w=3)M3+ + p(w=4)M4+
79. Calculate mitigated BF
• Take Mw+ and Mw- and multiply by the weights of the corresponding bias model, then sum within each hypothesis
• For H1: p(w=1)M1+ + p(w=2)M2+ + p(w=3)M3+ + p(w=4)M4+
• For H0: p(w=1)M1− + p(w=2)M2− + p(w=3)M3− + p(w=4)M4−
82. Calculate mitigated BF
• Messy, so we restate as sums:
• For H1: ∑w p(w) Mw+
• For H0: ∑w p(w) Mw−
83. Calculate mitigated BF
• Messy, so we restate as sums:
• For H1: p(data | H1) = ∑w p(w) Mw+
• For H0: p(data | H0) = ∑w p(w) Mw−
• BF10 = ∑w p(w) Mw+ / ∑w p(w) Mw−
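Putting slides 75-83 together, the mitigated Bayes factor is a weighted average of per-model marginal likelihoods under H1 divided by the same average under H0. A runnable sketch with placeholder numbers (the M values below are invented purely for illustration):

```python
# Sketch of the final model-averaging step (slides 77-83). The marginal likelihoods
# below are placeholder numbers purely for illustration; in the real analysis each
# M_w is computed by integrating the bias-weighted likelihood over the priors
# p(delta) and p(theta).
model_weights = {"no_bias": 0.25, "extreme": 0.25, "constant": 0.25, "exponential": 0.25}

M_plus  = {"no_bias": 0.062, "extreme": 0.020, "constant": 0.031, "exponential": 0.024}  # under H1
M_minus = {"no_bias": 0.021, "extreme": 0.015, "constant": 0.017, "exponential": 0.016}  # under H0

p_data_h1 = sum(model_weights[w] * M_plus[w] for w in model_weights)    # sum_w p(w) Mw+
p_data_h0 = sum(model_weights[w] * M_minus[w] for w in model_weights)   # sum_w p(w) Mw-

bf10_mitigated = p_data_h1 / p_data_h0
print(bf10_mitigated)   # the bias-corrected (mitigated) Bayes factor
```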