Talk given to the Rhodes Biomedical Association, 4th May 2016.
For references see: http://www.slideshare.net/deevybishop/references-on-reproducibility-crisis-in-science-by-dvm-bishop
What is the reproducibility crisis in science and what can we do about it?
1. What is the reproducibility crisis in science and what can we do about it?
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
2. What is the problem?
“There is increasing concern about the reliability of biomedical research, with recent articles suggesting that up to 85% of research funding is wasted.”
Bustin, S. A. (2015). The reproducibility of biomedical research: Sleepers awake! Biomolecular Detection and Quantification
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124
5. Which Article Should You Write?
There are two possible articles you can write: (a) the article you planned to write when you designed your study or (b) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (b).
re Data Analysis: Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something--anything--interesting.
Daryl J. Bem, Writing the Empirical Journal Article. In: The Compleat Academic: A Practical Guide for the Beginning Social Scientist, 2nd Edition. Washington, DC: American Psychological Association, 2004.
“This book provides invaluable guidance that will help new academics plan, play, and ultimately win the academic career game.”
Explicitly advises HARKing!
6. Hypothetico-deductive scientific method (based on original by Chris Chambers)
Generate and specify hypotheses -> Design study -> Collect data -> Analyse data & test hypotheses -> Interpret data -> Publish or conduct next experiment
Threat highlighted: p-hacking
P-hacking: doing many tests and only reporting the significant ones. Collecting extra data or removing outliers to push ‘nearly significant’ results over the boundary.
How common?
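The effect of "doing many tests and only reporting the significant ones" can be illustrated with a small simulation. This is a minimal sketch, not taken from the talk: it assumes normally distributed outcomes with a known SD of 1, no true effect, and independent outcome measures, and the function names (`z_test_p`, `false_positive_rate`) are hypothetical.

```python
import math
import random


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (assumes known SD = 1, equal n)."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000,
                        alpha=0.05, seed=1):
    """Share of null studies in which at least one outcome reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no real effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p(a, b))
        # The "p-hacker" reports the study as positive if any test is significant.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


rate_one = false_positive_rate(n_outcomes=1)
rate_ten = false_positive_rate(n_outcomes=10)
print(f"1 outcome tested:   {rate_one:.2f}")
print(f"10 outcomes tested: {rate_ten:.2f}")
```

With one planned test the false positive rate should sit near the nominal 5%. With ten independent tests and selective reporting it should approach 1 - 0.95^10, or about 40%, under these assumptions.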
8. Hypothetico-deductive scientific method (based on original by Chris Chambers), same cycle as slide 6
Threats highlighted: p-hacking; low statistical power
Low statistical power: sample size too small to detect real effect
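To make "sample size too small to detect real effect" concrete, the sketch below computes approximate power for a two-group comparison. It is an illustration under stated assumptions rather than anything from the talk: it uses a normal approximation to the two-sample test, a standardised effect size, and a hypothetical helper name (`power_two_groups`).

```python
import math
from statistics import NormalDist


def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true standardised effect is effect_size.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


power_small = power_two_groups(effect_size=0.5, n_per_group=10)
power_large = power_two_groups(effect_size=0.5, n_per_group=64)
power_null = power_two_groups(effect_size=0.0, n_per_group=20)
for n in (10, 20, 64):
    print(n, round(power_two_groups(0.5, n), 2))
```

Under this approximation, a medium effect (0.5 SD) needs roughly 64 participants per group to reach about 80% power. With 10 per group, most studies of a real effect of that size would miss it.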
9. Button KS et al. 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14:365-376.
[Figure: median power of studies included in neuroscience meta-analyses]
10. Hypothetico-deductive scientific method (based on original by Chris Chambers), same cycle as slide 6
Threats highlighted: p-hacking; low statistical power; publication bias
Publication bias: null findings don’t get published, so the literature is distorted
Fanelli, 2010: 92% papers report positive findings
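How unpublished null findings distort the literature can be sketched with a simulation. The numbers here are assumptions chosen for illustration, not figures from the talk: a small true effect of 0.2 SD, 20 participants per group, known SD of 1, and a hypothetical rule that only results with p < 0.05 are "published".

```python
import math
import random


def simulate_literature(true_effect=0.2, n_per_group=20, n_studies=5000,
                        alpha=0.05, seed=2):
    """Return observed effects for all studies and for the 'published' subset."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # standard error of the mean difference (SD = 1)
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        p = math.erfc(abs(diff / se) / math.sqrt(2.0))
        all_effects.append(diff)
        if p < alpha:  # hypothetical publication filter
            published.append(diff)
    return all_effects, published


all_effects, published = simulate_literature()
mean_all = sum(all_effects) / len(all_effects)
mean_published = sum(published) / len(published)
share_published = len(published) / len(all_effects)
print(f"mean effect, all studies:       {mean_all:.2f}")
print(f"mean effect, published studies: {mean_published:.2f}")
print(f"share of studies published:     {share_published:.2f}")
```

Across all simulated studies the average effect should be close to the true value. The published subset should show a much larger average, because only the studies that overestimated the effect cleared the significance filter.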
11. Hypothetico-deductive scientific method, same cycle as slide 6
Threats highlighted: p-hacking; low statistical power; publication bias; failure to control for bias
Failure to control for bias: methods to avert bias not reported
MacLeod et al, 2015: in vivo research, only around 25% papers reported randomisation/blinding
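Randomisation and blinding can be simple to set up and to report. The sketch below shows one possible way to allocate subjects to balanced groups and hand the outcome assessor only coded labels. It is an illustrative assumption, not a procedure described in the talk, and the function name, group names, and code format are hypothetical.

```python
import random


def randomise_blinded(subject_ids, groups=("treatment", "control"), seed=None):
    """Randomly allocate subjects to balanced groups and issue blinded codes.

    Returns the blinded codes (all the assessor sees) and a key mapping each
    code to (subject_id, group), to be held by someone not scoring outcomes.
    """
    rng = random.Random(seed)
    # Balanced allocation: cycle through the group labels, then shuffle them.
    labels = [groups[i % len(groups)] for i in range(len(subject_ids))]
    rng.shuffle(labels)
    # Unique random codes carry no information about group membership.
    codes = rng.sample(range(1000, 10000), len(subject_ids))
    key = {}
    for subject, label, code in zip(subject_ids, labels, codes):
        key[f"S{code}"] = (subject, label)
    # Sort the codes so their order does not reveal allocation order.
    return sorted(key), key


blinded_ids, allocation_key = randomise_blinded(
    ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"], seed=42
)
print(blinded_ids)
```

Recording the seed and the allocation key alongside the data makes it possible to state in the methods section exactly how randomisation and blinding were done.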
12. Hypothetico-deductive scientific method, same cycle as slide 6
Threats highlighted: p-hacking; low statistical power; publication bias; failure to control for bias; poor quality control, e.g. misidentified cell lines/reagents
13. Bustin (2015) on RNA biomarkers:
“molecular techniques can be unfit for purpose”
Poor fidelity of reagents/cell lines
15. Historical timeline: concerns about reproducibility
1956 De Groot
Failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests
16. Timeline: 1956 De Groot; 1975 Greenwald
“As it is functioning in at least some areas of behavioral science research, the research-publication system may be regarded as a device for systematically generating and propagating anecdotal information.”
18. 1956 De Groot; 1975 Greenwald; 1979 Rosenthal; 1987 Newcombe
Newcombe, 1987: “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.”
19. 1956 De Groot; 1975 Greenwald; 1979 Rosenthal; 1987 Newcombe; 1993 Dickersin & Min
Dickersin & Min, 1993: Clinical trials with ‘significant’ results substantially more likely to be published. “Most unpublished trials remained so because investigators thought the results were ‘not interesting’ or they ‘did not have enough time’”
20. 1956 De Groot; 1975 Greenwald; 1979 Rosenthal; 1987 Newcombe; 1993 Dickersin & Min; 1999 Macleod et al
Macleod et al, 1999: “The misidentified cell lines reported here have already been unwittingly used in several hundreds of potentially misleading reports, including use as inappropriate tumor models and subclones masquerading as independent replicates.”
21. Why is this making headlines now?
• Increase in studies quantifying the problem
• Concern from those who use research:
• Doctors and Patients
• Pharma companies
• Social media
“It really is striking just for how long there have been reports about the poor
quality of research methodology, inadequate implementation of research
methods and use of inappropriate analysis procedures as well as lack of
transparency of reporting. All have failed to stir researchers, funders,
regulators, institutions or companies into action”. Bustin, 2014
22. Failure to appreciate power of ‘the prepared mind’
Natural instinct is to look for consistent evidence, not disproof
Problems caused by researchers: 1
23. “The self-deception comes in
that over the next 20 years,
people believed they saw
specks of light that
corresponded to what they
thought Vulcan should look
during an eclipse: round objects
crossing the face of the sun,
which were interpreted as
transits of Vulcan.”
24. Seeing things in complex data requires skill
Bailey and von Bonin (1951) noted problems in
Brodmann's approach — lack of observer
independency, reproducibility and objectivity
Yet have stood test of time: still used today
Brodmann areas, 1909
25. Seeing things in complex data requires skill
Or pareidolia
Bailey and von Bonin (1951) noted problems in
Brodmann's approach — lack of observer
independency, reproducibility and objectivity
Yet have stood test of time: still used today
Brodmann areas, 1909
26. Discusses failure to replicate studies on preferential looking in babies – role of experimenter expertise
27. Special expertise or Jesus in toast?
How to decide
• Eradicate subjectivity from methods
• Adopt standards from industry for checking/double-checking
• Automate data collection and analysis as far as possible
• Make recordings of methods (e.g. Journal of Visualized Experiments)
• Make data and analysis scripts open
28. Failure to understand statistics (esp. p-values and power)
http://deevybee.blogspot.co.uk/2016/01/the-amazing-significo-why-researchers.html
Problems caused by researchers: 2
29. Gelman A, and Loken E. 2013. The garden of forking
paths: Why multiple comparisons can be a problem,
even when there is no 'fishing expedition' or 'p-hacking'
and the research hypothesis was posited ahead of
time.
www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
"El jardín de senderos que se bifurcan"
30. 1 contrast
Probability of a
‘significant’ p-value
< .05 = .05
Large population
database used to explore
link between ADHD and
handedness
https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
31. Focus just on Young
subgroup:
2 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .10
Large population
database used to explore
link between ADHD and
handedness
32. Focus just on Young on
measure of hand skill:
4 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .19
Large population
database used to explore
link between ADHD and
handedness
33. Focus just on Young,
Females on
measure of hand skill:
8 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .34
Large population
database used to explore
link between ADHD and
handedness
34. Focus just on Young,
Urban, Females on
measure of hand skill:
16 contrasts at this level
Probability of a
‘significant’ p-value < .05
= .56
Large population
database used to explore
link between ADHD and
handedness
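The probabilities on the forking-paths slides (.05, .10, .19, .34, .56) match the chance of at least one ‘significant’ result among k contrasts when every null hypothesis is true. A minimal sketch, assuming the contrasts are independent; `prob_any_significant` is a hypothetical helper name.

```python
def prob_any_significant(n_contrasts, alpha=0.05):
    """P(at least one p < alpha) over independent contrasts when all nulls are true."""
    return 1 - (1 - alpha) ** n_contrasts

# Number of contrasts available at each fork: whole sample, then successive subgroups.
for k in (1, 2, 4, 8, 16):
    print(k, round(prob_any_significant(k), 2))
```

Each extra subgroup split doubles the number of available contrasts, so the chance of a spurious ‘finding’ passes 50% after four splits.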
35. Problem exacerbated because
• Can now easily gather huge multivariate datasets
• Can easily do complex statistical analyses
Problems with exploratory analyses
that use methods that presuppose
hypothesis-testing approach
41. Illustrated with field of ERP/EEG
• Flexibility in analysis in terms of:
• Electrodes
• Time intervals
• Frequency ranges
• Measurement of peaks
• etc, etc
• Often see analyses with 4- or 5-way ANOVA (group x side x site x condition x interval)
• Standard stats packages correct p-values for N levels WITHIN a factor, but not for overall N factors and interactions
Cramer AOJ, et al 2016. Hidden multiplicity in exploratory multiway ANOVA: Prevalence and
remedies. Psychonomic Bulletin & Review 23:640-647
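To make the hidden multiplicity concrete, this sketch counts the main effects and interactions tested in a full factorial ANOVA with the five factors named above. The chance of at least one false positive assumes the tests are independent and all nulls are true, which is a simplification; `anova_effects` is a hypothetical helper name.

```python
from itertools import combinations

def anova_effects(factors):
    """List every main effect and interaction tested in a full factorial ANOVA."""
    return [" x ".join(combo)
            for size in range(1, len(factors) + 1)
            for combo in combinations(factors, size)]

factors = ["group", "side", "site", "condition", "interval"]
effects = anova_effects(factors)
n_tests = len(effects)               # 2**k - 1 effects for k factors
p_any = 1 - (1 - 0.05) ** n_tests    # assumes independent tests, all nulls true
print(n_tests, round(p_any, 2))
```

A 5-way design therefore carries 31 uncorrected tests, and under these assumptions the chance of at least one ‘significant’ effect is about 0.8.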
43. Solutions
b. Distinguish exploration from hypothesis-testing analyses
• Subdivide data into exploration and replication sets.
• Or replicate in another dataset
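One way to subdivide a dataset is a single random split made before any analysis, with the seed recorded so the split can be reproduced. A minimal sketch; the function name, the 50/50 fraction, the seed and the participant IDs are all illustrative assumptions.

```python
import random

def split_explore_replicate(records, explore_fraction=0.5, seed=2016):
    """Randomly split records into an exploration set and a held-out replication set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]

participant_ids = list(range(200))  # hypothetical participant IDs
explore, replicate = split_explore_replicate(participant_ids)
print(len(explore), len(replicate))
```

Exploratory analyses run only on `explore`; any hypothesis they suggest is then tested once on `replicate`.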
44. Solutions
c. Masked data
Comparison of coronary care units vs treatment at home
From Ben Goldacre’s blog:
http://www.badscience.net/2010/04/righteous-mischief-from-archie-cochrane/
Archie Cochrane
45. Solutions
c. Masked data
MacCoun R., Perlmutter S. 2015 Hide results to seek the truth. Nature 526, 187-189.
“...temporarily and judiciously removing data labels and altering data
values to fight bias and error”
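A minimal sketch of the label-removal half of this idea: group labels are replaced by arbitrary codes and the key is kept aside until the analysis is fixed. This is an illustration under stated assumptions, not MacCoun and Perlmutter's procedure; `mask_labels` and the example arms are hypothetical, and altering data values is not shown.

```python
import random

def mask_labels(groups, seed=None):
    """Replace real group labels with arbitrary codes; return masked labels and the key."""
    rng = random.Random(seed)
    real = sorted(set(groups))
    codes = [f"group_{chr(ord('A') + i)}" for i in range(len(real))]
    rng.shuffle(codes)             # which code maps to which arm is random
    key = dict(zip(real, codes))   # stored away from the analyst until unmasking
    return [key[g] for g in groups], key

arms = ["coronary care unit", "home", "home", "coronary care unit", "home"]
masked, key = mask_labels(arms, seed=7)
print(masked)
```

The analyst works only with `masked`; the key is opened after the analysis plan and code are final.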
48. • Reluctance to collaborate with competitors
• Reluctance to share data
• Fabricated data
Problems caused by researchers: 3
Solutions to these may require changes to incentive structures, which
leads us to....
52. This is counterproductive because
• Amount of funding needed to do research is not a
proxy for value of that research
• Some activities intrinsically more expensive
• Does not make sense to disfavour research areas
that cost less
Daniel Kahneman
53. Furthermore....
• Desperate scramble for
research funds leads to
researchers being
overcommitted ->
poorly conducted
studies
• Ridiculous amount of
waste due to the
‘academic backlog’
54. Journal impact factor as measure
of quality
• Mean number of citations to
articles published in any given
journal in the two preceding years
• Originally designed to help
libraries decide on subscriptions
• Now often used as proxy for
quality of an article
Eugene Garfield
55. Problems with journal impact factors
• Impact factor not a good
indication of the citations for
individual articles in the
journal, because distribution
very skewed
• Typically, around half the
articles have very few
citations
http://www.dcscience.net/colquhoun-nature-impact-2003.pdf
N citations for sample of papers
in Nature
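Why a mean misleads for skewed citation counts can be shown with a toy example. The counts below are invented for illustration and are not the Nature data from the Colquhoun paper linked above.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal (invented for illustration).
citations = [0, 0, 1, 1, 2, 2, 3, 5, 40, 146]

journal_mean = mean(citations)        # what a journal-level average reports
typical_article = median(citations)   # what a typical article actually receives
below_mean = sum(c < journal_mean for c in citations)
print(journal_mean, typical_article, below_mean)
```

Two heavily cited articles pull the mean far above the median, so most articles in this toy journal sit well below the journal-level figure.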
57. Problems caused by employers
• Reward research reproducibility over impact
factor in evaluation
• Consider ‘bang for your buck’ rather than
amount of grant income
• Reward those who adopt open science practices
Solutions for institutions
Nat Biotech, 32(9), 871-873. doi: 10.1038/nbt.3004
Marcia McNutt
Science 2014 • VOL 346 ISSUE 6214
58. Problems caused by funders
• Don’t require that all data be reported (though growing interest in data sharing)
• No interest in funding replications
• No interest in funding systematic reviews