EE184405 Statistics and Stochastics: Descriptive Statistics 1, Graphs (yusufbf)
Statistics is the field of science that studies how to collect data so that it can be described and processed, and how to then perform induction/inference in order to draw conclusions, so that decisions can be made on the basis of the data at hand.
DATA =============> STATISTICAL PROCESS ===========> INFORMATION
Descriptive statistics is a way of describing a problem on the basis of the available data, namely by organising that data so that its characteristics can be understood easily and are useful for further work.
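As a minimal illustration of this DATA => STATISTICAL PROCESS => INFORMATION pipeline, here is a short R sketch (added for illustration; it is not part of the original slides, and uses R's built-in iris data):

# Descriptive statistics in a nutshell (illustration only; built-in iris data)
x <- iris$Sepal.Length                       # DATA: 150 raw measurements
summary(x)                                   # PROCESS: min, quartiles, median, mean, max
sd(x)                                        # spread: the standard deviation
hist(x, main = "Sepal length", xlab = "cm")  # INFORMATION: a graph describing the data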
This material is part of the PGPSE / CSE study material for PGPSE / CSE students. PGPSE is a free online programme for all those who want to become social entrepreneurs.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2015-2016 and the 2022 Nobel prize in physics, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of the allowed states and measurements of quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think, however, that it is a smoke screen, and the slogan "lost in math" comes to mind. I will discuss some other recent claimed disproofs of Bell's theorem which use the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial-killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
A tale of two Lucys - Delft lecture - March 4, 2024 (Richard Gill)
TU Delft Seminar Probability & Statistics, 4 March 2024
15:45 to 16:45 - Location: Lecture Hall D@TA
Lucia de Berk, a Dutch nurse, was arrested in 2001, and tried and convicted of serial murder of patients in her care. At the lower court the only hard evidence against her was the result of a probability calculation: the chance that she was present at so many suspicious deaths and collapses in the hospitals where she had worked was 1 in 342 million. During appeal proceedings at a higher court, the prosecution shifted gears and gave the impression that there was now hard evidence that she had killed one baby. Having established that she was a killer and a liar (she claimed innocence), it was not difficult to pin another nine deaths and collapses on her. No statistics were needed any more. In 2005 the conviction was confirmed by the supreme court.
But at the same time, some whistleblowers started getting attention from the media. A long fight for the hearts and minds of the public, and a long fight to have the case reopened (without any new evidence, only new scientific interpretation of existing evidence), began and ended in 2010 with Lucia’s complete exoneration. A number of statisticians played a big role in that fight. The idea that the conviction was purely based on objective scientific evidence was actually an illusion. This needed to be explained to journalists and to the public, and the judiciary needed to be convinced that something had to be done about it.
Lucy Letby, an English nurse, was arrested in 2020 for the murder of a large number of babies at a hospital in Chester, UK, between January 2015 and June 2016. Her trial started in 2022 and took ten months. She was convicted and given a whole-life sentence in 2023. In my opinion, the similarities between the two cases are horrific. Again there is statistical evidence: a cluster of unexplained bad events, and Lucy was there every time; there is apparently irrefutable scientific evidence for two babies; and just as with Lucia de Berk, there are some weird personal and private writings which can be construed as a confession. For many reasons, the chances of a fair retrial for Lucy Letby are very thin indeed, but I am convinced she is innocent and that her trial was grossly unfair.
Optimal statistical analysis of Bell experiments
Richard D Gill (Mathematical Institute, Leiden University)
The 2022 Nobel prize in physics went to Clauser, Aspect and Zeilinger “for experiments with entangled photons, establishing the violation of Bell inequalities and pioneering quantum information science”. I played a modest part in the process which led up to that prize by contributing statistical methodology used in four decisive “loophole-free” experiments of 2015 and 2016. What I contributed was quite simply the idea of using randomisation in order to get guaranteed statistical validity, and martingale methods which allowed the experimenters to rule out the notion that an apparent violation of Bell’s inequality could simply be due to time trends in physical parameters over the course of an experiment which takes days to complete (confounding of treatment with time). Most recently I have studied some simple methods to reduce noise in the usual ad hoc estimators of the four correlations which figure in Bell’s inequality. Do not fear: the statistical model is very simple, and no knowledge of quantum mechanics is needed to understand the statistical issues. The talk is about the statistical analysis of four 2×2 tables.
https://www.mdpi.com/2673-9909/3/2/23
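To give a flavour of “the statistical analysis of four 2×2 tables”, here is a minimal R sketch (my illustration with made-up counts, not code or data from the paper): each table cross-tabulates Alice's and Bob's ±1 outcomes under one of the four joint settings, and the CHSH statistic combines the four estimated correlations.

# Hedged sketch: made-up counts, not the paper's data.
# Each 2x2 table: rows = Alice's outcome (+1, -1), columns = Bob's outcome (+1, -1).
corr <- function(N) (N[1, 1] + N[2, 2] - N[1, 2] - N[2, 1]) / sum(N)  # estimate of E[AB]
N.ab   <- matrix(c(40, 10, 10, 40), 2, 2)  # settings (a , b )
N.ab2  <- matrix(c(40, 10, 10, 40), 2, 2)  # settings (a , b')
N.a2b  <- matrix(c(40, 10, 10, 40), 2, 2)  # settings (a', b )
N.a2b2 <- matrix(c(10, 40, 40, 10), 2, 2)  # settings (a', b')
S <- corr(N.ab) + corr(N.ab2) + corr(N.a2b) - corr(N.a2b2)
S  # 2.4 here; under local realism the CHSH inequality requires S <= 2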
Subtitle: "Statistical issues in the investigation of a suspected serial killer nurse"
Abstract:
Investigating a cluster of deaths on a hospital ward is a difficult task for medical investigators, police, and courts. Patients do die in hospitals (the three most common causes of death in a hospital are, in order: cancer, heart disease, and medical errors). Such cases often come to the attention of police investigators when two things coincide: gossip about a particular nurse is circulating just as a couple of unexpected and disturbing events occur. Hospital investigators see a pattern and call in the police.
I will discuss two such cases with which I have been intensively involved. The first is the case of the Dutch nurse Lucia de Berk: arrested in 2001, convicted by a succession of courts up to the supreme court by 2005, after which a long fight started to get her a retrial. She was completely exonerated in 2010. The second case is that of the English nurse Lucy Letby: arrested in 2018, 2019 and 2020 for murders taking place in 2015 and 2016. Her trial started in 2022 and concluded with a “whole life” sentence a couple of months ago.
There are many similarities between the two cases, but also a couple of disturbing differences. One difference is that Lucy Letby’s lawyers seem to have made no attempt whatsoever to defend her. Another is that statistics was used against Lucia de Berk but not, apparently, against Lucy Letby. But appearances are not always what they seem.
Report published by the Royal Statistical Society on statistical issues in these cases
https://rss.org.uk/news-publication/news-publications/2022/section-group-reports/rss-publishes-report-on-dealing-with-uncertainty-i/
News feature in “Science” about me and my work
https://www.science.org/content/article/unlucky-numbers-fighting-murder-convictions-rest-shoddy-stats
https://www.maths.lu.se/kalendarium/?evenemang=statistics-seminar-statistical-issues-investigation-suspected-serial-killer-nurse-richard-gill
Video: https://www.youtube.com/watch?v=RxmFLKTlim8
The RSS has published a report tackling statistical bias in criminal trials where healthcare professionals are accused of murdering patients. Following several high-profile cases where statistical evidence has been misused, the Society calls for all parties in such cases to consult with professional statisticians and use only expert witnesses who are appropriately qualified.
The report, ‘Healthcare serial killer or coincidence?’, is produced by the RSS’s Statistics and the Law Section. The group evolved from a working group of the same name set up in the early 2000s, after the Society wrote to the Lord Chancellor and made a statement setting out concerns around the use of statistical evidence in the case of Sally Clark.
According to the report, suspicions about medical murder often arise due to a surprising or unexpected series of events, such as an unusual number of deaths among patients under the care of a particular professional.
The RSS has major concerns about the use of this kind of evidence in a criminal investigation: first, over the analysis and interpretation of such data, and second, over whether it can be guaranteed that the data have been compiled in an objective and unbiased manner.
I discuss various statistical analyses of the recent Bell experiment of Storz et al. (2023, Nature) at ETH Zurich. Standard and novel analyses, under different assumptions, lead to almost identical conclusions, which strongly suggests that those assumptions are actually satisfied.
Statistics, causality, and the 2022 Nobel prize in physics.
Richard Gill
Leiden University
The 2022 Nobel prize in physics was awarded to John Clauser, Alain Aspect and Anton Zeilinger "for experiments with entangled photons, establishing the violation of Bell inequalities and pioneering quantum information science”. I will explain each of these three gentlemen’s contributions and point out connections to classical statistical causality and probabilistic coupling. It seems that the first commercial application of this work will be a technology called DIQKD: "device independent quantum key distribution". Alice and Bob are far apart and need to establish a shared cryptographic key so as to send one another some securely encrypted messages over public communication channels. How can they create a suitable key while far apart from one another, and only able to communicate using classical means and over public channels?
Healthcare serial killer or coincidence?
Richard Gill
Mathematical Institute, Leiden University
Abstract: The UK’s Royal Statistical Society recently published a report tackling statistical bias in criminal trials where healthcare professionals are accused of murdering patients. Following several high-profile cases where statistical evidence has been misused, the Society calls for all parties in such cases to consult with professional statisticians and use only expert witnesses who are appropriately qualified.
The RSS report came out just two weeks before the start of the trial in Manchester of a nurse called Lucy Letby. The trial is still ongoing. So far, neither side has called for evidence from experts in statistics. The core of the prosecution case is that so many odd events connected to nurse LL cannot be a coincidence.
I will discuss the challenges, both procedural and conceptual, which arise when presenting statistical thinking as evidence in criminal trials.
https://rss.org.uk/news-publication/news-publications/2022/section-group-reports/rss-publishes-report-on-dealing-with-uncertainty-i/
We analyse data from the final two years of a long-running and influential annual Dutch survey of the quality of Dutch New Herring served in large samples of consumer outlets. The data was compiled and analysed by a university econometrician whose findings were publicized in national and international media. This led to the cessation of the survey amid allegations of bias due to a conflict of interest on the part of the leader of the herring-tasting team. The survey organizers responded with accusations of failure of scientific integrity. The econometrician was acquitted of wrongdoing by the Dutch authority, whose inquiry nonetheless concluded that further research was needed. We reconstitute the data and uncover important features which throw new light on the econometrician's findings, focussing on the issue of correlation versus causality: the sample is definitely not a random sample. Taking account both of the newly discovered data features and of the sampling mechanism, we conclude that there is no evidence of biased evaluation, despite the econometrician's renewed insistence on his claim.
This year’s Nobel prize in physics: homage to John Bell.
Richard Gill
Mathematical Institute, Leiden University.
Focussing on statistical issues, I will first sketch the history initiated by John Bell’s landmark 1964 paper “On the Einstein Podolsky Rosen paradox”, which led to the 2022 Nobel prize awarded to John Clauser, Alain Aspect and Anton Zeilinger:
https://www.nobelprize.org/prizes/physics/2022/press-release/
A breakthrough in this history was the four successful “loophole-free” Bell experiments of 2015 and 2016 in Delft, Munich, NIST and Vienna. These experiments pushed quantum technology to the limit and paved the way for DIQKD (“Device Independent Quantum Key Distribution”) and a quantum internet. They were the first successful implementations of the ideal experimental protocol described by John Bell in his 1981 masterpiece "Bertlmann's socks and the nature of reality", and depended on brilliant later innovations: Eberhard’s discovery that less entanglement could allow stronger manifestation of quantum non-locality, and Zeilinger’s discovery of quantum teleportation, allowing entanglement between photons to be transferred to entanglement between ions or atoms, and ultimately to components of manufactured semiconductors.
I will also discuss reanalyses of the 2015+ experiments, which could have allowed the experimenters to claim even smaller p-values than the ones they published:
https://arxiv.org/abs/2209.00702 "Optimal statistical analyses of Bell experiments"
Better slides: https://www.slideshare.net/gill1109/nobelpdf-253673329
A solution to the measurement problem based on Belavkin's theory of Eventum Mechanics. There is only Schrödinger’s equation and a unitary evolution of the wave function of the universe, but we must add a Heisenberg cut to separate the past from the future (to separate particles from waves): Belavkin’s eventum mechanics. The past is a commuting sub-algebra A of the algebra of all observables B, and in the Heisenberg picture, the past history of any observable in A is also in A. Particles have definite trajectories back into the past; eventum mechanics defines the probability distributions of the future given the past. https://arxiv.org/abs/0905.2723 "Schrödinger's cat meets Occam's razor" (version 3: 10 Aug 2022); to appear in Entropy.
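In LaTeX shorthand, the two conditions in this abstract might be written as follows (my paraphrase for readability, not Belavkin's own notation):

% My paraphrase of the abstract's two conditions, not Belavkin's notation.
% B = algebra of all observables; A = the "past"; j_t(X) = U_t^* X U_t (Heisenberg picture).
\[ \mathcal{A} \subseteq \mathcal{B}, \qquad [X, Y] = 0 \quad \forall\, X, Y \in \mathcal{A} \quad \text{(the past commutes)} \]
\[ j_{-t}(\mathcal{A}) \subseteq \mathcal{A} \quad \forall\, t \ge 0 \quad \text{(past histories of observables in } \mathcal{A} \text{ stay in } \mathcal{A}\text{)} \]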
This presentation briefly explores the structural and functional attributes of nucleotides and the structure and function of genetic material, along with the impact of UV rays and pH upon them.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL VELOCITY (Wasswaderrick3)
In this book, we use conservation-of-energy techniques on a fluid element to derive the modified Bernoulli equation of flow with viscous (friction) effects. We derive the general equation of flow/velocity and from it the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy-conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium, and at the general equation of terminal velocity.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
The nutraceutical market, which includes products such as functional foods, beverages, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly as consumer awareness of health and wellness rises. The sector is expanding quickly as healthcare costs rise, the population ages, and people increasingly seek natural and preventative health solutions. Innovations in product formulation and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing, offering significant opportunities for research and investment across a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adaptive Optics (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic Star Formation Rate Density (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
1. Best Practices in Statistical Data Analysis
Willem Heiser farewell symposium, 30 January 2014
Richard Gill
http://www.math.leidenuniv.nl/~gill
Worst Practices in Statistical Data Analysis
2. • Flashback: one year ago in Tilburg
• Meeting of the social science section of the Dutch statistical society: Statistiek uit de bocht (Statistics round the bend)
Ideas stolen from Marcel van Assen, Jelte Wicherts, Frank van Kolfschooten, Han van der Maas, Howard Wainer …
A talk within a talk, & some reflections on scientific integrity
3. … or … ?
Integrity or fraud … or just questionable research practices?
Richard Gill
Tilburg, December 2012
http://www.math.leidenuniv.nl/~gill
5. • August 2011: a friend draws the attention of Uri Simonsohn (Wharton School, Univ. Penn.) to “The effect of color ... ” by D. Smeesters and J. Liu
[photo: Smeesters]
6. Hint: the text mentions a number of within-group SDs; group sizes ≈ 14
• Simonsohn does a preliminary statistical analysis indicating the results are “too good to be true”
7. 3×2×2 design, ≈14 subjects per group
• Outcome: # correct answers in a 20-item multiple-choice general knowledge quiz
• Three treatments:
  • Colour: red, white, blue
  • Stereotype or exemplar
  • Intelligent or unintelligent
9. Priming
• Red makes one see differences
• Blue makes one see similarities
• White is neutral
• Seeing an intelligent person makes you feel intelligent if you are in a “blue” mood
• Seeing an intelligent person makes you feel dumb if you are in a “red” mood
• The effect depends on whether you see an exemplar or a stereotype
10. • The theory predicts something very like the picture (an important three-way interaction!)
11. • August 2011: a friend draws the attention of Uri Simonsohn (Wharton School, Univ. Penn.) to “The effect of color” by D. Smeesters and J. Liu
• Simonsohn does a preliminary statistical analysis indicating “too good to be true”
• September 2011: Simonsohn corresponds with Smeesters, obtains data; a distribution-free analysis confirms the earlier findings
• Simonsohn discovers the same anomalies, and more, in further papers by Smeesters
• Smeesters’ hard disk crashes, all original data sets lost. None of his coauthors have copies. All original sources (paper documents) lost when moving office
• Smeesters and Simonsohn report to the authorities
• June 2012: Erasmus CWI report published; Smeesters resigns, denies fraud, admits data-massage “which …”
12. • The Erasmus report is censored, its authors refuse to answer questions, the Smeesters and Liu data is unobtainable, the identity of “Simonsohn” is unknown
• Some months later: identity of Simonsohn revealed, uncensored version of the report published
• November 2012: Uri Simonsohn posts “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone”
• December 2012: original data still unavailable, questions to the Erasmus CWI still unanswered
• March 2013: Simonsohn paper published, data posted
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571
Two cases? Smeesters, Sanna; a third case was inconclusive (original data not available)
What did Simonsohn actually do?
13. • Theory predicts that the 12 experimental groups can be split into two sets of 6
• Within each set, the groups should be quite similar
• Smeesters & Liu report some of the group averages and some of the group SDs
• Theory: variance of a group average = within-group variance divided by group size!
• The differences between the group averages are too small compared to the within-group variances!
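This is the “1 over root n” law that returns on slide 45. A minimal R check of the claim (added for illustration; not from the deck):

# Check: variance of a group average = within-group variance / group size
set.seed(1)
n <- 14; sigma <- 2.9                  # group size and within-group SD, as on slide 15
avgs <- replicate(10000, mean(rnorm(n, mean = 10, sd = sigma)))
sd(avgs)                               # simulated SD of a group average: about 0.78
sigma / sqrt(n)                        # theory: sigma / sqrt(n) = 0.775...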
14. • Simonsohn proposes an ad-hoc(*) test statistic (comparing between-group to within-group variance), with its null distribution evaluated using a parametric bootstrap
• When the original data is made available, this can be repeated with a non-parametric bootstrap
• Alternative: permutation tests
• Note: to do this, he pools each set of six groups. The “assumption” that there is no difference between the groups within each of the two sets of six groups is conservative
(*) Fisher: use the left tail-probability of the F-test for testing “too good to be true”
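A minimal sketch of such a parametric bootstrap (my reconstruction under simplifying assumptions, with made-up numbers; not Simonsohn's actual code): simulate the six pooled group means under the null and ask how often their spread is as small as the spread actually observed.

# Hedged sketch of a "too good to be true" parametric bootstrap (made-up numbers)
set.seed(42)
observed.means <- c(11.7, 11.8, 11.6, 11.9, 11.7, 11.8)  # suspiciously similar
n <- 14; s.within <- 2.9                    # group size and within-group SD
obs.spread <- var(observed.means)           # observed between-group variance (tiny)
boot.spread <- replicate(10000, {
  sim <- rnorm(6, mean = mean(observed.means), sd = s.within / sqrt(n))
  var(sim)                                  # between-group variance under the null
})
mean(boot.spread <= obs.spread)             # left-tail p-value: too little spread?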
15. A picture tells 1000 words
sigma <- 2.9
pattern <- c(rep(c(1, 0), 3), rep(c(0, 1), 3))
means <- pattern
means[pattern == 1] <- 11.75
means[pattern == 0] <- 9.5
set.seed(2013)
par(mfrow = c(3, 4), bty = "n", xaxt = "n", yaxt = "n")
for (i in 1:12) {
  # simulate the 12 group averages under the null: SD = sigma / sqrt(group size)
  averages <- rnorm(12, mean = means, sd = sigma / sqrt(14))
  dim(averages) <- c(2, 6)
  averages <- rbind(averages - 6, 0)  # shift the baseline; add zero-height spacer bars
  plot(c(0, 20), c(0, 7), xlab = "", ylab = "", type = "n")
  abline(h = 0:6)
  barplot(as.vector(averages), col = rep(c("black", "white", "white"), times = 6),
          add = TRUE)
}
19. Further analyses
• Simonsohn’s test statistic is actually equivalent to the standard ANOVA F-test of the hypothesis “each of the two groups of six conditions has the same mean” – except that we want to reject if the statistic is too small
20. Test of the 3-way interaction
> result.anova
Analysis of Variance Table
Model 1: score ~ (colour + prime + dimension)^2
Model 2: score ~ colour * prime * dimension
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1    159 1350.8
2    157 1299.6  2    51.155 3.0898 0.04829 *
Smeesters and Liu’s test (OK, except d.f.); code below by RDG:
# scores, colour, prime, dimension, pattern.long: the reconstructed data (not shown in the deck)
data <- data.frame(score = scores, colour = colour, prime = prime,
                   dimension = dimension, pattern = pattern.long)
result.aov.full <- aov(score ~ colour * prime * dimension, data = data)
result.aov.null <- aov(score ~ (colour + prime + dimension)^2, data = data)
result.anova <- anova(result.aov.null, result.aov.full)
result.anova
result.aov.zero <- aov(score ~ pattern, data = data)
result.anova.zero <- anova(result.aov.zero, result.aov.full)
result.anova.zero$F[2]
pf(result.anova.zero$F[2], df1 = 10, df2 = 156)
21. Test of too good to be true
> result.anova.zero$F[2]
[1] 0.0941672
> pf(result.anova.zero$F[2], df1 = 10, df2 = 156)
[1] 0.0001445605
(produced by the same code as on slide 20)
22. Further analyses
• Scores (integers) appear too uniform
• Permutation test: p-value = 0.00002
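The deck does not show this test; here is a hedged sketch in the same spirit (my reconstruction with made-up scores, using simulation rather than a literal permutation of the data): compare the evenness of the observed score counts with the evenness produced by genuine multinomial sampling noise, rejecting in the left tail.

# Hedged sketch, not Simonsohn's actual test; made-up, suspiciously even scores
set.seed(7)
scores <- rep(6:15, times = c(14, 13, 14, 14, 13, 14, 14, 13, 14, 14))
obs <- table(factor(scores, levels = 6:15))
evenness <- function(counts) sum((counts - mean(counts))^2)  # small = very uniform
null <- replicate(1e5, {
  sim <- sample(6:15, length(scores), replace = TRUE)        # real sampling noise
  evenness(table(factor(sim, levels = 6:15)))
})
mean(null <= evenness(obs))  # left-tail p-value; tiny = scores too uniform to be real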
25. Geraerts
• Senior author Merckelbach becomes suspicious of the data reported in papers 1 and 2
• He can’t find the “Maastricht data” within Geraerts’ combined “Maastricht + Harvard” data set for paper 2 (JAb: Journal of Abnormal Psychology)
27. Curiouser and curiouser (JAb):
Self-rep + Probe-rep (Spontaneous) = idem (Others)
Self-rep (Spontaneous) = Probe-rep (Others)
Samples matched (on sex, age, education); the analysis does not reflect the design
28. Geraerts
• Merckelbach reports Geraerts to the Maastricht and Rotterdam authorities
• Conclusions: (Maastricht) some carelessness but no fraud; (Rotterdam) no responsibility
• Merckelbach and McNally ask the editors of “Memory” to retract their names from the joint paper
• The journalists love it (NRC; van Kolfschooten ...)
30. Picture is “too good to be true”
• Parametric analysis of the Memory tables confirms, esp. on combining the results from the 3×2 analyses (Fisher combination)
• For the JAb paper I received the data from van Kolfschooten
• Parametric analysis gives the same result again (4×2)
• Distribution-free (permutation) analysis confirms! (though: permutation p-value only 0.01 vs normality+independence 0.0002)
> results
           [,1]       [,2]
[1,] 0.13599556 0.37733885
[2,] 0.01409201 0.25327297
[3,] 0.15298798 0.08453114
> sum(-log(results))
[1] 12.95321
> pgamma(sum(-log(results)), 6, lower.tail = FALSE)
[1] 0.01106587
> results
            [,1]       [,2]
[1,] 0.013627082 0.30996011
[2,] 0.083930301 0.24361439
[3,] 0.004041421 0.05290153
[4,] 0.057129222 0.31695753
> pgamma(sum(-log(results)), 8, lower.tail = FALSE)
[1] 0.0002238678
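The pgamma calls above are Fisher's combination method: under the null hypothesis each p-value is uniform, so each -log(p) is Exp(1) and the sum of k of them is Gamma(k, 1). As a reusable wrapper around the deck's own calls:

# Fisher's combination of k independent p-values:
# under H0, sum(-log(p)) ~ Gamma(k, 1), equivalently -2 * sum(log(p)) ~ chi-squared(2k)
fisher.combine <- function(p) pgamma(sum(-log(p)), shape = length(p), lower.tail = FALSE)
p6 <- c(0.13599556, 0.37733885, 0.01409201, 0.25327297, 0.15298798, 0.08453114)
fisher.combine(p6)  # 0.01106587, reproducing the first value on the slide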
31. The morals of the story (1)
• Scientific = reproducible: data preparation and data analysis are integral parts of the experiment
• Keeping proper log-books of all steps of data preparation, manipulation, and selection/exclusion of cases makes the experiment reproducible
• Sharing statistical analyses over several authors is almost necessary in order to prevent errors
• These cases couldn’t have occurred if all this had been standard practice
32. The morals of the story (2)
• The data collection protocol should be written down in advance, in detail, and followed carefully
• Exploratory analyses and pilot studies are also science
• Replicating others’ experiments: also science
• It's easy to make mistakes doing statistical analyses: the statistician needs a co-pilot
• Senior co-authors are co-responsible for the good scientific practices of the young scientists in their group
• These cases couldn’t have occurred if all this had been standard practice
33. Memory affair postscript
http://www.erasmusmagazine.nl/nieuws/detail/article/6265-geraerts-trekt-memory-artikel-terug/
Obtaining the data “for peer review”: send a request to secretariaatpsychologie@fsw.eur.nl
• Geraerts is forbidden to talk to Gill
• The Erasmus University Psychology Institute asks Han van der Maas (UvA) to investigate the “too good to be true” pattern in the “Memory” paper
• Nonparametric analysis confirms my findings
• Recommendations: 1) the paper is retracted; 2) the report is made public; 3) the data set idem
34. Main findings
• No proof of fraud (fraud = intentional deception)
• Definite evidence of errors in data management
• Undocumented, unreproducible reduction from 42 + 39 + 47 + 33 subjects to 30 + 30 + 30 + 30
Together, these are mega-opportunities for Questionable Research Practice number 7: deciding whether or not to exclude data after looking at the impact of doing so on the results (estimated prevalence near 100%, estimated acceptability rating near 100%). A small simulation of this QRP follows below.
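As promised above, a simulation (my illustration, not from the deck) of why QRP 7 matters: with no true effect at all, trying the test both with and without “outliers” removed and reporting whichever result looks better inflates the false-positive rate above the nominal 5%.

# QRP 7 sketch: exclusion decided after seeing its impact on the result
set.seed(123)
p.best <- replicate(10000, {
  x <- rnorm(30); y <- rnorm(30)          # no true effect
  p.all  <- t.test(x, y)$p.value          # analysis 1: all data
  keep   <- abs(x - mean(x)) < 2 * sd(x)  # analysis 2: drop "outliers" in x
  p.trim <- t.test(x[keep], y)$p.value
  min(p.all, p.trim)                      # report whichever looks better
})
mean(p.best < 0.05)                       # above the nominal 0.05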
35. Remarks
• A balanced design looks more scientific but is an open invitation to QRP 7
• An identical “too good to be true” pattern is apparent in an earlier published paper; that data has been lost:
E. Geraerts, H. Merckelbach, M. Jelicic, E. Smeets (2006), Long term consequences of suppression of intrusive anxious thoughts and repressive coping, Behaviour Research and Therapy 44, 1451–1460
36. The latest developments
• I finally got the data from Geraerts (under an extraordinary confidentiality agreement)
• But you can read it off the pdf pictures in the report!
• So let’s take a look…
37.–39. [image-only slides: the data as read off the pictures in the report]
40. The latest developments
• I cannot identify the Maastricht subset in this data
• The JAb paper does not exhibit any of these anomalies!
• All data of the third paper, with the same “too good to be true” pattern, is lost
• A number of psychologists also saw “too good to be true” with respect to content (not statistics)
41. The next case?
• Spot the difference between the following two slides
44. Notes
• Exhibit A is based on several studies in one paper
• Exhibit B is based on similar studies published in several papers by other authors
• More papers by the author of Exhibit A exhibit the same pattern
• His/her graphical displays are bar charts, ordered Treatment 1 (High) : Treatment 2 (Low) : Control
• The displays here are based on the ordering Low : Medium (control) : High
45. Remarks
• IMHO, “scientific anomalies” should in the first place be discussed openly in the scientific community, not investigated by disciplinary bodies (CWI, LOWI, …)
• Data should be published and shared openly, as much as possible
• Never forget Hanlon’s razor (“do not attribute to malice that which can be adequately explained by incompetence”; or in plain English: “cock-up before conspiracy”)
• I never met a science journalist who knew the “1 over root n” law
• I never met a psychologist who knew the “1 over root n” law
• Corollary: statisticians must participate in public debate and must work with the media
46. Exercises
• How/why could faked data exhibit this latest “too good to be true” pattern?
• Develop a “model-free” statistical test for “excess linearity” [problem: no replicate groups, so no permutation test]
• How could Geraerts’ data ever get the way it is? Three different papers, different anomalies, but the same overall “too good to be true” pattern?
• What do you think the moral of this story could be?