STI 2018 jws
1. Data-dependent analytical choices relying on NHST should not be trusted!
Jesper W. Schneider
Danish Centre for Studies in Research and Research Policy,
Aarhus University, Denmark
jws@ps.au.dk
2. NHST = “null hypothesis significance test”
• A standard mode of inference in social and behavioral science is to establish
stylized facts using statistical significance in quantitative studies
• NHST is the practice of selection on statistical significance using an arbitrary
threshold (p < .05) to decide whether something is “true” or “false”
• NHST is when we try to knock down a strawman hypothesis to be able to say
something about our preferred hypothesis which we do not test
3. My claim
• Findings in Scientometrics relying on NHST are not to be trusted!
• … or we should be sceptical of claims from individual studies, especially when “decisions”
are based on NHST
• Why?
• At best, they are very susceptible to the “garden of forking paths”
• At worst they have been deliberately p-hacked
• So my arguments are primarily in the statistical realm
5. Knowledge production modes in Scientometrics
• The field is diverse when it comes to methodology and methods
• Its core and boundaries are somewhat elusive
• The quantitative methodological domain is seemingly descriptive (exploratory) and often
case based
• … at least on the surface, as many studies are tacitly confirmatory studying explicit or
implicit hypotheses relying on NHST
• … and here we should watch out!
‘‘… the data did not show a significant relationship between the proportion of female
authors and the number of citations received, controlled by the number of authors who
signed each paper and the journal impact factor (r = -0.085, p = 0.052)”
7. Check observed data → log-transform skewed data → x1 is correlated with y, include as covariate → use OLS regression → p < .05
This is usually what is presented … the straight way to the “truth”, and seemingly the only way to the “truth”
- A pre-determined path!!
9. Check observed data → log-transform skewed data → x1 is correlated with y, include as covariate → use OLS regression → p < .05
… and your recipe gave you this!
10. But what if our recipe, or the choices we made along the way, had been slightly different?
11. Path 1: check observed data → log-transform skewed data → x1 is correlated with y, include as covariate → use OLS regression → p < .05
Path 2: check observed data → data are not skewed → x1 is correlated with y, include as covariate → use OLS regression → p < .??
We chose a slightly different path?
12. Path 1: check observed data → log-transform skewed data → x1 is correlated with y, include as covariate → use OLS regression → p < .05
Path 2: check observed data → data are not skewed → x1 is correlated with y, include as covariate → use OLS regression → p < .??
Path 3: check observed data → x1 is not correlated with y and is excluded → use OLS regression → p < .??
Or yet another?
13. Same data, many potential paths leading to potentially different outcomes!
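The forking paths in the diagrams above can be simulated. A minimal sketch (all data, settings, and variable names hypothetical): fit the same simulated dataset along four defensible analysis paths and obtain a different p-value for x1 each time.

```python
# Illustrative sketch: one simulated dataset, four defensible analysis
# paths (log-transform or not, covariate or not), four p-values for x1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60
x1 = rng.exponential(1.0, n)          # skewed predictor
x2 = 0.5 * x1 + rng.normal(0, 1, n)   # correlated candidate covariate
y = 0.3 * x1 + rng.normal(0, 2, n)    # weak "true" effect, noisy outcome

def p_for_x1(y, predictors):
    """Two-sided p-value for the first predictor in an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df                      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of beta
    t = beta[1] / np.sqrt(cov[1, 1])                 # t-statistic for x1
    return 2 * stats.t.sf(abs(t), df)

paths = {
    "raw x1, no covariate":   p_for_x1(y, [x1]),
    "raw x1, with covariate": p_for_x1(y, [x1, x2]),
    "log x1, no covariate":   p_for_x1(y, [np.log(x1)]),
    "log x1, with covariate": p_for_x1(y, [np.log(x1), x2]),
}
for path, p in paths.items():
    print(f"{path:24s} p = {p:.3f}")
```

The point is not the particular values but that each path yields its own p-value, so the reported p is conditional on the path chosen.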
15. Is this a problem?
• Yes
• Because of selection on significance (confirmatory analysis)
• The multiple-comparisons problem leads to many false-positive claims
• No
• If NHST is dropped = exploratory analysis
• If pre-registered
• If all data and choices are open and transparent
• Take away message:
• Different stories can be spun from the same data
• A p-value is specific for the path you chose and thus variable with choices
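The multiple-comparisons point can be illustrated with a small simulation (a sketch with hypothetical settings): run 20 independent tests of true null effects at α = .05 and count how often at least one comes out “significant”.

```python
# Sketch of the multiple-comparisons problem: with 20 independent tests
# of true null effects at alpha = .05, "something significant" is the norm.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n_tests, alpha = 1000, 20, 0.05

false_positive_runs = 0
for _ in range(n_sim):
    # 20 two-sample t-tests where the true group difference is zero
    ps = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
          for _ in range(n_tests)]
    if min(ps) < alpha:
        false_positive_runs += 1

print(f"P(at least one 'significant' result) ≈ {false_positive_runs / n_sim:.2f}")
# Theoretical value: 1 - 0.95**20 ≈ 0.64
```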
17. P-hacking
• Looking at the data and changing paths along the way, chasing significance
• Not exactly a pre-determined
path
• Leads to many problems, not
least abundant false-positive
claims
• P-values are uninterpretable
Average power ≈ 24% in the social and behavioural sciences, yet 90% of studies find an “effect”
Everybody is a p-hacker, but some do it unintentionally
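One common p-hacking route, peeking at the data and adding observations until p < .05, can be sketched as follows (all settings hypothetical): even with no true effect at all, the realized false-positive rate ends up well above the nominal 5%.

```python
# Sketch of p-hacking via optional stopping: test, peek, add more data,
# and stop as soon as p < .05. The true group difference is zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sim, start_n, step, max_n, alpha = 1000, 10, 10, 100, 0.05

hits = 0
for _ in range(n_sim):
    a = list(rng.normal(size=start_n))
    b = list(rng.normal(size=start_n))
    while True:
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1            # "found an effect" that does not exist
            break
        if len(a) >= max_n:
            break                # give up at the (post hoc) sample-size cap
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))

print(f"False-positive rate with optional stopping ≈ {hits / n_sim:.2f}")
```

With ten chances to peek, the nominal 5% error rate roughly triples or quadruples, which is why such p-values are uninterpretable.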
19. Reproducibility
• It is no wonder that “effects” or “findings” do not “replicate”, or go in all directions when
reproducibility is judged on a statistical basis
• The basic fact is that “effects” vary across conditions – “we cannot step into the same
river twice”
• We do not have the required “controlled settings”
• We usually have indirect and very noisy measurements
• Effects are generally small, “the low hanging fruits have been grabbed”
• Researcher-degrees-of-freedom are countless
• Data analytical choices are many and arbitrary
• A main culprit is the labelling of “replications” as success or failures!
• … this is NHST with its arbitrary thresholds for success or failure, and it leads to “false positives”
20. My overall message
• No replication is truly direct, and I recommend moving away from the classification of
replications as “direct” or “conceptual” to a framework in which we accept that effects
vary across conditions
• Relatedly, we should stop labeling replications as successes or failures and instead use
continuous measures to compare different studies
• But the general importance of “replication”, or the “potential to replicate”, is quite clear and central
• For example, if effects can vary by context, this provides more reason why
“replication” is necessary for scientific progress
• Full transparency of data collection and processing procedures, of actual data, of all
analyses done and how they are done, providing code are essential
• … and stop ‘over-selling’ your selective results
21. My overall message
• A statistically significant result does not mean that we have found a “true” or “real”
effect
• Few studies provide definitive evidence … although the stories they tell are often framed
in that way
• We need many studies of the same phenomenon, call it “replication” but we should
express the difference between old and new studies in terms of the expected variation
in the effect between conditions, not whether it is a success or failure
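Expressing the old/new comparison on a continuous scale, as suggested above, can be as simple as relating the difference between the two estimates to its standard error (all numbers purely hypothetical):

```python
# Sketch: compare an original study and a replication on a continuous
# scale instead of a success/failure label (estimates are hypothetical).
import math

# (estimate, standard error) for the original study and the replication
orig_est, orig_se = 0.40, 0.15
repl_est, repl_se = 0.15, 0.10

diff = orig_est - repl_est
se_diff = math.sqrt(orig_se**2 + repl_se**2)   # SE of the difference
z = diff / se_diff

print(f"difference = {diff:.2f}, SE = {se_diff:.2f}, z = {z:.2f}")
# The two estimates differ by under 1.5 standard errors, i.e. within
# the variation one might expect between conditions; no binary verdict.
```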
Quantitative studies in Scientometrics relying on NHST are no exception to this!!