Editorial Commentary: Hamstring Autograft With
Preserved Insertions Versus Free Hamstring Autograft
in Knee Anterior Cruciate Ligament Reconstruction:
Clinically the Same but Statistically Different
Mark P. Cote, P.T., D.P.T., M.S.C.T.R., Statistical Editor
Abstract: There is considerable debate over the ideal graft for anterior cruciate ligament (ACL) reconstruction in athletes.
The use of a hamstring autograft with preserved insertions is an approach to ACL reconstruction that may lead to earlier
healing and less morbidity in comparison to a free hamstring autograft. When compared in a randomized trial, statistically
significant but clinically irrelevant differences in clinical outcome and instrumented laxity were observed. These con-
tradictory findings highlight the importance of examining the results from many vantage points and further demonstrate
the misleading conclusions that may result from reliance on P values.
See related article on page 2208
The paper “Outcome of Hamstring Autograft With
Preserved Insertions Compared to Free Hamstring
Autograft in Anterior Cruciate Ligament Surgery at
2 Year Follow Up” by Gupta, Bahadur, Malhotra,
Masih, Sood, Gupta, and Mathur1
provides an excellent
example of statistically significant differences that fail to
be clinically meaningful. The authors execute a well-
designed randomized trial comparing the outcome of
a hamstring autograft with preserved insertions to a
free hamstring autograft for anterior cruciate ligament
reconstruction in professional athletes. Their results
demonstrate a between-group difference in outcome
scores that crosses the magical P < .05 threshold; however,
the results do not appear to be clinically relevant.
This is a somewhat rare occurrence in the realm of
orthopaedic surgery. In most instances, there is concern
over the findings of a study failing to reach statistical
significance, that is, the dreaded type II error.2
In their
study, Gupta et al. observe a difference of 11.7 points on
the Activities of Daily Living and Sports Activity scale of
the Cincinnati Knee Rating system at a P value < .0001,
thereby eliminating concern over a type II error; however,
this difference may not be meaningful. Although
there is no defined minimal clinically important difference
for this aspect of the Cincinnati Knee Rating system, a
difference of 11.7 points between the groups on a scale
whose scores can range from 120 to 420 likely represents
a trivial finding. The same holds true for instrumented
laxity, where a nearly imperceptible difference of
0.8 mm reaches statistical significance.
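To put the 11.7-point difference in context, it can be expressed as a share of the full scoring range. The following is a minimal sketch of that arithmetic; the function name `fraction_of_range` is hypothetical, and the only inputs are the figures quoted above (an 11.7-point difference on a scale running from 120 to 420).

```python
def fraction_of_range(difference, scale_min, scale_max):
    """Return a difference as a fraction of the scale's full range."""
    return difference / (scale_max - scale_min)

# Figures quoted in the commentary: 11.7 points on a scale from 120 to 420.
share = fraction_of_range(11.7, 120, 420)
print(f"{share:.1%} of the scale range")  # prints: 3.9% of the scale range
```

A difference of roughly 4% of the available range is one way of seeing why the finding is likely trivial, although this is only a rough heuristic and not a substitute for a validated clinically important difference.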
This seemingly contradictory finding of a real “sta-
tistical” difference that is in fact representative of no
“clinical” difference is a product of the significance
testing approach to data analysis. Statistical analysis of
study results is needed to account for the randomness
or variability in the outcome being measured. What
we are interested in is how precisely a measure, be it
a between-group difference on a subscale of the
Cincinnati Knee Rating system, intercondylar notch
width, or shoe size, has been estimated.3
As
straightforward as this may seem, this is not how
significance testing works. By relying on a P value,
the results of a study can only be defined in 2 ways:
significant or not significant. This fundamentally
flawed approach can lead to misleading in-
terpretations of the data.3,4
The author reports that he has no conflicts of interest in the authorship and
publication of this article. Full ICMJE author disclosure forms are available
for this article online, as supplementary material.
© 2017 by the Arthroscopy Association of North America
0749-8063/171003/$36.00
https://doi.org/10.1016/j.arthro.2017.08.279
Arthroscopy: The Journal of Arthroscopic and Related Surgery, Vol 33, No 12 (December), 2017: pp 2217-2218
One solution to this problem is the use of the confidence
interval. This measure provides both an estimate
of the effect and a measure of precision. Gupta et al.
appropriately use a 95% confidence interval in
describing their data. For example, the difference of
11.7 points on the Activities of Daily Living and Sports
Activity scale of the Cincinnati Knee Rating system has
a confidence interval of 6.08 to 17.32. From this information
we can say that although these authors
observed a difference of 11.7 points, they can be 95%
confident that the true mean difference may be as high
as 17 or as low as 6. This information allows for
meaningful conclusions to be drawn. Gupta et al. seem
to have estimated the difference between these groups
with adequate precision considering the fact that the
upper limit of this interval is 17, which still does not
represent a meaningful difference. Had the interval’s
upper limit been a larger number like 35 or 40, we might
be concerned that a clinically meaningful difference
between these groups does exist, in which case a larger
sample would be needed to allow the interval to narrow
to a point where this could be discerned. However, with
an upper limit of 17, a clinically meaningful difference
does not appear to be present. Even better, this entire
interpretation did not require the use of a P value.
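The reasoning above can be sketched as a simple comparison of the interval's limits against a threshold for clinical relevance. This is an illustration only: the function `interpret_interval` is hypothetical, and the threshold of 35 points is an assumption borrowed from the "35 or 40" example above, since no clinically important difference has been defined for this scale.

```python
def interpret_interval(lower, upper, threshold):
    """Classify a confidence interval for a mean difference against a threshold."""
    if lower > upper:
        lower, upper = upper, lower  # accept limits reported in either order
    if upper < threshold:
        return "no clinically meaningful difference"
    if lower >= threshold:
        return "clinically meaningful difference"
    return "inconclusive: a larger sample is needed"

THRESHOLD = 35.0  # hypothetical value, for illustration only

# Interval reported by Gupta et al.: the whole interval sits below the threshold.
print(interpret_interval(6.08, 17.32, THRESHOLD))

# A hypothetical wider interval whose upper limit reaches 40.
print(interpret_interval(6.08, 40.0, THRESHOLD))
```

The first call falls entirely below the assumed threshold, whereas the second straddles it, which is the situation in which a larger sample would be needed to narrow the interval. No P value enters into either judgment.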
To what do we owe this small difference in the Ac-
tivities of Daily Living and Sports Activity scale of the
Cincinnati Knee Rating system? Differences in this
score at baseline could explain these results. Although
the groups have an 11.7-point difference at 24-month
follow-up, they started with a 5.2-point difference
preoperatively, which would reduce the difference to
6.5. In addition, a comparison of the change in score or
the “delta” from pre- to 24-month follow-up tells a
similar story. Group 1 (hamstring autograft with pre-
served insertions) has a net improvement of 178 points,
whereas group 2 (free hamstring autograft) improves
by 172 points.
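The two adjustments described above amount to simple subtraction, sketched here using only the figures quoted in this commentary. The variable names are hypothetical, and no patient-level data are involved.

```python
# Between-group differences quoted in the commentary (points).
followup_difference = 11.7   # at 24-month follow-up
baseline_difference = 5.2    # preoperatively

# Difference remaining after accounting for the baseline imbalance.
adjusted_difference = followup_difference - baseline_difference

# Net improvement from baseline to 24 months in each group (points).
change_group_1 = 178  # hamstring autograft with preserved insertions
change_group_2 = 172  # free hamstring autograft
difference_in_change = change_group_1 - change_group_2

print(f"Baseline-adjusted difference: {adjusted_difference:.1f} points")
print(f"Difference in change scores: {difference_in_change} points")
```

Both approaches leave a between-group difference of about 6 points. The small gap between the two figures presumably reflects rounding of the reported change scores, although that is an assumption rather than something stated in the trial.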
Gupta et al. should be commended for conducting a
rigorous clinical trial. These studies require substantial
time, patience, and commitment all in an effort to
produce meaningful results. For most, the hope is that
the trial will produce statistically precise, clinically
significant findings that shape and inform future practice.
Although this is not the case with Gupta et al.’s study,
their results are meaningful additions to the literature.
This is the first trial comparing a preserved insertion
hamstring autograft with the traditional free hamstring
autograft. It provides Level I evidence that these 2 grafts
produced comparable outcomes in their population.
Science is replication, and new investigators aiming to
study the role of different autografts in athletes have
these results to build on. It also serves as an example of
why it is important to fully vet a study’s results rather
than relying on P values. Considering that we are in the
era of value-based health care where clinicians will be
increasingly called on to justify the value of the treat-
ments they provide, distinguishing between clinical and
statistical significance seems like a lesson worth
learning.
References
1. Gupta R, Bahadur R, Malhotra A, et al. Outcome of
hamstring autograft with preserved insertions compared
to free hamstring autograft in anterior cruciate ligament
surgery at 2 year follow up. Arthroscopy 2017;33:
2208-2216.
2. Sabharwal S, Patel NK, Holloway I, Athanasiou T. Sample
size calculations in orthopaedics randomised controlled
trials: revisiting research practices. Acta Orthop Belg 2015;81:
115-122.
3. Harris JD, Brand JC, Cote MP, Faucett SC, Dhawan A.
Research pearls: The significance of statistics and perils of
pooling. Part 1: Clinical versus statistical significance.
Arthroscopy 2017;33:1102-1112.
4. Rothman KJ. Curbing type I and type II errors. Eur J Epi-
demiol 2010;25:223-224.