15. Studies on proper reporting of logistic regression
In short: most papers do not report the necessary information
• Bagley et al. Logistic regression in the medical literature:
  standards for use and reporting, with particular attention to
  one medical domain (J Clin Epidemiol 54: 979-985, 2001)
• Peng et al. An Introduction to Logistic Regression Analysis and
  Reporting (J Edu Res 96: 3-12, 2002)
• Moss et al. An appraisal of multivariable logistic models in the
  pulmonary and critical care literature (Chest 123: 923-928,
  2003)
• Ottenbacher et al. A review of two journals found that articles
  using multivariable logistic regression frequently did not report
  commonly recommended assumptions (J Clin Epidemiol 57: 1147-1152, 2004)
• Mikolajczyk et al. Evaluation of logistic regression reporting in
  current obstetrics and gynecology literature (Obstet Gynecol 111:
  413-419, 2008)
• Kalil et al. Recommendations for the Assessment and Reporting
  of Multivariable Logistic Regression in Transplantation Literature
  (Am J Transplant 10: 1686-94, 2010)
19. Required reporting item ①
Variable selection criteria
>Example
-Diagnosis of Lung Cancer
The diagnosis of lung cancer was made by histopathological examination of
resection specimens or cytopathological examination of needle-aspiration biopsy
samples. A microcoil localization technique was used to mark the nodule under CT
guidance before surgical resection. Resected tumors were classified with the use of
the World Health Organization classification of lung neoplasms.
-Statistical Analysis
Multivariable logistic-regression models were prepared to estimate the risk of lung
cancer associated with potential predictors, including sociodemographic variables
and clinical variables such as smoking exposure and nodule characteristics.
Inclusion of variables in the models was based on existing knowledge of risk factors
for lung cancer and on nodule characteristics that are readily discernible on low-dose CT images.
-Results
Predictive Model
The variables listed in Table 1 and in Table S1 in the Supplementary Appendix were
evaluated for inclusion in the models.
Annette McWilliams et al. N Engl J Med 2013;369:910-9.
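21. Required reporting item ②
Overfitting
Describe how overfitting was addressed
>Method
Keep (number of outcome events) / (number of independent variables) ≥ 10
Peduzzi P. J Clin Epidemiol. 49(12):1373-9. 1996
>Example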
-Methods
We followed standard methods to estimate sample size for
multiple logistic regression, with at least ten outcomes needed
for each included independent variable. With an expected PV
catheterization success rate of 50%, we required 120 attempts
(60 successes) to appropriately perform multiple logistic
regression with six variables.
Schnadower D et al. Acad Emerg Med 14(5) 483–485. 2007
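The arithmetic in the example above can be sketched in a few lines of Python. The function name and its parameters are hypothetical, and counting the less frequent outcome as the "events" is an assumption about how the rule is applied.

```python
import math

def required_sample_size(n_predictors, event_rate, events_per_variable=10):
    """Return (required_events, required_sample) under the events-per-variable rule."""
    if not 0 < event_rate < 1:
        raise ValueError("event_rate must be strictly between 0 and 1")
    # Assumption: the rule is applied to the less frequent of the two outcomes.
    limiting_rate = min(event_rate, 1 - event_rate)
    required_events = n_predictors * events_per_variable
    required_sample = math.ceil(required_events / limiting_rate)
    return required_events, required_sample

# Six variables and an expected success rate of 50%, as in the quoted Methods.
events, attempts = required_sample_size(n_predictors=6, event_rate=0.5)
print(events, attempts)  # 60 120
```

This reproduces the 60 successes and 120 attempts stated in the excerpt.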
24. Required reporting item ③
Linearity
>Example
-Statistical Analysis
First, a linear model was determined. This was followed by assessment of the fit of the model
and its performance characteristics. Finally, to assess whether the dependent variable was linear
in the logit, three methods, as proposed by Hosmer and Lemeshow, were used: lowess (locally
weighted least squares) smoothing curves, design variables and fractional polynomials.
-Results
Linearity of the dependent variable (SCA or SAD) for both erect and supine radiographs in the logit of
the independent variable was assessed using three methods. This was done to confirm that binomial
logistic regression was the appropriate method of analysis.
• Lowess smoothing curves showed that linearity varied over the interval of the dependent variable.
… [omitted] This is illustrated in Fig. 1.
• Logistic regression coefficients were plotted against the approximate quartile midpoints of dependent
variables as shown in Fig. 2. … [omitted] By contrast, the results suggest linearity in the logit for both SCA
and SAD in the first three quartiles for erect radiographs.
• Fractional polynomial model comparisons showed that the best non-linear transformations were not
significantly different from the linear model. Therefore, the fractional polynomial analysis supported
treating both variables as linear in the logit in general, with one exception: a significant p-value of 0.04
for the variable SCA suggested that the fit of the model might be improved if the variable was
transformed by its inverse square. This in turn suggested that the use of the transformed variable in the
logistic regression analysis might result in a superior model.
Cardiovasc J Afr. 2010 Sep-Oct;21(5):274-9
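The design-variable check quoted above can be sketched as follows. This is a simplified, univariable illustration and not the authors' code: with quartile indicators as the only predictors, each coefficient equals the log odds in that quartile minus the log odds in the lowest quartile, so no model fitting is needed. The function name and the toy data are hypothetical.

```python
import math

def quartile_logit_coefficients(x, y):
    """Return [(midpoint, coefficient)] for the four quartile groups of x.

    coefficient = log odds of y in the group minus log odds in the lowest group.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    # Split the sorted observations into four nearly equal groups.
    bounds = [n * k // 4 for k in range(5)]
    groups = [pairs[bounds[k]:bounds[k + 1]] for k in range(4)]

    def log_odds(group):
        events = sum(outcome for _, outcome in group)
        non_events = len(group) - events
        if events == 0 or non_events == 0:
            raise ValueError("each quartile needs both events and non-events")
        return math.log(events / non_events)

    reference = log_odds(groups[0])
    result = []
    for group in groups:
        # Midpoint of the observed range of x within the group.
        midpoint = (group[0][0] + group[-1][0]) / 2
        result.append((midpoint, log_odds(group) - reference))
    return result

# Toy data (hypothetical): 10 observations per quartile with 2, 4, 6, 8 events.
x = list(range(1, 41))
y = []
for events in [2, 4, 6, 8]:
    y += [1] * events + [0] * (10 - events)

for midpoint, coefficient in quartile_logit_coefficients(x, y):
    print(midpoint, round(coefficient, 3))
```

If the coefficients rise roughly in a straight line against the midpoints, treating the variable as linear in the logit is plausible. A clear bend suggests a transformation or a categorical coding.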
28. Required reporting item ④
Variable entry method
>Example
-Methods
We analysed differences in outcomes after 12 and 18 months of follow up with
logistic and multiple linear regression (hierarchical backward elimination
method), adjusting for possible differences in baseline scores and background
characteristics (sex, age, educational level, income, composition of household,
and course of gait problems experienced).
Jolanda C M van Haastregt et al. BMJ 2000;321:994
33. Required reporting item ⑤
Assessment of interactions
>Example
-Methods
Coenen S et al. Br J Gen Pract. 56(524):183-90 2006.
we estimated a logistic model which contained all covariates as possible confounders
and all interaction terms between perceived patient demand and the covariates as
possible effect modifiers. … [omitted] We also added the interaction terms of patient sex
and age, and GP’s sex and year of birth … [omitted] If some of the covariates or interaction
terms dropped out of the starting model due to co-linearity, confounding and
interaction were evaluated in a stratified analysis of perceived patient demand versus
GPs’ antibiotic prescribing, controlling for each covariate separately.
-Results
This resulted in a model containing seven interaction terms (patient age, smoking, … [omitted]).
After eliminating interaction terms with a P-value greater than 0.01, only one interaction
term was retained in the model.
-Statistical Analysis
Annette McWilliams et al. N Engl J Med 2013;369:910-9.
We evaluated interactions between important predictors in final models by including
interaction terms along with main-effect terms. None of the interactions we tested
were significant, and they are not discussed further in this article.
37. Required reporting item ⑥
Assessment of confounders
>Example
-Methods
Coenen S et al. Br J Gen Pract. 56(524):183-90 2006.
the confounding effect of all covariates not in significant interaction terms in the full
model was assessed, followed by precision considerations. We looked for a subset of
covariates for which the model gave roughly the same parameter estimates for
perceived patient demand and the significant interaction terms, but with narrower
confidence intervals.
-Figure 2
The relationship between the effect of perceived patient demand on antibiotic
prescribing for acute cough and the significant confounders of this relation.
Adjusted odds ratios and 95% confidence limits.
38. Required reporting item ⑥
Assessment of confounders
>Example
-Methods
Potential confounders in multivariable models included number of non-PCP visits,
sex, … [omitted]. Based on a study by Lin and colleagues (29), we also examined the impact
of an unmeasured confounder on our estimated odds ratios (ORs) in post hoc
sensitivity analyses.
-Discussion
Therefore, residual confounding by unmeasured factors may influence results. In post
hoc sensitivity analyses, our results become nonsignificant or potentially reversed
only when the proportions of an unmeasured confounder differed greatly between
individuals who visit PCPs often and those who do not. An unmeasured confounder
seems to have a greater effect on the association of PCP with CRC incidence than
with mortality (Appendix Tables 3 and 4, available at www.annals.org).
-Limitation
This study used administrative data, which made it difficult to identify potential
confounders and prevented examination of the content of primary care visits.
J M. Ferrante et al. Ann Intern Med. 2013;159(7):437-446.
42. Required reporting item ⑦
Multicollinearity
>Example
-Statistical analysis
Variance Inflation Factor (VIF) was used to check for multicollinearity. Predictive and
complexity characteristics of the model were considered during modelling.
-Results Model fit and Predictive power of the models
Variance inflation factor (VIF) was employed to check for multicollinearity. None of
the VIF values were up to 10 and the mean VIF of the model was less than 6. It
means there was no collinearity in the model (see additional file 1).
Kayode et al. BMC Pregnancy and Childbirth 2012, 12:10
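A VIF check like the one reported above can be sketched with the standard library alone. It relies on the fact that the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix. The function names and the toy predictor columns are hypothetical. The threshold of 10 is the one used in the excerpt.

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix of the predictor columns."""
    n = len(columns[0])
    standardized = []
    for column in columns:
        m, s = mean(column), pstdev(column)
        standardized.append([(value - m) / s for value in column])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [value / pivot_value for value in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]

# Toy predictors (hypothetical values).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [5, 3, 1, 2, 4]
vifs = variance_inflation_factors([x1, x2, x3])
print([round(v, 2) for v in vifs])
print("all VIF below 10:", max(vifs) < 10)
```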
47. Required reporting item ⑨
Assessment of goodness of fit
>Example
-Statistical Analysis
We evaluated the predictive performance of the model by assessing its
discrimination (ability to classify correctly) and its calibration (whether
probabilities predicted by the model match observed probabilities).
Discrimination was measured with the use of the area under the
receiver-operating-characteristic curve (AUC). All AUCs reported are presented
with bootstrap bias-corrected 95% confidence intervals, with bootstrapping
techniques based on 1000 bootstrapped samples. We evaluated calibration by
subtracting the model-estimated probability from the observed probability for
each study participant, placing these absolute errors in rank order, and evaluating
the magnitude of the median and 90th percentile of the absolute errors. In
addition, the mean absolute errors for each decile of model-predicted risk were
evaluated.
Annette McWilliams et al. N Engl J Med 2013;369:910-9.
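The two measures described above can be sketched as follows: discrimination as the AUC, and calibration as the median and 90th percentile of the absolute errors between observed outcome and predicted probability. This is not the authors' code. The function names, the nearest-rank percentile definition, and the toy data are assumptions, and the bias-corrected bootstrap interval is not reproduced here.

```python
def auc(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; ties count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcomes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in events for q in non_events)
    return wins / (len(events) * len(non_events))

def absolute_error_summary(y_true, y_prob):
    """Median and 90th percentile of |observed - predicted|, ranked in order."""
    errors = sorted(abs(y - p) for y, p in zip(y_true, y_prob))

    def percentile(pct):
        # Nearest-rank definition (an assumption; other definitions differ slightly).
        rank = max(1, (pct * len(errors) + 99) // 100)
        return errors[rank - 1]

    return percentile(50), percentile(90)

# Toy outcomes and predicted probabilities (hypothetical values).
y_true = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(round(auc(y_true, y_prob), 3))
median_error, p90_error = absolute_error_summary(y_true, y_prob)
print(round(median_error, 3), round(p90_error, 3))
```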
49. Extra
Validity
What is validity?
“A test is valid if it measures what it
purports to measure” (Kelley, 1927)
How it differs from reliability: the dart-throwing analogy
http://www.socialresearchmethods.net/kb/relandval.php
51. Required reporting item ⑩ Internal validity (cross-validation)
>Example
-Methods
To examine the degree of overfitting of the prediction model to the development sample, we
performed a cross-validation procedure. First, the sample was split at random into 10 equal groups.
Second, a logistic regression model predicting diagnosis of PE was developed on nine tenths of the
sample, and the resulting prediction equation was applied to the remaining tenth; this procedure was
repeated 10 times, each time rotating the cross-validation subset. Finally, the ability of the cross-validated scores to predict PE was examined by comparing the area under the receiver operating
characteristic curve (AUC) with that obtained from the naive prediction scores, without cross
validation. This cross-validation procedure was performed for the full multivariable prediction model,
where each covariate was assigned a separate regression coefficient, and for the simple model, where
the clinical probability score was the sole predictor. Confidence intervals on AUCs were obtained by
the bootstrap method: 250 subsamples with sample sizes of 986 were taken with replacement from the
original sample, the AUC was computed in each, and the 95% confidence interval was derived from
percentiles 2.5 and 97.5 of the distribution of AUCs.
-Results
When the 8 variables in the prediction model were allowed to vary independently, the AUC was 0.79
(range, 0.76-0.81) for the naive equation and 0.77 (range, 0.74-0.80) after cross validation. When the 8
variables were added to form the diagnostic score, the AUC was 0.79 (range, 0.76-0.81) for naive
prediction and 0.78 (range, 0.75-0.80) after cross validation. Hence, this analysis allows us to rule out
substantial overfitting of the clinical score.
Wicki J et al. Arch Intern Med. 2001;161(1):92-97.
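The procedure quoted above (10-fold cross-validation, naive versus cross-validated AUC, percentile bootstrap interval from 250 resamples) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the data are simulated, the logistic model is fitted by plain gradient ascent, the percentile lookup is crude, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, ys, steps=200, rate=0.5):
    """Fit logistic regression by batch gradient ascent; weights[0] is the intercept."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, ys):
            features = [1.0] + list(row)
            error = y - sigmoid(sum(w * f for w, f in zip(weights, features)))
            for j, f in enumerate(features):
                gradient[j] += error * f
        weights = [w + rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

def predict(weights, row):
    return sigmoid(sum(w * f for w, f in zip(weights, [1.0] + list(row))))

def auc(ys, ps):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [p for y, p in zip(ys, ps) if y == 1]
    non_events = [p for y, p in zip(ys, ps) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in events for q in non_events)
    return wins / (len(events) * len(non_events))

def cross_validated_predictions(rows, ys, k=10, seed=0):
    """Predict each observation from a model fitted on the other k - 1 folds."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    predictions = [None] * len(rows)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_logistic([rows[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = predict(model, rows[i])
    return predictions

def bootstrap_auc_interval(ys, ps, resamples=250, seed=0):
    """Percentile bootstrap interval for the AUC (2.5th and 97.5th percentiles)."""
    rng = random.Random(seed)
    n = len(ys)
    values = []
    while len(values) < resamples:
        sample = [rng.randrange(n) for _ in range(n)]
        sample_y = [ys[i] for i in sample]
        if 0 < sum(sample_y) < n:  # an AUC needs both outcomes in the resample
            values.append(auc(sample_y, [ps[i] for i in sample]))
    values.sort()
    return values[int(0.025 * (resamples - 1))], values[int(0.975 * (resamples - 1))]

# Simulated data (hypothetical): two predictors with true log odds 1.0*a - 0.8*b.
rng = random.Random(1)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(120)]
ys = [1 if rng.random() < sigmoid(1.0 * a - 0.8 * b) else 0 for a, b in rows]

full_model = fit_logistic(rows, ys)
naive_probs = [predict(full_model, row) for row in rows]
cv_probs = cross_validated_predictions(rows, ys)
naive_auc, cv_auc = auc(ys, naive_probs), auc(ys, cv_probs)
cv_interval = bootstrap_auc_interval(ys, cv_probs)
print("naive AUC:", round(naive_auc, 3))
print("cross-validated AUC:", round(cv_auc, 3), "interval:", cv_interval)
```

A cross-validated AUC well below the naive AUC would point to overfitting. A small gap, as in the quoted Results, argues against it.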
54. Required reporting item ⑪ External validity
>Example
-Statistical Analysis
Prediction models developed in the PanCan cohort (excluding spiculation as a
predictor) were validated externally by means of an assessment of discrimination and
calibration in BCCA data.
-Results / Characteristics
In the BCCA validation study, 1090 participants had 5021 nodules, and 40 of the 1090
persons with nodules (3.7%) were … [omitted]. The characteristics of the participants are
described in Table S1 in the Supplementary Appendix. The PanCan and BCCA study
populations were similar with respect to age, sex, body-mass index, percentage of
patients with emphysema, and percent of predicted forced expiratory volume in 1
second (FEV1).
Annette McWilliams et al. N Engl J Med 2013;369:910-9.
56. Required reporting item ⑪ External validity
>Example
-Results / Predictive Model
Both parsimonious and full models showed excellent discrimination in the
PanCan and BCCA (validation) data with all AUCs more than 0.90 (Fig. S1
and Table S2 in the Supplementary Appendix). In the PanCan and BCCA data
sets, model-predicted probabilities of lung cancer showed good separation
between participants in whom lung cancer was diagnosed and those in whom
it was not diagnosed, with only modest overlap (Fig. 2). … [omitted] For those
nodules, the AUCs in model 1a were 0.894 and 0.907 in the PanCan and
BCCA data, respectively (Fig. S1 in the Supplementary Appendix). … [omitted]
In the BCCA validation data, the full model performed significantly better
than the parsimonious model: the AUC was 0.960 (95% CI, 0.927 to 0.980) in
model 1a as compared with 0.970 (95% CI, 0.947 to 0.986) in model 2a (P =
0.009 for the difference in AUC) … [omitted]
-Discussion
Our models show excellent predictive accuracy, with AUCs of at least 0.94 in
an external validation cohort.
Annette McWilliams et al. N Engl J Med 2013;369:910-9.
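59. Required reporting item ⑫
Software
・Report the software used for the analysis
・Citing the statistical software's manual is a thoughtful addition
(Not a logistic regression study, but...)
>Example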
-METHODS Statistical Analysis
Statistical analyses were performed using R 2.14
(19), with the cmprsk package (20) for Fine and Gray modeling.
-REFERENCES
19. R Development Core Team. R: A language and environment for statistical
computing. Vienna, Austria: R Foundation for Statistical Computing; 2011.
Accessed at www.R-project.org on 18 March 2013.
20. Gray RJ. cmprsk: Subdistribution Analysis of Competing Risks. R package
version 2.2-2. 2011.
Timothy J. Daskivich et al. Ann Intern Med. 2013;158:709-717.
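When the analysis is done in Python rather than R, the version details for such a statement can be collected programmatically. This is a minimal sketch: the function name is hypothetical, and the package name passed in is only an example of what an analysis might have used.

```python
import platform
from importlib import metadata

def software_statement(packages):
    """Build a Methods-style sentence listing interpreter and package versions."""
    parts = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            parts.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag missing packages instead of guessing a version.
            parts.append(f"{name} (not installed)")
    return "Statistical analyses were performed using " + ", ".join(parts) + "."

# "statsmodels" is only an example package name (an assumption).
print(software_statement(["statsmodels"]))
```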
60. Applied Logistic Regression
Third Edition
Logistic Regression: A Self-Learning Text
(Statistics for Biology and Health)
Applied Regression Analysis and Other
Multivariable Methods
67. Bonus: Deep Learning
Olshausen BA, Field DJ
Emergence of simple-cell receptive field properties by learning a sparse code for natural
images Nature 381 (6583): 607-609 JUN 13 1996
Hinton, G. E., Osindero, S. and Teh, Y. (2006)
A fast learning algorithm for deep belief nets. Neural Computation, 18, pp 1527-1554.
"Deep Auto-Encoder" ?
Hinton, G. E. and Salakhutdinov, R. R. (2006)
Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp.
504 - 507, 28 July 2006.
Y. Bengio. Learning deep architectures for AI.
Foundations & Trends in Mach. Learn., 2(1):1--127, 2009.
Le, Q. V., Ranzato, M. A., Monga, R., Devin, M., Chen, K., Corrado, G. S., ... & Ng, A. Y.
(2011). Building high-level features using large scale unsupervised learning. arXiv preprint
arXiv:1112.6209.