
Background: Biomarker candidates are defined as measurable molecules found in biological media. According to the Biomarkers Definitions Working Group (2001), biomarkers cover a rather wide range of parameters. Biomarkers are now widely used in medical research, but a single biomarker may not possess the desired cause-effect association for disease classification and outcome prediction. Researchers therefore currently focus on combining biomarkers. With new technologies such as microarrays, next-generation sequencing and mass spectrometry, researchers can obtain large numbers of biomarker candidates, sometimes exceeding tens of thousands. To avoid wasting money and time, it is suggested to strictly limit the number of patients. However, pilot studies usually have low statistical power, which reduces the chance of detecting a true effect.


Combination of informative biomarkers in small pilot studies and estimation of sample size for extended studies

Amani Al-Mekhlafi (1,2), Frank Klawonn (1,3)
(1) Department of Biostatistics, Helmholtz Centre for Infection Research
(2) PhD student Epidemiology, Braunschweig-Hannover
(3) Department of Computer Science, Ostfalia University of Applied Sciences
www.helmholtz-hzi.de

Background: Biomarker candidates are defined as measurable molecules found in biological media. According to the Biomarkers Definitions Working Group (2001) [1], biomarkers cover a rather wide range of parameters. Biomarkers are now widely used in medical research, but a single biomarker may not possess the desired cause-effect association for disease classification and outcome prediction [2]. Researchers therefore currently focus on combining biomarkers. With new technologies such as microarrays, next-generation sequencing and mass spectrometry, researchers can obtain large numbers of biomarker candidates, sometimes exceeding tens of thousands [3]. To avoid wasting money and time, it is suggested to strictly limit the number of patients. However, pilot studies usually have low statistical power, which reduces the chance of detecting a true effect [4].

Aims:
• Finding the optimal combination of biomarkers to maximize the AUC
• Estimating the sample size for extended studies

Figure 1. HAUCA curve starting from an AUC value of 0.85 for the hip infection dataset

Method:
Data: A pilot study by Omar et al. [5] with a total of 24 patients, 12 with chronic periprosthetic hip infection and 12 with aseptic hip prosthesis loosening, and 50,416 biomarker candidates (hip infection dataset)
Feature selection approach: The classification criterion is based on the area under the receiver operating characteristic curve (AUC)
Calculation of p-value: Based on the same statistic that is used for the Wilcoxon-Mann-Whitney U test [6]
Correction of p-value: Holm-Bonferroni correction
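The feature selection and p-value steps listed under Method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`auc`, `count_u`, `exact_p_value`, `holm`) are hypothetical, the exact p-value assumes no tied measurements, and the Holm adjustment is the textbook version capped at 1 (the corrected values in Table 1 exceed 1 and appear consistent with the uncapped product of the p-value and the number of remaining tests).

```python
from functools import lru_cache
from math import comb


def auc(pos, neg):
    """AUC as the share of (positive, negative) pairs in which the
    positive sample has the larger value; ties count as 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


@lru_cache(maxsize=None)
def count_u(m, n, u):
    """Number of rankings of m positives and n negatives (no ties)
    whose Mann-Whitney statistic equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest value is either a positive (it beats all n negatives)
    # or a negative (it contributes nothing to U).
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)


def exact_p_value(u, m, n):
    """Exact two-sided Mann-Whitney p-value for an integer U (no ties)."""
    total = comb(m + n, m)
    lower = sum(count_u(m, n, k) for k in range(0, u + 1)) / total
    upper = sum(count_u(m, n, k) for k in range(u, m * n + 1)) / total
    return min(1.0, 2.0 * min(lower, upper))


def holm(p_values):
    """Holm-Bonferroni adjusted p-values (monotone, capped at 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1).
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

The AUC and U are linked by U = AUC × n+ × n−, so for 12 versus 12 patients an AUC of about 0.951 corresponds to U = 137; `exact_p_value(137, 12, 12)` then gives roughly 3.33e-05, in line with the first row of Table 1.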
Result:

Step I: HAUCA curve: A method indicating how many good biomarkers a dataset contains compared to pure random effects [7]
• Count the biomarkers that exceed specific AUC values, both in the real dataset and in a random dataset
• Compute the 95% quantile of the binomial distribution for each AUC value to obtain a confidence interval
In the hip infection data, the association between the biomarker candidates and the disease is stronger than random. The dataset therefore merits further study.

No.  Biomarker           AUC    p-value    Corrected p-value
1    Bio. with high AUC  0.951  3.328e-05  1.678
2    Bio. with high AUC  0.944  4.955e-05  2.498
5    Bio. with high AUC  0.931  1.028e-04  5.183
4    Bio. with high AUC  0.924  1.442e-04  7.271
6    Bio. with high AUC  0.917  2.012e-04  10.142
9    Bio. with high AUC  0.910  2.744e-04  13.834
12   Bio. with high AUC  0.903  3.713e-04  18.718
Table 1. Top biomarkers with the highest AUC values, their p-values, and corrected p-values

Step II: Combination of biomarkers:
• Select the top k features according to the AUC
• Calculate within the groups: the difference in means for each feature, and the variance-covariance matrices of the combined features
• Calculate the AUC of the combination of possibly correlated biomarkers according to Demler et al. [8]
• Estimate the lower confidence bound for this combination by bootstrapping at different levels (0.025, 0.05, 0.1)
In the same dataset, when 10 biomarkers are combined, the AUC value comes close to 1 and none of the lower confidence bounds falls below 0.95.

No. of combined biomarkers  AUC value
1                           0.906186
2                           0.952129
3                           0.935715
4                           0.955792
5                           0.944372
6                           0.958933
7                           0.965342
8                           0.978783
9                           0.986358
10                          0.993131
Table 2. Top 10 combined AUC values

Figure 2. Curve of AUCs of the combination of the top 20 biomarkers

Step III: Estimate the sample size:
• Specify the AUC value to be validated
• Specify the prevalence of the positive cases
• Specify the number of hypothesis tests
The total sample size n and the number of positive cases n+ are increased gradually until the desired AUC value yields a significant corrected p-value.
In the hip infection data, a sample size of 60 is needed to validate an AUC value of 0.85.

Figure 3. Sample size to validate each AUC value

The LEGaTO project has received funding from the European Union’s Horizon 2020 research and innovation programme under the grant agreement No 780681.

References
1. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: Preferred definitions and conceptual framework. Clin. Pharmacol. Ther., 2001; 69: 89–95.
2. Yan, L., Tian, L., and Liu, S. Combining large number of weak biomarkers based on AUC. Stat Med, 2015; 34(29): 3811-3830.
3. Soon, W.W., Hariharan, M., and Snyder, M.P. High-throughput sequencing for biology and medicine. Molecular Systems Biology, 2013; 9: 640.
4. Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S., and Munafo, M.R. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci, 2013; 14(5): 365-76.
5. Omar, M., Klawonn, F., Brand, S., Stiesch, M., Krettek, C., and Eberhard, J. Transcriptome wide high-density microarray analysis reveals differential gene transcription in periprosthetic tissue from hips with low-grade infection versus aseptic loosening. Journal of Arthroplasty, 2017; 32: 234-240, 2016.
6. Mason, S.J., and Graham, N.E. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Quarterly Journal of the Royal Meteorological Society, 2002; 128(584): 2145–2166.
7. Klawonn, F., Wang, J., Koch, I., Eberhard, J., and Omar, M. HAUCA curves for the evaluation of biomarker pilot studies with small sample sizes and large numbers of features. Advances in Intelligent Data Analysis XV, 2016; 356–367.
8. Demler, O., Pencina, M., and D’Agostino, R.S. Impact of correlation on predictive ability of biomarkers. Statistics in Medicine, 2013; 32: 4196–421.

Conclusion:
• The AUC has been used as the performance measure not only because it is well established but also because it yields closed-form solutions for the required calculations and therefore allows fast computation. However, other measures such as entropy, misclassification rate or mutual information might be very good alternatives to the AUC.
• The correlation between biomarkers may influence the performance of their combination. It has therefore been taken into consideration, and the variance-covariance matrices have been calculated to reduce the overoptimistic performance estimate that results when independence is assumed.
• In order to statistically validate biomarker candidates from pilot studies, it is necessary to estimate the required larger sample size.
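The sample size search described in Step III can be sketched as a simple loop. This is a hedged illustration only: the function names and parameters are hypothetical, the p-value uses the normal approximation to the Mann-Whitney U statistic rather than an exact calculation, and the correction multiplies the p-value by the number of tests (the Bonferroni bound, which is also the Holm-adjusted value for the smallest p-value). Because of these assumptions, the result need not match the sample size of 60 reported for the hip infection data.

```python
from math import erfc, sqrt


def corrected_p_value(auc, n_pos, n_neg, n_tests):
    """Two-sided p-value for an AUC, using the normal approximation of the
    Mann-Whitney U statistic, multiplied by the number of tests."""
    u_excess = (auc - 0.5) * n_pos * n_neg           # U minus its null mean
    sd = sqrt(n_pos * n_neg * (n_pos + n_neg + 1) / 12.0)
    z = abs(u_excess) / sd
    p = erfc(z / sqrt(2.0))                          # equals 2 * (1 - Phi(z))
    return p * n_tests


def required_sample_size(auc, prevalence, n_tests, alpha=0.05, n_max=100000):
    """Increase n step by step until the corrected p-value drops below alpha.

    Returns (n, n_pos, corrected_p). n_pos is the rounded share of positive
    cases, kept between 1 and n - 1 so that both groups are non-empty.
    """
    for n in range(4, n_max + 1):
        n_pos = min(n - 1, max(1, round(prevalence * n)))
        n_neg = n - n_pos
        p_corr = corrected_p_value(auc, n_pos, n_neg, n_tests)
        if p_corr < alpha:
            return n, n_pos, p_corr
    raise ValueError("no sample size up to n_max reaches significance")


if __name__ == "__main__":
    # Settings taken from the poster: AUC 0.85, 50,416 candidates;
    # a prevalence of 0.5 (balanced groups) is an assumption here.
    print(required_sample_size(0.85, 0.5, 50416))
```

Swapping the normal approximation for an exact U distribution, or using a one-sided test, would lower the required n somewhat; the loop structure stays the same.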