Artificial intelligence continues to be a right-hand tool for many branches of biology. From preliminary advice and treatment, such as determining whether symptoms are related to a fever or a cold, to critical tasks such as detecting cancerous cells or classifying X-rays, traditional machine learning and deep learning techniques have achieved remarkable feats. However, total dependency on machine-based prediction is still a far-fetched concept. In this paper, we provide a framework that uses several Natural Language Processing (NLP) algorithms to construct a comparative analysis. We create an ensemble of top-performing algorithms to accomplish a classification task on medical reports. We compare traditional machine learning and deep learning techniques and evaluate how reliable they are for analyzing medical diagnoses. We conclude that an ensemble approach can provide reliable outcomes, with accuracy over 92%. The current state of the art is not yet equipped to deliver results at the standard the health sector needs, but an ensemble of these techniques can be a pathway for future research.
Conference: IEEE 11th Annual Information Technology, Electronics and Mobile Communication Conference (IEEE IEMCON 2020)
At: Vancouver
5. Remarkable Applications of NLP:
REGULAR USE APPLICATIONS | SENSITIVE APPLICATIONS
1. Google Translate [1] | Fraud Detection System [4]
2. Cortana, Google Assistant, Siri [2] | Pattern Recognition [5]
3. Online Market Recommendation System and Intelligent Chatbots like Iris [3] |
1. P. Koehn. Statistical machine translation. Cambridge University Press, 2009.
2. M. B. Hoy. Alexa, Siri, Cortana, and more: An introduction to voice assistants. Medical Reference Services Quarterly, 37(1):81–88, 2018.
3. E. Fast, B. Chen, J. Mendelsohn, J. Bassen, and M. S. Bernstein. Iris: A conversational agent for complex tasks. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–12, 2018.
4. N. Sadman, K. D. Gupta, A. Haque, S. Poudyal, and S. Sen. Detect review manipulation by leveraging reviewer historical stylometrics in Amazon, Yelp, Facebook and Google reviews. In Proceedings of the 2020 The 6th International Conference on E-Business and Applications, pages 42–47, 2020.
5. N. Sadman, K. D. Gupta, A. Haque, S. Poudyal, and S. Sen. Stylometry as a reliable method for fallback authentication. In Proceedings of the 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2020.
6. Text data >> Numerical / Image data
Some medical applications using machine learning:
● Cancer Detection [1]
● Medical Image Analysis (CT, X-ray) [2]
● Genetic Sequencing [3]
● Gene Structure Prediction [4]
Field: Biomedicine, bioinformatics, genetic engineering
1. K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and D. I. Fotiadis. Machine learning applications in cancer prognosis and prediction. Computational and structural biotechnology journal, 13:8–17, 2015.
2. F. Ritter et al. Medical image analysis. IEEE Pulse, 2(6):60–70, Nov.-Dec. 2011. doi: 10.1109/MPUL.2011.942929.
3. S. Ardui, A. Ameur, J. R. Vermeesch, and M. S. Hestand. Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Research, 46(5):2159–2168, 2018.
4. S. Lee, W. Weerasinghe, N. Wray, et al. Using information of relatives in genomic prediction to apply effective stratified medicine. Sci Rep, 7:42091, 2017. https://doi.org/10.1038/srep42091
7. A Few NLP (Text-Based) Medical Applications:
- Text mining, POS tagging, information retrieval and extraction, identification of protein or gene names, annotation of medical records. [1]
- Medical document classification into groups (Ultrasonography, Endoscopy and X-ray). [2]
- Statistical text classifier to detect extreme-risk events. [3]
1. M. Krallinger, R. A.-A. Erhardt, and A. Valencia. Text-mining approaches in molecular biology and biomedicine. Drug discovery today, 10(6):439–445, 2005.
2. M. Khachidze, M. Tsintsadze, and M. Archuadze. Natural language processing based instrument for classification of free text medical records. BioMed research international, 2016, 2016.
3. M.-S. Ong, F. Magrabi, and E. Coiera. Automated identification of extreme-risk events in clinical incident reports. Journal of the American Medical Informatics Association, 19(e1):e110–e118, 2012.
8. MOTIVATION BEHIND THE WORK
● Lack of a dependable framework in the biomedicine field
● Fewer frameworks for NLP than for computer vision
● Trust issues with computer-driven applications
15. Collection from MtSamples [1]
1. https://www.mtsamples.com/
Transcribed Data | Medical Speciality
The left ventricular cavity size and wall thickness appear normal. The wall motion and left ventricular systolic function appears hyperdynamic with estimated ejection fraction of 70% to 75%.... | Cardiovascular / Pulmonary
'PREOPERATIVE DIAGNOSES:,1. Hallux rigidus, left foot.,2. Elevated first metatarsal, left foot.... | Surgery
POSTOPERATIVE DIAGNOSIS: , Hallux limitus deformity of the right foot.,ANESTHESIA:, Monitored anesthesia care with 15 mL of 1:1 mixture of 0.5% .... | Orthopedic
SUBJECTIVE:, The patient visits our office for a well-child check with concern of some spitting up quite a bit. The patient does have some spitting up on occasion. No projectile in nature, nonbilious.... | Consult - History and Phy
16. Data Statistics
Maximum no. of words: 2460
Minimum no. of words: 20
Average no. of words: 500
Average no. of stop words: 200
Fig: Class statistics
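Word-count statistics like the ones above can be computed in a few lines of Python. This is a minimal sketch, not the authors' script: the whitespace tokenizer, the tiny `STOP_WORDS` set, the function name `corpus_statistics`, and the two sample reports are all assumptions made for illustration.

```python
from statistics import mean

# Tiny illustrative stop-word list (an assumption; the slides do not name one).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "with", "is", "in", "for"}


def corpus_statistics(documents):
    """Return word-count statistics for a list of text documents."""
    # Lowercasing and whitespace tokenization are simplifying assumptions.
    tokenized = [doc.lower().split() for doc in documents]
    word_counts = [len(tokens) for tokens in tokenized]
    stop_counts = [sum(tok in STOP_WORDS for tok in tokens) for tokens in tokenized]
    return {
        "max_words": max(word_counts),
        "min_words": min(word_counts),
        "avg_words": mean(word_counts),
        "avg_stop_words": mean(stop_counts),
    }


# Hypothetical mini-corpus, not taken from the dataset.
sample_reports = [
    "the patient is seen for a follow up of the left foot",
    "normal wall motion with estimated ejection fraction",
]
stats = corpus_statistics(sample_reports)
print(stats)
```

A real run would likely use a fuller stop-word list and a tokenizer that handles punctuation, so the exact numbers depend on those choices.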
17. Table: Comparative performance scores of algorithms in ensemble approach
Algorithm | F1 | Precision | Recall
Universal Encoder | 0.923 | 0.941 | 0.927
BERT | 0.875 | 0.890 | 0.873
Unidirectional LSTM | 0.302 | 0.313 | 0.298
SVM | 0.842 | 0.843 | 0.830
Random Forest | 0.810 | 0.821 | 0.810
KNN | 0.851 | 0.860 | 0.849
Multinomial Naive Bayes | 0.786 | 0.788 | 0.787
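The slides do not state how the per-class scores were averaged. As a sketch, assuming macro averaging (each class weighted equally), precision, recall and F1 for a multi-class classifier could be computed as follows. The function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels, for illustration only.
y_true = ["Surgery", "Surgery", "Orthopedic", "Orthopedic"]
y_pred = ["Surgery", "Orthopedic", "Orthopedic", "Orthopedic"]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Under macro averaging the averaged F1 need not lie between the averaged precision and recall, as this toy example shows.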
Publish Results from High-Scoring Algorithms
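The slides do not describe how the ensemble combines its members. One plausible reading, sketched here purely as an assumption, is to keep the models whose F1 score clears a cutoff and take a majority vote over their predictions, breaking ties in favor of the highest-scoring model. The F1 values are copied from the table above. The 0.85 cutoff, the function names, and the example predictions are hypothetical.

```python
from collections import Counter

# F1 scores copied from the comparative performance table.
F1_SCORES = {
    "Universal Encoder": 0.923,
    "BERT": 0.875,
    "Unidirectional LSTM": 0.302,
    "SVM": 0.842,
    "Random Forest": 0.810,
    "KNN": 0.851,
    "Multinomial Naive Bayes": 0.786,
}


def select_top_models(scores, threshold):
    """Return model names with score >= threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


def majority_vote(predictions, ranked_models):
    """Pick the most common label; ties go to the best-ranked model."""
    if not ranked_models:
        raise ValueError("at least one model is required")
    votes = Counter(predictions[name] for name in ranked_models)
    best_count = max(votes.values())
    tied_labels = {label for label, count in votes.items() if count == best_count}
    for name in ranked_models:  # ranked_models is ordered best first
        if predictions[name] in tied_labels:
            return predictions[name]


top_models = select_top_models(F1_SCORES, threshold=0.85)  # hypothetical cutoff

# Hypothetical per-model predictions for a single report.
example_predictions = {
    "Universal Encoder": "Surgery",
    "BERT": "Orthopedic",
    "KNN": "Orthopedic",
}
print(top_models, majority_vote(example_predictions, top_models))
```

Other combination rules, such as averaging class probabilities or weighting votes by F1, would fit the same outline.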
20. Threats to validity:
- Medical data are sensitive and must comply with HIPAA requirements [1] and GDPR guidelines [2], so it is hard to collect a rich and diverse dataset.
- Existence of bias due to class correlation
Future plans:
- Improving the algorithms through hyperparameter tuning and optimization
- Creating a web framework accessible to both doctors and patients
- Collaborating with a hospital to collect a dependable dataset
1. M. White. HIPAA compliance for vendors and suppliers. Journal of healthcare protection management: publication of the International Association for Hospital Security, 30(1):91–97, 2014.
2. T. Mulder and M. Tudorica. Privacy policies, cross-border health data and the GDPR. Information & Communications Technology Law, 28(3):261–274, 2019.