Automated and Explainable Deep Learning for Clinical Language Understanding at Roche



Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection.

  1. 1. Automated & Explainable Deep Learning for Clinical Language Understanding at Roche. Vishakha Sharma, PhD, Principal Data Scientist, Roche; Yogesh Pandit, MS, Staff Software Engineer, Roche; David Talby, PhD, Chief Technology Officer, John Snow Labs
  2. 2. Agenda ▪ The Clinical Language Understanding Challenge at Roche: why patients & doctors need accurate & automated natural language understanding at scale ▪ Delivering an Automated, Explained, State-of-the-art NLP & OCR System: the deep-learning NLP & OCR models and pipelines that were built to address the challenge ▪ Achieving state-of-the-art accuracy on real healthcare data in production: reusing, training & tuning clinical embeddings, entity recognition, entity linking, and OCR
  3. 3. Disclaimer ▪ Roche has been one of John Snow Labs’ customers since August 2018. ▪ This presentation has been prepared by Roche and John Snow Labs to provide a high-level overview of Roche’s use of John Snow Labs’ products. ▪ Nothing contained or stated herein or during the presentation constitutes Roche’s endorsement of John Snow Labs’ products. ▪ John Snow Labs is fully responsible for accuracy and completeness of any statements related to John Snow Labs’ products, including the product’s performance.
  4. 4. The Roche difference: Rooted in Science • Trusted in Healthcare. Diagnostics: ▪ #1 in biotechnology and in vitro diagnostics ▪ 20 billion diagnostic tests performed* ▪ Advanced scientific knowledge and technology that increases the medical value of diagnostic solutions. Pharmaceuticals: ▪ Leading provider of cancer treatments worldwide ▪ 127 million patients treated with Roche medicines* ▪ Focused on major medical indications and disease areas ▪ 30 Roche medicines on the WHO Model List of Essential Medicines*. Decision Support (Workflow | Data | Analytics): ▪ Delivering Personalized Healthcare ▪ Decision support software leveraging more than 120 years of medical innovation rooted in science. *Roche Annual Report, 2018. © 2019 F. Hoffmann-La Roche, Ltd. NAVIFY is a trademark of Roche.
  5. 5. NAVIFY Tumor Board: A cloud-based workflow product that securely integrates and displays relevant aggregated data in a single, holistic patient dashboard for oncology care teams to review, align and decide on the optimal treatment for the patient. NAVIFY Clinical Decision Support apps: The clinical decision support apps ecosystem is secured and fully integrated with NAVIFY Tumor Board. NAVIFY Guidelines app: Delivers personalized, up-to-date guidelines for reviewing and recording patient diagnostic and treatment paths and documentation adherence, using an intuitive execution decision tree released in collaboration with GE Healthcare. NAVIFY Clinical Trial Match app*: Easily search the largest international trial registries, including the European Medicines Agency, Japan Medical Association Center for Clinical Trials, etc. NAVIFY Publication Search app*: Effortlessly search more than 858,000 publications across PubMed, American Society of Clinical Oncology and American Association of Cancer Research. *Powered by MolecularMatch, Inc.
  6. 6. Unstructured healthcare data challenges for NAVIFY portfolio ▪ Diverse customers distributed across the world ▪ Multiple Languages ▪ Oncology ▪ Different report formats (ex: pathology, radiology) ▪ Different terminologies (ex: SNOMED, LOINC, ICD-O-3) Must unlock unstructured data to build a comprehensive, longitudinal view of the patient, and enable both clinical decision support and population analytics
  7. 7. Sample Pathology Report Disclaimer: There is no real patient data being displayed here. Pathology reports are very diverse: ▪ Jargon ▪ Tables ▪ Key-value pairs ▪ Hand-written notes
  8. 8. Manually Curated Report Manual curation is extremely time consuming, expensive, and prone to errors Disclaimer: There is no real patient data being displayed here.
  9. 9. The NAVIFY team identified two significant needs. Natural Language Processing (NLP): ▪ High accuracy ▪ Specialized for medical data ▪ Minimize time to train new models ▪ Extensible for new content types. Optical Character Recognition (OCR): ▪ High accuracy ▪ Retain document structure (i.e. tables, lists, paragraphs…). Requirements for both: ▪ Scalable (support 10 million pathology and radiology reports per year) ▪ Compliant with privacy laws ▪ Integrates easily with AWS services ▪ Low cost
  10. 10. 45+ Oncology Entities to Extract Example: Surgical Pathology Report (Lung, Breast, Colon) Disclaimer: This is sample data from TCGA. There is no real patient data being displayed here.
  11. 11. Optical Character Recognition (OCR): PDF to text. Parameters: ▪ engine_mode ▪ page_segmentation_mode ▪ erosion ▪ page_iterator_level ▪ scaling_factor. Metrics: ▪ word_error_rate ▪ character_error_rate ▪ bag_of_words_error_rate. Experiment & Optimize
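The slide names the OCR metrics but not their formulas. As a minimal sketch, assuming the usual edit-distance definitions, word and character error rates can be computed as below. The function names mirror the slide's metric names, but the implementations are illustrative and not taken from the presenters' system; in particular, the bag-of-words variant is one plausible order-insensitive definition, labeled as an assumption.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edits divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def bag_of_words_error_rate(reference, hypothesis):
    """ASSUMED definition: share of reference words that have no
    order-insensitive match in the hypothesis."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    missing = Counter(ref_words) - Counter(hypothesis.split())
    return sum(missing.values()) / len(ref_words)
```

Scoring each combination of OCR parameters against a small set of hand-transcribed pages with functions like these is one way to run the "Experiment & Optimize" loop the slide describes.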
  12. 12. Named Entity Recognition (NER): Spark NLP provides both CNN+Bi-LSTM and BioBERT implementations. We trained a model to extract 45+ labels from pathology reports. TensorFlow models: ▪ Bi-directional Long Short-Term Memory with Convolutional Neural Network: Chiu, J. P., et al. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357-370. ▪ Bidirectional Encoder Representations from Transformers (BERT): Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, pp. 4171–4186. ▪ BioBERT: Lee, J., et al. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234-1240.
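To make the output of such a tagger concrete, here is a minimal sketch that groups token-level tags into entity spans. It assumes the model emits BIO-style tags (`B-`, `I-`, `O`), which the slides do not state, and the label names `HISTOLOGY` and `SITE` are hypothetical stand-ins for the 45+ oncology labels.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs.

    ASSUMPTION: tags follow the BIO scheme, e.g. "B-SITE", "I-SITE", "O".
    """
    spans = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or any unrecognized tag ends the current entity
            flush()
            current_tokens, current_label = [], None
    flush()
    return spans


tokens = ["Invasive", "ductal", "carcinoma", "of", "the", "left", "breast"]
tags = ["B-HISTOLOGY", "I-HISTOLOGY", "I-HISTOLOGY", "O", "O", "B-SITE", "I-SITE"]
entities = bio_to_spans(tokens, tags)
```

An `I-` tag whose label differs from the open entity is treated as the start of a new entity, a common lenient choice when decoding imperfect model output.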
  13. 13. Entity Resolution (ER): With NER alone, we cannot resolve entities to structured codes. Pre-trained models resolve healthcare entities to standard SNOMED & ICD-10 codes.
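The general shape of entity resolution, ranking candidate terminology entries for an extracted mention, can be sketched as follows. This is a toy stand-in only: it swaps the pre-trained resolution models for plain token-overlap (Jaccard) similarity, and the codes `CODE-A` to `CODE-C` are placeholders, not real SNOMED or ICD-10 codes.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(mention, terminology):
    """Return (code, score) pairs sorted from best to worst match.

    `terminology` maps a code to its description; both are hypothetical here.
    """
    mention_tokens = tokenize(mention)
    scored = [
        (code, jaccard(mention_tokens, tokenize(description)))
        for code, description in terminology.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder terminology: the codes are NOT real SNOMED or ICD-10 codes.
terminology = {
    "CODE-A": "invasive ductal carcinoma",
    "CODE-B": "adenocarcinoma of colon",
    "CODE-C": "squamous cell carcinoma of lung",
}
ranking = rank_candidates("Invasive ductal carcinoma, left breast", terminology)
best_code, best_score = ranking[0]
```

A production resolver would replace the similarity function with a learned model over clinical embeddings; the rank-then-pick-top structure is the part this sketch is meant to show.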
  14. 14. Workflow
  15. 15. Workflow
  16. 16. Training NER model with BERT: ▪ Initialization ▪ Training data ▪ Resources ▪ Annotator ▪ Pipeline ▪ Run Training
  17. 17. The use of NLP will be a journey ▪ Initial goal: speeding up the processing of pathology and radiology reports ▪ Faster curation and term highlighting in clinical reports ▪ Automate extraction of high-confidence entities and relationships
  18. 18. What is Spark NLP? ▪ State of the art Natural Language Processing ▪ Production-grade, trainable, and scalable ▪ Open-Source Python, Java & Scala libraries ▪ 100+ Pre-trained models & pipelines ▪ Active: 26 new releases in 2018, 30 in 2019
  19. 19. Spark NLP for Healthcare
  20. 20. Accuracy Benchmarks
  21. 21. Scaling Benchmarks
  22. 22. Speed Benchmarks • Optimized builds of Spark NLP for both Intel and Nvidia • Benchmark done on AWS: Train a French NER model • Achieving F1-score of 89% requires at least 80 Epochs with batch size of 512 • Intel outperformed Nvidia: Cascade Lake was 19% faster & 46% cheaper than Tesla P-100
  23. 23. Clinical Entity Recognition: Accuracy ▪ BERT + NerDLApproach: 93.3% on blind test set ▪ The best NER score in a production system
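The slide does not say which metric the 93.3% refers to. NER systems are commonly scored with entity-level precision, recall and F1 (the F1-score is the metric named on the speed benchmark slide), so the sketch below shows that calculation under the assumption of exact-match scoring. The span offsets and labels are made up for illustration.

```python
def entity_f1(gold, predicted):
    """Entity-level precision, recall and F1 with exact-match scoring.

    Each entity is a (start, end, label) tuple; a prediction counts as
    correct only if all three fields match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative spans only: the third prediction has a wrong start offset.
gold = [(0, 3, "HISTOLOGY"), (5, 7, "SITE"), (9, 10, "GRADE")]
predicted = [(0, 3, "HISTOLOGY"), (5, 7, "SITE"), (8, 10, "GRADE")]
precision, recall, f1 = entity_f1(gold, predicted)
```

Exact-match scoring is strict: a prediction with a boundary off by one token counts as both a false positive and a false negative, which is why the example above scores two out of three on both precision and recall.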
  24. 24. Clinical Entity Resolution “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017.
  25. 25. Learn more: Spark NLP Public Python notebooks - runnable on Google Colab with one click in a browser: enterprise/healthcare/colab Overview of the Spark NLP:
  26. 26. Thank you! We are hiring!!! Try Spark NLP at