Automated and Explainable Deep Learning for Clinical Language Understanding at Roche

Databricks
Vishakha Sharma, PhD
Principal Data Scientist
Roche
Yogesh Pandit, MS
Staff Software Engineer
Roche
David Talby, PhD
Chief Technology Officer
John Snow Labs
Agenda
The Clinical Language Understanding
Challenge at Roche
Why patients & doctors need accurate & automated
natural language understanding at scale
Delivering an Automated, Explained,
State-of-the-art NLP & OCR System
The deep-learning NLP & OCR models and pipelines
that were built to address the challenge
Achieving state-of-the-art accuracy
on real healthcare data in production
Reusing, training & tuning clinical embeddings,
entity recognition, entity linking, and OCR
Disclaimer
▪ Roche has been one of John Snow Labs’ customers since August 2018.
▪ This presentation has been prepared by Roche and John Snow Labs to
provide a high-level overview of Roche’s use of John Snow Labs’
products.
▪ Nothing contained or stated herein or during the presentation
constitutes Roche’s endorsement of John Snow Labs’ products.
▪ John Snow Labs is fully responsible for accuracy and completeness of
any statements related to John Snow Labs’ products, including the
product’s performance.
The Roche difference
Rooted in Science • Trusted in Healthcare
Diagnostics
#1 in biotechnology
and in vitro diagnostics
20 billion diagnostic tests performed*
Advanced scientific knowledge
and technology that increases
the medical value
of diagnostic solutions
Pharmaceuticals
Leading provider
of cancer treatments worldwide
127 million patients treated
with Roche medicines*
Focused on major medical indications
and disease areas
30 Roche medicines on the
WHO Model List of Essential Medicines*
Decision Support
Workflow | Data | Analytics
Delivering
Personalized Healthcare
Decision support software leveraging more
than 120 years of medical innovation rooted in science
*Roche Annual Report, 2018 © 2019 F. Hoffmann-La Roche, Ltd NAVIFY is a trademark of Roche.
NAVIFY Tumor Board
A cloud-based workflow product that securely integrates
and displays relevant aggregated data into a single, holistic patient
dashboard for oncology care teams to review, align and decide on
the optimal treatment for the patient.
NAVIFY Clinical Decision Support apps
The clinical decision support apps ecosystem is
secured and fully integrated with NAVIFY Tumor Board.
NAVIFY Guidelines app
Delivering personalized up-to-date guidelines for reviewing
and recording patient diagnostic and treatment paths and
documentation adherence using an intuitive execution
decision tree released in collaboration with GE Healthcare.
NAVIFY Clinical Trial Match app*
Easily search the largest international trial registries,
including ClinicalTrials.gov, European Medicines Agency,
Japan Medical Association Center for Clinical Trials, etc.
NAVIFY Publication Search app*
Effortlessly search more than 858,000 publications across
PubMed, American Society of Clinical Oncology and
American Association of Cancer Research.
*Powered by MolecularMatch, Inc.
Unstructured healthcare data challenges for
NAVIFY portfolio
▪ Diverse customers distributed across the world
▪ Multiple Languages
▪ Oncology
▪ Different report formats (ex: pathology, radiology)
▪ Different terminologies (ex: SNOMED, LOINC, ICD-O-3)
Must unlock unstructured data to build a comprehensive,
longitudinal view of the patient, and enable both clinical decision
support and population analytics
Sample Pathology Report
Disclaimer: There is no real patient data being displayed here.
Pathology reports are very
diverse:
▪ Jargon
▪ Tables
▪ Key-value pairs
▪ Hand-written notes
Manually Curated Report
Manual curation is extremely
time consuming, expensive,
and prone to errors
Disclaimer: There is no real patient data being displayed here.
The NAVIFY team identified two significant needs
Requirements for both:
▪ Scalable (support 10 million pathology and
radiology reports per year)
▪ Compliant with privacy laws
▪ Integrates easily with AWS services
▪ Low cost
Natural Language Processing (NLP):
▪ High accuracy
▪ Specialized for medical data
▪ Minimize time to train new models
▪ Extensible for new content types
Optical Character Recognition (OCR):
▪ High accuracy
▪ Retain document structure
(e.g. tables, lists, paragraphs…)
45+ Oncology Entities to Extract
Example: Surgical Pathology Report (Lung, Breast, Colon)
Disclaimer: This is sample data from TCGA. There is no real patient data being displayed here.
Optical Character Recognition (OCR)
PDF to Text
Parameters:
▪ engine_mode
▪ page_segmentation_mode
▪ erosion
▪ page_iterator_level
▪ scaling_factor
Metrics:
▪ word_error_rate
▪ character_error_rate
▪ bag_of_words_error_rate
Experiment & Optimize
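The "Experiment & Optimize" loop can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used at Roche: the word and character error rates follow the usual edit-distance definitions, while the bag-of-words variant and the `sweep` helper are assumptions made for this sketch. The `ocr_fn` callable and the parameter values passed to it are hypothetical stand-ins for a real OCR engine.

```python
from collections import Counter
from itertools import product


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()  # assumes a non-empty reference
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def bag_of_words_error_rate(reference, hypothesis):
    """Order-insensitive error rate (assumed definition for this sketch)."""
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())
    missing = sum((ref_counts - hyp_counts).values())
    extra = sum((hyp_counts - ref_counts).values())
    return max(missing, extra) / sum(ref_counts.values())


def sweep(ocr_fn, param_grid, document, reference):
    """Run ocr_fn for every parameter combination; return (lowest WER, params)."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        text = ocr_fn(document, **params)
        results.append((word_error_rate(reference, text), params))
    return min(results, key=lambda item: item[0])
```

For example, `sweep(my_ocr, {"scaling_factor": [1.0, 2.0], "erosion": [False, True]}, page, ground_truth)` would try four configurations, where `my_ocr`, `page`, and `ground_truth` are placeholders for an OCR wrapper, an input document, and its manually transcribed text.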
Named Entity Recognition (NER)
Spark NLP provides both CNN + Bi-LSTM and BioBERT implementations
We trained a model to extract 45+ labels from pathology reports
Bi-directional Long Short-Term Memory with Convolutional Neural Network
TensorFlow models
Chiu, J. P., et al. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357-370.
Bidirectional Encoder Representations from Transformers (BERT)
https://github.com/google-research/bert
Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, pp. 4171–4186.
BioBERT
Lee, J., et al. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234-1240.
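NER models of this kind typically emit one tag per token, which then has to be grouped into entity spans. The sketch below assumes the common IOB2 tagging scheme (B- opens an entity, I- continues it, O is outside); the deck does not state which scheme was used, and the label names HISTOLOGY and GRADE are hypothetical, not taken from the 45+ labels mentioned above.

```python
def iob_to_entities(tokens, tags):
    """Group token-level IOB2 tags into (label, text) entity pairs."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Close the entity that is currently open, if any.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open entity.
            flush()
    flush()
    return entities


# Made-up sentence and tags, for illustration only.
tokens = ["Invasive", "ductal", "carcinoma", ",", "grade", "2"]
tags = ["B-HISTOLOGY", "I-HISTOLOGY", "I-HISTOLOGY", "O", "B-GRADE", "I-GRADE"]
entities = iob_to_entities(tokens, tags)
```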
Entity Resolution (ER)
With NER alone, we cannot resolve entities to structured codes
Pre-trained models for resolving healthcare entities to standard SNOMED & ICD-10 codes
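To make the resolution step concrete, here is a toy sketch that ranks candidate codes for an extracted mention by token overlap. It is not how the pre-trained Spark NLP resolvers work; it only illustrates the idea of mapping free text to a code by ranking candidates. The three ICD-10 codes are real, but their descriptions are paraphrased and the tiny lookup table is an assumption for this example.

```python
# Tiny illustrative terminology: real ICD-10 codes, paraphrased descriptions.
ICD10_SAMPLE = {
    "C18.9": "Malignant neoplasm of colon, unspecified",
    "C34.9": "Malignant neoplasm of bronchus or lung, unspecified",
    "C50.9": "Malignant neoplasm of breast, unspecified",
}


def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().replace(",", " ").split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(mention, terminology):
    """Return (code, score) for the description that best overlaps the mention."""
    mention_tokens = tokenize(mention)
    scored = [(jaccard(mention_tokens, tokenize(description)), code)
              for code, description in terminology.items()]
    score, code = max(scored)
    return code, score


code, score = resolve("malignant tumor of the lung", ICD10_SAMPLE)
```

A lexical ranking like this breaks down quickly on synonyms and abbreviations, which is one reason to prefer learned resolvers over string matching.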
Workflow
Workflow
Training NER model with BERT
Initialization
Training data
Resources
Annotator
Pipeline
Run Training
The use of NLP will be a journey
▪ Initial goal of speeding up
pathology and radiology reports
▪ Faster curation and term
highlighting in clinical reports
▪ Automate extraction of
high-confidence entities,
relationships
What is Spark NLP?
▪ State-of-the-art Natural Language Processing
▪ Production-grade, trainable, and scalable
▪ Open-Source Python, Java & Scala libraries
▪ 100+ Pre-trained models & pipelines
▪ Active: 26 new releases in 2018, 30 in 2019
Spark NLP
for Healthcare
Accuracy Benchmarks
Scaling Benchmarks
Speed Benchmarks
• Optimized builds of Spark NLP
for both Intel and Nvidia
• Benchmark done on AWS:
Train a French NER model
• Achieving an F1-score of 89%
requires at least 80 epochs with
a batch size of 512
• Intel outperformed Nvidia:
Cascade Lake was 19% faster &
46% cheaper than Tesla P100
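For readers unfamiliar with the F1-score quoted in these benchmarks, the sketch below computes precision, recall, and F1 over sets of predicted and gold entities. It assumes entity-level exact-match scoring, where a prediction counts only if its span and label both match; the deck does not say how its scores were computed, and the spans and labels in the example are made up.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall and F1 over (start, end, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: one exact match, one span with the wrong label.
gold = {(0, 3, "HISTOLOGY"), (5, 6, "GRADE")}
predicted = {(0, 3, "HISTOLOGY"), (5, 6, "SITE")}
scores = precision_recall_f1(gold, predicted)
```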
Clinical Entity Recognition: Accuracy
BERT
NerDLApproach
93.3%
on blind test set
The best NER score
in a production system
Clinical Entity Resolution
“CNN-based ranking for
biomedical entity normalization”.
Li et al., BMC Bioinformatics,
October 2017.
Learn more: Spark NLP
Public Python notebooks - runnable on Google Colab with one click in a browser:
github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab
github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/jupyter/enterprise/healthcare/colab
Overview of Spark NLP:
nlp.johnsnowlabs.com
Thank you!
We are hiring!!!
www.navify.com/careers/
Try Spark NLP at
nlp.johnsnowlabs.com
Automated and Explainable Deep Learning for Clinical Language Understanding at Roche

  • 2. Automated & Explainable Deep Learning for Clinical Language Understanding at Roche Vishakha Sharma, PhD Principal Data Scientist Roche Yogesh Pandit, MS Staff Software Engineer Roche David Talby, PhD Chief Technology Officer John Snow Labs
  • 3. Agenda The Clinical Language Understanding Challenge at Roche Why patients & doctors need accurate & automated natural language understanding at scale Delivering an Automated, Explained, State-of-the-art NLP & OCR System The deep-learning NLP & OCR models and pipelines that were built to address the challenge Achieving state-of-the-art accuracy on real healthcare data in production Reusing, training & tuning clinical embeddings, entity recognition, entity linking, and OCR
  • 4. Disclaimer ▪ Roche has been one of John Snow Labs’ customers since August 2018. ▪ This presentation has been prepared by Roche and John Snow Labs to provide a high-level overview of Roche’s use of John Snow Labs’ products. ▪ Nothing contained or stated herein or during the presentation constitutes Roche’s endorsement of John Snow Labs’ products. ▪ John Snow Labs is fully responsible for accuracy and completeness of any statements related to John Snow Labs’ products, including the product’s performance.
  • 5. The Roche difference Rooted in Science • Trusted in Healthcare Diagnostics #1 in biotechnology and in vitro diagnostics 20 billion diagnostic tests performed* Advanced scientific knowledge and technology that increases the medical value of diagnostic solutions Pharmaceuticals Leading provider of cancer treatments worldwide 127 million patients treated with Roche medicines* Focused on major medical indications and disease areas 30 Roche medicines on the WHO Model List of Essential Medicines* Decision Support Workflow | Data | Analytics Delivering Personalized Healthcare Decision support software leveraging more than 120 years of medical innovation rooted in science *Roche Annual Report, 2018 © 2019 F. Hoffmann-La Roche, Ltd NAVIFY is a trademark of Roche.
  • 6. NAVIFY Tumor Board NAVIFY Clinical Decision Support appsA cloud-based workflow product that securely integrates and displays relevant aggregated data into a single, holistic patient dashboard for oncology care teams to review, align and decide on the optimal treatment for the patient. The clinical decision support apps ecosystem is secured and fully integrated with NAVIFY Tumor Board. NAVIFY Guidelines app Delivering personalized up-to-date guidelines for reviewing and recording patient diagnostic and treatment paths and documentation adherence using an intuitive execution decision tree released in collaboration with GE Healthcare. NAVIFY Clinical Trial Match app* Easily search the largest international trial registries, including ClinicalTrials.gov, European Medicines Agency, Japan Medical Association Center for Clinical Trials, etc. NAVIFY Publication Search app* Effortlessly search more than 858,000 publications across PubMed, American Society of Clinical Oncology and American Association of Cancer Research. *Powered by MolecularMatch, Inc. © 2019 F. Hoffmann-La Roche, Ltd NAVIFY is a trademark of Roche.
  • 7. Unstructured healthcare data challenges for NAVIFY portfolio ▪ Diverse customers distributed across the world ▪ Multiple Languages ▪ Oncology ▪ Different report formats (ex: pathology, radiology) ▪ Different terminologies (ex: SNOMED, LOINC, ICD-O-3) Must unlock unstructured data to build a comprehensive, longitudinal view of the patient, and enable both clinical decision support and population analytics
  • 8. Sample Pathology Report Disclaimer: There is no real patient data being displayed here. Pathology reports are very diverse: ▪ Jargon ▪ Tables ▪ Key-value pairs ▪ Hand-written notes
  • 9. Manually Curated Report Manual curation is extremely time consuming, expensive, and prone to errors Disclaimer: There is no real patient data being displayed here.
  • 10. The NAVIFY team identified two significant needs Requirements for both: ▪ Scalable (support 10 million pathology and radiology reports per year) ▪ Compliant with privacy laws ▪ Integrates easily with AWS services ▪ Low cost Natural Language Processing (NLP): ▪ High accuracy ▪ Specialized for medical data ▪ Minimize time to train new models ▪ Extensible for new content types Optical Character Recognition (OCR): ▪ High accuracy ▪ Retain document structure (e.g., tables, lists, paragraphs…)
  • 11. 45+ Oncology Entities to Extract Example: Surgical Pathology Report (Lung, Breast, Colon) Disclaimer: This is sample data from TCGA. There is no real patient data being displayed here.
  • 12. Optical Character Recognition (OCR): PDF to Text. Parameters: engine_mode, page_segmentation_mode, erosion, page_iterator_level, scaling_factor. Metrics: word_error_rate, character_error_rate, bag_of_words_error_rate. Experiment & Optimize
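The three metrics named on the OCR slide can be sketched in plain Python. This is a minimal illustration, not the Roche or John Snow Labs implementation: the function names simply mirror the slide's metric names, the word and character error rates are assumed to be edit distance normalized by reference length, and the bag-of-words variant is assumed to compare word counts while ignoring word order.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edits per reference character (reference must be non-empty)."""
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edits per reference word (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def bag_of_words_error_rate(reference, hypothesis):
    """Order-independent word errors per reference word (assumed definition)."""
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())
    missing = sum((ref_counts - hyp_counts).values())  # words OCR dropped or garbled
    extra = sum((hyp_counts - ref_counts).values())    # words OCR invented or garbled
    return max(missing, extra) / sum(ref_counts.values())


# Hypothetical ground truth and OCR output for one line of a report.
reference = "invasive ductal carcinoma grade 2"
ocr_output = "invasive ductal carcinorna grade 2"
print(word_error_rate(reference, ocr_output))
print(character_error_rate(reference, ocr_output))
print(bag_of_words_error_rate(reference, ocr_output))
```

Comparing word_error_rate with bag_of_words_error_rate on the same page helps separate recognition errors from reading-order errors, which is relevant when tables and lists must keep their structure.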
  • 13. Named Entity Recognition (NER) Spark NLP provides both CNN+Bi-LSTM and BioBERT implementations. We trained a model to extract 45+ labels from pathology reports. TensorFlow models: ▪ Bi-directional Long Short-Term Memory with Convolutional Neural Network: Chiu, J. P., et al. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357-370. ▪ Bidirectional Encoder Representations from Transformers (BERT): Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, pp. 4171–4186. https://github.com/google-research/bert ▪ BioBERT: Lee, J., et al. (2020). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234-1240.
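NER models of this kind typically emit one tag per token, and a post-processing step groups the tags into entity chunks. The sketch below assumes a BIO tagging scheme and uses hypothetical label names (HISTOLOGY, LATERALITY, SITE); the slides do not list the actual 45+ labels or the tag format, so treat this as an illustration of the grouping step only.

```python
def bio_to_entities(tokens, tags):
    """Group token-level BIO tags into (label, text) entity chunks."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        if current_tokens:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts (a stray I- tag is treated leniently as a start).
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" tag: the token is outside any entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


# Hypothetical sentence and tags; not taken from any real report.
tokens = ["Invasive", "ductal", "carcinoma", "of", "the", "left", "breast"]
tags = ["B-HISTOLOGY", "I-HISTOLOGY", "I-HISTOLOGY", "O", "O",
        "B-LATERALITY", "B-SITE"]
print(bio_to_entities(tokens, tags))
```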
  • 14. Entity Resolution (ER) With NER alone, we cannot resolve entities to structured codes. Pre-trained models for resolving healthcare entities to standard SNOMED & ICD-10 codes
  • 17. Training NER model with BERT: Initialization, Training data, Resources, Annotator, Pipeline, Run Training
  • 18. The use of NLP will be a journey ▪ Initial goal of speeding up curation of pathology and radiology reports ▪ Faster curation and term highlighting in clinical reports ▪ Automate extraction of high-confidence entities and relationships
  • 19. What is Spark NLP? ▪ State of the art Natural Language Processing ▪ Production-grade, trainable, and scalable ▪ Open-Source Python, Java & Scala libraries ▪ 100+ Pre-trained models & pipelines ▪ Active: 26 new releases in 2018, 30 in 2019
  • 23. Speed Benchmarks • Optimized builds of Spark NLP for both Intel and Nvidia • Benchmark done on AWS: train a French NER model • Achieving an F1-score of 89% requires at least 80 epochs with a batch size of 512 • Intel outperformed Nvidia: Cascade Lake was 19% faster & 46% cheaper than Tesla P100
  • 24. Clinical Entity Recognition: Accuracy BERT NerDLApproach: 93.3% on blind test set. The best NER score in a production system
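The slide does not say which metric the 93.3% refers to. Assuming it is an entity-level F1 score (the speed benchmark slide quotes an F1-score), a minimal sketch of that computation follows. Entities are represented here as hypothetical (start, end, label) tuples, and a prediction counts as correct only when span and label both match exactly; the evaluation actually used may differ.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of (start, end, label) entities."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one sentence (token offsets, label).
gold = [(0, 3, "HISTOLOGY"), (5, 6, "LATERALITY"), (6, 7, "SITE")]
predicted = [(0, 3, "HISTOLOGY"), (6, 7, "SITE"), (3, 4, "SITE")]
print(entity_prf(gold, predicted))
```

Scoring on a blind test set, as the slide describes, means the gold annotations used here were never seen during training or tuning.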
  • 25. Clinical Entity Resolution “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017.
  • 26. Learn more: Spark NLP Public Python notebooks, runnable on Google Colab with one click in a browser: github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/colab github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/jupyter/enterprise/healthcare/colab Overview of Spark NLP: nlp.johnsnowlabs.com
  • 27. Thank you! We are hiring!!! www.navify.com/careers/ Try Spark NLP at nlp.johnsnowlabs.com