Director, Digital Healthcare Institute 

Managing Partner, Digital Healthcare Partners

Yoon Sup Choi, Ph.D.
Artificial Intelligence in Medicine
Disclaimer
I disclose conflicts of interest with the corporations above.
Startup
VC
Hype, Fantasies, Fears, and Aggressive Predictions about AI
“Technology will replace 80% of doctors”
“People should stop training radiologists now.
It’s just completely obvious within 5 years,
deep learning can do better than radiologists”
https://doi.org/10.1038/s41591-018-0177-5
Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning

Nicolas Coudray, Paolo Santiago Ocampo, Theodore Sakellaropoulos, Navneet Narula, Matija Snuderl, David Fenyö, Andre L. Moreira, Narges Razavian* and Aristotelis Tsirigos*

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.

According to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer[1], and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinction can be challenging and time-consuming, and requires confirmatory immunohistochemical stains.

Classification of lung cancer type is a key diagnostic process because the available treatment options, including conventional chemotherapy and, more recently, targeted therapies, differ for LUAD and LUSC[2]. Also, a LUAD diagnosis will prompt the search for molecular biomarkers and sensitizing mutations and thus has a great impact on treatment options[3,4]. For example, epidermal growth factor receptor (EGFR) mutations, present in about 20% of LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK) rearrangements, present in <5% of LUAD[5], currently have targeted therapies approved by the Food and Drug Administration (FDA)[6,7]. Mutations in other genes, such as KRAS and tumor protein P53 (TP53), are very common (about 25% and 50%, respectively) but have proven to be particularly challenging drug targets so far[5,8]. Lung biopsies are typically used to diagnose lung cancer type and stage. Virtual microscopy of stained tissue images is typically acquired at magnifications of 20× to 40×, generating very large two-dimensional images (10,000 to >100,000 pixels in each dimension) that are oftentimes challenging to visually inspect in an exhaustive manner. Furthermore, accurate interpretation can be difficult, and the distinction between LUAD and LUSC is not always clear, particularly in poorly differentiated tumors; in this case, ancillary studies are recommended for accurate classification[9,10]. To assist experts, automatic analysis of lung cancer whole-slide images has recently been studied to predict survival outcomes[11] and classification[12]. For the latter, Yu et al.[12] combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)[13]. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941[15].

Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously…
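As a rough illustration of the transfer-learning recipe the abstract describes (an ImageNet-pretrained inception v3 with a new three-way head), here is a minimal Keras sketch; the tile size, head design, and hyperparameters are assumptions, and the authors' actual pipeline is the DeepPATH code linked above.

```python
# Minimal sketch of an inception v3 transfer-learning setup for
# LUAD / LUSC / normal tile classification (assumptions: 299x299 tiles,
# a simple dropout+softmax head; not the authors' exact configuration).
import tensorflow as tf
from tensorflow.keras import layers, models

# Inception v3 pretrained on ImageNet, without its original classifier.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")

# New 3-way head: LUAD, LUSC, normal lung tissue.
model = models.Sequential([
    base,
    layers.Dropout(0.2),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Whole-slide images are far too large for a CNN, so they are cut into
# tiles; per-slide calls aggregate tile-level probabilities.
# train_tiles / val_tiles would be tf.data pipelines of (tile, label):
# model.fit(train_tiles, validation_data=val_tiles, epochs=10)
```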
LETTER https://doi.org/10.1038/s41586-019-1390-1
A clinically applicable approach to continuous
prediction of future acute kidney injury
Nenad Tomašev*, Xavier Glorot, Jack W. Rae, Michal Zielinski, Harry Askham, Andre Saraiva, Anne Mottram, Clemens Meyer, Suman Ravuri, Ivan Protsyuk, Alistair Connell, Cían O. Hughes, Alan Karthikesalingam, Julien Cornebise, Hugh Montgomery, Geraint Rees, Chris Laing, Clifton R. Baker, Kelly Peterson, Ruth Reeves, Demis Hassabis, Dominic King, Mustafa Suleyman, Trevor Back, Christopher Nielson, Joseph R. Ledsam* & Shakir Mohamed
The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients[1]. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records[2–17] and using acute kidney injury—a common and potentially life-threatening condition[18]—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests[9]. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
Adverse events and clinical complications are a major cause of mortality and poor outcomes in patients, and substantial effort has been made to improve their recognition[18,19]. Few predictors have found their way into routine clinical practice, because they either lack effective sensitivity and specificity or report damage that already exists[20]. One example relates to acute kidney injury (AKI), a potentially life-threatening condition that affects approximately one in five inpatient admissions in the United States[21]. Although a substantial proportion of cases of AKI are thought to be preventable with early treatment[22], current algorithms for detecting AKI depend on changes in serum creatinine as a marker of acute decline in renal function. Increases in serum creatinine lag behind renal injury by a considerable period, which results in delayed access to treatment. This supports a case for preventative 'screening'-type alerts, but there is no evidence that current rule-based alerts improve outcomes[23]. For predictive alerts to be effective, they must empower clinicians to act before a major clinical decline has occurred by: (i) delivering actionable insights on preventable conditions; (ii) being personalized for specific patients; (iii) offering sufficient contextual information to inform clinical decision-making; and (iv) being generally applicable across populations of patients[24].

Promising recent work on modelling adverse events from electronic health records[2–17] suggests that the incorporation of machine learning may enable the early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically applicable level of predictive performance[25] or have focused on predictions across a short time horizon that leaves little time for clinical assessment and intervention[26].

Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. At each time point, the model outputs a probability of AKI occurring at any stage of severity within the next 48 h (although our approach can be extended to other time windows or severities of AKI; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. This model was trained using data that were curated from a multi-site retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs—the largest integrated healthcare system in the United States. The dataset consisted of information that was available from hospital electronic health records in digital format. The total number of independent entries in the dataset was approximately 6 billion, including 620,000 features. Patients were randomized across training (80%), validation (5%), calibration (5%) and test (10%) sets. A ground-truth label for the presence of AKI at any given point in time was added using the internationally accepted 'Kidney Disease: Improving Global Outcomes' (KDIGO) criteria[18]; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are provided in the Methods and Extended Data Figs. 1–3.
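A minimal sketch of this kind of sequential risk model, assuming PyTorch; the feature dimension, architecture, and threshold below are illustrative stand-ins, not the published model.

```python
# Sketch: a recurrent model that reads one time step of EHR features at a
# time and emits P(AKI within 48 h) at every step (illustrative only).
import torch
import torch.nn as nn

class AKIRiskRNN(nn.Module):
    def __init__(self, n_features=620, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, n_features)
        h, _ = self.rnn(x)                   # internal memory over the record
        return torch.sigmoid(self.head(h))   # risk at each time step

model = AKIRiskRNN()
risk = model(torch.randn(4, 50, 620))        # 4 patients, 50 time steps each
alerts = risk > 0.2   # operating-point threshold tuned on a calibration set
```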
Figure 1 shows the use of our model. At every point throughout an admission, the model provides updated estimates of future AKI risk along with an associated degree of uncertainty. Providing the uncertainty associated with a prediction may help clinicians to distinguish ambiguous cases from those predictions that are fully supported by the available data. Identifying an increased risk of future AKI sufficiently far in advance is critical, as longer lead times may enable preventative action to be taken. This is possible even when clinicians may not be actively intervening with, or monitoring, a patient. Supplementary Information section A provides more examples of the use of the model.

With our approach, 55.8% of inpatient AKI events of any severity were predicted early, within a window of up to 48 h in advance and with a ratio of 2 false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve of 92.1%, and an area under the precision–recall curve of 29.7%. When set at this threshold, our predictive model would—if operationalized—trigger a…
Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
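A small sketch of how such an operating point can be selected (assuming scikit-learn; `y_true` and `y_score` stand in for development-set reference labels and model probabilities, which are not given here): among thresholds meeting a specificity floor, take the most sensitive one.

```python
# Sketch: pick the threshold that maximizes sensitivity subject to a
# specificity floor, mirroring the "high specificity" operating point.
import numpy as np
from sklearn.metrics import roc_curve

def high_specificity_point(y_true, y_score, min_spec=0.98):
    fpr, tpr, thr = roc_curve(y_true, y_score)
    spec = 1.0 - fpr
    ok = np.where(spec >= min_spec)[0]        # thresholds meeting the floor
    best = ok[np.argmax(tpr[ok])]             # most sensitive among them
    return thr[best], tpr[best], spec[best]   # cut point, sens, spec
```

The "high sensitivity" point is the mirror image: constrain sensitivity and maximize specificity.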
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
ophthalmology
https://doi.org/10.1038/s41591-018-0335-9
Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence

Huiying Liang, Brian Y. Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu, Wenjia Cai, Daniel S. Kermany, Xin Sun, Jiancong Chen, Liya He, Jie Zhu, Pin Tian, Hua Shao, Lianghong Zheng, Rui Hou, Sierra Hewett, Gen Li, Ping Liang, Xuan Zang, Zhiqi Zhang, Liyan Pan, Huimin Cai, Rujuan Ling, Shuhua Li, Yongwang Cui, Shusheng Tang, Hong Ye, Xiaoyan Huang, Waner He, Wenqing Liang, Qing Zhang, Jianmin Jiang, Wei Yu, Jianqun Gao, Wanxing Ou, Yingmin Deng, Qiaozhen Hou, Bei Wang, Cuichan Yao, Yan Liang, Shu Zhang, Yaou Duan, Runze Zhang, Sarah Gibson, Charlotte L. Zhang, Oulan Li, Edward D. Zhang, Gabriel Karin, Nathan Nguyen, Xiaokang Wu, Cindy Wen, Jie Xu, Wenqin Xu, Bochu Wang, Winston Wang, Jing Li, Bianca Pizzato, Caroline Bao, Daoman Xiang, Wanting He, Suiqin He, Yugui Zhou, Weldon Haw, Michael Goldbaum, Adriana Tremoulet, Chun-Nan Hsu, Hannah Carter, Long Zhu, Kang Zhang* and Huimin Xia*

Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and providing clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.

Medical information has become increasingly complex over time. The range of disease entities, diagnostic testing and biomarkers, and treatment modalities has increased exponentially in recent years. Subsequently, clinical decision-making has also become more complex and demands the synthesis of decisions from assessment of large volumes of data representing clinical information. In the current digital age, the electronic health record (EHR) represents a massive repository of electronic data points representing a diverse array of clinical information[1–3]. Artificial intelligence (AI) methods have emerged as potentially powerful tools to mine EHR data to aid in disease diagnosis and management, mimicking and perhaps even augmenting the clinical decision-making of human physicians[1].

To formulate a diagnosis for any given patient, physicians frequently use hypothetico-deductive reasoning. Starting with the chief complaint, the physician then asks appropriately targeted questions relating to that complaint. From this initial small feature set, the physician forms a differential diagnosis and decides what features (historical questions, physical exam findings, laboratory testing, and/or imaging studies) to obtain next in order to rule in or rule out the diagnoses in the differential diagnosis set. The most useful features are identified, such that when the probability of one of the diagnoses reaches a predetermined level of acceptability, the process is stopped, and the diagnosis is accepted. It may be possible to achieve an acceptable level of certainty of the diagnosis with only a few features, without having to process the entire feature set. Therefore, the physician can be considered a classifier of sorts.

In this study, we designed an AI-based system using machine learning to extract clinically relevant features from EHR notes to mimic the clinical reasoning of human physicians. In medicine, machine learning methods have already demonstrated strong performance in image-based diagnoses, notably in radiology[2], dermatology[4], and ophthalmology[5–8], but analysis of EHR data presents a number of difficult challenges. These challenges include the vast quantity of data, high dimensionality, data sparsity, and deviations…
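A toy sketch of the hypothetico-deductive loop described above: maintain a posterior over candidate diagnoses, update it as each feature is acquired, and stop once one diagnosis reaches a predetermined level of acceptability. Every diagnosis, feature, and probability here is invented for illustration.

```python
# Toy Bayesian update over a small differential (all numbers made up).
priors = {"pneumonia": 0.40, "bronchiolitis": 0.35, "asthma": 0.25}
# P(feature present | diagnosis) for two acquirable features.
likelihood = {"fever":    {"pneumonia": 0.9, "bronchiolitis": 0.6, "asthma": 0.2},
              "wheezing": {"pneumonia": 0.2, "bronchiolitis": 0.8, "asthma": 0.9}}

def update(post, feature, present):
    """Multiply in the feature likelihood and renormalize."""
    for d in post:
        p = likelihood[feature][d]
        post[d] *= p if present else (1.0 - p)
    z = sum(post.values())
    return {d: v / z for d, v in post.items()}

post = dict(priors)
for feature, present in [("fever", True), ("wheezing", False)]:
    post = update(post, feature, present)
    best = max(post, key=post.get)
    if post[best] > 0.8:        # predetermined level of acceptability
        break                   # stop acquiring features; accept diagnosis
print(best, post)
```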
pediatrics
pathology
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study

Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, Yi Li, Guangre Xu, Mengtian Tu, Xiaogang Liu
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.
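As a quick plausibility check of the headline ADR result, a two-proportion z-test on counts reconstructed from the reported arm sizes and percentages (statsmodels assumed; the trial's own statistical analysis may differ):

```python
# Reconstructed counts: 29.1% of 522 AI-arm patients and 20.3% of 536
# control-arm patients had at least one adenoma detected.
from statsmodels.stats.proportion import proportions_ztest

detected = [round(0.291 * 522), round(0.203 * 536)]   # [152, 109]
arms = [522, 536]
z, p = proportions_ztest(detected, arms)
print(f"z = {z:.2f}, p = {p:.4f}")   # p around 0.001, consistent with p<0.001
```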
INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively[1]. Colonoscopy is the gold standard for screening CRC[2,3]. Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps[4–8]. Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC[9,10]. However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics[11,12].

Unrecognised polyps within the visual field are an important problem to address[11]. Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR[13–15].

Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way.
Significance of this study

What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has recently been introduced for polyp and adenoma detection as well as differentiation, and has shown promising results in preliminary studies.

What are the new findings?
► This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.

How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.
SEPSIS

A targeted real-time early warning score (TREWScore) for septic shock

Katharine E. Henry, David N. Hager, Peter J. Pronovost, Suchi Saria*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed "TREWScore," a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
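A minimal sketch of this style of evaluation: fix a threshold that achieves a target specificity on patients who never develop shock, then read off sensitivity and median lead time for those who do. The arrays and protocol are simplified assumptions, not the paper's exact procedure.

```python
# Sketch: evaluate an early-warning score at a fixed specificity.
import numpy as np

def evaluate(scores_neg, scores_pos, lead_hours_pos, specificity=0.67):
    """scores_neg: max score per non-shock patient;
    scores_pos: max score per shock patient before onset;
    lead_hours_pos: hours between first alert and onset for each shock patient."""
    thr = np.quantile(scores_neg, specificity)   # FPR = 1 - specificity
    detected = scores_pos >= thr
    sensitivity = detected.mean()
    median_lead = np.median(lead_hours_pos[detected])
    return thr, sensitivity, median_lead
```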
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9).

More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the best approach to managing patients at high risk of developing septic shock before the onset of severe sepsis or shock has not been studied. Methods that can identify ahead of time which patients will later experience septic shock are needed to further understand, study, and improve outcomes in this population.

General-purpose illness severity scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE II), Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition.

The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of "early warning systems," "track and trigger" initiatives, "listening applications," and "sniffers" have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock.

The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health data in a variety of applications (24), including discharge planning (25), risk stratification (26, 27), and identification of acute adverse events (28, 29). For septic shock in particular, promising work includes that of predicting septic shock using high-fidelity physiological signals collected directly from bedside monitors (30, 31), inferring relationships between predictors of septic shock using Bayesian networks (32), and using routine measurements for septic shock prediction (33–35). No current prediction models that use only data routinely stored in the EHR predict septic shock with high sensitivity and specificity many hours before onset. Moreover, when learning predictive risk scores, current methods (34, 36, 37) often have not accounted for the censoring effects of clinical interventions on patient outcomes (38). For instance, a patient with severe sepsis who received fluids and never developed septic shock would be treated as a negative case, despite the possibility that he or she might have developed septic shock in the absence of such treatment and therefore could be considered a positive case up until the…
infectious
Digital biomarkers of cognitive function

Paul Dagum

To identify digital biomarkers associated with cognitive function, we analyzed human–computer interaction from 7 days of smartphone use in 27 subjects (ages 18–34) who received a gold standard neuropsychological assessment. For several neuropsychological constructs (working memory, memory, executive function, language, and intelligence), we found a family of digital biomarkers that predicted test scores with high correlations (p < 10⁻⁴). These preliminary results suggest that passive measures from smartphone use could be a continuous ecological surrogate for laboratory-based neuropsychological assessment.

npj Digital Medicine (2018) 1:10; doi:10.1038/s41746-018-0018-4
INTRODUCTION
By comparison to the functional metrics available in other disciplines, conventional measures of neuropsychiatric disorders have several challenges. First, they are obtrusive, requiring a subject to break from their normal routine, dedicating time and often travel. Second, they are not ecological and require subjects to perform a task outside of the context of everyday behavior. Third, they are episodic and provide sparse snapshots of a patient only at the time of the assessment. Lastly, they are poorly scalable, taxing limited resources including space and trained staff.

In seeking objective and ecological measures of cognition, we attempted to develop a method to measure memory and executive function not in the laboratory but in the moment, day-to-day. We used human–computer interaction on smartphones to identify digital biomarkers that were correlated with neuropsychological performance.
RESULTS
In 2014, 27 participants (ages 27.1 ± 4.4 years, education 14.1 ± 2.3 years, M:F 8:19) volunteered for neuropsychological assessment and a test of the smartphone app. Smartphone human–computer interaction data from the 7 days following the neuropsychological assessment showed a range of correlations with the cognitive scores. Table 1 shows the correlation between each neurocognitive test and the cross-validated predictions of the supervised kernel PCA constructed from the biomarkers for that test. Figure 1 shows each participant test score and the digital biomarker prediction for (a) digits backward, (b) symbol digit modality, (c) animal fluency, (d) Wechsler Memory Scale-3rd Edition (WMS-III) logical memory (delayed free recall), (e) brief visuospatial memory test (delayed free recall), and (f) Wechsler Adult Intelligence Scale-4th Edition (WAIS-IV) block design. Construct validity of the predictions was determined using pattern matching that computed a correlation of 0.87 with p < 10⁻⁵⁹ between the covariance matrix of the predictions and the covariance matrix of the tests.
Table 1. Fourteen neurocognitive assessments covering five cognitive domains and dexterity were performed by a neuropsychologist. Shown are the group mean and standard deviation, range of scores, and the correlation between each test and the cross-validated prediction constructed from the digital biomarkers for that test.

Test | Mean (SD) | Range | R (predicted), p-value
Working memory
  Digits forward | 10.9 (2.7) | 7–15 | 0.71 ± 0.10, 10⁻⁴
  Digits backward | 8.3 (2.7) | 4–14 | 0.75 ± 0.08, 10⁻⁵
Executive function
  Trail A | 23.0 (7.6) | 12–39 | 0.70 ± 0.10, 10⁻⁴
  Trail B | 53.3 (13.1) | 37–88 | 0.82 ± 0.06, 10⁻⁶
  Symbol digit modality | 55.8 (7.7) | 43–67 | 0.70 ± 0.10, 10⁻⁴
Language
  Animal fluency | 22.5 (3.8) | 15–30 | 0.67 ± 0.11, 10⁻⁴
  FAS phonemic fluency | 42 (7.1) | 27–52 | 0.63 ± 0.12, 10⁻³
Dexterity
  Grooved pegboard test (dominant hand) | 62.7 (6.7) | 51–75 | 0.73 ± 0.09, 10⁻⁴
Memory
  California verbal learning test (delayed free recall) | 14.1 (1.9) | 9–16 | 0.62 ± 0.12, 10⁻³
  WMS-III logical memory (delayed free recall) | 29.4 (6.2) | 18–42 | 0.81 ± 0.07, 10⁻⁶
  Brief visuospatial memory test (delayed free recall) | 10.2 (1.8) | 5–12 | 0.77 ± 0.08, 10⁻⁵
Intelligence scale
  WAIS-IV block design | 46.1 (12.8) | 12–61 | 0.83 ± 0.06, 10⁻⁶
  WAIS-IV matrix reasoning | 22.1 (3.3) | 12–26 | 0.80 ± 0.07, 10⁻⁶
  WAIS-IV vocabulary | 40.6 (4.0) | 31–50 | 0.67 ± 0.11, 10⁻⁴
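The exact supervised kernel PCA used in the paper is not detailed here, so the sketch below substitutes a plain kernel-PCA-plus-ridge pipeline with leave-one-out cross-validated predictions to show the shape of the analysis; `X` (subjects × biomarkers) and `y` (one test's scores) are hypothetical inputs.

```python
# Sketch: project digital biomarkers with kernel PCA, regress a test score
# on the components, and report the cross-validated correlation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def cv_correlation(X, y, n_components=5):
    model = make_pipeline(
        KernelPCA(n_components=n_components, kernel="rbf"),
        Ridge(alpha=1.0))
    y_hat = cross_val_predict(model, X, y, cv=len(y))  # leave-one-out
    return np.corrcoef(y, y_hat)[0, 1]                 # R(predicted)
```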
psychiatry
PRECISION MEDICINE

Identification of type 2 diabetes subgroups through topological analysis of patient similarity

Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P. Bottinger, Joel T. Dudley*
Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone, with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications. Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related complications. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was characterized by the T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer malignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respectively. By assessing the human disease–SNP association for each subtype, the enriched phenotypes and biological functions at the gene level for each subtype matched with the disease comorbidities and clinical differences that we identified through EMRs. Our approach demonstrates the utility of applying the precision medicine paradigm in T2D and the promise of extending the approach to the study of other complex, multifactorial diseases.
INTRODUCTION
Type 2 diabetes (T2D) is a complex, multifactorial disease that has emerged as an increasingly prevalent worldwide health concern associated with high economic and physiological burdens. An estimated 29.1 million Americans (9.3% of the population) were estimated to have some form of diabetes in 2012—up 13% from 2010—with T2D representing up to 95% of all diagnosed cases (1, 2). Risk factors for T2D include obesity, family history of diabetes, physical inactivity, ethnicity, and advanced age (1, 2). Diabetes and its complications now rank among the leading causes of death in the United States (2). In fact, diabetes is the leading cause of nontraumatic foot amputation, adult blindness, and need for kidney dialysis, and multiplies risk for myocardial infarction, peripheral artery disease, and cerebrovascular disease (3–6). The total estimated direct medical cost attributable to diabetes in the United States in 2012 was $176 billion, with an estimated $76 billion attributable to hospital inpatient care alone. There is a great need to improve understanding of T2D and its complex factors to facilitate prevention, early detection, and improvements in clinical management.

A more precise characterization of T2D patient populations can enhance our understanding of T2D pathophysiology (7, 8). Current clinical definitions classify diabetes into three major subtypes: type 1 diabetes (T1D), T2D, and maturity-onset diabetes of the young. Other subtypes based on phenotype bridge the gap between T1D and T2D, for example, latent autoimmune diabetes in adults (LADA) (7) and ketosis-prone T2D. The current categories indicate that the traditional definition of diabetes, especially T2D, might comprise additional subtypes with distinct clinical characteristics. A recent analysis of the longitudinal Whitehall II cohort study demonstrated improved assessment of cardiovascular risks when subgrouping T2D patients according to glucose concentration criteria (9). Genetic association studies reveal that the genetic architecture of T2D is profoundly complex (10–12). Identified T2D-associated risk variants exhibit allelic heterogeneity and directional differentiation among populations (13, 14). The apparent clinical and genetic complexity and heterogeneity of T2D patient populations suggest that there are opportunities to refine the current, predominantly symptom-based, definition of T2D into additional subtypes (7).

Because etiological and pathophysiological differences exist among T2D patients, we hypothesize that a data-driven analysis of a clinical population could identify new T2D subtypes and factors. Here, we develop a data-driven, topology-based approach to (i) map the complexity of patient populations using clinical data from electronic medical records (EMRs) and (ii) identify new, emergent T2D patient subgroups with subtype-specific clinical and genetic characteristics. We apply this approach to a dataset comprising matched EMRs and genotype data from more than 11,000 individuals. Topological analysis of these data revealed three distinct T2D subtypes that exhibited distinct patterns of clinical characteristics and disease comorbidities. Further, we identified genetic markers associated with each T2D subtype and performed gene- and pathway-level analysis of subtype genetic associations. Biological and phenotypic features enriched in the genetic analysis corroborated clinical disparities observed among subgroups. Our findings suggest that data-driven, topological analysis of patient cohorts has utility in precision medicine efforts to refine our understanding of T2D toward improving patient care.
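The topological construction itself is involved (a Mapper-style analysis), but a simplified stand-in conveys the idea of deriving subtypes from a patient-patient similarity network; this sketch, assuming scikit-learn and networkx, is not the authors' method.

```python
# Simplified stand-in for topological subgrouping: build a k-nearest-neighbour
# patient-patient graph from normalized EMR features and take graph
# communities as candidate subtypes.
import networkx as nx
from sklearn.neighbors import NearestNeighbors
from networkx.algorithms.community import greedy_modularity_communities

def patient_subtypes(X, k=10):
    """X: patients x clinical features (rows normalized). Returns node sets."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    g = nx.Graph()
    for i, neighbours in enumerate(idx):
        g.add_edges_from((i, int(j)) for j in neighbours[1:])  # skip self
    return list(greedy_modularity_communities(g))
```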
endocrinology
LETTER doi:10.1038/nature21056
Dermatologist-level classification of skin cancer with deep neural networks

Andre Esteva*, Brett Kuprel*, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau & Sebastian Thrun
Skin cancer, the most common human malignancy[1–3], is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Deep convolutional neural networks (CNNs)[4,5] show potential for general and highly variable tasks across many fine-grained object categories[6–11]. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images—two orders of magnitude larger than previous datasets[12]—consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 (ref. 13) and can therefore potentially provide low-cost universal access to vital diagnostic care.
There are 5.4 million new cases of skin cancer in the United States[2] every year. One in five Americans will be diagnosed with a cutaneous malignancy in their lifetime. Although melanomas represent fewer than 5% of all skin cancers in the United States, they account for approximately 75% of all skin-cancer-related deaths, and are responsible for over 10,000 deaths annually in the United States alone. Early detection is critical, as the estimated 5-year survival rate for melanoma drops from over 99% if detected in its earliest stages to about 14% if detected in its latest stages. We developed a computational method which may allow medical practitioners and patients to proactively track skin lesions and detect cancer earlier. By creating a novel disease taxonomy, and a disease-partitioning algorithm that maps individual diseases into training classes, we are able to build a deep learning system for automated dermatology.

Previous work in dermatological computer-aided classification[12,14,15] has lacked the generalization capability of medical practitioners owing to insufficient data and a focus on standardized tasks such as dermoscopy[16–18] and histological image classification[19–22]. Dermoscopy images are acquired via a specialized instrument and histological images are acquired via invasive biopsy and microscopy, whereby both modalities yield highly standardized images. Photographic images (for example, smartphone images) exhibit variability in factors such as zoom, angle and lighting, making classification substantially more challenging[23,24]. We overcome this challenge by using a data-driven approach—1.41 million pre-training and training images make classification robust to photographic variability. Many previous techniques require extensive preprocessing, lesion segmentation and extraction of domain-specific visual features before classification. By contrast, our system requires no hand-crafted features; it is trained end-to-end directly from image labels and raw pixels, with a single network for both photographic and dermoscopic images. The existing body of work uses small datasets of typically less than a thousand images of skin lesions[16,18,19], which, as a result, do not generalize well to new images. We demonstrate generalizable classification with a new dermatologist-labelled dataset of 129,450 clinical images, including 3,374 dermoscopy images.
Deep learning algorithms, powered by advances in computation and very large datasets[25], have recently been shown to exceed human performance in visual tasks such as playing Atari games[26], strategic board games like Go[27] and object recognition[6]. In this paper we outline the development of a CNN that matches the performance of dermatologists at three key diagnostic tasks: melanoma classification, melanoma classification using dermoscopy and carcinoma classification. We restrict the comparisons to image-based classification. We utilize a GoogleNet Inception v3 CNN architecture[9] that was pre-trained on approximately 1.28 million images (1,000 object categories) from the 2014 ImageNet Large Scale Visual Recognition Challenge[6], and train it on our dataset using transfer learning[28]. Figure 1 shows the working system. The CNN is trained using 757 disease classes. Our dataset is composed of dermatologist-labelled images organized in a tree-structured taxonomy of 2,032 diseases, in which the individual diseases form the leaf nodes. The images come from 18 different clinician-curated, open-access online repositories, as well as from clinical data from Stanford University Medical Center. Figure 2a shows a subset of the full taxonomy, which has been organized clinically and visually by medical experts. We split our dataset into 127,463 training and validation images and 1,942 biopsy-labelled test images.

To take advantage of fine-grained information contained within the taxonomy structure, we develop an algorithm (Extended Data Table 1) to partition diseases into fine-grained training classes (for example, amelanotic melanoma and acrolentiginous melanoma). During inference, the CNN outputs a probability distribution over these fine classes. To recover the probabilities for coarser-level classes of interest (for example, melanoma) we sum the probabilities of their descendants (see Methods and Extended Data Fig. 1 for more details).
We validate the effectiveness of the algorithm in two ways, using nine-fold cross-validation…
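The descendant-summing rule for recovering coarse-class probabilities is easy to state in code; the taxonomy and probabilities below are toy values, not the paper's 2,032-class tree.

```python
# Sketch: the CNN predicts over fine-grained leaf classes; the probability of
# a coarse class is the sum over its descendant leaves (toy taxonomy).
taxonomy = {
    "melanoma": ["amelanotic melanoma", "acrolentiginous melanoma"],
    "benign nevus": ["blue nevus", "halo nevus"],
}

def coarse_probability(fine_probs, coarse_class):
    """fine_probs: dict mapping leaf class -> CNN softmax probability."""
    return sum(fine_probs.get(leaf, 0.0) for leaf in taxonomy[coarse_class])

fine = {"amelanotic melanoma": 0.15, "acrolentiginous melanoma": 0.25,
        "blue nevus": 0.40, "halo nevus": 0.20}
print(coarse_probability(fine, "melanoma"))   # 0.40
```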
dermatology
Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization
gynecology
Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board
oncology
[Garbled excerpt: an accepted-manuscript study by Kwon et al. (emergency medicine departments, Korea) on a deep-learning-based model for predicting outcomes of out-of-hospital cardiac arrest (OHCA).]
emergency med
nephrology
gastroenterology
Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
radiology cardiology
Automated deep neural network surveillance of cranial images for acute neurologic events
neurology
NATURE MEDICINE
…and the algorithm led to the best accuracy, and the algorithm markedly sped up the review of slides[35]. This study is particularly notable, as the synergy of the combined pathologist and algorithm interpretation was emphasized instead of the pervasive clinician-versus-algorithm…
Table 2 | FDA AI approvals are accelerating

Company          | FDA approval   | Indication
Apple            | September 2018 | Atrial fibrillation detection
Aidoc            | August 2018    | CT brain bleed diagnosis
iCAD             | August 2018    | Breast density via mammography
Zebra Medical    | July 2018      | Coronary calcium scoring
Bay Labs         | June 2018      | Echocardiogram EF determination
Neural Analytics | May 2018       | Device for paramedic stroke diagnosis
IDx              | April 2018     | Diabetic retinopathy diagnosis
Icometrix        | April 2018     | MRI brain interpretation
Imagen           | March 2018     | X-ray wrist fracture diagnosis
Viz.ai           | February 2018  | CT stroke diagnosis
Arterys          | February 2018  | Liver and lung cancer (MRI, CT) diagnosis
MaxQ-AI          | January 2018   | CT brain bleed diagnosis
Alivecor         | November 2017  | Atrial fibrillation detection via Apple Watch
Arterys          | January 2017   | MRI heart interpretation
FDA-approved AI medical devices (Nature Medicine, 2019)
• Zebra Medical Vision
  • May 2019: pneumothorax detection on X-ray
  • June 2019: intracranial hemorrhage on CT
• Aidoc
  • May 2019: pulmonary embolism on CT
  • June 2019: cervical spine fractures on CT
• GE Healthcare
  • Sep 2019: pneumothorax triage on X-ray machine
• 1. VUNO Med Bone-Age

• 2. Lunit Insight for lung nodule

• 3. JLK Inspection for Cerebral infarction

• 4. Informeditec Neuro-I for dementia 

• 5. Samsung Electronics for lung nodule

• 6. VUNO Med Deep Brain

• 7. Lunit Insight Mammograpy

• 8. JLK Inspection ATROSCAN

• 9. VUNO Chest X-ray

• 10. Deepnoid Deep Spine

• 11. JLK Inspection Lung CT (JLD-01A)

• 12. JLK Inspection Colonoscopy (JFD-01A)

• 13. JLK Inspection Gastroscopy (JFD-02A)
2018
2019
KFDA-approved

AI medical devices
Artificial Intelligence in medicine is not the future.
It is already here.
Wrong Question
Who wins? Who loses? (×)
Will it ‘replace’ doctors? (×)
Right Question
How can we make medicine better through collaboration with AI?
The American Medical Association House of
Delegates has adopted policies to keep the focus on
advancing the role of augmented intelligence (AI) in
enhancing patient care, improving population health,
reducing overall costs, increasing value and the support
of professional satisfaction for physicians.
Foundational policy Annual 2018
As a leader in American medicine, our AMA has a
unique opportunity to ensure that the evolution of AI
in medicine benefits patients, physicians and the health
care community. To that end our AMA seeks to:
Leverage ongoing engagement in digital health and
other priority areas for improving patient outcomes
and physician professional satisfaction to help set
priorities for health care AI
Identify opportunities to integrate practicing
physicians’ perspectives into the development,
design, validation and implementation of health
care AI
Promote development of thoughtfully designed,
high-quality, clinically validated health care AI that:
• Is designed and evaluated in keeping with best
practices in user-centered design, particularly
for physicians and other members of the health
care team
• Is transparent
• Conforms to leading standards for
reproducibility
• Identifies and takes steps to address bias and
avoids introducing or exacerbating health care
disparities, including when testing or deploying
new AI tools on vulnerable populations
• Safeguards patients’ and other individuals’
privacy interests and preserves the security and
integrity of personal information
Encourage education for patients, physicians,
medical students, other health care professionals
and health administrators to promote greater
understanding of the promise and limitations of
health care AI
Explore the legal implications of health care AI,
such as issues of liability or intellectual property,
and advocate for appropriate professional and
governmental oversight for safe, effective, and
equitable use of and access to health care AI
“Medical experts are working to determine the clinical applications of AI—work that will guide health care in the future. These experts, along with physicians, state and federal officials must find the path that ends with better outcomes for patients. We have to make sure the technology does not get ahead of our humanity and creativity as physicians.”
—Gerald E. Harmon, MD, AMA Board of Trustees
Policy
Augmented intelligence in health care
https://www.ama-assn.org/system/files/2019-08/ai-2018-board-policy-summary.pdf
Augmented Intelligence,
rather than Artificial Intelligence
Martin Duggan, “IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing”
Which capacity of human physicians can be augmented by AI?
•Analysis of ‘complicated’ medical data

• EMR, genomic data, clinical trial data, insurance claims etc

•Analysis of medical images

• radiology, pathology, ophthalmology, dermatology, gastroenterology etc

•Monitoring continuous biomedical data

• sepsis, blood glucose, arrhythmia, cardiac arrest, AKI etc

•Drug development
AI in Medicine
Jeopardy!
IBM Watson defeated human quiz champions in 2011
600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
IBM Watson on Medicine
Watson learned...
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training
Recommended
For consideration
Not Recommended
Evidence button
Lack of Evidence.
WFO in ASCO 2017
Early experience with IBM WFO for lung and colorectal cancer treatment

(Manipal Hospital in India)
Performance of WFO in India
2017 ASCO Annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
• Concordance between the treatment plan of WFO and the multidisciplinary tumor board.
• The concordance level and the proportions of Recommended / For consideration / Not recommended differ across cancer types and stages.
ASCO 2017
ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment
recommendations: agreement with an expert
multidisciplinary tumor board
S. P. Somashekhar¹*, M.-J. Sepúlveda², S. Puglielli³, A. D. Norden³, E. H. Shortliffe⁴, C. Rohit Kumar¹, A. Rauthan¹, N. Arun Kumar¹, P. Patil¹, K. Rhee³ & Y. Ramya¹
¹Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; ²IBM Research (Retired), Yorktown Heights; ³Watson Health, IBM Corporation, Cambridge; ⁴Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA
*Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com
Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug
approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to
help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment
recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer.
Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the
Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in
2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not
agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered
concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases.
Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III
disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P = 0.02;
P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status
was not found to affect concordance.
Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases
examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not.
This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment
decision making, especially at centers where expert breast cancer resources are limited.
Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer,
concordance, multidisciplinary tumor board
Introduction
Oncologists who treat breast cancer are challenged by a large and
rapidly expanding knowledge base [1, 2]. As of October 2017, for
example, there were 69 FDA-approved drugs for the treatment of
breast cancer, not including combination treatment regimens
[3]. The growth of massive genetic and clinical databases, along
with computing systems to exploit them, will accelerate the speed
of breast cancer treatment advances and shorten the cycle time
for changes to breast cancer treatment guidelines [4, 5]. In addition, these information management challenges in cancer care
are occurring in a practice environment where there is little time
available for tracking and accessing relevant information at the
point of care [6]. For example, a study that surveyed 1117 oncolo-
gists reported that on average 4.6 h per week were spent keeping
Annals of Oncology 29: 418–423, 2018
doi:10.1093/annonc/mdx781
Published online 9 January 2018
•Annals of Oncology, January 2018

•Concordance between WFO and MTB for breast cancer treatment plans

•The first and only WFO concordance study published in a peer-reviewed journal
[Figure 1. Treatment concordance between WFO and the MMDT, overall and by stage (Recommended / For consideration / Not recommended / Not available): Overall (n=638) 93%; Stage I (n=61) 80%; Stage II (n=262) 97%; Stage III (n=191) 95%; Stage IV (n=124) 86%. MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.]
[Figure 2. Treatment concordance between WFO and the MMDT by receptor status (HR+, HER2/neu+, triple-negative) and metastatic status: concordance ranged from 75% to 98% across subgroups. HER2/neu, human epidermal growth factor receptor 2; HR, hormone receptor; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.]
• Concordance between the treatment plan of WFO and the multidisciplinary tumor board.
• The concordance level and the proportions of Recommended / For consideration / Not recommended differ by cancer stage, receptor status, and metastatic status.
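A toy illustration of how the concordance figure in these WFO studies is computed (the per-case labels below are hypothetical, not the study's data): a case counts as concordant when the tumor board's choice falls in WFO's "Recommended" or "For consideration" bins.

```python
# A minimal sketch of the WFO concordance metric: fraction of cases whose
# tumor-board treatment was designated "Recommended" or "For consideration"
# by WFO. Labels below are hypothetical.

wfo_designations = [
    "Recommended", "For consideration", "Not recommended",
    "Recommended", "Not available", "Recommended",
]

concordant_bins = {"Recommended", "For consideration"}
rate = sum(d in concordant_bins for d in wfo_designations) / len(wfo_designations)
print(f"Concordance: {rate:.0%}")  # 4 of 6 cases -> 67% for this toy list
```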
Empowering the Oncology Community for Cancer Care
Genomics | Oncology | Clinical Trial Matching
Watson Health’s oncology clients span more than 35 hospital systems
Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”
There’s more
(even with some evidence)
IBM Watson Health
Watson for Clinical Trial Matching (CTM)
1. According to the National Comprehensive Cancer Network (NCCN)
2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf
© 2015 International Business Machines Corporation
Current Challenges
• Searching across eligibility criteria of clinical trials is time consuming and labor intensive
• Fewer than 5% of adult cancer patients participate in clinical trials¹
• 37% of sites fail to meet minimum enrollment targets; 11% of sites fail to enroll a single patient²
The Watson solution
• Uses structured and unstructured
patient data to quickly check
eligibility across relevant clinical
trials
• Provides eligible trial
considerations ranked by
relevance
• Increases speed to qualify
patients
Clinical Investigators
(Opportunity)
• Trials to Patient: Perform
feasibility analysis for a trial
• Identify sites with most
potential for patient enrollment
• Optimize inclusion/exclusion
criteria in protocols
Faster, more efficient
recruitment strategies,
better designed protocols
Point of Care
(Offering)
• Patient to Trials:
Quickly find the
right trial that a
patient might be
eligible for
amongst 100s of
open trials
available
Improve patient care
quality, consistency,
increased efficiency
•To process 90 patients, against 3 breast cancer protocols (provided by Novartis)

•Clinical Trial Coordinator: 110 min

•with Watson CTM: 24 min (78% time reduction)

•Watson CTM excluded 94% of the patients automatically, reducing the workload.
ASCO 2017
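The eligibility-screening step that CTM automates can be pictured as a structured filter over trial criteria. The sketch below is a minimal illustration under assumed field names and thresholds; it is not IBM's actual schema or matching logic.

```python
# A minimal sketch of patient-to-trial eligibility screening: keep only the
# open trials whose structured criteria the patient satisfies. All fields,
# trial IDs, and thresholds are hypothetical.

patient = {"age": 54, "diagnosis": "breast cancer", "ECOG": 1}

trials = [
    {"id": "TRIAL-A", "diagnosis": "breast cancer", "min_age": 18, "max_age": 70, "max_ECOG": 2},
    {"id": "TRIAL-B", "diagnosis": "lung cancer",   "min_age": 18, "max_age": 80, "max_ECOG": 1},
]

def eligible(p: dict, t: dict) -> bool:
    """Check the patient against one trial's structured criteria."""
    return (p["diagnosis"] == t["diagnosis"]
            and t["min_age"] <= p["age"] <= t["max_age"]
            and p["ECOG"] <= t["max_ECOG"])

print([t["id"] for t in trials if eligible(patient, t)])  # ['TRIAL-A']
```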
Watson Genomics Overview
Watson Genomics Content
• 20+ content sources including:
• Medical articles (23 million)
• Drug information
• Clinical trial information
• Genomic information
Pipeline: Case sequenced (VCF/MAF, Log2, DGE) → Encryption → Molecular profile analysis → Pathway analysis → Drug analysis → Service analysis, reports, & visualizations
Kazimierz O. Wrzeszczynski, PhD*, Mayu O. Frank, NP, MS*, Takahiko Koyama, PhD*, Kahn Rhrissorrakrai, PhD*, Nicolas Robine, PhD, Filippo Utro, PhD, Anne-Katrin Emde, PhD, Bo-Juen Chen, PhD, Kanika Arora, MS, Minita Shah, MS, Vladimir Vacic, PhD, Raquel Norel, PhD, Erhan Bilal, PhD, Ewa A. Bergmann, MSc, Julia L. Moore Vogel, PhD, Jeffrey N. Bruce, MD, Andrew B. Lassman, MD, Peter Canoll, MD, PhD, Christian Grommes, MD, Steve Harvey, BS, Laxmi Parida, PhD, Vanessa V. Michelini, BS, Michael C. Zody, PhD, Vaidehi Jobanputra, PhD, Ajay K. Royyuru, PhD, Robert B. Darnell, MD
Comparing sequencing assays and
human-machine analyses in actionable
genomics for glioblastoma
ABSTRACT
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare
potentially actionable calls from each.
Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal
DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA
sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians
and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated
system for prioritizing somatic variants and identifying drugs.
Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA com-
pleted a comparable analysis in a fraction of the time required by the human analysts.
Conclusions: The development of an effective human-machine interface in the analysis of deep
cancer genomic datasets may provide potentially clinically actionable calls for individual pa-
tients in a more timely and efficient manner than currently possible.
ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164; doi: 10.1212/
NXG.0000000000000164
GLOSSARY
CNV = copy number variant; EGFR = epidermal growth factor receptor; GATK = Genome Analysis Toolkit; GBM = glioblastoma; IRB = institutional review board; NLP = Natural Language Processing; NYGC = New York Genome Center; RNA-seq = RNA sequencing; SNV = single nucleotide variant; SV = structural variant; TCGA = The Cancer Genome Atlas; TPM = transcripts per million; VCF = variant call file; VUS = variants of uncertain significance; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
The clinical application of next-generation sequencing technology to cancer diagnosis and treatment is in its early stages.1–3 An initial implementation of this technology has been in targeted panels, where subsets of cancer-relevant and/or highly actionable genes are scrutinized for potentially actionable mutations. This approach has been widely adopted, offering high redundancy of sequence coverage for the small number of sites of known clinical utility at relatively…
Wrzeszczynski KO, Neurol Genet. 2017
• To analyze a glioblastoma tumor specimen, 3 different platforms were compared for potentially actionable calls:
• 1. Human experts (bioinformaticians and oncologists): WGS and RNA-seq
• 2. Watson Genome Analysis: WGS and RNA-seq
• 3. FoundationOne (a commercial targeted panel)
Table 3. List of variants identified as actionable by 3 different platforms
Gene | Variant | Identified (NYGC / WGA / FO) | Drugs (NYGC) | Drugs (WGA) | Drugs (FO)
CDKN2A | Deletion | Yes / Yes / Yes | Palbociclib, LY2835219, LEE001 | Palbociclib, LY2835219 | Clinical trial
CDKN2B | Deletion | Yes / Yes / Yes | Palbociclib, LY2835219, LEE002 | Palbociclib, LY2835219 | Clinical trial
EGFR | Gain (whole arm) | Yes / — / — | Cetuximab | — | —
ERG | Missense P114Q | Yes / Yes / — | RI-EIP | RI-EIP | —
FGFR3 | Missense L49V | Yes / VUS / — | TK-1258 | — | —
MET | Amplification | Yes / Yes / Yes | INC280 | Crizotinib, cabozantinib | Crizotinib, cabozantinib
MET | Frame shift R755fs | Yes / — / — | INC280 | — | —
MET | Exon skipping | Yes / — / — | INC280 | — | —
NF1 | Deletion | Yes / — / — | MEK162 | — | —
NF1 | Nonsense R461* | Yes / Yes / Yes | MEK162 | MEK162, cobimetinib, trametinib, GDC-0994 | Everolimus, temsirolimus, trametinib
PIK3R1 | Insertion R562_M563insI | Yes / Yes / — | BKM120 | BKM120, LY3023414 | —
PTEN | Loss (whole arm) | Yes / — / — | Everolimus, AZD2014 | — | —
STAG2 | Frame shift R1012fs | Yes / Yes / Yes | Veliparib, clinical trial | Olaparib | —
DNMT3A | Splice site 2083-1G>C | — / — / Yes | — | — | —
TERT | Promoter −146C>T | Yes / — / Yes | — | — | —
ABL2 | Missense D716N | Germline / NA / VUS | | |
mTOR | Missense H1687R | Germline / NA / VUS | | |
NPM1 | Missense E169D | Germline / NA / VUS | | |
NTRK1 | Missense G18E | Germline / NA / VUS | | |
PTCH1 | Missense P1250R | Germline / NA / VUS | | |
TSC1 | Missense G1035S | Germline / NA / VUS | | |
Abbreviations: FO = FoundationOne; NYGC = New York Genome Center; RNA-seq = RNA sequencing; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
Genes, variant description, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by FO as variants of uncertain significance (VUS) were identified by the NYGC as germline variants.
• WGA analysis vastly accelerated the time to discovery of potentially actionable variants from the VCF files.
• WGA was able to provide reports of potentially clinically actionable insights within 10 minutes, while human analysis of this patient's VCF file took an estimated 160 hours of person-time.
Wrzeszczynski KO, Neurol Genet. 2017
Human experts | Watson Genome Analysis | FoundationOne™
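The core automation here is prioritization: intersecting called variants with a curated actionable-gene knowledge base and attaching candidate drugs. The sketch below illustrates that idea only; the gene-drug mappings and variant calls are made-up examples, not WGA's actual knowledge base or pipeline.

```python
# A minimal sketch of automated variant prioritization: keep called variants
# that hit a curated actionable-gene list and attach candidate drugs.
# The mappings and calls below are illustrative only.

actionable_drugs = {
    "CDKN2A": ["palbociclib"],
    "MET": ["crizotinib", "cabozantinib"],
    "NF1": ["MEK162"],
}

called_variants = [
    ("CDKN2A", "deletion"),
    ("TTN", "missense"),        # not in the actionable list -> filtered out
    ("MET", "amplification"),
]

report = [
    {"gene": gene, "variant": variant, "drugs": actionable_drugs[gene]}
    for gene, variant in called_variants
    if gene in actionable_drugs
]
for row in report:
    print(row)
```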
Enhancing Next-Generation Sequencing-Guided Cancer Care Through
Cognitive Computing
NIRALI M. PATEL, VANESSA V. MICHELINI, JEFF M. SNELL, SAIANAND BALU, ALAN P. HOYLE, JOEL S. PARKER, MICHELE C. HAYWARD, DAVID A. EBERHARD, ASHLEY H. SALAZAR, PATRICK MCNEILLIE, JIA XU, CLAUDIA S. HUETTNER, TAKAHIKO KOYAMA, FILIPPO UTRO, KAHN RHRISSORRAKRAI, RAQUEL NOREL, ERHAN BILAL, AJAY ROYYURU, LAXMI PARIDA, H. SHELTON EARP, JUNEKO E. GRILLEY-OLSON, D. NEIL HAYES, STEPHEN J. HARVEY, NORMAN E. SHARPLESS, WILLIAM Y. KIM
Lineberger Comprehensive Cancer Center and the Departments of Pathology and Laboratory Medicine, Genetics, Medicine, and Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; IBM Watson Health, Boca Raton, Florida, Cambridge, Massachusetts, and Herndon, Virginia, USA; IBM Research, Yorktown Heights, New York, USA
Disclosures of potential conflicts of interest may be found at the end of this article.
Key Words. Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine
ABSTRACT
Background. Using next-generation sequencing (NGS) to guide
cancer therapy has created challenges in analyzing and report-
ing large volumes of genomic data to patients and caregivers.
Specifically, providing current, accurate information on newly
approved therapies and open clinical trials requires consider-
able manual curation performed mainly by human “molecular
tumor boards” (MTBs). The purpose of this study was to deter-
mine the utility of cognitive computing as performed by Wat-
son for Genomics (WfG) compared with a human MTB.
Materials and Methods. One thousand eighteen patient cases
that previously underwent targeted exon sequencing at the
University of North Carolina (UNC) and subsequent analysis
by the UNCseq informatics pipeline and the UNC MTB
between November 7, 2011, and May 12, 2015, were
analyzed with WfG, a cognitive computing technology for
genomic analysis.
Results. Using a WfG-curated actionable gene list, we identified
additional genomic events of potential significance (not discov-
ered by traditional MTB curation) in 323 (32%) patients. The
majority of these additional genomic events were considered
actionable based upon their ability to qualify patients for
biomarker-selected clinical trials. Indeed, the opening of a rele-
vant clinical trial within 1 month prior to WfG analysis provided
the rationale for identification of a new actionable event in
nearly a quarter of the 323 patients. This automated analysis
took <3 minutes per case.
Conclusion. These results demonstrate that the interpretation
and actionability of somatic NGS results are evolving too rapidly
to rely solely on human curation. Molecular tumor boards
empowered by cognitive computing could potentially improve
patient care by providing a rapid, comprehensive approach for
data analysis and consideration of up-to-date availability of
clinical trials. The Oncologist 2018;23:179–185
Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation
sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive
computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis
in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the
support of such tools applied to genomic data.
INTRODUCTION
Cancer Diagnostics and Molecular Pathology
• 1,018 patient cases (targeted exon sequencing)
• previously analyzed by the UNCseq informatics pipeline and MTB
• Watson for Genomics
• identified additional actionable genetic alterations in 323 (32%) patients.
• the novel findings were mostly due to recent openings of clinical trials.
• this analysis took <3 min per case.
Patel NM, Oncologist. 2018
[Figure 3. Sankey diagram of the flow of the UNCseq molecular tumor board (MTB) and WfG comparison. Of the 1,018 patients previously analyzed by the University of North Carolina (UNC) MTB, 703 were determined to have alterations in genes that met the UNC MTB definition of actionability (A) and 315 did not (B). The WfG analysis suggested that an additional eight genes not previously defined as actionable should be… WfG identified actionable genes in 231 of the 703 patients and 96 of the 315 (327 in total); 323 WfG-identified actionable genes were approved by CCGR. Among these patients: potential to change therapy (47), no evidence of disease (145), lost to follow-up (29), withdrew from study (4), deceased (98).]
Patel NM, Oncologist. 2018
Deep Learning
https://blogs.oracle.com/bigdata/difference-ai-machine-learning-deep-learning
• Additive effect with human physicians (accuracy, time, cost…) 

• Improvement in patient outcome

• Prospective RCT

• Real-world clinical setting validation

• Something beyond human’s visual perception
How to show clinical impact?
•The FDA has approved at least 30 AI medical devices over the last 3 years.

•Most of these devices are intended to analyze medical images.
Nature Medicine 2019
•The KFDA (MFDS) has approved at least 13 AI medical devices over the last 2 years.

•All of these devices are intended to analyze medical images.
Radiology
1. CAD: Computer Aided Detection

2. Triage: Prioritization of Critical cases (a toy sketch follows below)

3. Image driven biomarker
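As a sketch of the triage idea in item 2: the reading worklist is reordered so that studies the model flags as likely critical are read first. The study names and suspicion scores below are hypothetical model outputs, not any vendor's actual interface.

```python
# A minimal sketch of AI triage: pop studies from the worklist in order of
# model-estimated criticality, so the most suspicious cases are read first.
import heapq

worklist = [("study_A", 0.12), ("study_B", 0.97), ("study_C", 0.55)]

# heapq is a min-heap, so push negated scores to pop the highest score first.
heap = [(-score, study) for study, score in worklist]
heapq.heapify(heap)

while heap:
    neg_score, study = heapq.heappop(heap)
    print(study, -neg_score)  # study_B 0.97, then study_C 0.55, then study_A 0.12
```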
ORIGINAL RESEARCH • THORACIC IMAGING
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul
03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital,
Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of
Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco,
San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul,
Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P.
(e-mail: cmpark.morphius@gmail.com).
Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002).
*J.G.N. and S.P. contributed equally to this work.
Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules
on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43292 chest radiographs (normal radiograph–
to–nodule radiograph ratio, 34067:9225) in 34676 patients (healthy-to-nodule ratio, 30784:3892; 19230 men [mean age, 52.8
years; age range, 18–99 years]; 15446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015,
which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph clas-
sification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three
South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection
performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife
alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance
test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation
data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection perfor-
mances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher
AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and
all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range,
0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nod-
ule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when
used as a second reader.
©RSNA, 2018
Online supplemental material is available for this article.
• 43,292 chest PA (normal:nodule=34,067:9225)

• labeled/annotated by 13 board-certified radiologists.

• DLAD was validated on 1 internal + 4 external datasets 

• Classification / Lesion localization 

• AI vs. Physician vs. Physician+AI

• Compared various levels of physicians

• Non-radiology / radiology residents 

• Board-certified radiologist / Thoracic radiologists
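As a reference point for the radiograph-classification metric reported in this study, here is a minimal AUROC computation using scikit-learn (the tooling is an assumption; the study does not publish code, and the labels/scores below are hypothetical):

```python
# A minimal sketch of per-radiograph classification AUROC: ground-truth
# nodule labels vs. a model's continuous suspicion scores. Data is made up.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                 # 1 = nodule radiograph
dlad_scores = [0.1, 0.3, 0.8, 0.7, 0.9, 0.2, 0.6, 0.4]

print(f"AUROC: {roc_auc_score(y_true, dlad_scores):.2f}")  # 1.00 on this toy data
```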
Nam et al
Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the
chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adeno-
carcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an
elevation in its confidence by eight radiologists.
Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on
the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe
on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two
radiologists and an elevated confidence level of the nodule by two radiologists.
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
(Test 1 = physician alone; Test 2 = physician + DLAD. Cls = radiograph classification AUROC; Det = nodule detection JAFROC FOM. P values compare DLAD vs Test 1, and Test 1 vs Test 2, for classification/detection. Group rows give group detection FOMs.)

Observer | Test 1 Cls | Test 1 Det | DLAD vs Test 1 P (Cls/Det) | Test 2 Cls | Test 2 Det | Test 1 vs Test 2 P (Cls/Det)
Nonradiology physicians
Observer 1 | 0.77 | 0.716 | <.001 / <.001 | 0.91 | 0.853 | <.001 / <.001
Observer 2 | 0.78 | 0.657 | <.001 / <.001 | 0.90 | 0.846 | <.001 / <.001
Observer 3 | 0.80 | 0.700 | <.001 / <.001 | 0.88 | 0.783 | <.001 / <.001
Group | – | 0.691 | <.001* | – | 0.828 | <.001*
Radiology residents
Observer 4 | 0.78 | 0.767 | <.001 / <.001 | 0.80 | 0.785 | .02 / .03
Observer 5 | 0.86 | 0.772 | .001 / <.001 | 0.91 | 0.837 | .02 / <.001
Observer 6 | 0.86 | 0.789 | .05 / .002 | 0.86 | 0.799 | .08 / .54
Observer 7 | 0.84 | 0.807 | .01 / .003 | 0.91 | 0.843 | .003 / .02
Observer 8 | 0.87 | 0.797 | .10 / .003 | 0.90 | 0.845 | .03 / .001
Observer 9 | 0.90 | 0.847 | .52 / .12 | 0.92 | 0.867 | .04 / .03
Group | – | 0.790 | <.001* | – | 0.867 | <.001*
Board-certified radiologists
Observer 10 | 0.87 | 0.836 | .05 / .01 | 0.90 | 0.865 | .004 / .002
Observer 11 | 0.83 | 0.804 | <.001 / <.001 | 0.84 | 0.817 | .03 / .04
Observer 12 | 0.88 | 0.817 | .18 / .005 | 0.91 | 0.841 | .01 / .01
Observer 13 | 0.91 | 0.824 | >.99 / .02 | 0.92 | 0.836 | .51 / .24
Observer 14 | 0.88 | 0.834 | .14 / .03 | 0.88 | 0.840 | .87 / .23
Group | – | 0.821 | .02* | – | 0.840 | .01*
Thoracic radiologists
Observer 15 | 0.94 | 0.856 | .15 / .21 | 0.96 | 0.878 | .08 / .03
Observer 16 | 0.92 | 0.854 | .60 / .17 | 0.93 | 0.872 | .34 / .02
Observer 17 | 0.86 | 0.820 | .02 / .01 | 0.88 | 0.838 | .14 / .12
Observer 18 | 0.84 | 0.800 | <.001 / <.001 | 0.87 | 0.827 | .02 / .02
Group | – | 0.833 | .08* | – | 0.854 | <.001*

Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years; observers 7–9 had 3 years; observers 10–12 had 7 years; observers 13 and 14 had 8 years; observer 15 had 26 years; observer 16 had 13 years; and observers 17 and 18 had 9 years. Observers 1–3 were 4th-year residents from obstetrics and gynecolo…
(Slide annotations: AI vs. Physician (p value); Physician vs. Physician + AI (p value); reader groups: non-radiology residents, radiology residents, board-certified radiologists, thoracic radiologists.)
•As ‘a second reader’, AI improved physicians’ performance:

•classification: 17 of 18 physicians improved (15 of 18 with P < 0.05)

•nodule detection: 18 of 18 physicians improved (14 of 18 with P < 0.05)
[Chart: reader performance, human only vs. human + algorithm, for each reader group]
Clinical Study Results (CXR Nodule) - Radiology
Courtesy of Lunit, Inc.
•Improvement with AI as ‘a second reader’ was most substantial for non-radiology physicians, compared with board-certified radiologists or radiology residents.
Radiology
1. CAD: Computer Aided Detection

2. Triage: Prioritization of Critical cases

3. Image driven biomarker
“Something beyond human’s visual perception”
https://venturebeat.com/2019/05/07/mit-csails-ai-can-predict-the-onset-of-breast-cancer-5-years-in-advance/
ORIGINAL RESEARCH BREAST IMAGING
Since the creation of the Gail model in 1989 (1), risk
models have supported risk-adjusted screening and pre-
vention and their continued evolution has been a central
pillar of breast cancer research (1–8). Previous research
(2,3) explored multiple risk factors related to hormonal
and genetic information. Mammographic breast density,
which relates to the amount of fibroglandular tissue in a
woman’s breast, is a risk factor that received substantial at-
tention. Brentnall et al (8) incorporated mammographic
breast density into the Gail risk model and Tyrer-Cuzick
model (TC), improving their areas under the receiver op-
erating characteristic curve (AUCs) from 0.55 and 0.57 to
0.59 and 0.61, respectively.
The use of breast density as a proxy for the detailed in-
mammography with vastly different outcomes. Whereas
previous studies (10–12) explored automated methods to
assess breast density, these efforts reduced the mammo-
graphic input into a few statistics largely related to volume
of glandular tissue that are not sufficient to distinguish pa-
tients who will and will not develop breast cancer.
We hypothesize that there are subtle but informa-
tive cues on mammograms that may not be discernible
by humans or simple volume-of-density measurements,
and deep learning (DL) can leverage these cues to yield
improved risk models. Therefore, we developed a DL
model that operates over a full-field mammographic im-
age to assess a patient’s future breast cancer risk. Rather
than manually identifying discriminative image patterns,
A Deep Learning Mammography-based Model for
Improved Breast Cancer Risk Prediction
From the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 32 Vassar St, 32-G484, Cambridge, MA 02139 (A.Y.,
T.S., T.P., R.B.); and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (C.L.). Received November 28, 2018; revision
requested January 18, 2019; revision received March 14; accepted March 18. Address correspondence to A.Y. (e-mail: adamyala@csail.mit.edu).
Conflicts of interest are listed at the end of this article.
See also the editorial by Sitek and Wolfe in this issue.
Radiology 2019; 00:1–7 https://doi.org/10.1148/radiol.2019182716 Content code:
Background: Mammographic density improves the accuracy of breast cancer risk models. However, the use of breast density is lim-
ited by subjective assessment, variation across radiologists, and restricted data. A mammography-based deep learning (DL) model
may provide more accurate risk prediction.
Purpose: To develop a mammography-based DL breast cancer risk model that is more accurate than established clinical breast can-
cer risk models.
Materials and Methods: This retrospective study included 88994 consecutive screening mammograms in 39571 women between
January 1, 2009, and December 31, 2012. For each patient, all examinations were assigned to either training, validation, or test
sets, resulting in 71689, 8554, and 8751 examinations, respectively. Cancer outcomes were obtained through linkage to a regional
tumor registry. By using risk factor information from patient questionnaires and electronic medical records review, three models
were developed to assess breast cancer risk within 5 years: a risk-factor-based logistic regression model (RF-LR) that used traditional
risk factors, a DL model (image-only DL) that used mammograms alone, and a hybrid DL model that used both traditional risk
factors and mammograms. Comparisons were made to an established breast cancer risk model that included breast density (Tyrer-
Cuzick model, version 8 [TC]). Model performance was compared by using areas under the receiver operating characteristic curve
(AUCs) with the DeLong test (P < .05).
Results: The test set included 3937 women, aged 56.20 years ± 10.04. Hybrid DL and image-only DL showed AUCs of 0.70
(95% confidence interval [CI]: 0.66, 0.75) and 0.68 (95% CI: 0.64, 0.73), respectively. RF-LR and TC showed AUCs of 0.67
(95% CI: 0.62, 0.72) and 0.62 (95% CI: 0.57, 0.66), respectively. Hybrid DL showed a significantly higher AUC (0.70) than TC
(0.62; P < .001) and RF-LR (0.67; P = .01).
Conclusion: Deep learning models that use full-field mammograms yield substantially improved risk discrimination compared with
the Tyrer-Cuzick (version 8) model.
©RSNA, 2019
Online supplemental material is available for this article.
Radiology 2019
0.77] and 0.71 [95% CI: 0.66, 0.77
pared with an AUC of 0.66 (95% C
improvement of hybrid DL and ima
Table E3 [online]). The improvement of hybrid DL overTC was
significant (P , .01). For patients without a family history of
breast or ovarian cancer, hybrid DL and image-only DL showed
similar discrimination accuracies (AUCs, 0.71 [95% CI: 0.65,
Table: Performance of All Models on the Test Set (column headers truncated in the source; one recoverable header reads "Hazard Ratio in Top Decile")
Model | AUC | (truncated) | (truncated) | (truncated)
TC | 0.62 (0.57, 0.66) | 1.89 (0.91, 2.63) | 0.50 (0.08, 0.81) | 0.18 (0.11, 0.24)
RF-LR | 0.67 (0.62, 0.72) | 3.69 (2.25, 4.94) | 0.41 (0, 0.72) | 0.31 (0.23, 0.38)
Image-only DL | 0.68 (0.64, 0.73) | 2.31 (1.46, 3.02) | 0.40 (0.09, 0.61) | 0.22 (0.16, 0.27)
Hybrid DL | 0.70 (0.66, 0.75) | 3.80 (2.45, 4.91) | 0.36 (0.01, 0.60) | 0.31 (0.24, 0.38)
Note.—Data in parentheses are 95% confidence intervals. There were a total of 3,937 patients and 8,751 examinations. AUC = area under receiver operating characteristic curve, DL = deep learning, RF-LR = risk-factor-based logistic regression.
Figure 2: Receiver operating characteristic curve of all models on the test set. All P values are comparisons with Tyrer-Cuzick version 8 (TCv8). DL = deep learning; hybrid DL = DL model that uses both imaging and the traditional risk factors in risk factor logistic regression; RF-LR = risk factor logistic regression.
[Table 3, truncated in the source: test-set discrimination of 5-year risk by ethnicity, for TC, RF-LR, image-only DL, and hybrid DL. Per the table note, among the 3,157 white patients there were 233 cancers; among the 202 African American patients there were 424 examinations and 11 cancers. Data in parentheses are 95% confidence intervals; AUC = area under receiver operating characteristic curve; RF-LR = risk-factor-based logistic regression.]
• A deep learning model based on the mammography image predicts the onset of breast cancer in the next 5 years better than the existing risk-factor-based model.
• Image-only DL and hybrid DL both showed better performance than the risk-factor-based models.
1. TC model (existing risk-factor-based model)
2. Logistic regression based on risk factors (strong control)
3. Deep learning model based only on the mammography image
4. Hybrid of 2 + 3 (both risk factors and DL of the mammogram); a minimal sketch of this hybrid setup follows below
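A minimal sketch of how models 2–4 relate, assuming a frozen image network that yields one embedding per mammogram; all names, dimensions, and data below are illustrative stand-ins, not the authors' implementation.

```python
# Sketch of the four-model comparison: RF-LR on risk factors (model 2) and a
# hybrid classifier on image features + risk factors (model 4).
import numpy as np
from sklearn.linear_model import LogisticRegression

def hybrid_features(image_embedding, risk_factors):
    """Concatenate DL image features with traditional risk factors (2 + 3)."""
    return np.concatenate([image_embedding, risk_factors], axis=1)

# Toy stand-ins: 512-dim image embeddings; ~10 traditional risk factors.
n = 200
emb = np.random.randn(n, 512)   # hypothetical image-only DL features
rf = np.random.randn(n, 10)     # hypothetical risk-factor inputs (RF-LR)
y = np.random.randint(0, 2, n)  # 5-year cancer outcome (toy labels)

rf_lr = LogisticRegression(max_iter=1000).fit(rf, y)                          # model 2
hybrid = LogisticRegression(max_iter=1000).fit(hybrid_features(emb, rf), y)   # model 4
```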
Radiology 2019
Pathology
Pathology
1. CAD: Computer-Aided Detection

2. Triage: prioritization of critical cases

3. Image-driven biomarkers
[Figure 4: Participating pathologists' interpretations of each of the 240 breast biopsy test cases, plotted as the percentage of interpretations per case (benign without atypia, atypia, DCIS, or invasive carcinoma). A, Benign without atypia: 72 cases, 2,070 total interpretations. B, Atypia: 72 cases, 2,070 total interpretations. C, DCIS: 73 cases, 2,097 total interpretations. D, Invasive carcinoma: 23 cases, 663 total interpretations. DCIS indicates ductal carcinoma in situ.]
Elmore et al. JAMA 2015
Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens
• Diagnostic concordance among the pathologists is only about 75%
• Blind test interpreting breast biopsy specimens
• 240 cases, interpreted by 115 board-certified pathologists
ISBI Grand Challenge on
Cancer Metastases Detection in Lymph Node
Camelyon16 (>200 registrants)
International Symposium on Biomedical Imaging 2016
H&E Image Processing Framework
[Pipeline diagram: Train — tumor and normal patches are sampled from whole-slide images to build the training data. Test — overlapping image patches from a whole-slide image are scored by a convolutional neural network, P(tumor), and assembled into a tumor probability map scaled from 0.0 to 1.0.]
https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
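A minimal sketch of the patch-based scoring in the diagram above, assuming a trained classifier `model` that returns P(tumor) for one patch; the patch size and stride are arbitrary choices, not the framework's actual settings.

```python
# Sketch: score overlapping patches and assemble a tumor probability map.
import numpy as np

def tumor_probability_map(slide, model, patch=256, stride=128):
    """Average per-patch P(tumor) scores into a heatmap in [0, 1]."""
    h, w = slide.shape[:2]
    heat = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = model(slide[y:y + patch, x:x + patch])  # P(tumor), one patch
            heat[y:y + patch, x:x + patch] += p
            count[y:y + patch, x:x + patch] += 1
    return heat / np.maximum(count, 1)  # avoid division by zero at the border
```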
Assisting Pathologists in Detecting
Cancer with Deep Learning
• Algorithms need to be incorporated in a way that complements the pathologist's workflow.
• Algorithms could improve the efficiency and consistency of pathologists.
• For example, pathologists could reduce their false-negative rates (the percentage of undetected tumors) by reviewing the top-ranked predicted tumor regions, including up to 8 false-positive regions per slide; a small ranking sketch follows.
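A tiny sketch of the review strategy just described: rank candidate regions by predicted probability and cap the list so the pathologist reviews at most a fixed false-positive budget per slide. The region names are hypothetical.

```python
# Sketch: present only the top-k predicted tumor regions per slide.
def regions_for_review(regions, k=8):
    """regions: list of (probability, region_id); return top-k by probability."""
    return sorted(regions, key=lambda r: r[0], reverse=True)[:k]

candidates = [(0.97, "roi_12"), (0.41, "roi_3"), (0.88, "roi_7"), (0.05, "roi_9")]
print(regions_for_review(candidates, k=3))  # highest-probability regions first
```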
Matchmaking in heaven?

Sensitivity of AI + Specificity of Human
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
• Google’s AI improved substantially in sensitivity (92.9%, 88.5%)

• Pathologist showed almost 100% of specificity.
•Pathologist and AI are both good at detection, but in different ways. 

•It shows potential additive effect on accuracy, efficiency and consistency
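A back-of-envelope sketch of why the combination can help, under the strong (and unrealistic) simplifying assumption that AI and human errors are independent; the numbers are illustrative figures in the spirit of the slide, not measured operating points.

```python
# Sketch: combining a sensitive AI with a specific human reader.
def serial_rule(ai, hu):
    """Positive only if AI flags AND human confirms: keeps the human's specificity."""
    return ai[0] * hu[0], 1 - (1 - ai[1]) * (1 - hu[1])

def parallel_rule(ai, hu):
    """Positive if AI OR human calls it: keeps the AI's sensitivity."""
    return 1 - (1 - ai[0]) * (1 - hu[0]), ai[1] * hu[1]

ai = (0.92, 0.70)   # (sensitivity, specificity): sensitive but false-positive prone
hu = (0.73, 0.99)   # pathologist: almost perfect specificity
print(serial_rule(ai, hu), parallel_rule(ai, hu))  # two ends of the trade-off
```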
https://www.facebook.com/groups/TensorFlowKR/permalink/633902253617503/
Plenary Lecture of Google Engineers at AACR 2018
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the
stage for the clinical use of digital images in anatomic pathology.
Along with advances in computer image analysis, this raises the
possibility for computer-assisted diagnostics in pathology to improve
histopathologic interpretation and clinical care. To evaluate the
potential impact of digital assistance on interpretation of digitized
slides, we conducted a multireader multicase study utilizing our deep
learning algorithm for the detection of breast cancer metastasis in
lymph nodes. Six pathologists reviewed 70 digitized slides from lymph
node sections in 2 reader modes, unassisted and assisted, with a wash-
out period between sessions. In the assisted mode, the deep learning
algorithm was used to identify and outline regions with high like-
lihood of containing tumor. Algorithm-assisted pathologists demon-
strated higher accuracy than either the algorithm or the pathologist
alone. In particular, algorithm assistance significantly increased the
sensitivity of detection for micrometastases (91% vs. 83%, P=0.02).
In addition, average review time per image was significantly shorter
with assistance than without assistance for both micrometastases (61
vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding
the difficulty of each image classification. On the basis of this score,
pathologists considered the image review of micrometastases to be
significantly easier when interpreted with assistance (P=0.0005).
Utilizing a proof of concept assistant tool, this study demonstrates the
potential of a deep learning algorithm to improve pathologist accu-
racy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of
whole-slide scanners has enabled the digitization of glass
slides for remote consults and archival purposes.1 Digitiza-
tion alone, however, does not necessarily improve the con-
sistency or efficiency of a pathologist’s primary workflow. In
fact, image review on a digital medium can be slightly
slower than on glass, especially for pathologists with limited
digital pathology experience.2 However, digital pathology
and image analysis tools have already demonstrated po-
tential benefits, including the potential to reduce inter-reader
variability in the evaluation of breast cancer HER2 status.3,4
Digitization also opens the door for assistive tools based on
Artificial Intelligence (AI) to improve efficiency and con-
sistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demon-
strated strong performance in many automated image-rec-
ognition applications.6–8 Recently, several deep learning–
based algorithms have been developed for the detection of
breast cancer metastases in lymph nodes as well as for other
applications in pathology.9,10 Initial findings suggest that
some algorithms can even exceed a pathologist’s sensitivity
for detecting individual cancer foci in digital images. How-
ever, this sensitivity gain comes at the cost of increased false
positives, potentially limiting the utility of such algorithms for
automated clinical use.11 In addition, deep learning algo-
rithms are inherently limited to the task for which they have
been specifically trained. While we have begun to understand
the strengths of these algorithms (such as exhaustive search)
and their weaknesses (sensitivity to poor optical focus, tumor
mimics; manuscript under review), the potential clinical util-
ity of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid
pathologists or improve clinical interpretation, these benefits
may be achieved through thoughtful and appropriate in-
tegration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain
View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship
(D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T.,
J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have
Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare,
1600 Amphitheatre Way, Mountain View, CA 94043
(e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF
versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health,
Inc. This is an open-access article distributed under the terms of the
Creative Commons Attribution-Non Commercial-No Derivatives
License 4.0 (CCBY-NC-ND), where it is permissible to download and
share the work provided it is properly cited. The work cannot be
changed in any way or used commercially without permission from
the journal.
ORIGINAL ARTICLE
• Google’s pathology AI, LYNA (LYmph Node Assistant)
• Review of lymph nodes for metastatic breast cancer
• Synergistic effect between pathologists and AI
…models across all observations were generated using the glmer function. All models were generated using the lme4 package in R, and each category (eg, negative or micrometastases) was modeled separately.
…(confidence interval [CI], 93.4%–95.8%). To evaluate the impact of the assisted read on accuracy, we analyzed performance by case category and assistance modality (Fig. 3A). For micrometastases, sensitivity was significantly higher with…
[Figure 3 panels A and B: performance (specificity for negative cases; sensitivity for micrometastases and macrometastases), unassisted versus assisted; micrometastases p = 0.02.]
FIGURE 3. Improved metastasis detection with algorithm assistance. A, Data represent performance across all images by image category and assistance modality. Error bars indicate SE. The performance metric corresponds to specificity for negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet). B, Operating points of individual pathologists with and without assistance for micrometastases and negative cases, overlaid on the receiver operating characteristic curve of the algorithm. AUC indicates area under the curve.
• Sensitivity / specificity with the assistance of AI
• Micromet: sensitivity was significantly improved
• Negative & macromet: non-significant
…isolated diagnostic tasks. Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists and that even the best algorithms would need to…
…from 83% to 91% and resulted in higher overall diagnostic accuracy than that of either unassisted pathologist interpretation or the computer algorithm alone. Although deep learning algorithms have been credited with comparable…
[Figure 5 panels A and B: time of review per image (seconds), unassisted versus assisted, for negative, ITC, micrometastasis and macrometastasis categories; negative p = 0.02, micrometastases p = 0.002.]
FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists, analyzed by category. Black circles are average times with assistance; gray triangles represent average times without assistance. Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance. Circles represent the average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent quartiles, and the diamond indicates the average review time for micrometastases in that modality. Micromet indicates micrometastasis; macromet, macrometastasis.
• Time of review (per image)
• Negative & micromet: significant reduction
• Micromet: 2 min ➔ 1 min
• ITC (isolated tumor cells) & macromet: non-significant
Pathology
1. CAD: Computer-Aided Detection

2. Triage: prioritization of critical cases

3. Image-driven biomarkers
“Something beyond human visual perception”: cancer immunotherapy
BRIEF COMMUNICATION
https://doi.org/10.1038/s41591-019-0462-y
1 Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany. 2 German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany. 3 Applied Tumor Immunity, German Cancer Research Center (DKFZ), Heidelberg, Germany. 4 Hematology/…
Microsatellite instability determines whether patients with
gastrointestinal cancer respond exceptionally well to immu-
notherapy. However, in clinical practice, not every patient is
tested for MSI, because this requires additional genetic or
immunohistochemical tests. Here we show that deep residual
learning can predict MSI directly from H&E histology, which
is ubiquitously available. This approach has the potential to
provide immunotherapy to a much broader subset of patients
with gastrointestinal cancer.
Although immunotherapy now represents a cornerstone of cancer therapy, patients with gastrointestinal cancer usually do not benefit to the same extent as patients with other solid malignancies, such as melanoma or lung cancer1, unless the tumor belongs to the group of microsatellite instable (MSI) tumors2. In this group, which accounts for approximately 15% of gastric (stomach) adenocarcinoma (STAD) and colorectal cancer (CRC)3, immune checkpoint inhibitors demonstrated considerable clinical benefit4, resulting in recent approval by the Food and Drug Administration (FDA). MSI can be identified by immunohistochemistry or genetic analyses5, but not all patients are screened for MSI except in high-volume tertiary care centers6. Accordingly, a substantial group of potential responders to immunotherapy may not be offered timely treatment with immune checkpoint inhibitors, missing chances of disease control.
Deep learning has outperformed humans in some medical data analysis tasks7 and can predict patient survival and mutations in tumors using images of lung8, prostate9 and brain10,11 tumors. To facilitate universal MSI screening, we investigated whether deep learning can predict MSI status directly from H&E-stained histology slides. First, we compared five convolutional neural networks on a three-class set of gastrointestinal cancer tissues (n = 94 slides, n = 81 patients, Fig. 1a–c, Extended Data Fig. 1). Resnet18, a residual learning12 convolutional neural network, was an efficient tumor detector with an out-of-sample area under the curve (AUC) of 0.99, which represented an improvement on the current state of the art13,14. Another resnet18 (Fig. 1d) was trained to classify MSI versus microsatellite stability (MSS, Fig. 1e) in large patient cohorts from The Cancer Genome Atlas (TCGA): n = 315 formalin-fixed paraffin-embedded (FFPE) samples of STAD15 (TCGA-STAD), n = 360 FFPE samples of CRC16 (TCGA-CRC-DX) and n = 378 snap-frozen samples of CRC (TCGA-CRC-KR; Supplementary Table 1).
Tumor tissue was automatically detected and subsequently tessellated into 100,570 (TCGA-STAD), 60,894 (TCGA-CRC-KR) and 93,408 (TCGA-CRC-DX) color-normalized tiles, in which the deep learning model scored MSI. In the TCGA-CRC-DX test cohort, true MSI image tiles (as defined in Supplementary Table 2) had a median MSI score of 0.61 (95% confidence interval (CI), 0.12–0.82; Fig. 2a), whereas true MSS tiles had an MSI score of 0.29 (95% CI, 0.08–0.57; two-tailed t-test P = 1.1 × 10−6; Fig. 2b). In the TCGA-CRC-KR test cohort, the MSI score was 0.50 (95% CI, 0.17–0.80) for MSI tiles and 0.22 (95% CI, 0.06–0.60; P = 7.3 × 10−11) for MSS tiles, indicating that our approach can robustly distinguish features that are predictive of MSI in both snap-frozen and FFPE samples. Patient-level AUCs for MSI detection were 0.81 (95% CI, 0.69–0.90) in TCGA-STAD, 0.84 (95% CI, 0.73–0.91) in TCGA-CRC-KR and 0.77 (95% CI, 0.62–0.87) in TCGA-CRC-DX (Extended Data Fig. 2a; MSI frequency is listed in Supplementary Table 3).
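A minimal sketch of turning tile-level MSI scores into a patient-level score, as the pipeline above does; aggregating by the mean is an assumption on my part, and the toy cohort is invented.

```python
# Sketch: aggregate per-tile MSI probabilities into patient-level scores,
# then compute a patient-level AUC against MSI/MSS labels.
import numpy as np
from sklearn.metrics import roc_auc_score

def patient_msi_score(tile_scores):
    """One patient-level score from many per-tile MSI probabilities."""
    return float(np.mean(tile_scores))

# Hypothetical cohort: each patient has several color-normalized tumor tiles.
patients = {"pt1": [0.7, 0.6, 0.8], "pt2": [0.2, 0.3, 0.1], "pt3": [0.5, 0.6]}
labels = {"pt1": 1, "pt2": 0, "pt3": 1}  # 1 = MSI, 0 = MSS

scores = [patient_msi_score(tiles) for tiles in patients.values()]
print(roc_auc_score([labels[k] for k in patients], scores))
```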
The multi-center DACHS study17,18 was used as an external validation set (n = 378 patients). Using the automatic tumor detector and the MSI detector trained on TCGA-CRC-DX (Fig. 2c), the patient-level AUC was 0.84 (95% CI, 0.72–0.92) (Fig. 2d). The model that was trained on FFPE samples and used on FFPE samples was superior to a model that was trained on frozen samples and used on FFPE samples. Similarly, a model that was trained on CRC samples and used on CRC samples performed better than a model that was trained on STAD samples and used on CRC samples (Extended Data Fig. 2a). To analyze the limits of our proposed…
Deep learning can predict microsatellite instability
directly from histology in gastrointestinal cancer
Jakob Nikolas Kather*, Alexander T. Pearson, Niels Halama, Dirk Jäger, Jeremias Krause, Sven H. Loosen, Alexander Marx, Peter Boor, Frank Tacke, Ulf Peter Neumann, Heike I. Grabsch, Takaki Yoshikawa, Hermann Brenner, Jenny Chang-Claude, Michael Hoffmeister, Christian Trautwein and Tom Luedde*
[Figure 1, partial caption: tumor detection and MSI prediction in H&E histology. a, A convolutional neural network ("Net 1") was trained as a tumor detector for STAD and CRC. b,c, Tumor regions were cut into square tiles (b), which were color-normalized and sorted into MSI and MSS (c). Scale bar, 256 µm. d, Another network ("Net 2") was trained to classify MSI versus MSS. e, This automatic pipeline was applied to held-out patient sets.]
[Figure 2, partial caption: classification performance in an external validation set. Trained on the TCGA cohort (360 patients, 93,408 tiles) and predicted on the DACHS cohort (378 patients, 896,530 tiles); AUC = 0.84. Further panels correlate "MSIness" in STAD, CRC-KR, CRC-DX and DACHS with the CD8+ signature, CD8+ IHC, PD-L1 expression and the IFNγ signature.]
• Prediction of microsatellite instability in GI cancer directly from the H&E pathology image
• AUC = 0.84 in external validation (n = 378)
• Validated for both snap-frozen and FFPE samples
• Validated for endometrial cancer as well
Nat Med 2019
PD-L1 low, AI low (mPFS 1.9mo)
PD-L1 high, AI low (mPFS 2.0mo)
PD-L1 low, AI high (mPFS 4.0mo)
PD-L1 high, AI high (mPFS 6.7mo)
Therapeutic Biomarker Discovery
ASCO 2019
• Prediction of immune checkpoint inhibitor response based only on the pathology image, in metastatic NSCLC (n = 189)
• PD-L1 (the existing biomarker) vs. the AI score
• Additive / complementary effect
• The AI score is better than PD-L1 for response prediction; a survival-analysis sketch follows.
ASCO 2019. Courtesy of Lunit, Inc.
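A sketch of how such mPFS group comparisons are typically made, using the lifelines library; the durations and events below are invented placeholders, not the Lunit/ASCO data.

```python
# Sketch: Kaplan-Meier median PFS per biomarker group plus a log-rank test.
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# (months of progression-free survival, event observed) for two toy groups.
t_high = [6.7, 8.1, 5.9, 12.0, 4.2]; e_high = [1, 1, 1, 0, 1]  # AI-high
t_low = [1.9, 2.2, 1.5, 3.0, 2.8]; e_low = [1, 1, 1, 1, 0]     # AI-low

kmf = KaplanMeierFitter()
kmf.fit(t_high, event_observed=e_high, label="AI high")
print(kmf.median_survival_time_)  # median PFS for the AI-high group

res = logrank_test(t_high, t_low, event_observed_A=e_high, event_observed_B=e_low)
print(res.p_value)  # do the two survival curves differ?
```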
Gastroenterology
ARTICLES
https://doi.org/10.1038/s41551-018-0301-3
Colonoscopy is the gold-standard screening test for colorectal cancer1–3, one of the leading causes of cancer death in both the United States4,5 and China6. Colonoscopy can reduce the risk of death from colorectal cancer through the detection of tumours at an earlier, more treatable stage as well as through the removal of precancerous adenomas3,7. Conversely, failure to detect adenomas may lead to the development of interval cancer. Evidence has shown that each 1.0% increase in adenoma detection rate (ADR) leads to a 3.0% decrease in the risk of interval colorectal cancer8.
Although more than 14 million colonoscopies are performed in the United States annually2, the adenoma miss rate (AMR) is estimated to be 6–27%9. Certain polyps may be missed more frequently, including smaller polyps10,11, flat polyps12 and polyps in the left colon13. There are two independent reasons why a polyp may be missed during colonoscopy: (i) it was never in the visual field or (ii) it was in the visual field but not recognized. Several hardware innovations have sought to address the first problem by improving visualization of the colonic lumen, for instance by providing a larger, panoramic camera view, or by flattening colonic folds using a distal-cap attachment. The problem of unrecognized polyps within the visual field has been more difficult to address14. Several studies have shown that observation of the video monitor by either nurses or gastroenterology trainees may increase polyp detection by up to 30%15–17. Ideally, a real-time automatic polyp-detection system could serve as a similarly effective second observer that could draw the endoscopist's eye, in real time, to concerning lesions, effectively creating an 'extra set of eyes' on all aspects of the video data with fidelity. Although automatic polyp detection in colonoscopy videos has been an active research topic for the past 20 years, performance levels close to that of the expert endoscopist18–20 have not been achieved. Early work in automatic polyp detection has focused on applying deep-learning techniques to polyp detection, but most published works are small in scale, with small development and/or training validation sets19,20.
Here, we report the development and validation of a deep-learning algorithm, integrated with a multi-threaded processing system, for the automatic detection of polyps during colonoscopy. We validated the system in two image studies and two video studies. Each study contained two independent validation datasets.
Results
We developed a deep-learning algorithm using 5,545 colonoscopy images from colonoscopy reports of 1,290 patients that underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People's Hospital between January 2007 and December 2015. Out of the 5,545 images used, 3,634 images contained polyps (65.54%) and 1,911 images did not contain polyps (34.46%). For algorithm training, experienced endoscopists annotated the presence of each polyp in all of the images in the development dataset. We validated the algorithm on four independent datasets. Datasets A and B were used for image analysis, and datasets C and D were used for video analysis.
Dataset A contained 27,113 colonoscopy images from colonoscopy reports of 1,138 consecutive patients who underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People's Hospital between January and December 2016 and who were found to have at least one polyp. Out of the 27,113 images, 5,541 images contained polyps (20.44%) and 21,572 images did not contain polyps (79.56%). All polyps were confirmed histologically after biopsy. Dataset B is a public database (CVC-ClinicDB;
Development and validation of a deep-learning
algorithm for the detection of polyps during
colonoscopy
Pu Wang, Xiao Xiao, Jeremy R. Glissen Brown, Tyler M. Berzin, Mengtian Tu, Fei Xiong, Xiao Hu, Peixi Liu, Yan Song, Di Zhang, Xue Yang, Liangping Li, Jiong He, Xin Yi, Jingjia Liu and Xiaogang Liu*
The detection and removal of precancerous polyps via colonoscopy is the gold standard for the prevention of colon cancer. However, the detection rate of adenomatous polyps can vary significantly among endoscopists. Here, we show that a machine-learning algorithm can detect polyps in clinical colonoscopies, in real time and with high sensitivity and specificity. We developed the deep-learning algorithm by using data from 1,290 patients, and validated it on newly collected 27,113 colonoscopy images from 1,138 patients with at least one detected polyp (per-image-sensitivity, 94.38%; per-image-specificity, 95.92%; area under the receiver operating characteristic curve, 0.984), on a public database of 612 polyp-containing images (per-image-sensitivity, 88.24%), on 138 colonoscopy videos with histologically confirmed polyps (per-image-sensitivity of 91.64%; per-polyp-sensitivity, 100%), and on 54 unaltered full-range colonoscopy videos without polyps (per-image-specificity, 95.40%). By using a multi-threaded processing system, the algorithm can process at least 25 frames per second with a latency of 76.80 ± 5.60 ms in real-time video analysis. The software may aid endoscopists while performing colonoscopies, and help assess differences in polyp and adenoma detection performance among endoscopists.
• Deep learning algorithm to detect polyps during colonoscopy
• Retrospective study
• Validated on both images (snapshots) and videos (short clips and full-range)
• Sensitivity and specificity > 90%; a sketch of the real-time processing loop follows.
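A minimal sketch of a multi-threaded real-time loop of the kind the paper describes (at least 25 frames per second, ~77 ms latency), in which capture and inference run on separate threads so the camera feed never blocks on the detector; `video`, `model` and `display` are hypothetical stand-ins.

```python
# Sketch: decouple frame capture from polyp detection with a bounded queue.
import queue
import threading

frames = queue.Queue(maxsize=2)  # small buffer keeps latency bounded

def capture(video):
    for frame in video:              # e.g., frames from the endoscope feed
        try:
            frames.put_nowait(frame)
        except queue.Full:           # drop stale frames rather than lag behind
            frames.get_nowait()
            frames.put_nowait(frame)
    frames.put(None)                 # sentinel: end of stream

def detect(model, display):
    while (frame := frames.get()) is not None:
        boxes = model(frame)         # run the polyp detector on one frame
        display(frame, boxes)        # overlay the visual notice for the endoscopist

# threading.Thread(target=capture, args=(video,)).start()
# threading.Thread(target=detect, args=(model, display)).start()
```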
…from patients who underwent colonoscopy examinations up to 2 years later.
Also, we demonstrated high per-image-sensitivity (94.38% and 91.64%) in both the image (dataset A) and video (dataset C) analyses. Datasets A and C included large variations of polyp morphology and image quality (Fig. 3, Supplementary Figs. 2–5 and Supplementary Videos 3 and 4). For images with only flat and isochromatic polyps of size less than 0.5 cm, which have a higher miss rate25, the algorithm achieved a per-image-sensitivity of 91.65%, which indicates the algorithm's ability to capture subtle visual features of certain polyps and the potential to decrease the missed diagnosis of such polyps in real-world clinical settings.
The algorithm reached a per-polyp-sensitivity of 100% along with a tracing persistence of 88.93% in the video analysis of dataset C, further indicating that the software's ability to track polyps may be comparable to that of an experienced endoscopist. High per-…
…datasets are often small and do not represent the full range of colon conditions encountered in the clinical setting, and there are often discrepancies in the reporting of clinical metrics of success such as sensitivity and specificity19,20,26. Compared with other metrics such as precision, we believe that sensitivity and specificity are the most appropriate metrics for the evaluation of algorithm performance because of their independence from the ratio of positive to negative samples. Furthermore, many published studies are preclinical, and focus on validation using still images. A 2017 review18 of preclinical and clinical studies on AI applied to colonoscopy showed that a recent study27 was the first and only physician-initiated preclinical study on automatic polyp detection. In that study, the authors assessed their system using 24 colonoscopy videos containing 31 polyps and obtained a sensitivity of 70.4% and a specificity of 72.4%. Several other studies have been published since that time, including a study that examines an automatic polyp-detection sys-…
Fig. 3 | Examples of polyp detection for datasets A and C. Polyps of different morphology, including flat isochromatic polyps (left), dome-shaped polyps (second from left, middle), pedunculated polyps (second from right) and sessile serrated adenomatous polyps (right), were detected by the algorithm (as indicated by the green tags in the bottom set of images) in both normal and insufficient light conditions, under both qualified and suboptimal bowel preparations. Some polyps were detected with only partial appearance (middle, second from right). See Supplementary Figs 2–6 for additional examples.
Nature Biomedical Engineering 2018
Examples of Polyp Detection
Endoscopy
Figure 1 Deep learning architecture. The detection algorithm is a deep convolutional neural network (CNN) based on the SegNet architecture. Data flow is from left to right: a colonoscopy image is sequentially warped into a binary image, with 1 representing polyp pixels and 0 representing no polyp in a probability map. This is then displayed, as shown in the output, with a hollow tracing box on the CADe monitor.
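A minimal sketch of the post-processing step in Figure 1: threshold the CNN's polyp probability map and draw a hollow tracing box around each detected blob. The threshold and minimum-area values are assumptions, not the paper's settings.

```python
# Sketch: probability map -> hollow tracing boxes, using OpenCV.
import cv2
import numpy as np

def tracing_boxes(prob_map, frame, thresh=0.5, min_area=100):
    """prob_map: HxW floats in [0, 1]; draws green boxes on `frame` in place."""
    mask = (prob_map > thresh).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) >= min_area:        # ignore tiny speckle blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame
```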
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, Yi Li, Guangre Xu, Mengtian Tu, Xiaogang Liu
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1 Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China; 2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low-prevalence ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.
INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12
Unrecognised polyps within the visual field are an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.
How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.
• Deep learning–based ‘real-time’ automatic detection system increases colonoscopic polyp and adenoma detection rates
• Prospective RCT (n = 1058; standard = 536, CAD = 522)
• With the aid of AI:
• Improvement in adenoma detection rate: 29.1% vs 20.3% (p < 0.001); a quick check of this comparison follows below
• Increase in the mean number of adenomas per patient: 0.53 vs 0.31 (p < 0.001)
• Increase in the number of hyperplastic polyps: 114 vs 52 (p < 0.001)
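As a quick check of the headline comparison, a two-proportion z-test on the reported rates reproduces p < 0.001; the counts below are back-calculated from the percentages, and the trial's exact statistical procedure may differ.

```python
# Sketch: two-proportion z-test for the ADR comparison (29.1% vs 20.3%).
from statsmodels.stats.proportion import proportions_ztest

adenoma_cases = [152, 109]   # approx. counts: 29.1% of 522, 20.3% of 536
group_sizes = [522, 536]     # CADe arm vs standard colonoscopy arm

stat, p = proportions_ztest(adenoma_cases, group_sizes)
print(f"z = {stat:.2f}, p = {p:.4f}")  # p ~ 0.001, consistent with the report
```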
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring of continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
• + Drug development
Remaining Issues
• Development
  • data, data, data: quality and quantity + privacy
  • reference standard: how to define the gold standard?
  • overcoming the “overfitting” problem
• Validation
  • validation level: analytical validity; clinical validity; clinical utility
  • interpreting the black box: “explainable AI”
• Regulation
  • approval criteria: FDA’s Pre-Cert, active learning, etc.
  • adversarial attacks: robustness vs. performance
  • liability: whose liability in case of malpractice?
One more thing…
[Word-cloud slide — the pressures on physicians: malpractice, more data, lack of time, 3-minute visits, low reimbursement rates, EMR, the Resident Special Act, lack of manpower, over-loading, social recognition, research performance, genomics, medical accidents leading to arrest, reimbursement cuts, abusive patients, clinical performance metrics]
Unprecedented Pressure
Physician Burnout
Sep 2018, Health 2.0 @Santa Clara
“Physician Burnout” is recognized as a serious issue in US
March 2019, the Future of Individual Medicine @San Diego
“vicious cycle”
The victims of “physician burnout” will be all of us, in society.
Collaboration of physicians and AI: a solution for physician burnout?
ARTICLES
https://doi.org/10.1038/s41591-018-0177-5
…ancillary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has been recently studied to predict survival outcomes11 and classification12. For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941 (ref. 15).
Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously…
Classification and mutation prediction from
non–small cell lung cancer histopathology
images using deep learning
Nicolas Coudray, Paolo Santiago Ocampo, Theodore Sakellaropoulos, Navneet Narula, Matija Snuderl, David Fenyö, Andre L. Moreira, Narges Razavian* and Aristotelis Tsirigos*
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and sub-
type of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung
cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep con-
volutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and
automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of
pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen
tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most
commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be pre-
dicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest
that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be
applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
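A minimal sketch of the general recipe (fine-tuning Inception v3 on slide tiles for a three-class output), written in Keras as an assumption on my part; tile loading, class balance, and training details are omitted, and the authors' actual code is in the linked DeepPATH repository.

```python
# Sketch: Inception v3 with a LUAD / LUSC / normal classification head.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False,
    input_shape=(299, 299, 3), pooling="avg")
out = tf.keras.layers.Dense(3, activation="softmax")(base.output)
model = tf.keras.Model(base.input, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(tile_dataset, epochs=...)  # tiles cut from whole-slide images
```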
ORIGINAL RESEARCH • THORACIC IMAGING
Chest radiography, one of the most common diagnostic imaging tests in medicine, is used for screening, diagnostic work-ups, and monitoring of various thoracic diseases (1,2). One of its major objectives is detection of pulmonary nodules because pulmonary nodules are often the initial radiologic manifestation of lung cancers (1,2). However, to date, pulmonary nodule detection on chest radiographs has not been completely satisfactory, with a reported sensitivity ranging between 36%–84%, varying widely according to the tumor size and study population (2–6). Indeed, chest radiography has been shown to be prone to many reading errors with low interobserver and intraobserver agreements because of its limited spatial resolution, noise from overlapping anatomic structures, and the variable perceptual ability of radiologists. Recent work shows that 19%–26% of lung cancers visible on chest radiographs were in fact missed at their first readings (6,7). Of course, hindsight is always perfect when one knows where to look.
For this reason, there has been increasing dependency on chest CT images over chest radiographs in pulmonary nodule detection. However, even low-dose CT scans require approximately 50–100 times higher radiation dose than single-view chest radiographic examinations (8,9)
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD,
PhD • KunYoung Lim, MD, PhD • Thienkai HuyVu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin
Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul
03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital,
Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of
Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco,
San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul,
Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P.
(e-mail: cmpark.morphius@gmail.com).
Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002).
*J.G.N. and S.P. contributed equally to this work.
Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237 • Content codes:
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18–99 years]; 15,446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader.
©RSNA, 2018
Online supplemental material is available for this article.
LETTERS
https://doi.org/10.1038/s41591-019-0447-x
1 Google AI, Mountain View, CA, USA. 2 Stanford Health Care and Palo Alto Veterans Affairs, Palo Alto, CA, USA. 3 Northwestern Medicine, Chicago, IL, USA. 4 New York University-Langone Medical Center, Center for Biological Imaging, New York City, NY, USA. 5 These authors contributed equally: Diego Ardila, Atilla P. Kiraly, Sujeeth Bharadwaj, Bokyung Choi. *e-mail: tsed@google.com
With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States1. Lung cancer screening using low-dose computed tomography has been shown to reduce mortality by 20–43% and is now included in US screening guidelines1–6. Existing challenges include inter-grader variability and high false-positive and false-negative rates7–10. We propose a deep learning algorithm that uses a patient’s current and prior computed tomography volumes to predict the risk of lung cancer. Our model achieves a state-of-the-art performance (94.4% area under the curve) on 6,716 National Lung Cancer Screening Trial cases, and performs similarly on an independent clinical validation set of 1,139 cases. We conducted two reader studies. When prior computed tomography imaging was not available, our model outperformed all six radiologists with absolute reductions of 11% in false positives and 5% in false negatives. Where prior computed tomography imaging was available, the model performance was on-par with the same radiologists. This creates an opportunity to optimize the screening process via computer assistance and automation. While the vast majority of patients remain unscreened, we show the potential for deep learning models to increase the accuracy, consistency and adoption of lung cancer screening worldwide.
In 2013, the United States Preventive Services Task Force recommended low-dose computed tomography (LDCT) lung cancer screening in high-risk populations based on reported improved mortality in the National Lung Cancer Screening Trial (NLST)2–5. In 2014, the American College of Radiology published the Lung-RADS guidelines for LDCT lung cancer screening, to standardize image interpretation by radiologists and dictate management recommendations1,6. Evaluation is based on a variety of image findings, but primarily nodule size, density and growth6. At screening sites, Lung-RADS and other models such as PanCan are used to determine malignancy risk ratings that drive recommendations for clinical management11,12. Improving the sensitivity and specificity of lung cancer screening is imperative because of the high clinical and financial costs of missed diagnosis, late diagnosis and unnecessary biopsy procedures resulting from false negatives and false positives5,13–17. Despite improved consistency, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations7–10 of Lung-RADS. These limitations suggest opportunities for more sophisticated systems to improve performance and inter-reader consistency18,19. Deep learning approaches offer the exciting potential to automate more complex image analysis, detect subtle holistic imaging findings and unify methodologies for image evaluation20.
A variety of software devices have been approved by the Food and Drug Administration (FDA) with the goal of addressing workflow efficiency and performance through augmented detection of lung nodules on lung computed tomography (CT)21. Clinical research has primarily focused on either nodule detection or diagnostic support for lesions manually selected by imaging experts22–27. Nodule detection systems were engineered with the goal of improving radiologist sensitivity in identifying nodules while minimizing costs to specificity, thereby falling into the category of computer-aided detection (CADe)28. This approach highlights small nodules, leaving malignancy risk evaluation and clinical decision making to the clinician. Diagnostic support for pre-identified lesions is included in computer-aided diagnosis (CADx) platforms, which are primarily aimed at improving specificity. CADx has gained greater interest and even first regulatory approvals in other areas of radiology, though not in lung cancer at the time of manuscript preparation29.
To move beyond the limitations of prior CADe and CADx
approaches, we aimed to build an end-to-end approach perform-
ing both localization and lung cancer risk categorization tasks using
the input CT data alone. More specifically, we were interested in
replicating a more complete part of a radiologist’s workflow, includ-
ing full assessment of LDCT volume, focus on regions of concern,
comparison to prior imaging when available and calibration against
biopsy-confirmed outcomes.
Another important high-level decision in our approach was to learn features using deep convolutional neural networks (CNN), rather than using hand-engineered features such as texture features or specific Hounsfield unit values. We chose to learn features because this approach has repeatedly been shown superior to hand-engineered features in many open computer vision competitions in the past five years30,31, including the Kaggle 2017 Data Science Bowl, which used NLST data32.
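The first component is a 3D CNN over whole CT volumes. A minimal sketch of such a volumetric model, assuming PyTorch and a 64x64x64 single-channel input; the authors' actual architecture, input resolution and framework are not specified in this excerpt:

```python
# Minimal 3D CNN sketch for volumetric CT input (illustrative only;
# not the authors' architecture).
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                      # 64 -> 32 per spatial axis
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                      # 32 -> 16
            nn.AdaptiveAvgPool3d(1),              # global pooling over the volume
        )
        self.classifier = nn.Linear(32, 1)        # cancer-risk logit

    def forward(self, x):                         # x: (batch, 1, D, H, W)
        z = self.features(x).flatten(1)
        return self.classifier(z)

model = Tiny3DCNN()
volume = torch.randn(2, 1, 64, 64, 64)            # two dummy LDCT volumes
print(model(volume).shape)                        # torch.Size([2, 1])
```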
There were three key components in our new approach (Fig. 1).
First, we constructed a three-dimensional (3D) CNN model that
performs end-to-end analysis of whole-CT volumes, using LDCT
End-to-end lung cancer screening with
three-dimensional deep learning on low-dose
chest computed tomography
Diego Ardila 1,5
, Atilla P. Kiraly1,5
, Sujeeth Bharadwaj1,5
, Bokyung Choi1,5
, Joshua J. Reicher2
,
Lily Peng1
, Daniel Tse 1
*, Mozziyar Etemadi 3
, Wenxing Ye1
, Greg Corrado1
, David P. Naidich4
and Shravya Shetty1
Corrected: Author Correction
NATURE MEDICINE | VOL 25 | JUNE 2019 | 954–961 | www.nature.com/naturemedicine954
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the
stage for the clinical use of digital images in anatomic pathology.
Along with advances in computer image analysis, this raises the
possibility for computer-assisted diagnostics in pathology to improve
histopathologic interpretation and clinical care. To evaluate the
potential impact of digital assistance on interpretation of digitized
slides, we conducted a multireader multicase study utilizing our deep
learning algorithm for the detection of breast cancer metastasis in
lymph nodes. Six pathologists reviewed 70 digitized slides from lymph
node sections in 2 reader modes, unassisted and assisted, with a wash-
out period between sessions. In the assisted mode, the deep learning
algorithm was used to identify and outline regions with high like-
lihood of containing tumor. Algorithm-assisted pathologists demon-
strated higher accuracy than either the algorithm or the pathologist
alone. In particular, algorithm assistance significantly increased the
sensitivity of detection for micrometastases (91% vs. 83%, P=0.02).
In addition, average review time per image was significantly shorter
with assistance than without assistance for both micrometastases (61
vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding
the difficulty of each image classification. On the basis of this score,
pathologists considered the image review of micrometastases to be
significantly easier when interpreted with assistance (P=0.0005).
Utilizing a proof of concept assistant tool, this study demonstrates the
potential of a deep learning algorithm to improve pathologist accu-
racy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
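The abstract's comparisons are paired by image: the same slides are read with and without assistance. A hedged sketch of that style of analysis on toy numbers; the study's exact statistical tests are not reproduced here:

```python
# Illustrative paired comparison (toy data, not the study's measurements).
from scipy.stats import wilcoxon

def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)

# Contrived counts giving the reported 91% vs. 83% sensitivities.
print(sensitivity(91, 9), sensitivity(83, 17))

# Paired per-image review times (seconds), assisted vs. unassisted,
# for the same six images; a paired nonparametric test on the differences.
assisted = [58, 66, 70, 55, 61, 64]
unassisted = [110, 120, 98, 125, 119, 102]
stat, p = wilcoxon(assisted, unassisted)
print(f"Wilcoxon statistic={stat}, p={p:.3f}")
```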
The regulatory approval and gradual implementation of
whole-slide scanners has enabled the digitization of glass
slides for remote consults and archival purposes.1 Digitiza-
tion alone, however, does not necessarily improve the con-
sistency or efficiency of a pathologist’s primary workflow. In
fact, image review on a digital medium can be slightly
slower than on glass, especially for pathologists with limited
digital pathology experience.2 However, digital pathology
and image analysis tools have already demonstrated po-
tential benefits, including the potential to reduce inter-reader
variability in the evaluation of breast cancer HER2 status.3,4
Digitization also opens the door for assistive tools based on
Artificial Intelligence (AI) to improve efficiency and con-
sistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demon-
strated strong performance in many automated image-rec-
ognition applications.6–8 Recently, several deep learning–
based algorithms have been developed for the detection of
breast cancer metastases in lymph nodes as well as for other
applications in pathology.9,10 Initial findings suggest that
some algorithms can even exceed a pathologist’s sensitivity
for detecting individual cancer foci in digital images. How-
ever, this sensitivity gain comes at the cost of increased false
positives, potentially limiting the utility of such algorithms for
automated clinical use.11 In addition, deep learning algo-
rithms are inherently limited to the task for which they have
been specifically trained. While we have begun to understand
the strengths of these algorithms (such as exhaustive search)
and their weaknesses (sensitivity to poor optical focus, tumor
mimics; manuscript under review), the potential clinical util-
ity of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid
pathologists or improve clinical interpretation, these benefits
may be achieved through thoughtful and appropriate in-
tegration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain
View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship
(D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T.,
J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have
Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare,
1600 Amphitheatre Way, Mountain View, CA 94043
(e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF
versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health,
Inc. This is an open-access article distributed under the terms of the
Creative Commons Attribution-Non Commercial-No Derivatives
License 4.0 (CCBY-NC-ND), where it is permissible to download and
share the work provided it is properly cited. The work cannot be
changed in any way or used commercially without permission from
the journal.
ORIGINAL ARTICLE
Am J Surg Pathol • Volume 00, Number 00, 2018 www.ajsp.com | 1
Lung nodule 

on Chest X-ray
Lung cancer

on Chest CT
Lung Cancer

on Pathology
Breast Cancer

on Pathology
Polyps & Adenoma

on Colonoscopy
Improving the accuracy
ARTICLES
https://doi.org/10.1038/s41591-018-0177-5
1
Applied Bioinformatics Laboratories, New York University School of Medicine, New York, NY, USA. 2
Skirball Institute, Department of Cell Biology,
New York University School of Medicine, New York, NY, USA. 3
Department of Pathology, New York University School of Medicine, New York, NY, USA.
4
School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece. 5
Institute for Systems Genetics, New York University School
of Medicine, New York, NY, USA. 6
Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY,
USA. 7
Center for Biospecimen Research and Development, New York University, New York, NY, USA. 8
Department of Population Health and the Center for
Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. 9
These authors contributed equally to this work:
Nicolas Coudray, Paolo Santiago Ocampo. *e-mail: narges.razavian@nyumc.org; aristotelis.tsirigos@nyumc.org
According to the American Cancer Society and the Cancer
Statistics Center (see URLs), over 150,000 patients with lung
cancer succumb to the disease each year (154,050 expected
for 2018), while another 200,000 new cases are diagnosed on a
yearly basis (234,030 expected for 2018). It is one of the most widely
spread cancers in the world because of not only smoking, but also
exposure to toxic chemicals like radon, asbestos and arsenic. LUAD
and LUSC are the two most prevalent types of non–small cell lung
cancer1
, and each is associated with discrete treatment guidelines. In
the absence of definitive histologic features, this important distinc-
tion can be challenging and time-consuming, and requires confir-
matory immunohistochemical stains.
Classification of lung cancer type is a key diagnostic process
because the available treatment options, including conventional
chemotherapy and, more recently, targeted therapies, differ for
LUAD and LUSC2
. Also, a LUAD diagnosis will prompt the search
for molecular biomarkers and sensitizing mutations and thus has
a great impact on treatment options3,4
. For example, epidermal
growth factor receptor (EGFR) mutations, present in about 20% of
LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK)
rearrangements, present in <5% of LUAD5
, currently have tar-
geted therapies approved by the Food and Drug Administration
(FDA)6,7
. Mutations in other genes, such as KRAS and tumor pro-
tein P53 (TP53) are very common (about 25% and 50%, respec-
tively) but have proven to be particularly challenging drug targets
so far5,8
. Lung biopsies are typically used to diagnose lung cancer
type and stage. Virtual microscopy of stained images of tissues is
typically acquired at magnifications of 20× to 40×, generating very
large two-dimensional images (10,000 to >100,000 pixels in each
dimension) that are oftentimes challenging to visually inspect in
an exhaustive manner. Furthermore, accurate interpretation can be
difficult, and the distinction between LUAD and LUSC is not always
clear, particularly in poorly differentiated tumors; in this case, ancil-
lary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has been recently studied to predict survival outcomes11 and classification12. For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941 (ref. 15).
Here, we demonstrate how the field can further benefit from deep
learning by presenting a strategy based on convolutional neural
networks (CNNs) that not only outperforms methods in previously
Classification and mutation prediction from
non–small cell lung cancer histopathology
images using deep learning
Nicolas Coudray 1,2,9
, Paolo Santiago Ocampo3,9
, Theodore Sakellaropoulos4
, Navneet Narula3
,
Matija Snuderl3
, David Fenyö5,6
, Andre L. Moreira3,7
, Narges Razavian 8
* and Aristotelis Tsirigos 1,3
*
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and sub-
type of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung
cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep con-
volutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and
automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of
pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen
tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most
commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be pre-
dicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest
that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be
applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
NATURE MEDICINE | www.nature.com/naturemedicine
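The abstract names the backbone (inception v3) and a tile-based pipeline with slide-level aggregation. A minimal sketch of tile classification with mean aggregation, assuming torchvision (0.13 or later) in place of the paper's own pipeline; the aggregation rule here is illustrative:

```python
# Sketch of tile-based slide classification (illustrative; torchvision
# stands in for the paper's pipeline, and weights here are untrained).
import torch
import torch.nn as nn
from torchvision.models import inception_v3

model = inception_v3(weights=None, init_weights=True)
model.fc = nn.Linear(model.fc.in_features, 3)   # LUAD, LUSC, normal
model.eval()

tiles = torch.randn(8, 3, 299, 299)             # 8 dummy 299x299 slide tiles
with torch.no_grad():
    probs = torch.softmax(model(tiles), dim=1)  # per-tile class probabilities
slide_pred = probs.mean(dim=0)                  # aggregate tiles to one slide
print(slide_pred)                                # three class probabilities
```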
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang,  1
Tyler M Berzin,  2
Jeremy Romek Glissen Brown,  2
Shishira Bharadwaj,2
Aymeric Becq,2
Xun Xiao,1
Peixi Liu,1
Liangping Li,1
Yan Song,1
Di Zhang,1
Yi Li,1
Guangre Xu,1
Mengtian Tu,1
Xiaogang Liu  1
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1
Department of
Gastroenterology, Sichuan
Academy of Medical Sciences
& Sichuan Provincial People’s
Hospital, Chengdu, China
2
Center for Advanced
Endoscopy, Beth Israel
Deaconess Medical Center and
Harvard Medical School, Boston,
Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221;
Results.
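The headline ADR figures are easy to recompute from the reported arm sizes. A small worked example, using only numbers from the abstract:

```python
# Recomputing the headline ADR comparison from the reported figures
# (illustrative arithmetic only, using the abstract's numbers).
standard_n, cad_n = 536, 522          # patients randomised per arm
adr_standard, adr_cad = 0.203, 0.291  # reported adenoma detection rates

print("patients with >=1 adenoma (CAD vs standard):",
      round(cad_n * adr_cad), round(standard_n * adr_standard))

rel = (adr_cad - adr_standard) / adr_standard
print(f"relative ADR increase: {rel:.0%}")
# ~43%, i.e. roughly the "increase of ADR by 50%, from 20% to 30%"
# highlighted in the significance box.
```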
INTRODUCTION
Colorectal cancer (CRC) is the second and third-
leading causes of cancer-related deaths in men and
women respectively.1
Colonoscopy is the gold stan-
dard for screening CRC.2 3
Screening colonoscopy
has allowed for a reduction in the incidence and
mortality of CRC via the detection and removal
of adenomatous polyps.4–8
Additionally, there is
evidence that with each 1.0% increase in adenoma
detection rate (ADR), there is an associated 3.0%
decrease in the risk of interval CRC.9 10
However,
polyps can be missed, with reported miss rates of
up to 27% due to both polyp and operator charac-
teristics.11 12
Unrecognised polyps within the visual field are an important problem to address.11
Several studies
have shown that assistance by a second observer
increases the polyp detection rate (PDR), but such a
strategy remains controversial in terms of increasing
the ADR.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way.
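Such a system amounts to a frame-by-frame detection loop that raises a visual notice and a sound alarm. A skeleton of that loop, assuming OpenCV for capture and display; detect_polyps is a hypothetical placeholder, not the authors' network:

```python
# Skeleton of a real-time polyp alerting loop (illustrative; detect_polyps
# is a hypothetical placeholder for the trained detection network).
import cv2  # OpenCV for video capture and drawing

def detect_polyps(frame):
    """Hypothetical detector: would return a list of (x, y, w, h) boxes."""
    return []  # placeholder; a real system runs a CNN here

cap = cv2.VideoCapture(0)                        # endoscope video feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for (x, y, w, h) in detect_polyps(frame):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        print("\a", end="")                      # terminal bell as the alarm
    cv2.imshow("colonoscopy", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```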
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised
controlled trial examining an automatic polyp
detection during colonoscopy and shows an
increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
ORIGINAL RESEARCH • THORACIC IMAGING
Chest radiography, one of the most common diagnos-
tic imaging tests in medicine, is used for screening,
diagnostic work-ups, and monitoring of various thoracic
diseases (1,2). One of its major objectives is detection of
pulmonary nodules because pulmonary nodules are often
the initial radiologic manifestation of lung cancers (1,2).
However, to date, pulmonary nodule detection on chest
radiographs has not been completely satisfactory, with a
reported sensitivity ranging between 36%–84%, varying
widely according to the tumor size and study population
(2–6). Indeed, chest radiography has been shown to be
prone to many reading errors with low interobserver and
intraobserver agreements because of its limited spatial reso-
lution, noise from overlapping anatomic structures, and
the variable perceptual ability of radiologists. Recent work
shows that 19%–26% of lung cancers visible on chest ra-
diographs were in fact missed at their first readings (6,7).
Of course, hindsight is always perfect when one knows
where to look.
For this reason, there has been increasing dependency
on chest CT images over chest radiographs in pulmonary
nodule detection. However, even low-dose CT scans re-
quire approximately 50–100 times higher radiation dose
than single-view chest radiographic examinations (8,9)
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD,
PhD • KunYoung Lim, MD, PhD • Thienkai HuyVu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin
Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul
03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital,
Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of
Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco,
San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul,
Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P.
(e-mail: cmpark.morphius@gmail.com).
Kazimierz O.
Wrzeszczynski, PhD*
Mayu O. Frank, NP,
MS*
Takahiko Koyama, PhD*
Kahn Rhrissorrakrai, PhD*
Nicolas Robine, PhD
Filippo Utro, PhD
Anne-Katrin Emde, PhD
Bo-Juen Chen, PhD
Kanika Arora, MS
Minita Shah, MS
Vladimir Vacic, PhD
Raquel Norel, PhD
Erhan Bilal, PhD
Ewa A. Bergmann, MSc
Julia L. Moore Vogel,
PhD
Jeffrey N. Bruce, MD
Andrew B. Lassman, MD
Peter Canoll, MD, PhD
Christian Grommes, MD
Steve Harvey, BS
Laxmi Parida, PhD
Vanessa V. Michelini, BS
Michael C. Zody, PhD
Vaidehi Jobanputra, PhD
Ajay K. Royyuru, PhD
Robert B. Darnell, MD,
PhD
Correspondence to Dr. Darnell:
darnelr@rockefeller.edu
Supplemental data
at Neurology.org/ng
Comparing sequencing assays and
human-machine analyses in actionable
genomics for glioblastoma
ABSTRACT
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare
potentially actionable calls from each.
Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal
DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA
sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians
and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated
system for prioritizing somatic variants and identifying drugs.
Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA com-
pleted a comparable analysis in a fraction of the time required by the human analysts.
Conclusions: The development of an effective human-machine interface in the analysis of deep
cancer genomic datasets may provide potentially clinically actionable calls for individual pa-
tients in a more timely and efficient manner than currently possible.
ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164; doi: 10.1212/
NXG.0000000000000164
GLOSSARY
CNV = copy number variant; EGFR = epidermal growth factor receptor; GATK = Genome Analysis Toolkit; GBM = glioblastoma; IRB = institutional review board; NLP = Natural Language Processing; NYGC = New York Genome Center; RNA-seq = RNA sequencing; SNV = single nucleotide variant; SV = structural variant; TCGA = The Cancer Genome Atlas; TPM = transcripts per million; VCF = variant call file; VUS = variants of uncertain significance; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
The clinical application of next-generation sequencing technology to cancer diagnosis and treat-
ment is in its early stages.1–3
An initial implementation of this technology has been in targeted
panels, where subsets of cancer-relevant and/or highly actionable genes are scrutinized for
potentially actionable mutations. This approach has been widely adopted, offering high redun-
dancy of sequence coverage for the small number of sites of known clinical utility at relatively
low cost.
However, recent studies have shown that many more potentially clinically actionable muta-
tions exist both in known cancer genes and in other genes not yet identified as cancer drivers.4,5
Improvements in the efficiency of next-generation sequencing make it possible to consider
whole-genome sequencing (WGS) as well as other omic assays such as RNA sequencing
(RNA-seq) as clinical assays, but uncertainties remain about how much additional useful infor-
mation is available from these assays.
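The study's central comparison is between variant call sets produced by different assays. A toy sketch of that overlap logic; the variant identifiers are illustrative, not the study's calls:

```python
# Illustrative comparison of variant calls from two assays (toy identifiers).
# The overlap logic behind "more variants identified by WGS/RNA analysis
# than by targeted panels": set difference over called variants.
panel_calls = {"EGFR:c.2573T>G", "TP53:c.743G>A"}
wgs_calls = {"EGFR:c.2573T>G", "TP53:c.743G>A", "STK11:c.1062C>G",
             "CDKN2A:del"}

print("Shared:", panel_calls & wgs_calls)
print("WGS-only:", wgs_calls - panel_calls)      # calls the panel cannot see
```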
*These authors contributed equally to the manuscript.
From the New York Genome Center (K.O.W., M.O.F., N.R., A.-K.E., B.-J.C., K.A., M.S., V.V., E.A.B., J.L.M.V., M.C.Z., V.J., R.B.D.); IBM
Thomas J. Watson Research Center (T.K., K.R., F.U., R.N., E.B., L.P., A.K.R.); Columbia University Medical Center (J.N.B., A.B.L., P.C., V.J.);
Memorial Sloan-Kettering Cancer Center (C.G.), New York, NY; IBM Watson Health (S.H., V.V.M.), Boca Raton, FL; Laboratory of Molecular
Neuro-Oncology (M.O.F., R.B.D.), and Howard Hughes Medical Institute (R.B.D.), The Rockefeller University, New York, NY. B.-J.C. is
currently affiliated with Google, New York, NY. V.V. is currently affiliated with 23andMe, Inc., Mountain View, CA. E.A.B. is currently affiliated
with Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany.
Funding information and disclosures are provided at the end of the article. Go to Neurology.org/ng for full disclosure forms. The Article Processing
Charge was funded by the authors.
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC
BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used
commercially without permission from the journal.
Neurology.org/ng Copyright © 2017 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology. 1
Enhancing Next-Generation Sequencing-Guided Cancer Care Through
Cognitive Computing
NIRALI M. PATEL,a,b,†
VANESSA V. MICHELINI,f,†
JEFF M. SNELL,a,c
SAIANAND BALU,a
ALAN P. HOYLE,a
JOEL S. PARKER,a,c
MICHELE C. HAYWARD,a
DAVID A. EBERHARD,a,b
ASHLEY H. SALAZAR,a
PATRICK MCNEILLIE,g
JIA XU,g
CLAUDIA S. HUETTNER,g
TAKAHIKO KOYAMA,h
FILIPPO UTRO,h
KAHN RHRISSORRAKRAI,h
RAQUEL NOREL,h
ERHAN BILAL,h
AJAY ROYYURU,h
LAXMI PARIDA,h
H. SHELTON EARP,a,d
JUNEKO E. GRILLEY-OLSON,a,d
D. NEIL HAYES,a,d
STEPHEN J. HARVEY,i
NORMAN E. SHARPLESS,a,c,d
WILLIAM Y. KIM
a,c,d,e
a
Lineberger Comprehensive Cancer Center, b
Department of Pathology and Laboratory Medicine, c
Department of Genetics, d
Department of
Medicine, and e
Department of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; f
IBM Watson Health,
Boca Raton, Florida, USA; g
IBM Watson Health, Cambridge, Massachusetts, USA; h
IBM Research, Yorktown Heights, New York, USA; i
IBM
Watson Health, Herndon, Virginia, USA
†
Contributed equally
Disclosures of potential conflicts of interest may be found at the end of this article.
Key Words. Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine
ABSTRACT
Background. Using next-generation sequencing (NGS) to guide
cancer therapy has created challenges in analyzing and report-
ing large volumes of genomic data to patients and caregivers.
Specifically, providing current, accurate information on newly
approved therapies and open clinical trials requires consider-
able manual curation performed mainly by human “molecular
tumor boards” (MTBs). The purpose of this study was to deter-
mine the utility of cognitive computing as performed by Wat-
son for Genomics (WfG) compared with a human MTB.
Materials and Methods. One thousand eighteen patient cases
that previously underwent targeted exon sequencing at the
University of North Carolina (UNC) and subsequent analysis
by the UNCseq informatics pipeline and the UNC MTB
between November 7, 2011, and May 12, 2015, were
analyzed with WfG, a cognitive computing technology for
genomic analysis.
Results. Using a WfG-curated actionable gene list, we identified
additional genomic events of potential significance (not discov-
ered by traditional MTB curation) in 323 (32%) patients. The
majority of these additional genomic events were considered
actionable based upon their ability to qualify patients for
biomarker-selected clinical trials. Indeed, the opening of a rele-
vant clinical trial within 1 month prior to WfG analysis provided
the rationale for identification of a new actionable event in
nearly a quarter of the 323 patients. This automated analysis
took 3 minutes per case.
Conclusion. These results demonstrate that the interpretation
and actionability of somatic NGS results are evolving too rapidly
to rely solely on human curation. Molecular tumor boards
empowered by cognitive computing could potentially improve
patient care by providing a rapid, comprehensive approach for
data analysis and consideration of up-to-date availability of
clinical trials. The Oncologist 2017;22:1–7
Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation
sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive
computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis
in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the
support of such tools applied to genomic data.
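Part of what is being automated here is the matching of observed somatic events against a curated, continuously updated actionable-gene list. A toy sketch of that matching step; the genes and drugs are illustrative stand-ins, not the WfG knowledge base:

```python
# Toy screening of a patient's somatic variants against a curated
# actionable-gene list (illustrative pairs; not the WfG knowledge base).
actionable = {"EGFR": "erlotinib", "BRAF": "vemurafenib", "ALK": "crizotinib"}
patient_variants = ["TP53", "EGFR", "KEAP1"]

hits = {g: actionable[g] for g in patient_variants if g in actionable}
print(hits)                                      # {'EGFR': 'erlotinib'}
```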
INTRODUCTION
Next-generation sequencing (NGS) has emerged as an afford-
able and reproducible means to query patients’ tumors for
somatic genetic anomalies [1, 2].The optimal utilization of NGS
is fundamental to the promise of “precision medicine,” yet the
results of even targeted-capture NGS are highly complex,
returning a variety of somatic events in hundreds of analyzed
genes. The majority of such events have no known relevance
to the treatment of patients with cancer, and even for
Correspondence: William Y. Kim, M.D., or Norman E. Sharpless, M.D., Lineberger Comprehensive Cancer Center, University of North Carolina, CB#
7295, Chapel Hill, North Carolina 27599-7295, USA. Telephone: 919-966-4765; e-mail: william_kim@med.unc.edu or nes@med.unc.edu. Received
April 18, 2017; accepted for publication October 6, 2017. http://dx.doi.org/10.1634/theoncologist.2017-0170
The Oncologist 2017;22:1–7 www.TheOncologist.com © AlphaMed Press 2017
Cancer Diagnostics and Molecular Pathology
Published Ahead of Print on November 20, 2017 as 10.1634/theoncologist.2017-0170.
AJR:209, December 2017 1
Since 1992, concerns regarding interob-
server variability in manual bone age esti-
mation [4] have led to the establishment of
several automatic computerized methods for
bone age estimation, including computer-as-
sisted skeletal age scores, computer-aided
skeletal maturation assessment systems, and
BoneXpert (Visiana) [5–14]. BoneXpert was
developed according to traditional machine-
learning techniques and has been shown to
have a good performance for patients of var-
ious ethnicities and in various clinical set-
tings [10–14]. The deep-learning technique
is an improvement in artificial neural net-
works. Unlike traditional machine-learning
techniques, deep-learning techniques allow
an algorithm to program itself by learning
from the images given a large dataset of la-
beled examples, thus removing the need to
specify rules [15].
Deep-learning techniques permit higher
levels of abstraction and improved predic-
tions from data. Deep-learning techniques
Computerized Bone Age
Estimation Using Deep Learning–
Based Program: Evaluation of the
Accuracy and Efficiency
Jeong Rye Kim1
Woo Hyun Shim1
Hee Mang Yoon1
Sang Hyup Hong1
Jin Seong Lee1
Young Ah Cho1
Sangki Kim2
Kim JR, Shim WH, Yoon HM, et al.
1
Department of Radiology and Research Institute of
Radiology, Asan Medical Center, University of Ulsan
College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu,
Seoul 05505, South Korea. Address correspondence to
H. M. Yoon (espoirhm@gmail.com).
2
Vuno Research Center, Vuno Inc., Seoul, South Korea.
Pediatric Imaging • Original Research
Supplemental Data
Available online at www.ajronline.org.
AJR 2017; 209:1–7
0361–803X/17/2096–1
© American Roentgen Ray Society
Bone age estimation is crucial for developmental status determinations and ultimate height predictions in the pediatric population, particularly for patients with growth disorders and endocrine abnormalities [1]. Two
major left-hand wrist radiograph-based
methods for bone age estimation are current-
ly used: the Greulich-Pyle [2] and Tanner-
Whitehouse [3] methods. The former is much
more frequently used in clinical practice.
Greulich-Pyle–based bone age estimation is
performed by comparing a patient’s left-hand
radiograph to standard radiographs in the
Greulich-Pyle atlas and is therefore simple
and easily applied in clinical practice. How-
ever, the process of bone age estimation,
which comprises a simple comparison of
multiple images, can be repetitive and time
consuming and is thus sometimes burden-
some to radiologists. Moreover, the accuracy
depends on the radiologist’s experience and
tends to be subjective.
Keywords: bone age, children, deep learning, neural
network model
DOI:10.2214/AJR.17.18224
J. R. Kim and W. H. Shim contributed equally to this work.
Received March 12, 2017; accepted after revision
July 7, 2017.
S. Kim is employed by Vuno, Inc., which created the deep
learning–based automatic software system for bone
age determination. J. R. Kim, W. H. Shim, H. M. Yoon,
S. H. Hong, J. S. Lee, and Y. A. Cho are employed by
Asan Medical Center, which holds patent rights for the
deep learning–based automatic software system for
bone age assessment.
OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a
new automatic software system for bone age assessment and to validate its feasibility in clini-
cal practice.
MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning tech-
nique was used to develop the automatic software system for bone age determination. Using
this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years
old) using first-rank bone age (software only), computer-assisted bone age (two radiologists
with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with
Greulich-Pyle atlas assistance only). The reference bone age was determined by the consen-
sus of two experienced radiologists.
RESULTS. First-rank bone ages determined by the automatic software system showed a
69.5% concordance rate and significant correlations with the reference bone age (r = 0.992;
p < 0.001). Concordance rates increased with the use of the automatic software system for
both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-as-
sisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for
computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers
1 and 2, respectively.
CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising diagnostic accuracy.
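The two headline metrics above are simple to reproduce on toy numbers; the sketch below (synthetic values, not study data) computes an exact-match concordance rate and the Pearson correlation r between software estimates and a reference standard.

```python
import numpy as np

reference = np.array([10.0, 7.5, 13.0, 9.0, 15.5])  # consensus bone ages (years)
estimated = np.array([10.0, 8.0, 13.0, 9.0, 15.0])  # software first-rank ages

concordance = np.mean(estimated == reference)        # fraction of exact matches
r = np.corrcoef(estimated, reference)[0, 1]          # Pearson correlation
print(f"concordance = {concordance:.1%}, r = {r:.3f}")
```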
TECHNICAL REPORT
https://doi.org/10.1038/s41591-019-0539-7
1 Google Health, Mountain View, CA, USA. 2 Present address: AstraZeneca, Gaithersburg, MD, USA. 3 Present address: Tempus Labs Inc., Chicago, IL, USA. 4 These authors contributed equally: Po-Hsuan Cameron Chen, Krishna Gadepalli, Robert MacDonald. *e-mail: cmermel@google.com
Microscopic examination of samples is the gold standard for the diagnosis of cancer, autoimmune diseases, infectious diseases and more. In cancer, the microscopic examination of stained tissue sections is critical for diagnosing and staging the patient's tumor, which informs treatment decisions and prognosis. In cancer, microscopy analysis faces three major challenges. As a form of image interpretation, these examinations are inherently subjective, exhibiting considerable inter- and intra-observer variability1,2. Moreover, clinical guidelines3 and studies4 have begun to require quantitative assessments as part of the effort towards better patient risk stratification3. For example, breast cancer staging requires the counting of mitotic cells, and quantification of the tumor burden in lymph nodes by measuring the largest tumor focus. However, despite being helpful in treatment planning, quantification is laborious and error-prone. Lastly, access to disease experts can be limited in both developed and developing countries5, exacerbating the problem.
As a potential solution, recent advances in AI, specifically deep learning6, have demonstrated automated medical image analysis with performance comparable to that of human experts1,7–10. Research has also shown the potential to improve diagnostic accuracy, quantitation and efficiency by applying deep learning algorithms to digitized whole-slide pathology images for cancer classification and detection8,10,11. However, the integration of these advances into cancer diagnosis is not straightforward because of two primary challenges: image digitization and the technical skills required to utilize deep learning algorithms. First, most microscopic examinations are performed using analog microscopes, and a digitized workflow requires significant infrastructure investments. Second, because of differences in hardware, firmware and software, the use of AI algorithms developed by others is challenging even for experts. As such, actual utilization of AI in microscopy frequently remains inaccessible.
Here, we propose a cost-effective solution to these barriers to
entry of AI in microscopic analysis: an augmented optical light
microscope that enables real-time integration of AI. We define
‘real-time integration’ as adding the capability of AI assistance
without slowing down specimen review or modifying the standard
workflow. We propose to superimpose the predictions of the AI
algorithm on the view of the sample seen by the user through the
eyepiece. Because augmenting additional information over the orig-
inal view is termed augmented reality, we term this microscope the
augmented reality microscope. Although we apply this technology
to cancer diagnosis in this paper, the ARM is application-agnostic
and can be utilized in other microscopy applications.
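A minimal sketch of the overlay idea follows, assuming a hypothetical per-frame loop: capture the field of view, run a model that returns region-level tumor probabilities, and draw outlines back over the image. `capture_fov`, `run_model` and `display_overlay` are invented stand-ins, not the ARM's actual interfaces.

```python
import numpy as np
import cv2  # OpenCV, used here only for resizing and contour drawing

def overlay_predictions(frame: np.ndarray, heatmap: np.ndarray,
                        threshold: float = 0.5) -> np.ndarray:
    """Outline regions whose predicted tumor probability exceeds threshold."""
    mask = (heatmap >= threshold).astype(np.uint8)
    mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]),
                      interpolation=cv2.INTER_NEAREST)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    out = frame.copy()
    cv2.drawContours(out, contours, -1, color=(0, 255, 0), thickness=2)
    return out

# Hypothetical real-time loop; each iteration must finish before the next
# frame arrives to avoid the unnatural latency discussed below.
# while True:
#     frame = capture_fov()          # image of the current field of view
#     heatmap = run_model(frame)     # per-region tumor probabilities
#     display_overlay(overlay_predictions(frame, heatmap))
```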
Aligned with ARM’s function to serve as a viable platform for
AI assistance in microscopy applications, the ARM system satisfies
three major design requirements: spatial registration of the aug-
mented information, system response time and robustness of the
deep learning algorithms. First, AI predictions such as tumor or
cell locations need to be precisely aligned with the specimen in the
observer’s field of view (FOV) to retain the correct spatial context.
Importantly, this alignment must be insensitive to small changes
in the user’s eye position relative to the eyepiece (parallax-free) to
account for user movements. Second, although the latest deep learn-
ing algorithms often require billions of mathematical operations12
,
these algorithms have to be applied in real time to avoid unnatural
latency in the workflow. This is especially critical in applications
such as cancer diagnosis, where the pathologist is constantly and
rapidly panning around the slide. Finally, many deep learning algo-
rithms for microscope images were developed using other digitiza-
tion methods, such as whole-slide scanners in histopathology8,10,11
.
We demonstrate that two deep learning algorithms…
An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis
Lung Cancer on Chest CT
Lung Cancer on Pathology
Breast Cancer on Pathology
Polyps & Adenomas on Colonoscopy
Analysis of Cancer Genome
Analysis of Cancer Genome
Assessment of Bone Age
Breast Cancer on Pathology
Breast Cancer on Pathology
Improving the accuracy
Improving the efficiency
ARTICLES
https://doi.org/10.1038/s41591-018-0177-5
1 Applied Bioinformatics Laboratories, New York University School of Medicine, New York, NY, USA. 2 Skirball Institute, Department of Cell Biology, New York University School of Medicine, New York, NY, USA. 3 Department of Pathology, New York University School of Medicine, New York, NY, USA. 4 School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece. 5 Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA. 6 Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA. 7 Center for Biospecimen Research and Development, New York University, New York, NY, USA. 8 Department of Population Health and the Center for Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. 9 These authors contributed equally to this work: Nicolas Coudray, Paolo Santiago Ocampo. *e-mail: narges.razavian@nyumc.org; aristotelis.tsirigos@nyumc.org
According to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer1, and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinction can be challenging and time-consuming, and requires confirmatory immunohistochemical stains.
Classification of lung cancer type is a key diagnostic process because the available treatment options, including conventional chemotherapy and, more recently, targeted therapies, differ for LUAD and LUSC2. Also, a LUAD diagnosis will prompt the search for molecular biomarkers and sensitizing mutations and thus has a great impact on treatment options3,4. For example, epidermal growth factor receptor (EGFR) mutations, present in about 20% of LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK) rearrangements, present in <5% of LUAD5, currently have targeted therapies approved by the Food and Drug Administration (FDA)6,7. Mutations in other genes, such as KRAS and tumor protein P53 (TP53), are very common (about 25% and 50%, respectively) but have proven to be particularly challenging drug targets so far5,8. Lung biopsies are typically used to diagnose lung cancer type and stage. Virtual microscopy of stained images of tissues is typically acquired at magnifications of 20× to 40×, generating very large two-dimensional images (10,000 to >100,000 pixels in each dimension) that are oftentimes challenging to visually inspect in an exhaustive manner. Furthermore, accurate interpretation can be difficult, and the distinction between LUAD and LUSC is not always clear, particularly in poorly differentiated tumors; in this case, ancillary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has recently been studied to predict survival outcomes11 and classification12. For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941 (ref. 15).
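For orientation, the kind of conventional pipeline cited above (hand-crafted slide features fed to a classical classifier, scored by AUC) can be sketched in a few lines; the features, labels and data below are synthetic placeholders, not those of Yu et al.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))      # 16 hand-crafted features per slide (toy)
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # tumor vs normal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(probability=True).fit(X_tr, y_tr)      # SVM classifier, as in ref. 12
print(f"AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
```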
Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously…
Classification and mutation prediction from
non–small cell lung cancer histopathology
images using deep learning
Nicolas Coudray,1,2,9 Paolo Santiago Ocampo,3,9 Theodore Sakellaropoulos,4 Navneet Narula,3 Matija Snuderl,3 David Fenyö,5,6 Andre L. Moreira,3,7 Narges Razavian8* and Aristotelis Tsirigos1,3*
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and sub-
type of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung
cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep con-
volutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and
automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of
pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen
tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most
commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be pre-
dicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest
that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be
applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
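Whole-slide images are far too large for a single CNN pass, so work in this line typically classifies fixed-size tiles and aggregates them into a slide-level call. The sketch below shows one common aggregation, averaging per-tile class probabilities; the tile classifier is a dummy placeholder, not the trained inception v3 model.

```python
import numpy as np

def classify_slide(tiles: np.ndarray, predict_tile) -> np.ndarray:
    """tiles: (n_tiles, H, W, 3); predict_tile returns (n_classes,) probabilities."""
    probs = np.stack([predict_tile(t) for t in tiles])  # (n_tiles, n_classes)
    return probs.mean(axis=0)                           # slide-level probabilities

# Toy 3-class example (normal / LUAD / LUSC) with a dummy tile model:
dummy_predict = lambda tile: np.array([0.1, 0.7, 0.2])
slide_probs = classify_slide(np.zeros((8, 224, 224, 3)), dummy_predict)
print(slide_probs.argmax())  # index of the predicted slide-level class
```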
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang,1 Tyler M Berzin,2 Jeremy Romek Glissen Brown,2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu1
To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/gutjnl-2018-317500
► Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/gutjnl-2018-317500).
1 Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
Correspondence to Xiaogang Liu, Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China; Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low-ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.
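A minimal sketch of the assistance concept in the Design above, a simultaneous visual notice and sound alarm on each detected polyp, might look as follows; the detector and the alarm call are hypothetical stand-ins, not the trial's actual system.

```python
import numpy as np
import cv2

def assist_frame(frame: np.ndarray, detect_polyps) -> np.ndarray:
    """Draw one box per detection; trigger an alarm if anything was found."""
    detections = detect_polyps(frame)          # list of (x, y, w, h) boxes
    for (x, y, w, h) in detections:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    if detections:
        print("\a", end="")                    # placeholder for the sound alarm
    return frame

# Toy usage with a dummy detector that always reports one box:
out = assist_frame(np.zeros((480, 640, 3), np.uint8),
                   lambda f: [(100, 80, 60, 60)])
```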
INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12
Unrecognised polyps within the visual field are an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way…
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
ORIGINAL RESEARCH • THORACIC IMAGING
Chest radiography, one of the most common diagnostic imaging tests in medicine, is used for screening, diagnostic work-ups, and monitoring of various thoracic diseases (1,2). One of its major objectives is detection of pulmonary nodules because pulmonary nodules are often the initial radiologic manifestation of lung cancers (1,2). However, to date, pulmonary nodule detection on chest radiographs has not been completely satisfactory, with a reported sensitivity ranging between 36% and 84%, varying widely according to the tumor size and study population (2–6). Indeed, chest radiography has been shown to be prone to many reading errors, with low interobserver and intraobserver agreement, because of its limited spatial resolution, noise from overlapping anatomic structures, and the variable perceptual ability of radiologists. Recent work shows that 19%–26% of lung cancers visible on chest radiographs were in fact missed at their first readings (6,7). Of course, hindsight is always perfect when one knows where to look.
For this reason, there has been increasing reliance on chest CT over chest radiographs for pulmonary nodule detection. However, even low-dose CT scans require an approximately 50–100 times higher radiation dose than single-view chest radiographic examinations (8,9)…
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P. (e-mail: cmpark.morphius@gmail.com).
Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002).
*J.G.N. and S.P. contributed equally to this work.
Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237 • Content codes:
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules
on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43292 chest radiographs (normal radiograph–
to–nodule radiograph ratio, 34067:9225) in 34676 patients (healthy-to-nodule ratio, 30784:3892; 19230 men [mean age, 52.8
years; age range, 18–99 years]; 15446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015,
which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph clas-
sification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three
South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection
performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife
alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance
test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation
data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were in the range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nod-
ule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when
used as a second reader.
©RSNA, 2018
Online supplemental material is available for this article.
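The radiograph-classification endpoint above is an AUROC over per-image nodule scores; the toy sketch below (synthetic scores, not study data; the lesion-localization JAFROC metric is not reproduced) also illustrates the "second reader" comparison, with the aided score modeled, purely for illustration, as a blend of reader and algorithm scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 400)                    # 1 = nodule radiograph (toy labels)
algo = np.clip(y * 0.8 + rng.normal(0.1, 0.2, 400), 0, 1)
reader = np.clip(y * 0.6 + rng.normal(0.2, 0.3, 400), 0, 1)
aided = 0.5 * reader + 0.5 * algo              # reader revising scores with DLAD

print(f"reader alone:  AUROC = {roc_auc_score(y, reader):.3f}")
print(f"reader + DLAD: AUROC = {roc_auc_score(y, aided):.3f}")
```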
LETTERS
https://doi.org/10.1038/s41591-019-0447-x
1 Google AI, Mountain View, CA, USA. 2 Stanford Health Care and Palo Alto Veterans Affairs, Palo Alto, CA, USA. 3 Northwestern Medicine, Chicago, IL, USA. 4 New York University-Langone Medical Center, Center for Biological Imaging, New York City, NY, USA. 5 These authors contributed equally: Diego Ardila, Atilla P. Kiraly, Sujeeth Bharadwaj, Bokyung Choi. *e-mail: tsed@google.com
With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States1. Lung cancer screening using low-dose computed tomography has been shown to reduce mortality by 20–43% and is now included in US screening guidelines1–6. Existing challenges include inter-grader variability and high false-positive and false-negative rates7–10. We propose a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Our model achieves a state-of-the-art performance (94.4% area under the curve) on 6,716 National Lung Cancer Screening Trial cases, and performs similarly on an independent clinical validation set of 1,139 cases. We conducted two reader studies. When prior computed tomography imaging was not available, our model outperformed all six radiologists with absolute reductions of 11% in false positives and 5% in false negatives. Where prior computed tomography imaging was available, the model performance was on-par with the same radiologists. This creates an opportunity to optimize the screening process via computer assistance and automation. While the vast majority of patients remain unscreened, we show the potential for deep learning models to increase the accuracy, consistency and adoption of lung cancer screening worldwide.
In 2013, the United States Preventive Services Task Force recommended low-dose computed tomography (LDCT) lung cancer screening in high-risk populations based on reported improved mortality in the National Lung Cancer Screening Trial (NLST)2–5. In 2014, the American College of Radiology published the Lung-RADS guidelines for LDCT lung cancer screening, to standardize image interpretation by radiologists and dictate management recommendations1,6. Evaluation is based on a variety of image findings, but primarily nodule size, density and growth6. At screening sites, Lung-RADS and other models such as PanCan are used to determine malignancy risk ratings that drive recommendations for clinical management11,12. Improving the sensitivity and specificity of lung cancer screening is imperative because of the high clinical and financial costs of missed diagnosis, late diagnosis and unnecessary biopsy procedures resulting from false negatives and false positives5,13–17. Despite improved consistency, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations7–10 of Lung-RADS. These limitations suggest opportunities for more sophisticated systems to improve performance and inter-reader consistency18,19. Deep learning approaches offer the exciting potential to automate more complex image analysis, detect subtle holistic imaging findings and unify methodologies for image evaluation20.
A variety of software devices have been approved by the Food and Drug Administration (FDA) with the goal of addressing workflow efficiency and performance through augmented detection of lung nodules on lung computed tomography (CT)21. Clinical research has primarily focused on either nodule detection or diagnostic support for lesions manually selected by imaging experts22–27. Nodule detection systems were engineered with the goal of improving radiologist sensitivity in identifying nodules while minimizing costs to specificity, thereby falling into the category of computer-aided detection (CADe)28. This approach highlights small nodules, leaving malignancy risk evaluation and clinical decision making to the clinician. Diagnostic support for pre-identified lesions is included in computer-aided diagnosis (CADx) platforms, which are primarily aimed at improving specificity. CADx has gained greater interest and even first regulatory approvals in other areas of radiology, though not in lung cancer at the time of manuscript preparation29.
To move beyond the limitations of prior CADe and CADx approaches, we aimed to build an end-to-end approach performing both localization and lung cancer risk categorization tasks using the input CT data alone. More specifically, we were interested in replicating a more complete part of a radiologist's workflow, including full assessment of the LDCT volume, focus on regions of concern, comparison to prior imaging when available and calibration against biopsy-confirmed outcomes.
Another important high-level decision in our approach was to learn features using deep convolutional neural networks (CNN), rather than using hand-engineered features such as texture features or specific Hounsfield unit values. We chose to learn features because this approach has repeatedly been shown superior to hand-engineered features in many open computer vision competitions in the past five years30,31, including the Kaggle 2017 Data Science Bowl, which used NLST data32.
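In code, "learned features on the whole volume" reduces to a 3D convolutional stack that maps a CT volume directly to a risk score, with no hand-engineered texture or Hounsfield-unit features; the sketch below is an illustrative stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy end-to-end risk model: CT volume in, malignancy probability out.
risk_model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, 1), nn.Sigmoid(),          # risk score in [0, 1]
)

volume = torch.randn(1, 1, 64, 128, 128)     # (batch, channel, depth, H, W)
print(risk_model(volume).item())             # e.g., 0.47 on random input
```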
There were three key components in our new approach (Fig. 1). First, we constructed a three-dimensional (3D) CNN model that performs end-to-end analysis of whole-CT volumes, using LDCT…
End-to-end lung cancer screening with
three-dimensional deep learning on low-dose
chest computed tomography
Diego Ardila,1,5 Atilla P. Kiraly,1,5 Sujeeth Bharadwaj,1,5 Bokyung Choi,1,5 Joshua J. Reicher,2 Lily Peng,1 Daniel Tse,1* Mozziyar Etemadi,3 Wenxing Ye,1 Greg Corrado,1 David P. Naidich4 and Shravya Shetty1
Corrected: Author Correction
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the
stage for the clinical use of digital images in anatomic pathology.
Along with advances in computer image analysis, this raises the
possibility for computer-assisted diagnostics in pathology to improve
histopathologic interpretation and clinical care. To evaluate the
potential impact of digital assistance on interpretation of digitized
slides, we conducted a multireader multicase study utilizing our deep
learning algorithm for the detection of breast cancer metastasis in
lymph nodes. Six pathologists reviewed 70 digitized slides from lymph
node sections in 2 reader modes, unassisted and assisted, with a wash-
out period between sessions. In the assisted mode, the deep learning
algorithm was used to identify and outline regions with high like-
lihood of containing tumor. Algorithm-assisted pathologists demon-
strated higher accuracy than either the algorithm or the pathologist
alone. In particular, algorithm assistance significantly increased the
sensitivity of detection for micrometastases (91% vs. 83%, P=0.02).
In addition, average review time per image was significantly shorter
with assistance than without assistance for both micrometastases (61
vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding
the difficulty of each image classification. On the basis of this score,
pathologists considered the image review of micrometastases to be
significantly easier when interpreted with assistance (P=0.0005).
Utilizing a proof of concept assistant tool, this study demonstrates the
potential of a deep learning algorithm to improve pathologist accu-
racy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
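The assisted mode described above amounts to thresholding a tumor-probability map and outlining the surviving regions for the pathologist; a minimal sketch with a synthetic heatmap follows, using SciPy connected components rather than any vendor tooling.

```python
import numpy as np
from scipy import ndimage

heatmap = np.zeros((100, 100))
heatmap[40:55, 60:80] = 0.9                   # synthetic high-likelihood focus

labels, n = ndimage.label(heatmap >= 0.5)     # connected suspicious regions
for ys, xs in ndimage.find_objects(labels):
    print(f"outline rows {ys.start}-{ys.stop}, cols {xs.start}-{xs.stop}")
```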
The regulatory approval and gradual implementation of
whole-slide scanners has enabled the digitization of glass
slides for remote consults and archival purposes.1 Digitiza-
tion alone, however, does not necessarily improve the con-
sistency or efficiency of a pathologist’s primary workflow. In
fact, image review on a digital medium can be slightly
slower than on glass, especially for pathologists with limited
digital pathology experience.2 However, digital pathology
and image analysis tools have already demonstrated po-
tential benefits, including the potential to reduce inter-reader
variability in the evaluation of breast cancer HER2 status.3,4
Digitization also opens the door for assistive tools based on
Artificial Intelligence (AI) to improve efficiency and con-
sistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demon-
strated strong performance in many automated image-rec-
ognition applications.6–8 Recently, several deep learning–
based algorithms have been developed for the detection of
breast cancer metastases in lymph nodes as well as for other
applications in pathology.9,10 Initial findings suggest that
some algorithms can even exceed a pathologist’s sensitivity
for detecting individual cancer foci in digital images. How-
ever, this sensitivity gain comes at the cost of increased false
positives, potentially limiting the utility of such algorithms for
automated clinical use.11 In addition, deep learning algo-
rithms are inherently limited to the task for which they have
been specifically trained. While we have begun to understand
the strengths of these algorithms (such as exhaustive search)
and their weaknesses (sensitivity to poor optical focus, tumor
mimics; manuscript under review), the potential clinical util-
ity of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid
pathologists or improve clinical interpretation, these benefits
may be achieved through thoughtful and appropriate in-
tegration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain
View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship
(D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T.,
J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have
Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare,
1600 Amphitheatre Way, Mountain View, CA 94043
(e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF
versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health,
Inc. This is an open-access article distributed under the terms of the
Creative Commons Attribution-Non Commercial-No Derivatives
License 4.0 (CCBY-NC-ND), where it is permissible to download and
share the work provided it is properly cited. The work cannot be
changed in any way or used commercially without permission from
the journal.
ORIGINAL ARTICLE
Lung Nodule on Chest X-ray
Kazimierz O. Wrzeszczynski, PhD*; Mayu O. Frank, NP, MS*; Takahiko Koyama, PhD*; Kahn Rhrissorrakrai, PhD*; Nicolas Robine, PhD; Filippo Utro, PhD; Anne-Katrin Emde, PhD; Bo-Juen Chen, PhD; Kanika Arora, MS; Minita Shah, MS; Vladimir Vacic, PhD; Raquel Norel, PhD; Erhan Bilal, PhD; Ewa A. Bergmann, MSc; Julia L. Moore Vogel, PhD; Jeffrey N. Bruce, MD; Andrew B. Lassman, MD; Peter Canoll, MD, PhD; Christian Grommes, MD; Steve Harvey, BS; Laxmi Parida, PhD; Vanessa V. Michelini, BS; Michael C. Zody, PhD; Vaidehi Jobanputra, PhD; Ajay K. Royyuru, PhD; Robert B. Darnell, MD, PhD
Correspondence to Dr. Darnell: darnelr@rockefeller.edu
Supplemental data at Neurology.org/ng
Comparing sequencing assays and
human-machine analyses in actionable
genomics for glioblastoma
ABSTRACT
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare
potentially actionable calls from each.
Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal
DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA
sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians
and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated
system for prioritizing somatic variants and identifying drugs.
Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA com-
pleted a comparable analysis in a fraction of the time required by the human analysts.
Conclusions: The development of an effective human-machine interface in the analysis of deep
cancer genomic datasets may provide potentially clinically actionable calls for individual pa-
tients in a more timely and efficient manner than currently possible.
ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164; doi: 10.1212/NXG.0000000000000164
GLOSSARY
CNV = copy number variant; EGFR = epidermal growth factor receptor; GATK = Genome Analysis Toolkit; GBM = glioblastoma; IRB = institutional review board; NLP = Natural Language Processing; NYGC = New York Genome Center; RNA-seq = RNA sequencing; SNV = single nucleotide variant; SV = structural variant; TCGA = The Cancer Genome Atlas; TPM = transcripts per million; VCF = variant call file; VUS = variants of uncertain significance; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
The clinical application of next-generation sequencing technology to cancer diagnosis and treatment is in its early stages.1–3 An initial implementation of this technology has been in targeted panels, where subsets of cancer-relevant and/or highly actionable genes are scrutinized for potentially actionable mutations. This approach has been widely adopted, offering high redundancy of sequence coverage for the small number of sites of known clinical utility at relatively low cost.
However, recent studies have shown that many more potentially clinically actionable mutations exist both in known cancer genes and in other genes not yet identified as cancer drivers.4,5 Improvements in the efficiency of next-generation sequencing make it possible to consider whole-genome sequencing (WGS) as well as other omic assays such as RNA sequencing (RNA-seq) as clinical assays, but uncertainties remain about how much additional useful information is available from these assays.
*These authors contributed equally to the manuscript.
From the New York Genome Center (K.O.W., M.O.F., N.R., A.-K.E., B.-J.C., K.A., M.S., V.V., E.A.B., J.L.M.V., M.C.Z., V.J., R.B.D.); IBM
Thomas J. Watson Research Center (T.K., K.R., F.U., R.N., E.B., L.P., A.K.R.); Columbia University Medical Center (J.N.B., A.B.L., P.C., V.J.);
Memorial Sloan-Kettering Cancer Center (C.G.), New York, NY; IBM Watson Health (S.H., V.V.M.), Boca Raton, FL; Laboratory of Molecular
Neuro-Oncology (M.O.F., R.B.D.), and Howard Hughes Medical Institute (R.B.D.), The Rockefeller University, New York, NY. B.-J.C. is
currently affiliated with Google, New York, NY. V.V. is currently affiliated with 23andMe, Inc., Mountain View, CA. E.A.B. is currently affiliated
with Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany.
Funding information and disclosures are provided at the end of the article. Go to Neurology.org/ng for full disclosure forms. The Article Processing
Charge was funded by the authors.
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC
BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used
commercially without permission from the journal.
Enhancing Next-Generation Sequencing-Guided Cancer Care Through
Cognitive Computing
NIRALI M. PATEL,a,b,† VANESSA V. MICHELINI,f,† JEFF M. SNELL,a,c SAIANAND BALU,a ALAN P. HOYLE,a JOEL S. PARKER,a,c MICHELE C. HAYWARD,a DAVID A. EBERHARD,a,b ASHLEY H. SALAZAR,a PATRICK MCNEILLIE,g JIA XU,g CLAUDIA S. HUETTNER,g TAKAHIKO KOYAMA,h FILIPPO UTRO,h KAHN RHRISSORRAKRAI,h RAQUEL NOREL,h ERHAN BILAL,h AJAY ROYYURU,h LAXMI PARIDA,h H. SHELTON EARP,a,d JUNEKO E. GRILLEY-OLSON,a,d D. NEIL HAYES,a,d STEPHEN J. HARVEY,i NORMAN E. SHARPLESS,a,c,d WILLIAM Y. KIMa,c,d,e
a Lineberger Comprehensive Cancer Center, b Department of Pathology and Laboratory Medicine, c Department of Genetics, d Department of Medicine, and e Department of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; f IBM Watson Health, Boca Raton, Florida, USA; g IBM Watson Health, Cambridge, Massachusetts, USA; h IBM Research, Yorktown Heights, New York, USA; i IBM Watson Health, Herndon, Virginia, USA
† Contributed equally
Disclosures of potential conflicts of interest may be found at the end of this article.
Key Words. Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine
ABSTRACT
Background. Using next-generation sequencing (NGS) to guide
cancer therapy has created challenges in analyzing and report-
ing large volumes of genomic data to patients and caregivers.
Specifically, providing current, accurate information on newly
approved therapies and open clinical trials requires consider-
able manual curation performed mainly by human “molecular
tumor boards” (MTBs). The purpose of this study was to deter-
mine the utility of cognitive computing as performed by Wat-
son for Genomics (WfG) compared with a human MTB.
Materials and Methods. One thousand eighteen patient cases
that previously underwent targeted exon sequencing at the
University of North Carolina (UNC) and subsequent analysis
by the UNCseq informatics pipeline and the UNC MTB
between November 7, 2011, and May 12, 2015, were
analyzed with WfG, a cognitive computing technology for
genomic analysis.
Results. Using a WfG-curated actionable gene list, we identified
additional genomic events of potential significance (not discov-
ered by traditional MTB curation) in 323 (32%) patients. The
majority of these additional genomic events were considered
actionable based upon their ability to qualify patients for
biomarker-selected clinical trials. Indeed, the opening of a rele-
vant clinical trial within 1 month prior to WfG analysis provided
the rationale for identification of a new actionable event in
nearly a quarter of the 323 patients. This automated analysis
took 3 minutes per case.
Conclusion. These results demonstrate that the interpretation
and actionability of somatic NGS results are evolving too rapidly
to rely solely on human curation. Molecular tumor boards
empowered by cognitive computing could potentially improve
patient care by providing a rapid, comprehensive approach for
data analysis and consideration of up-to-date availability of
clinical trials. The Oncologist 2017;22:1–7
Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation
sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive
computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis
in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the
support of such tools applied to genomic data.
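At its core, the curation step being automated here is an intersection of a patient's somatic variants with a maintained actionable-gene list and currently open biomarker-selected trials. The toy sketch below uses invented gene lists and trial identifiers; a real system would query curated knowledge bases and live trial registries.

```python
# Invented placeholders, for illustration only.
actionable_genes = {"EGFR", "KRAS", "BRAF", "PIK3CA"}
open_trials = {"PIK3CA": ["NCT-FAKE-001"], "EGFR": ["NCT-FAKE-002"]}

patient_variants = [("TP53", "R175H"), ("PIK3CA", "H1047R"), ("TTN", "V123I")]

for gene, change in patient_variants:
    if gene in actionable_genes:
        trials = open_trials.get(gene, ["none currently open"])
        print(f"{gene} {change}: actionable; trials: {', '.join(trials)}")
```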
INTRODUCTION
Next-generation sequencing (NGS) has emerged as an afford-
able and reproducible means to query patients’ tumors for
somatic genetic anomalies [1, 2]. The optimal utilization of NGS
is fundamental to the promise of “precision medicine,” yet the
results of even targeted-capture NGS are highly complex,
returning a variety of somatic events in hundreds of analyzed
genes. The majority of such events have no known relevance
to the treatment of patients with cancer, and even for
Correspondence: William Y. Kim, M.D., or Norman E. Sharpless, M.D., Lineberger Comprehensive Cancer Center, University of North Carolina, CB# 7295, Chapel Hill, North Carolina 27599-7295, USA. Telephone: 919-966-4765; e-mail: william_kim@med.unc.edu or nes@med.unc.edu. Received April 18, 2017; accepted for publication October 6, 2017. http://dx.doi.org/10.1634/theoncologist.2017-0170
The Oncologist 2017;22:1–7 www.TheOncologist.com © AlphaMed Press 2017
Cancer Diagnostics and Molecular Pathology
AJR:209, December 2017 1
Since 1992, concerns regarding interob-
server variability in manual bone age esti-
mation [4] have led to the establishment of
several automatic computerized methods for
bone age estimation, including computer-as-
sisted skeletal age scores, computer-aided
skeletal maturation assessment systems, and
BoneXpert (Visiana) [5–14]. BoneXpert was
developed according to traditional machine-
learning techniques and has been shown to
have a good performance for patients of var-
ious ethnicities and in various clinical set-
tings [10–14]. The deep-learning technique
is an improvement in artificial neural net-
works. Unlike traditional machine-learning
techniques, deep-learning techniques allow
an algorithm to program itself by learning
from the images given a large dataset of la-
beled examples, thus removing the need to
specify rules [15].
Deep-learning techniques permit higher
levels of abstraction and improved predic-
tions from data. Deep-learning techniques
Computerized Bone Age
Estimation Using Deep Learning–
Based Program: Evaluation of the
Accuracy and Efficiency
Jeong Rye Kim1
Woo Hyun Shim1
Hee Mang Yoon1
Sang Hyup Hong1
Jin Seong Lee1
Young Ah Cho1
Sangki Kim2
Kim JR, Shim WH, Yoon MH, et al.
1
Department of Radiology and Research Institute of
Radiology, Asan Medical Center, University of Ulsan
College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu,
Seoul 05505, South Korea. Address correspondence to
H. M. Yoon (espoirhm@gmail.com).
2
Vuno Research Center, Vuno Inc., Seoul, South Korea.
Pediatric Imaging • Original Research
Supplemental Data
Available online at www.ajronline.org.
AJR 2017; 209:1–7
0361–803X/17/2096–1
© American Roentgen Ray Society
B
one age estimation is crucial for
developmental status determina-
tions and ultimate height predic-
tions in the pediatric population,
particularly for patients with growth disor-
ders and endocrine abnormalities [1]. Two
major left-hand wrist radiograph-based
methods for bone age estimation are current-
ly used: the Greulich-Pyle [2] and Tanner-
Whitehouse [3] methods. The former is much
more frequently used in clinical practice.
Greulich-Pyle–based bone age estimation is
performed by comparing a patient’s left-hand
radiograph to standard radiographs in the
Greulich-Pyle atlas and is therefore simple
and easily applied in clinical practice. How-
ever, the process of bone age estimation,
which comprises a simple comparison of
multiple images, can be repetitive and time
consuming and is thus sometimes burden-
some to radiologists. Moreover, the accuracy
depends on the radiologist’s experience and
tends to be subjective.
Keywords: bone age, children, deep learning, neural
network model
DOI:10.2214/AJR.17.18224
J. R. Kim and W. H. Shim contributed equally to this work.
Received March 12, 2017; accepted after revision
July 7, 2017.
S. Kim is employed by Vuno, Inc., which created the deep
learning–based automatic software system for bone
age determination. J. R. Kim, W. H. Shim, H. M. Yoon,
S. H. Hong, J. S. Lee, and Y. A. Cho are employed by
Asan Medical Center, which holds patent rights for the
deep learning–based automatic software system for
bone age assessment.
OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice.
MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with Greulich-Pyle atlas assistance only). The reference bone age was determined by the consensus of two experienced radiologists.
RESULTS. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively.
CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising diagnostic accuracy.
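The two headline metrics in these results, the concordance rate and the correlation r, are simple to compute from paired estimates. A minimal sketch follows, using fabricated toy values rather than the study's data.

```python
# Sketch of the two accuracy metrics reported above: concordance rate
# (fraction of exact agreements with the reference bone age) and Pearson
# correlation. The arrays below are fabricated toy numbers, not study data.
import numpy as np

def concordance_rate(estimated, reference):
    estimated, reference = np.asarray(estimated), np.asarray(reference)
    return float(np.mean(estimated == reference))

def pearson_r(estimated, reference):
    return float(np.corrcoef(estimated, reference)[0, 1])

ref = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0])
est = np.array([5.0, 7.0, 10.0, 11.0, 13.0, 15.0])  # one disagreement
print(concordance_rate(est, ref))  # 0.833...
print(pearson_r(est, ref))         # ~0.99
```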
TECHNICAL REPORT
https://doi.org/10.1038/s41591-019-0539-7
1Google Health, Mountain View, CA, USA. 2Present address: AstraZeneca, Gaithersburg, MD, USA. 3Present address: Tempus Labs Inc., Chicago, IL, USA. 4These authors contributed equally: Po-Hsuan Cameron Chen, Krishna Gadepalli, Robert MacDonald. *e-mail: cmermel@google.com
Microscopic examination of samples is the gold standard for the diagnosis of cancer, autoimmune diseases, infectious diseases and more. In cancer, the microscopic examination of stained tissue sections is critical for diagnosing and staging the patient's tumor, which informs treatment decisions and prognosis. In cancer, microscopy analysis faces three major challenges. As a form of image interpretation, these examinations are inherently subjective, exhibiting considerable inter- and intra-observer variability1,2. Moreover, clinical guidelines3 and studies4 have begun to require quantitative assessments as part of the effort towards better patient risk stratification3. For example, breast cancer staging requires the counting of mitotic cells, and quantification of the tumor burden in lymph nodes by measuring the largest tumor focus. However, despite being helpful in treatment planning, quantification is laborious and error-prone. Lastly, access to disease experts can be limited in both developed and developing countries5, exacerbating the problem.
As a potential solution, recent advances in AI, specifically deep learning6, have demonstrated automated medical image analysis with performance comparable to that by human experts1,7–10. Research has also shown the potential to improve diagnostic accuracy, quantitation and efficiency by applying deep learning algorithms to digitized whole-slide pathology images for cancer classification and detection8,10,11. However, the integration of these advances into cancer diagnosis is not straightforward because of two primary challenges: image digitization and the technical skills required to utilize deep learning algorithms. First, most microscopic examinations are performed using analog microscopes, and a digitized workflow requires significant infrastructure investments. Second, because of differences in hardware, firmware and software, the use of AI algorithms developed by others is challenging even for experts. As such, actual utilization of AI in microscopy frequently remains inaccessible.
Here, we propose a cost-effective solution to these barriers to entry of AI in microscopic analysis: an augmented optical light microscope that enables real-time integration of AI. We define 'real-time integration' as adding the capability of AI assistance without slowing down specimen review or modifying the standard workflow. We propose to superimpose the predictions of the AI algorithm on the view of the sample seen by the user through the eyepiece. Because augmenting additional information over the original view is termed augmented reality, we term this microscope the augmented reality microscope (ARM). Although we apply this technology to cancer diagnosis in this paper, the ARM is application-agnostic and can be utilized in other microscopy applications.
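Operationally, the real-time integration described here amounts to a capture-infer-project loop: grab the current eyepiece field of view, run the model, and render the prediction back into the optical path within a fixed time budget. The sketch below is a conceptual outline only; capture_fov, project_overlay, and model are hypothetical stand-ins, since the paper's hardware and software interfaces are not given here.

```python
# Conceptual sketch of the ARM loop described above: grab the current field
# of view, run the model, and project the prediction back into the optical
# path. `capture_fov`, `project_overlay`, and `model` are hypothetical
# stand-ins, not interfaces from the paper.
import time
import numpy as np

def capture_fov() -> np.ndarray:           # hypothetical camera hook
    return np.zeros((512, 512, 3), dtype=np.uint8)

def project_overlay(mask: np.ndarray):     # hypothetical AR display hook
    pass

def model(frame: np.ndarray) -> np.ndarray:
    return np.zeros(frame.shape[:2], dtype=bool)  # e.g. a tumor mask

TARGET_PERIOD = 1 / 10                     # aim for ~10 overlay updates/s

for _ in range(100):                       # bounded here; a device loops forever
    t0 = time.perf_counter()
    frame = capture_fov()
    overlay = model(frame)                 # must finish within the budget
    project_overlay(overlay)               # aligned to the FOV (parallax-free)
    time.sleep(max(0.0, TARGET_PERIOD - (time.perf_counter() - t0)))
```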
Aligned with ARM's function to serve as a viable platform for AI assistance in microscopy applications, the ARM system satisfies three major design requirements: spatial registration of the augmented information, system response time and robustness of the deep learning algorithms. First, AI predictions such as tumor or cell locations need to be precisely aligned with the specimen in the observer's field of view (FOV) to retain the correct spatial context. Importantly, this alignment must be insensitive to small changes in the user's eye position relative to the eyepiece (parallax-free) to account for user movements. Second, although the latest deep learning algorithms often require billions of mathematical operations12, these algorithms have to be applied in real time to avoid unnatural latency in the workflow. This is especially critical in applications such as cancer diagnosis, where the pathologist is constantly and rapidly panning around the slide. Finally, many deep learning algorithms for microscope images were developed using other digitization methods, such as whole-slide scanners in histopathology8,10,11.
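Whether "billions of operations" per inference fits a real-time budget is a one-line calculation: per-frame compute time is the model's operation count divided by the accelerator's throughput, and it must stay under the frame period. The figures below are illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope check of the real-time constraint mentioned above.
# All numbers are illustrative assumptions, not measurements.
model_flops_per_frame = 4e9   # ~4 GFLOPs per inference (assumed)
accelerator_flops     = 2e12  # 2 TFLOP/s effective throughput (assumed)
frames_per_second     = 10    # overlay refresh needed while panning

latency_s = model_flops_per_frame / accelerator_flops
print(f"per-frame compute: {latency_s * 1e3:.1f} ms")           # 2.0 ms
print(f"budget per frame:  {1000 / frames_per_second:.1f} ms")  # 100.0 ms
print("real-time feasible:", latency_s < 1 / frames_per_second)
```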
We demonstrate that two deep learning algorithms…
An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis
LETTER
A clinically applicable approach to continuous prediction of future acute kidney injury
Scalable and accurate deep learning with electronic health records
Lung Cancer on Chest CT
Lung Cancer on Pathology
Breast Cancer on Pathology
Polyps & Adenoma on Colonoscopy
Analysis of Cancer Genome
Assessment of Bone Age
Septic Shock, Acute Kidney Injury, Cardiac Arrest, Mortality, Readmission
Improving the accuracy
Improving the efficiency
Enabling the prediction
Arrhythmia
as shared by Dr. Eric Topol on Twitter
"virtuous cycle"
How can we make better medicine
with artificial (a.k.a. augmented) intelligence?
Feedback/Questions
• Email: yoonsup.choi@gmail.com
• Blog: http://www.yoonsupchoi.com
• Facebook: Yoon Sup Choi
Written by Yoon Sup Choi
Medical Artificial Intelligence
Cover design by Choi Seung-hyup
Yoon Sup Choi
Medical AI is driving an innovation that will reshape the conservative healthcare system. The rapid development and broad impact of medical AI are hard to grasp for today's medical experts, who have been trained in ever more specialized and subdivided fields, and it is unclear where one should even begin studying. In this situation, this book, which plainly explains the concepts and applications of medical AI and its relationship with physicians, will be a good guide. It is a particularly useful introduction for medical students and young clinicians, who will be the leading figures of the future.
━ Joon Beom Seo, Professor of Radiology, Asan Medical Center; Director, Medical Imaging AI Research Group
Few would disagree that AI will greatly change the paradigm of medicine. But medicine poses many hard problems for AI to solve, and the solutions vary just as widely. The cure-all medical AI that people commonly imagine does not exist. This book analyzes the development, application, and potential of a wide range of medical AI in a balanced way. I recommend it both to clinicians looking to adopt AI and to AI researchers venturing into the unfamiliar territory of medicine.
━ Jihoon Jeong, Senior Lecturing Professor, Department of Media Communication, Kyung Hee Cyber University; physician
As the professor responsible for basic medical education at Seoul National University College of Medicine, I keenly feel that today's medical education, unchanged since industrialization, cannot prepare medical students for the rapidly changing age of AI. This book carries the expert analysis and forward-looking perspective of Director Yoon Sup Choi, with whom I am pioneering AI education in medical school. I recommend it to medical students and professors preparing for the AI future, and to students and parents considering medical school.
━ Hyung Jin Choi, Professor, Department of Anatomy, Seoul National University College of Medicine; internal medicine specialist
Extreme views and attitudes currently coexist regarding the introduction of medical AI. Through diverse cases and deep insight, this book provides a balanced view of the present and future of medical AI and opens a forum for the discussion needed before AI is adopted into medicine in earnest. Looking back ten years from now, when medical AI is commonplace, I expect we will find that this book served as a guide that led the way.
━ Kyu-Hwan Jung, CTO, Vuno
Medical AI demands a more fundamental understanding than AI in other fields, because it goes beyond simply taking over human tasks and shifts the very paradigm of medicine onto a data-driven footing. It therefore requires a balanced understanding of AI and deep thought about how it can help doctors and patients. That is why this book, which brings together the results of such efforts from around the world, is so welcome.
━ Seungwook Baek, CEO, Lunit
This book covers not only the latest trends in medical AI but also its significance, limitations, outlook, and much food for thought. Even on contentious issues, the author presents his own views persuasively, grounded in clear evidence. I personally plan to use this book as a textbook for my graduate course.
━ Soo-Yong Shin, Professor, Department of Digital Health, Sungkyunkwan University
Written by Yoon Sup Choi
Medical Artificial Intelligence
Price: 20,000 KRW
ISBN 979-11-86269-99-2
The first book in Korea to tackle medical artificial intelligence in earnest!
This book addresses the technical aspects of medical AI together with the many AI-related issues raised inside and outside the medical community. I am confident that it covers most of the issues currently surrounding medical AI: Will AI replace doctors? Which specialties will be affected first? How should AI be regulated, and how should its utility and safety be proven? Who is liable for medical errors? How should medical education change? I have tried to treat these issues in depth, in language as plain as possible.
I wrote for general readers, not only clinicians or AI experts, avoiding technical terminology as much as possible so that they can follow the latest trends and key issues in medical AI. Readers will find here a comprehensive discussion of medical AI that is hard to encounter elsewhere. Because the field advances so quickly, new technology will keep appearing after this book is published. Even so, I hope the overview presented here becomes a foundation for approaching medical AI, studying the technology, and preparing for the changes to come.
It is also my personal hope that this modest book helps clinicians, especially those who have just earned their medical license or are still in medical school. AI will inevitably change the role of future doctors and may bring about a fundamental shift in the very paradigm of medicine. Yet today's medical community, and medical education in particular, is not responding properly to this issue. Most of what we will discuss leads to the conclusion that medical education must be reformed.
Unfortunately, today's young doctors and the medical students who will follow them are caught in between: educated in the ways of the past, they must live in a future shared with AI. Someday medical education and training will adapt to these changes, but they are destined to enter the clinic and the operating room before that happens. It may sound irresponsible, but in the end each of you will have to study, prepare, and find your own way through this future.
━ From the author's Prologue
The present and future of medical artificial intelligence, presented by future-medicine scholar Dr. Yoon Sup Choi
The current state of medical deep learning and IBM Watson
Will artificial intelligence replace doctors?
More Slides & Videos (in Korean)
[ASGO 2019] Artificial Intelligence in Medicine
  • 1.
    Director, Digital HealthcareInstitute Managing Partner, Digital Healthcare Partners Yoon Sup Choi, Ph.D. Artificial Intelligence in Medicine
  • 2.
    Disclaimer I disclosure theconflict of interest with above corporates Startup VC
  • 3.
    Hypes, Fantasies, Fears, aggressivepredictions on AI “Technology will replace 80% of doctors” “People should stop training radiologists now. It’s just completely obvious within 5 years, deep learning can do better than radiologists”
  • 5.
    ARTICLES https://doi.org/10.1038/s41591-018-0177-5 1 Applied Bioinformatics Laboratories,New York University School of Medicine, New York, NY, USA. 2 Skirball Institute, Department of Cell Biology, New York University School of Medicine, New York, NY, USA. 3 Department of Pathology, New York University School of Medicine, New York, NY, USA. 4 School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece. 5 Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA. 6 Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA. 7 Center for Biospecimen Research and Development, New York University, New York, NY, USA. 8 Department of Population Health and the Center for Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. 9 These authors contributed equally to this work: Nicolas Coudray, Paolo Santiago Ocampo. *e-mail: narges.razavian@nyumc.org; aristotelis.tsirigos@nyumc.org A ccording to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer1 , and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinc- tion can be challenging and time-consuming, and requires confir- matory immunohistochemical stains. Classification of lung cancer type is a key diagnostic process because the available treatment options, including conventional chemotherapy and, more recently, targeted therapies, differ for LUAD and LUSC2 . Also, a LUAD diagnosis will prompt the search for molecular biomarkers and sensitizing mutations and thus has a great impact on treatment options3,4 . For example, epidermal growth factor receptor (EGFR) mutations, present in about 20% of LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK) rearrangements, present in<5% of LUAD5 , currently have tar- geted therapies approved by the Food and Drug Administration (FDA)6,7 . Mutations in other genes, such as KRAS and tumor pro- tein P53 (TP53) are very common (about 25% and 50%, respec- tively) but have proven to be particularly challenging drug targets so far5,8 . Lung biopsies are typically used to diagnose lung cancer type and stage. Virtual microscopy of stained images of tissues is typically acquired at magnifications of 20×to 40×, generating very large two-dimensional images (10,000 to>100,000 pixels in each dimension) that are oftentimes challenging to visually inspect in an exhaustive manner. Furthermore, accurate interpretation can be difficult, and the distinction between LUAD and LUSC is not always clear, particularly in poorly differentiated tumors; in this case, ancil- lary studies are recommended for accurate classification9,10 . To assist experts, automatic analysis of lung cancer whole-slide images has been recently studied to predict survival outcomes11 and classifica- tion12 . 
For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classi- fication of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13 . Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14 ) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.94115 . Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning Nicolas Coudray 1,2,9 , Paolo Santiago Ocampo3,9 , Theodore Sakellaropoulos4 , Navneet Narula3 , Matija Snuderl3 , David Fenyö5,6 , Andre L. Moreira3,7 , Narges Razavian 8 * and Aristotelis Tsirigos 1,3 * Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and sub- type of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep con- volutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be pre- dicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH. NATURE MEDICINE | www.nature.com/naturemedicine LETTER https://doi.org/10.1038/s41586-019-1390-1 A clinically applicable approach to continuous prediction of future acute kidney injury Nenad Tomašev1 *, Xavier Glorot1 , Jack W. Rae1,2 , Michal Zielinski1 , Harry Askham1 , Andre Saraiva1 , Anne Mottram1 , Clemens Meyer1 , Suman Ravuri1 , Ivan Protsyuk1 , Alistair Connell1 , Cían O. Hughes1 , Alan Karthikesalingam1 , Julien Cornebise1,12 , Hugh Montgomery3 , Geraint Rees4 , Chris Laing5 , Clifton R. Baker6 , Kelly Peterson7,8 , Ruth Reeves9 , Demis Hassabis1 , Dominic King1 , Mustafa Suleyman1 , Trevor Back1,13 , Christopher Nielson10,11,13 , Joseph R. Ledsam1,13 * & Shakir Mohamed1,13 The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients1 . 
To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records2–17 and using acute kidney injury—a common and potentially life-threatening condition18 —as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests9 . Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment. Adverse events and clinical complications are a major cause of mor- tality and poor outcomes in patients, and substantial effort has been made to improve their recognition18,19 . Few predictors have found their way into routine clinical practice, because they either lack effective sensitivity and specificity or report damage that already exists20 . One example relates to acute kidney injury (AKI), a potentially life-threat- ening condition that affects approximately one in five inpatient admis- sions in the United States21 . Although a substantial proportion of cases of AKI are thought to be preventable with early treatment22 , current algorithms for detecting AKI depend on changes in serum creatinine as a marker of acute decline in renal function. Increases in serum cre- atinine lag behind renal injury by a considerable period, which results in delayed access to treatment. This supports a case for preventative ‘screening’-type alerts but there is no evidence that current rule-based alerts improve outcomes23 . For predictive alerts to be effective, they must empower clinicians to act before a major clinical decline has occurred by: (i) delivering actionable insights on preventable condi- tions; (ii) being personalized for specific patients; (iii) offering suffi- cient contextual information to inform clinical decision-making; and (iv) being generally applicable across populations of patients24 . Promising recent work on modelling adverse events from electronic health records2–17 suggests that the incorporation of machine learning may enable the early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically applicable level of predictive performance25 or have focused on predictions across a short time horizon that leaves little time for clinical assessment and intervention26 . Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. 
At each time point, the model outputs a probability of AKI occurring at any stage of sever- ity within the next 48 h (although our approach can be extended to other time windows or severities of AKI; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. This model was trained using data that were curated from a multi-site retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs—the largest integrated healthcare system in the United States. The dataset consisted of information that was available from hospital electronic health records in digital format. The total number of independent entries in the dataset was approximately 6 billion, includ- ing 620,000 features. Patients were randomized across training (80%), validation (5%), calibration (5%) and test (10%) sets. A ground-truth label for the presence of AKI at any given point in time was added using the internationally accepted ‘Kidney Disease: Improving Global Outcomes’ (KDIGO) criteria18 ; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are provided in the Methods and Extended Data Figs. 1–3. Figure 1 shows the use of our model. At every point throughout an admission, the model provides updated estimates of future AKI risk along with an associated degree of uncertainty. Providing the uncer- tainty associated with a prediction may help clinicians to distinguish ambiguous cases from those predictions that are fully supported by the available data. Identifying an increased risk of future AKI sufficiently far in advance is critical, as longer lead times may enable preventative action to be taken. This is possible even when clinicians may not be actively intervening with, or monitoring, a patient. Supplementary Information section A provides more examples of the use of the model. With our approach, 55.8% of inpatient AKI events of any severity were predicted early, within a window of up to 48 h in advance and with a ratio of 2 false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve of 92.1%, and an area under the precision–recall curve of 29.7%. When set at this threshold, our predictive model would—if operationalized—trigger a 1 DeepMind, London, UK. 2 CoMPLEX, Computer Science, University College London, London, UK. 3 Institute for Human Health and Performance, University College London, London, UK. 4 Institute of Cognitive Neuroscience, University College London, London, UK. 5 University College London Hospitals, London, UK. 6 Department of Veterans Affairs, Denver, CO, USA. 7 VA Salt Lake City Healthcare System, Salt Lake City, UT, USA. 8 Division of Epidemiology, University of Utah, Salt Lake City, UT, USA. 9 Department of Veterans Affairs, Nashville, TN, USA. 10 University of Nevada School of Medicine, Reno, NV, USA. 11 Department of Veterans Affairs, Salt Lake City, UT, USA. 12 Present address: University College London, London, UK. 13 These authors contributed equally: Trevor Back, Christopher Nielson, Joseph R. Ledsam, Shakir Mohamed. *e-mail: nenadt@google.com; jledsam@google.com 1 1 6 | N A T U R E | V O L 5 7 2 | 1 A U G U S T 2 0 1 9 Copyright 2016 American Medical Association. All rights reserved. 
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD; Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB; Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation. OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs. DESIGN AND SETTING A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency. EXPOSURE Deep learning–trained algorithm. MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity. RESULTS TheEyePACS-1datasetconsistedof9963imagesfrom4997patients(meanage,54.4 years;62.2%women;prevalenceofRDR,683/8878fullygradableimages[7.8%]);the Messidor-2datasethad1748imagesfrom874patients(meanage,57.6years;42.6%women; prevalenceofRDR,254/1745fullygradableimages[14.6%]).FordetectingRDR,thealgorithm hadanareaunderthereceiveroperatingcurveof0.991(95%CI,0.988-0.993)forEyePACS-1and 0.990(95%CI,0.986-0.995)forMessidor-2.Usingthefirstoperatingcutpointwithhigh specificity,forEyePACS-1,thesensitivitywas90.3%(95%CI,87.5%-92.7%)andthespecificity was98.1%(95%CI,97.8%-98.5%).ForMessidor-2,thesensitivitywas87.0%(95%CI,81.1%- 91.0%)andthespecificitywas98.5%(95%CI,97.7%-99.1%).Usingasecondoperatingpoint withhighsensitivityinthedevelopmentset,forEyePACS-1thesensitivitywas97.5%and specificitywas93.4%andforMessidor-2thesensitivitywas96.1%andspecificitywas93.9%. CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment. JAMA. doi:10.1001/jama.2016.17216 Published online November 29, 2016. 
Editorial Supplemental content Author Affiliations: Google Inc, Mountain View, California (Gulshan, Peng, Coram, Stumpe, Wu, Narayanaswamy, Venugopalan, Widner, Madams, Nelson, Webster); Department of Computer Science, University of Texas, Austin (Venugopalan); EyePACS LLC, San Jose, California (Cuadros); School of Optometry, Vision Science Graduate Group, University of California, Berkeley (Cuadros); Aravind Medical Research Foundation, Aravind Eye Care System, Madurai, India (Kim); Shri Bhagwan Mahavir Vitreoretinal Services, Sankara Nethralaya, Chennai, Tamil Nadu, India (Raman); Verily Life Sciences, Mountain View, California (Mega); Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts (Mega). Corresponding Author: Lily Peng, MD, PhD, Google Research, 1600 Amphitheatre Way, Mountain View, CA 94043 (lhpeng@google.com). Research JAMA | Original Investigation | INNOVATIONS IN HEALTH CARE DELIVERY (Reprinted) E1 Copyright 2016 American Medical Association. All rights reserved. Downloaded From: http://jamanetwork.com/ on 12/02/2016 ophthalmology LETTERS https://doi.org/10.1038/s41591-018-0335-9 1 Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou, China. 2 Institute for Genomic Medicine, Institute of Engineering in Medicine, and Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA. 3 Hangzhou YITU Healthcare Technology Co. Ltd, Hangzhou, China. 4 Department of Thoracic Surgery/Oncology, First Affiliated Hospital of Guangzhou Medical University, China State Key Laboratory and National Clinical Research Center for Respiratory Disease, Guangzhou, China. 5 Guangzhou Kangrui Co. Ltd, Guangzhou, China. 6 Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, China. 7 Veterans Administration Healthcare System, San Diego, CA, USA. 8 These authors contributed equally: Huiying Liang, Brian Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu. *e-mail: kang.zhang@gmail.com; xiahumin@hotmail.com Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains chal- lenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physi- cians and unearth associations that previous statistical meth- ods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing com- mon childhood diseases. Our study provides a proof of con- cept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diag- nostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare provid- ers are in relative shortage, the benefits of such an AI system are likely to be universal. 
Medical information has become increasingly complex over time. The range of disease entities, diagnostic testing and biomark- ers, and treatment modalities has increased exponentially in recent years. Subsequently, clinical decision-making has also become more complex and demands the synthesis of decisions from assessment of large volumes of data representing clinical information. In the current digital age, the electronic health record (EHR) represents a massive repository of electronic data points representing a diverse array of clinical information1–3 . Artificial intelligence (AI) methods have emerged as potentially powerful tools to mine EHR data to aid in disease diagnosis and management, mimicking and perhaps even augmenting the clinical decision-making of human physicians1 . To formulate a diagnosis for any given patient, physicians fre- quently use hypotheticodeductive reasoning. Starting with the chief complaint, the physician then asks appropriately targeted questions relating to that complaint. From this initial small feature set, the physician forms a differential diagnosis and decides what features (historical questions, physical exam findings, laboratory testing, and/or imaging studies) to obtain next in order to rule in or rule out the diagnoses in the differential diagnosis set. The most use- ful features are identified, such that when the probability of one of the diagnoses reaches a predetermined level of acceptability, the process is stopped, and the diagnosis is accepted. It may be pos- sible to achieve an acceptable level of certainty of the diagnosis with only a few features without having to process the entire feature set. Therefore, the physician can be considered a classifier of sorts. In this study, we designed an AI-based system using machine learning to extract clinically relevant features from EHR notes to mimic the clinical reasoning of human physicians. In medicine, machine learning methods have already demonstrated strong per- formance in image-based diagnoses, notably in radiology2 , derma- tology4 , and ophthalmology5–8 , but analysis of EHR data presents a number of difficult challenges. These challenges include the vast quantity of data, high dimensionality, data sparsity, and deviations Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence Huiying Liang1,8 , Brian Y. Tsui 2,8 , Hao Ni3,8 , Carolina C. S. Valentim4,8 , Sally L. Baxter 2,8 , Guangjian Liu1,8 , Wenjia Cai 2 , Daniel S. Kermany1,2 , Xin Sun1 , Jiancong Chen2 , Liya He1 , Jie Zhu1 , Pin Tian2 , Hua Shao2 , Lianghong Zheng5,6 , Rui Hou5,6 , Sierra Hewett1,2 , Gen Li1,2 , Ping Liang3 , Xuan Zang3 , Zhiqi Zhang3 , Liyan Pan1 , Huimin Cai5,6 , Rujuan Ling1 , Shuhua Li1 , Yongwang Cui1 , Shusheng Tang1 , Hong Ye1 , Xiaoyan Huang1 , Waner He1 , Wenqing Liang1 , Qing Zhang1 , Jianmin Jiang1 , Wei Yu1 , Jianqun Gao1 , Wanxing Ou1 , Yingmin Deng1 , Qiaozhen Hou1 , Bei Wang1 , Cuichan Yao1 , Yan Liang1 , Shu Zhang1 , Yaou Duan2 , Runze Zhang2 , Sarah Gibson2 , Charlotte L. Zhang2 , Oulan Li2 , Edward D. Zhang2 , Gabriel Karin2 , Nathan Nguyen2 , Xiaokang Wu1,2 , Cindy Wen2 , Jie Xu2 , Wenqin Xu2 , Bochu Wang2 , Winston Wang2 , Jing Li1,2 , Bianca Pizzato2 , Caroline Bao2 , Daoman Xiang1 , Wanting He1,2 , Suiqin He2 , Yugui Zhou1,2 , Weldon Haw2,7 , Michael Goldbaum2 , Adriana Tremoulet2 , Chun-Nan Hsu 2 , Hannah Carter2 , Long Zhu3 , Kang Zhang 1,2,7 * and Huimin Xia 1 * NATURE MEDICINE | www.nature.com/naturemedicine pediatrics pathology 1Wang P, et al. 
Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500 Endoscopy ORIGINAL ARTICLE Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study Pu Wang,  1 Tyler M Berzin,  2 Jeremy Romek Glissen Brown,  2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu  1 To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/ gutjnl-2018-317500 ► Additional material is published online only.To view please visit the journal online (http://dx.doi.org/10.1136/ gutjnl-2018-317500). 1 Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China 2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA Correspondence to Xiaogang Liu, Department of Gastroenterology Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China; Gary.samsph@gmail.com Received 30 August 2018 Revised 4 February 2019 Accepted 13 February 2019 © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. ABSTRACT Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs).We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR. Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection.The primary outcome was ADR. Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis.The artificial intelligence (AI) system significantly increased ADR (29.1%vs20.3%, p<0.001) and the mean number of adenomas per patient (0.53vs0.31, p<0.001).This was due to a higher number of diminutive adenomas found (185vs102; p<0.001), while there was no statistical difference in larger adenomas (77vs58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114vs52, p<0.001). Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps.The cost–benefit ratio of such effects has to be determined further. Trial registration number ChiCTR-DDD-17012221; Results. 
INTRODUCTION Colorectal cancer (CRC) is the second and third- leading causes of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold stan- dard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator charac- teristics.11 12 Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detec- tion system, with performance close to that of expert endoscopists, could assist the endosco- pist in detecting lesions that might correspond to adenomas in a more consistent and reliable way Significance of this study What is already known on this subject? ► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods. ► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies. What are the new findings? ► This represents the first prospective randomised controlled trial examining an automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%. ► This effect was mainly due to a higher rate of small adenomas found. ► The detection rate of hyperplastic polyps was also significantly increased. How might it impact on clinical practice in the foreseeable future? ► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates. ► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented. on17March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317500on27February2019.Downloadedfrom pathology S E P S I S A targeted real-time early warning score (TREWScore) for septic shock Katharine E. Henry,1 David N. Hager,2 Peter J. Pronovost,3,4,5 Suchi Saria1,3,5,6 * Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and devel- oped “TREWScore,” a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. 
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In compar- ison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflam- matory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a low- er sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality. INTRODUCTION Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in an- nual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sep- sis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypo- tension (9). More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock sug- gest that usual care is as effective as EGDT (10–12). Some have inter- preted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early ag- gressive fluid resuscitation (13). It is likely that continued early identi- fication and treatment will further improve outcomes. However, the best approach to managing patients at high risk of developing septic shock before the onset of severe sepsis or shock has not been studied. Methods that can identify ahead of time which patients will later expe- rience septic shock are needed to further understand, study, and im- prove outcomes in this population. General-purpose illness severity scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE II), Simplified Acute Physiology Score (SAPS II), SequentialOrgan Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typical- ly cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition. The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of “early warning systems,” “track and trigger” initiatives, “listening applica- tions,” and “sniffers” have been implemented to improve detection andtimelinessof therapy forpatients with severe sepsis andseptic shock (18, 20–23). 
Although these tools have been successful at detecting pa- tients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock. The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health data in a variety of applications (24), including discharge planning (25), risk stratification (26, 27), and identification of acute adverse events (28, 29). For septic shock in particular, promising work includes that of predicting septic shock using high-fidelity physiological signals collected directly from bedside monitors (30, 31), inferring relationships between predictors of septic shock using Bayesian networks (32), and using routine measurements for septic shock prediction (33–35). No current prediction models that use only data routinely stored in the EHR predict septic shock with high sensitivity and specificity many hours before onset. Moreover, when learning predictive risk scores, cur- rent methods (34, 36, 37) often have not accounted for the censoring effects of clinical interventions on patient outcomes (38). For instance, a patient with severe sepsis who received fluids and never developed septic shock would be treated as a negative case, despite the possibility that he or she might have developed septic shock in the absence of such treatment and therefore could be considered a positive case up until the 1 Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. 2 Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA. 3 Armstrong Institute for Patient Safety and Quality, Johns Hopkins University, Baltimore, MD 21202, USA. 4 Department of Anesthesiology and Critical Care Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21202, USA. 5 Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA. 6 Department of Applied Math and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA. *Corresponding author. E-mail: ssaria@cs.jhu.edu R E S E A R C H A R T I C L E www.ScienceTranslationalMedicine.org 5 August 2015 Vol 7 Issue 299 299ra122 1 onNovember3,2016http://stm.sciencemag.org/Downloadedfrom infectious BRIEF COMMUNICATION OPEN Digital biomarkers of cognitive function Paul Dagum1 To identify digital biomarkers associated with cognitive function, we analyzed human–computer interaction from 7 days of smartphone use in 27 subjects (ages 18–34) who received a gold standard neuropsychological assessment. For several neuropsychological constructs (working memory, memory, executive function, language, and intelligence), we found a family of digital biomarkers that predicted test scores with high correlations (p < 10−4 ). These preliminary results suggest that passive measures from smartphone use could be a continuous ecological surrogate for laboratory-based neuropsychological assessment. npj Digital Medicine (2018)1:10 ; doi:10.1038/s41746-018-0018-4 INTRODUCTION By comparison to the functional metrics available in other disciplines, conventional measures of neuropsychiatric disorders have several challenges. First, they are obtrusive, requiring a subject to break from their normal routine, dedicating time and often travel. Second, they are not ecological and require subjects to perform a task outside of the context of everyday behavior. 
Third, they are episodic and provide sparse snapshots of a patient only at the time of the assessment. Lastly, they are poorly scalable, taxing limited resources including space and trained staff. In seeking objective and ecological measures of cognition, we attempted to develop a method to measure memory and executive function not in the laboratory but in the moment, day-to-day. We used human–computer interaction on smart- phones to identify digital biomarkers that were correlated with neuropsychological performance. RESULTS In 2014, 27 participants (ages 27.1 ± 4.4 years, education 14.1 ± 2.3 years, M:F 8:19) volunteered for neuropsychological assessment and a test of the smartphone app. Smartphone human–computer interaction data from the 7 days following the neuropsychological assessment showed a range of correla- tions with the cognitive scores. Table 1 shows the correlation between each neurocognitive test and the cross-validated predictions of the supervised kernel PCA constructed from the biomarkers for that test. Figure 1 shows each participant test score and the digital biomarker prediction for (a) digits backward, (b) symbol digit modality, (c) animal fluency, (d) Wechsler Memory Scale-3rd Edition (WMS-III) logical memory (delayed free recall), (e) brief visuospatial memory test (delayed free recall), and (f) Wechsler Adult Intelligence Scale- 4th Edition (WAIS-IV) block design. Construct validity of the predictions was determined using pattern matching that computed a correlation of 0.87 with p < 10−59 between the covariance matrix of the predictions and the covariance matrix of the tests. Table 1. Fourteen neurocognitive assessments covering five cognitive domains and dexterity were performed by a neuropsychologist. Shown are the group mean and standard deviation, range of score, and the correlation between each test and the cross-validated prediction constructed from the digital biomarkers for that test Cognitive predictions Mean (SD) Range R (predicted), p-value Working memory Digits forward 10.9 (2.7) 7–15 0.71 ± 0.10, 10−4 Digits backward 8.3 (2.7) 4–14 0.75 ± 0.08, 10−5 Executive function Trail A 23.0 (7.6) 12–39 0.70 ± 0.10, 10−4 Trail B 53.3 (13.1) 37–88 0.82 ± 0.06, 10−6 Symbol digit modality 55.8 (7.7) 43–67 0.70 ± 0.10, 10−4 Language Animal fluency 22.5 (3.8) 15–30 0.67 ± 0.11, 10−4 FAS phonemic fluency 42 (7.1) 27–52 0.63 ± 0.12, 10−3 Dexterity Grooved pegboard test (dominant hand) 62.7 (6.7) 51–75 0.73 ± 0.09, 10−4 Memory California verbal learning test (delayed free recall) 14.1 (1.9) 9–16 0.62 ± 0.12, 10−3 WMS-III logical memory (delayed free recall) 29.4 (6.2) 18–42 0.81 ± 0.07, 10−6 Brief visuospatial memory test (delayed free recall) 10.2 (1.8) 5–12 0.77 ± 0.08, 10−5 Intelligence scale WAIS-IV block design 46.1(12.8) 12–61 0.83 ± 0.06, 10−6 WAIS-IV matrix reasoning 22.1(3.3) 12–26 0.80 ± 0.07, 10−6 WAIS-IV vocabulary 40.6(4.0) 31–50 0.67 ± 0.11, 10−4 Received: 5 October 2017 Revised: 3 February 2018 Accepted: 7 February 2018 1 Mindstrong Health, 248 Homer Street, Palo Alto, CA 94301, USA Correspondence: Paul Dagum (paul@mindstronghealth.com) www.nature.com/npjdigitalmed psychiatry P R E C I S I O N M E D I C I N E Identification of type 2 diabetes subgroups through topological analysis of patient similarity Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. 
Dudley1,4 * Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications. Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related compli- cations. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was character- ized by T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer ma- lignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respec- tively. By assessing the human disease–SNP association for each subtype, the enriched phenotypes and biological functions at the gene level for each subtype matched with the disease comorbidities and clinical dif- ferences that we identified through EMRs. Our approach demonstrates the utility of applying the precision medicine paradigm in T2D and the promise of extending the approach to the study of other complex, multi- factorial diseases. INTRODUCTION Type 2 diabetes (T2D) is a complex, multifactorial disease that has emerged as an increasing prevalent worldwide health concern asso- ciated with high economic and physiological burdens. An estimated 29.1 million Americans (9.3% of the population) were estimated to have some form of diabetes in 2012—up 13% from 2010—with T2D representing up to 95% of all diagnosed cases (1, 2). Risk factors for T2D include obesity, family history of diabetes, physical inactivity, eth- nicity, and advanced age (1, 2). Diabetes and its complications now rank among the leading causes of death in the United States (2). In fact, diabetes is the leading cause of nontraumatic foot amputation, adult blindness, and need for kidney dialysis, and multiplies risk for myo- cardial infarction, peripheral artery disease, and cerebrovascular disease (3–6). The total estimated direct medical cost attributable to diabetes in the United States in 2012 was $176 billion, with an estimated $76 billion attributable to hospital inpatient care alone. There is a great need to im- prove understanding of T2D and its complex factors to facilitate pre- vention, early detection, and improvements in clinical management. A more precise characterization of T2D patient populations can en- hance our understanding of T2D pathophysiology (7, 8). Current clinical definitions classify diabetes into three major subtypes: type 1 dia- betes (T1D), T2D, and maturity-onset diabetes of the young. Other sub- types based on phenotype bridge the gap between T1D and T2D, for example, latent autoimmune diabetes in adults (LADA) (7) and ketosis- prone T2D. 
The current categories indicate that the traditional definition of diabetes, especially T2D, might comprise additional subtypes with dis- tinct clinical characteristics. A recent analysis of the longitudinal Whitehall II cohort study demonstrated improved assessment of cardiovascular risks when subgrouping T2D patients according to glucose concentration criteria (9). Genetic association studies reveal that the genetic architec- ture of T2D is profoundly complex (10–12). Identified T2D-associated risk variants exhibit allelic heterogeneity and directional differentiation among populations (13, 14). The apparent clinical and genetic com- plexity and heterogeneity of T2D patient populations suggest that there are opportunities to refine the current, predominantly symptom-based, definition of T2D into additional subtypes (7). Because etiological and pathophysiological differences exist among T2D patients, we hypothesize that a data-driven analysis of a clinical population could identify new T2D subtypes and factors. Here, we de- velop a data-driven, topology-based approach to (i) map the complexity of patient populations using clinical data from electronic medical re- cords (EMRs) and (ii) identify new, emergent T2D patient subgroups with subtype-specific clinical and genetic characteristics. We apply this approachtoadatasetcomprisingmatchedEMRsandgenotypedatafrom more than 11,000 individuals. Topological analysis of these data revealed three distinct T2D subtypes that exhibited distinct patterns of clinical characteristics and disease comorbidities. Further, we identified genetic markers associated with each T2D subtype and performed gene- and pathway-level analysis of subtype genetic associations. Biological and phenotypic features enriched in the genetic analysis corroborated clinical disparities observed among subgroups. Our findings suggest that data- driven,topologicalanalysisofpatientcohortshasutilityinprecisionmedicine effortstorefineourunderstandingofT2Dtowardimproving patient care. 1 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 700 Lexington Ave., New York, NY 10065, USA. 2 Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA. 3 Division of Endocrinology, Diabetes, and Bone Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4 Department of Health Policy and Research, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. *Corresponding author. E-mail: joel.dudley@mssm.edu R E S E A R C H A R T I C L E www.ScienceTranslationalMedicine.org 28 October 2015 Vol 7 Issue 311 311ra174 1 onOctober28,2015http://stm.sciencemag.org/Downloadedfrom endocrinology 0 0 M O N T H 2 0 1 7 | V O L 0 0 0 | N A T U R E | 1 LETTER doi:10.1038/nature21056 Dermatologist-level classification of skin cancer with deep neural networks Andre Esteva1 *, Brett Kuprel1 *, Roberto A. Novoa2,3 , Justin Ko2 , Susan M. Swetter2,4 , Helen M. Blau5 & Sebastian Thrun6 Skin cancer, the most common human malignancy1–3 , is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. 
Deep convolutional neural networks (CNNs)4,5 show potential for general and highly variable tasks across many fine-grained object categories6–11. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images—two orders of magnitude larger than previous datasets12—consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 (ref. 13), which could therefore provide low-cost universal access to vital diagnostic care. There are 5.4 million new cases of skin cancer in the United States2 every year. One in five Americans will be diagnosed with a cutaneous malignancy in their lifetime. Although melanomas represent fewer than 5% of all skin cancers in the United States, they account for approximately 75% of all skin-cancer-related deaths, and are responsible for over 10,000 deaths annually in the United States alone. Early detection is critical, as the estimated 5-year survival rate for melanoma drops from over 99% if detected in its earliest stages to about 14% if detected in its latest stages. We developed a computational method which may allow medical practitioners and patients to proactively track skin lesions and detect cancer earlier. By creating a novel disease taxonomy, and a disease-partitioning algorithm that maps individual diseases into training classes, we are able to build a deep learning system for automated dermatology. Previous work in dermatological computer-aided classification12,14,15 has lacked the generalization capability of medical practitioners owing to insufficient data and a focus on standardized tasks such as dermoscopy16–18 and histological image classification19–22. Dermoscopy images are acquired via a specialized instrument and histological images are acquired via invasive biopsy and microscopy, both of which yield highly standardized images. Photographic images (for example, smartphone images) exhibit variability in factors such as zoom, angle and lighting, making classification substantially more challenging23,24. We overcome this challenge by using a data-driven approach—1.41 million pre-training and training images make classification robust to photographic variability. Many previous techniques require extensive preprocessing, lesion segmentation and extraction of domain-specific visual features before classification. By contrast, our system requires no hand-crafted features; it is trained end-to-end directly from image labels and raw pixels, with a single network for both photographic and dermoscopic images.
The existing body of work uses small datasets of typically less than a thousand images of skin lesions16,18,19, which, as a result, do not generalize well to new images. We demonstrate generalizable classification with a new dermatologist-labelled dataset of 129,450 clinical images, including 3,374 dermoscopy images. Deep learning algorithms, powered by advances in computation and very large datasets25, have recently been shown to exceed human performance in visual tasks such as playing Atari games26, strategic board games like Go27 and object recognition6. In this paper we outline the development of a CNN that matches the performance of dermatologists at three key diagnostic tasks: melanoma classification, melanoma classification using dermoscopy and carcinoma classification. We restrict the comparisons to image-based classification. We utilize a GoogleNet Inception v3 CNN architecture9 that was pre-trained on approximately 1.28 million images (1,000 object categories) from the 2014 ImageNet Large Scale Visual Recognition Challenge6, and train it on our dataset using transfer learning28. Figure 1 shows the working system. The CNN is trained using 757 disease classes. Our dataset is composed of dermatologist-labelled images organized in a tree-structured taxonomy of 2,032 diseases, in which the individual diseases form the leaf nodes. The images come from 18 different clinician-curated, open-access online repositories, as well as from clinical data from Stanford University Medical Center. Figure 2a shows a subset of the full taxonomy, which has been organized clinically and visually by medical experts. We split our dataset into 127,463 training and validation images and 1,942 biopsy-labelled test images. To take advantage of fine-grained information contained within the taxonomy structure, we develop an algorithm (Extended Data Table 1) to partition diseases into fine-grained training classes (for example, amelanotic melanoma and acrolentiginous melanoma). During inference, the CNN outputs a probability distribution over these fine classes. To recover the probabilities for coarser-level classes of interest (for example, melanoma) we sum the probabilities of their descendants (see Methods and Extended Data Fig. 1 for more details). We validate the effectiveness of the algorithm in two ways, using nine-fold cross-validation. [Slide montage of deep-learning papers across specialties: dermatology (this letter); cardiology: "Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network"; gynecology: deep learning for assessment and selection of human blastocysts after in vitro fertilization; oncology: "Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board"; emergency medicine: deep learning for out-of-hospital cardiac arrest (OHCA); nephrology; gastroenterology; radiology: "Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs"; neurology: "Automated deep neural network surveillance of cranial images for acute neurologic events"]
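The descendant-summing inference step described above is easy to make concrete. Below is a minimal Python sketch under a toy taxonomy: the two coarse classes, their leaf nodes, and the softmax output are hypothetical stand-ins for the paper's 757 training classes, not the authors' actual code.

```python
# Sketch: recover coarse-class probabilities by summing the probabilities
# of their fine-grained descendants. Taxonomy and probabilities are toy data.
TAXONOMY = {
    "melanoma": ["amelanotic melanoma", "acrolentiginous melanoma",
                 "lentigo melanoma"],
    "benign nevus": ["blue nevus", "halo nevus", "congenital nevus"],
}

def coarse_probabilities(fine_probs: dict) -> dict:
    """Sum fine-class probabilities up to each coarse class of interest."""
    return {
        coarse: sum(fine_probs.get(leaf, 0.0) for leaf in leaves)
        for coarse, leaves in TAXONOMY.items()
    }

# Hypothetical CNN softmax output over the fine (leaf) classes.
fine_probs = {
    "amelanotic melanoma": 0.25,
    "acrolentiginous melanoma": 0.30,
    "lentigo melanoma": 0.05,
    "blue nevus": 0.20,
    "halo nevus": 0.15,
    "congenital nevus": 0.05,
}

print(coarse_probabilities(fine_probs))
# ≈ {'melanoma': 0.60, 'benign nevus': 0.40}
```

The design intuition, per the letter, is that training on fine-grained leaves lets the network exploit within-class visual structure, while reporting still happens at the clinically meaningful coarse level.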
  • 6.
NATURE MEDICINE: ...and the algorithm led to the best accuracy, and the algorithm markedly sped up the review of slides35. This study is particularly notable, as the synergy of the combined pathologist and algorithm interpretation was emphasized instead of the pervasive clinician-versus-algorithm comparison.
Table 2 | FDA AI approvals are accelerating
Company | FDA Approval | Indication
Apple | September 2018 | Atrial fibrillation detection
Aidoc | August 2018 | CT brain bleed diagnosis
iCAD | August 2018 | Breast density via mammography
Zebra Medical | July 2018 | Coronary calcium scoring
Bay Labs | June 2018 | Echocardiogram EF determination
Neural Analytics | May 2018 | Device for paramedic stroke diagnosis
IDx | April 2018 | Diabetic retinopathy diagnosis
Icometrix | April 2018 | MRI brain interpretation
Imagen | March 2018 | X-ray wrist fracture diagnosis
Viz.ai | February 2018 | CT stroke diagnosis
Arterys | February 2018 | Liver and lung cancer (MRI, CT) diagnosis
MaxQ-AI | January 2018 | CT brain bleed diagnosis
Alivecor | November 2017 | Atrial fibrillation detection via Apple Watch
Arterys | January 2017 | MRI heart interpretation
FDA-approved AI medical devices, Nature Medicine 2019
• Zebra Medical Vision: May 2019, pneumothorax detection on X-ray; June 2019, intracranial hemorrhage on CT
• Aidoc: May 2019, pulmonary embolism on CT; June 2019, cervical spine fractures on CT
• GE Healthcare: Sep 2019, pneumothorax triage on X-ray machine
  • 7.
KFDA-approved AI medical devices (2018–2019)
1. VUNO Med Bone Age
2. Lunit Insight for lung nodule
3. JLK Inspection for cerebral infarction
4. Informeditec Neuro-I for dementia
5. Samsung Electronics for lung nodule
6. VUNO Med Deep Brain
7. Lunit Insight Mammography
8. JLK Inspection ATROSCAN
9. VUNO Chest X-ray
10. Deepnoid Deep Spine
11. JLK Inspection Lung CT (JLD-01A)
12. JLK Inspection Colonoscopy (JFD-01A)
13. JLK Inspection Gastroscopy (JFD-02A)
  • 8.
Artificial intelligence in medicine is not the future. It is already here.
  • 9.
Artificial intelligence in medicine is not the future. It is already here.
  • 10.
Wrong Questions: Who wins? Who loses? (x) Will it ‘replace’ doctors? (x)
  • 11.
Right Question: How can we practice better medicine through collaboration with AI?
  • 12.
The American Medical Association House of Delegates has adopted policies to keep the focus on advancing the role of augmented intelligence (AI) in enhancing patient care, improving population health, reducing overall costs, increasing value, and supporting professional satisfaction for physicians. Foundational policy, Annual 2018.
As a leader in American medicine, our AMA has a unique opportunity to ensure that the evolution of AI in medicine benefits patients, physicians and the health care community. To that end our AMA seeks to:
Leverage ongoing engagement in digital health and other priority areas for improving patient outcomes and physician professional satisfaction to help set priorities for health care AI
Identify opportunities to integrate practicing physicians’ perspectives into the development, design, validation and implementation of health care AI
Promote development of thoughtfully designed, high-quality, clinically validated health care AI that:
• Is designed and evaluated in keeping with best practices in user-centered design, particularly for physicians and other members of the health care team
• Is transparent
• Conforms to leading standards for reproducibility
• Identifies and takes steps to address bias and avoids introducing or exacerbating health care disparities, including when testing or deploying new AI tools on vulnerable populations
• Safeguards patients’ and other individuals’ privacy interests and preserves the security and integrity of personal information
Encourage education for patients, physicians, medical students, other health care professionals and health administrators to promote greater understanding of the promise and limitations of health care AI
Explore the legal implications of health care AI, such as issues of liability or intellectual property, and advocate for appropriate professional and governmental oversight for safe, effective, and equitable use of and access to health care AI
“Medical experts are working to determine the clinical applications of AI—work that will guide health care in the future. These experts, along with physicians, state and federal officials must find the path that ends with better outcomes for patients. We have to make sure the technology does not get ahead of our humanity and creativity as physicians.” —Gerald E. Harmon, MD, AMA Board of Trustees
Policy: Augmented intelligence in health care. https://www.ama-assn.org/system/files/2019-08/ai-2018-board-policy-summary.pdf
Augmented Intelligence, rather than Artificial Intelligence
  • 13.
Martin Duggan, “IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing”. Which capacities of human physicians can be augmented by AI?
  • 14.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, dermatology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, etc.
  • 15.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
  • 16.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
• Drug development
  • 17.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
• Drug development
  • 18.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, dermatology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
• Drug development
  • 19.
Jeopardy! IBM Watson defeated human quiz champions in 2011.
  • 20.
IBM Watson on Medicine. Watson learned:
• 600,000 pieces of medical evidence
• 2 million pages of text from 42 medical journals and clinical trials
• 69 guidelines, 61,540 clinical trials
• plus 1,500 lung cancer cases with physician notes, lab results and clinical research
• plus 14,700 hours of hands-on training
  • 22.
  • 24.
  • 25.
WFO in ASCO 2017: early experience with IBM WFO for lung and colorectal cancer treatment (Manipal Hospital in India). Performance of WFO in India. 2017 ASCO Annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
• Concordance between the treatment plan of WFO and the multidisciplinary tumor board.
• The concordance level, i.e., the proportions of Recommended / For consideration / Not recommended, differs across cancer types and stages. ASCO 2017
  • 26.
ORIGINAL ARTICLE Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board S. P. Somashekhar1*, M.-J. Sepúlveda2, S. Puglielli3, A. D. Norden3, E. H. Shortliffe4, C. Rohit Kumar1, A. Rauthan1, N. Arun Kumar1, P. Patil1, K. Rhee3 & Y. Ramya1 1 Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; 2 IBM Research (Retired), Yorktown Heights; 3 Watson Health, IBM Corporation, Cambridge; 4 Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA *Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer. Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in 2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO. Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases. Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P = 0.02; P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status was not found to affect concordance. Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for the breast cancer cases examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not. This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment decision making, especially at centers where expert breast cancer resources are limited. Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer, concordance, multidisciplinary tumor board Introduction Oncologists who treat breast cancer are challenged by a large and rapidly expanding knowledge base [1, 2]. As of October 2017, for example, there were 69 FDA-approved drugs for the treatment of breast cancer, not including combination treatment regimens [3].
The growth of massive genetic and clinical databases, along with computing systems to exploit them, will accelerate the speed of breast cancer treatment advances and shorten the cycle time for changes to breast cancer treatment guidelines [4, 5]. In addition, these information management challenges in cancer care are occurring in a practice environment where there is little time available for tracking and accessing relevant information at the point of care [6]. For example, a study that surveyed 1117 oncologists reported that on average 4.6 h per week were spent keeping up to date. © The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. Annals of Oncology 29: 418–423, 2018. doi:10.1093/annonc/mdx781. Published online 9 January 2018.
• Annals of Oncology, January 2018
• Concordance between WFO and the MTB for breast cancer treatment plans
• The first and, so far, only WFO concordance study published in a peer-reviewed journal
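The concordance metric used throughout these WFO studies reduces to a simple proportion over WFO's three-way designations. A minimal sketch, with hypothetical case labels standing in for real tumor-board data:

```python
# Sketch of the WFO concordance measure: a case counts as concordant when
# the tumor board's treatment falls in WFO's "recommended" or
# "for consideration" buckets. The case list below is invented.
CONCORDANT = {"recommended", "for consideration"}

def concordance(designations):
    """Proportion of cases whose WFO designation counts as concordant."""
    hits = sum(1 for d in designations if d in CONCORDANT)
    return hits / len(designations)

cases = ["recommended", "for consideration", "not recommended",
         "recommended", "recommended"]
print(f"Concordance: {concordance(cases):.0%}")  # Concordance: 80%
```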
  • 27.
[Figure 1. Treatment concordance between WFO and the MMDT overall and by stage (stacked proportions of Recommended / For consideration / Not recommended / Not available): overall (n=638), concordance 93%; stage I (n=61), 80%; stage II (n=262), 97%; stage III (n=191), 95%; stage IV (n=124), 86%. Figure 2. Treatment concordance between WFO and the MMDT by stage and receptor status: concordance ranged from 75% to 98% across non-metastatic and metastatic HR(+), HER2/neu(+), and triple-negative subgroups. HER2/neu, human epidermal growth factor receptor 2; HR, hormone receptor; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.]
• Concordance between the treatment plan of WFO and the multidisciplinary tumor board.
• The concordance level, i.e., the proportions of Recommended / For consideration / Not recommended, differs by cancer stage, receptor status, and metastatic status.
  • 28.
Empowering the Oncology Community for Cancer Care: Genomics, Oncology, Clinical Trial Matching. Watson Health’s oncology clients span more than 35 hospital systems. Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”. There’s more (even with some evidence).
  • 29.
IBM Watson Health: Watson for Clinical Trial Matching (CTM)
Current challenges: Searching across the eligibility criteria of clinical trials is time consuming and labor intensive. Fewer than 5% of adult cancer patients participate in clinical trials (1). 37% of sites fail to meet minimum enrollment targets, and 11% of sites fail to enroll a single patient (2).
The Watson solution: uses structured and unstructured patient data to quickly check eligibility across relevant clinical trials; provides eligible trial considerations ranked by relevance; increases speed to qualify patients.
Clinical investigators (opportunity): trials to patient: perform feasibility analysis for a trial, identify sites with the most potential for patient enrollment, and optimize inclusion/exclusion criteria in protocols. Faster, more efficient recruitment strategies and better designed protocols.
Point of care (offering): patient to trials: quickly find the right trial that a patient might be eligible for among hundreds of open trials. Improves patient care quality and consistency, and increases efficiency.
1. According to the National Comprehensive Cancer Network (NCCN). 2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf
  • 30.
• To process 90 patients against 3 breast cancer protocols (provided by Novartis):
• Clinical trial coordinator alone: 110 min
• With Watson CTM: 24 min (78% time reduction)
• Watson CTM excluded 94% of the patients automatically, reducing the workload. ASCO 2017
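The pre-screening idea behind CTM can be illustrated with plain rule-based filtering. This is not IBM's implementation; the patient fields, the trial, and its thresholds are invented for illustration:

```python
# Sketch of rule-based clinical-trial pre-screening: check structured
# patient fields against each trial's inclusion/exclusion rules.
patients = [
    {"id": "P1", "age": 54, "diagnosis": "breast cancer", "her2": True,  "ecog": 1},
    {"id": "P2", "age": 81, "diagnosis": "breast cancer", "her2": False, "ecog": 3},
]

trials = [
    {
        "name": "HER2+ adjuvant trial",   # hypothetical protocol
        "include": lambda p: p["diagnosis"] == "breast cancer" and p["her2"],
        "exclude": lambda p: p["age"] > 75 or p["ecog"] >= 3,
    },
]

for p in patients:
    eligible = [t["name"] for t in trials
                if t["include"](p) and not t["exclude"](p)]
    print(p["id"], "->", eligible or "no matching trials")
# P1 -> ['HER2+ adjuvant trial']
# P2 -> no matching trials
```

In practice the hard part, and what CTM is sold on, is extracting these structured fields from unstructured notes before any rule can fire.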
  • 31.
Watson Genomics Overview
Watson Genomics content: 20+ content sources, including medical articles (23 million), drug information, clinical trial information, and genomic information.
Pipeline: case sequenced (VCF / MAF, Log2, DGE) → encryption → molecular profile analysis → pathway analysis → drug analysis service → analysis, reports, and visualizations.
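The front of such a pipeline is conceptually simple: scan the case's variant calls for genes on a curated actionable list. A minimal sketch, assuming a toy gene list and a simplified VCF whose INFO field carries a GENE= annotation; this stands in for real annotation tooling and is not Watson's actual parser:

```python
# Sketch: flag variants in a VCF whose annotated gene is on a curated
# actionable-gene list. Gene list and GENE= INFO tag are assumptions.
ACTIONABLE_GENES = {"CDKN2A", "CDKN2B", "EGFR", "MET", "NF1", "PIK3R1", "PTEN"}

def actionable_variants(vcf_path):
    hits = []
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):   # skip VCF header lines
                continue
            chrom, pos, _id, ref, alt, _qual, _filt, info = \
                line.rstrip("\n").split("\t")[:8]
            # Parse key=value pairs out of the INFO column.
            tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
            if tags.get("GENE") in ACTIONABLE_GENES:
                hits.append((tags["GENE"], chrom, pos, ref, alt))
    return hits

# Hypothetical usage:
# for gene, chrom, pos, ref, alt in actionable_variants("tumor.vcf"):
#     print(gene, f"{chrom}:{pos}", f"{ref}>{alt}")
```

The differentiating work in WGA sits downstream of this filter: linking each hit to literature, approved drugs, and open trials.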
  • 32.
Kazimierz O. Wrzeszczynski, PhD* Mayu O. Frank, NP, MS* Takahiko Koyama, PhD* Kahn Rhrissorrakrai, PhD* Nicolas Robine, PhD Filippo Utro, PhD Anne-Katrin Emde, PhD Bo-Juen Chen, PhD Kanika Arora, MS Minita Shah, MS Vladimir Vacic, PhD Raquel Norel, PhD Erhan Bilal, PhD Ewa A. Bergmann, MSc Julia L. Moore Vogel, PhD Jeffrey N. Bruce, MD Andrew B. Lassman, MD Peter Canoll, MD, PhD Christian Grommes, MD Steve Harvey, BS Laxmi Parida, PhD Vanessa V. Michelini, BS Michael C. Zody, PhD Vaidehi Jobanputra, PhD Ajay K. Royyuru, PhD Robert B. Darnell, MD, Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma ABSTRACT Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164; doi: 10.1212/NXG.0000000000000164 GLOSSARY CNV = copy number variant; EGFR = epidermal growth factor receptor; GATK = Genome Analysis Toolkit; GBM = glioblastoma; IRB = institutional review board; NLP = Natural Language Processing; NYGC = New York Genome Center; RNA-seq = RNA sequencing; SNV = single nucleotide variant; SV = structural variant; TCGA = The Cancer Genome Atlas; TPM = transcripts per million; VCF = variant call file; VUS = variants of uncertain significance; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing. The clinical application of next-generation sequencing technology to cancer diagnosis and treatment is in its early stages.1–3 An initial implementation of this technology has been in targeted panels, where subsets of cancer-relevant and/or highly actionable genes are scrutinized for potentially actionable mutations. This approach has been widely adopted, offering high redundancy of sequence coverage for the small number of sites of known clinical utility at relatively Wrzeszczynski KO, Neurol Genet. 2017
• To analyze a glioblastoma tumor specimen, 3 different platforms were compared on potentially actionable calls:
• 1. Human experts (bioinformaticians and oncologists): WGS and RNA-seq
• 2. Watson Genome Analysis: WGS and RNA-seq
• 3. FoundationOne (a commercial targeted panel)
  • 33.
Table 3. List of variants identified as actionable by 3 different platforms. For each variant: whether NYGC / WGA / FO identified it, and the drugs each platform associated (NYGC; WGA; FO).
CDKN2A deletion: Yes / Yes / Yes. Drugs: palbociclib, LY2835219, LEE011; palbociclib, LY2835219; clinical trial.
CDKN2B deletion: Yes / Yes / Yes. Drugs: palbociclib, LY2835219, LEE011; palbociclib, LY2835219; clinical trial.
EGFR gain (whole arm): Yes / — / —. Drugs: cetuximab; —; —.
ERG missense P114Q: Yes / Yes / —. Drugs: RI-EIP; RI-EIP; —.
FGFR3 missense L49V: Yes / VUS / —. Drugs: TK-1258; —; —.
MET amplification: Yes / Yes / Yes. Drugs: INC280; crizotinib, cabozantinib; crizotinib, cabozantinib.
MET frame shift R755fs: Yes / — / —. Drugs: INC280; —; —.
MET exon skipping: Yes / — / —. Drugs: INC280; —; —.
NF1 deletion: Yes / — / —. Drugs: MEK162; —; —.
NF1 nonsense R461*: Yes / Yes / Yes. Drugs: MEK162; MEK162, cobimetinib, trametinib, GDC-0994; everolimus, temsirolimus, trametinib.
PIK3R1 insertion R562_M563insI: Yes / Yes / —. Drugs: BKM120; BKM120, LY3023414; —.
PTEN loss (whole arm): Yes / — / —. Drugs: everolimus, AZD2014; —; —.
STAG2 frame shift R1012fs: Yes / Yes / Yes. Drugs: veliparib, clinical trial; olaparib; —.
DNMT3A splice site 2083-1G>C: — / — / Yes.
TERT promoter −146C>T: Yes / — / Yes.
ABL2 missense D716N, mTOR missense H1687R, NPM1 missense E169D, NTRK1 missense G18E, PTCH1 missense P1250R, TSC1 missense G1035S: germline (NYGC) / NA (WGA) / VUS (FO).
Abbreviations: FO = FoundationOne; NYGC = New York Genome Center; RNA-seq = RNA sequencing; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing. Genes, variant description, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by FO as variants of uncertain significance (VUS) were identified by the NYGC as germline variants.
• WGA analysis vastly accelerated the time to discovery of potentially actionable variants from the VCF files.
• WGA was able to provide reports of potentially clinically actionable insights within 10 minutes, while human analysis of this patient’s VCF file took an estimated 160 hours of person-time. Wrzeszczynski KO, Neurol Genet. 2017
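The three-platform comparison in Table 3 is essentially set arithmetic over each platform's actionable-gene calls. A sketch with illustrative gene sets loosely transcribed from the table (not an exhaustive or authoritative encoding of it):

```python
# Sketch: compare actionable-gene calls across the three platforms with
# set operations. Gene sets are illustrative, read loosely from Table 3.
nygc = {"CDKN2A", "CDKN2B", "EGFR", "MET", "NF1", "PIK3R1", "PTEN",
        "STAG2", "TERT"}
wga  = {"CDKN2A", "CDKN2B", "MET", "NF1", "PIK3R1", "STAG2"}
fo   = {"CDKN2A", "CDKN2B", "MET", "NF1", "STAG2", "DNMT3A", "TERT"}

print("called by all three:", sorted(nygc & wga & fo))
print("NYGC only:", sorted(nygc - wga - fo))   # e.g. EGFR gain, PTEN loss
print("FO only:", sorted(fo - nygc - wga))     # e.g. DNMT3A splice site
```

The pattern matches the study's point: the WGS/RNA-based analyses (human and WGA) surface calls the targeted panel misses, while the panel contributes a handful of its own.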
  • 34.
Enhancing Next-Generation Sequencing-Guided Cancer Care Through Cognitive Computing NIRALI M. PATEL,a,b,† VANESSA V. MICHELINI,f,† JEFF M. SNELL,a,c SAIANAND BALU,a ALAN P. HOYLE,a JOEL S. PARKER,a,c MICHELE C. HAYWARD,a DAVID A. EBERHARD,a,b ASHLEY H. SALAZAR,a PATRICK MCNEILLIE,g JIA XU,g CLAUDIA S. HUETTNER,g TAKAHIKO KOYAMA,h FILIPPO UTRO,h KAHN RHRISSORRAKRAI,h RAQUEL NOREL,h ERHAN BILAL,h AJAY ROYYURU,h LAXMI PARIDA,h H. SHELTON EARP,a,d JUNEKO E. GRILLEY-OLSON,a,d D. NEIL HAYES,a,d STEPHEN J. HARVEY,i NORMAN E. SHARPLESS,a,c,d WILLIAM Y. KIM a,c,d,e a Lineberger Comprehensive Cancer Center, b Department of Pathology and Laboratory Medicine, c Department of Genetics, d Department of Medicine, and e Department of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; f IBM Watson Health, Boca Raton, Florida, USA; g IBM Watson Health, Cambridge, Massachusetts, USA; h IBM Research, Yorktown Heights, New York, USA; i IBM Watson Health, Herndon, Virginia, USA † Contributed equally. Disclosures of potential conflicts of interest may be found at the end of this article. Key Words: Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine ABSTRACT Background. Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human “molecular tumor boards” (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. Materials and Methods. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Results. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. Conclusion. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The Oncologist 2018;23:179–185 Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. INTRODUCTION Cancer Diagnostics and Molecular Pathology
• 1,018 patient cases (targeted exon sequencing), previously analyzed by the UNCseq informatics pipeline and MTB
• Watson for Genomics identified additional actionable genetic alterations in 323 (32%) patients
• novel findings were mostly due to recent openings of clinical trials
• this analysis took < 3 min per case. Patel NM, Oncologist. 2018
  • 35.
Enhancing Next-Generation Sequencing-Guided Cancer Care Through Cognitive Computing NIRALI M. PATEL,a,b,† VANESSA V. MICHELINI,f,† JEFF M. SNELL,a,c SAIANAND BALU,a ALAN P. HOYLE,a JOEL S. PARKER,a,c MICHELE C. HAYWARD,a DAVID A. EBERHARD,a,b ASHLEY H. SALAZAR,a PATRICK MCNEILLIE,g JIA XU,g CLAUDIA S. HUETTNER,g TAKAHIKO KOYAMA,h FILIPPO UTRO,h KAHN RHRISSORRAKRAI,h RAQUEL NOREL,h ERHAN BILAL,h AJAY ROYYURU,h LAXMI PARIDA,h H. SHELTON EARP,a,d JUNEKO E. GRILLEY-OLSON,a,d D. NEIL HAYES,a,d STEPHEN J. HARVEY,i NORMAN E. SHARPLESS,a,c,d WILLIAM Y. KIM a,c,d,e a Lineberger Comprehensive Cancer Center, b Department of Pathology and Laboratory Medicine, c Department of Genetics, d Department of Medicine, and e Department of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; f IBM Watson Health, Boca Raton, Florida, USA; g IBM Watson Health, Cambridge, Massachusetts, USA; h IBM Research, Yorktown Heights, New York, USA; i IBM Watson Health, Herndon, Virginia, USA † Contributed equally. Disclosures of potential conflicts of interest may be found at the end of this article. Key Words: Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine ABSTRACT Background. Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human “molecular tumor boards” (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. Materials and Methods. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Results. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. Conclusion. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The Oncologist 2018;23:179–185 Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. INTRODUCTION Cancer Diagnostics and Molecular Pathology
[Figure 3. Sankey diagram of the flow of the UNCseq molecular tumor board (MTB) and WfG comparison. Node labels: UNCseq patients (1,018); actionable gene per UNC MTB (703); no actionable gene (315); WfG actionable gene (231 / 96 / 327); WfG-identified actionable gene approved by CCGR (323); potential to change therapy (47); no evidence of disease (145); lost to follow-up (29); withdrew from study (4); deceased (98). Caption: Of the 1,018 patients previously analyzed by the University of North Carolina (UNC) MTB, 703 were determined to have alterations in genes that met the UNC MTB definition of actionability (A) and 315 did not (B). The WfG analysis suggested that an additional eight genes not previously defined as actionable should be …]
• 1,018 patient cases (targeted exon sequencing), previously analyzed by the UNCseq informatics pipeline and MTB
• Watson for Genomics identified additional actionable genetic alterations in 323 (32%) patients
• novel findings were mostly due to recent openings of clinical trials
• this analysis took < 3 min per case. Patel NM, Oncologist. 2018
  • 36.
AI in Medicine
• Analysis of ‘complicated’ medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, etc.
• Drug development
  • 37.
  • 38.
  • 40.
How to show clinical impact?
• Additive effect with human physicians (accuracy, time, cost…)
• Improvement in patient outcomes
• Prospective RCTs
• Validation in real-world clinical settings
• Something beyond human visual perception
  • 41.
NATURE MEDICINE: ...and the algorithm led to the best accuracy, and the algorithm markedly sped up the review of slides35. This study is particularly notable, as the synergy of the combined pathologist and algorithm interpretation was emphasized instead of the pervasive clinician-versus-algorithm comparison.
Table 2 | FDA AI approvals are accelerating
Company | FDA Approval | Indication
Apple | September 2018 | Atrial fibrillation detection
Aidoc | August 2018 | CT brain bleed diagnosis
iCAD | August 2018 | Breast density via mammography
Zebra Medical | July 2018 | Coronary calcium scoring
Bay Labs | June 2018 | Echocardiogram EF determination
Neural Analytics | May 2018 | Device for paramedic stroke diagnosis
IDx | April 2018 | Diabetic retinopathy diagnosis
Icometrix | April 2018 | MRI brain interpretation
Imagen | March 2018 | X-ray wrist fracture diagnosis
Viz.ai | February 2018 | CT stroke diagnosis
Arterys | February 2018 | Liver and lung cancer (MRI, CT) diagnosis
MaxQ-AI | January 2018 | CT brain bleed diagnosis
Alivecor | November 2017 | Atrial fibrillation detection via Apple Watch
Arterys | January 2017 | MRI heart interpretation
FDA-approved AI medical devices, Nature Medicine 2019
• Zebra Medical Vision: May 2019, pneumothorax detection on X-ray; June 2019, intracranial hemorrhage on CT
• Aidoc: May 2019, pulmonary embolism on CT; June 2019, cervical spine fractures on CT
• GE Healthcare: Sep 2019, pneumothorax triage on X-ray machine
• The FDA approved at least 30 AI medical devices over the last 3 years.
• Most of the devices are intended to analyze medical images. Nature Medicine 2019
  • 42.
KFDA-approved AI medical devices (2018–2019)
1. VUNO Med Bone Age
2. Lunit Insight for lung nodule
3. JLK Inspection for cerebral infarction
4. Informeditec Neuro-I for dementia
5. Samsung Electronics for lung nodule
6. VUNO Med Deep Brain
7. Lunit Insight Mammography
8. JLK Inspection ATROSCAN
9. VUNO Chest X-ray
10. Deepnoid Deep Spine
11. JLK Inspection Lung CT (JLD-01A)
12. JLK Inspection Colonoscopy (JFD-01A)
13. JLK Inspection Gastroscopy (JFD-02A)
  • 43.
KFDA-approved AI medical devices (2018–2019)
1. VUNO Med Bone Age
2. Lunit Insight for lung nodule
3. JLK Inspection for cerebral infarction
4. Informeditec Neuro-I for dementia
5. Samsung Electronics for lung nodule
6. VUNO Med Deep Brain
7. Lunit Insight Mammography
8. JLK Inspection ATROSCAN
9. VUNO Chest X-ray
10. Deepnoid Deep Spine
11. JLK Inspection Lung CT (JLD-01A)
12. JLK Inspection Colonoscopy (JFD-01A)
13. JLK Inspection Gastroscopy (JFD-02A)
• KFDA (MFDS) approved at least 13 AI medical devices over the last 2 years.
• All of these devices are intended to analyze medical images.
  • 44.
  • 45.
Radiology
1. CAD: Computer-Aided Detection
2. Triage: prioritization of critical cases
3. Image-driven biomarkers
  • 46.
Radiology
1. CAD: Computer-Aided Detection
2. Triage: prioritization of critical cases
3. Image-driven biomarkers
  • 47.
ORIGINAL RESEARCH • THORACIC IMAGING
Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P. (e-mail: cmpark.morphius@gmail.com). Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002). *J.G.N. and S.P. contributed equally to this work. Conflicts of interest are listed at the end of this article. Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal-to-nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18–99 years]; 15,446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and the jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD ranged from 0.92 to 0.99 (AUROC) and from 0.831 to 0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader. ©RSNA, 2018. Online supplemental material is available for this article.
  • 48.
ORIGINAL RESEARCH • THORACIC IMAGING
Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD. Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal-to-nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18–99 years]; 15,446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and the jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD ranged from 0.92 to 0.99 (AUROC) and from 0.831 to 0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader. ©RSNA, 2018. Online supplemental material is available for this article.
• 43,292 chest PA radiographs (normal:nodule = 34,067:9,225)
• labeled/annotated by 13 board-certified radiologists
• DLAD was validated on 1 internal + 4 external datasets
• Classification / lesion localization
• AI vs. physician vs. physician + AI
• Compared various levels of physicians: non-radiology physicians / radiology residents / board-certified radiologists / thoracic radiologists
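For the radiograph-level comparison, the headline metric is the AUROC over per-image malignancy scores. A minimal sketch with scikit-learn on synthetic labels and scores; the JAFROC FOM, which additionally credits lesion localization, needs dedicated observer-study software and is not shown:

```python
# Sketch: compute a radiograph-classification AUROC on synthetic data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)          # 1 = nodule radiograph
# Hypothetical per-image model scores: noisy but correlated with the label.
y_score = y_true * 0.6 + rng.normal(0.2, 0.25, size=200)

print(f"AUROC: {roc_auc_score(y_true, y_score):.2f}")
```

In a reader study like this one, the same computation is run once per observer (Test 1), once per observer with AI assistance (Test 2), and once for the standalone algorithm, and the resulting AUROCs are then compared statistically.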
  • 49.
Nam et al. Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adenocarcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an elevation in its confidence by eight radiologists. Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two radiologists and an elevated confidence level of the nodule by two radiologists.
  • 50.
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
Per observer: Test 1 classification AUROC | Test 1 detection JAFROC FOM | P, DLAD vs. Test 1 (classification / detection) | Test 2 classification AUROC | Test 2 detection JAFROC FOM | P, Test 1 vs. Test 2 (classification / detection). Test 1 is the physician alone; Test 2 is the physician assisted by DLAD.
Nonradiology physicians
Observer 1: 0.77 | 0.716 | <.001 / <.001 | 0.91 | 0.853 | <.001 / <.001
Observer 2: 0.78 | 0.657 | <.001 / <.001 | 0.90 | 0.846 | <.001 / <.001
Observer 3: 0.80 | 0.700 | <.001 / <.001 | 0.88 | 0.783 | <.001 / <.001
Group (detection FOM): Test 1, 0.691 (P < .001*); Test 2, 0.828 (P < .001*)
Radiology residents
Observer 4: 0.78 | 0.767 | <.001 / <.001 | 0.80 | 0.785 | .02 / .03
Observer 5: 0.86 | 0.772 | .001 / <.001 | 0.91 | 0.837 | .02 / <.001
Observer 6: 0.86 | 0.789 | .05 / .002 | 0.86 | 0.799 | .08 / .54
Observer 7: 0.84 | 0.807 | .01 / .003 | 0.91 | 0.843 | .003 / .02
Observer 8: 0.87 | 0.797 | .10 / .003 | 0.90 | 0.845 | .03 / .001
Observer 9: 0.90 | 0.847 | .52 / .12 | 0.92 | 0.867 | .04 / .03
Group (detection FOM): Test 1, 0.790 (P < .001*); Test 2, 0.867 (P < .001*)
Board-certified radiologists
Observer 10: 0.87 | 0.836 | .05 / .01 | 0.90 | 0.865 | .004 / .002
Observer 11: 0.83 | 0.804 | <.001 / <.001 | 0.84 | 0.817 | .03 / .04
Observer 12: 0.88 | 0.817 | .18 / .005 | 0.91 | 0.841 | .01 / .01
Observer 13: 0.91 | 0.824 | >.99 / .02 | 0.92 | 0.836 | .51 / .24
Observer 14: 0.88 | 0.834 | .14 / .03 | 0.88 | 0.840 | .87 / .23
Group (detection FOM): Test 1, 0.821 (P = .02*); Test 2, 0.840 (P = .01*)
Thoracic radiologists
Observer 15: 0.94 | 0.856 | .15 / .21 | 0.96 | 0.878 | .08 / .03
Observer 16: 0.92 | 0.854 | .60 / .17 | 0.93 | 0.872 | .34 / .02
Observer 17: 0.86 | 0.820 | .02 / .01 | 0.88 | 0.838 | .14 / .12
Observer 18: 0.84 | 0.800 | <.001 / <.001 | 0.87 | 0.827 | .02 / .02
Group (detection FOM): Test 1, 0.833 (P = .08*); Test 2, 0.854 (P < .001*)
Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecology …
  • 51.
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
Per observer: Test 1 classification AUROC | Test 1 detection JAFROC FOM | P, DLAD vs. Test 1 (classification / detection) | Test 2 classification AUROC | Test 2 detection JAFROC FOM | P, Test 1 vs. Test 2 (classification / detection). Test 1 is the physician alone; Test 2 is the physician assisted by DLAD.
Nonradiology physicians
Observer 1: 0.77 | 0.716 | <.001 / <.001 | 0.91 | 0.853 | <.001 / <.001
Observer 2: 0.78 | 0.657 | <.001 / <.001 | 0.90 | 0.846 | <.001 / <.001
Observer 3: 0.80 | 0.700 | <.001 / <.001 | 0.88 | 0.783 | <.001 / <.001
Group (detection FOM): Test 1, 0.691 (P < .001*); Test 2, 0.828 (P < .001*)
Radiology residents
Observer 4: 0.78 | 0.767 | <.001 / <.001 | 0.80 | 0.785 | .02 / .03
Observer 5: 0.86 | 0.772 | .001 / <.001 | 0.91 | 0.837 | .02 / <.001
Observer 6: 0.86 | 0.789 | .05 / .002 | 0.86 | 0.799 | .08 / .54
Observer 7: 0.84 | 0.807 | .01 / .003 | 0.91 | 0.843 | .003 / .02
Observer 8: 0.87 | 0.797 | .10 / .003 | 0.90 | 0.845 | .03 / .001
Observer 9: 0.90 | 0.847 | .52 / .12 | 0.92 | 0.867 | .04 / .03
Group (detection FOM): Test 1, 0.790 (P < .001*); Test 2, 0.867 (P < .001*)
Board-certified radiologists
Observer 10: 0.87 | 0.836 | .05 / .01 | 0.90 | 0.865 | .004 / .002
Observer 11: 0.83 | 0.804 | <.001 / <.001 | 0.84 | 0.817 | .03 / .04
Observer 12: 0.88 | 0.817 | .18 / .005 | 0.91 | 0.841 | .01 / .01
Observer 13: 0.91 | 0.824 | >.99 / .02 | 0.92 | 0.836 | .51 / .24
Observer 14: 0.88 | 0.834 | .14 / .03 | 0.88 | 0.840 | .87 / .23
Group (detection FOM): Test 1, 0.821 (P = .02*); Test 2, 0.840 (P = .01*)
Thoracic radiologists
Observer 15: 0.94 | 0.856 | .15 / .21 | 0.96 | 0.878 | .08 / .03
Observer 16: 0.92 | 0.854 | .60 / .17 | 0.93 | 0.872 | .34 / .02
Observer 17: 0.86 | 0.820 | .02 / .01 | 0.88 | 0.838 | .14 / .12
Observer 18: 0.84 | 0.800 | <.001 / <.001 | 0.87 | 0.827 | .02 / .02
Group (detection FOM): Test 1, 0.833 (P = .08*); Test 2, 0.854 (P < .001*)
Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecology …
• As ‘a second reader’, AI improved physicians’ performance:
• classification: 17 of 18 observers (15 of 18 at P < 0.05)
• nodule detection: 18 of 18 observers (14 of 18 at P < 0.05)
  • 52.
[Bar chart: human only vs. human + algorithm performance for each of the five reader groups. Clinical study results (CXR nodule), Radiology. Courtesy of Lunit, Inc.]
• Improvement with AI as ‘a second reader’ was most substantial for non-radiology physicians, compared with board-certified radiologists or radiology residents.
  • 53.
Radiology
1. CAD: Computer-Aided Detection
2. Triage: prioritization of critical cases
3. Image-driven biomarkers: “something beyond human visual perception”
  • 54.
  • 55.
ORIGINAL RESEARCH • BREAST IMAGING
Since the creation of the Gail model in 1989 (1), risk models have supported risk-adjusted screening and prevention, and their continued evolution has been a central pillar of breast cancer research (1–8). Previous research (2,3) explored multiple risk factors related to hormonal and genetic information. Mammographic breast density, which relates to the amount of fibroglandular tissue in a woman’s breast, is a risk factor that received substantial attention. Brentnall et al (8) incorporated mammographic breast density into the Gail risk model and Tyrer-Cuzick model (TC), improving their areas under the receiver operating characteristic curve (AUCs) from 0.55 and 0.57 to 0.59 and 0.61, respectively. The use of breast density as a proxy for the detailed information in a mammogram is inherently limited: women with similar breast density can have mammograms with vastly different outcomes. Whereas previous studies (10–12) explored automated methods to assess breast density, these efforts reduced the mammographic input into a few statistics largely related to volume of glandular tissue that are not sufficient to distinguish patients who will and will not develop breast cancer. We hypothesize that there are subtle but informative cues on mammograms that may not be discernible by humans or simple volume-of-density measurements, and deep learning (DL) can leverage these cues to yield improved risk models. Therefore, we developed a DL model that operates over a full-field mammographic image to assess a patient’s future breast cancer risk. Rather than manually identifying discriminative image patterns,
A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction
From the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 32 Vassar St, 32-G484, Cambridge, MA 02139 (A.Y., T.S., T.P., R.B.); and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (C.L.). Received November 28, 2018; revision requested January 18, 2019; revision received March 14; accepted March 18. Address correspondence to A.Y. (e-mail: adamyala@csail.mit.edu). Conflicts of interest are listed at the end of this article. See also the editorial by Sitek and Wolfe in this issue. Radiology 2019; 00:1–7 https://doi.org/10.1148/radiol.2019182716
Background: Mammographic density improves the accuracy of breast cancer risk models. However, the use of breast density is limited by subjective assessment, variation across radiologists, and restricted data. A mammography-based deep learning (DL) model may provide more accurate risk prediction.
Purpose: To develop a mammography-based DL breast cancer risk model that is more accurate than established clinical breast cancer risk models.
Materials and Methods: This retrospective study included 88,994 consecutive screening mammograms in 39,571 women between January 1, 2009, and December 31, 2012. For each patient, all examinations were assigned to either training, validation, or test sets, resulting in 71,689, 8,554, and 8,751 examinations, respectively. Cancer outcomes were obtained through linkage to a regional tumor registry. By using risk factor information from patient questionnaires and electronic medical records review, three models were developed to assess breast cancer risk within 5 years: a risk-factor-based logistic regression model (RF-LR) that used traditional risk factors, a DL model (image-only DL) that used mammograms alone, and a hybrid DL model that used both traditional risk factors and mammograms.
Comparisons were made to an established breast cancer risk model that included breast density (Tyrer-Cuzick model, version 8 [TC]). Model performance was compared by using areas under the receiver operating characteristic curve (AUCs) with the DeLong test (P < .05).
Results: The test set included 3,937 women, aged 56.20 years ± 10.04. Hybrid DL and image-only DL showed AUCs of 0.70 (95% confidence interval [CI]: 0.66, 0.75) and 0.68 (95% CI: 0.64, 0.73), respectively. RF-LR and TC showed AUCs of 0.67 (95% CI: 0.62, 0.72) and 0.62 (95% CI: 0.57, 0.66), respectively. Hybrid DL showed a significantly higher AUC (0.70) than TC (0.62; P < .001) and RF-LR (0.67; P = .01).
Conclusion: Deep learning models that use full-field mammograms yield substantially improved risk discrimination compared with the Tyrer-Cuzick (version 8) model. ©RSNA, 2019. Online supplemental material is available for this article. Radiology 2019
  • 56.
A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction (results)
The improvement of hybrid DL over TC was significant (P < .01). For patients without a family history of breast or ovarian cancer, hybrid DL and image-only DL showed similar discrimination accuracies.
Test-set performance by model (AUC, 95% CI): TC 0.62 (0.57, 0.66); RF-LR 0.67 (0.62, 0.72); image-only DL 0.68 (0.64, 0.73); hybrid DL 0.70 (0.66, 0.75). In total there were 3,937 patients and 8,751 examinations.
Figure 2: Receiver operating characteristic curves of all models on the test set; all P values are comparisons with Tyrer-Cuzick version 8 (TCv8). DL = deep learning; hybrid DL = DL model that uses both imaging and the traditional risk factors from the risk factor logistic regression; RF-LR = risk factor logistic regression.
Table 3: Test-set 5-year risk by ethnicity: 3,157 white patients (233 cancers) and 202 African American patients (424 examinations, 11 cancers).
• A deep learning model based on the mammography image predicts the onset of breast cancer in the next 5 years better than existing risk-factor-based models.
• Image-only DL and hybrid DL both showed better performance than the risk-factor-based models.
Models compared:
1. TC model (existing risk-based model)
2. Logistic regression based on risk factors (strong control)
3. Deep learning model based only on the mammography image
4. Hybrid of 2+3 (both risk factors and DL of the mammography image); a minimal sketch of this fusion follows below.
Radiology 2019
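To make the "hybrid" idea concrete, here is a minimal PyTorch sketch of fusing a CNN image embedding with tabular risk factors before a shared classifier. All module names and layer sizes here are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class HybridRiskModel(nn.Module):
    """Toy hybrid model: a CNN embedding of a mammogram is concatenated
    with traditional risk factors, in the spirit of the 'hybrid DL' model
    above. Layer sizes are illustrative, not the paper's architecture."""
    def __init__(self, n_risk_factors: int, embed_dim: int = 128):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim + n_risk_factors, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit for 5-year cancer risk
        )

    def forward(self, image, risk_factors):
        z = self.image_encoder(image)            # (B, embed_dim)
        x = torch.cat([z, risk_factors], dim=1)  # fuse image + tabular inputs
        return self.classifier(x)

model = HybridRiskModel(n_risk_factors=10)
logit = model(torch.randn(2, 1, 256, 256), torch.randn(2, 10))
risk = torch.sigmoid(logit)  # predicted 5-year risk probability
```

The design choice mirrors the comparison above: dropping the `risk_factors` input recovers an image-only model, and dropping the image encoder recovers a plain risk-factor classifier.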
  • 57.
  • 58.
Pathology
1. CAD: Computer-Aided Detection
2. Triage: Prioritization of Critical Cases
3. Image-driven Biomarker
  • 59.
Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens (Elmore et al. JAMA 2015)
Figure 4. Participating pathologists' interpretations of each of the 240 breast biopsy test cases. Panels: A, benign without atypia (72 cases, 2,070 total interpretations); B, atypia (72 cases, 2,070 total interpretations); C, DCIS (73 cases, 2,097 total interpretations); D, invasive carcinoma (23 cases, 663 total interpretations). DCIS indicates ductal carcinoma in situ.
• Blind test interpreting breast biopsy specimens: 240 cases, 115 board-certified pathologists
• Diagnostic concordance among the pathologists was only about 75%
  • 60.
  • 61.
ISBI Grand Challenge on Cancer Metastases Detection in Lymph Nodes
  • 62.
  • 63.
International Symposium on Biomedical Imaging 2016: H&E Image Processing Framework
• Train: sample patches from tumor and normal regions of whole-slide images to build training data for a convolutional neural network.
• Test: crop overlapping image patches from the whole-slide image, score each patch with the CNN for P(tumor), and stitch the scores into a tumor probability map (0.0 to 1.0).
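A schematic sketch of that sliding-window inference step, assuming a patch-level classifier `net` that maps a (1, 3, P, P) tensor to a single logit (the network itself and the patch/stride sizes are assumptions for illustration):

```python
import numpy as np
import torch

def tumor_probability_map(slide, net, patch=256, stride=128):
    """Score overlapping patches of a slide image (H, W, 3 floats in [0,1])
    with a patch-level CNN and average the per-patch P(tumor) values back
    into a per-pixel heatmap, as in the framework sketched above."""
    H, W, _ = slide.shape
    prob = np.zeros((H, W))
    count = np.zeros((H, W))
    net.eval()
    with torch.no_grad():
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                tile = slide[y:y + patch, x:x + patch]
                t = torch.from_numpy(tile).permute(2, 0, 1).float()[None]
                p = torch.sigmoid(net(t)).item()   # P(tumor) for this patch
                prob[y:y + patch, x:x + patch] += p
                count[y:y + patch, x:x + patch] += 1
    return prob / np.maximum(count, 1)             # averaged tumor prob. map
```

Overlapping strides smooth the map at the cost of more forward passes; in practice patches are usually batched rather than scored one at a time.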
  • 64.
  • 65.
Assisting Pathologists in Detecting Cancer with Deep Learning
• Algorithms need to be incorporated in a way that complements the pathologist's workflow.
• Algorithms could improve the efficiency and consistency of pathologists.
• For example, pathologists could reduce their false negative rates (the percentage of undetected tumors) by reviewing the top-ranked predicted tumor regions, including up to 8 false positive regions per slide, as in the sketch below.
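One way to operationalize "review the top-ranked predicted tumor regions" is to extract the K highest peaks from the probability map above with greedy non-maximum suppression; the function below is a hypothetical sketch (the budget of 8 regions comes from the slide, the suppression radius is an assumption):

```python
import numpy as np

def top_regions(prob_map, k=8, suppress=256):
    """Return up to k (y, x, score) peaks from a tumor probability map,
    greedily suppressing the neighborhood of each pick so the k regions
    presented to the pathologist are spatially distinct."""
    heat = prob_map.copy()
    picks = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(heat), heat.shape)
        if heat[y, x] <= 0:
            break                       # nothing left worth reviewing
        picks.append((y, x, float(heat[y, x])))
        y0, y1 = max(0, y - suppress), y + suppress
        x0, x1 = max(0, x - suppress), x + suppress
        heat[y0:y1, x0:x1] = 0          # suppress this peak's neighborhood
    return picks
```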
  • 66.
Matchmaking in heaven? Sensitivity of AI + Specificity of Human
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
• Google's AI improved substantially in sensitivity (92.9%, 88.5% on the two evaluation sets).
• Pathologists showed almost 100% specificity.
• Pathologists and AI are both good at detection, but in different ways.
• This suggests a potential additive effect on accuracy, efficiency, and consistency, as the back-of-envelope calculation below illustrates.
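A crude way to quantify the claimed complementarity is to treat the AI as a second reader whose flags the pathologist vets. Under an independence assumption, this is purely illustrative arithmetic, not the paper's analysis, and the human sensitivity and confirmation rate below are assumed values:

```python
# Illustrative second-reader arithmetic, assuming independent errors.
ai_sens = 0.929      # AI sensitivity (from the slide above)
human_sens = 0.83    # unassisted human sensitivity (assumed value)
human_spec = 0.997   # near-perfect human specificity (slide above)
confirm = 0.9        # chance the human confirms a correct AI flag (assumed)

# The human catches the tumor, or misses it but the AI flags it
# and the human agrees on review.
assisted_sens = human_sens + (1 - human_sens) * ai_sens * confirm
# The human vetoes AI false positives, so specificity stays near the human's.
assisted_spec = human_spec

print(f"assisted sensitivity ~= {assisted_sens:.3f}")  # ~0.97 under these assumptions
```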
  • 67.
  • 68.
Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer
David F. Steiner, MD, PhD, Robert MacDonald, PhD, Yun Liu, PhD, Peter Truszkowski, MD, Jason D. Hipp, MD, PhD, FCAP, Christopher Gammage, MS, Florence Thng, MS, Lily Peng, MD, PhD, and Martin C. Stumpe, PhD. Am J Surg Pathol 2018.
Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection
The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes. Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist's primary workflow. In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience. However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status. Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.
Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications. Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology. Initial findings suggest that some algorithms can even exceed a pathologist's sensitivity for detecting individual cancer foci in digital images. However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use. In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.
• Google's Pathology AI, LYNA (LYmph Node Assistant)
• Review of lymph nodes for metastatic breast cancer
• Synergistic effect between pathologists and AI
  • 69.
Mixed-effects models across all observations were generated by using the glmer function of the lme4 package in R; each category (e.g., negative or micrometastases) was modeled separately. To evaluate the impact of the assisted read on accuracy, performance was analyzed by case category and assistance modality.
Figure 3: Improved metastasis detection with algorithm assistance. A, Performance across all images by image category and assistance modality; the performance metric corresponds to specificity for negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet); error bars indicate standard error. B, Operating points of individual pathologists with and without assistance for micrometastases and negative cases, overlaid on the receiver operating characteristic curve of the algorithm (AUC indicates area under the curve).
• Sensitivity / specificity with the assistance of AI
• Micromet: sensitivity was significantly improved (P=0.02)
• Negative, macromet: non-significant
  • 70.
Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists. Algorithm assistance increased sensitivity for micrometastases from 83% to 91% and resulted in higher overall diagnostic accuracy than either unassisted pathologist interpretation or the computer algorithm alone.
Figure 5: Average review time per image decreases with assistance. A, Average review time per image across all pathologists, analyzed by category (negative, ITC, micrometastasis, macrometastasis); error bars indicate 95% confidence intervals. B, Micrometastasis review time decreases for nearly all images with assistance; dashed lines connect the same image reviewed with and without assistance; the 2 images that were not reviewed faster on average with assistance are highlighted.
• Time of review (per image)
• Negative, micromet: significant reduction (P=0.018 and P=0.002)
• Micromet: about 2 min ➔ 1 min (116 s ➔ 61 s)
• ITC (isolated tumor cells), macromet: non-significant
  • 71.
Pathology
1. CAD: Computer-Aided Detection
2. Triage: Prioritization of Critical Cases
3. Image-driven Biomarker: "something beyond human visual perception", e.g. cancer immunotherapy
  • 72.
Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer
Jakob Nikolas Kather, Alexander T. Pearson, Niels Halama, Dirk Jäger, Jeremias Krause, Sven H. Loosen, Alexander Marx, Peter Boor, Frank Tacke, Ulf Peter Neumann, Heike I. Grabsch, Takaki Yoshikawa, Hermann Brenner, Jenny Chang-Claude, Michael Hoffmeister, Christian Trautwein and Tom Luedde. Nature Medicine 2019; https://doi.org/10.1038/s41591-019-0462-y
Microsatellite instability determines whether patients with gastrointestinal cancer respond exceptionally well to immunotherapy. However, in clinical practice, not every patient is tested for MSI, because this requires additional genetic or immunohistochemical tests. Here we show that deep residual learning can predict MSI directly from H&E histology, which is ubiquitously available. This approach has the potential to provide immunotherapy to a much broader subset of patients with gastrointestinal cancer.
Although immunotherapy now represents a cornerstone of cancer therapy, patients with gastrointestinal cancer usually do not benefit to the same extent as patients with other solid malignancies, such as melanoma or lung cancer, unless the tumor belongs to the group of microsatellite instable (MSI) tumors. In this group, which accounts for approximately 15% of gastric (stomach) adenocarcinoma (STAD) and colorectal cancer (CRC), immune checkpoint inhibitors demonstrated considerable clinical benefit, resulting in recent approval by the Food and Drug Administration (FDA). MSI can be identified by immunohistochemistry or genetic analyses, but not all patients are screened for MSI except in high-volume tertiary care centers. Accordingly, a substantial group of potential responders to immunotherapy may not be offered timely treatment with immune checkpoint inhibitors, missing chances of disease control.
Deep learning has outperformed humans in some medical data analysis tasks and can predict patient survival and mutations in tumors using images of lung, prostate and brain tumors. To facilitate universal MSI screening, we investigated whether deep learning can predict MSI status directly from H&E-stained histology slides. First, we compared five convolutional neural networks on a three-class set of gastrointestinal cancer tissues (n=94 slides, n=81 patients). Resnet18, a residual-learning convolutional neural network, was an efficient tumor detector with an out-of-sample area under the curve (AUC) of 0.99, an improvement on the current state of the art. Another resnet18 was trained to classify MSI versus microsatellite stability (MSS) in large patient cohorts from The Cancer Genome Atlas (TCGA): n=315 formalin-fixed paraffin-embedded (FFPE) samples of STAD (TCGA-STAD), n=360 FFPE samples of CRC (TCGA-CRC-DX) and n=378 snap-frozen samples of CRC (TCGA-CRC-KR).
Tumor tissue was automatically detected and subsequently tessellated into 100,570 (TCGA-STAD), 60,894 (TCGA-CRC-KR) and 93,408 (TCGA-CRC-DX) color-normalized tiles, in which the deep learning model scored MSI. In the TCGA-CRC-DX test cohort, true MSI image tiles had a median MSI score of 0.61 (95% confidence interval (CI), 0.12–0.82), whereas true MSS tiles had an MSI score of 0.29 (95% CI, 0.08–0.57; two-tailed t-test P = 1.1 × 10^-6). In the TCGA-CRC-KR test cohort, the MSI score was 0.50 (95% CI, 0.17–0.80) for MSI tiles and 0.22 (95% CI, 0.06–0.60; P = 7.3 × 10^-11) for MSS tiles, indicating that our approach can robustly distinguish features that are predictive of MSI in both snap-frozen and FFPE samples. Patient-level AUCs for MSI detection were 0.81 (95% CI, 0.69–0.90) in TCGA-STAD, 0.84 (95% CI, 0.73–0.91) in TCGA-CRC-KR and 0.77 (95% CI, 0.62–0.87) in TCGA-CRC-DX.
The multi-center DACHS study was used as an external validation set (n=378 patients). Using the automatic tumor detector and the MSI detector trained on TCGA-CRC-DX, the patient-level AUC was 0.84 (95% CI, 0.72–0.92). The model that was trained on FFPE samples and used on FFPE samples was superior to a model that was trained on frozen samples and used on FFPE samples. Similarly, a model that was trained on CRC samples and used on CRC samples performed better than a model that was trained on STAD samples and used on CRC samples.
Figure 1: Tumor detection and MSI prediction in H&E histology. A convolutional neural network was trained as a tumor detector for STAD and CRC; tumor regions were cut into square tiles, color-normalized and sorted into MSI and MSS; another network was trained to classify MSI versus MSS; this automatic pipeline was then applied to held-out patient sets.
Figure 2: Classification performance in an external validation set: trained on the TCGA cohort (360 patients, 93,408 tiles) and used to predict on the DACHS cohort (378 patients, 896,530 tiles); AUC = 0.84.
• Prediction of microsatellite instability in GI cancer directly from the H&E pathology image (a tile-to-patient aggregation sketch follows below)
• AUC = 0.84 on external validation (n=378)
• Validated for both snap-frozen and FFPE samples
• Also validated for endometrial cancer
Nat Med 2019
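The pipeline scores MSI tile by tile but the clinical question is patient-level, so the per-tile scores must be aggregated. A simple assumed aggregation (mean tile score per patient; the paper's exact rule may differ) and the resulting patient-level AUC:

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def patient_level_auc(tile_scores, tile_patient_ids, patient_labels):
    """Aggregate per-tile MSI scores to one score per patient (mean here,
    as an assumption) and compute the patient-level ROC AUC.

    tile_scores:      iterable of floats, one MSI score per tile
    tile_patient_ids: iterable of patient IDs, parallel to tile_scores
    patient_labels:   dict mapping patient ID -> 1 (MSI) or 0 (MSS)
    """
    per_patient = defaultdict(list)
    for score, pid in zip(tile_scores, tile_patient_ids):
        per_patient[pid].append(score)
    pids = sorted(per_patient)
    y_score = [np.mean(per_patient[p]) for p in pids]
    y_true = [patient_labels[p] for p in pids]
    return roc_auc_score(y_true, y_score)
```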
  • 73.
Prediction of immune checkpoint inhibitor response from pathology images (ASCO 2019)
• PD-L1 low, AI low: mPFS 1.9 mo
• PD-L1 high, AI low: mPFS 2.0 mo
• PD-L1 low, AI high: mPFS 4.0 mo
• PD-L1 high, AI high: mPFS 6.7 mo
• Prediction of immune checkpoint inhibitor response based only on the pathology image, in metastatic NSCLC (n=189)
• PD-L1 (existing biomarker) vs. AI score: additive / complementary effect
• The AI score is better than PD-L1 for response prediction.
ASCO 2019. Courtesy of Lunit, Inc.
  • 74.
  • 75.
Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy
Pu Wang, Xiao Xiao, Jeremy R. Glissen Brown, Tyler M. Berzin, Mengtian Tu, Fei Xiong, Xiao Hu, Peixi Liu, Yan Song, Di Zhang, Xue Yang, Liangping Li, Jiong He, Xin Yi, Jingjia Liu and Xiaogang Liu. Nature Biomedical Engineering 2018; https://doi.org/10.1038/s41551-018-0301-3
The detection and removal of precancerous polyps via colonoscopy is the gold standard for the prevention of colon cancer. However, the detection rate of adenomatous polyps can vary significantly among endoscopists. Here, we show that a machine-learning algorithm can detect polyps in clinical colonoscopies, in real time and with high sensitivity and specificity. We developed the deep-learning algorithm by using data from 1,290 patients, and validated it on 27,113 newly collected colonoscopy images from 1,138 patients with at least one detected polyp (per-image sensitivity, 94.38%; per-image specificity, 95.92%; area under the receiver operating characteristic curve, 0.984), on a public database of 612 polyp-containing images (per-image sensitivity, 88.24%), on 138 colonoscopy videos with histologically confirmed polyps (per-image sensitivity, 91.64%; per-polyp sensitivity, 100%), and on 54 unaltered full-range colonoscopy videos without polyps (per-image specificity, 95.40%). By using a multi-threaded processing system, the algorithm can process at least 25 frames per second with a latency of 76.80 ± 5.60 ms in real-time video analysis. The software may aid endoscopists while performing colonoscopies, and help assess differences in polyp and adenoma detection performance among endoscopists.
Colonoscopy is the gold-standard screening test for colorectal cancer, one of the leading causes of cancer death in both the United States and China. Colonoscopy can reduce the risk of death from colorectal cancer through the detection of tumours at an earlier, more treatable stage as well as through the removal of precancerous adenomas. Conversely, failure to detect adenomas may lead to the development of interval cancer. Evidence has shown that each 1.0% increase in adenoma detection rate (ADR) leads to a 3.0% decrease in the risk of interval colorectal cancer. Although more than 14 million colonoscopies are performed in the United States annually, the adenoma miss rate (AMR) is estimated to be 6–27%. Certain polyps may be missed more frequently, including smaller polyps, flat polyps and polyps in the left colon. There are two independent reasons why a polyp may be missed during colonoscopy: (i) it was never in the visual field, or (ii) it was in the visual field but not recognized. Several hardware innovations have sought to address the first problem by improving visualization of the colonic lumen, for instance by providing a larger, panoramic camera view, or by flattening colonic folds using a distal-cap attachment. The problem of unrecognized polyps within the visual field has been more difficult to address. Several studies have shown that observation of the video monitor by either nurses or gastroenterology trainees may increase polyp detection by up to 30%. Ideally, a real-time automatic polyp-detection system could serve as a similarly effective second observer that draws the endoscopist's eye, in real time, to concerning lesions, effectively creating an 'extra set of eyes' on all aspects of the video data. Although automatic polyp detection in colonoscopy videos has been an active research topic for the past 20 years, performance levels close to those of expert endoscopists have not been achieved. Early work has focused on applying deep-learning techniques to polyp detection, but most published works are small in scale, with small development and/or validation sets.
Here, we report the development and validation of a deep-learning algorithm, integrated with a multi-threaded processing system, for the automatic detection of polyps during colonoscopy. We validated the system in two image studies and two video studies; each study contained two independent validation datasets.
We developed the algorithm using 5,545 colonoscopy images from colonoscopy reports of 1,290 patients who underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People's Hospital between January 2007 and December 2015; 3,634 images contained polyps (65.54%) and 1,911 did not (34.46%). For algorithm training, experienced endoscopists annotated the presence of each polyp in all images of the development dataset. We validated the algorithm on four independent datasets: datasets A and B for image analysis, and datasets C and D for video analysis. Dataset A contained 27,113 colonoscopy images from 1,138 consecutive patients who underwent a colonoscopy examination at the same center between January and December 2016 and were found to have at least one polyp; 5,541 images contained polyps (20.44%) and 21,572 did not (79.56%). All polyps were confirmed histologically after biopsy. Dataset B is a public database (CVC-ClinicDB).
The algorithm showed high per-image sensitivity (94.38% and 91.64%) in both the image (dataset A) and video (dataset C) analyses, which included large variations of polyp morphology and image quality. For images containing only flat, isochromatic polyps smaller than 0.5 cm, which have a higher miss rate, the algorithm achieved a per-image sensitivity of 91.65%, indicating its ability to capture subtle visual features of certain polyps and its potential to reduce missed diagnoses in real-world clinical settings. The algorithm reached a per-polyp sensitivity of 100% along with a tracing persistence of 88.93% in the video analysis of dataset C, further indicating that the software's ability to track polyps may be comparable to that of an experienced endoscopist.
Existing datasets are often small and do not represent the full range of colon conditions encountered in the clinical setting, and there are often discrepancies in the reporting of clinical metrics of success such as sensitivity and specificity. Compared with other metrics such as precision, sensitivity and specificity are the most appropriate metrics for evaluating algorithm performance because they are independent of the ratio of positive to negative samples. Furthermore, many published studies are preclinical and focus on validation using still images. A 2017 review of preclinical and clinical studies on AI applied to colonoscopy identified only one physician-initiated preclinical study on automatic polyp detection; in that study, the authors assessed their system using 24 colonoscopy videos containing 31 polyps and obtained a sensitivity of 70.4% and a specificity of 72.4%.
Figure 3: Examples of polyp detection for datasets A and C. Polyps of different morphology, including flat isochromatic polyps, dome-shaped polyps, pedunculated polyps and sessile serrated adenomatous polyps, were detected by the algorithm in both normal and insufficient light conditions, under both qualified and suboptimal bowel preparations; some polyps were detected with only partial appearance.
• Deep-learning algorithm to detect polyps during colonoscopy
• Retrospective study
• Validated for both images (snapshots) and videos (short clips and full-range)
• Sensitivity and specificity above 90%
• Real-time operation (at least 25 frames per second) relies on a multi-threaded processing system; a threading sketch follows below
Nature Biomedical Engineering 2018
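Sustaining 25 or more frames per second with roughly 77 ms latency implies decoupling frame capture from model inference. The paper's actual system is not public, so the following is only a minimal sketch of the shape of such a pipeline; `camera`, `detect` and `display` are placeholder callables:

```python
import queue
import threading

frames = queue.Queue(maxsize=4)  # small bounded buffer keeps latency bounded

def capture_loop(camera):
    """Producer thread: push frames; drop the oldest when inference lags,
    so the detector always works on a recent frame."""
    while True:
        frame = camera.read()
        if frames.full():
            try:
                frames.get_nowait()   # discard a stale frame instead of blocking
            except queue.Empty:
                pass
        frames.put(frame)

def inference_loop(detect, display):
    """Consumer thread: run the polyp detector and overlay its boxes."""
    while True:
        frame = frames.get()
        boxes = detect(frame)         # CNN polyp detector (assumed callable)
        display(frame, boxes)         # draw tracing boxes on the monitor

# threading.Thread(target=capture_loop, args=(camera,), daemon=True).start()
# threading.Thread(target=inference_loop, args=(detect, display), daemon=True).start()
```

Dropping stale frames is the key design choice: a bounded queue trades occasional skipped frames for a hard cap on end-to-end latency.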
  • 76.
Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study
Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, Yi Li, Guangre Xu, Mengtian Tu, Xiaogang Liu. Gut 2019; doi:10.1136/gutjnl-2018-317500
Figure 1: Deep learning architecture. The detection algorithm is a deep convolutional neural network (CNN) based on the SegNet architecture. Data flow is from left to right: a colonoscopy image is sequentially warped into a binary probability map, with 1 representing polyp pixels and 0 representing no polyp, which is then displayed with a hollow tracing box on the CADe monitor.
Objective: The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on the polyp detection rate and ADR.
Design: In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results: Of 1,058 patients included, 536 were randomised to standard colonoscopy and 522 to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased the ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions: In a population with a low prevalent ADR, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further. Trial registration number: ChiCTR-DDD-17012221.
Introduction: Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively. Colonoscopy is the gold standard for screening CRC. Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps. Additionally, there is evidence that with each 1.0% increase in ADR, there is an associated 3.0% decrease in the risk of interval CRC. However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics. Unrecognised polyps within the visual field are an important problem to address. Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR. Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way.
Significance of this study: What is already known? ADR is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers; reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods; AI has recently been introduced for polyp and adenoma detection as well as differentiation, with promising preliminary results. What are the new findings? This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy, and it shows an increase of ADR by 50%, from 20% to 30%; the effect was mainly due to a higher rate of small adenomas found; the detection rate of hyperplastic polyps was also significantly increased. How might it impact clinical practice? Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stably high adenoma detection rates; however, the effect on ultimate outcomes is still unclear, and further improvements such as polyp differentiation have to be implemented.
• Deep learning based 'real-time' automatic detection system increases colonoscopic polyp and adenoma detection rates
• Prospective RCT (n=1,058; standard=536, CAD=522)
• With the aid of AI:
• Improvement in adenoma detection rate: 29.1% vs 20.3% (p<0.001)
• Increase in the number of detected adenomas per patient: 0.53 vs 0.31 (p<0.001)
• Increase in the number of hyperplastic polyps: 114 vs 52 (p<0.001)
• (A quick significance check on the ADR difference appears below.)
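The headline ADR effect can be sanity-checked with a two-proportion z-test on the reported group sizes. The event counts below are reconstructed from the published rates (152/522 ≈ 29.1%, 109/536 ≈ 20.3%), so treat them as approximate:

```python
from statsmodels.stats.proportion import proportions_ztest

# ADR: CADe arm 152/522 (~29.1%) vs standard arm 109/536 (~20.3%);
# counts reconstructed from the reported percentages, so approximate.
count = [152, 109]
nobs = [522, 536]
z, p = proportions_ztest(count, nobs)
print(f"z = {z:.2f}, p = {p:.5f}")  # z ~ 3.3, p < 0.001, consistent with the trial
```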
  • 77.
AI in Medicine
• Analysis of 'complicated' medical data: EMR, genomic data, clinical trial data, insurance claims, etc.
• Analysis of medical images: radiology, pathology, ophthalmology, gastroenterology, etc.
• Monitoring of continuous biomedical data: sepsis, blood glucose, arrhythmia, cardiac arrest, AKI, etc.
• + Drug development
  • 78.
  • 79.
Remaining Issues
• Development
  • Data, data, data: quantity and quality + privacy
  • Reference standard: how to define the gold standard?
  • Overcoming the "overfitting" problem
• Validation
  • Validation level: analytical validity; clinical validity; clinical utility
  • Interpreting the black box: "explainable AI"
• Regulation
  • Approval criteria: FDA's Pre-Cert, active learning, etc.
  • Adversarial attacks: robustness vs. performance (see the sketch below)
  • Liability: whose liability in case of malpractice?
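Of these issues, the adversarial-attack concern is easy to make concrete: the fast gradient sign method (FGSM) perturbs an input imperceptibly in the direction that most increases the loss, often flipping a classifier's decision. A generic PyTorch sketch; the model and labels are placeholders, and FGSM is one standard attack rather than anything specific to the regulatory discussion above:

```python
import torch

def fgsm(model, x, y, eps=0.01, loss_fn=torch.nn.functional.cross_entropy):
    """Fast gradient sign method: a one-step adversarial perturbation.
    x: input batch (e.g. images scaled to [0, 1]); y: integer class labels.
    Returns a perturbed batch that tends to increase the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # tiny step that maximally hurts the loss
    return x_adv.clamp(0, 1).detach()  # keep a valid image
```

Hardening a model against such perturbations (e.g. by adversarial training) typically costs some clean-data accuracy, which is the robustness-versus-performance tension the slide refers to.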
  • 80.
  • 81.
  • 82.
  • 83.
Unprecedented pressure on physicians: malpractice, more data, lack of time, the 3-minute visit, low reimbursement rates (저수가), EMR, the Special Act on Resident Physicians (전공의 특별법), lack of manpower, overloading, social recognition, research performance, genomics, medical accidents leading to arrest (의료사고=구속), reimbursement cuts (삭감), unreasonable patients (진상 환자), clinical productivity metrics (진료 실적)
  • 84.
  • 85.
Sep 2018, Health 2.0 @ Santa Clara: "physician burnout" is recognized as a serious issue in the US.
  • 86.
March 2019, the Future of Individual Medicine @ San Diego: the "vicious cycle" of physician burnout. The victims of physician burnout will be all of us in society.
  • 87.
Collaboration of physicians and AI: a solution for physician burnout?
  • 88.
    ARTICLES https://doi.org/10.1038/s41591-018-0177-5 1 Applied Bioinformatics Laboratories,New York University School of Medicine, New York, NY, USA. 2 Skirball Institute, Department of Cell Biology, New York University School of Medicine, New York, NY, USA. 3 Department of Pathology, New York University School of Medicine, New York, NY, USA. 4 School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece. 5 Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA. 6 Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA. 7 Center for Biospecimen Research and Development, New York University, New York, NY, USA. 8 Department of Population Health and the Center for Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. 9 These authors contributed equally to this work: Nicolas Coudray, Paolo Santiago Ocampo. *e-mail: narges.razavian@nyumc.org; aristotelis.tsirigos@nyumc.org A ccording to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer1 , and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinc- tion can be challenging and time-consuming, and requires confir- matory immunohistochemical stains. Classification of lung cancer type is a key diagnostic process because the available treatment options, including conventional chemotherapy and, more recently, targeted therapies, differ for LUAD and LUSC2 . Also, a LUAD diagnosis will prompt the search for molecular biomarkers and sensitizing mutations and thus has a great impact on treatment options3,4 . For example, epidermal growth factor receptor (EGFR) mutations, present in about 20% of LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK) rearrangements, present in5% of LUAD5 , currently have tar- geted therapies approved by the Food and Drug Administration (FDA)6,7 . Mutations in other genes, such as KRAS and tumor pro- tein P53 (TP53) are very common (about 25% and 50%, respec- tively) but have proven to be particularly challenging drug targets so far5,8 . Lung biopsies are typically used to diagnose lung cancer type and stage. Virtual microscopy of stained images of tissues is typically acquired at magnifications of 20×to 40×, generating very large two-dimensional images (10,000 to100,000 pixels in each dimension) that are oftentimes challenging to visually inspect in an exhaustive manner. Furthermore, accurate interpretation can be difficult, and the distinction between LUAD and LUSC is not always clear, particularly in poorly differentiated tumors; in this case, ancil- lary studies are recommended for accurate classification9,10 . To assist experts, automatic analysis of lung cancer whole-slide images has been recently studied to predict survival outcomes11 and classifica- tion12 . 
For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classi- fication of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13 . Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14 ) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.94115 . Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning Nicolas Coudray 1,2,9 , Paolo Santiago Ocampo3,9 , Theodore Sakellaropoulos4 , Navneet Narula3 , Matija Snuderl3 , David Fenyö5,6 , Andre L. Moreira3,7 , Narges Razavian 8 * and Aristotelis Tsirigos 1,3 * Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and sub- type of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep con- volutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be pre- dicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH. NATURE MEDICINE | www.nature.com/naturemedicine 1Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500 Endoscopy ORIGINAL ARTICLE Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study Pu Wang,  1 Tyler M Berzin,  2 Jeremy Romek Glissen Brown,  2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu  1 To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/ gutjnl-2018-317500 ► Additional material is published online only.To view please visit the journal online (http://dx.doi.org/10.1136/ gutjnl-2018-317500). 
1 Department of Gastroenterology, Sichuan Academy of Medical Sciences Sichuan Provincial People’s Hospital, Chengdu, China 2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA Correspondence to Xiaogang Liu, Department of Gastroenterology Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China; Gary.samsph@gmail.com Received 30 August 2018 Revised 4 February 2019 Accepted 13 February 2019 © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. ABSTRACT Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs).We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR. Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection.The primary outcome was ADR. Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis.The artificial intelligence (AI) system significantly increased ADR (29.1%vs20.3%, p0.001) and the mean number of adenomas per patient (0.53vs0.31, p0.001).This was due to a higher number of diminutive adenomas found (185vs102; p0.001), while there was no statistical difference in larger adenomas (77vs58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114vs52, p0.001). Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps.The cost–benefit ratio of such effects has to be determined further. Trial registration number ChiCTR-DDD-17012221; Results. INTRODUCTION Colorectal cancer (CRC) is the second and third- leading causes of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold stan- dard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator charac- teristics.11 12 Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detec- tion system, with performance close to that of expert endoscopists, could assist the endosco- pist in detecting lesions that might correspond to adenomas in a more consistent and reliable way Significance of this study What is already known on this subject? ► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. 
Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods. ► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies. What are the new findings? ► This represents the first prospective randomised controlled trial examining an automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%. ► This effect was mainly due to a higher rate of small adenomas found. ► The detection rate of hyperplastic polyps was also significantly increased. How might it impact on clinical practice in the foreseeable future? ► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates. ► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented. on17March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317500on27February2019.Downloadedfrom This copy is for personal use only. To order printed copies, contact reprints@rsna.org This copy is for personal use only. To order printed copies, contact reprints@rsna.org ORIGINAL RESEARCH • THORACIC IMAGING C hest radiography, one of the most common diagnos- tic imaging tests in medicine, is used for screening, diagnostic work-ups, and monitoring of various thoracic diseases (1,2). One of its major objectives is detection of pulmonary nodules because pulmonary nodules are often the initial radiologic manifestation of lung cancers (1,2). However, to date, pulmonary nodule detection on chest radiographs has not been completely satisfactory, with a reported sensitivity ranging between 36%–84%, varying widely according to the tumor size and study population (2–6). Indeed, chest radiography has been shown to be prone to many reading errors with low interobserver and intraobserver agreements because of its limited spatial reso- lution, noise from overlapping anatomic structures, and the variable perceptual ability of radiologists. Recent work shows that 19%–26% of lung cancers visible on chest ra- diographs were in fact missed at their first readings (6,7). Of course, hindsight is always perfect when one knows where to look. For this reason, there has been increasing dependency on chest CT images over chest radiographs in pulmonary nodule detection. 
However, even low-dose CT requires an approximately 50–100 times higher radiation dose than a single-view chest radiographic examination (8,9).
Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, Seoul, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul (K.N.J.); Department of Radiology, National Cancer Center, Goyang (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco (T.H.V., J.H.S.); and Department of Industrial Information Systems Engineering, Seoul National University of Science and Technology, Seoul (S.H.). *J.G.N. and S.P. contributed equally to this work.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by training a convolutional neural network on 43,292 chest radiographs (normal-to-nodule radiograph ratio, 34,067:9,225) from 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; range, 18–99 years]; 15,446 women [mean age, 52.3 years; range, 18–98 years]) obtained between 2010 and 2015, labeled and partially annotated by 13 board-certified radiologists. Radiograph classification and nodule detection performances of DLAD were validated on one internal and four external data sets from three South Korean hospitals and one U.S. hospital, using the area under the receiver operating characteristic curve (AUROC) and the jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted on one of the four external validation data sets; performances of DLAD, physicians, and physicians assisted by DLAD were evaluated and compared.
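The radiograph-classification arm of this validation is scored with AUROC. A minimal, self-contained illustration of that metric on synthetic labels and scores (not study data):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)        # 1 = nodule radiograph
# Simulated classifier scores: shifted upward on positives, with overlap.
scores = rng.normal(loc=labels * 1.5, scale=1.0)

print(f"AUROC = {roc_auc_score(labels, scores):.3f}")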
Results: On the one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD ranged from 0.92 to 0.99 (AUROC) and from 0.831 to 0.924 (JAFROC FOM), respectively. In the observer performance test, DLAD showed a higher AUROC than 17 of 18 physicians and a higher JAFROC FOM than 15 of 18 physicians (P < .05), and all physicians showed improved nodule detection performance with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection of malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performance when used as a second reader. ©RSNA, 2018
LETTERS • https://doi.org/10.1038/s41591-019-0447-x
End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography
Diego Ardila,1,5 Atilla P. Kiraly,1,5 Sujeeth Bharadwaj,1,5 Bokyung Choi,1,5 Joshua J. Reicher,2 Lily Peng,1 Daniel Tse,1,* Mozziyar Etemadi,3 Wenxing Ye,1 Greg Corrado,1 David P. Naidich4 and Shravya Shetty1. Nature Medicine 25, 954–961 (2019)
1 Google AI, Mountain View, CA, USA. 2 Stanford Health Care and Palo Alto Veterans Affairs, Palo Alto, CA, USA. 3 Northwestern Medicine, Chicago, IL, USA. 4 New York University–Langone Medical Center, Center for Biological Imaging, New York City, NY, USA. 5 These authors contributed equally: Diego Ardila, Atilla P. Kiraly, Sujeeth Bharadwaj, Bokyung Choi. *e-mail: tsed@google.com
With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States1. Lung cancer screening using low-dose computed tomography has been shown to reduce mortality by 20–43% and is now included in US screening guidelines1–6. Existing challenges include inter-grader variability and high false-positive and false-negative rates7–10. We propose a deep learning algorithm that uses a patient's current and prior computed tomography volumes to predict the risk of lung cancer. Our model achieves state-of-the-art performance (94.4% area under the curve) on 6,716 National Lung Cancer Screening Trial cases, and performs similarly on an independent clinical validation set of 1,139 cases. We conducted two reader studies. When prior computed tomography imaging was not available, our model outperformed all six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives. Where prior computed tomography imaging was available, model performance was on par with the same radiologists. This creates an opportunity to optimize the screening process via computer assistance and automation. While the vast majority of patients remain unscreened, we show the potential for deep learning models to increase the accuracy, consistency and adoption of lung cancer screening worldwide.
In 2013, the United States Preventive Services Task Force recommended low-dose computed tomography (LDCT) lung cancer screening in high-risk populations based on reported improved mortality in the National Lung Cancer Screening Trial (NLST)2–5. In 2014, the American College of Radiology published the Lung-RADS guidelines for LDCT lung cancer screening, to standardize image interpretation by radiologists and dictate management recommendations1,6. Evaluation is based on a variety of image findings, but primarily nodule size, density and growth6. At screening sites, Lung-RADS and other models such as PanCan are used to determine malignancy risk ratings that drive recommendations for clinical management11,12.
Improving the sensitivity and specificity of lung cancer screening is imperative because of the high clinical and financial costs of missed diagnosis, late diagnosis and unnecessary biopsy procedures resulting from false negatives and false positives5,13–17. Despite improved consistency, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations7–10 of Lung-RADS. These limitations suggest opportunities for more sophisticated systems to improve performance and inter-reader consistency18,19. Deep learning approaches offer the exciting potential to automate more complex image analysis, detect subtle holistic imaging findings and unify methodologies for image evaluation20.
A variety of software devices have been approved by the Food and Drug Administration (FDA) with the goal of addressing workflow efficiency and performance through augmented detection of lung nodules on lung computed tomography (CT)21. Clinical research has primarily focused on either nodule detection or diagnostic support for lesions manually selected by imaging experts22–27. Nodule detection systems were engineered with the goal of improving radiologist sensitivity in identifying nodules while minimizing costs to specificity, thereby falling into the category of computer-aided detection (CADe)28. This approach highlights small nodules, leaving malignancy risk evaluation and clinical decision making to the clinician. Diagnostic support for pre-identified lesions is included in computer-aided diagnosis (CADx) platforms, which are primarily aimed at improving specificity. CADx has gained greater interest and even first regulatory approvals in other areas of radiology, though not in lung cancer at the time of manuscript preparation29.
To move beyond the limitations of prior CADe and CADx approaches, we aimed to build an end-to-end approach performing both localization and lung cancer risk categorization tasks using the input CT data alone. More specifically, we were interested in replicating a more complete part of a radiologist's workflow, including full assessment of the LDCT volume, focus on regions of concern, comparison to prior imaging when available, and calibration against biopsy-confirmed outcomes.
Another important high-level decision in our approach was to learn features using deep convolutional neural networks (CNN), rather than using hand-engineered features such as texture features or specific Hounsfield unit values. We chose to learn features because this approach has repeatedly been shown superior to hand-engineered features in many open computer vision competitions in the past five years30,31, including the Kaggle 2017 Data Science Bowl, which used NLST data32. There were three key components in our new approach (Fig. 1). First, we constructed a three-dimensional (3D) CNN model that performs end-to-end analysis of whole-CT volumes, using LDCT volumes as input.
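As a shape-level illustration of that first component, here is a toy 3D CNN that maps a resampled CT volume to a single malignancy-risk score. This is a sketch only; the study's actual architecture is far larger and differs in detail.

import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),        # global pooling over the volume
        )
        self.classifier = nn.Linear(32, 1)  # logit for cancer risk

    def forward(self, volume):              # volume: (batch, 1, D, H, W)
        x = self.features(volume).flatten(1)
        return torch.sigmoid(self.classifier(x))

model = Tiny3DCNN()
ct = torch.randn(1, 1, 64, 128, 128)        # fake resampled LDCT volume
print(model(ct).item())                      # risk score in (0, 1)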
Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,* Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,† Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD* (Am J Surg Pathol 2018)
Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility of computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader, multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score for the difficulty of each image classification; on the basis of this score, pathologists considered the review of micrometastases significantly easier with assistance (P=0.0005). Utilizing a proof-of-concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection
The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist's primary workflow.
In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes, as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist's sensitivity for detecting individual cancer foci in digital images. However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain View, CA.
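The assisted mode outlined model-flagged regions for the reader. A toy sketch of that display step, assuming a per-pixel tumor-probability heatmap and an arbitrary cutoff (both invented here, not the study's algorithm):

import numpy as np
import cv2

heatmap = np.random.rand(512, 512).astype(np.float32)    # fake tumor probabilities
slide_rgb = np.full((512, 512, 3), 230, dtype=np.uint8)  # fake slide tile

mask = (heatmap > 0.95).astype(np.uint8)                  # assumed cutoff
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(slide_rgb, contours, -1, (0, 255, 0), 2) # green outlines for review
cv2.imwrite("assisted_view.png", slide_rgb)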
Lung nodule on Chest X-ray
Lung cancer on Chest CT
Lung Cancer on Pathology
Breast Cancer on Pathology
Polyps, Adenoma on Colonoscopy
Improving the accuracy
ARTICLES https://doi.org/10.1038/s41591-018-0177-5
Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning
Nicolas Coudray,1,2,9 Paolo Santiago Ocampo,3,9 Theodore Sakellaropoulos,4 Navneet Narula,3 Matija Snuderl,3 David Fenyö,5,6 Andre L. Moreira,3,7 Narges Razavian8,* and Aristotelis Tsirigos1,3,*
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
Ancillary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has recently been studied to predict survival outcomes11 and classification12. For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941 (ref. 15). Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously published studies, but also extends to the prediction of gene mutations from histopathology images.
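Slide-level calls in this kind of pipeline come from classifying many small tiles and aggregating. A sketch under stated assumptions: torchvision's inception_v3 stands in for the trained DeepPATH network, the tiles are random tensors, and simple probability averaging stands in for the paper's aggregation step.

import torch
from torchvision.models import inception_v3

model = inception_v3(num_classes=3, aux_logits=False, init_weights=True)
model.eval()

tiles = torch.randn(16, 3, 299, 299)    # fake 299x299 RGB tiles from one slide
with torch.no_grad():
    probs = torch.softmax(model(tiles), dim=1)   # per-tile (normal, LUAD, LUSC)

slide_probs = probs.mean(dim=0)          # average tile probabilities per slide
classes = ["normal", "LUAD", "LUSC"]
print(classes[int(slide_probs.argmax())], slide_probs.tolist())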
Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma
Kazimierz O. Wrzeszczynski, PhD,* Mayu O. Frank, NP, MS,* Takahiko Koyama, PhD,* Kahn Rhrissorrakrai, PhD,* Nicolas Robine, PhD, Filippo Utro, PhD, Anne-Katrin Emde, PhD, Bo-Juen Chen, PhD, Kanika Arora, MS, Minita Shah, MS, Vladimir Vacic, PhD, Raquel Norel, PhD, Erhan Bilal, PhD, Ewa A. Bergmann, MSc, Julia L. Moore Vogel, PhD, Jeffrey N. Bruce, MD, Andrew B. Lassman, MD, Peter Canoll, MD, PhD, Christian Grommes, MD, Steve Harvey, BS, Laxmi Parida, PhD, Vanessa V. Michelini, BS, Michael C. Zody, PhD, Vaidehi Jobanputra, PhD, Ajay K. Royyuru, PhD, Robert B. Darnell, MD, PhD. *These authors contributed equally. Correspondence to Dr. Darnell: darnelr@rockefeller.edu
ABSTRACT
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each.
Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs.
Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts.
Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164; doi: 10.1212/NXG.0000000000000164
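At its core, the platform comparison reduces to set operations over normalized variant calls. A toy sketch with fabricated (chrom, pos, ref, alt) keys rather than the study's actual calls:

# Compare calls from two assays after normalizing to (chrom, pos, ref, alt).
panel_calls = {("chr7", 55259515, "T", "G"),    # e.g. an EGFR SNV
               ("chr10", 89692905, "G", "A")}
wgs_calls   = {("chr7", 55259515, "T", "G"),
               ("chr10", 89692905, "G", "A"),
               ("chr9", 21971120, "C", "T")}    # extra call unique to WGS

print("shared:", panel_calls & wgs_calls)
print("WGS-only:", wgs_calls - panel_calls)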
GLOSSARY: CNV = copy number variant; EGFR = epidermal growth factor receptor; GATK = Genome Analysis Toolkit; GBM = glioblastoma; IRB = institutional review board; NLP = Natural Language Processing; NYGC = New York Genome Center; RNA-seq = RNA sequencing; SNV = single nucleotide variant; SV = structural variant; TCGA = The Cancer Genome Atlas; TPM = transcripts per million; VCF = variant call file; VUS = variants of uncertain significance; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
The clinical application of next-generation sequencing technology to cancer diagnosis and treatment is in its early stages.1–3 An initial implementation of this technology has been in targeted panels, where subsets of cancer-relevant and/or highly actionable genes are scrutinized for potentially actionable mutations. This approach has been widely adopted, offering high redundancy of sequence coverage for the small number of sites of known clinical utility at relatively low cost. However, recent studies have shown that many more potentially clinically actionable mutations exist, both in known cancer genes and in other genes not yet identified as cancer drivers.4,5 Improvements in the efficiency of next-generation sequencing make it possible to consider whole-genome sequencing (WGS) as well as other omic assays, such as RNA sequencing (RNA-seq), as clinical assays, but uncertainties remain about how much additional useful information is available from these assays.
Cancer Diagnostics and Molecular Pathology
Enhancing Next-Generation Sequencing-Guided Cancer Care Through Cognitive Computing
NIRALI M. PATEL,a,b,† VANESSA V. MICHELINI,f,† JEFF M. SNELL,a,c SAIANAND BALU,a ALAN P. HOYLE,a JOEL S. PARKER,a,c MICHELE C. HAYWARD,a DAVID A. EBERHARD,a,b ASHLEY H. SALAZAR,a PATRICK MCNEILLIE,g JIA XU,g CLAUDIA S. HUETTNER,g TAKAHIKO KOYAMA,h FILIPPO UTRO,h KAHN RHRISSORRAKRAI,h RAQUEL NOREL,h ERHAN BILAL,h AJAY ROYYURU,h LAXMI PARIDA,h H. SHELTON EARP,a,d JUNEKO E. GRILLEY-OLSON,a,d D. NEIL HAYES,a,d STEPHEN J. HARVEY,i NORMAN E. SHARPLESS,a,c,d WILLIAM Y. KIMa,c,d,e (†Contributed equally.) The Oncologist 2017;22:1–7; doi: 10.1634/theoncologist.2017-0170
aLineberger Comprehensive Cancer Center, bDepartment of Pathology and Laboratory Medicine, cDepartment of Genetics, dDepartment of Medicine, and eDepartment of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; fIBM Watson Health, Boca Raton, Florida, USA; gIBM Watson Health, Cambridge, Massachusetts, USA; hIBM Research, Yorktown Heights, New York, USA; iIBM Watson Health, Herndon, Virginia, USA.
Key Words: Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine
ABSTRACT
Background. Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation, performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB.
Materials and Methods. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC), and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis.
Results. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took 3 minutes per case.
Conclusion. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials.
Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data.
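The curation step that WfG automates can be pictured as matching a patient's somatic variants against a continuously updated actionable gene list. The list and variants below are invented for illustration only:

# Flag somatic variants that fall on a curated actionable gene list.
ACTIONABLE_GENES = {"EGFR", "ALK", "BRAF", "KRAS", "PIK3CA"}  # illustrative only

patient_variants = [
    {"gene": "EGFR", "change": "L858R"},
    {"gene": "TTN", "change": "V2401I"},     # common passenger, not actionable
    {"gene": "BRAF", "change": "V600E"},
]

flagged = [v for v in patient_variants if v["gene"] in ACTIONABLE_GENES]
for v in flagged:
    print(f"Review: {v['gene']} {v['change']} (on actionable gene list)")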
INTRODUCTION
Next-generation sequencing (NGS) has emerged as an affordable and reproducible means to query patients’ tumors for somatic genetic anomalies [1, 2]. The optimal utilization of NGS is fundamental to the promise of “precision medicine,” yet the results of even targeted-capture NGS are highly complex, returning a variety of somatic events in hundreds of analyzed genes. The majority of such events have no known relevance to the treatment of patients with cancer, and even for …

Correspondence: William Y. Kim, M.D., or Norman E. Sharpless, M.D., Lineberger Comprehensive Cancer Center, University of North Carolina, CB# 7295, Chapel Hill, North Carolina 27599-7295, USA. Telephone: 919-966-4765; e-mail: william_kim@med.unc.edu or nes@med.unc.edu. Received April 18, 2017; accepted for publication October 6, 2017. http://dx.doi.org/10.1634/theoncologist.2017-0170 The Oncologist 2017;22:1–7 www.TheOncologist.com

Computerized Bone Age Estimation Using Deep Learning–Based Program: Evaluation of the Accuracy and Efficiency

Jeong Rye Kim,1 Woo Hyun Shim,1 Hee Mang Yoon,1 Sang Hyup Hong,1 Jin Seong Lee,1 Young Ah Cho,1 Sangki Kim2 (Kim JR, Shim WH, Yoon HM, et al.) 1Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea. Address correspondence to H. M. Yoon (espoirhm@gmail.com). 2Vuno Research Center, Vuno Inc., Seoul, South Korea. Pediatric Imaging • Original Research. AJR 2017; 209:1–7

Bone age estimation is crucial for developmental status determinations and ultimate height predictions in the pediatric population, particularly for patients with growth disorders and endocrine abnormalities [1]. Two major left-hand wrist radiograph-based methods for bone age estimation are currently used: the Greulich-Pyle [2] and Tanner-Whitehouse [3] methods. The former is much more frequently used in clinical practice. Greulich-Pyle–based bone age estimation is performed by comparing a patient’s left-hand radiograph to standard radiographs in the Greulich-Pyle atlas and is therefore simple and easily applied in clinical practice.
However, the process of bone age estimation, which comprises a simple comparison of multiple images, can be repetitive and time consuming and is thus sometimes burdensome to radiologists. Moreover, the accuracy depends on the radiologist’s experience and tends to be subjective.

Since 1992, concerns regarding interobserver variability in manual bone age estimation [4] have led to the establishment of several automatic computerized methods for bone age estimation, including computer-assisted skeletal age scores, computer-aided skeletal maturation assessment systems, and BoneXpert (Visiana) [5–14]. BoneXpert was developed according to traditional machine-learning techniques and has been shown to have a good performance for patients of various ethnicities and in various clinical settings [10–14]. The deep-learning technique is an improvement in artificial neural networks. Unlike traditional machine-learning techniques, deep-learning techniques allow an algorithm to program itself by learning from the images, given a large dataset of labeled examples, thus removing the need to specify rules [15]. Deep-learning techniques permit higher levels of abstraction and improved predictions from data.

Keywords: bone age, children, deep learning, neural network model. DOI:10.2214/AJR.17.18224. Received March 12, 2017; accepted after revision July 7, 2017. J. R. Kim and W. H. Shim contributed equally to this work. S. Kim is employed by Vuno, Inc., which created the deep learning–based automatic software system for bone age determination. J. R. Kim, W. H. Shim, H. M. Yoon, S. H. Hong, J. S. Lee, and Y. A. Cho are employed by Asan Medical Center, which holds patent rights for the deep learning–based automatic software system for bone age assessment.

OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice.

MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with Greulich-Pyle atlas assistance only). The reference bone age was determined by the consensus of two experienced radiologists.

RESULTS. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively.

CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising diagnostic accuracy.
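The study's two headline numbers, the concordance rate and the correlation with the reference bone age, are simple to reproduce in form. A minimal sketch on toy values, assuming "concordance" means exact agreement with the consensus reading (the study's precise matching rule may differ):

```python
# Sketch of the bone-age study's headline metrics on hypothetical data:
# concordance rate (exact agreement with the reference) and Pearson r.
import numpy as np
from scipy.stats import pearsonr

reference = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0])  # consensus bone age (years)
software  = np.array([5.0, 7.5, 9.0, 11.0, 12.5, 15.0])  # first-rank software estimate

concordance = np.mean(reference == software)              # fraction of exact matches
r, p = pearsonr(reference, software)

print(f"concordance rate: {concordance:.1%}")
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```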
TECHNICAL REPORT https://doi.org/10.1038/s41591-019-0539-7
1Google Health, Mountain View, CA, USA. 2Present address: AstraZeneca, Gaithersburg, MD, USA. 3Present address: Tempus Labs Inc., Chicago, IL, USA. 4These authors contributed equally: Po-Hsuan Cameron Chen, Krishna Gadepalli, Robert MacDonald. *e-mail: cmermel@google.com

Microscopic examination of samples is the gold standard for the diagnosis of cancer, autoimmune diseases, infectious diseases and more. In cancer, the microscopic examination of stained tissue sections is critical for diagnosing and staging the patient’s tumor, which informs treatment decisions and prognosis. In cancer, microscopy analysis faces three major challenges. As a form of image interpretation, these examinations are inherently subjective, exhibiting considerable inter- and intra-observer variability1,2. Moreover, clinical guidelines3 and studies4 have begun to require quantitative assessments as part of the effort towards better patient risk stratification3. For example, breast cancer staging requires the counting of mitotic cells, and quantification of the tumor burden in lymph nodes by measuring the largest tumor focus. However, despite being helpful in treatment planning, quantification is laborious and error-prone. Lastly, access to disease experts can be limited in both developed and developing countries5, exacerbating the problem.

As a potential solution, recent advances in AI, specifically deep learning6, have demonstrated automated medical image analysis with performance comparable to that of human experts1,7–10. Research has also shown the potential to improve diagnostic accuracy, quantitation and efficiency by applying deep learning algorithms to digitized whole-slide pathology images for cancer classification and detection8,10,11. However, the integration of these advances into cancer diagnosis is not straightforward because of two primary challenges: image digitization and the technical skills required to utilize deep learning algorithms. First, most microscopic examinations are performed using analog microscopes, and a digitized workflow requires significant infrastructure investments. Second, because of differences in hardware, firmware and software, the use of AI algorithms developed by others is challenging even for experts. As such, actual utilization of AI in microscopy frequently remains inaccessible.

Here, we propose a cost-effective solution to these barriers to entry of AI in microscopic analysis: an augmented optical light microscope that enables real-time integration of AI. We define ‘real-time integration’ as adding the capability of AI assistance without slowing down specimen review or modifying the standard workflow. We propose to superimpose the predictions of the AI algorithm on the view of the sample seen by the user through the eyepiece. Because augmenting additional information over the original view is termed augmented reality, we term this microscope the augmented reality microscope (ARM). Although we apply this technology to cancer diagnosis in this paper, the ARM is application-agnostic and can be utilized in other microscopy applications.

Aligned with the ARM’s function as a viable platform for AI assistance in microscopy applications, the ARM system satisfies three major design requirements: spatial registration of the augmented information, system response time, and robustness of the deep learning algorithms. First, AI predictions such as tumor or cell locations need to be precisely aligned with the specimen in the observer’s field of view (FOV) to retain the correct spatial context. Importantly, this alignment must be insensitive to small changes in the user’s eye position relative to the eyepiece (parallax-free) to account for user movements. Second, although the latest deep learning algorithms often require billions of mathematical operations12, these algorithms have to be applied in real time to avoid unnatural latency in the workflow. This is especially critical in applications such as cancer diagnosis, where the pathologist is constantly and rapidly panning around the slide. Finally, many deep learning algorithms for microscope images were developed using other digitization methods, such as whole-slide scanners in histopathology8,10,11.
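The second design requirement, real-time response while the user pans, amounts to a capture-infer-overlay loop whose per-frame latency stays imperceptible. The schematic Python/OpenCV loop below illustrates only that control flow: the thresholding stand-in for the model, the webcam capture, and the on-screen window are all placeholders, since the real ARM runs an accelerated neural network and projects the overlay optically into the eyepiece:

```python
# Conceptual capture -> inference -> overlay loop with a latency readout.
# fake_model and the webcam index are hypothetical stand-ins.
import time
import cv2
import numpy as np

def fake_model(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a deep learning model: returns a tumor-likelihood mask."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return (gray > 128).astype(np.uint8) * 255  # simple threshold placeholder

cap = cv2.VideoCapture(0)  # hypothetical camera index
while cap.isOpened():
    t0 = time.perf_counter()
    ok, frame = cap.read()
    if not ok:
        break
    mask = fake_model(frame)
    # Outline high-likelihood regions and superimpose them on the live view.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)
    latency_ms = (time.perf_counter() - t0) * 1000  # must stay low while panning
    cv2.putText(frame, f"{latency_ms:.0f} ms", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("augmented view", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```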
We demonstrate that two deep learning …

An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis

Lung cancer on Chest CT • Lung Cancer on Pathology • Breast Cancer on Pathology • Polyps/Adenoma on Colonoscopy • Analysis of Cancer Genome • Analysis of Cancer Genome • Assessment of Bone Age • Breast Cancer on Pathology • Breast Cancer on Pathology

Improving the accuracy • Improving the efficiency
ARTICLES https://doi.org/10.1038/s41591-018-0177-5

…ancillary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has recently been studied to predict survival outcomes11 and classification12.
For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.941 (ref. 15). Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously …

Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning

Nicolas Coudray,1,2,9 Paolo Santiago Ocampo,3,9 Theodore Sakellaropoulos,4 Navneet Narula,3 Matija Snuderl,3 David Fenyö,5,6 Andre L. Moreira,3,7 Narges Razavian8* and Aristotelis Tsirigos1,3*

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them (STK11, EGFR, FAT1, SETBP1, KRAS and TP53) can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
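The paper's pipeline (code at https://github.com/ncoudray/DeepPATH) tiles each whole-slide image and aggregates per-tile predictions into a slide-level call. A minimal sketch of that pattern using torchvision's Inception v3; the random weights, the 3-class head, and the random tensor standing in for a slide are all placeholders, not the authors' trained model:

```python
# Sketch of tile-based whole-slide classification: score 299x299 tiles with
# Inception v3 and average per-tile probabilities into a slide prediction.
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3

model = inception_v3(weights=None, init_weights=True)    # untrained placeholder
model.fc = torch.nn.Linear(model.fc.in_features, 3)      # LUAD, LUSC, normal
model.eval()

def slide_probabilities(slide: torch.Tensor, tile: int = 299) -> torch.Tensor:
    """Average class probabilities over non-overlapping tiles of a slide.

    `slide` is a (3, H, W) float tensor, assumed already normalized.
    """
    _, H, W = slide.shape
    probs = []
    with torch.no_grad():
        for y in range(0, H - tile + 1, tile):
            for x in range(0, W - tile + 1, tile):
                patch = slide[:, y:y + tile, x:x + tile].unsqueeze(0)
                probs.append(F.softmax(model(patch), dim=1))
    return torch.cat(probs).mean(dim=0)  # (3,) slide-level probabilities

toy_slide = torch.rand(3, 598, 598)  # stand-in for a whole-slide region (4 tiles)
print(slide_probabilities(toy_slide))
```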
Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500. Endoscopy ORIGINAL ARTICLE

Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study

Pu Wang,1 Tyler M Berzin,2 Jeremy R Glissen Brown,2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu1

1Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China. 2Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA. Correspondence to Xiaogang Liu, Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China; Gary.samsph@gmail.com. Received 30 August 2018; Revised 4 February 2019; Accepted 13 February 2019.

ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low-prevalence ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.

INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively.1 Colonoscopy is the gold standard for CRC screening.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12 Unrecognised polyps within the visual field are an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way.

Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers.
Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.

What are the new findings?
► This represents the first prospective randomised controlled trial examining an automatic polyp detection system during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.

How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.
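The trial's primary outcome is a two-proportion comparison, which can be sanity-checked from the published rates. A short sketch, with the event counts back-calculated from the reported percentages rather than taken from the paper's tables:

```python
# Rough check of the primary outcome: ADR with vs without AI assistance.
# Counts are approximated from the reported rates (29.1% of 522, 20.3% of 536).
from scipy.stats import chi2_contingency

ai_detected, ai_total = 152, 522     # ~29.1% ADR, AI-assisted colonoscopy
std_detected, std_total = 109, 536   # ~20.3% ADR, standard colonoscopy

table = [
    [ai_detected, ai_total - ai_detected],
    [std_detected, std_total - std_detected],
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"ADR with AI: {ai_detected / ai_total:.1%}")
print(f"ADR without: {std_detected / std_total:.1%}")
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")  # small p, in line with the paper's p<0.001
```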
ORIGINAL RESEARCH • THORACIC IMAGING

Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs

Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD

From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P. (e-mail: cmpark.morphius@gmail.com). Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002). *J.G.N. and S.P. contributed equally to this work. Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237

Chest radiography, one of the most common diagnostic imaging tests in medicine, is used for screening, diagnostic work-ups, and monitoring of various thoracic diseases (1,2). One of its major objectives is detection of pulmonary nodules, because pulmonary nodules are often the initial radiologic manifestation of lung cancers (1,2). However, to date, pulmonary nodule detection on chest radiographs has not been completely satisfactory, with a reported sensitivity ranging between 36% and 84%, varying widely according to the tumor size and study population (2–6). Indeed, chest radiography has been shown to be prone to many reading errors, with low interobserver and intraobserver agreement, because of its limited spatial resolution, noise from overlapping anatomic structures, and the variable perceptual ability of radiologists. Recent work shows that 19%–26% of lung cancers visible on chest radiographs were in fact missed at their first readings (6,7). Of course, hindsight is always perfect when one knows where to look. For this reason, there has been increasing dependency on chest CT images over chest radiographs in pulmonary nodule detection. However, even low-dose CT scans require an approximately 50–100 times higher radiation dose than single-view chest radiographic examinations (8,9).

Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists.

Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18–99 years]; 15,446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were in the range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).

Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader. ©RSNA, 2018
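Of the two validation metrics, the per-radiograph AUROC is straightforward to compute with standard tooling; the JAFROC FOM additionally credits lesion localization and needs dedicated software, so only the former is sketched here, on toy labels and scores:

```python
# Per-radiograph classification AUROC on hypothetical labels and scores.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                    # 1 = contains a malignant nodule
y_score = [0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.65, 0.6]   # model's nodule probability

print(f"AUROC = {roc_auc_score(y_true, y_score):.3f}")  # 0.938 on these toy values
```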
LETTERS https://doi.org/10.1038/s41591-019-0447-x
1Google AI, Mountain View, CA, USA. 2Stanford Health Care and Palo Alto Veterans Affairs, Palo Alto, CA, USA. 3Northwestern Medicine, Chicago, IL, USA. 4New York University-Langone Medical Center, Center for Biological Imaging, New York City, NY, USA. 5These authors contributed equally: Diego Ardila, Atilla P. Kiraly, Sujeeth Bharadwaj, Bokyung Choi. *e-mail: tsed@google.com

With an estimated 160,000 deaths in 2018, lung cancer is the most common cause of cancer death in the United States1. Lung cancer screening using low-dose computed tomography has been shown to reduce mortality by 20–43% and is now included in US screening guidelines1–6. Existing challenges include inter-grader variability and high false-positive and false-negative rates7–10. We propose a deep learning algorithm that uses a patient’s current and prior computed tomography volumes to predict the risk of lung cancer. Our model achieves a state-of-the-art performance (94.4% area under the curve) on 6,716 National Lung Cancer Screening Trial cases, and performs similarly on an independent clinical validation set of 1,139 cases. We conducted two reader studies. When prior computed tomography imaging was not available, our model outperformed all six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives. Where prior computed tomography imaging was available, the model performance was on par with the same radiologists. This creates an opportunity to optimize the screening process via computer assistance and automation. While the vast majority of patients remain unscreened, we show the potential for deep learning models to increase the accuracy, consistency and adoption of lung cancer screening worldwide.

In 2013, the United States Preventive Services Task Force recommended low-dose computed tomography (LDCT) lung cancer screening in high-risk populations based on reported improved mortality in the National Lung Cancer Screening Trial (NLST)2–5. In 2014, the American College of Radiology published the Lung-RADS guidelines for LDCT lung cancer screening, to standardize image interpretation by radiologists and dictate management recommendations1,6. Evaluation is based on a variety of image findings, but primarily nodule size, density and growth6. At screening sites, Lung-RADS and other models such as PanCan are used to determine malignancy risk ratings that drive recommendations for clinical management11,12.

Improving the sensitivity and specificity of lung cancer screening is imperative because of the high clinical and financial costs of missed diagnosis, late diagnosis and unnecessary biopsy procedures resulting from false negatives and false positives5,13–17. Despite improved consistency, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations7–10 of Lung-RADS. These limitations suggest opportunities for more sophisticated systems to improve performance and inter-reader consistency18,19. Deep learning approaches offer the exciting potential to automate more complex image analysis, detect subtle holistic imaging findings and unify methodologies for image evaluation20.

A variety of software devices have been approved by the Food and Drug Administration (FDA) with the goal of addressing workflow efficiency and performance through augmented detection of lung nodules on lung computed tomography (CT)21. Clinical research has primarily focused on either nodule detection or diagnostic support for lesions manually selected by imaging experts22–27. Nodule detection systems were engineered with the goal of improving radiologist sensitivity in identifying nodules while minimizing costs to specificity, thereby falling into the category of computer-aided detection (CADe)28. This approach highlights small nodules, leaving malignancy risk evaluation and clinical decision making to the clinician. Diagnostic support for pre-identified lesions is included in computer-aided diagnosis (CADx) platforms, which are primarily aimed at improving specificity. CADx has gained greater interest and even first regulatory approvals in other areas of radiology, though not in lung cancer at the time of manuscript preparation29.

To move beyond the limitations of prior CADe and CADx approaches, we aimed to build an end-to-end approach performing both localization and lung cancer risk categorization tasks using the input CT data alone. More specifically, we were interested in replicating a more complete part of a radiologist’s workflow, including full assessment of the LDCT volume, focus on regions of concern, comparison to prior imaging when available, and calibration against biopsy-confirmed outcomes. Another important high-level decision in our approach was to learn features using deep convolutional neural networks (CNN), rather than using hand-engineered features such as texture features or specific Hounsfield unit values. We chose to learn features because this approach has repeatedly been shown superior to hand-engineered features in many open computer vision competitions in the past five years30,31, including the Kaggle 2017 Data Science Bowl, which used NLST data32. There were three key components in our new approach (Fig. 1). First, we constructed a three-dimensional (3D) CNN model that performs end-to-end analysis of whole-CT volumes, using LDCT …

End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography

Diego Ardila,1,5 Atilla P. Kiraly,1,5 Sujeeth Bharadwaj,1,5 Bokyung Choi,1,5 Joshua J. Reicher,2 Lily Peng,1 Daniel Tse,1* Mozziyar Etemadi,3 Wenxing Ye,1 Greg Corrado,1 David P. Naidich4 and Shravya Shetty1
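The first component, a 3D CNN over whole-CT volumes, differs from familiar 2D image models mainly in the dimensionality of its convolutions and pooling. A toy PyTorch sketch of that shape follows (a few Conv3d layers ending in a single risk score); the published model is far larger and also performs localization and uses prior imaging:

```python
# Toy 3D CNN over a CT volume: Conv3d feature extractor -> one malignancy score.
# A structural sketch only, not the published architecture.
import torch
import torch.nn as nn

class TinyLungModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),        # global pooling over the whole volume
        )
        self.head = nn.Linear(16, 1)        # single cancer-risk logit

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        x = self.features(volume).flatten(1)
        return torch.sigmoid(self.head(x))  # risk in [0, 1]

model = TinyLungModel()
ldct = torch.randn(1, 1, 64, 128, 128)      # (batch, channel, depth, H, W) toy volume
print(model(ldct))                           # e.g. tensor([[0.49]])
```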
Corrected: Author Correction. Nature Medicine, VOL 25, JUNE 2019, 954–961, www.nature.com/naturemedicine

Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer

David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,* Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,† Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*

Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.

Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection (Am J Surg Pathol 2018;00:000–000)

The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist’s primary workflow.
In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5

Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist’s sensitivity for detecting individual cancer foci in digital images. However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8

From the *Google AI Healthcare; and †Verily Life Sciences, Mountain View, CA. D.F.S., R.M., and Y.L. are co-first authors (equal contribution). Work done as part of the Google Brain Healthcare Technology Fellowship (D.F.S. and P.T.). Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T., J.D.H., C.G., F.T., L.P., and M.C.S. are employees of Alphabet and have Alphabet stock. Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare, 1600 Amphitheatre Way, Mountain View, CA 94043 (e-mail: davesteiner@google.com). ORIGINAL ARTICLE, Am J Surg Pathol, 2018.
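The assisted mode described above reduces to drawing the algorithm's high-likelihood regions over the slide during review. A minimal matplotlib sketch, with a random image and a hand-placed rectangle standing in for a digitized lymph-node section and the model's per-pixel tumor probabilities:

```python
# Sketch of "assisted mode" display: outline regions where tumor likelihood
# exceeds a threshold. Image and heatmap are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
slide = rng.uniform(0.7, 1.0, size=(256, 256))   # stand-in for slide intensity
heatmap = np.zeros_like(slide)
heatmap[90:140, 100:160] = 0.95                  # pretend metastatic focus

fig, ax = plt.subplots()
ax.imshow(slide, cmap="gray")
ax.contour(heatmap, levels=[0.5], colors="lime") # outline p(tumor) > 0.5
ax.set_title("algorithm-assisted review (sketch)")
plt.show()
```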
Kazimierz O. Wrzeszczynski, PhD,* Mayu O. Frank, NP, MS,* Takahiko Koyama, PhD,* Kahn Rhrissorrakrai, PhD,* Nicolas Robine, PhD, Filippo Utro, PhD, Anne-Katrin Emde, PhD, Bo-Juen Chen, PhD, Kanika Arora, MS, Minita Shah, MS, Vladimir Vacic, PhD, Raquel Norel, PhD, Erhan Bilal, PhD, Ewa A. Bergmann, MSc, Julia L. Moore Vogel, PhD, Jeffrey N. Bruce, MD, Andrew B. Lassman, MD, Peter Canoll, MD, PhD, Christian Grommes, MD, Steve Harvey, BS, Laxmi Parida, PhD, Vanessa V. Michelini, BS, Michael C. Zody, PhD, Vaidehi Jobanputra, PhD, Ajay K. Royyuru, PhD, Robert B. Darnell, MD, PhD. Correspondence to Dr. Darnell: darnelr@rockefeller.edu

Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma

ABSTRACT
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each.
Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs.
Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts.
Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. Neurol Genet 2017;3:e164
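Both the human analysts and WGA face the same first step: filtering thousands of somatic calls down to the few in genes with known therapeutic relevance. A schematic Python sketch of that step; the variant calls, the gene-drug map, and the EXAMPLE1 gene are illustrative inventions, not output of either pipeline:

```python
# Schematic variant-prioritization step: keep calls in actionable genes and
# set the rest aside as variants of uncertain significance (VUS).
# All calls and the drug annotations below are illustrative, not patient data.
VCF_LINES = [  # (CHROM, POS, GENE, VARIANT)
    ("chr7",  55249071, "EGFR",     "p.T790M"),
    ("chr10", 89692905, "PTEN",     "p.R130Q"),
    ("chr9",  21971120, "CDKN2A",   "p.W110*"),
    ("chr1",   1234567, "EXAMPLE1", "p.A10V"),   # hypothetical gene
]

ACTIONABLE = {  # illustrative gene -> therapy-class annotation
    "EGFR": "EGFR TKIs",
    "CDKN2A": "CDK4/6 inhibitors (investigational)",
}

def prioritize(calls):
    """Split calls into potentially actionable hits and the remainder (VUS)."""
    hits, vus = [], []
    for chrom, pos, gene, variant in calls:
        (hits if gene in ACTIONABLE else vus).append((gene, variant))
    return hits, vus

hits, vus = prioritize(VCF_LINES)
for gene, variant in hits:
    print(f"{gene} {variant}: consider {ACTIONABLE[gene]}")
print(f"{len(vus)} remaining variants of uncertain significance")
```

Real systems layer effect prediction, CNV/SV and expression evidence, and drug/trial matching on top of this filter, which is where the human-machine time difference reported above arises.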
HAYWARD,a DAVID A. EBERHARD,a,b ASHLEY H. SALAZAR,a PATRICK MCNEILLIE,g JIA XU,g CLAUDIA S. HUETTNER,g TAKAHIKO KOYAMA,h FILIPPO UTRO,h KAHN RHRISSORRAKRAI,h RAQUEL NOREL,h ERHAN BILAL,h AJAY ROYYURU,h LAXMI PARIDA,h H. SHELTON EARP,a,d JUNEKO E. GRILLEY-OLSON,a,d D. NEIL HAYES,a,d STEPHEN J. HARVEY,i NORMAN E. SHARPLESS,a,c,d WILLIAM Y. KIM a,c,d,e a Lineberger Comprehensive Cancer Center, b Department of Pathology and Laboratory Medicine, c Department of Genetics, d Department of Medicine, and e Department of Urology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA; f IBM Watson Health, Boca Raton, Florida, USA; g IBM Watson Health, Cambridge, Massachusetts, USA; h IBM Research, Yorktown Heights, New York, USA; i IBM Watson Health, Herndon, Virginia, USA † Contributed equally Disclosures of potential conflicts of interest may be found at the end of this article. Key Words. Genomics • High-throughput nucleotide sequencing • Artificial intelligence • Precision medicine ABSTRACT Background. Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and report- ing large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires consider- able manual curation performed mainly by human “molecular tumor boards” (MTBs). The purpose of this study was to deter- mine the utility of cognitive computing as performed by Wat- son for Genomics (WfG) compared with a human MTB. Materials and Methods. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Results. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discov- ered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a rele- vant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took 3 minutes per case. Conclusion. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials.The Oncologist 2017;22:1–7 Implications for Practice: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. 
INTRODUCTION Next-generation sequencing (NGS) has emerged as an afford- able and reproducible means to query patients’ tumors for somatic genetic anomalies [1, 2].The optimal utilization of NGS is fundamental to the promise of “precision medicine,” yet the results of even targeted-capture NGS are highly complex, returning a variety of somatic events in hundreds of analyzed genes. The majority of such events have no known relevance to the treatment of patients with cancer, and even for Correspondence: William Y. Kim, M.D., or Norman E. Sharpless, M.D., Lineberger Comprehensive Cancer Center, University of North Carolina, CB# 7295, Chapel Hill, North Carolina 27599-7295, USA.Telephone: 919-966-4765; e-mail: william_kim@med.unc.edu or nes@med.unc.edu Received April 18, 2017; accepted for publication October 6, 2017. http://dx.doi.org/10.1634/theoncologist.2017-0170 TheOncologist 2017;22:1–7 www.TheOncologist.com Oc AlphaMed Press 2017 Cancer Diagnostics and Molecular Pathology Published Ahead of Print on November 20, 2017 as 10.1634/theoncologist.2017-0170. byguestonNovember22,2017http://theoncologist.alphamedpress.org/Downloadedfrom AJR:209, December 2017 1 Since 1992, concerns regarding interob- server variability in manual bone age esti- mation [4] have led to the establishment of several automatic computerized methods for bone age estimation, including computer-as- sisted skeletal age scores, computer-aided skeletal maturation assessment systems, and BoneXpert (Visiana) [5–14]. BoneXpert was developed according to traditional machine- learning techniques and has been shown to have a good performance for patients of var- ious ethnicities and in various clinical set- tings [10–14]. The deep-learning technique is an improvement in artificial neural net- works. Unlike traditional machine-learning techniques, deep-learning techniques allow an algorithm to program itself by learning from the images given a large dataset of la- beled examples, thus removing the need to specify rules [15]. Deep-learning techniques permit higher levels of abstraction and improved predic- tions from data. Deep-learning techniques Computerized Bone Age Estimation Using Deep Learning– Based Program: Evaluation of the Accuracy and Efficiency Jeong Rye Kim1 Woo Hyun Shim1 Hee Mang Yoon1 Sang Hyup Hong1 Jin Seong Lee1 Young Ah Cho1 Sangki Kim2 Kim JR, Shim WH, Yoon MH, et al. 1 Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea. Address correspondence to H. M. Yoon (espoirhm@gmail.com). 2 Vuno Research Center, Vuno Inc., Seoul, South Korea. Pediatric Imaging • Original Research Supplemental Data Available online at www.ajronline.org. AJR 2017; 209:1–7 0361–803X/17/2096–1 © American Roentgen Ray Society B one age estimation is crucial for developmental status determina- tions and ultimate height predic- tions in the pediatric population, particularly for patients with growth disor- ders and endocrine abnormalities [1]. Two major left-hand wrist radiograph-based methods for bone age estimation are current- ly used: the Greulich-Pyle [2] and Tanner- Whitehouse [3] methods. The former is much more frequently used in clinical practice. Greulich-Pyle–based bone age estimation is performed by comparing a patient’s left-hand radiograph to standard radiographs in the Greulich-Pyle atlas and is therefore simple and easily applied in clinical practice. 
However, the process of bone age estimation, which comprises a simple comparison of multiple images, can be repetitive and time consuming and is thus sometimes burdensome to radiologists. Moreover, the accuracy depends on the radiologist's experience and tends to be subjective.

Keywords: bone age, children, deep learning, neural network model. DOI: 10.2214/AJR.17.18224. J. R. Kim and W. H. Shim contributed equally to this work. Received March 12, 2017; accepted after revision July 7, 2017. S. Kim is employed by Vuno, Inc., which created the deep learning–based automatic software system for bone age determination. J. R. Kim, W. H. Shim, H. M. Yoon, S. H. Hong, J. S. Lee, and Y. A. Cho are employed by Asan Medical Center, which holds patent rights for the deep learning–based automatic software system for bone age assessment.

OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice.

MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with Greulich-Pyle atlas assistance only). The reference bone age was determined by the consensus of two experienced radiologists.

RESULTS. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively.

CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising diagnostic accuracy.
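The study's "first-rank" terminology suggests a model that scores discrete Greulich-Pyle atlas categories and reports the top-ranked candidates. The sketch below illustrates that framing in PyTorch; the tiny architecture, one-year class grid, and input size are assumptions for illustration, not Vuno's published system.

```python
# Sketch: classify a left-hand radiograph into discrete Greulich-Pyle
# atlas categories; the top-ranked class is the "first-rank bone age".
import torch
import torch.nn as nn

GP_CLASSES = [float(a) for a in range(3, 18)]  # illustrative grid, 3-17 years

class BoneAgeNet(nn.Module):
    """Tiny stand-in CNN; a real system would use a far deeper backbone."""
    def __init__(self, n_classes: int = len(GP_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a 32-dim feature
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

model = BoneAgeNet().eval()
radiograph = torch.randn(1, 1, 512, 512)  # stand-in for a preprocessed film
with torch.no_grad():
    probs = model(radiograph).softmax(dim=1)

# Rank atlas categories by probability; the study scored the top-ranked
# category ("first-rank bone age") for concordance with the reference.
ranked = probs[0].argsort(descending=True)
print("first-rank bone age:", GP_CLASSES[int(ranked[0])], "years")
print("second-rank bone age:", GP_CLASSES[int(ranked[1])], "years")
```

Reporting a ranking rather than a single regressed number also fits the assisted-reading workflow above: a reviewer can confirm the first-rank age or glance at the runner-up candidates, which plausibly explains the higher concordance with software assistance.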
TECHNICAL REPORT. https://doi.org/10.1038/s41591-019-0539-7

1 Google Health, Mountain View, CA, USA. 2 Present address: AstraZeneca, Gaithersburg, MD, USA. 3 Present address: Tempus Labs Inc., Chicago, IL, USA. 4 These authors contributed equally: Po-Hsuan Cameron Chen, Krishna Gadepalli, Robert MacDonald. *e-mail: cmermel@google.com

Microscopic examination of samples is the gold standard for the diagnosis of cancer, autoimmune diseases, infectious diseases and more. In cancer, the microscopic examination of stained tissue sections is critical for diagnosing and staging the patient's tumor, which informs treatment decisions and prognosis. In cancer, microscopy analysis faces three major challenges. As a form of image interpretation, these examinations are inherently subjective, exhibiting considerable inter- and intra-observer variability1,2. Moreover, clinical guidelines3 and studies4 have begun to require quantitative assessments as part of the effort towards better patient risk stratification3. For example, breast cancer staging requires the counting of mitotic cells, and quantification of the tumor burden in lymph nodes by measuring the largest tumor focus. However, despite being helpful in treatment planning, quantification is laborious and error-prone. Lastly, access to disease experts can be limited in both developed and developing countries5, exacerbating the problem.

As a potential solution, recent advances in AI, specifically deep learning6, have demonstrated automated medical image analysis with performance comparable to that of human experts1,7–10. Research has also shown the potential to improve diagnostic accuracy, quantitation and efficiency by applying deep learning algorithms to digitized whole-slide pathology images for cancer classification and detection8,10,11. However, the integration of these advances into cancer diagnosis is not straightforward because of two primary challenges: image digitization and the technical skills required to utilize deep learning algorithms. First, most microscopic examinations are performed using analog microscopes, and a digitized workflow requires significant infrastructure investments. Second, because of differences in hardware, firmware and software, the use of AI algorithms developed by others is challenging even for experts. As such, actual utilization of AI in microscopy frequently remains inaccessible.

Here, we propose a cost-effective solution to these barriers to entry of AI in microscopic analysis: an augmented optical light microscope that enables real-time integration of AI. We define 'real-time integration' as adding the capability of AI assistance without slowing down specimen review or modifying the standard workflow. We propose to superimpose the predictions of the AI algorithm on the view of the sample seen by the user through the eyepiece. Because augmenting additional information over the original view is termed augmented reality, we term this microscope the augmented reality microscope (ARM). Although we apply this technology to cancer diagnosis in this paper, the ARM is application-agnostic and can be utilized in other microscopy applications.

Aligned with the ARM's function as a viable platform for AI assistance in microscopy applications, the ARM system satisfies three major design requirements: spatial registration of the augmented information, system response time and robustness of the deep learning algorithms. First, AI predictions such as tumor or cell locations need to be precisely aligned with the specimen in the observer's field of view (FOV) to retain the correct spatial context. Importantly, this alignment must be insensitive to small changes in the user's eye position relative to the eyepiece (parallax-free) to account for user movements. Second, although the latest deep learning algorithms often require billions of mathematical operations12, these algorithms have to be applied in real time to avoid unnatural latency in the workflow. This is especially critical in applications such as cancer diagnosis, where the pathologist is constantly and rapidly panning around the slide. Finally, many deep learning algorithms for microscope images were developed using other digitization methods, such as whole-slide scanners in histopathology8,10,11.
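Conceptually, the real-time loop such a system runs is: capture the current field of view, run inference, and blend the prediction back over the live view within the latency budget. The sketch below illustrates that loop with OpenCV; the camera index, the stand-in predict_heatmap function, and the blending weights are assumptions for illustration, not the published pipeline (which injects the overlay into the eyepiece optics rather than showing it on a monitor).

```python
# Sketch of an ARM-style loop: grab the camera's view of the specimen,
# score it, and alpha-blend the prediction heatmap back onto the view.
import cv2
import numpy as np

def predict_heatmap(frame_bgr: np.ndarray) -> np.ndarray:
    """Stand-in for a tumor-detection CNN; returns per-pixel scores in [0, 1]."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    return 1.0 - gray  # placeholder heuristic: dark regions score high

def overlay(frame_bgr: np.ndarray, heatmap: np.ndarray, alpha=0.4) -> np.ndarray:
    """Colorize the heatmap and blend it over the original frame."""
    colored = cv2.applyColorMap((heatmap * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(frame_bgr, 1.0 - alpha, colored, alpha, 0.0)

cap = cv2.VideoCapture(0)  # microscope-mounted camera (assumed at index 0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # The whole capture-infer-blend cycle must finish within the latency
    # budget, or the overlay lags behind the pathologist's panning.
    view = overlay(frame, predict_heatmap(frame))
    cv2.imshow("augmented field of view", view)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

On the real instrument the overlay is projected into the optical path and inference runs on dedicated accelerators, so the same loop must complete fast enough that AI assistance adds no perceptible delay to specimen review.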
[Further article excerpts are garbled beyond recovery in extraction; the recoverable titles are: "An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis" (the technical report above); a Nature letter, "A clinically applicable approach to continuous prediction of future acute kidney injury"; and what appears to be "Scalable and accurate deep learning with electronic health records".]

AI applications covered in this deck:
• Lung cancer on chest CT
• Lung cancer on pathology
• Breast cancer on pathology
• Polyps/adenomas on colonoscopy
• Analysis of the cancer genome
• Assessment of bone age
• Septic shock
• Acute kidney injury
• Cardiac arrest
• Mortality and readmission
• Arrhythmia

Improving the accuracy • Improving the efficiency • Enabling the prediction
  • 91.
    Referred from Dr. Eric Topol on Twitter: the "virtuous cycle"
  • 92.
    How can we make better medicine with artificial (a.k.a. augmented) intelligence?
  • 93.
    Feedback/Questions • Email: yoonsup.choi@gmail.com • Blog: http://www.yoonsupchoi.com • Facebook: Yoon Sup Choi

[Book jacket of 의료 인공지능 (Medical Artificial Intelligence), written by Yoon Sup Choi, cover design by Seung-hyup Choi; translated from Korean. The author biography on the jacket flap is truncated in extraction and omitted.]

"Medical AI is driving an innovation that will reshape the conservative healthcare system. Its rapid development and broad impact are hard to grasp for modern medical professionals, who have been trained in ever more specialized and subdivided fields, and it is hard even to know where to begin studying. In this situation, this book, which plainly explains the concepts and applications of medical AI and its relationship with physicians, will be a good guide. It is an especially useful introduction for the medical students and young clinicians who will lead the future." — Joon Beom Seo, Professor of Radiology, Asan Medical Center; Head of the Medical Imaging AI Program

"Hardly anyone disputes that AI will greatly change the paradigm of medicine. But medicine poses many hard problems for AI, and the possible solutions vary enormously. The cure-all medical AI that people commonly imagine does not exist. This book analyzes the development, use, and potential of a wide range of medical AI in a balanced way. I recommend it both to clinicians who want to adopt AI and to AI researchers taking on the unfamiliar territory of medicine." — Jihoon Jeong, Senior Lecturing Professor, Department of Media Communication, Kyung Hee Cyber University; physician

"As a professor responsible for basic medical education at Seoul National University College of Medicine, I feel keenly that today's medical education, unchanged since industrialization, cannot prepare medical students for a rapidly changing AI era. This book carries the expert analysis and forward-looking perspective of Director Yoon Sup Choi, who is pioneering AI education in medical school together with me. I recommend it to medical students and professors preparing for an AI future, and to students and parents considering medical school." — Hyung Jin Choi, Professor, Department of Anatomy, Seoul National University College of Medicine; internal medicine specialist

"Extreme views and attitudes currently coexist around the adoption of medical AI. Through diverse cases and deep insight, this book provides a balanced perspective on the present and future of medical AI and opens a forum for the discussion needed before AI enters medicine in earnest. Looking back ten years from now, when medical AI has become routine, I hope we will find that this book served as the guide that led the way into that era." — Kyu-Hwan Jung, CTO, Vuno

"Medical AI demands a more fundamental understanding than AI in other fields, because it goes beyond merely taking over human tasks to shifting the very paradigm of medicine onto a data-driven footing. It therefore calls for a balanced understanding of AI and hard thinking about how it can help physicians and patients. That is why this book, which brings together the results of such efforts around the world, is so welcome." — Seungwook Paek, CEO, Lunit

"This book delivers not only the latest trends in medical AI but also its significance, limitations, outlook, and plenty of food for thought. Even on contested issues, the author presents his own view persuasively, grounded in clear evidence. I personally plan to use it as a textbook for my graduate course." — Soo-Yong Shin, Professor, Department of Digital Health, Sungkyunkwan University

The first book in Korea to tackle medical AI in earnest! This book addresses the technical side of medical AI together with the many AI-related issues raised inside and outside the medical community. I am confident it covers most of the questions currently surrounding medical AI: Will AI replace doctors? Which specialties will be affected first? How should AI be regulated, and how should its utility and safety be proven? Who is liable for medical errors? How should medical education change? I have tried to treat these issues in depth, in language as plain as possible.

I have tried to write simply, avoiding technical jargon as much as possible, so that general readers, not only clinicians or AI experts, can follow the latest trends and key issues in medical AI. Readers will find here a comprehensive discussion of medical AI that is hard to encounter elsewhere. Because the field advances so quickly, new techniques will keep appearing after this book is published; still, I hope the overview it offers becomes a foundation for approaching medical AI, studying the technology, and preparing for the changes ahead.

It is also my personal hope that this humble book helps medical professionals, especially newly licensed physicians and the doctors-to-be now in medical school. AI will inevitably change the future role of physicians and may bring about a fundamental shift in the very paradigm of medicine. Yet today's medical community, and medical education in particular, is not responding properly. Most of what we will discuss converges on one conclusion: medical education must be reinvented. Regrettably, today's young doctors and medical students are caught in the middle: educated in the old way, they will have to live in a future shared with AI. Someday medical education and training will adapt to these changes, but they are destined to be sent into the clinic and the operating room before that happens. It may sound irresponsible, but in the end you have no choice but to study, prepare, and find your own way through this future. — from the author's prologue

The present and future of medical AI, presented by future-medicine researcher Dr. Yoon Sup Choi • The current state of medical deep learning and IBM Watson • Will AI replace doctors? • Price: 20,000 KRW • ISBN 979-11-86269-99-2

More Slides
Videos (in Korean)