The Future of Medicine: Digital Healthcare

Digital Healthcare Partners

Yoon Sup Choi, PhD
“It's in Apple's DNA that technology alone is not enough. It's technology married with liberal arts.”
The Convergence of IT, BT and Medicine
By Yoon Sup Choi
Medical Artificial Intelligence
Cover design: 최승협
A convergence bioscientist, medical futurist, entrepreneur, angel investor, and evangelist whose central mission is to create innovation in digital healthcare and generate social value through the convergence of computer science, life science, and medicine. One of Korea's leading experts in digital healthcare, he first introduced the field to Korea through active research, writing, and lecturing.
He double-majored in computer science and life science at POSTECH and received his PhD in computational biology from the same university's School of Interdisciplinary Bioscience and Bioengineering. He has served as a visiting researcher at Stanford University, a research assistant professor at the Cancer Research Institute of Seoul National University College of Medicine, a team leader at the Convergence Lab of KT's corporate R&D institute, and a research assistant professor at the Biomedical Research Institute of Seoul National University Hospital. He has published more than ten papers in world-class scientific journals, including Science.
He founded and directs the Yoon Sup Choi Digital Healthcare Institute, the first research institute in Korea devoted to digital healthcare. He is also a co-founder and managing partner of Digital Healthcare Partners, Korea's only accelerator specializing in healthcare startups, where he discovers, invests in, and fosters innovative healthcare startups together with medical experts. He also serves as a visiting professor in the Department of Digital Health at Sungkyunkwan University.
He has invested in and advises healthcare startups such as VUNO, Zikto, 3billion, Surgical Mind, Dr. Diary, VRAD, MediHere, Soulling, and Mobile Doctor, working to bring healthcare innovation to Korea as well. He writes actively on Korea's first blog dedicated to digital healthcare, Healthcare Innovation by Yoon Sup Choi, and contributes a regular column to Maeil Business Newspaper. His previous books are Healthcare Innovation: The Future Has Already Begun and So I Myself Became a Company.
• Blog: http://www.yoonsupchoi.com/
• Facebook: https://www.facebook.com/yoonsup.choi
• Email: yoonsup.choi@gmail.com
Yoon Sup Choi
Medical AI is driving innovation that will reshape a conservative healthcare system. Its rapid progress and broad impact are difficult to grasp for modern medical professionals, whose training has grown ever more specialized and subdivided, and it is hard to know even where to begin studying. In this situation, this book, which accessibly lays out the concepts and applications of medical AI and its relationship with physicians, will serve as an excellent guide. It is an especially useful introduction for the medical students and young clinicians who will lead the future.
— Joon Beom Seo, Professor of Radiology, Asan Medical Center; Head of the Medical Imaging Artificial Intelligence Program
Hardly anyone disputes that AI will fundamentally change the paradigm of medicine. But medicine poses many hard problems for AI to solve, and the solutions vary enormously. The cure-all medical AI that people commonly imagine does not exist. This book offers a balanced analysis of the development, application, and potential of a wide range of medical AI. I recommend it both to clinicians looking to adopt AI and to AI researchers venturing into the unfamiliar territory of medicine.
— Jihoon Jeong, MD; Professor, Department of Media Communication, Kyung Hee Cyber University
As a professor responsible for basic medical education at Seoul National University College of Medicine, I keenly feel that today's medical education, essentially unchanged since industrialization, cannot prepare medical students for the rapidly changing era of AI. This book contains the expert analysis and forward-looking perspective of Director Yoon Sup Choi, with whom I am pioneering AI education in medical school. I recommend it to medical students and professors preparing for the AI future, and to students and parents considering medical school.
— Hyung Jin Choi, Professor, Department of Anatomy, Seoul National University College of Medicine; internal medicine specialist
Extreme and opposing views coexist about the recent introduction of AI into medicine. Through diverse cases and deep insight, this book provides a balanced view of the present and future of medical AI and sets the stage for the discussion needed for AI to be adopted into medicine in earnest. Ten years from now, when medical AI is commonplace, I hope we will look back and find that this book served as a guide that led the way into that era.
— Kyu-Hwan Jung, CTO, VUNO
Medical AI demands a more fundamental understanding than AI in other fields, because it goes beyond simply replacing human work: it shifts the paradigm of medicine toward being data-driven. It therefore requires a balanced understanding of AI and careful thought about how it can help doctors and patients. That is why this book, which brings together the results of such efforts from around the world, is so welcome.
— Seungwook Paek, CEO, Lunit
This book covers not only the latest developments in medical AI but also its significance, limitations, and outlook, along with plenty of food for thought. Even on contentious issues, the author presents his views persuasively, grounded in clear evidence. I personally plan to use this book as a textbook for my graduate course.
— Soo-Yong Shin, Professor, Department of Digital Health, Sungkyunkwan University
By Yoon Sup Choi
Medical Artificial Intelligence
Price: 20,000 KRW
ISBN 979-11-86269-99-2
The present and future of medical artificial intelligence, presented by medical futurist Dr. Yoon Sup Choi
The current state of medical deep learning and IBM Watson
Will artificial intelligence replace doctors?
Inevitable Tsunami of Change
https://rockhealth.com/reports/amidst-a-record-3-1b-funding-in-q1-2020-digital-health-braces-for-covid-19-impa
[Chart] FUNDING SNAPSHOT: YEAR OVER YEAR — annual digital health funding (in $B) and deal counts by quarter, 2010–2018 (chart data omitted).
Funding surpassed 2017 numbers by almost $3B, making 2018 the fourth consecutive increase in capital investment and largest since we began tracking digital health funding in 2010. Deal volume decreased from Q3 to Q4, but deal sizes spiked, with $3B invested in Q4 alone. Average deal size in 2018 was $21M, a $6M increase from 2017.
Source: StartUp Health Insights | startuphealth.com/insights. Note: Report based on public data through 12/31/18 on seed (incl. accelerator), venture, corporate venture, and private equity funding only. © 2019 StartUp Health LLC
• Global investment trends, too, set an all-time record in 2018: $14.6B

• The fourth consecutive year of growth since 2015
https://hq.startuphealth.com/posts/startup-healths-2018-insights-funding-report-a-record-year-for-digital-health
[Map] 38 healthcare unicorns valued at $90.7B — global VC-backed digital health companies with a private market valuation of $1B+ (7/26/19), mapped across North America, Europe, the Middle East, and Asia (per-company valuations omitted).
CB Insights, Global Healthcare Report Q2 2019
• There are 38 digital healthcare unicorn startups (companies valued at $1B or more) worldwide,

• but not a single one is in Korea.
Healthcare
Broad health management that involves neither digital technology nor the professional medical domain
e.g., exercise, nutrition, sleep

Digital healthcare
Health management that uses digital technology
e.g., Internet of Things, artificial intelligence, 3D printing, VR/AR

Mobile healthcare
The subset of digital healthcare that uses mobile technology
e.g., smartphones, Internet of Things, social media

Personal genome analysis
Cancer genomics, disease risk, carrier status, drug sensitivity, wellness, ancestry

Medicine
The professional medical domain: disease prevention, treatment, prescription, and management

Telemedicine
Remote patient monitoring
Remote consultation (phone, video, remote image reading)

Digital therapeutics
Meditation apps, games for treating ADHD, VR for treating PTSD, apps for treating addiction

Diagram: how the healthcare-related fields fit together
EDITORIAL OPEN
Digital medicine, on its way to being just plain medicine
npj Digital Medicine (2018) 1:20175; doi:10.1038/s41746-017-0005-1
There are already nearly 30,000 peer-reviewed English-language
scientific journals, producing an estimated 2.5 million articles a year.1
So why another, and why one focused specifically on digital
medicine?
To answer that question, we need to begin by defining what
“digital medicine” means: using digital tools to upgrade the
practice of medicine to one that is high-definition and far more
individualized. It encompasses our ability to digitize human beings
using biosensors that track our complex physiologic systems, but
also the means to process the vast data generated via algorithms,
cloud computing, and artificial intelligence. It has the potential to
democratize medicine, with smartphones as the hub, enabling
each individual to generate their own real world data and being
far more engaged with their health. Add to this new imaging
tools, mobile device laboratory capabilities, end-to-end digital
clinical trials, telemedicine, and one can see there is a remarkable
array of transformative technology which lays the groundwork for
a new form of healthcare.
As is obvious by its definition, the far-reaching scope of digital
medicine straddles many and widely varied expertise. Computer
scientists, healthcare providers, engineers, behavioral scientists,
ethicists, clinical researchers, and epidemiologists are just some of
the backgrounds necessary to move the field forward. But to truly
accelerate the development of digital medicine solutions in health
requires the collaborative and thoughtful interaction between
individuals from several, if not most of these specialties. That is the
primary goal of npj Digital Medicine: to serve as a cross-cutting
resource for everyone interested in this area, fostering collaborations
and accelerating its advancement.
Current systems of healthcare face multiple insurmountable
challenges. Patients are not receiving the kind of care they want
and need, caregivers are dissatisfied with their role, and in most
countries, especially the United States, the cost of care is
unsustainable. We are confident that the development of new
systems of care that take full advantage of the many capabilities
that digital innovations bring can address all of these major issues.
Researchers too, can take advantage of these leading-edge
technologies as they enable clinical research to break free of the
confines of the academic medical center and be brought into the
real world of participants’ lives. The continuous capture of multiple
interconnected streams of data will allow for a much deeper
refinement of our understanding and definition of most phenotypes,
with the discovery of novel signals in these enormous data
sets made possible only through the use of machine learning.
Our enthusiasm for the future of digital medicine is tempered by
the recognition that presently too much of the publicized work in
this field is characterized by irrational exuberance and excessive
hype. Many technologies have yet to be formally studied in a
clinical setting, and for those that have, too many began and
ended with an under-powered pilot program. In addition, there are
more than a few examples of digital “snake oil” with substantial
uptake prior to their eventual discrediting.2 Both of these practices
are barriers to advancing the field of digital medicine.
Our vision for npj Digital Medicine is to provide a reliable,
evidence-based forum for all clinicians, researchers, and even
patients, curious about how digital technologies can transform
every aspect of health management and care. Being open source,
as all medical research should be, allows for the broadest possible
dissemination, which we will strongly encourage, including
through advocating for the publication of preprints.
And finally, quite paradoxically, we hope that npj Digital
Medicine is so successful that in the coming years there will no
longer be a need for this journal, or any journal specifically
focused on digital medicine. Because if we are able to meet our
primary goal of accelerating the advancement of digital medicine,
then soon, we will just be calling it medicine. And there are
already several excellent journals for that.
ACKNOWLEDGEMENTS
Supported by the National Institutes of Health (NIH)/National Center for Advancing
Translational Sciences grant UL1TR001114 and a grant from the Qualcomm Foundation.
ADDITIONAL INFORMATION
Competing interests: The authors declare no competing financial interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history: The original version of this Article had an incorrect Article number of 5 and an incorrect Publication year of 2017. These errors have now been corrected in the PDF and HTML versions of the Article.
Steven R. Steinhubl1 and Eric J. Topol1
1Scripps Translational Science Institute, 3344 North Torrey Pines Court, Suite 300, La Jolla, CA 92037, USA
Correspondence: Steven R. Steinhubl (steinhub@scripps.edu) or Eric J. Topol (etopol@scripps.edu)
REFERENCES
1. Ware, M. & Mabe, M. The STM report: an overview of scientific and scholarly journal publishing 2015 [updated March]. http://digitalcommons.unl.edu/scholcom/92017 (2015).
2. Plante, T. B., Urrea, B. & MacFarlane, Z. T. et al. Validation of the instant blood pressure smartphone App. JAMA Intern. Med. 176, 700–702 (2016).
© The Author(s) 2018
Received: 19 October 2017 Accepted: 25 October 2017
www.nature.com/npjdigitalmed
Published in partnership with the Scripps Translational Science Institute
The future of digital medicine?

To become just plain medicine
What is most important factor in digital medicine?
“Data! Data! Data!” he cried. “I can't make bricks without clay!”
— Sherlock Holmes, “The Adventure of the Copper Beeches”
New data are measured, stored, integrated, and analyzed in new ways, by new actors.
Types of data

Qualitative and quantitative aspects of data

Wearable devices, smartphones, genome analysis, artificial intelligence, social media

Users/patients, the general public
The three steps of digital healthcare
• Step 1. Measuring data

• Step 2. Integrating data

• Step 3. Analyzing data
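As a toy illustration (not from the original slides), the three steps can be read as a small data pipeline: separate sources produce measurements, the streams are brought together per person, and only the integrated record is analyzed. All device names and readings below are hypothetical.

```python
# Illustrative sketch of the measure -> integrate -> analyze steps.
# Every source name and value here is a made-up example.

def measure():
    # Step 1: each source produces its own stream of readings
    return {
        "wearable_hr": [62, 64, 71],   # heart rate (bpm)
        "phone_steps": [3200, 5400],   # daily step counts
        "glucose": [98, 104, 110],     # mg/dL
    }

def integrate(streams):
    # Step 2: bring the heterogeneous streams together into one record
    return {name: values for name, values in streams.items()}

def analyze(record):
    # Step 3: derive a simple summary from the integrated record
    return {name: sum(v) / len(v) for name, v in record.items()}

summary = analyze(integrate(measure()))
print(summary["glucose"])  # mean glucose across the measurements
```

The point of the sketch is only that analysis presupposes integration, which presupposes measurement, which is why the three steps are ordered.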
LETTER https://doi.org/10.1038/s41586-019-1390-1
A clinically applicable approach to continuous
prediction of future acute kidney injury
Nenad Tomašev1*, Xavier Glorot1, Jack W. Rae1,2, Michal Zielinski1, Harry Askham1, Andre Saraiva1, Anne Mottram1, Clemens Meyer1, Suman Ravuri1, Ivan Protsyuk1, Alistair Connell1, Cían O. Hughes1, Alan Karthikesalingam1, Julien Cornebise1,12, Hugh Montgomery3, Geraint Rees4, Chris Laing5, Clifton R. Baker6, Kelly Peterson7,8, Ruth Reeves9, Demis Hassabis1, Dominic King1, Mustafa Suleyman1, Trevor Back1,13, Christopher Nielson10,11,13, Joseph R. Ledsam1,13* & Shakir Mohamed1,13
The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients1. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records2–17 and using acute kidney injury—a common and potentially life-threatening condition18—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests9. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
Adverse events and clinical complications are a major cause of mortality and poor outcomes in patients, and substantial effort has been made to improve their recognition18,19. Few predictors have found their way into routine clinical practice, because they either lack effective sensitivity and specificity or report damage that already exists20. One example relates to acute kidney injury (AKI), a potentially life-threatening condition that affects approximately one in five inpatient admissions in the United States21. Although a substantial proportion of cases of AKI are thought to be preventable with early treatment22, current algorithms for detecting AKI depend on changes in serum creatinine as a marker of acute decline in renal function. Increases in serum creatinine lag behind renal injury by a considerable period, which results in delayed access to treatment. This supports a case for preventative ‘screening’-type alerts but there is no evidence that current rule-based alerts improve outcomes23. For predictive alerts to be effective, they must empower clinicians to act before a major clinical decline has occurred by: (i) delivering actionable insights on preventable conditions; (ii) being personalized for specific patients; (iii) offering sufficient contextual information to inform clinical decision-making; and (iv) being generally applicable across populations of patients24.
Promising recent work on modelling adverse events from electronic health records2–17 suggests that the incorporation of machine learning may enable the early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically applicable level of predictive performance25 or have focused on predictions across a short time horizon that leaves little time for clinical assessment and intervention26.
Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. At each time point, the model outputs a probability of AKI occurring at any stage of severity within the next 48 h (although our approach can be extended to other time windows or severities of AKI; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. This model was trained using data that were curated from a multi-site retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs—the largest integrated healthcare system in the United States. The dataset consisted of information that was available from hospital electronic health records in digital format. The total number of independent entries in the dataset was approximately 6 billion, including 620,000 features. Patients were randomized across training (80%), validation (5%), calibration (5%) and test (10%) sets. A ground-truth label for the presence of AKI at any given point in time was added using the internationally accepted ‘Kidney Disease: Improving Global Outcomes’ (KDIGO) criteria18; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are provided in the Methods and Extended Data Figs. 1–3.
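The continuous-prediction loop described above — step through the record, update an internal memory, emit a probability, compare it to an operating point — can be sketched in miniature. This toy recurrence only illustrates the mechanism; the weights, the single input feature, and the threshold are invented and bear no relation to the paper's actual model.

```python
import math

def rnn_step(memory, event, w_m=0.6, w_x=0.8):
    # update the internal memory from one new EHR entry
    # (here a single normalized lab value; made-up weights)
    return math.tanh(w_m * memory + w_x * event)

def risk(memory, w_out=2.0, bias=-1.0):
    # map the memory to a probability of AKI within the next 48 h
    return 1.0 / (1.0 + math.exp(-(w_out * memory + bias)))

def continuous_prediction(events, threshold=0.5):
    memory, alerts = 0.0, []
    for event in events:               # one step per time-ordered entry
        memory = rnn_step(memory, event)
        p = risk(memory)
        alerts.append(p > threshold)   # positive when p exceeds the operating point
    return alerts

# a rising (normalized) creatinine trend eventually trips the alert
print(continuous_prediction([0.1, 0.3, 0.8, 1.5]))
```

Raising or lowering `threshold` is exactly the operating-point choice the paper describes: it trades the number of true alerts against the number of false ones.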
Figure 1 shows the use of our model. At every point throughout an admission, the model provides updated estimates of future AKI risk along with an associated degree of uncertainty. Providing the uncertainty associated with a prediction may help clinicians to distinguish ambiguous cases from those predictions that are fully supported by the available data. Identifying an increased risk of future AKI sufficiently far in advance is critical, as longer lead times may enable preventative action to be taken. This is possible even when clinicians may not be actively intervening with, or monitoring, a patient. Supplementary Information section A provides more examples of the use of the model.
With our approach, 55.8% of inpatient AKI events of any severity were predicted early, within a window of up to 48 h in advance and with a ratio of 2 false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve of 92.1%, and an area under the precision–recall curve of 29.7%. When set at this threshold, our predictive model would—if operationalized—trigger a
1DeepMind, London, UK. 2CoMPLEX, Computer Science, University College London, London, UK. 3Institute for Human Health and Performance, University College London, London, UK. 4Institute of Cognitive Neuroscience, University College London, London, UK. 5University College London Hospitals, London, UK. 6Department of Veterans Affairs, Denver, CO, USA. 7VA Salt Lake City Healthcare System, Salt Lake City, UT, USA. 8Division of Epidemiology, University of Utah, Salt Lake City, UT, USA. 9Department of Veterans Affairs, Nashville, TN, USA. 10University of Nevada School of Medicine, Reno, NV, USA. 11Department of Veterans Affairs, Salt Lake City, UT, USA. 12Present address: University College London, London, UK. 13These authors contributed equally: Trevor Back, Christopher Nielson, Joseph R. Ledsam, Shakir Mohamed. *e-mail: nenadt@google.com; jledsam@google.com
Copyright 2016 American Medical Association. All rights reserved.
Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
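The study's two operating points are simply two thresholds applied to the same algorithm's output score. A minimal sketch with invented scores (not the study's data) shows how raising the threshold trades sensitivity for specificity:

```python
def sens_spec(scores, labels, threshold):
    # labels: 1 = referable diabetic retinopathy, 0 = not referable
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# six hypothetical algorithm scores with their ground-truth labels
scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
labels = [1,    1,    1,    0,    0,    0]

# a high threshold favors specificity, a low one favors sensitivity
print(sens_spec(scores, labels, 0.7))  # high-specificity operating point
print(sens_spec(scores, labels, 0.3))  # high-sensitivity operating point
```

Sweeping the threshold over all values traces out the ROC curve whose area the abstract reports.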
Author Affiliations: Google Inc,
Mountain View, California (Gulshan,
Peng, Coram, Stumpe, Wu,
Narayanaswamy, Venugopalan,
Widner, Madams, Nelson, Webster);
Department of Computer Science,
University of Texas, Austin
(Venugopalan); EyePACS LLC,
San Jose, California (Cuadros); School
of Optometry, Vision Science
Graduate Group, University of
California, Berkeley (Cuadros);
Aravind Medical Research
Foundation, Aravind Eye Care
System, Madurai, India (Kim); Shri
Bhagwan Mahavir Vitreoretinal
Services, Sankara Nethralaya,
Chennai, Tamil Nadu, India (Raman);
Verily Life Sciences, Mountain View,
California (Mega); Cardiovascular
Division, Department of Medicine,
Brigham and Women’s Hospital and
Harvard Medical School, Boston,
Massachusetts (Mega).
Corresponding Author: Lily Peng,
MD, PhD, Google Research, 1600
Amphitheatre Way, Mountain View,
CA 94043 (lhpeng@google.com).
Research
JAMA | Original Investigation | INNOVATIONS IN HEALTH CARE DELIVERY
Ophthalmology
LETTERS
https://doi.org/10.1038/s41591-018-0335-9
1Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China. 2Institute for Genomic Medicine, Institute of Engineering in Medicine, and Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA. 3Hangzhou YITU Healthcare Technology Co. Ltd, Hangzhou, China. 4Department of Thoracic Surgery/Oncology, First Affiliated Hospital of Guangzhou Medical University, China State Key Laboratory and National Clinical Research Center for Respiratory Disease, Guangzhou, China. 5Guangzhou Kangrui Co. Ltd, Guangzhou, China. 6Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, China. 7Veterans Administration Healthcare System, San Diego, CA, USA. 8These authors contributed equally: Huiying Liang, Brian Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu. *e-mail: kang.zhang@gmail.com; xiahumin@hotmail.com
Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.
Medical information has become increasingly complex over time. The range of disease entities, diagnostic testing and biomarkers, and treatment modalities has increased exponentially in recent years. Subsequently, clinical decision-making has also become more complex and demands the synthesis of decisions from assessment of large volumes of data representing clinical information. In the current digital age, the electronic health record (EHR) represents a massive repository of electronic data points representing a diverse array of clinical information1–3. Artificial intelligence (AI) methods have emerged as potentially powerful tools to mine EHR data to aid in disease diagnosis and management, mimicking and perhaps even augmenting the clinical decision-making of human physicians1.
To formulate a diagnosis for any given patient, physicians frequently use hypothetico-deductive reasoning. Starting with the chief complaint, the physician then asks appropriately targeted questions relating to that complaint. From this initial small feature set, the physician forms a differential diagnosis and decides what features (historical questions, physical exam findings, laboratory testing, and/or imaging studies) to obtain next in order to rule in or rule out the diagnoses in the differential diagnosis set. The most useful features are identified, such that when the probability of one of the diagnoses reaches a predetermined level of acceptability, the process is stopped, and the diagnosis is accepted. It may be possible to achieve an acceptable level of certainty of the diagnosis with only a few features without having to process the entire feature set. Therefore, the physician can be considered a classifier of sorts.
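That sequential "classifier" can be caricatured as evidence accumulation: each acquired feature multiplies the diagnostic odds, and the process stops once the posterior probability crosses an acceptance level. The features, likelihood ratios, and prior below are invented for illustration and are not from the paper's model.

```python
def sequential_diagnosis(prior_odds, likelihood_ratios, stop_probability=0.95):
    # each feature (the answer to a targeted question) multiplies the odds;
    # stop as soon as the posterior probability reaches the acceptance level
    odds = prior_odds
    for feature, lr in likelihood_ratios:
        odds *= lr
        p = odds / (1 + odds)
        if p >= stop_probability:
            return feature, p            # diagnosis accepted after this feature
    return None, odds / (1 + odds)       # never reached the acceptance level

# hypothetical features for one candidate diagnosis, in the order asked
features = [("fever", 3.0), ("cough", 2.0), ("chest x-ray infiltrate", 20.0)]
print(sequential_diagnosis(0.25, features))
```

As in the clinical description, the loop may terminate before the entire feature set is processed: the strongly informative feature ends the work-up.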
In this study, we designed an AI-based system using machine learning to extract clinically relevant features from EHR notes to mimic the clinical reasoning of human physicians. In medicine, machine learning methods have already demonstrated strong performance in image-based diagnoses, notably in radiology2, dermatology4, and ophthalmology5–8, but analysis of EHR data presents a number of difficult challenges. These challenges include the vast quantity of data, high dimensionality, data sparsity, and deviations
Evaluation and accurate diagnoses of pediatric
diseases using artificial intelligence
Huiying Liang1,8, Brian Y. Tsui2,8, Hao Ni3,8, Carolina C. S. Valentim4,8, Sally L. Baxter2,8, Guangjian Liu1,8, Wenjia Cai2, Daniel S. Kermany1,2, Xin Sun1, Jiancong Chen2, Liya He1, Jie Zhu1, Pin Tian2, Hua Shao2, Lianghong Zheng5,6, Rui Hou5,6, Sierra Hewett1,2, Gen Li1,2, Ping Liang3, Xuan Zang3, Zhiqi Zhang3, Liyan Pan1, Huimin Cai5,6, Rujuan Ling1, Shuhua Li1, Yongwang Cui1, Shusheng Tang1, Hong Ye1, Xiaoyan Huang1, Waner He1, Wenqing Liang1, Qing Zhang1, Jianmin Jiang1, Wei Yu1, Jianqun Gao1, Wanxing Ou1, Yingmin Deng1, Qiaozhen Hou1, Bei Wang1, Cuichan Yao1, Yan Liang1, Shu Zhang1, Yaou Duan2, Runze Zhang2, Sarah Gibson2, Charlotte L. Zhang2, Oulan Li2, Edward D. Zhang2, Gabriel Karin2, Nathan Nguyen2, Xiaokang Wu1,2, Cindy Wen2, Jie Xu2, Wenqin Xu2, Bochu Wang2, Winston Wang2, Jing Li1,2, Bianca Pizzato2, Caroline Bao2, Daoman Xiang1, Wanting He1,2, Suiqin He2, Yugui Zhou1,2, Weldon Haw2,7, Michael Goldbaum2, Adriana Tremoulet2, Chun-Nan Hsu2, Hannah Carter2, Long Zhu3, Kang Zhang1,2,7* and Huimin Xia1*
NATURE MEDICINE | www.nature.com/naturemedicine
Pediatrics
ARTICLES
https://doi.org/10.1038/s41591-018-0177-5
1Applied Bioinformatics Laboratories, New York University School of Medicine, New York, NY, USA. 2Skirball Institute, Department of Cell Biology, New York University School of Medicine, New York, NY, USA. 3Department of Pathology, New York University School of Medicine, New York, NY, USA. 4School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece. 5Institute for Systems Genetics, New York University School of Medicine, New York, NY, USA. 6Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY, USA. 7Center for Biospecimen Research and Development, New York University, New York, NY, USA. 8Department of Population Health and the Center for Healthcare Innovation and Delivery Science, New York University School of Medicine, New York, NY, USA. 9These authors contributed equally to this work: Nicolas Coudray, Paolo Santiago Ocampo. *e-mail: narges.razavian@nyumc.org; aristotelis.tsirigos@nyumc.org
According to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer1, and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinction can be challenging and time-consuming, and requires confirmatory immunohistochemical stains.
Classification of lung cancer type is a key diagnostic process
because the available treatment options, including conventional
chemotherapy and, more recently, targeted therapies, differ for
LUAD and LUSC2
. Also, a LUAD diagnosis will prompt the search
for molecular biomarkers and sensitizing mutations and thus has
a great impact on treatment options3,4
. For example, epidermal
growth factor receptor (EGFR) mutations, present in about 20% of
LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK)
rearrangements, present in<5% of LUAD5
, currently have tar-
geted therapies approved by the Food and Drug Administration
(FDA)6,7
. Mutations in other genes, such as KRAS and tumor pro-
tein P53 (TP53) are very common (about 25% and 50%, respec-
tively) but have proven to be particularly challenging drug targets
so far5,8
. Lung biopsies are typically used to diagnose lung cancer
type and stage. Virtual microscopy of stained images of tissues is
typically acquired at magnifications of 20×to 40×, generating very
large two-dimensional images (10,000 to>100,000 pixels in each
dimension) that are oftentimes challenging to visually inspect in
an exhaustive manner. Furthermore, accurate interpretation can be
difficult, and the distinction between LUAD and LUSC is not always
clear, particularly in poorly differentiated tumors; in this case, ancil-
lary studies are recommended for accurate classification9,10
. To assist
experts, automatic analysis of lung cancer whole-slide images has
been recently studied to predict survival outcomes11
and classifica-
tion12
. For the latter, Yu et al.12
combined conventional thresholding
and image processing techniques with machine-learning methods,
such as random forest classifiers, support vector machines (SVM) or
Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing
normal from tumor slides, and ~0.75 in distinguishing LUAD from
LUSC slides. More recently, deep learning was used for the classi-
fication of breast, bladder and lung tumors, achieving an AUC of
0.83 in classification of lung tumor types on tumor slides from The
Cancer Genome Atlas (TCGA)13
. Analysis of plasma DNA values
was also shown to be a good predictor of the presence of non–small
cell cancer, with an AUC of ~0.94 (ref. 14
) in distinguishing LUAD
from LUSC, whereas the use of immunochemical markers yields an
AUC of ~0.94115
.
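The AUC values quoted above can be read as a rank statistic: the probability that a randomly chosen tumor slide receives a higher score than a randomly chosen normal slide. A minimal sketch of that interpretation, with toy labels and scores rather than the papers' data:

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the probability that a random
    positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy example: two normal slides (label 0) and two tumor slides (label 1)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy example one tumor slide outranks both normals and the other outranks one of the two, giving an AUC of 0.75.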
Here, we demonstrate how the field can further benefit from deep
learning by presenting a strategy based on convolutional neural
networks (CNNs) that not only outperforms methods in previously
Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning

Nicolas Coudray1,2,9, Paolo Santiago Ocampo3,9, Theodore Sakellaropoulos4, Navneet Narula3, Matija Snuderl3, David Fenyö5,6, Andre L. Moreira3,7, Narges Razavian8* and Aristotelis Tsirigos1,3*
Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.
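Because a whole-slide image is far too large for a CNN to ingest at once, the approach described above classifies small tiles and then aggregates per-tile probabilities into a slide-level call. A minimal sketch of one such aggregation step (averaging), with hypothetical tile probabilities standing in for Inception v3 outputs:

```python
def slide_prediction(tile_probs):
    """Average per-tile class probabilities into one slide-level vector.
    Averaging is one common aggregation choice; others (e.g. counting
    positive tiles) are also used."""
    n = len(tile_probs)
    n_classes = len(tile_probs[0])
    return [sum(p[c] for p in tile_probs) / n for c in range(n_classes)]

# Hypothetical per-tile CNN outputs for classes [normal, LUAD, LUSC]:
tiles = [
    [0.1, 0.7, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
]
slide_probs = slide_prediction(tiles)
label = max(range(3), key=lambda c: slide_probs[c])  # index 1 = LUAD here
```

The slide-level label is then the class with the highest aggregated probability.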
병리과
ARTICLES
https://doi.org/10.1038/s41551-018-0301-3
1Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China. 2Shanghai Wision AI Co., Ltd, Shanghai, China. 3Beth Israel Deaconess Medical Center and Harvard Medical School, Center for Advanced Endoscopy, Boston, MA, USA. *e-mail: gary.samsph@gmail.com
Colonoscopy is the gold-standard screening test for colorectal cancer1–3, one of the leading causes of cancer death in both the United States4,5 and China6. Colonoscopy can reduce the risk of death from colorectal cancer through the detection of tumours at an earlier, more treatable stage as well as through the removal of precancerous adenomas3,7. Conversely, failure to detect adenomas may lead to the development of interval cancer. Evidence has shown that each 1.0% increase in adenoma detection rate (ADR) leads to a 3.0% decrease in the risk of interval colorectal cancer8.

Although more than 14 million colonoscopies are performed in the United States annually2, the adenoma miss rate (AMR) is estimated to be 6–27%9. Certain polyps may be missed more frequently, including smaller polyps10,11, flat polyps12 and polyps in the left colon13. There are two independent reasons why a polyp may be missed during colonoscopy: (i) it was never in the visual field or (ii) it was in the visual field but not recognized. Several hardware innovations have sought to address the first problem by improving visualization of the colonic lumen, for instance by providing a larger, panoramic camera view, or by flattening colonic folds using a distal-cap attachment. The problem of unrecognized polyps within the visual field has been more difficult to address14. Several studies have shown that observation of the video monitor by either nurses or gastroenterology trainees may increase polyp detection by up to 30%15–17. Ideally, a real-time automatic polyp-detection system could serve as a similarly effective second observer that could draw the endoscopist’s eye, in real time, to concerning lesions, effectively creating an ‘extra set of eyes’ on all aspects of the video data with fidelity. Although automatic polyp detection in colonoscopy videos has been an active research topic for the past 20 years, performance levels close to that of the expert endoscopist18–20 have not been achieved. Early work in automatic polyp detection has focused on applying deep-learning techniques to polyp detection, but most published works are small in scale, with small development and/or training validation sets19,20.

Here, we report the development and validation of a deep-learning algorithm, integrated with a multi-threaded processing system, for the automatic detection of polyps during colonoscopy. We validated the system in two image studies and two video studies. Each study contained two independent validation datasets.

Results
We developed a deep-learning algorithm using 5,545 colonoscopy images from colonoscopy reports of 1,290 patients that underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People’s Hospital between January 2007 and December 2015. Out of the 5,545 images used, 3,634 images contained polyps (65.54%) and 1,911 images did not contain polyps (34.46%). For algorithm training, experienced endoscopists annotated the presence of each polyp in all of the images in the development dataset. We validated the algorithm on four independent datasets. Datasets A and B were used for image analysis, and datasets C and D were used for video analysis.

Dataset A contained 27,113 colonoscopy images from colonoscopy reports of 1,138 consecutive patients who underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People’s Hospital between January and December 2016 and who were found to have at least one polyp. Out of the 27,113 images, 5,541 images contained polyps (20.44%) and 21,572 images did not contain polyps (79.56%). All polyps were confirmed histologically after biopsy. Dataset B is a public database (CVC-ClinicDB;
Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy

Pu Wang1, Xiao Xiao2, Jeremy R. Glissen Brown3, Tyler M. Berzin3, Mengtian Tu1, Fei Xiong1, Xiao Hu1, Peixi Liu1, Yan Song1, Di Zhang1, Xue Yang1, Liangping Li1, Jiong He2, Xin Yi2, Jingjia Liu2 and Xiaogang Liu1*
The detection and removal of precancerous polyps via colonoscopy is the gold standard for the prevention of colon cancer. However, the detection rate of adenomatous polyps can vary significantly among endoscopists. Here, we show that a machine-learning algorithm can detect polyps in clinical colonoscopies, in real time and with high sensitivity and specificity. We developed the deep-learning algorithm by using data from 1,290 patients, and validated it on newly collected 27,113 colonoscopy images from 1,138 patients with at least one detected polyp (per-image-sensitivity, 94.38%; per-image-specificity, 95.92%; area under the receiver operating characteristic curve, 0.984), on a public database of 612 polyp-containing images (per-image-sensitivity, 88.24%), on 138 colonoscopy videos with histologically confirmed polyps (per-image-sensitivity of 91.64%; per-polyp-sensitivity, 100%), and on 54 unaltered full-range colonoscopy videos without polyps (per-image-specificity, 95.40%). By using a multi-threaded processing system, the algorithm can process at least 25 frames per second with a latency of 76.80 ± 5.60 ms in real-time video analysis. The software may aid endoscopists while performing colonoscopies, and help assess differences in polyp and adenoma detection performance among endoscopists.
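The multi-threaded design mentioned above decouples frame capture from model inference so that analysis keeps pace with the live video stream. A simplified producer-consumer sketch using a worker thread and queues; the detect function below is a toy stand-in for the actual network, not the paper's implementation:

```python
import queue
import threading
import time

def detect(frame):
    """Stand-in for the CNN polyp detector; the real system sustains
    >= 25 frames per second. The toy rule here is an assumption."""
    time.sleep(0.001)                      # simulate inference latency
    return {"frame": frame, "polyp": frame % 7 == 0}

def worker(frames_in, results_out):
    """Consume frames from the input queue until a None sentinel arrives."""
    while True:
        frame = frames_in.get()
        if frame is None:
            break
        results_out.put(detect(frame))

frames_in, results_out = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(frames_in, results_out))
t.start()
for f in range(50):                        # feed 50 video frames
    frames_in.put(f)
frames_in.put(None)                        # end-of-stream sentinel
t.join()
results = [results_out.get() for _ in range(50)]
```

Because capture only enqueues frames, a slow inference step adds latency but never blocks the endoscopy video itself.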
NATURE BIOMEDICAL ENGINEERING | VOL 2 | OCTOBER 2018 | 741–748 | www.nature.com/natbiomedeng
소화기내과
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study

Pu Wang,1 Tyler M Berzin,2 Jeremy Romek Glissen Brown,2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu1
To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/gutjnl-2018-317500

► Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/gutjnl-2018-317500).

1Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
2Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA

Correspondence to Xiaogang Liu, Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China; Gary.samsph@gmail.com

Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019

© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.
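The headline ADR comparison (29.1% vs 20.3%) can be checked with a standard two-proportion z-test. The counts below are reconstructed from the reported rates and arm sizes, so they are approximate rather than the trial's exact tabulation:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test statistic using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)              # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Approximate counts from the reported rates:
# CADe arm: 152/522 patients with >= 1 adenoma (29.1%)
# standard arm: 109/536 (20.3%)
z = two_proportion_z(152, 522, 109, 536)
# a |z| above ~3.29 corresponds to a two-sided p below 0.001
```

With these counts z comes out just above that cutoff, consistent with the reported p<0.001.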
INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12

Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15

Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way
Significance of this study

What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.

What are the new findings?
► This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.

How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.
소화기내과
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.
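Paired sensitivity comparisons like the 91% vs. 83% micrometastasis result are typically tested on the discordant reads, for example with an exact McNemar test. A sketch with hypothetical discordant counts, not the study's actual data:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar test on discordant pairs: b = cases the
    assisted reader got right and the unassisted reader missed, c = the
    reverse. Under H0 the discordant pairs split 50/50."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)          # two-sided, capped at 1

p = mcnemar_exact_p(9, 1)              # hypothetical discordant counts
```

With 9 discordant pairs favoring assistance against 1, the exact two-sided p is about 0.021, on the order of the study's reported P=0.02.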
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist’s primary workflow. In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5

Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist’s sensitivity for detecting individual cancer foci in digital images. However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship (D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T., J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare, 1600 Amphitheatre Way, Mountain View, CA 94043 (e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health, Inc. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
ORIGINAL ARTICLE
Am J Surg Pathol • Volume 00, Number 00, 2018 • www.ajsp.com
병리과
SEPSIS
A targeted real-time early warning score (TREWScore) for septic shock

Katharine E. Henry,1 David N. Hager,2 Peter J. Pronovost,3,4,5 Suchi Saria1,3,5,6*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed “TREWScore,” a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
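Operating points like "sensitivity 0.85 at specificity 0.67" come from sweeping the score threshold along the ROC curve and reading off the trade-off at a chosen point. A minimal sketch with toy labels and risk scores, not the study's data:

```python
def sensitivity_at_specificity(labels, scores, target_spec):
    """Sweep thresholds; return (specificity, sensitivity, threshold) at
    the threshold whose specificity is closest to the target."""
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        spec, sens = tn / (tn + fp), tp / (tp + fn)
        if best is None or abs(spec - target_spec) < abs(best[0] - target_spec):
            best = (spec, sens, thr)
    return best

# toy risk scores: three patients who developed shock (1), three who did not (0)
spec, sens, thr = sensitivity_at_specificity(
    [1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2], 0.67)
```

In a deployed system the chosen threshold fixes the alert rate, so the specificity target directly controls how many false alarms clinicians see.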
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9).

More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the best approach to managing patients at high risk of developing septic shock before the onset of severe sepsis or shock has not been studied. Methods that can identify ahead of time which patients will later experience septic shock are needed to further understand, study, and improve outcomes in this population.

General-purpose illness severity scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE II), Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition.

The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of “early warning systems,” “track and trigger” initiatives, “listening applications,” and “sniffers” have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock.

The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health data in a variety of applications (24), including discharge planning (25), risk stratification (26, 27), and identification of acute adverse events (28, 29). For septic shock in particular, promising work includes that of predicting septic shock using high-fidelity physiological signals collected directly from bedside monitors (30, 31), inferring relationships between predictors of septic shock using Bayesian networks (32), and using routine measurements for septic shock prediction (33–35). No current prediction models that use only data routinely stored in the EHR predict septic shock with high sensitivity and specificity many hours before onset. Moreover, when learning predictive risk scores, current methods (34, 36, 37) often have not accounted for the censoring effects of clinical interventions on patient outcomes (38). For instance, a patient with severe sepsis who received fluids and never developed septic shock would be treated as a negative case, despite the possibility that he or she might have developed septic shock in the absence of such treatment and therefore could be considered a positive case up until the
1Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. 2Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA. 3Armstrong Institute for Patient Safety and Quality, Johns Hopkins University, Baltimore, MD 21202, USA. 4Department of Anesthesiology and Critical Care Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21202, USA. 5Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA. 6Department of Applied Math and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA.
*Corresponding author. E-mail: ssaria@cs.jhu.edu
RESEARCH ARTICLE
www.ScienceTranslationalMedicine.org 5 August 2015 Vol 7 Issue 299 299ra122 1
감염내과
BRIEF COMMUNICATION OPEN
Digital biomarkers of cognitive function
Paul Dagum1

To identify digital biomarkers associated with cognitive function, we analyzed human–computer interaction from 7 days of smartphone use in 27 subjects (ages 18–34) who received a gold standard neuropsychological assessment. For several neuropsychological constructs (working memory, memory, executive function, language, and intelligence), we found a family of digital biomarkers that predicted test scores with high correlations (p < 10−4). These preliminary results suggest that passive measures from smartphone use could be a continuous ecological surrogate for laboratory-based neuropsychological assessment.
npj Digital Medicine (2018)1:10; doi:10.1038/s41746-018-0018-4
INTRODUCTION
By comparison to the functional metrics available in other disciplines, conventional measures of neuropsychiatric disorders have several challenges. First, they are obtrusive, requiring a subject to break from their normal routine, dedicating time and often travel. Second, they are not ecological and require subjects to perform a task outside of the context of everyday behavior. Third, they are episodic and provide sparse snapshots of a patient only at the time of the assessment. Lastly, they are poorly scalable, taxing limited resources including space and trained staff.

In seeking objective and ecological measures of cognition, we attempted to develop a method to measure memory and executive function not in the laboratory but in the moment, day-to-day. We used human–computer interaction on smartphones to identify digital biomarkers that were correlated with neuropsychological performance.
RESULTS
In 2014, 27 participants (ages 27.1 ± 4.4 years, education 14.1 ± 2.3 years, M:F 8:19) volunteered for neuropsychological assessment and a test of the smartphone app. Smartphone human–computer interaction data from the 7 days following the neuropsychological assessment showed a range of correlations with the cognitive scores. Table 1 shows the correlation between each neurocognitive test and the cross-validated predictions of the supervised kernel PCA constructed from the biomarkers for that test. Figure 1 shows each participant test score and the digital biomarker prediction for (a) digits backward, (b) symbol digit modality, (c) animal fluency, (d) Wechsler Memory Scale-3rd Edition (WMS-III) logical memory (delayed free recall), (e) brief visuospatial memory test (delayed free recall), and (f) Wechsler Adult Intelligence Scale-4th Edition (WAIS-IV) block design. Construct validity of the predictions was determined using pattern matching that computed a correlation of 0.87 with p < 10−59 between the covariance matrix of the predictions and the covariance matrix of the tests.
Table 1. Fourteen neurocognitive assessments covering five cognitive domains and dexterity were performed by a neuropsychologist. Shown are the group mean and standard deviation, range of score, and the correlation between each test and the cross-validated prediction constructed from the digital biomarkers for that test.

Cognitive predictions                                   Mean (SD)    Range   R (predicted), p-value
Working memory
  Digits forward                                        10.9 (2.7)   7–15    0.71 ± 0.10, 10−4
  Digits backward                                       8.3 (2.7)    4–14    0.75 ± 0.08, 10−5
Executive function
  Trail A                                               23.0 (7.6)   12–39   0.70 ± 0.10, 10−4
  Trail B                                               53.3 (13.1)  37–88   0.82 ± 0.06, 10−6
  Symbol digit modality                                 55.8 (7.7)   43–67   0.70 ± 0.10, 10−4
Language
  Animal fluency                                        22.5 (3.8)   15–30   0.67 ± 0.11, 10−4
  FAS phonemic fluency                                  42 (7.1)     27–52   0.63 ± 0.12, 10−3
Dexterity
  Grooved pegboard test (dominant hand)                 62.7 (6.7)   51–75   0.73 ± 0.09, 10−4
Memory
  California verbal learning test (delayed free recall) 14.1 (1.9)   9–16    0.62 ± 0.12, 10−3
  WMS-III logical memory (delayed free recall)          29.4 (6.2)   18–42   0.81 ± 0.07, 10−6
  Brief visuospatial memory test (delayed free recall)  10.2 (1.8)   5–12    0.77 ± 0.08, 10−5
Intelligence scale
  WAIS-IV block design                                  46.1 (12.8)  12–61   0.83 ± 0.06, 10−6
  WAIS-IV matrix reasoning                              22.1 (3.3)   12–26   0.80 ± 0.07, 10−6
  WAIS-IV vocabulary                                    40.6 (4.0)   31–50   0.67 ± 0.11, 10−4
Received: 5 October 2017; Revised: 3 February 2018; Accepted: 7 February 2018
Mindstrong Health, 248 Homer Street, Palo Alto, CA 94301, USA
Correspondence: Paul Dagum (paul@mindstronghealth.com)
www.nature.com/npjdigitalmed
Psychiatry
PRECISION MEDICINE
Identification of type 2 diabetes subgroups through
topological analysis of patient similarity
Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P. Bottinger, Joel T. Dudley*
Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone with a
rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to
improve early prevention and clinical management of T2D and its complications. Clinicians have understood that
patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related compli-
cations. We used a precision medicine approach to characterize the complexity of T2D patient populations based
on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully
identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was character-
ized by T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer ma-
lignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases,
neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent
T2D subtypes to identify subtype-specific genetic m
Endocrinology
LETTER
Dermatologist-level classification of skin cancer
with deep neural networks
Dermatology
FOCUS LETTERS
Cardiologist-level arrhythmia detection and
classification in ambulatory electrocardiograms
using a deep neural network
Cardiology
Deep learning enables robust assessment and selection of
human blastocysts after in vitro fertilization
Obstetrics and Gynecology
ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment
recommendations: agreement with an expert
multidisciplinary tumor board
Oncology / Nephrology
Supervised autonomous robotic soft tissue surgery
Surgery
NATURE MEDICINE
Table 2 | FDA AI approvals are accelerating
Company           FDA approval     Indication
Apple             September 2018   Atrial fibrillation detection
Aidoc             August 2018      CT brain bleed diagnosis
iCAD              August 2018      Breast density via mammography
Zebra Medical     July 2018        Coronary calcium scoring
Bay Labs          June 2018        Echocardiogram EF determination
Neural Analytics  May 2018         Device for paramedic stroke diagnosis
IDx               April 2018       Diabetic retinopathy diagnosis
Icometrix         April 2018       MRI brain interpretation
Imagen            March 2018       X-ray wrist fracture diagnosis
Viz.ai            February 2018    CT stroke diagnosis
Arterys           February 2018    Liver and lung cancer (MRI, CT) diagnosis
MaxQ-AI           January 2018     CT brain bleed diagnosis
Alivecor          November 2017    Atrial fibrillation detection via Apple Watch
Arterys           January 2017     MRI heart interpretation
AI-based medical devices: FDA approval status
Nature Medicine 2019
• Zebra Medical Vision
  • May 2019: pneumothorax triage on chest X-rays
  • June 2019: intracranial hemorrhage reading on head CT
• Aidoc
  • May 2019: pulmonary embolism reading on CT
  • June 2019: cervical spine fracture reading on CT
• GE Healthcare
  • September 2019: pneumothorax triage on a chest X-ray device
AI-based medical devices: approval status in South Korea
• 1. VUNO BoneAge (Class II approval)
• 2. Lunit INSIGHT for lung nodules (Class II approval)
• 3. JLK Inspection cerebral infarction (Class III approval)
• 4. Infomeditech Neuro I (Class II certification): MRI-based dementia diagnosis support
• 5. Samsung Electronics lung nodules (Class II approval)
• 6. VUNO DeepBrain (Class II certification)
• 7. Lunit INSIGHT MMG (Class III approval)
• 8. JLK Inspection ATROSCAN (Class II certification): brain-aging measurement for health checkups
• 9. VUNO Chest X-ray (Class II approval)
• 10. Deepnoid DeepSpine (Class II approval): detection support for lumbar compression fractures on X-rays
• 11. JLK Inspection lung CT (JLD-01A) (Class II certification)
• 12. JLK Inspection colonoscopy (JFD-01A) (Class II certification)
• 13. JLK Inspection gastroscopy (JFD-02A) (Class II certification)
• 14. Lunit INSIGHT CXR (Class II approval): detection support for abnormal regions on chest X-rays
• 15. VUNO Fundus AI (Class III approval): fundus image analysis for the presence of 12 abnormal findings
• 16. Deep Bio DeepDx-Prostate: cancer diagnosis support on prostate biopsy tissue
• 17. VUNO LungCT (Class II approval): AI for detecting lung nodules on CT images
(2018–2020)
JLK Inspection: listed on KOSDAQ
• July 2019: passed the technology evaluation
• September 6: filed for preliminary listing review
• December 11, 2019: listed on KOSDAQ
• Raised 18 billion KRW in the public offering
VUNO plans to go public within the year
"VUNO was valued at 150 billion KRW when it raised 9 billion KRW from the Korea Development Bank last April. The industry expects VUNO's post-listing valuation to exceed 200 billion KRW."
"VUNO received grade A from both NICE D&B and Korea Enterprise Data in their technology evaluations, demonstrating its strong artificial intelligence (AI) capabilities. Based on this result, VUNO plans to submit a preliminary review application for a KOSDAQ listing in the near future."
Artificial Intelligence in medicine is not a future.
It is already here.
Wrong Questions
Who performs better? (✗)
Will AI replace physicians? (✗)
Right Questions
How can we deliver better care? (○)
How can we better achieve the goals of medicine? (○)
The American Medical Association House of
Delegates has adopted policies to keep the focus on
advancing the role of augmented intelligence (AI) in
enhancing patient care, improving population health,
reducing overall costs, increasing value and the support
of professional satisfaction for physicians.
Foundational policy Annual 2018
As a leader in American medicine, our AMA has a
unique opportunity to ensure that the evolution of AI
in medicine benefits patients, physicians and the health
care community. To that end our AMA seeks to:
Leverage ongoing engagement in digital health and
other priority areas for improving patient outcomes
and physician professional satisfaction to help set
priorities for health care AI
Identify opportunities to integrate practicing
physicians’ perspectives into the development,
design, validation and implementation of health
care AI
Promote development of thoughtfully designed,
high-quality, clinically validated health care AI that:
• Is designed and evaluated in keeping with best
practices in user-centered design, particularly
for physicians and other members of the health
care team
• Is transparent
• Conforms to leading standards for
reproducibility
• Identifies and takes steps to address bias and
avoids introducing or exacerbating health care
disparities, including when testing or deploying
new AI tools on vulnerable populations
• Safeguards patients’ and other individuals’
privacy interests and preserves the security and
integrity of personal information
Encourage education for patients, physicians,
medical students, other health care professionals
and health administrators to promote greater
understanding of the promise and limitations of
health care AI
Explore the legal implications of health care AI,
such as issues of liability or intellectual property,
and advocate for appropriate professional and
governmental oversight for safe, effective, and
equitable use of and access to health care AI
“Medical experts are working to determine the clinical
applications of AI—work that will guide health care in the
future. These experts, along with physicians, state and
federal officials must find the path that ends with better
outcomes for patients. We have to make sure the technology
does not get ahead of our humanity and creativity as
physicians.”
— Gerald E. Harmon, MD, AMA Board of Trustees
Policy
Augmented intelligence in health care
https://www.ama-assn.org/system/files/2019-08/ai-2018-board-policy-summary.pdf
Augmented Intelligence,
rather than Artificial Intelligence
Martin Duggan, “IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing”
Which aspects of the human physician can be augmented?
Medical AI
• Part 1: The Second Machine Age and medical AI
• Part 2: The past and present of medical AI
• Part 3: How should we meet the future?
• Analyzing complex medical data to derive insights
• Analyzing and reading medical images and pathology data
• Monitoring continuous data for prevention and prediction
The three types of medical AI
Jeopardy!
In 2011, Watson competed against two human champions on the quiz show and won decisively.
600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
IBM Watson on Medicine
Watson learned...
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training
Lack of Evidence.
WFO in ASCO 2017
• Early experience with IBM WFO cognitive computing system for lung and colorectal cancer treatment (Manipal Hospitals)
• Over the past 3 years: lung cancer (112), colon cancer (126), rectum cancer (124)
• lung cancer: localized 88.9%, meta 97.9%
• colon cancer: localized 85.5%, meta 76.6%
• rectum cancer: localized 96.8%, meta 80.6%
Performance of WFO in India
2017 ASCO annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
WFO in ASCO 2017
• Results of applying Watson to colorectal and gastric cancer patients at Gachon University Gil Medical Center
  • 340 colorectal cancer patients (stage II–IV)
  • 185 advanced gastric cancer patients (retrospective)
• Concordance with physicians
  • Colorectal cancer patients: 73%
    • 250 patients who received adjuvant chemotherapy: 85%
    • 90 metastatic patients: 40%
  • Gastric cancer patients: 49%
    • Trastuzumab/FOLFOX is not reimbursed by the national health insurance
    • S-1 (tegafur, gimeracil, and oteracil) + cisplatin:
      • very routine in Korea; not used in the United States
• Predicting "whether a first cardiovascular event will occur within the next 10 years"
• Prospective cohort study: 378,256 patients in the United Kingdom
• The first large-scale study to predict disease from routine clinical data using machine learning
• Compared the accuracy of the existing ACC/AHA guidelines against four machine-learning algorithms:
• Random forest; logistic regression; gradient boosting; neural network
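The study design above, comparing several machine-learning classifiers by their discrimination (AUROC), can be sketched as follows. This uses synthetic data and scikit-learn defaults rather than the UK cohort or the paper's tuned models, and the neural network is omitted for brevity; all variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "routine clinical data" with a binary outcome
# (e.g., first cardiovascular event within 10 years: yes/no).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUROC on held-out data, the metric the study used to rank algorithms.
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

In the actual study the baseline was the fixed-coefficient ACC/AHA risk score, which would be evaluated with the same `roc_auc_score` call on its published risk equation rather than a fitted model.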
ARTICLE OPEN
Scalable and accurate deep learning with electronic health
records
Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Michaela Hardt, Peter J. Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Yi Zhang, Gerardo Flores, Gavin E. Duggan, Jamie Irvine, Quoc Le, Kurt Litsch, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L. Volchenboum, Katherine Chou, Michael Pearson, Srinivasan Madabushi, Nigam H. Shah, Atul J. Butte, Michael D. Howell, Claire Cui, Greg S. Corrado and Jeffrey Dean
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare
quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR
data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation
of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that
deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple
centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic
medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR
data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for
tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day
unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge
diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases.
We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case
study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the
patient’s chart.
npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1
INTRODUCTION
The promise of digital medicine stems in part from the hope that,
by digitizing health data, we might more easily leverage computer
information systems to understand and improve care. In fact,
routinely collected patient healthcare data are now approaching
the genomic scale in volume and complexity.1
Unfortunately,
most of this information is not yet used in the sorts of predictive
statistical models clinicians might use to improve care delivery. It
is widely suspected that use of such efforts, if successful, could
provide major benefits not only for patient safety and quality but
also in reducing healthcare costs.2–6
In spite of the richness and potential of available data, scaling
the development of predictive models is difficult because, for
traditional predictive modeling techniques, each outcome to be
predicted requires the creation of a custom dataset with specific
variables.7
It is widely held that 80% of the effort in an analytic
model is preprocessing, merging, customizing, and cleaning
datasets […]. Traditional modeling
approaches have dealt with this complexity simply by choosing a
very limited number of commonly collected variables to consider.7
This is problematic because the resulting models may produce
imprecise predictions: false-positive predictions can overwhelm
physicians, nurses, and other providers with false alarms and
concomitant alert fatigue,10
which the Joint Commission identified
as a national patient safety priority in 2014.11
False-negative
predictions can miss significant numbers of clinically important
events, leading to poor clinical outcomes.11,12
Incorporating the
entire EHR, including clinicians’ free-text notes, offers some hope
of overcoming these shortcomings but is unwieldy for most
predictive modeling techniques.
Recent developments in deep learning and artificial neural
networks may allow us to address many of these challenges and
unlock the information in the EHR. Deep learning emerged as the
preferred machine learning approach in machine perception
• In January 2018, Google unveiled an AI that analyzes electronic medical records (EMR) to predict patient outcomes:
  • whether the patient will die during the hospital stay
  • whether the hospital stay will be prolonged
  • whether the patient will be readmitted within 30 days of discharge
  • the diagnoses at discharge
• The distinctive feature of this study: scalability
  • Unlike previous studies, selected EMR fields were not pre-processed;
  • the entire EMR was analyzed as a whole: UCSF and UCM (University of Chicago Medicine)
  • Notably, physicians' notes, which are unstructured data, were also analyzed
LETTERS
https://doi.org/10.1038/s41591-018-0335-9
Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou, China; Institute for Genomic Medicine, Institute of Engineering in Medicine, and Shiley Eye Institute, University of California, San Diego, La Jolla, CA, USA; Hangzhou YITU Healthcare Technology Co. Ltd, Hangzhou, China; Department of Thoracic Surgery/Oncology, First Affiliated Hospital of Guangzhou Medical University, China State Key Laboratory and National Clinical Research Center for Respiratory Disease, Guangzhou, China; Guangzhou Kangrui Co. Ltd, Guangzhou, China; Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, China; Veterans Administration Healthcare System, San Diego, CA, USA. These authors contributed equally: Huiying Liang, Brian Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu. *e-mail: kang.zhang@gmail.com; xiahumin@hotmail.com
Artificial intelligence (AI)-based methods have emerged as
powerful tools to transform medical care. Although machine
learning classifiers (MLCs) have already demonstrated strong
performance in image-based diagnoses, analysis of diverse
and massive electronic health record (EHR) data remains chal-
lenging. Here, we show that MLCs can query EHRs in a manner
similar to the hypothetico-deductive reasoning used by physi-
cians and unearth associations that previous statistical meth-
ods have not found. Our model applies an automated natural
language processing system using deep learning techniques
to extract clinically relevant information from EHRs. In total,
101.6 million data points from 1,362,559 pediatric patient
visits presenting to a major referral center were analyzed to
train and validate the framework. Our model demonstrates
high diagnostic accuracy across multiple organ systems and is
comparable to experienced pediatricians in diagnosing com-
mon childhood diseases. Our study provides a proof of con-
cept for implementing an AI-based system as a means to aid
physicians in tackling large amounts of data, augmenting diagnostic
evaluations, and to provide clinical decision support in
cases of diagnostic uncertainty or complexity. Although this
impact may be most evident in areas where healthcare provid-
ers are in relative shortage, the benefits of such an AI system
are likely to be universal.
Medical information has become increasingly complex over
time. The range of disease entities, diagnostic testing and biomark-
ers, and treatment modalities has increased exponentially in recent
years. Subsequently, clinical decision-making has also become more
complex and demands the synthesis of decisions from assessment
of large volumes of data representing clinical information. In the
current digital age, the electronic health record (EHR) represents a
massive repository of electronic data points representing a diverse
array of clinical information1–3
. Artificial intelligence (AI) methods
have emerged as potentially powerful tools to mine EHR data to aid
in disease diagnosis and management, mimicking and perhaps even
augmenting the clinical decision-making of human physicians1
.
To formulate a diagnosis for any given patient, physicians fre-
quently use hypotheticodeductive reasoning. Starting with the chief
complaint, the physician then asks appropriately targeted questions
relating to that complaint. From this initial small feature set, the
physician forms a differential diagnosis and decides what features
(historical questions, physical exam findings, laboratory testing,
and/or imaging studies) to obtain next in order to rule in or rule
out the diagnoses in the differential diagnosis set. The most use-
ful features are identified, such that when the probability of one of
the diagnoses reaches a predetermined level of acceptability, the
process is stopped, and the diagnosis is accepted. It may be pos-
sible to achieve an acceptable level of certainty of the diagnosis with
only a few features without having to process the entire feature set.
Therefore, the physician can be considered a classifier of sorts.
In this study, we designed an AI-based system using machine
learning to extract clinically relevant features from EHR notes to
mimic the clinical reasoning of human physicians. In medicine,
machine learning methods have already demonstrated strong per-
formance in image-based diagnoses, notably in radiology2
, derma-
tology4
, and ophthalmology5–8
, but analysis of EHR data presents
a number of difficult challenges. These challenges include the vast
quantity of data, high dimensionality, data sparsity, and deviations
Evaluation and accurate diagnoses of pediatric
diseases using artificial intelligence
Huiying Liang, Brian Y. Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu, Wenjia Cai, Daniel S. Kermany, Xin Sun, Jiancong Chen, Liya He, Jie Zhu, Pin Tian, Hua Shao, Lianghong Zheng, Rui Hou, Sierra Hewett, Gen Li, Ping Liang, Xuan Zang, Zhiqi Zhang, Liyan Pan, Huimin Cai, Rujuan Ling, Shuhua Li, Yongwang Cui, Shusheng Tang, Hong Ye, Xiaoyan Huang, Waner He, Wenqing Liang, Qing Zhang, Jianmin Jiang, Wei Yu, Jianqun Gao, Wanxing Ou, Yingmin Deng, Qiaozhen Hou, Bei Wang, Cuichan Yao, Yan Liang, Shu Zhang, Yaou Duan, Runze Zhang, Sarah Gibson, Charlotte L. Zhang, Oulan Li, Edward D. Zhang, Gabriel Karin, Nathan Nguyen, Xiaokang Wu, Cindy Wen, Jie Xu, Wenqin Xu, Bochu Wang, Winston Wang, Jing Li, Bianca Pizzato, Caroline Bao, Daoman Xiang, Wanting He, Suiqin He, Yugui Zhou, Weldon Haw, Michael Goldbaum, Adriana Tremoulet, Chun-Nan Hsu, Hannah Carter, Long Zhu, Kang Zhang* and Huimin Xia*
NATURE MEDICINE | www.nature.com/naturemedicine
• Analyzed 101.6 million EMR data points from 1.3 million pediatric patients
• Deep learning-based natural language processing
• Mimics the physician's hypothetico-deductive reasoning
• An AI that diagnoses common diseases in pediatric patients
Nat Med 2019 Feb
• Analyzing complex medical data to derive insights
• Analyzing and reading medical images and pathology data
• Monitoring continuous data for prevention and prediction
The three types of medical AI
Deep Learning
http://theanalyticsstore.ie/deep-learning/
Artificial intelligence ⊃ machine learning ⊃ deep learning
• Artificial intelligence: expert systems, cybernetics, machine learning, …
• Machine learning: artificial neural networks, decision trees, support vector machines, Bayesian networks, …
• Deep learning: convolutional neural networks (CNN), recurrent neural networks (RNN), …
The relationship between AI and deep learning
Deep Learning
"We will no longer accept papers showing that AI analyzes medical images as well as humans do.
That has already been sufficiently demonstrated."
Clinical Impact!
• How can we demonstrate the clinical utility of AI?
  • "high accuracy" ➔ improved patient outcomes
  • "high accuracy" ➔ synergy with physicians (accuracy, efficiency, cost, etc.)
  • "a single disease" ➔ "all diseases"
  • retrospective studies / internal validation ➔ prospective RCTs ➔ use in clinical practice
  • things impossible for human perception
Radiology
• An AI that reads hand X-ray images to estimate the patient's bone age
  • Conventionally, physicians read bone age by comparing the X-ray against standard images, e.g., with the Greulich-Pyle method
  • The AI finds sex- and age-specific patterns in reference-standard images, expresses similarity as probabilities, and retrieves the matching standard images
• It can help physicians diagnose precocious puberty or growth retardation
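The output format described above, a probability for each candidate bone age with the closest reference matches surfaced to the reader, can be sketched as a softmax over age classes. This is a hypothetical illustration: the age grid, the score function, and every number below are invented, not taken from the product.

```python
import numpy as np

# Candidate bone ages (years) in 6-month steps, as in a sex-specific atlas.
bone_ages = np.arange(3.0, 18.0, 0.5)

# Invented similarity scores peaking near 11 years; a real model would
# produce these from the X-ray image.
logits = -0.5 * ((bone_ages - 11.0) / 1.2) ** 2

# Softmax over age classes: similarity expressed as probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Surface the three most similar reference ages, as the reading aid would.
top3 = sorted(zip(bone_ages, probs), key=lambda t: t[1], reverse=True)[:3]
```

The physician would then combine such probability values with clinical information (e.g., hormone levels) rather than accept the top match automatically.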
Press Release
First approval of a domestically developed AI-based medical device in Korea
- Reading bone age with artificial intelligence technology -
The Ministry of Food and Drug Safety (Commissioner Ryu Young-jin) announced that it has approved VUNO Med-BoneAge, medical image analysis software incorporating artificial intelligence (AI) developed by the Korean medical device company VUNO Inc.
The newly approved VUNO Med-BoneAge is software in which an AI analyzes X-ray images and suggests the patient's bone age, and the physician uses the suggested information to help diagnose precocious puberty or growth retardation.
It automates what physicians previously did manually, reading bone age by comparing the patient's left-hand X-ray image against reference-standard images, thereby shortening reading time.
The approved product was selected for the guideline on approval and review of medical devices incorporating big data and AI technology, and received tailored support from clinical trial design through approval.
VUNO Med-BoneAge was approved for the purpose of helping medical personnel determine a patient's bone age by analyzing left-hand X-ray images.
In the analysis, the AI recognizes patterns in the captured X-ray image and expresses similarity as probabilities against sex-specific bone age models of reference-standard images; the physician then combines the probability values with information such as hormone levels to diagnose precocious puberty or growth retardation.
In the clinical trial evaluating the product's accuracy, its estimates differed from the bone age determined by physicians by a few months on average, and the manufacturer designed the product so that the gap with physicians can be narrowed by periodically updating the image data from which the AI learns.
Including the newly approved VUNO Med-BoneAge, several clinical trial plans for AI-based medical devices have been approved to date: software that classifies cerebral infarction types from MRI, and software that assists lung nodule diagnosis from X-ray images.
The MFDS also runs programs such as the next-generation project and the 'new medical device approval helper,' which provide tailored support across the entire process from R&D through clinical trials and approval, to speed the development of medical devices related to fourth-industrial-revolution technologies such as AI, VR, and 3D printing.
The MFDS stated that this approval will help quickly analyze and determine individual patients' bone age, and that it will continue to actively support the development of advanced medical devices.
Disclosure: I serve as an advisor to VUNO and have an equity interest in the company.
AJR:209, December 2017
Since 1992, concerns regarding interob-
server variability in manual bone age esti-
mation [4] have led to the establishment of
several automatic computerized methods for
bone age estimation, including computer-as-
sisted skeletal age scores, computer-aided
skeletal maturation assessment systems, and
BoneXpert (Visiana) [5–14]. BoneXpert was
developed according to traditional machine-
learning techniques and has been shown to
have a good performance for patients of var-
ious ethnicities and in various clinical set-
tings [10–14]. The deep-learning technique
is an improvement in artificial neural net-
works. Unlike traditional machine-learning
techniques, deep-learning techniques allow
an algorithm to program itself by learning
from the images given a large dataset of la-
beled examples, thus removing the need to
specify rules [15].
Deep-learning techniques permit higher
levels of abstraction and improved predic-
tions from data. Deep-learning techniques
Computerized Bone Age
Estimation Using Deep Learning–
Based Program: Evaluation of the
Accuracy and Efficiency
Jeong Rye Kim, Woo Hyun Shim, Hee Mang Yoon, Sang Hyup Hong, Jin Seong Lee, Young Ah Cho, Sangki Kim
Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea. Address correspondence to H. M. Yoon (espoirhm@gmail.com).
Vuno Research Center, Vuno Inc., Seoul, South Korea.
Pediatric Imaging • Original Research
Supplemental Data
Available online at www.ajronline.org.
AJR 2017; 209:1–7
© American Roentgen Ray Society
Bone age estimation is crucial for
developmental status determinations
and ultimate height predictions
in the pediatric population,
particularly for patients with growth disor-
ders and endocrine abnormalities [1]. Two
major left-hand wrist radiograph-based
methods for bone age estimation are current-
ly used: the Greulich-Pyle [2] and Tanner-
Whitehouse [3] methods. The former is much
more frequently used in clinical practice.
Greulich-Pyle–based bone age estimation is
performed by comparing a patient’s left-hand
radiograph to standard radiographs in the
Greulich-Pyle atlas and is therefore simple
and easily applied in clinical practice. How-
ever, the process of bone age estimation,
which comprises a simple comparison of
multiple images, can be repetitive and time
consuming and is thus sometimes burden-
some to radiologists. Moreover, the accuracy
depends on the radiologist’s experience and
tends to be subjective.
Keywords: bone age, children, deep learning, neural
network model
DOI:10.2214/AJR.17.18224
J. R. Kim and W. H. Shim contributed equally to this work.
Received March 12, 2017; accepted after revision
July 7, 2017.
S. Kim is employed by Vuno, Inc., which created the deep
learning–based automatic software system for bone
age determination. J. R. Kim, W. H. Shim, H. M. Yoon,
S. H. Hong, J. S. Lee, and Y. A. Cho are employed by
Asan Medical Center, which holds patent rights for the
deep learning–based automatic software system for
bone age assessment.
OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a
new automatic software system for bone age assessment and to validate its feasibility in clini-
cal practice.
MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning tech-
nique was used to develop the automatic software system for bone age determination. Using
this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years
old) using first-rank bone age (software only), computer-assisted bone age (two radiologists
with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with
Greulich-Pyle atlas assistance only). The reference bone age was determined by the consen-
sus of two experienced radiologists.
RESULTS. First-rank bone ages determined by the automatic software system showed a
69.5% concordance rate and significant correlations with the reference bone age (r = 0.992;
p < 0.001). Concordance rates increased with the use of the automatic software system for
both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-as-
sisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for
computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers
1 and 2, respectively.
CONCLUSION. Automatic software system showed reliably accurate bone age estima-
tions and appeared to enhance efficiency by reducing reading times without compromising
the diagnostic accuracy.
Kim et al.
Accuracy and Efficiency of Computerized Bone Age Estimation
Pediatric Imaging
Original Research
• Total number of patients: 200
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• Physician A: board-certified radiologist subspecialized in pediatric imaging (over 500 reading cases)
• Physician B: second-year radiology resident (one day of training in the reading method plus 20 cases)
• AI: VUNO's deep learning for bone age reading
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
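The study's two headline metrics, the exact-match concordance rate against the reference reading and the correlation with it, can be computed as below. The readings here are made up for illustration and are not the study's data.

```python
import numpy as np

# Invented bone age readings (years) for eight cases: reference consensus
# vs. an estimator (software or physician).
reference = np.array([5.0, 7.5, 9.0, 10.5, 12.0, 13.5, 15.0, 16.5])
estimated = np.array([5.0, 7.0, 9.0, 10.5, 12.5, 13.5, 15.0, 16.5])

# Concordance rate: fraction of cases whose estimate matches the reference exactly.
concordance = np.mean(estimated == reference)

# Pearson correlation between estimated and reference bone ages
# (the paper reports r = 0.992 for the software's first-rank bone age).
r = np.corrcoef(estimated, reference)[0, 1]
```

In the paper these metrics were computed for the software alone and for each reviewer with and without software assistance, which is how the synergy in the next slides is quantified.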
Accuracy (%): AI 69.5; Physician A (radiology fellow, pediatric imaging subspecialty) 63.0; Physician B (second-year radiology resident) 49.5; Physician A + AI 72.5; Physician B + AI 57.5
AI vs. physicians, and AI + physicians
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
Synergy between human physicians and AI in bone age reading
Digital Healthcare Institute
Director,Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
Total reading time (min): Physician A 188 without AI → 154 with AI (saving 18% of time); Physician B 180 without AI → 108 with AI (saving 40% of time)
Using AI in bone age reading can also reduce reading time
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
ORIGINAL RESEARCH • THORACIC IMAGING
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD,
PhD • KunYoung Lim, MD, PhD • Thienkai HuyVu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin
Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul
03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital,
Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of
Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco,
San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul,
Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P.
(e-mail: cmpark.morphius@gmail.com).
Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002).
*J.G.N. and S.P. contributed equally to this work.
Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237 • Content codes:
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules
on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43292 chest radiographs (normal radiograph–
to–nodule radiograph ratio, 34067:9225) in 34676 patients (healthy-to-nodule ratio, 30784:3892; 19230 men [mean age, 52.8
years; age range, 18–99 years]; 15446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015,
which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph clas-
sification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three
South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection
performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife
alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance
test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation
data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection perfor-
mances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher
AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and
all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range,
0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nod-
ule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when
used as a second reader.
©RSNA, 2018
Online supplemental material is available for this article.
• 43,292 chest PA radiographs (normal:nodule = 34,067:9,225)

• labeled/annotated by 13 board-certified radiologists

• DLAD validated on 1 internal + 4 external datasets

• Seoul National University Hospital / Boramae Medical Center / National Cancer Center / UCSF

• Classification / Lesion localization

• AI vs. physicians vs. AI + physicians

• compared against physicians at various levels of experience

• Non-radiology physicians / radiology residents

• Board-certified radiologists / Thoracic radiologists
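The classification metric used throughout this study, AUROC, can be read as the probability that a randomly chosen nodule radiograph receives a higher model score than a randomly chosen normal one. A minimal pure-Python sketch of that rank (Mann-Whitney) formulation, with illustrative scores rather than the study's data:

```python
def auroc(labels, scores):
    """AUROC via the rank (Mann-Whitney) formulation: the probability
    that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores: 1 = nodule radiograph, 0 = normal
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Three of the four positive/negative pairs are ranked correctly, hence 0.75.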
Nam et al
Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the
chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adeno-
carcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an
elevation in its confidence by eight radiologists.
Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on
the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe
on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two
radiologists and an elevated confidence level of the nodule by two radiologists.
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test

                              |     Test 1      | DLAD vs Test 1 (P)  |     Test 2      | Test 1 vs Test 2 (P)
Observer                      | AUROC | JAFROC  | Classif. | Detect.  | AUROC | JAFROC  | Classif. | Detect.
Nonradiology physicians
  Observer 1                  | 0.77  | 0.716   | <.001    | <.001    | 0.91  | 0.853   | <.001    | <.001
  Observer 2                  | 0.78  | 0.657   | <.001    | <.001    | 0.90  | 0.846   | <.001    | <.001
  Observer 3                  | 0.80  | 0.700   | <.001    | <.001    | 0.88  | 0.783   | <.001    | <.001
  Group                       |       | 0.691   |          | <.001*   |       | 0.828   |          | <.001*
Radiology residents
  Observer 4                  | 0.78  | 0.767   | <.001    | <.001    | 0.80  | 0.785   | .02      | .03
  Observer 5                  | 0.86  | 0.772   | .001     | <.001    | 0.91  | 0.837   | .02      | <.001
  Observer 6                  | 0.86  | 0.789   | .05      | .002     | 0.86  | 0.799   | .08      | .54
  Observer 7                  | 0.84  | 0.807   | .01      | .003     | 0.91  | 0.843   | .003     | .02
  Observer 8                  | 0.87  | 0.797   | .10      | .003     | 0.90  | 0.845   | .03      | .001
  Observer 9                  | 0.90  | 0.847   | .52      | .12      | 0.92  | 0.867   | .04      | .03
  Group                       |       | 0.790   |          | <.001*   |       | 0.867   |          | <.001*
Board-certified radiologists
  Observer 10                 | 0.87  | 0.836   | .05      | .01      | 0.90  | 0.865   | .004     | .002
  Observer 11                 | 0.83  | 0.804   | <.001    | <.001    | 0.84  | 0.817   | .03      | .04
  Observer 12                 | 0.88  | 0.817   | .18      | .005     | 0.91  | 0.841   | .01      | .01
  Observer 13                 | 0.91  | 0.824   | >.99     | .02      | 0.92  | 0.836   | .51      | .24
  Observer 14                 | 0.88  | 0.834   | .14      | .03      | 0.88  | 0.840   | .87      | .23
  Group                       |       | 0.821   |          | .02*     |       | 0.840   |          | .01*
Thoracic radiologists
  Observer 15                 | 0.94  | 0.856   | .15      | .21      | 0.96  | 0.878   | .08      | .03
  Observer 16                 | 0.92  | 0.854   | .60      | .17      | 0.93  | 0.872   | .34      | .02
  Observer 17                 | 0.86  | 0.820   | .02      | .01      | 0.88  | 0.838   | .14      | .12
  Observer 18                 | 0.84  | 0.800   | <.001    | <.001    | 0.87  | 0.827   | .02      | .02
  Group                       |       | 0.833   |          | .08*     |       | 0.854   |          | <.001*

Note.—AUROC = radiograph classification; JAFROC = nodule detection (JAFROC figure of merit). Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine.
Slide annotations (translated from Korean):

• Columns: physicians only / AI vs. physicians only (p value) / physicians + AI / physicians vs. physicians + AI (p value)

• Non-radiology physicians: 4th-year residents in obstetrics-gynecology, orthopedic surgery, and internal medicine

• Radiology residents: years 1–3

• Board-certified radiologists: 7 and 8 years of experience

• Thoracic radiologists: 26, 13, and 9 years of experience
• Using the AI as a second reader improved physicians' accuracy.

• Radiograph classification: 17 of 18 physicians improved (15 of 18 significantly, P < 0.05)

• Nodule detection: 18 of 18 physicians improved (14 of 18 significantly, P < 0.05)
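The "all 18 improved" claim for nodule detection can be checked directly against the per-observer JAFROC FOMs transcribed from Table 3 of the paper:

```python
# Per-observer JAFROC FOM, unaided (Test 1) vs DLAD-assisted (Test 2),
# transcribed from Table 3 of Nam et al., Radiology 2018.
test1 = [0.716, 0.657, 0.700, 0.767, 0.772, 0.789, 0.807, 0.797, 0.847,
         0.836, 0.804, 0.817, 0.824, 0.834, 0.856, 0.854, 0.820, 0.800]
test2 = [0.853, 0.846, 0.783, 0.785, 0.837, 0.799, 0.843, 0.845, 0.867,
         0.865, 0.817, 0.841, 0.836, 0.840, 0.878, 0.872, 0.838, 0.827]

deltas = [b - a for a, b in zip(test1, test2)]
improved = sum(d > 0 for d in deltas)   # every observer gained: 18 of 18
mean_gain = sum(deltas) / len(deltas)   # ≈ 0.043, matching the abstract
print(improved, round(mean_gain, 3), round(min(deltas), 3))
```

The table values are rounded to three decimals, so the maximum gain computes as 0.189 here versus the reported range 0.006–0.190, which the paper derived from unrounded values.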
[Charts: HUMAN ONLY vs. HUMAN + ALGORITHM performance, one pair per observer group]
Clinical Study Results (CXR Nodule) - Radiology
Courtesy of Lunit, Inc.
Pathology
1. CAD: Computer-Aided Detection

2. Triage: prioritization of critical cases

3. Image-driven biomarker
Panels A–D: Benign without atypia / Atypia / DCIS (ductal carcinoma in situ) / Invasive carcinoma
Interpretation?
Elmore et al. JAMA 2015
Diagnostic Concordance Among Pathologists 

Interpreting breast cancer pathology data
Figure 4. Participating Pathologists' Interpretations of Each of the 240 Breast Biopsy Test Cases
[Four-panel chart: percentage of interpretations (0–100%) per test case]
A: Benign without atypia — 72 cases, 2070 total interpretations
B: Atypia — 72 cases, 2070 total interpretations
C: DCIS — 73 cases, 2097 total interpretations
D: Invasive carcinoma — 23 cases, 663 total interpretations
Legend (pathologist interpretation): Benign without atypia / Atypia / DCIS / Invasive carcinoma
DCIS indicates ductal carcinoma in situ.
Diagnostic Concordance in Interpreting Breast Biopsies Original Investigation Research
Elmore et al. JAMA 2015
Discordance among board-certified pathologists in interpreting breast cancer pathology
https://www.facebook.com/groups/TensorFlowKR/permalink/633902253617503/
Google engineers delivered a keynote on medical AI at AACR 2018
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the
stage for the clinical use of digital images in anatomic pathology.
Along with advances in computer image analysis, this raises the
possibility for computer-assisted diagnostics in pathology to improve
histopathologic interpretation and clinical care. To evaluate the
potential impact of digital assistance on interpretation of digitized
slides, we conducted a multireader multicase study utilizing our deep
learning algorithm for the detection of breast cancer metastasis in
lymph nodes. Six pathologists reviewed 70 digitized slides from lymph
node sections in 2 reader modes, unassisted and assisted, with a wash-
out period between sessions. In the assisted mode, the deep learning
algorithm was used to identify and outline regions with high like-
lihood of containing tumor. Algorithm-assisted pathologists demon-
strated higher accuracy than either the algorithm or the pathologist
alone. In particular, algorithm assistance significantly increased the
sensitivity of detection for micrometastases (91% vs. 83%, P=0.02).
In addition, average review time per image was significantly shorter
with assistance than without assistance for both micrometastases (61
vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding
the difficulty of each image classification. On the basis of this score,
pathologists considered the image review of micrometastases to be
significantly easier when interpreted with assistance (P=0.0005).
Utilizing a proof of concept assistant tool, this study demonstrates the
potential of a deep learning algorithm to improve pathologist accu-
racy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of
whole-slide scanners has enabled the digitization of glass
slides for remote consults and archival purposes.1 Digitiza-
tion alone, however, does not necessarily improve the con-
sistency or efficiency of a pathologist’s primary workflow. In
fact, image review on a digital medium can be slightly
slower than on glass, especially for pathologists with limited
digital pathology experience.2 However, digital pathology
and image analysis tools have already demonstrated po-
tential benefits, including the potential to reduce inter-reader
variability in the evaluation of breast cancer HER2 status.3,4
Digitization also opens the door for assistive tools based on
Artificial Intelligence (AI) to improve efficiency and con-
sistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demon-
strated strong performance in many automated image-rec-
ognition applications.6–8 Recently, several deep learning–
based algorithms have been developed for the detection of
breast cancer metastases in lymph nodes as well as for other
applications in pathology.9,10 Initial findings suggest that
some algorithms can even exceed a pathologist’s sensitivity
for detecting individual cancer foci in digital images. How-
ever, this sensitivity gain comes at the cost of increased false
positives, potentially limiting the utility of such algorithms for
automated clinical use.11 In addition, deep learning algo-
rithms are inherently limited to the task for which they have
been specifically trained. While we have begun to understand
the strengths of these algorithms (such as exhaustive search)
and their weaknesses (sensitivity to poor optical focus, tumor
mimics; manuscript under review), the potential clinical util-
ity of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid
pathologists or improve clinical interpretation, these benefits
may be achieved through thoughtful and appropriate in-
tegration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain
View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship
(D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T.,
J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have
Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare,
1600 Amphitheatre Way, Mountain View, CA 94043
(e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF
versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health,
Inc. This is an open-access article distributed under the terms of the
Creative Commons Attribution-Non Commercial-No Derivatives
License 4.0 (CCBY-NC-ND), where it is permissible to download and
share the work provided it is properly cited. The work cannot be
changed in any way or used commercially without permission from
the journal.
ORIGINAL ARTICLE
Am J Surg Pathol • Volume 00, Number 00, 2018 • www.ajsp.com
• LYNA (LYmph Node Assistant), the pathology AI developed by Google

• For lymph node metastasis of breast cancer,

• a study demonstrating the synergy of pathologist + AI

• in accuracy (sensitivity) / review time / perceived review difficulty (for micrometastases)
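The headline numbers (91% assisted vs. 83% unassisted sensitivity for micrometastases) are plain recall, TP/(TP+FN). A minimal sketch; the counts below are hypothetical, chosen only to reproduce the reported rates, since the paper's raw counts are not in this excerpt:

```python
def sensitivity(tp, fn):
    """Recall: fraction of truly positive slides flagged as positive."""
    return tp / (tp + fn)

# Hypothetical counts that reproduce the reported rates
print(sensitivity(91, 9))   # → 0.91 (algorithm-assisted)
print(sensitivity(83, 17))  # → 0.83 (unassisted)
```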
AI for interpreting diabetic retinopathy
Gastroenterology
• Some polyps were detected even when only partially visible.
• Polyps were detected in both normal and insufficient light conditions.
• Polyps were detected under both adequate and suboptimal bowel preparations.
ARTICLES • NATURE BIOMEDICAL ENGINEERING
"…Also, we demonstrated high per-image sensitivity (94.38% and 91.64%) in both the image (dataset A) and video (dataset C) analyses. Datasets A and C included large variations of polyp morphology and image quality (Fig. 3, Supplementary Figs. 2–5 and Supplementary Videos 3 and 4)…"
"…datasets are often small and do not represent the full range of colon conditions encountered in the clinical setting, and there are often discrepancies in the reporting of clinical metrics of success such as sensitivity and specificity. Compared with other metrics such as precision, we believe that sensitivity and specificity are the most appropriate metrics for the evaluation of algorithm performance because of their independence of the ratio of positive to negative…"
Fig. 3 | Examples of polyp detection for datasets A and C. Polyps of different morphology, including flat isochromatic polyps (left), dome-shaped polyps
(second from left, middle), pedunculated polyps (second from right) and sessile serrated adenomatous polyps (right), were detected by the algorithm
(as indicated by the green tags in the bottom set of images) in both normal and insufficient light conditions, under both qualified and suboptimal bowel
preparations. Some polyps were detected with only partial appearance (middle, second from right). See Supplementary Figs. 2–6 for additional examples.
Examples of Polyp Detection for Datasets A and C
1Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang,1 Tyler M Berzin,2 Jeremy Romek Glissen Brown,2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu1
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1 Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal
cancer mortality is limited by several factors, among them
a certain miss rate, leading to limited adenoma detection
rates (ADRs).We investigated the effect of an automatic
polyp detection system based on deep learning on polyp
detection rate and ADR.
Design In an open, non-blinded trial, consecutive
patients were prospectively randomised to undergo
diagnostic colonoscopy with or without assistance of a
real-time automatic polyp detection system providing
a simultaneous visual notice and sound alarm on polyp
detection.The primary outcome was ADR.
Results Of 1058 patients included, 536 were
randomised to standard colonoscopy, and 522 were
randomised to colonoscopy with computer-aided
diagnosis. The artificial intelligence (AI) system
significantly increased ADR (29.1% vs 20.3%, p<0.001)
and the mean number of adenomas per patient
(0.53 vs 0.31, p<0.001). This was due to a higher number
of diminutive adenomas found (185 vs 102; p<0.001),
while there was no statistical difference in larger
adenomas (77 vs 58, p=0.075). In addition, the number
of hyperplastic polyps was also significantly increased
(114 vs 52, p<0.001).
Conclusions In a low prevalent ADR population, an
automatic polyp detection system during colonoscopy
resulted in a significant increase in the number of
diminutive adenomas detected, as well as an increase in
the rate of hyperplastic polyps.The cost–benefit ratio of
such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221;
Results.
INTRODUCTION
Colorectal cancer (CRC) is the second and third-
leading causes of cancer-related deaths in men and
women respectively.1
Colonoscopy is the gold stan-
dard for screening CRC.2 3
Screening colonoscopy
has allowed for a reduction in the incidence and
mortality of CRC via the detection and removal
of adenomatous polyps.4–8
Additionally, there is
evidence that with each 1.0% increase in adenoma
detection rate (ADR), there is an associated 3.0%
decrease in the risk of interval CRC.9 10
However,
polyps can be missed, with reported miss rates of
up to 27% due to both polyp and operator charac-
teristics.11 12
Unrecognised polyps within the visual field is
an important problem to address.11
Several studies
have shown that assistance by a second observer
increases the polyp detection rate (PDR), but such a
strategy remains controversial in terms of increasing
the ADR.13–15
Ideally, a real-time automatic polyp detec-
tion system, with performance close to that of
expert endoscopists, could assist the endosco-
pist in detecting lesions that might correspond to
adenomas in a more consistent and reliable way
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised
controlled trial examining an automatic polyp
detection during colonoscopy and shows an
increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
• This AI demonstrably helped with both polyp detection and adenoma detection

• Proven in a prospective RCT (n=1058; standard=536, CAD=522)

• With AI assistance:

• adenoma detection rate increased: 29.1% vs 20.3% (p<0.001)

• adenomas detected per patient increased: 0.53 vs 0.31 (p<0.001)

• mainly because more diminutive adenomas were found: 185 vs 102 (p<0.001)

• the number of hyperplastic polyps also increased: 114 vs 52 (p<0.001)
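The ADR above is simply the fraction of patients in an arm with at least one adenoma detected. A sketch that back-calculates the per-arm counts from the reported rates (the integer counts are inferred, not quoted from the paper):

```python
def adr(patients_with_adenoma, total_patients):
    """Adenoma detection rate: patients with >= 1 adenoma / all patients."""
    return patients_with_adenoma / total_patients

# Counts inferred from the reported arm sizes and rates:
# 522 * 0.291 ≈ 152 (CAD arm), 536 * 0.203 ≈ 109 (standard arm)
cad_arm = adr(152, 522)       # ≈ 0.291
standard_arm = adr(109, 536)  # ≈ 0.203
print(round(cad_arm, 3), round(standard_arm, 3))
```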
Endoscopy
Figure 1: Deep learning architecture. The detection algorithm is a deep
convolutional neural network (CNN) based on the SegNet architecture. Data
flow is from left to right: a colonoscopy image is sequentially warped
into a binary image, with 1 representing polyp pixels and 0 representing
no polyp in a probability map. This is then displayed, as shown in the
output, with a hollow tracing box on the CADe monitor.
1Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
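The display step in Figure 1 (per-pixel polyp probability map → binary mask → hollow tracing box) can be sketched in pure Python; `mask_to_box` and the 0.5 threshold are illustrative assumptions, not the paper's implementation:

```python
def mask_to_box(prob_map, thresh=0.5):
    """Threshold a per-pixel probability map into a binary mask and
    return the bounding box (row0, col0, row1, col1) of polyp pixels."""
    coords = [(r, c)
              for r, row in enumerate(prob_map)
              for c, p in enumerate(row) if p >= thresh]
    if not coords:
        return None  # no polyp predicted in this frame
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 4x4 probability map with a small high-probability blob
frame = [[0.1, 0.1, 0.1, 0.1],
         [0.1, 0.8, 0.9, 0.1],
         [0.1, 0.7, 0.6, 0.1],
         [0.1, 0.1, 0.1, 0.1]]
print(mask_to_box(frame))  # → (1, 1, 2, 2)
```

In the actual system this box would be drawn on the CADe monitor in real time, alongside the sound alarm described in the abstract.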
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang,  1
Tyler M Berzin,  2
Jeremy Romek Glissen Brown,  2
Shishira Bharadwaj,2
Aymeric Becq,2
Xun Xiao,1
Peixi Liu,1
Liangping Li,1
Yan Song,1
Di Zhang,1
Yi Li,1
Guangre Xu,1
Mengtian Tu,1
Xiaogang Liu  1
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1
Department of
Gastroenterology, Sichuan
Academy of Medical Sciences
 Sichuan Provincial People’s
Hospital, Chengdu, China
2
Center for Advanced
Endoscopy, Beth Israel
Deaconess Medical Center and
Harvard Medical School, Boston,
Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal
cancer mortality is limited by several factors, among them
a certain miss rate, leading to limited adenoma detection
rates (ADRs).We investigated the effect of an automatic
polyp detection system based on deep learning on polyp
detection rate and ADR.
Design In an open, non-blinded trial, consecutive
patients were prospectively randomised to undergo
diagnostic colonoscopy with or without assistance of a
real-time automatic polyp detection system providing
a simultaneous visual notice and sound alarm on polyp
detection.The primary outcome was ADR.
Results Of 1058 patients included, 536 were
randomised to standard colonoscopy, and 522 were
randomised to colonoscopy with computer-aided
diagnosis.The artificial intelligence (AI) system
significantly increased ADR (29.1%vs20.3%, p0.001)
and the mean number of adenomas per patient
(0.53vs0.31, p0.001).This was due to a higher number
of diminutive adenomas found (185vs102; p0.001),
while there was no statistical difference in larger
adenomas (77vs58, p=0.075). In addition, the number
of hyperplastic polyps was also significantly increased
(114vs52, p0.001).
Conclusions In a low prevalent ADR population, an
automatic polyp detection system during colonoscopy
resulted in a significant increase in the number of
diminutive adenomas detected, as well as an increase in
the rate of hyperplastic polyps.The cost–benefit ratio of
such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221;
Results.
INTRODUCTION
Colorectal cancer (CRC) is the second and third-leading cause of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12
Unrecognised polyps within the visual field are an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15
Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might correspond to adenomas in a more consistent and reliable way
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised
controlled trial examining automatic polyp
detection during colonoscopy, and it shows an
increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
• This AI measurably helps with both polyp detection and adenoma detection

• Demonstrated in a prospective RCT (n=1058; standard=536, CAD=522)

• With AI assistance:

• Higher adenoma detection rate: 29.1% vs 20.3% (p<0.001)

• More adenomas detected per patient: 0.53 vs 0.31 (p<0.001)

• Mainly because more diminutive adenomas were found: 185 vs 102 (p<0.001)

• More hyperplastic polyps were also detected: 114 vs 52 (p<0.001)
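The reported ADR difference can be sanity-checked with a two-proportion z-test. The adenoma-positive counts used below (152 of 522, 109 of 536) are back-calculated from the reported rates, so this is an illustrative check rather than the paper's exact analysis:

```python
from math import sqrt, erf

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Adenoma-positive patient counts implied by the reported ADRs
# (29.1% of 522 ≈ 152, 20.3% of 536 ≈ 109)
z, p = two_proportion_ztest(152, 522, 109, 536)
print(f"z = {z:.2f}, p = {p:.5f}")  # p < 0.001, consistent with the paper
```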
http://www.rolls-royce.com/about/our-technology/enabling-technologies/engine-health-management.aspx#sense
250 sensors to monitor the “health” of the GE turbines
Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. Levels of social interaction (also known to a…
PLOS Medicine 2016
• Analyzing complex medical data and deriving insights

• Analyzing/reading medical imaging and pathology data

• Monitoring continuous data for prevention/prediction
The three types of medical artificial intelligence
Project Artemis at UOIT
SEPSIS
A targeted real-time early warning score (TREWScore)
for septic shock
Katharine E. Henry, David N. Hager, Peter J. Pronovost, Suchi Saria*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed "TREWScore," a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9).
More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects
important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition.
The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20). A number of "early warning systems," "track and trigger" initiatives, "listening applications," and "sniffers" have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock.
The adoption of the Affordable Care Act has added to the growing
excitement around predictive models derived from electronic health
R E S E A R C H A R T I C L E
[Clipped results column: in the validation set, TREWScore achieved an AUC of 0.83 (95% CI, 0.81 to 0.85) (Fig. 2); at a specificity of 0.67, it achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours (IQR, 10.6 to 94.2) before onset. More than two-thirds (68.8%) of patients were identified before any sepsis-related organ dysfunction. TREWScore was also compared with MEWS, a general metric used to track risk of catastrophic deterioration.]
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
A targeted real-time early warning score (TREWScore)
for septic shock
AUC=0.83
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 

and identified patients a median of 28.2 hours before onset.
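The operating point quoted above (sensitivity 0.85 at specificity 0.67) is one threshold on the ROC curve. A minimal sketch of how such a point is read off a set of risk scores, using toy data rather than the study's actual scores:

```python
def sensitivity_at_specificity(scores, labels, target_spec):
    """Pick the threshold whose specificity is closest to target_spec
    and report the sensitivity (recall on positives) at that threshold."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    best = None
    for thr in sorted(set(scores)):
        spec = sum(1 for s in neg if s < thr) / len(neg)
        sens = sum(1 for s in pos if s >= thr) / len(pos)
        if best is None or abs(spec - target_spec) < abs(best[0] - target_spec):
            best = (spec, sens, thr)
    return best

# Toy scores: higher score = higher predicted risk of septic shock
scores = [0.9, 0.8, 0.75, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,    0,   1,   0,    0,   0,   0,   0]
spec, sens, thr = sensitivity_at_specificity(scores, labels, 0.67)
print(f"specificity={spec:.2f}, sensitivity={sens:.2f}, threshold={thr}")
```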
• Launched as an iPhone app in the US

• The key question is how cumbersome it will be to use

• For how long must it be used to see an effect: two weeks? for life?

• How will food logging and the like be handled?

• The pricing model does not appear to have been disclosed yet
Prediction of Ventricular Arrhythmia
An Algorithm Based on Deep Learning for Predicting In-Hospital
Cardiac Arrest
Joon-myoung Kwon, MD;* Youngnam Lee, MS;* Yeha Lee, PhD; Seungwoo Lee, BS; Jinsik Park, MD, PhD
Background — In-hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional track-and-trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false-alarm rates. We propose a deep learning–based early warning system that shows higher performance than the existing track-and-trigger systems.
Methods and Results — This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July 2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index. Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC: 0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the deep learning–based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning system, random forest, and logistic regression, respectively, at the same sensitivity.
Conclusions — An algorithm based on deep learning had high sensitivity and a low false-alarm rate for detection of patients with cardiac arrest in the multicenter study. (J Am Heart Assoc. 2018;7:e008678. DOI: 10.1161/JAHA.118.008678.)
Key Words: artificial intelligence • cardiac arrest • deep learning • machine learning • rapid response system • resuscitation
In-hospital cardiac arrest is a major burden to public health, which affects patient safety.1–3 More than half of cardiac arrests result from respiratory failure or hypovolemic shock, and 80% of patients with cardiac arrest show signs of deterioration in the 8 hours before cardiac arrest.4–9 However, 209 000 in-hospital cardiac arrests occur in the United States each year, and the survival discharge rate for patients with cardiac arrest is <20% worldwide.10,11
Rapid response systems (RRSs) have been introduced in many hospitals to detect cardiac arrest using the track-and-trigger system (TTS).12,13 Two types of TTS are used in RRSs. For the single-parameter TTS (SPTTS), cardiac arrest is predicted if any single vital sign (eg, heart rate [HR], blood pressure) is out of the normal range.14 The aggregated weighted TTS calculates a weighted score for each vital sign and then finds patients with cardiac arrest based on the sum of these scores.15 The modified early warning score (MEWS) is one of the most widely used approaches among all aggregated weighted TTSs (Table 1);16 however, traditional TTSs including MEWS have limitations, with low sensitivity or high false-alarm rates.14,15,17 Sensitivity and false-alarm rate interact: increased sensitivity creates higher false-alarm rates and vice versa.
Current RRSs suffer from low sensitivity or a high false-alarm rate. An RRS was used for only 30% of patients before unplanned intensive care unit admission and was not used for 22.8% of patients, even if they met the criteria.18,19
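The aggregated weighted TTS idea described above can be sketched as follows. The bands below are simplified for illustration and are not the exact published MEWS table:

```python
def mews_component(value, bands):
    """Return the sub-score for one vital sign.
    bands: list of (low, high, score); first matching band wins."""
    for low, high, score in bands:
        if low <= value <= high:
            return score
    return 3  # outside all listed bands: most abnormal sub-score

# Simplified, illustrative bands (not the exact published MEWS table)
HEART_RATE = [(51, 100, 0), (101, 110, 1), (41, 50, 1), (111, 129, 2), (30, 40, 2)]
RESP_RATE  = [(9, 14, 0), (15, 20, 1), (21, 29, 2)]
SYS_BP     = [(101, 199, 0), (81, 100, 1), (71, 80, 2)]

def mews(hr, rr, sbp):
    """Aggregated weighted score: the sum of per-vital sub-scores."""
    return (mews_component(hr, HEART_RATE)
            + mews_component(rr, RESP_RATE)
            + mews_component(sbp, SYS_BP))

print(mews(hr=115, rr=22, sbp=95))  # 2 + 2 + 1 = 5 -> would trigger review
```

A single-parameter TTS, by contrast, would alarm whenever any one sub-score reaches its maximum, regardless of the sum.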
From the Departments of Emergency Medicine (J.-m.K.) and Cardiology (J.P.), Mediplex Sejong Hospital, Incheon, Korea; VUNO, Seoul, Korea (Youngnam L., Yeha L.,
S.L.).
*Dr Kwon and Mr Youngnam Lee contributed equally to this study.
Correspondence to: Joon-myoung Kwon, MD, Department of Emergency medicine, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon 21080,
Korea. E-mail: kwonjm@sejongh.co.kr
Received January 18, 2018; accepted May 31, 2018.
ª 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley. This is an open access article under the terms of the Creative Commons
Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for
commercial purposes.
DOI: 10.1161/JAHA.118.008678 Journal of the American Heart Association 1
ORIGINAL RESEARCH
• Number of patients: 86,290

• Cardiac arrests: 633

• Input: heart rate, respiratory rate, body temperature, systolic blood pressure
(source: VUNO)
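The study trained a recurrent neural network on these four vitals. A schematic, untrained vanilla-RNN forward pass over such a sequence might look like this; it illustrates the architecture class with random weights, not VUNO's actual model:

```python
import math
import random

random.seed(0)

# 4 inputs per time step: heart rate, respiratory rate, temperature, systolic BP
N_IN, N_HID = 4, 8
W_xh = [[random.gauss(0, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
W_hh = [[random.gauss(0, 0.1) for _ in range(N_HID)] for _ in range(N_HID)]
w_out = [random.gauss(0, 0.1) for _ in range(N_HID)]

def risk_score(seq):
    """seq: list of time steps, each a list of 4 standardized vitals.
    Returns a 0-1 deterioration risk from the final hidden state."""
    h = [0.0] * N_HID
    for x in seq:
        h = [math.tanh(sum(W_xh[i][j] * x[j] for j in range(N_IN))
                       + sum(W_hh[i][k] * h[k] for k in range(N_HID)))
             for i in range(N_HID)]
    logit = sum(wi * hi for wi, hi in zip(w_out, h))
    return 1 / (1 + math.exp(-logit))  # sigmoid squashes to (0, 1)

# 8 hours of (already standardized) vitals for one patient
seq = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(8)]
print(round(risk_score(seq), 3))  # a value strictly between 0 and 1
```

In a real system the weights are learned from labelled deterioration events, and the score is recomputed each time new vitals arrive.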
Cardiac Arrest Prediction Accuracy
• At alarm volumes that a university-hospital rapid response team can handle (points A and B), the accuracy gap is even larger

• A: DEWS 33.0%, MEWS 0.3%

• B: DEWS 42.7%, MEWS 4.0%
(source: VUNO)
APPH (Alarms Per Patient Per Hour)
(source: VUNO)
Less False Alarm
(source: VUNO)
Change in DEWS predictions over time
Copyright © 2020 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.
Critical Care Medicine www.ccmjournal.org e285
Objectives: As the performance of a conventional track and trigger system in a rapid response system has been unsatisfactory, we developed and implemented an artificial intelligence for predicting in-hospital cardiac arrest, denoted the deep learning-based early warning system. The purpose of this study was to compare the performance of an artificial intelligence-based early warning system with that of conventional methods in a real hospital situation.
Design: Retrospective cohort study.
Setting: This study was conducted at a hospital in which the deep learning-based early warning system was implemented.
Patients: We reviewed the records of adult patients who were admitted to the general ward of our hospital from April 2018 to March 2019.
Interventions: The study population included 8,039 adult patients. A total of 83 events of deterioration occurred during the study period. The outcome was events of deterioration, defined as cardiac arrest and unexpected ICU admission. We defined a true alarm as an alarm occurring within 0.5–24 hours before a deteriorating event.
Measurements and Main Results: We used the area under the receiver operating characteristic curve, area under the precision-recall curve, number needed to examine, and mean alarm count per day as comparative measures. The deep learning-based early warning system (area under the receiver operating characteristic curve, 0.865; area under the precision-recall curve, 0.066) outperformed the modified early warning score (area under the receiver operating characteristic curve, 0.682; area under the precision-recall curve, 0.010) and reduced the number needed to examine and mean alarm count per day by 69.2% and 59.6%, respectively. At the same specificity, the deep learning-based early warning system had up to 257% higher sensitivity than conventional methods.
Conclusions: The developed artificial intelligence based on deep learning, the deep learning-based early warning system, accurately predicted deterioration of patients in a general ward and outperformed conventional methods. This study showed the potential and effectiveness of artificial intelligence in a rapid response system, which can be applied together with electronic health records. This will be a useful method to identify patients with deterioration and help with precise decision-making in daily practice. (Crit Care Med 2020; 48:e285–e289)
Key Words: artificial intelligence; cardiology; critical care; deep learning
In-hospital cardiac arrest is a major healthcare burden and rapid response systems (RRSs) are used worldwide to identify deteriorating hospitalized patients and to prevent cardiac arrest (1). Most patients with cardiac arrest show signs of deterioration. However, 209,000 cardiac arrests occur in the United States each year, and the survival to discharge rate is less than 20% (2). One challenge with RRS is the failure to detect the deteriorating signs of patients; thus, several track and trigger systems (TTSs) have been developed (3). However, conventional methods, such as the single parameter TTS (SPTTS) and modified early warning score (MEWS), have been disappointing owing to their limited ability to work together with electronic health records (EHRs) (4).
We previously developed and validated an artificial intelligence (AI) for predicting in-hospital cardiac arrest, denoted the deep learning-based early warning system (DEWS) (5). After fine-tuning and setup, we implemented DEWS with EHR to monitor the risk of deterioration among patients in general wards; we have actively used DEWS in our RRS since April 2018. The purpose of this study was to compare the performance of our developed AI with that of conventional methods. To our best knowledge, this is the first study to apply a deep learning-based AI algorithm in an RRS, verified in an external validation study in an actual hospital setting.
DOI: 10.1097/CCM.0000000000004236
1 VUNO, Seoul, Korea. 2 Department of Critical Care and Emergency Medicine, Mediplex Sejong Hospital, Incheon, Korea. 3 Division of Cardiology, Cardiovascular Center, Mediplex Sejong Hospital, Incheon, Korea.
Copyright © 2020 by the Society of Critical Care Medicine and Wolters
Kluwer Health, Inc. All Rights Reserved.
Detecting Patient Deterioration Using Artificial
Intelligence in a Rapid Response System
Kyung-Jae Cho, MS; Oyeon Kwon, MS; Joon-myoung Kwon, MD, MS; Yeha Lee, PhD; Hyunho Park, MD; Ki-Hyun Jeon, MD, MS; Kyung-Hee Kim, MD, PhD; Jinsik Park, MD, PhD; Byung-Hee Oh, MD, PhD
• Comparison of DEWS with existing prediction systems in a real clinical setting

• 8,039 adult patients in the general wards of Sejong Hospital

• Predicting cardiac arrest / unexpected ICU admission 0.5–24 hours in advance

• Retrospective cohort study
Critical Care Medicine 2020
Cho et al
More accurate prediction than existing systems (AUC = 0.865)

Earlier prediction than existing systems (at the same specificity)

Fewer patients requiring examination

Fewer false alarms (at the same sensitivity)
Critical Care Medicine 2020
• Analyzing complex medical data and deriving insights

• Analyzing/reading medical imaging and pathology data

• Monitoring continuous data for prevention/prediction
The three types of medical artificial intelligence
Target Discovery → Lead Discovery → Clinical Trial → Post Market Surveillance
Digital Healthcare in Drug Development
Target Discovery → Lead Discovery → Clinical Trial → Post Market Surveillance
Digital Healthcare in Drug Development
• Personal genome analysis

• Blockchain-based genome analysis
• Deep learning–based candidate compounds

• AI + pharmaceutical companies
• Patient recruitment

• Data measurement: wearables

• Digital phenotyping

• Medication adherence
• Social media–based PMS

• Blockchain-based PMS
+
Digital Therapeutics
Target Discovery → Lead Discovery → Clinical Trial → Post Market Surveillance
Digital Healthcare in Drug Development
• Deep learning–based lead discovery

• AI + pharmaceutical companies
604 VOLUME 35 NUMBER 7 JULY 2017 NATURE BIOTECHNOLOGY
AI-powered drug discovery captures pharma interest
A drug-hunting deal inked last month, between Numerate, of San Bruno, California, and Takeda Pharmaceutical to use Numerate's artificial intelligence (AI) suite to discover small-molecule therapies for oncology, gastroenterology and central nervous system disorders, is the latest in a growing number of research alliances involving AI-powered computational drug development firms. Also last month, GNS Healthcare of Cambridge, Massachusetts announced a deal with Roche subsidiary Genentech of South San Francisco, California to use GNS's AI platform to better understand what affects the efficacy of known therapies in oncology. In May, Exscientia of Dundee, Scotland, signed a deal with Paris-based Sanofi that includes up to €250 ($280) million in milestone payments. Exscientia will provide the compound design and Sanofi the chemical synthesis of new drugs for diabetes and cardiovascular disease. The trend indicates that the pharma industry's long-running skepticism about AI is softening into genuine interest, driven by AI's promise to address the industry's principal pain point: clinical failure rates.
The industry’s willingness to consider AI
approaches reflects the reality that drug discov-
eryislaborious,timeconsumingandnotpartic-
ularly effective. A two-decade-long downward
trend in clinical success rates has only recently
improved (Nat. Rev. Drug Disc. 15, 379–380,
2016). Still, today, only about one in ten drugs
thatenterphase1clinicaltrialsreachespatients.
Half those failures are due to a lack of efficacy,
says Jackie Hunter, CEO of BenevolentBio, a
division of BenevolentAI of London. “That tells
you we’re not picking the right targets,” she says.
“Even a 5 or 10% reduction in efficacy failure
would be amazing.” Hunter’s views on AI in
drug discovery are featured in Ernst  Young’s
BiotechnologyReport2017releasedlastmonth.
Companies that have been watching AI from the sidelines are now jumping in. The best-known machine-learning model for drug discovery is perhaps IBM's Watson. IBM signed a deal in December 2016 with Pfizer to aid the pharma giant's immuno-oncology drug discovery efforts, adding to a string of previous deals in the biopharma space (Nat. Biotechnol. 33, 1219–1220, 2015). IBM's Watson hunts for drugs by sorting through vast amounts of textual data to provide quick analyses, and tests hypotheses by sorting through massive amounts of laboratory data, clinical reports and scientific publications. BenevolentAI takes a similar approach with algorithms that mine the research literature and proprietary research databases.
The explosion of biomedical data has driven much of industry's interest in AI (Table 1). The confluence of ever-increasing computational horsepower and the proliferation of large data sets has prompted scientists to seek learning algorithms that can help them navigate such massive volumes of information.
A lot of the excitement about AI in drug discovery has spilled over from other fields. Machine vision, which allows, among other things, self-driving cars, and language processing have given rise to sophisticated multilevel artificial neural networks known as deep-learning algorithms that can be used to model biological processes from assay data as well as textual data.
In the past people didn't have enough data to properly train deep-learning algorithms, says Mark Gerstein, a biomedical informatics professor at Yale University in New Haven, Connecticut. Now researchers have been able to build massive databases and harness them with these algorithms, he says. "I think that excitement is justified."
Numerate is one of a growing number of AI companies founded to take advantage of that data onslaught as applied to drug discovery. "We apply AI to chemical design at every stage," says Guido Lanza, Numerate's CEO. It will provide Tokyo-based Takeda with candidates for clinical trials by virtual compound screenings against targets, designing and optimizing compounds, and modeling absorption, distribution, metabolism and excretion, and toxicity. The agreement includes undisclosed milestone payments and royalties.
Academic laboratories are also embracing
AI tools. In April, Atomwise of San Francisco
launched its Artificial Intelligence Molecular
Screen awards program, which will deliver 72
potentially therapeutic compounds to as many
as 100 university research labs at no charge.
Atomwise is a University of Toronto spinout
that in 2015 secured an alliance with Merck of
Kenilworth, New Jersey. For this new endeavor,
it will screen 10 million molecules using its
AtomNet platform to provide each lab with
72 compounds aimed at a specific target of the
laboratory’s choosing.
The Japanese government launched in
2016 a research consortium centered on
using Japan’s K supercomputer to ramp up
drug discovery efficiency across dozens of
local companies and institutions. Among
those involved are Takeda and tech giants
Fujitsu of Tokyo, Japan, and NEC, also of
Tokyo, as well as Kyoto University Hospital
and Riken, Japan’s National Research and
Development Institute, which will provide
clinical data.
Deep learning is starting to gain acolytes in the drug discovery space.
Genomics data analytics startup WuXi NextCode Genomics of Shanghai; Cambridge, Massachusetts; and Reykjavík, Iceland, collaborated with researchers at Yale University on a study that used the company's deep-learning algorithm to identify a key mechanism in blood vessel growth. The result could aid drug discovery efforts aimed at inhibiting blood vessel growth in tumors (Nature doi:10.1038/nature22322, 2017).
In the US, during the Obama administration, industry and academia joined forces to apply AI to accelerate drug discovery as part of the Cancer Moonshot initiative (Nat. Biotechnol. 34, 119, 2016). The Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, launched in January 2016, marries computational and experimental approaches, with Brentford, UK-based GlaxoSmithKline participating with Lawrence Livermore National Laboratory in Livermore, California, and the US National Cancer Institute. The computational portion of the process, which includes deep-learning and other AI algorithms, will be tested in the first two years. In the third year, "we hope to start on day one with a disease hypothesis and on day 365 to deliver a drug candidate," says Martha Head, GlaxoSmithKline's head of insights from data.
Table 1 Selected collaborations in the AI–drug discovery space

AI company/location | Technology | Announced partner/location | Indication(s) | Deal date
Atomwise | Deep-learning screening from molecular structure data | Merck | Malaria | 2015
BenevolentAI | Deep-learning and natural language processing of research literature | Janssen Pharmaceutica (Johnson & Johnson), Beerse, Belgium | Multiple | November 8, 2016
Berg, Framingham, Massachusetts | Deep-learning screening of biomarkers from patient data | None | Multiple | N/A
Exscientia | Bispecific compounds via Bayesian models of ligand activity from drug discovery data | Sanofi | Metabolic diseases | May 9, 2017
GNS Healthcare | Bayesian probabilistic inference for investigating efficacy | Genentech | Oncology | June 19, 2017
Insilico Medicine | Deep-learning screening from drug and disease databases | None | Age-related diseases | N/A
Numerate | Deep learning from phenotypic data | Takeda | Oncology, gastroenterology and central nervous system disorders | June 12, 2017
Recursion, Salt Lake City, Utah | Cellular phenotyping via image analysis | Sanofi | Rare genetic diseases | April 25, 2016
twoXAR, Palo Alto, California | Deep-learning screening from literature and assay data | Santen Pharmaceuticals, Osaka, Japan | Glaucoma | February 23, 2017

N/A, none announced. Source: companies' websites.
WSJ, 2017 June
• Multinational pharmaceutical companies are making various attempts to apply AI to drug development

• Recent AI approaches differ from older methods such as virtual screening and docking
https://research.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html
DeepVariant: Highly Accurate Genomes
With Deep Neural Networks
• In 2016, Verily won the SNP performance category of the PrecisionFDA challenge

• The improved algorithm was released under the name DeepVariant

• Aligned reads are themselves rendered as an 'image' and learned with a CNN
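The "pileup as image" idea can be sketched as follows. Real DeepVariant encodes several channels (base, quality, strand, and more), so this single-channel, made-up encoding is a deliberate simplification:

```python
# Minimal sketch of the "pileup as image" idea behind DeepVariant:
# each row is one aligned read, each column a reference position, and
# the pixel value encodes the base. The value mapping below is invented
# for illustration only.
BASE_VALUE = {"A": 50, "C": 100, "G": 150, "T": 200, ".": 0}  # '.' = no read

def pileup_image(reads, width):
    """reads: list of (start_column, sequence) for reads overlapping a window.
    Returns a height x width grid of integers, one row per read."""
    image = []
    for start, seq in reads:
        row = ["."] * width
        for i, base in enumerate(seq):
            if 0 <= start + i < width:
                row[start + i] = base
        image.append([BASE_VALUE[b] for b in row])
    return image

reads = [(0, "ACGT"), (2, "GTTA"), (1, "CGT")]
for row in pileup_image(reads, width=6):
    print(row)
```

A CNN then classifies such a tensor into genotype calls, exactly as it would classify an ordinary image.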
L E T T E R S
Table 1 Evaluation of several bioinformatics methods on the high-coverage, whole-genome sample NA24385
Method Type F1 Recall Precision TP FN FP FP.gt FP.al Version
DeepVariant (live GitHub) Indel 0.99507 0.99347 0.99666 357,641 2350 1,198 217 840 Latest GitHub v0.4.1-b4e8d37d
GATK (raw) Indel 0.99366 0.99219 0.99512 357,181 2810 1,752 377 995 3.8-0-ge9d806836
Strelka Indel 0.99227 0.98829 0.99628 355,777 4214 1,329 221 855 2.8.4-3-gbe58942
DeepVariant (pFDA) Indel 0.99112 0.98776 0.99450 355,586 4405 1,968 846 1,027 pFDA submission May 2016
GATK (VQSR) Indel 0.99010 0.98454 0.99573 354,425 5566 1,522 343 909 3.8-0-ge9d806836
GATK (flt) Indel 0.98229 0.96881 0.99615 348,764 11227 1,349 370 916 3.8-0-ge9d806836
FreeBayes Indel 0.94091 0.91917 0.96372 330,891 29,100 12,569 9,149 3,347 v1.1.0-54-g49413aa
16GT Indel 0.92732 0.91102 0.94422 327,960 32,031 19,364 10,700 7,745 v1.0-34e8f934
SAMtools Indel 0.87951 0.83369 0.93066 300,120 59,871 22,682 2,302 20,282 1.6
DeepVariant (live GitHub) SNP 0.99982 0.99975 0.99989 3,054,552 754 350 157 38 Latest GitHub v0.4.1-b4e8d37d
DeepVariant (pFDA) SNP 0.99958 0.99944 0.99973 3,053,579 1,727 837 409 78 pFDA submission May 2016
Strelka SNP 0.99935 0.99893 0.99976 3,052,050 3,256 732 87 136 2.8.4-3-gbe58942
GATK (raw) SNP 0.99914 0.99973 0.99854 3,054,494 812 4,469 176 257 3.8-0-ge9d806836
16GT SNP 0.99583 0.99850 0.99318 3,050,725 4,581 20,947 3,476 3,899 v1.0-34e8f934
GATK (VQSR) SNP 0.99436 0.98940 0.99937 3,022,917 32,389 1,920 80 170 3.8-0-ge9d806836
FreeBayes SNP 0.99124 0.98342 0.99919 3,004,641 50,665 2,434 351 1,232 v1.1.0-54-g49413aa
SAMtools SNP 0.99021 0.98114 0.99945 2,997,677 57,629 1,651 1,040 200 1.6
GATK (flt) SNP 0.98958 0.97953 0.99983 2,992,764 62,542 509 168 26 3.8-0-ge9d806836
The dataset used in this evaluation is the same as in the precisionFDA Truth Challenge (pFDA). Several methods are compared, including the DeepVariant callset as submitted to the contest and the most recent DeepVariant version from GitHub. Each method was run according to the individual authors' best-practice recommendations and represents a good-faith effort to achieve best results. Comparisons to the Genome in a Bottle truth set for this sample were performed using the hap.py software, available on GitHub at http://github.com/Illumina/hap.py, using the same version of the GIAB truth set (v3.2.2) used by pFDA. The overall accuracy (F1, sort order within each variant type), recall, precision, and numbers of true positives (TP), false negatives (FN) and false positives (FP) are shown over the whole genome. False positives are further divided into those caused by genotype mismatches (FP.gt) and those caused by allele mismatches (FP.al). Finally, the version of the software used for each method is provided. We present three GATK callsets: GATK (raw), the unfiltered calls emitted by the HaplotypeCaller; GATK (VQSR), the callset filtered with variant quality score recalibration (VQSR); and GATK (flt), the raw GATK callset filtered with run-flt in CHM-eval. See Supplementary Note 7 for more details.
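The F1, recall, and precision columns follow directly from the TP/FN/FP counts in the table; as a quick sanity check, recomputing them for the top DeepVariant row reproduces the reported values:

```python
# Recompute the headline DeepVariant SNP metrics from the raw counts in the
# table above (TP, FN, FP for the live-GitHub callset).
tp, fn, fp = 3_054_552, 754, 350

recall = tp / (tp + fn)        # sensitivity: fraction of true variants recovered
precision = tp / (tp + fp)     # fraction of calls that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, the sort key

print(round(f1, 5), round(recall, 5), round(precision, 5))
# → 0.99982 0.99975 0.99989
```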
To overcome these limitations we take an indirect approach. Instead of directly visualizing filters in order to understand their specialization, we apply filters to input data and examine the locations where they maximally fire. Using this technique we were able to map filters to chemical functions. For example, Figure 5 illustrates the 3D locations at which a particular filter from our first convolutional layer fires. Visual inspection of the locations at which that filter is active reveals that this filter specializes as a sulfonyl/sulfonamide detector. This demonstrates the ability of the model to learn complex chemical features from simpler ones. In this case, the filter has inferred a meaningful spatial arrangement of input atom types without any chemical prior knowledge.
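The "maximally fire" procedure can be sketched in a few lines: slide a filter over a voxelized input grid and take the argmax of the activation map. This is a toy stand-in with random data, not the AtomNet implementation:

```python
import numpy as np

# Illustrative sketch: find where a 3D convolutional filter "maximally fires"
# by scanning it over a voxel grid. grid and filt are random stand-ins for a
# voxelized protein-ligand complex and a learned first-layer filter.
rng = np.random.default_rng(0)
grid = rng.random((20, 20, 20))   # hypothetical voxelized input
filt = rng.random((3, 3, 3))      # hypothetical 3x3x3 filter

k = filt.shape[0]
act = np.zeros(tuple(s - k + 1 for s in grid.shape))  # valid-mode activation map
for x in range(act.shape[0]):
    for y in range(act.shape[1]):
        for z in range(act.shape[2]):
            act[x, y, z] = np.sum(grid[x:x+k, y:y+k, z:z+k] * filt)

loc = np.unravel_index(np.argmax(act), act.shape)
print(loc)  # 3D location where this filter fires most strongly
```

Inspecting what atoms sit at `loc` across many inputs is what lets one assign a chemical meaning (e.g. a sulfonyl group) to a filter.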
Figure 5: Sulfonyl/sulfonamide detection with autonomously trained convolutional filters.
Protein-Compound Complex Structure
Binding, or non-binding?
AtomNet: A Deep Convolutional Neural Network for
Bioactivity Prediction in Structure-based Drug
Discovery
Izhar Wallach
Atomwise, Inc.
izhar@atomwise.com
Michael Dzamba
Atomwise, Inc.
misko@atomwise.com
Abraham Heifets
Atomwise, Inc.
abe@atomwise.com
Abstract
Deep convolutional neural networks comprise a subclass of deep neural networks (DNN) with a constrained architecture that leverages the spatial and temporal structure of the domain they model. Convolutional networks achieve the best predictive performance in areas such as speech and image recognition by hierarchically composing simple local features into complex models. Although DNNs have been used in drug discovery for QSAR and ligand-based bioactivity predictions, none of these models have benefited from this powerful convolutional architecture. This paper introduces AtomNet, the first structure-based, deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. We demonstrate how to apply the convolutional concepts of feature locality and hierarchical composition to the modeling of bioactivity and chemical interactions. In further contrast to existing DNN techniques, we show that AtomNet's application of local convolutional filters to structural target information successfully predicts new active molecules for targets with no previously known modulators. Finally, we show that AtomNet outperforms previous docking approaches on a diverse set of benchmarks by a large margin, achieving an AUC greater than 0.9 on 57.8% of the targets in the DUDE benchmark.
1 Introduction
Fundamentally, biological systems operate through the physical interaction of molecules. The ability to determine when molecular binding occurs is therefore critical for the discovery of new medicines and for furthering our understanding of biology. Unfortunately, despite thirty years of computational efforts, computer tools remain too inaccurate for routine binding prediction, and physical experiments remain the state of the art for binding determination. The ability to accurately predict molecular binding would reduce the time-to-discovery of new treatments, help eliminate toxic molecules early in development, and guide medicinal chemistry efforts [1, 2].
In this paper, we introduce a new predictive architecture, AtomNet, to help address these challenges. AtomNet is novel in two regards: AtomNet is the first deep convolutional neural network for molecular binding affinity prediction. It is also the first deep learning system that incorporates structural information about the target to make its predictions.
Deep convolutional neural networks (DCNN) are currently the best performing predictive models for speech and vision [3, 4, 5, 6]. DCNN is a class of deep neural network that constrains its model architecture to leverage the spatial and temporal structure of its domain. For example, a low-level image feature, such as an edge, can be described within a small spatially-proximate patch of pixels. Such a feature detector can share evidence across the entire receptive field by "tying the weights" of the detector neurons, as the recognition of the edge does not depend on where it is found within
arXiv:1510.02855v1 [cs.LG] 10 Oct 2015
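The parameter savings from "tying the weights" is easy to quantify; a toy comparison for a hypothetical 32×32 input with a single 3×3 filter (the sizes are illustrative, not from the paper):

```python
# Weight tying in a nutshell: a dense layer needs one weight per
# (input, output) pair, while a convolutional layer reuses one small filter
# at every position of the receptive field.
h = w = 32   # hypothetical input size
k = 3        # hypothetical filter size

dense_params = (h * w) * (h * w)   # fully connected, no sharing
conv_params = k * k                # one tied 3x3 filter, slid over the input

print(dense_params, conv_params)  # → 1048576 9
```

The same sharing argument carries over to the 3D voxel grids AtomNet operates on, which is why a convolutional architecture is tractable there at all.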
Smina 123 35 5 0 0
Table 3: The number of targets on which AtomNet and Smina exceed given adjusted-logAUC thresholds. For example, on the ChEMBL-20 PMD set, AtomNet achieves an adjusted-logAUC of 0.3 or better for 27 targets (out of 50 possible targets). ChEMBL-20 PMD contains 50 targets, DUDE-30 contains 30 targets, DUDE-102 contains 102 targets, and ChEMBL-20 inactives contains 149 targets.
• Trains a deep convolutional neural network (CNN) on known 3D protein-ligand binding structures

• Predicts whether a protein and ligand bind, without explicitly calculating chemical bonding

• Predicted binding more accurately than conventional structure-based approaches
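The idea in these bullets, learning binding directly from voxelized 3D structures, can be sketched as a minimal forward pass. Weights are random and the grid sizes are hypothetical; a real model would train many filters on thousands of known complexes:

```python
import numpy as np

# Minimal forward-pass sketch of the AtomNet idea: voxelize a protein-ligand
# complex into a 3D grid, apply one 3D convolutional filter, pool globally,
# and squash the result to a "binding vs. non-binding" probability.
rng = np.random.default_rng(1)
grid = rng.random((16, 16, 16))          # stand-in for a voxelized complex
filt = rng.standard_normal((3, 3, 3))    # one untrained convolutional filter

k = 3
act = np.array([[[np.sum(grid[x:x+k, y:y+k, z:z+k] * filt)
                  for z in range(14)] for y in range(14)] for x in range(14)])
pooled = act.max()                        # global max pooling
p_bind = 1.0 / (1.0 + np.exp(-pooled))    # logistic output: binding probability
print(p_bind)
```

Training would adjust `filt` (and the stacked layers above it) so that `p_bind` separates binders from non-binders.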
604 VOLUME 35 NUMBER 7 JULY 2017 NATURE BIOTECHNOLOGY
AI-powered drug discovery captures pharma interest
A drug-hunting deal inked last month, between
Numerate, of San Bruno, California, and Takeda
Pharmaceutical to use Numerate's artificial
intelligence (AI) suite to discover small-molecule
therapies for oncology, gastroenterology and
central nervous system disorders, is the latest in
a growing number of research alliances involv-
ing AI-powered computational drug develop-
ment firms. Also last month, GNS Healthcare
of Cambridge, Massachusetts announced a deal
with Roche subsidiary Genentech of South San
Francisco, California to use GNS’s AI platform
to better understand what affects the efficacy of
known therapies in oncology. In May, Exscientia
of Dundee, Scotland, signed a deal with Paris-
based Sanofi that includes up to €250 million
($280 million) in milestone payments. Exscientia will
provide the compound design and Sanofi the
chemical synthesis of new drugs for diabetes
and cardiovascular disease. The trend indicates
thatthepharmaindustry’slong-runningskepti-
cismaboutAIissofteningintogenuineinterest,
driven by AI’s promise to address the industry’s
principal pain point: clinical failure rates.
The industry’s willingness to consider AI
approaches reflects the reality that drug
discovery is laborious, time consuming and not
particularly effective. A two-decade-long downward
trend in clinical success rates has only recently
improved (Nat. Rev. Drug Disc. 15, 379–380,
2016). Still, today, only about one in ten drugs
that enter phase 1 clinical trials reaches patients.
Half those failures are due to a lack of efficacy,
says Jackie Hunter, CEO of BenevolentBio, a
division of BenevolentAI of London. “That tells
you we’re not picking the right targets,” she says.
“Even a 5 or 10% reduction in efficacy failure
would be amazing.” Hunter’s views on AI in
drug discovery are featured in Ernst & Young's
Biotechnology Report 2017 released last month.
Companies that have been watching AI from
the sidelines are now jumping in. The best-
known machine-learning model for drug dis-
covery is perhaps IBM’s Watson. IBM signed a
deal in December 2016 with Pfizer to aid the
pharma giant's immuno-oncology drug
discovery efforts, adding to a string of previous
deals in the biopharma space (Nat. Biotechnol. 33,
1219–1220, 2015). IBM's Watson hunts for drugs by
sorting through vast amounts of textual data to
provide quick analyses, and tests hypotheses by
sorting through massive amounts of laboratory
data, clinical reports and scientific publications.
BenevolentAI takes a similar approach with
algorithms that mine the research literature and
proprietary research databases.
The explosion of biomedical data has driven
much of industry’s interest in AI (Table 1). The
confluence of ever-increasing computational
horsepower and the proliferation of large data
sets has prompted scientists to seek learning
algorithms that can help them navigate such
massive volumes of information.
A lot of the excitement about AI in drug
discovery has spilled over from other fields.
Machine vision, which allows, among other
things, self-driving cars, and language process-
ing have given rise to sophisticated multilevel
artificial neural networks known as deep-
learning algorithms that can be used to model
biological processes from assay data as well as
textual data.
In the past people didn’t have enough data
to properly train deep-learning algorithms,
says Mark Gerstein, a biomedical informat-
ics professor at Yale University in New Haven,
Connecticut. Now researchers have been able to
build massive databases and harness them with
these algorithms, he says. “I think that excite-
ment is justified.”
Numerate is one of a growing number of AI
companies founded to take advantage of that
data onslaught as applied to drug discovery. “We
apply AI to chemical design at every stage,” says
Guido Lanza, Numerate’s CEO. It will provide
Tokyo-based Takeda with candidates for clinical
trials by virtual compound screenings against
targets, designing and optimizing compounds,
and modeling absorption, distribution,
metabolism and excretion, and toxicity. The agreement
includes undisclosed milestone payments and
royalties.
Academic laboratories are also embracing
AI tools. In April, Atomwise of San Francisco
launched its Artificial Intelligence Molecular
Screen awards program, which will deliver 72
potentially therapeutic compounds to as many
as 100 university research labs at no charge.
Atomwise is a University of Toronto spinout
that in 2015 secured an alliance with Merck of
Kenilworth, New Jersey. For this new endeavor,
it will screen 10 million molecules using its
AtomNet platform to provide each lab with
72 compounds aimed at a specific target of the
laboratory’s choosing.
The Japanese government launched in
2016 a research consortium centered on
using Japan’s K supercomputer to ramp up
drug discovery efficiency across dozens of
local companies and institutions. Among
those involved are Takeda and tech giants
Fujitsu of Tokyo, Japan, and NEC, also of
Tokyo, as well as Kyoto University Hospital
and Riken, Japan’s National Research and
Development Institute, which will provide
clinical data.
Deep learning is starting to gain acolytes in the drug discovery space.
KTSDESIGN / Science Photo Library
NEWS © 2017 Nature America, Inc., part of Springer Nature. All rights reserved.
Genomics data analytics startup WuXi
NextCode Genomics of Shanghai; Cambridge,
Massachusetts; and Reykjavík, Iceland, collaborated
with researchers at Yale University on a
study that used the company’s deep-learning
algorithm to identify a key mechanism in
blood vessel growth. The result could aid drug
discovery efforts aimed at inhibiting blood
vessel growth in tumors (Nature doi:10.1038/
nature22322, 2017).
In the US, during the Obama administration,
industry and academia joined forces to apply
AI to accelerate drug discovery as part of the
Cancer Moonshot initiative (Nat. Biotechnol. 34,
119, 2016). The Accelerating Therapeutics for
Opportunities in Medicine (ATOM), launched
in January 2016, marries computational and
experimental approaches, with Brentford,
UK-based GlaxoSmithKline, participating
with Lawrence Livermore National Laboratory
in Livermore, California, and the US National
Cancer Institute. The computational portion
of the process, which includes deep-learning
and other AI algorithms, will be tested in the
first two years. In the third year, “we hope to
start on day one with a disease hypothesis and
on day 365 to deliver a drug candidate,” says
Martha Head, GlaxoSmithKline's head, insights
from data.
Table 1 Selected collaborations in the AI-drug discovery space

| AI company / location | Technology | Announced partner / location | Indication(s) | Deal date |
|---|---|---|---|---|
| Atomwise | Deep-learning screening from molecular structure data | Merck | Malaria | 2015 |
| BenevolentAI | Deep learning and natural language processing of research literature | Janssen Pharmaceutica (Johnson & Johnson), Beerse, Belgium | Multiple | November 8, 2016 |
| Berg, Framingham, Massachusetts | Deep-learning screening of biomarkers from patient data | None | Multiple | N/A |
| Exscientia | Bispecific compounds via Bayesian models of ligand activity from drug discovery data | Sanofi | Metabolic diseases | May 9, 2017 |
| GNS Healthcare | Bayesian probabilistic inference for investigating efficacy | Genentech | Oncology | June 19, 2017 |
| Insilico Medicine | Deep-learning screening from drug and disease databases | None | Age-related diseases | N/A |
| Numerate | Deep learning from phenotypic data | Takeda | Oncology, gastroenterology and central nervous system disorders | June 12, 2017 |
| Recursion, Salt Lake City, Utah | Cellular phenotyping via image analysis | Sanofi | Rare genetic diseases | April 25, 2016 |
| twoXAR, Palo Alto, California | Deep-learning screening from literature and assay data | Santen Pharmaceuticals, Osaka, Japan | Glaucoma | February 23, 2017 |

N/A, none announced. Source: companies' websites.
• Can currently screen 10 million compounds per day

• 10,000× faster than physical experiments and 100× faster than ultra-high-throughput screening (ultra-HTS)

• Also used to characterize toxicity, side effects, mechanism of action, and efficacy

• Projects under way with 10 pharmaceutical companies, including Merck, and 40 research institutions, including Harvard

• Target diseases: Alzheimer's disease, bacterial infections, antibiotics, nephrology,
BRIEF COMMUNICATION
https://doi.org/10.1038/s41587-019-0224-x
1Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong. 2WuXi AppTec Co., Ltd, Shanghai, China. 3Department of Chemistry, University of Toronto, Toronto, Ontario, Canada. 4Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 5Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada. 6Canadian Institute for Advanced Research, Toronto, Ontario, Canada. *e-mail: alex@insilico.com
We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice.
Drug discovery is resource intensive, and involves typical timelines of 10–20 years and costs that range from US$0.5 billion to US$2.6 billion [1,2]. Artificial intelligence promises to accelerate this process and reduce costs by facilitating the rapid identification of compounds [3,4]. Deep generative models are machine learning techniques that use neural networks to produce new data objects. These techniques can generate objects with certain properties, such as activity against a given target, that make them well suited for the discovery of drug candidates. However, few examples of generative drug design have achieved experimental validation involving synthesis of novel compounds for in vitro and in vivo investigation [5–16].
Discoidin domain receptor 1 (DDR1) is a collagen-activated proinflammatory receptor tyrosine kinase that is expressed in epithelial cells and involved in fibrosis [17]. However, it is not clear whether DDR1 directly regulates fibrotic processes, such as myofibroblast activation and collagen deposition, or earlier inflammatory events that are associated with reduced macrophage infiltration. Since 2013, at least eight chemotypes have been published as selective DDR1 (or DDR1 and DDR2) small-molecule inhibitors (Supplementary Table 1). Recently, a series of highly selective, spiro-indoline-based DDR1 inhibitors were shown to have potential therapeutic efficacy against renal fibrosis in a Col4a3–/– mouse model of Alport syndrome [18]. A wider diversity of DDR1 inhibitors would therefore enable further basic understanding and therapeutic intervention.
We developed generative tensorial reinforcement learning (GENTRL), a machine learning approach for de novo drug design. GENTRL prioritizes the synthetic feasibility of a compound, its effectiveness against a given biological target, and how distinct it is from other molecules in the literature and patent space. In this work, GENTRL was used to rapidly design novel compounds that are active against DDR1 kinase. Six of these compounds, each complying with Lipinski's rules [1], were designed, synthesized, and experimentally tested in 46 days, which demonstrates the potential of this approach to provide rapid and effective molecular design (Fig. 1a).
To create GENTRL, we combined reinforcement learning, variational inference, and tensor decompositions into a generative two-step machine learning algorithm (Supplementary Fig. 1) [19]. First, we learned a mapping of chemical space, a set of discrete molecular graphs, to a continuous space of 50 dimensions. We parameterized the structure of the learned manifold in the tensor train format to use partially known properties. Our auto-encoder-based model compresses the space of structures onto a distribution that parameterizes the latent space in a high-dimensional lattice with an exponentially large number of multidimensional Gaussians in its nodes. This parameterization ties latent codes and properties, and works with missing values without their explicit input. In the second step, we explored this space with reinforcement learning to discover new compounds.
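The two-step scheme (learn a continuous 50-dimensional latent space, then search it for high-reward points) can be caricatured with a toy reward and a naive hill climb. Everything here is a stand-in: the reward is not GENTRL's, and a real system would decode latent points back into molecular graphs:

```python
import numpy as np

# Toy sketch: molecules live in a learned 50-dimensional latent space; a
# reward function scores latent points, and we search for high-reward ones.
rng = np.random.default_rng(2)
target = rng.standard_normal(50)                 # hypothetical "good inhibitor" region
reward = lambda z: -np.linalg.norm(z - target)   # stand-in reward, higher is better

z = np.zeros(50)                                 # starting latent code
for _ in range(500):                             # naive hill climb in latent space
    cand = z + 0.1 * rng.standard_normal(50)     # small random perturbation
    if reward(cand) > reward(z):                 # keep only improvements
        z = cand

print(reward(z) >= reward(np.zeros(50)))  # search never worsens the score → True
```

GENTRL replaces this crude search with a reinforcement learning policy and replaces the toy reward with the three SOM-based rewards described next.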
GENTRL uses three distinct self-organizing maps (SOMs) as reward functions: the trending SOM, the general kinase SOM, and the specific kinase SOM. The trending SOM is a Kohonen-based reward function that scores compound novelty using the application priority date of structures that have been disclosed in patents. Neurons that are abundantly populated with novel chemical entities reward the generative model. The general kinase SOM is a Kohonen map that distinguishes kinase inhibitors from other classes of molecules. The specific kinase SOM isolates DDR1 inhibitors from the total pool of kinase-targeted molecules. GENTRL prioritizes the structures it generates by using these three SOMs in sequence.
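A Kohonen-style novelty reward can be sketched as follows. This is a deliberately simplified single SOM with random stand-in fingerprints and hypothetical priority years, not the paper's exact maps:

```python
import numpy as np

# Sketch of a trending-SOM-like reward: train a small Kohonen map on
# "patented molecule" fingerprints, then reward a generated molecule if its
# best-matching unit (BMU) is dominated by recent (novel) structures.
rng = np.random.default_rng(3)
data = rng.random((200, 8))                  # stand-in molecular fingerprints
years = rng.integers(2000, 2019, size=200)   # hypothetical priority dates

units = rng.random((16, 8))                  # 16 SOM units, random init
for _ in range(50):                          # crude SOM training: pull each
    for v in data:                           # sample's BMU toward the sample
        bmu = np.argmin(((units - v) ** 2).sum(axis=1))
        units[bmu] += 0.05 * (v - units[bmu])

def novelty_reward(fp):
    bmu = np.argmin(((units - fp) ** 2).sum(axis=1))
    hits = np.argmin(((units[None, :, :] - data[:, None, :]) ** 2).sum(-1),
                     axis=1) == bmu          # training samples mapped to same unit
    return float(years[hits].mean() >= 2010) if hits.any() else 0.0

print(novelty_reward(rng.random(8)))  # 1.0 if the BMU is a "recent" neuron, else 0.0
```

The general and specific kinase SOMs would work the same way, with class labels (kinase inhibitor / DDR1 inhibitor) in place of priority dates.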
We used six data sets to build the model: (1) a large set of molecules derived from a ZINC data set, (2) known DDR1 kinase inhibitors, (3) common kinase inhibitors (positive set), (4) molecules that act on non-kinase targets (negative set), (5) patent data for biologically active molecules that have been claimed by pharmaceutical companies, and (6) three-dimensional (3D) structures for DDR1 inhibitors (Supplementary Table 1). Data sets were preprocessed to exclude gross outliers and to reduce the number of compounds that contained similar structures (see Methods).
We started to train GENTRL (pretraining) on a filtered ZINC database (data set 1, described earlier), and then continued training using the DDR1 and common kinase inhibitors (data set 2 and data set 3). We then launched the reinforcement learning stage with the reward described earlier. We obtained an initial output of 30,000 structures (Supplementary Data Set), which were then
Deep learning enables rapid identification of potent DDR1 kinase inhibitors
Alex Zhavoronkov1*, Yan A. Ivanenkov1, Alex Aliper1, Mark S. Veselov1, Vladimir A. Aladinskiy1, Anastasiya V. Aladinskaya1, Victor A. Terentiev1, Daniil A. Polykovskiy1, Maksim D. Kuznetsov1, Arip Asadulaev1, Yury Volkov1, Artem Zholus1, Rim R. Shayakhmetov1, Alexander Zhebrak1, Lidiya I. Minaeva1, Bogdan A. Zagribelnyy1, Lennart H. Lee2, Richard Soll2, David Madge2, Li Xing2, Tao Guo2 and Alán Aspuru-Guzik3,4,5,6
NATURE BIOTECHNOLOGY | VOL 37 | SEPTEMBER 2019 | 1038–1040 | www.nature.com/naturebiotechnology
• Designing a drug candidate in just "46 days" with deep learning
• A study by Insilico Medicine, a Hong Kong AI drug-discovery company (Nat Biotech, Sep 2019)
• Discovery of drug candidates against DDR1, a target implicated in fibrosis
• A deep generative model designed around reinforcement learning
• Training data: known small-molecule databases, known DDR1 inhibitors, 3D structures of DDR1, etc.
• Three SOMs used as reward functions: the trending, general kinase, and specific kinase SOMs
• Can this method be applied generically to other targets?
BRIEF COMMUNICATION
https://doi.org/10.1038/s41587-019-0224-x
1 Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong. 2 WuXi AppTec Co., Ltd, Shanghai, China. 3 Department of Chemistry, University of Toronto, Toronto, Ontario, Canada. 4 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 5 Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada. 6 Canadian Institute for Advanced Research, Toronto, Ontario, Canada. *e-mail: alex@insilico.com
We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice.
Drug discovery is resource intensive, and involves typical timelines of 10–20 years and costs that range from US$0.5 billion to US$2.6 billion1,2. Artificial intelligence promises to accelerate this process and reduce costs by facilitating the rapid identification of compounds3,4. Deep generative models are machine learning techniques that use neural networks to produce new data objects. These techniques can generate objects with certain properties, such as activity against a given target, that make them well suited for the discovery of drug candidates. However, few examples of generative drug design have achieved experimental validation involving synthesis of novel compounds for in vitro and in vivo investigation5–16.
Discoidin domain receptor 1 (DDR1) is a collagen-activated pro-inflammatory receptor tyrosine kinase that is expressed in epithelial cells and involved in fibrosis17. However, it is not clear whether DDR1 directly regulates fibrotic processes, such as myofibroblast activation and collagen deposition, or earlier inflammatory events that are associated with reduced macrophage infiltration. Since 2013, at least eight chemotypes have been published as selective DDR1 (or DDR1 and DDR2) small-molecule inhibitors (Supplementary Table 1). Recently, a series of highly selective, spiro-indoline-based DDR1 inhibitors were shown to have potential therapeutic efficacy against renal fibrosis in a Col4a3–/– mice model of Alport syndrome18. A wider diversity of DDR1 inhibitors would therefore enable further basic understanding and therapeutic intervention.
We developed generative tensorial reinforcement learning
(GENTRL), a machine learning approach for de novo drug design.
GENTRL prioritizes the synthetic feasibility of a compound, its
effectiveness against a given biological target, and how distinct it
is from other molecules in the literature and patent space. In this
work, GENTRL was used to rapidly design novel compounds that
are active against DDR1 kinase. Six of these compounds, each
complying with Lipinski's rules1, were designed, synthesized, and
experimentally tested in 46 days, which demonstrates the potential of
this approach to provide rapid and effective molecular design (Fig. 1a).
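The paper notes that all six designed compounds complied with Lipinski's rules. Those rules can be checked from four precomputed molecular properties; the sketch below is a generic illustration under the common "at most one violation" reading, and the property values in the example are hypothetical, not those of the reported compounds.

```python
# Lipinski's rule of five from four precomputed molecular properties.
# Here a compound "complies" if it violates at most one criterion.
# The example property values are hypothetical.

def passes_lipinski(mw, logp, h_donors, h_acceptors):
    violations = sum([
        mw > 500,          # molecular weight above 500 Da
        logp > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ])
    return violations <= 1

print(passes_lipinski(mw=431.5, logp=3.2, h_donors=2, h_acceptors=6))  # True
```

In practice the four properties would come from a cheminformatics toolkit rather than being supplied by hand.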
To create GENTRL, we combined reinforcement learning, variational inference, and tensor decompositions into a generative two-step machine learning algorithm (Supplementary Fig. 1)19. First, we learned a mapping of chemical space, a set of discrete molecular graphs, to a continuous space of 50 dimensions. We parameterized the structure of the learned manifold in the tensor train format to use partially known properties. Our auto-encoder-based model compresses the space of structures onto a distribution that parameterizes the latent space in a high-dimensional lattice with an exponentially large number of multidimensional Gaussians in its nodes. This parameterization ties latent codes and properties, and works with missing values without their explicit input. In the second step, we explored this space with reinforcement learning to discover new compounds.
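The two-step idea above (map molecules to a continuous latent space, then search that space for high-reward points) can be caricatured with a toy example. The sketch below stands in for the exploration step only: the "reward" is a synthetic function and the search is plain hill climbing rather than the paper's reinforcement learning, so every name and number here is an illustrative assumption.

```python
import random

# Toy stand-in for GENTRL's second step: search a continuous latent space
# for points with high reward. The real model decodes 50-dimensional latent
# codes into molecules and rewards them with Kohonen-map scores; here the
# "reward" is synthetic and the search is simple hill climbing.

DIM = 50  # dimensionality of the latent space, as in the paper

def reward(z):
    # Hypothetical reward, peaking as the latent code approaches the origin.
    return -sum(x * x for x in z)

def explore(steps=2000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    z = [rng.uniform(-1, 1) for _ in range(DIM)]
    best = reward(z)
    for _ in range(steps):
        cand = [x + rng.gauss(0, sigma) for x in z]
        r = reward(cand)
        if r > best:  # keep only perturbations that raise the reward
            z, best = cand, r
    return best

print(explore() > reward([1.0] * DIM))  # True: the search beats a poor baseline
```

A policy-gradient method would replace the accept/reject step with a learned sampling distribution, but the structure (propose in latent space, score, update) is the same.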
GENTRL uses three distinct self-organizing maps (SOMs) as reward functions: the trending SOM, the general kinase SOM, and the specific kinase SOM. The trending SOM is a Kohonen-based reward function that scores compound novelty using the application priority date of structures that have been disclosed in patents. Neurons that are abundantly populated with novel chemical entities reward the generative model. The general kinase SOM is a Kohonen map that distinguishes kinase inhibitors from other classes of molecules. The specific kinase SOM isolates DDR1 inhibitors from the total pool of kinase-targeted molecules. GENTRL prioritizes the structures it generates by using these three SOMs in sequence.
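A Kohonen self-organizing map of the kind used for these reward functions can be sketched in a few dozen lines. This is a minimal generic SOM, not the authors' implementation: the 2-D "descriptors" are random points standing in for molecular fingerprints, and the grid size and schedules are arbitrary assumptions.

```python
import math
import random

# Minimal Kohonen self-organizing map (SOM). After training, each neuron can
# be labelled by the class of the vectors it attracts, and a new descriptor
# is scored by the label of its best-matching neuron.

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(w, x):
    return min(range(len(w)), key=lambda i: dist2(w[i], x))

def train_som(data, grid=4, epochs=30, seed=0):
    rng = random.Random(seed)
    # One weight vector per neuron on a grid x grid map.
    w = [[rng.random() for _ in data[0]] for _ in range(grid * grid)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                 # decaying learning rate
        radius = max(1.0, grid / 2 * (1 - epoch / epochs))
        for x in data:
            bi, bj = divmod(best_matching_unit(w, x), grid)
            for i in range(len(w)):
                gi, gj = divmod(i, grid)
                d2 = (gi - bi) ** 2 + (gj - bj) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))   # neighbourhood kernel
                w[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w[i], x)]
    return w

# Two well-separated clusters of "descriptors" end up on different neurons,
# which is what lets a labelled SOM act as a classifier-style reward.
rng = random.Random(1)
data = [[rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2)] for _ in range(20)] + \
       [[rng.uniform(0.8, 1.0), rng.uniform(0.8, 1.0)] for _ in range(20)]
som = train_som(data)
print(best_matching_unit(som, [0.0, 0.0]) != best_matching_unit(som, [1.0, 1.0]))
```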
We used six data sets to build the model: (1) a large set of molecules derived from a ZINC data set, (2) known DDR1 kinase inhibitors, (3) common kinase inhibitors (positive set), (4) molecules that act on non-kinase targets (negative set), (5) patent data for biologically active molecules that have been claimed by pharmaceutical companies, and (6) three-dimensional (3D) structures for DDR1 inhibitors (Supplementary Table 1). Data sets were preprocessed to exclude gross outliers and to reduce the number of compounds that contained similar structures (see Methods).
We started to train GENTRL (pretraining) on a filtered ZINC database (data set 1, described earlier), and then continued training using the DDR1 and common kinase inhibitors (data set 2 and data set 3). We then launched the reinforcement learning stage with the reward described earlier. We obtained an initial output of 30,000 structures (Supplementary Data Set), which were then …
Figure 1 | a, GENTRL workflow and 46-day timeline: target selection by WuXi AppTec (DDR1 kinase); databases (kinase inhibitors, non-kinase set, X-ray data, IP base) with preprocessing (outliers, reference compounds); GENTRL model training and structure generation, using reward functions (Kohonen maps, novelty) and a pharmacophore hypothesis; 30,000 generated structures; prioritization by descriptors, MCFs, clustering, diversity, vendors, Kohonen maps, RMSD, and Sammon mapping; IP assessment of 40 prioritized structures; synthetic-route analysis and synthesis; and biological evaluation of six compounds by WuXi AppTec, yielding novel nanomolar hits. Timeline markers fall at Day 9, Day 23, Day 35, and Day 46, with stage durations of 7, 12, 2, and 25 days. The parent structure has IC50 (DDR1) = 15.5 nM. b, c, Reported potencies include IC50 (DDR1) values of 10, 21, 278, and 1,000 nM (others > 10^4 nM) and IC50 (DDR2) values of 76, 162, 234, and 649 nM (others > 10^4 nM).
• Designing a drug candidate in just "46 days" with deep learning
• Some 30,000 compounds designed with a database- and reinforcement-learning-based model (Day 21)
• 40 compounds selected to cover the chemical space, based on several criteria
• Of these, six were successfully synthesized (Day 35)
• Of the six candidates, compounds 1 and 2 showed strong inhibition in vitro (IC50 = 10 nM and 21 nM)
• Compounds 1 and 2 also effectively inhibited fibrotic markers in cell-based assays
• Compound 1 was further characterized in a rodent model, including its half-life (Day 46)
Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
Digital Healthcare in Drug Development
• Patient recruitment
• Data measurement: sensors and wearables
• Digital phenotyping
• Medication adherence
Empowering the Oncology Community for Cancer Care
Genomics | Oncology | Clinical Trial Matching
Watson Health's oncology clients span more than 35 hospital systems
"Empowering the Oncology Community for Cancer Care"
Andrew Norden, KOTRA Conference, March 2017, "The Future of Health is Cognitive"
IBM Watson Health
Watson for Clinical Trial Matching (CTM)
1. According to the National Comprehensive Cancer Network (NCCN)
2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf
Current Challenges
• Searching across eligibility criteria of clinical trials is time consuming and labor intensive
• Fewer than 5% of adult cancer patients participate in clinical trials1
• 37% of sites fail to meet minimum enrollment targets, and 11% of sites fail to enroll a single patient2

The Watson solution
• Uses structured and unstructured patient data to quickly check eligibility across relevant clinical trials
• Provides eligible trial considerations ranked by relevance
• Increases speed to qualify patients

Clinical Investigators (Opportunity)
• Trials to Patient: perform feasibility analysis for a trial
• Identify sites with the most potential for patient enrollment
• Optimize inclusion/exclusion criteria in protocols
Faster, more efficient recruitment strategies; better designed protocols

Point of Care (Offering)
• Patient to Trials: quickly find the right trial that a patient might be eligible for amongst 100s of open trials available
Improved patient care quality and consistency; increased efficiency
Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
Digital Healthcare in Drug Development
• Personal genome analysis
• Blockchain-based genomic analysis
• Deep learning-based candidate compounds
• AI + pharmaceutical company partnerships
• Patient recruitment
• Data measurement: wearables
• Digital phenotyping
• Medication adherence
• Social media-based post-market surveillance
• Blockchain-based post-market surveillance
What else?
What is a drug?
+ Digital Therapeutics
Digital Therapeutics
Digital drugs / digital therapeutics
The Birth of Prescription Digital Therapeutics, Pear Therapeutics and InCrowd, IIeX 2018
DTxDM East (Oct 2018, Boston)
"It is not a 'game' that happens to have a therapeutic effect; it is a 'therapeutic' that (as it turns out) takes the form of a game."
by Eddie Martucci, CEO of Akili Interactive, at DTxDM East 2018
www.dtxalliance.org
Digital Therapeutics:
Combining Technology and Evidence-based
Medicine to Transform Personalized Patient Care
Nov 2018
Defining Digital Therapeutics
Thought leaders across the digital therapeutics industry,
supported by the Digital Therapeutics Alliance, collaborated
to develop the following comprehensive definition:
Digital therapeutics (DTx) deliver evidence-based
therapeutic interventions to patients that are driven by
high quality software programs to prevent, manage,
or treat a medical disorder or disease. They are used
independently or in concert with medications, devices,
or other therapies to optimize patient care and health
outcomes.
DTx products incorporate advanced technology best
practices relating to design, clinical validation, usability,
and data security. They are reviewed and cleared or
approved by regulatory bodies as required to support
product claims regarding risk, efficacy, and intended use.
Digital therapeutics empower patients, healthcare
providers, and payers with intelligent and accessible tools
for addressing a wide range of conditions through high
quality, safe, and effective data-driven interventions.
Digital therapeutics present the market with evidence-based technologies that have the ability to elevate medical best practices, address unmet medical needs, expand healthcare access, and improve clinical and health economic outcomes.
• A high-quality software program that prevents, manages, or treats a disease or disorder
• May be used independently, or together with medications, devices, or other therapies
• Claims about efficacy, intended use, and risk are subject to regulatory clearance or approval
Developing Industry Standards
The direct delivery of personalized treatment interventions to patients places digital therapeutics in a unique position, one full of additional responsibility and promise. Given the diversity of interventions being delivered by digital therapeutics and the types of disease states addressed, it is important for all products to adhere to industry-adopted core principles and best practices.
Core principles all digital therapeutics must adhere to:
• Prevent, manage, or treat a medical disorder or disease
• Produce a medical intervention that is driven by software, and delivered via software or complementary hardware, medical device, service, or medication
• Incorporate design, manufacture, and quality best practices
• Engage end users in product development and usability processes
• Incorporate patient privacy and security protections
• Apply product deployment, management, and maintenance best practices
• Publish trial results inclusive of clinically meaningful outcomes in peer-reviewed journals
• Be reviewed and cleared or approved by regulatory bodies as required to support product claims of risk, efficacy, and intended use
• Make claims appropriate to clinical validation and regulatory status
• Collect, analyze, and apply real-world evidence and product performance data
Digital therapeutics are designed to integrate into patient lifestyles and provider workflows to deliver a fully integrated healthcare experience with improved outcomes.
• Core principles that all digital therapeutics must follow:
• The medical intervention is driven by software,
• and is delivered via software, or via complementary hardware, a medical device, or a medication.
Digital healthcare, medical devices, and digital therapeutics (*SaMD: Software as a Medical Device)
• Digital healthcare: Fitbit (activity-tracking wearable), RunKeeper (running-monitoring app), Sleep Cycle (sleep-monitoring app)
• Hardware-based medical devices: X-ray machines, blood pressure monitors, thermometers; Empatica (epileptic-seizure-detection wearable), AliveCor (ECG-measurement gadget), Proteus (ingestible medication-adherence sensor)
• Software (SaMD), including medical AI: Watson (cancer-care support), VUNO (X-ray bone age, etc.), Lunit (X-ray lung nodules, etc.), Zebra Medical (X-ray pneumothorax, etc.), IDx (diabetic retinopathy from fundus photographs)
• Digital therapeutics: Calm (meditation app), Pear (addiction-treatment app), Akili (ADHD-treatment game), AppliedVR (VR for pain relief)
Types of digital therapeutics
Product purpose: 1. health management; 2. disease management/prevention; 3. optimization of other medications; 4. disease treatment
• Claims about efficacy, risk, and intended use: category 1 is at the regulator's discretion (not always regulated); categories 2–4 require third-party validation and are regulated
• Scope of disease-related claims: category 1 may not claim efficacy against a disease; category 2 carries low-to-medium risk (e.g., slowing the progression of a disease); category 3 carries medium-to-high risk (e.g., increasing the efficacy of an existing drug); category 4 carries medium-to-high risk (e.g., medical efficacy such as treating a disease)
• Clinical evidence: categories 2–4 require clinical trials and the continuous generation of evidence
• Purchase route: category 1 is purchased directly by patients (DTC, no prescription needed); category 2 is over-the-counter or by prescription; categories 3 and 4 require a prescription
• Relationship to other drugs: category 1 is used independently or indirectly supports other drugs; category 2 may be given alone or in combination; category 3 is given in combination; category 4 may be given alone or in combination
Case studies of representative digital therapeutics
• Pear Therapeutics
• Akili Interactive
• Click Therapeutics
• Dthera Science
• Noom, Omada Health
• Hurray Positive, SK Health Connect
• Virtual Vietnam
• AppliedVR
• Woebot
• Cognoa
• Propeller Health
• Neofect
Pear Therapeutics
• Pear Therapeutics' reSET
• Prescribed by a physician, it treats addiction to and dependence on alcohol, cocaine, cannabis, and other substances over 12 weeks
• The first smartphone app cleared by the FDA (De Novo) as a standalone treatment (September 2017)
• The industry regards Pear Therapeutics as the pioneer of digital therapeutics

• reSET's Indications for Use
• For patients aged 18 and older with Substance Use Disorder (SUD) receiving outpatient care
• Under clinician supervision, adjunctive to the existing contingency management system
• Delivers 12 weeks of CBT (Cognitive Behavioral Therapy)
• Aims to increase abstinence from SUD and retention in the treatment program
RCT of reSET
DE NOVO CLASSIFICATION REQUEST FOR RESET
logistic Generalized Estimating Equations (GEE) model with factors for treatment, time, and treatment × time interaction. Missing data were treated as failures. The analysis results of abstinence for cohorts 1 and 2 are presented below, additionally compared by abstinence at baseline. The abstinence analyses were completed in the context of a GEE model that incorporates within-subject variability across the observation window and estimates abstinence at specified time points based on the model; the analyses yield percentages rather than absolute numbers. The number of patients reported in the table below represents the number of patients in that entire group (e.g., N=252 patients in Cohort 1 were in the TAU group overall; N=139 patients were abstinent at baseline in the Cohort 1 TAU group).
Table 3: Abstinence rates in Cohorts 1 (N=507) and 2 (N=399)
Patients who received rTAU + reSET had statistically significant increased odds of
remaining abstinent at the end of treatment:
Cohort 1: Odds ratio=2.22, 95% CI (1.24, 3.99); p=0.0076
Cohort 2: Odds ratio=3.17, 95% CI (1.68, 5.99); p=0.0004.
Cohort 3 (all opioids excluded, N=153 TAU, N=152 rTAU+reSET) had similar
abstinence to cohorts 1 and 2, with abstinence rates in the rTAU + reSET arm of 38.5%
compared to 17.5% in the TAU arm (Odds Ratio=2.95, 95% CI=1.43, 6.09, p=0.0034).
Abstinence: patients who were abstinent at baseline: Patients who were abstinent at
baseline were significantly more likely to remain abstinent throughout the study than
patients who were not abstinent at baseline for both patients who received TAU and
patients who received rTAU + reSET.
• RCT comparing TAU (Treatment As Usual) with rTAU (reduced TAU) + reSET
• Analyzed separately with primary opioid users included and excluded
• Patients abstinent and non-abstinent at baseline analyzed separately
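As a sanity check on the numbers quoted above, the cohort-3 odds ratio follows directly from the two reported abstinence rates (38.5% with rTAU + reSET vs 17.5% with TAU):

```python
# Quick arithmetic check: the reported cohort-3 odds ratio can be derived
# from the two abstinence rates alone.

def odds(p):
    return p / (1 - p)

def odds_ratio(p_treated, p_control):
    return odds(p_treated) / odds(p_control)

print(round(odds_ratio(0.385, 0.175), 2))  # → 2.95, matching the report
```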
RCT of reSET
• In Cohort 2 (primary opioid users excluded),
• statistically significant differences both overall and in the non-abstinent-at-baseline group
RCT of reSET
DE NOVO CLASSIFICATION REQUEST FOR RESET
The Kaplan-Meier curve for cohort 1 is shown below:
Figure 2: Kaplan-Meier curve for Cohort 1 (all comers)
Adverse events
In the entire clinical study, the number of patients with any adverse event was 13% (n=66). The
number of patients with any event was 29 (11.5%) in TAU and 37 (14.5%) in reSET + rTAU (p
= 0.3563). None of the adverse events in the reSET arm were adjudicated by the study
investigators to be device-related. The events evaluated were typical of patients with SUD,
including cardiovascular disease, gastrointestinal events, depression, mania, suicidal behavior,
• Comparison of 12-week retention between TAU and rTAU + reSET in Cohort 1
• A statistically significant difference was also confirmed for retention
• Somryst: a DTx for insomnia, FDA-cleared in March 2020
• Pursued 510(k) clearance and the Pre-Cert program in parallel
• CBT (cognitive behavioral therapy) software for insomnia, available by prescription only
• Efficacy demonstrated in two RCTs (randomized controlled trials)
• More than 1,400 patients with insomnia, or with depression plus insomnia, enrolled in total
• In one large trial, nine weeks of treatment in about 1,000 adults produced significant improvement in insomnia
• Clinical benefit persisted for 18 months
An iPad-based game for treating ADHD
LETTER doi:10.1038/nature12486
Video game training enhances cognitive control in older adults
J. A. Anguera, J. Boccanfuso, J. L. Rintoul, O. Al-Hashimi, F. Faraji, J. Janowich, E. Kong, Y. Larraburo, C. Rolle, E. Johnston & A. Gazzaley
Cognitive control is defined by a set of neural processes that allow us to interact with our complex environment in a goal-directed manner1. Humans regularly challenge these control processes when attempting to simultaneously accomplish multiple goals (multitasking), generating interference as the result of fundamental information processing limitations2. It is clear that multitasking behaviour has become ubiquitous in today's technologically dense world3, and substantial evidence has accrued regarding multitasking difficulties and cognitive control deficits in our ageing population4. Here we show that multitasking performance, as assessed with a custom-designed three-dimensional video game (NeuroRacer), exhibits a linear age-related decline from 20 to 79 years of age. By playing an adaptive version of NeuroRacer in multitasking training mode, older adults (60 to 85 years old) reduced multitasking costs compared to both an active control group and a no-contact control group, attaining levels beyond those achieved by untrained 20-year-old participants, with gains persisting for 6 months. Furthermore, age-related deficits in neural signatures of cognitive control, as measured with electroencephalography, were remediated by multitasking training (enhanced midline frontal theta power and frontal–posterior theta coherence). Critically, this training resulted in performance benefits that extended to untrained cognitive control abilities (enhanced sustained attention and working memory), with an increase in midline frontal theta power predicting the training-induced boost in sustained attention and preservation of multitasking improvement 6 months later. These findings highlight the robust plasticity of the prefrontal cognitive control system in the ageing brain, and provide the first evidence, to our knowledge, of how a custom-designed video game can be used to assess cognitive abilities across the lifespan, evaluate underlying neural mechanisms, and serve as a powerful tool for cognitive enhancement.
In a first experiment, we evaluated multitasking performance across the adult lifespan. A total of 174 participants spanning six decades of life (ages 20–79; ~30 individuals per decade) played a diagnostic version of NeuroRacer to measure their perceptual discrimination ability ('sign task') with and without a concurrent visuomotor tracking task ('driving task'; see Supplementary Information for details of NeuroRacer). Performance was evaluated using two distinct game conditions: 'sign only' (respond as rapidly as possible to the appearance of a sign only when a green circle was present); and 'sign and drive' (simultaneously perform the sign task while maintaining a car in the centre of a winding road using a joystick (that is, 'drive'; see Fig. 1a)). Perceptual discrimination performance was evaluated using the signal detection metric of discriminability (d′). A 'cost' index was used to assess multitasking performance by calculating the percentage change in d′ from 'sign only' to 'sign and drive', such that greater cost (that is, a more negative percentage cost) indicates increased interference when simultaneously engaging in the two tasks (see Methods Summary).
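The two metrics described above, discriminability d′ and the multitasking cost index, are straightforward to compute. The hit and false-alarm rates below are illustrative, not values from the study:

```python
from statistics import NormalDist

# Signal-detection discriminability d' and the NeuroRacer-style "cost"
# index. Example rates are hypothetical.

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def multitasking_cost(d_single, d_multi):
    """Percentage change in d' from 'sign only' to 'sign and drive';
    more negative means more multitasking interference."""
    return (d_multi - d_single) / d_single * 100

d_single = d_prime(0.90, 0.10)  # sign only, d' ≈ 2.56
d_multi = d_prime(0.80, 0.20)   # sign and drive, d' ≈ 1.68
print(round(multitasking_cost(d_single, d_multi), 1))  # → -34.3
```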
Prior to the assessment of multitasking costs, an adaptive staircase algorithm was used to determine the difficulty levels of the game at which each participant performed the perceptual discrimination and visuomotor tracking tasks in isolation at ~80% accuracy. These levels were then used to set the parameters of the component tasks in the multitasking condition, so that each individual played the game at a customized challenge level. This ensured that comparisons would inform differences in the ability to multitask, and not merely reflect disparities in component skills (see Methods, Supplementary Figs 1 and 2, and Supplementary Information for more details).
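An adaptive staircase of the kind described can be sketched as a simple up-down rule: a 3-down/1-up rule converges near 79% accuracy, close to the ~80% target above. The simulated observer and its logistic psychometric function below are assumptions for illustration only.

```python
import math
import random

# 3-down/1-up adaptive staircase against a simulated observer. Difficulty
# rises after three consecutive correct responses and falls after each
# error, so the level settles where accuracy is roughly 79%.

def run_staircase(threshold=5.0, trials=400, step=0.5, seed=0):
    rng = random.Random(seed)
    level, streak, history = 1.0, 0, []
    for _ in range(trials):
        # Higher level = harder task = lower probability of success.
        p_correct = 1 / (1 + math.exp(level - threshold))
        if rng.random() < p_correct:
            streak += 1
            if streak == 3:      # 3-down: make the task harder
                level += step
                streak = 0
        else:                    # 1-up: make the task easier
            level -= step
            streak = 0
        history.append(level)
    # Average level over the second half, after the staircase has settled.
    return sum(history[trials // 2:]) / (trials - trials // 2)

print(run_staircase())  # settles a little below the observer's threshold
```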
Multitasking performance diminished significantly across the adult lifespan in a linear fashion (that is, increasing cost, see Fig. 2a and Supplementary Table 1), with the only significant difference in cost between adjacent decades being the increase from the twenties (−26.7% cost) to the thirties (−38.6% cost). This deterioration in multitasking performance is consistent with the pattern of performance decline across the lifespan observed for fluid cognitive abilities, such as reasoning5 and working memory6. Thus, using NeuroRacer as a performance assessment tool, we replicated previously evidenced age-related multitasking deficits7,8, and revealed that multitasking performance declines linearly as we advance in age beyond our twenties.
In a second experiment, we explored whether older adults who trained by playing NeuroRacer in multitasking mode would exhibit improvements in their multitasking performance on the game9,10 (that is, diminished NeuroRacer costs). Critically, we also assessed whether this training
1 Department of Neurology, University of California, San Francisco, California 94158, USA. 2 Department of Physiology, University of California, San Francisco, California 94158, USA. 3 Center for Integrative Neuroscience, University of California, San Francisco, California 94158, USA. 4 Department of Psychiatry, University of California, San Francisco, California 94158, USA.
Figure 1 | NeuroRacer experimental conditions and training design. a, Screen shots captured during each experimental condition (drive only; sign only; sign and drive). b, Visualization of the training design and the measures collected at each time point: an initial visit with NeuroRacer, EEG, and cognitive testing; a 1-month training intervention (1 hour × 3 times per week) in multitasking, single-task, or no-contact control conditions; and follow-up at 1 and 6+ months.
5 SEPTEMBER 2013 | VOL 501 | NATURE | 97
OPEN | ORIGINAL ARTICLE
Characterizing cognitive control abilities in children with 16p11.2 deletion using adaptive 'video game' technology: a pilot study
JA Anguera, AN Brandes-Aitken, CE Rolle, SN Skinner, SS Desai, JD Bower, WE Martucci, WK Chung, EH Sherr and EJ Marco
Assessing cognitive abilities in children is challenging for two primary reasons: lack of testing engagement can lead to low testing
sensitivity and inherent performance variability. Here we sought to explore whether an engaging, adaptive digital cognitive
platform built to look and feel like a video game would reliably measure attention-based abilities in children with and without
neurodevelopmental disabilities related to a known genetic condition, 16p11.2 deletion. We assessed 20 children with 16p11.2
deletion, a genetic variation implicated in attention deficit/hyperactivity disorder and autism, as well as 16 siblings without the
deletion and 75 neurotypical age-matched children. Deletion carriers showed significantly slower response times and greater
response variability when compared with all non-carriers; by comparison, traditional non-adaptive selective attention assessments
were unable to discriminate group differences. This phenotypic characterization highlights the potential power of administering
tools that integrate adaptive psychophysical mechanics into video-game-style mechanics to achieve robust, reliable measurements.
Translational Psychiatry (2016) 6, e893; doi:10.1038/tp.2016.178; published online 20 September 2016
INTRODUCTION
Cognition is typically associated with measures of intelligence (for example, intellectual quotient (IQ)1), and is a reflection of one's ability to perform higher-level processes by engaging specific mechanisms associated with learning, memory and reasoning. Such acts require the engagement of a specific subset of cognitive resources called cognitive control abilities,2–5 which engage the underlying neural mechanisms associated with attention, working memory and goal-management faculties.6 These abilities are often assessed with validated pencil-and-paper approaches or, now more commonly, with these same paradigms deployed on either desktop or laptop computers. These approaches are often less than ideal when assessing pediatric populations, as children have highly varied degrees of testing engagement, leading to low test sensitivity.7–9 This is especially concerning when characterizing clinical populations, as increased performance variability in these groups often exceeds the range of testing sensitivity,7–9 limiting the ability to characterize cognitive deficits in certain populations. A proper assessment of cognitive control abilities in children is especially important, as these abilities allow children to interact with their complex environment in a goal-directed manner,10 are predictive of academic performance11 and are correlated with overall quality of life.12 For pediatric clinical populations, this characterization is especially critical as they are often assessed in an indirect fashion through intelligence quotients, parent report questionnaires13 and/or behavioral challenges,14 each of which fail to properly characterize these abilities in a direct manner.
One approach to make testing more robust and user-friendly is to present material in an optimally engaging manner, a strategy particularly beneficial when assessing children. The rise of digital health technologies facilitates the ability to administer these types of tests on tablet-based technologies (that is, iPad) in a game-like manner.15 For instance, Dundar and Akcayir16 assessed tablet-based reading compared with book reading in school-aged children, and discovered that students preferred tablet-based reading, reporting it to be more enjoyable. Another approach used to optimize the testing experience involves the integration of adaptive staircase algorithms, as the incorporation of such approaches leads to more reliable assessments that can be completed in a timely manner. This approach, rooted in psychophysical research,17 has been a powerful way to ensure that individuals perform at their ability level on a given task, mitigating the possibility of floor/ceiling effects. With respect to assessing individual abilities, the incorporation of adaptive mechanics acts as a normalizing agent for each individual in accordance with their underlying cognitive abilities,18 facilitating fair comparisons between groups (for example, neurotypical and study populations). Adaptive mechanics in a consumer-style video game experience could potentially assist in the challenge of interrogating cognitive abilities in a pediatric patient population. This synergistic approach would seemingly raise one's level of engagement by making the testing experience more enjoyable and with greater sensitivity to individual differences, a key aspect typically missing in both clinical and research settings when testing these populations. Video game approaches have previously been utilized in clinical adult populations (for example, stroke,19,20
Citation: Transl Psychiatry (2016) 6, e893; doi:10.1038/tp.2016.178
[Figure 2 (Anguera et al.): Project: EVO selective attention performance. (a) EVO single- and multi-tasking response time performance (… non-affected siblings and non-affected control groups); (b) EVO multi-tasking RT; (c) visual search task performance.]
•Using the Project: EVO game,

•carriers of a specific genotype associated with pediatric attention disorders could be identified

•with significant carrier vs. non-carrier differences in in-game response time
RESEARCH ARTICLE
A pilot study to determine the feasibility of
enhancing cognitive abilities in children with
sensory processing dysfunction
Joaquin A. Anguera1,2☯*, Anne N. Brandes-Aitken1☯, Ashley D. Antovich1, Camarin E. Rolle1, Shivani S. Desai1, Elysa J. Marco1,2,3
1 Department of Neurology, University of California, San Francisco, United States of America, 2 Department of Psychiatry, University of California, San Francisco, United States of America, 3 Department of Pediatrics, University of California, San Francisco, United States of America
☯ These authors contributed equally to this work.
* joaquin.anguera@ucsf.edu
Abstract
Children with Sensory Processing Dysfunction (SPD) experience incoming information in
atypical, distracting ways. Qualitative challenges with attention have been reported in these
children, but such difficulties have not been quantified using either behavioral or functional
neuroimaging methods. Furthermore, the efficacy of evidence-based cognitive control interventions aimed at enhancing attention in this group has not been tested. Here we present
work aimed at characterizing and enhancing attentional abilities for children with SPD. A
sample of 38 SPD and 25 typically developing children were tested on behavioral, neural,
and parental measures of attention before and after a 4-week iPad-based at-home cognitive
remediation program. At baseline, 54% of children with SPD met or exceeded criteria on a
parent report measure for inattention/hyperactivity. Significant deficits involving sustained
attention, selective attention and goal management were observed only in the subset of
SPD children with parent-reported inattention. This subset of children also showed reduced
midline frontal theta activity, an electroencephalographic measure of attention. Following
the cognitive intervention, only the SPD children with inattention/hyperactivity showed both
improvements in midline frontal theta activity and on a parental report of inattention. Notably,
33% of these individuals no longer met the clinical cut-off for inattention, with the parent-
reported improvements persisting for 9 months. These findings support the benefit of a
targeted attention intervention for a subset of children with SPD, while simultaneously
highlighting the importance of having a multifaceted assessment for individuals with neurodevelopmental conditions to optimally personalize treatment.
Introduction
Five percent of all children suffer from Sensory Processing Dysfunction (SPD)[1], with these
individuals exhibiting exaggerated aversive, withdrawal, or seeking behaviors associated with
sensory inputs [2]. These sensory processing differences can have significant and lifelong consequences for learning and social abilities, and are often shared by children who meet
PLOS ONE | https://doi.org/10.1371/journal.pone.0172616 April 5, 2017 1 / 19
Citation: Anguera JA, Brandes-Aitken AN, Antovich AD, Rolle CE, Desai SS, Marco EJ (2017) A pilot study to determine the feasibility of enhancing cognitive abilities in children with sensory processing dysfunction. PLoS ONE 12(4): e0172616. https://doi.org/10.1371/journal.pone.0172616
•An experiment on 20 pediatric sensory processing dysfunction (SPD) patients who also had ADHD

•After 4 weeks of playing the Project: EVO game (25 minutes a day, 5 days a week),

•7 of the 20 improved so much that they no longer fell within the ADHD range

•The effect persisted for at least 9 months after use
[Fig 4 (Anguera et al. 2017): Transfer effect on behavioral and parent report measures. Pre and post response time panels revealing within-group change; error bars indicate standard error of the mean; within-group main effects of session marked (* p < .05, ** p < .01). Sun symbols indicate statistically significant instances where SPD+IA post-training performance no longer differed from the TDC group prior to training. (C) Vanderbilt parent report inattention change bar plot, with significant group × session interactions marked on the bars.]
•For ADHD, a large-scale phase III RCT is under way, aiming for FDA clearance as a medical device

•Patients aged 8-12 (n=330), with a video game with no therapeutic effect as the control group

•Primary endpoint: TOVA

•Goal: a physician-prescribed game for treating ADHD, covered by insurers
Lancet Digital Health 2020. Published online February 24, 2020. https://doi.org/10.1016/S2589-7500(20)30017-0
A novel digital intervention for actively reducing severity of paediatric ADHD (STARS-ADHD): a randomised controlled trial
Scott H Kollins, Denton J DeLoss, Elena Cañadas, Jacqueline Lutz, Robert L Findling, Richard S E Keefe, Jeffery N Epstein, Andrew J Cutler, Stephen V Faraone
Summary
Background Attention-deficit hyperactivity disorder (ADHD) is a common paediatric neurodevelopmental disorder with
substantial effect on families and society. Alternatives to traditional care, including novel digital therapeutics, have
shown promise to remediate cognitive deficits associated with this disorder and may address barriers to standard
therapies, such as pharmacological interventions and behavioural therapy. AKL-T01 is an investigational digital
therapeutic designed to target attention and cognitive control delivered through a video game-like interface via at-home
play for 25 min per day, 5 days per week for 4 weeks. This study aimed to assess whether AKL-T01 improved attentional
performance in paediatric patients with ADHD.
Methods The Software Treatment for Actively Reducing Severity of ADHD (STARS-ADHD) was a randomised, double-blind, parallel-group, controlled trial of paediatric patients (aged 8–12 years, without disorder-related medications) with
confirmed ADHD and Test of Variables of Attention (TOVA) Attention Performance Index (API) scores of −1·8 and
below done by 20 research institutions in the USA. Patients were randomly assigned 1:1 to AKL-T01 or a digital control
intervention. The primary outcome was mean change in TOVA API from pre-intervention to post-intervention. Safety,
tolerability, and compliance were also assessed. Analyses were done in the intention-to-treat population. This trial is
registered with ClinicalTrials.gov, NCT02674633 and is completed.
Findings Between July 15, 2016, and Nov 30, 2017, 857 patients were evaluated and 348 were randomly assigned to
receive AKL-T01 or control. Among patients who received AKL-T01 (n=180 [52%]; mean [SD] age, 9·7 [1·3] years) or
control (n=168 [48%]; mean [SD] age, 9·6 [1·3] years), the non-parametric estimate of the population median change
from baseline TOVA API was 0·88 (95% CI 0·24–1·49; p=0·0060). The mean (SD) change from baseline on the
TOVA API was 0·93 (3·15) in the AKL-T01 group and 0·03 (3·16) in the control group. There were no serious adverse
events or discontinuations. Treatment-related adverse events were mild and included frustration (5 [3%] of 180)
and headache (3 [2%] of 180). Patient compliance was a mean of 83 (83%) of 100 expected sessions played
(SD, 29·2 sessions).
Interpretation Although future research is needed for this digital intervention, this study provides evidence that
AKL-T01 might be used to improve objectively measured inattention in paediatric patients with ADHD, while
presenting minimal adverse events.
Funding Sponsored by Akili Interactive Labs.
Copyright © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
Introduction
Attention-deficit hyperactivity disorder (ADHD) is a
neurodevelopmental disorder of persistent impaired
attention, hyperactivity, and impulsivity that negatively
affects daily functioning and quality of life. ADHD is
one of the most commonly diagnosed paediatric mental
health disorders, with a prevalence estimated to be 5%
worldwide,1
and exerts a substantial burden on families
and society.2
Front-line intervention for ADHD includes pharmacological and non-pharmacological interventions, which
have shown short-term efficacy.3–5
Existing treatments
have side-effects that limit their acceptability,6
are only
effective when administered, and may not be as effective
for reducing daily impairments versus ADHD symptoms.7
Pharmacotherapy may not be suitable for some patients
due to caregiver preferences or concerns about abuse,
misuse, and diversion. Barriers to access also limit the
use of behavioural interventions, given a shortage of
properly trained paediatric mental health specialists8
and
variability in insurance coverage for such services.9,10
Indeed, studies in both the USA and the UK have found
that most children with paediatric mental health needs do
not have proper access to services.11,12
Digital therapeutics for ADHD may address these
limitations with improved access, minimal side-effects,
and low potential for abuse. Numerous studies and
meta-analyses on digital interventions targeting specific
cognitive functions have attempted to assess the
magnitude of efficacy for children and adolescents with
ADHD. In general, the quality of the studies is low, and
many do not include a control group.3
Reported effect
Figure 2: Primary endpoint: TOVA API mean (SE) change pre-intervention to post-intervention in the intention-to-treat population. *Adjusted p<0·050; prespecified Wilcoxon rank-sum test. Triangle represents median change, pre-intervention to post-intervention. AKL-T01 (n=169), active control (n=160).
Measure | AKL-T01 | Control | χ² test | p
Test of Variables of Attention—Attention Performance Index (type A: improvement ≥1·4 points) | 79/169 (47%) | 51/160 (32%) | 7·60 | 0·0058
Attention Performance Index (type B: post-intervention score ≥0) | 18/170 (11%) | 7/160 (4%) | 4·54 | 0·033
ADHD-Rating Scale (improvement ≥2 points from pre-intervention to post-intervention) | 128/173 (74%) | 119/164 (73%) | 0·088 | 0·77
ADHD-Rating Scale (≥30% reduction)* | 42/173 (24%) | 31/164 (19%) | 1·43 | 0·23
Impairment Rating Scale | 82/171 (48%) | 60/161 (37%) | 3·87 | 0·049
Clinical Global Impressions (≤2 at post-intervention) | 29/175 (17%) | 26/164 (16%) | 0·032 | 0·86
Clinical Global Impressions (1 at post-intervention) | 1/175 (1%) | 1/164 (1%) | 0·0021 | 0·96
Data are n/N (%) unless otherwise indicated. ADHD=attention-deficit hyperactivity disorder. AKL-T01=an investigational digital therapeutic. *Post-hoc analysis.
Table 2: Clinical responder analysis, intention-to-treat population
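The responder rates in Table 2 can be checked directly from the reported counts. A small sketch with scipy: a Pearson χ² without Yates continuity correction reproduces the statistic reported for the type-A TOVA API responders.

```python
from scipy.stats import chi2_contingency

# Type-A TOVA API responders from Table 2: 79/169 (AKL-T01) vs 51/160 (control)
table = [[79, 169 - 79],   # AKL-T01: responders, non-responders
         [51, 160 - 51]]   # control

# Pearson chi-square without continuity correction matches the reported
# chi2 = 7.60, p = 0.0058
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), round(p, 4))  # → 7.6 0.0058
```

The same two-line check works for every row of the table, since each cell is reported as n/N.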
This trial is registered with ClinicalTrials.gov, NCT02674633.
Role of the funding source
The funder had a role in study conception and design,
confirming data and statistical analyses, and conducting
the study. All authors had full access to all the data in the
study and were involved in data interpretation and
writing of the report. The corresponding author had final
responsibility for the decision to submit for publication.
Results
Of 857 children screened for eligibility, 348 patients were
randomly assigned to receive AKL-T01 (n=180) or control
(n=168) between July 15, 2016, and Nov 30, 2017 (figure 1
and appendix p 3). Demographic and clinical character-
istics at baseline are shown in table 1.
The mean number of sessions completed by patients
in the AKL-T01 group was 83·2 out of 100 sessions
(83% instructed use; SD=29·2 sessions). Patients in the
control group used their intervention 480·7 min of
500 min (96% instructed use).
There was a significant difference between intervention
groups on the primary efficacy endpoint (adjusted
p=0·0060); non-parametric estimate of the population
median change (Hodges-Lehmann estimate) was 0·88
(95% CI 0·24–1·49). The mean (SD) change from baseline
on the TOVA API was 0·93 (3·15) in the AKL-T01 group
and 0·03 (3·16) in the control group (figure 2). There were
no intervention-group differences for secondary measures:
IRS, ADHD-RS, ADHD-RS-I, ADHD-RS-H, BRIEF-
Parent Inhibit and Working Memory and Metacognition
Table 1: Baseline characteristics (excerpt; AKL-T01 vs control)
ADHD-Rating Scale—Inattentive: 21·9 (3·5) vs 21·6 (3·7)
ADHD-Rating Scale—Hyperactivity: 17·1 (6·0) vs 16·7 (5·4)
Clinical Global Impressions—Severity†: 4·5 (0·7) vs 4·6 (0·6)
Data are mean (SD). AKL-T01=an investigational digital therapeutic. *n=179 for AKL-T01. †Assessed only at baseline.
•For the primary outcome, the TOVA API, there was a significant improvement over the control group

•For the secondary outcomes, no significant improvement was seen
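The trial's prespecified primary analysis (a Wilcoxon rank-sum test with a Hodges-Lehmann estimate of the median between-group difference) can be sketched as follows. The per-patient raw data are not public, so this uses simulated changes drawn to match the reported group means and SDs; the Hodges-Lehmann estimate is simply the median of all pairwise between-group differences.

```python
import numpy as np
from scipy.stats import mannwhitneyu  # equivalent to the Wilcoxon rank-sum test

rng = np.random.default_rng(0)
# Simulated pre-to-post TOVA API changes (NOT the trial's raw data), matching
# the reported mean (SD): 0.93 (3.15) for AKL-T01 (n=180), 0.03 (3.16) for
# control (n=168)
akl = rng.normal(0.93, 3.15, 180)
ctl = rng.normal(0.03, 3.16, 168)

stat, p = mannwhitneyu(akl, ctl, alternative="two-sided")

# Hodges-Lehmann estimate of the population median difference:
# the median over all 180 * 168 pairwise differences
hl = np.median(akl[:, None] - ctl[None, :])
print(f"HL estimate={hl:.2f}, p={p:.4f}")
```

On simulated data the estimate will scatter around the true location shift; the trial itself reported a Hodges-Lehmann estimate of 0·88 (95% CI 0·24–1·49).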
Noom: diabetes prevention
Virtual Iraq: PTSD in combat veterans
AppliedVR: VR as an analgesic
• Woebot, a mental-health counseling chatbot startup

• A chatbot for treating depression (via cognitive behavioral therapy), started by Stanford mental health experts

• Professor Andrew Ng serves as chairman of the board
• Woebot, a mental-health counseling chatbot

• As real counselors do, it explains things conversationally and checks the user's mental health status

• Mostly no different from a questionnaire (picking one of a fixed set of answers), but it can be seen as a UI innovation

• It does not yet use very sophisticated NLP (roughly once per session)
Original Paper
Delivering Cognitive Behavior Therapy to Young Adults With
Symptoms of Depression and Anxiety Using a Fully Automated
Conversational Agent (Woebot): A Randomized Controlled Trial
Kathleen Kara Fitzpatrick1*, PhD; Alison Darcy2*, PhD; Molly Vierhile1, BA
1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
2 Woebot Labs Inc., San Francisco, CA, United States
* these authors contributed equally
Corresponding Author:
Alison Darcy, PhD
Woebot Labs Inc.
55 Fair Avenue
San Francisco, CA, 94110
United States
Email: alison@woebot.io
Abstract
Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by
poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated
conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and
depression.
Methods: In an unblinded trial, 70 individuals age 18-28 years were recruited online from a university community social media
site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a
conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental
Health ebook, “Depression in College Students,” as an information-only control group (n=36). All participants completed
Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7),
and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and
Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23)
times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants
provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on
depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as
measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers,
participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54= 9.24; P=.004). Participants’ comments
suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional
therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT.
(JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785
KEYWORDS
conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health
Introduction
Up to 74% of mental health diagnoses have their first onset
particularly common among college students, with more than
half reporting symptoms of anxiety and depression in the
previous year that were so severe they had difficulty functioning
depression at baseline as measured by the PHQ-9, while
three-quarters (74%, 52/70) were in the severe range for anxiety
as measured by the GAD-7.
Figure 1. Participant recruitment flow.
Table 1. Demographic and clinical variables of participants at baseline.

                          Woebot          Information control
Scale, mean (SD)
  Depression (PHQ-9)      14.30 (6.65)    13.25 (5.17)
  Anxiety (GAD-7)         18.05 (5.89)    19.02 (4.27)
  Positive affect         25.54 (9.58)    26.19 (8.37)
  Negative affect         24.87 (8.13)    28.74 (8.92)
Age, mean (SD)            22.58 (2.38)    21.83 (2.24)
Gender, n (%)
  Male                    7 (21)          4 (7)
  Female                  27 (79)         20 (55)
Ethnicity, n (%)
  Latino/Hispanic         2 (6)           2 (8)
  Non-Latino/Hispanic     32 (94)         22 (92)
  Caucasian               28 (82)         18 (75)
Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
•A self-help chatbot for college students who consider themselves to have symptoms of anxiety and depression

•Objective: to assess the chatbot's feasibility, acceptability, and preliminary efficacy

•70 college students in total, over a 2-week period

•Intervention group (Woebot): 34

•Control group (information-only): 36

•Outcomes: PHQ-9, GAD-7
Measure | Woebot, T2a (95% CIb) | Information-only control, T2a (95% CIb) | F | P | d c
PHQ-9 | 11.14 (0.71) (9.74-12.32) | 13.67 (.81) (12.07-15.27) | 6.03 | .017 | 0.44
GAD-7 | 17.35 (0.60) (16.16-18.13) | 16.84 (.67) (15.52-18.56) | 0.38 | .581 | 0.14
PANAS positive affect | 26.88 (1.29) (24.35-29.41) | 26.02 (1.45) (23.17-28.86) | 0.17 | .707 | 0.02
PANAS negative affect | 25.98 (1.24) (23.54-28.42) | 27.53 (1.42) (24.73-30.32) | … | … | 0.34
a Baseline=pooled mean (standard error).
b 95% confidence interval.
c Cohen d shown for between-subjects effects using means and standard errors at Time 2.
Figure 2. Change in mean depression (PHQ-9) score by group over the study period. Error bars represent standard error.
Preliminary Efficacy
Table 2 shows the results of the primary ITT analyses conducted
on the entire sample. Univariate ANCOVA revealed a significant
treatment effect on depression revealing that those in the Woebot
group significantly reduced PHQ-9 score while those in the
information control group did not (F1,48=6.03; P=.017) (see
Figure 2). This represented a moderate between-groups effect
size (d=0.44). This effect is robust after Bonferroni correction
for multiple comparisons (P=.04). No other significant
between-group differences were observed on anxiety or affect.
Completer Analysis
As a secondary analysis, to explore whether any main effects
existed, 2x2 repeated measures ANOVAs were conducted on
the primary outcome variables (with the exception of PHQ-9)
among completers only. A significant main effect was observed
on GAD-7 (F1,54=9.24; P=.004) suggesting that completers
experienced a significant reduction in symptoms of anxiety
between baseline and T2, regardless of the group to which they
were assigned with a within-subjects effect size of d=0.37. No
main effects were observed for positive (F1,50=.001; P=.951;
d=0.21) or negative affect (F1,50=.06; P=.80; d=0.003) as
measured by the PANAS.
To further elucidate the source and magnitude of change in
depression, repeated measures dependent t tests were conducted
and Cohen d effect sizes were calculated on individual items of
the PHQ-9 among those in the Woebot condition. The analysis
revealed that baseline-T2 changes were observed on the
following items in order of decreasing magnitude: motoric
symptoms (d=2.09), appetite (d=0.65), little interest or pleasure
in things (d=0.44), feeling bad about self (d=0.40), and
concentration (d=0.39), and suicidal thoughts (d=0.30), feeling
down (d=0.14), sleep (d=0.12), and energy (d=0.06).
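The between-groups Cohen d values above are, per the table footnote, computed from group means and standard errors at T2. A minimal sketch of that arithmetic, recovering each SD as SE × √n; the per-group completer counts are not broken out in this excerpt, so the n values in the usage line are assumptions, not the paper's.

```python
import math

def cohens_d_from_se(m1: float, se1: float, n1: int,
                     m2: float, se2: float, n2: int) -> float:
    """Between-groups Cohen's d from group means and standard errors.

    Each group's SD is recovered as SE * sqrt(n), then pooled with the
    usual (n - 1)-weighted formula.
    """
    sd1, sd2 = se1 * math.sqrt(n1), se2 * math.sqrt(n2)
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                       / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# PHQ-9 T2 means (SE) from Table 2; the group sizes here are hypothetical
d = cohens_d_from_se(13.67, 0.81, 25, 11.14, 0.71, 33)
```

With these assumed n the sketch yields a moderate positive d; reproducing the paper's exact d=0.44 would require the actual per-group completer counts.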
Change in mean depression (PHQ-9) score by group over the study period
•Results

•The chatbot was used an average of 12.14 times over the 2 weeks

•For depression, there was a significant group difference

•The Woebot group showed a significant reduction in depression (PHQ-9)

•No significant reduction in the control group

•For anxiety, both groups showed a significant reduction (by GAD-7)
Digital Therapeutics and Digital Medicine Summit | February 2018
After the endpoint:
how digital medicine can transcend
traditional research standards
Peter Hames, CEO & Co-Founder
Our starting point is
sleep - a destigmatized
“way in” to broader
mental health
Reference: Luik, A. et al. (2017), Behavioural and Cognitive Psychotherapy.
Our first product
is Sleepio
• A fully automated Cognitive Behavioral
Therapy (CBT) program for insomnia
• Accessible via app and web, it is an
effective digital medicine for insomnia
• Helps alleviate co-morbid anxiety and
depression
For more info see bighealth.com/our-solution
• Cognitive Behavioral Therapy (CBT) for insomnia

• Cognitive therapy: correcting mistaken beliefs about sleep (education, stimulus control, cognitive restructuring …)

• Behavioral therapy: correcting behaviors and habits that interfere with sleep (sleep hygiene, relaxation/breathing training …)

• Sleepio: an app implementation of what was previously delivered face-to-face (a course of at least 6 weeks)
• The clinical evidence for its insomnia-improvement effect is quite solid

• 30 peer-reviewed journal articles, 8 RCTs

• Operating without going through the FDA clearance process (by not making treatment claims)
Nature: "About as effective as CBT delivered in person"
The Lancet: "A proven intervention for sleep disorders"
Reference: Espie et al. (2012), SLEEP (average change in CBT group: SOL 47 → 21 mins; WASO 76 → 28 mins); Lancee et al. (2016), SLEEP.
[Chart: Validation of Sleepio's effectiveness in improving sleep quality; % of insomnia sufferers achieving healthy sleep: Sleepio 76%, placebo 29%, treatment as usual 18%, in-person CBT-I 70-75%]
The three steps of digital healthcare
•Step 1. Measuring the data

•Step 2. Integrating the data

•Step 3. Analyzing the data
•Analyzing complex medical data and deriving insights

•Analyzing/reading medical imaging and pathology data

•Monitoring continuous data for prevention and prediction
The three types of medical artificial intelligence
Digital Healthcare in Drug Development
Target Discovery & Analysis → Lead Discovery & Analysis → Clinical Trial → Post-Market Surveillance
•Personal genome analysis

•Blockchain-based genome analysis
•Deep learning-based drug candidates

•AI + pharmaceutical companies
•Patient recruitment

•Data measurement: wearables

•Digital phenotyping

•Medication adherence
•Social media-based PMS

•Blockchain-based PMS
+
Digital Therapeutics
Feedback/Questions
• Email: yoonsup.choi@gmail.com
• Blog: http://www.yoonsupchoi.com
• Facebook: Yoon Sup Choi
• Youtube: 최윤섭의 디지털 헬스케어

  • 7.
  • 8.
  • 9.
    2010 2011 20122013 2014 2015 2016 2017 2018 Q1 Q2 Q3 Q4 153 283 476 647 608 568 684 851 765 FUNDING SNAPSHOT: YEAR OVER YEAR 5 Deal Count $1.4B $1.7B $1.7B $627M $603M$459M $8.2B $6.2B $7.1B $2.9B $2.3B$2.0B $1.2B $11.7B $2.3B Funding surpassed 2017 numbers by almost $3B, making 2018 the fourth consecutive increase in capital investment and largest since we began tracking digital health funding in 2010. Deal volume decreased from Q3 to Q4, but deal sizes spiked, with $3B invested in Q4 alone. Average deal size in 2018 was $21M, a $6M increase from 2017. $3.0B $14.6B DEALS & FUNDING INVESTORS SEGMENT DETAIL Source: StartUp Health Insights | startuphealth.com/insights Note: Report based on public data through 12/31/18 on seed (incl. accelerator), venture, corporate venture, and private equity funding only. © 2019 StartUp Health LLC •글로벌 투자 추이를 보더라도, 2018년 역대 최대 규모: $14.6B •2015년 이후 4년 연속 증가 중 https://hq.startuphealth.com/posts/startup-healths-2018-insights-funding-report-a-record-year-for-digital-health
  • 10.
    27 Switzerland EUROPE $3.2B $1.96B $1B $3.5B NORTH AMERICA $12BValuation $1.8B $3.1B$3.2B $1B $1B 38 healthcare unicorns valued at $90.7B Global VC-backed digital health companies with a private market valuation of $1B+ (7/26/19) UNITED KINGDOM $1.5B MIDDLE EAST $1B Valuation ISRAEL $7B $1B$1.2B $1B $1.65B $1.8B $1.25B $2.8B $1B $1B $2B Valuation $1.5B UNITED STATES GERMANY $1.7B $2.5B CHINA ASIA $3B $5.5B Valuation $5B $2.4B $2.4B France $1.1B $3.5B $1.6B $1B $1B $1B $1B CB Insights, Global Healthcare Reports 2019 2Q •전 세계적으로 38개의 디지털 헬스케어 유니콘 스타트업 (=기업가치 $1B 이상) 이 있으나, •국내에는 하나도 없음
  • 11.
    헬스케어 넓은 의미의 건강관리에는 해당되지만, 디지털 기술이 적용되지 않고, 전문 의료 영역도 아닌 것 예) 운동, 영양, 수면 디지털 헬스케어 건강 관리 중에 디지털 기술이 사용되는 것 예) 사물인터넷, 인공지능, 3D 프린터, VR/AR 모바일 헬스케어 디지털 헬스케어 중 모바일 기술이 사용되는 것 예) 스마트폰, 사물인터넷, SNS 개인 유전정보분석 암유전체, 질병위험도, 보인자, 약물 민감도 웰니스, 조상 분석 의료 질병 예방, 치료, 처방, 관리 등 전문 의료 영역 원격의료 원격 환자 모니터링 원격진료 전화, 화상, 판독 명상 앱 ADHD 치료 게임 PTSD 치료 VR 디지털 치료제 중독 치료 앱 헬스케어 관련 분야 구성도
  • 12.
    EDITORIAL OPEN Digital medicine,on its way to being just plain medicine npj Digital Medicine (2018)1:20175 ; doi:10.1038/ s41746-017-0005-1 There are already nearly 30,000 peer-reviewed English-language scientific journals, producing an estimated 2.5 million articles a year.1 So why another, and why one focused specifically on digital medicine? To answer that question, we need to begin by defining what “digital medicine” means: using digital tools to upgrade the practice of medicine to one that is high-definition and far more individualized. It encompasses our ability to digitize human beings using biosensors that track our complex physiologic systems, but also the means to process the vast data generated via algorithms, cloud computing, and artificial intelligence. It has the potential to democratize medicine, with smartphones as the hub, enabling each individual to generate their own real world data and being far more engaged with their health. Add to this new imaging tools, mobile device laboratory capabilities, end-to-end digital clinical trials, telemedicine, and one can see there is a remarkable array of transformative technology which lays the groundwork for a new form of healthcare. As is obvious by its definition, the far-reaching scope of digital medicine straddles many and widely varied expertise. Computer scientists, healthcare providers, engineers, behavioral scientists, ethicists, clinical researchers, and epidemiologists are just some of the backgrounds necessary to move the field forward. But to truly accelerate the development of digital medicine solutions in health requires the collaborative and thoughtful interaction between individuals from several, if not most of these specialties. That is the primary goal of npj Digital Medicine: to serve as a cross-cutting resource for everyone interested in this area, fostering collabora- tions and accelerating its advancement. Current systems of healthcare face multiple insurmountable challenges. 
Patients are not receiving the kind of care they want and need, caregivers are dissatisfied with their role, and in most countries, especially the United States, the cost of care is unsustainable. We are confident that the development of new systems of care that take full advantage of the many capabilities that digital innovations bring can address all of these major issues. Researchers, too, can take advantage of these leading-edge technologies, as they enable clinical research to break free of the confines of the academic medical center and be brought into the real world of participants’ lives. The continuous capture of multiple interconnected streams of data will allow for a much deeper refinement of our understanding and definition of most phenotypes, with the discovery of novel signals in these enormous data sets made possible only through the use of machine learning.

Our enthusiasm for the future of digital medicine is tempered by the recognition that presently too much of the publicized work in this field is characterized by irrational exuberance and excessive hype. Many technologies have yet to be formally studied in a clinical setting, and for those that have, too many began and ended with an under-powered pilot program. In addition, there are more than a few examples of digital “snake oil” with substantial uptake prior to their eventual discrediting.2 Both of these practices are barriers to advancing the field of digital medicine. Our vision for npj Digital Medicine is to provide a reliable, evidence-based forum for all clinicians, researchers, and even patients curious about how digital technologies can transform every aspect of health management and care.
Being open source, as all medical research should be, allows for the broadest possible dissemination, which we will strongly encourage, including through advocating for the publication of preprints.

And finally, quite paradoxically, we hope that npj Digital Medicine is so successful that in the coming years there will no longer be a need for this journal, or any journal specifically focused on digital medicine. Because if we are able to meet our primary goal of accelerating the advancement of digital medicine, then soon we will just be calling it medicine. And there are already several excellent journals for that.

Steven R. Steinhubl and Eric J. Topol, Scripps Translational Science Institute, La Jolla, CA, USA

REFERENCES
1. Ware, M. & Mabe, M. The STM Report: An Overview of Scientific and Scholarly Journal Publishing (2015).
2. Plante, T. B., Urrea, B., MacFarlane, Z. T. et al. Validation of the Instant Blood Pressure smartphone app. JAMA Intern. Med. 176, 700–702 (2016).
The future of digital medicine? Becoming just plain medicine.
  • 13.
What is the most important factor in digital medicine?
  • 14.
“Data! Data! Data!” he cried. “I can’t make bricks without clay!” - Sherlock Holmes, “The Adventure of the Copper Beeches”
  • 16.
New kinds of data are being measured, stored, integrated, and analyzed in new ways by new actors. Kinds of data; qualitative and quantitative aspects of data; wearable devices; smartphones; genome analysis; artificial intelligence; social media; users/patients; the general public
  • 17.
The three steps of digital healthcare: • Step 1. Measuring data • Step 2. Integrating data • Step 3. Analyzing data
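The three steps (measure, integrate, analyze) can be illustrated with a minimal sketch. All data values and function names below are hypothetical, invented only for illustration:

```python
# Minimal sketch of the three steps: measure -> integrate -> analyze.
# The data streams and the tachycardia threshold are invented examples.

def measure():
    # Step 1: data measured by a new actor, e.g. a wearable heart-rate sensor
    return {"heart_rate": [62, 64, 90, 130, 128], "source": "wearable"}

def integrate(*streams):
    # Step 2: merge streams from different devices into one patient record
    record = {}
    for stream in streams:
        record.update(stream)
    return record

def analyze(record, threshold=100):
    # Step 3: analyze the integrated record (flag samples above a threshold)
    return [bpm for bpm in record["heart_rate"] if bpm > threshold]

record = integrate(measure(), {"steps": 4210, "source2": "smartphone"})
alerts = analyze(record)
print(alerts)  # → [130, 128]
```

Real systems differ at every step, but the division of labor is the same: sensors produce raw streams, an integration layer fuses them, and an analysis layer turns them into clinically meaningful signals.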
  • 19.
LETTER https://doi.org/10.1038/s41586-019-1390-1 A clinically applicable approach to continuous prediction of future acute kidney injury. Nenad Tomašev, Xavier Glorot, Jack W. Rae, Michal Zielinski, Harry Askham, Andre Saraiva, Anne Mottram, Clemens Meyer, Suman Ravuri, Ivan Protsyuk, Alistair Connell, Cían O. Hughes, Alan Karthikesalingam, Julien Cornebise, Hugh Montgomery, Geraint Rees, Chris Laing, Clifton R. Baker, Kelly Peterson, Ruth Reeves, Demis Hassabis, Dominic King, Mustafa Suleyman, Trevor Back, Christopher Nielson, Joseph R. Ledsam & Shakir Mohamed

The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients1. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records2–17 and using acute kidney injury—a common and potentially life-threatening condition18—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert.
In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests9. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.

Adverse events and clinical complications are a major cause of mortality and poor outcomes in patients, and substantial effort has been made to improve their recognition18,19. Few predictors have found their way into routine clinical practice, because they either lack effective sensitivity and specificity or report damage that already exists20. One example relates to acute kidney injury (AKI), a potentially life-threatening condition that affects approximately one in five inpatient admissions in the United States21. Although a substantial proportion of cases of AKI are thought to be preventable with early treatment22, current algorithms for detecting AKI depend on changes in serum creatinine as a marker of acute decline in renal function. Increases in serum creatinine lag behind renal injury by a considerable period, which results in delayed access to treatment. This supports a case for preventative ‘screening’-type alerts, but there is no evidence that current rule-based alerts improve outcomes23. For predictive alerts to be effective, they must empower clinicians to act before a major clinical decline has occurred by: (i) delivering actionable insights on preventable conditions; (ii) being personalized for specific patients; (iii) offering sufficient contextual information to inform clinical decision-making; and (iv) being generally applicable across populations of patients24.
Promising recent work on modelling adverse events from electronic health records2–17 suggests that the incorporation of machine learning may enable the early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically applicable level of predictive performance25 or have focused on predictions across a short time horizon that leaves little time for clinical assessment and intervention26.

Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. At each time point, the model outputs a probability of AKI occurring at any stage of severity within the next 48 h (although our approach can be extended to other time windows or severities of AKI; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. This model was trained using data that were curated from a multi-site retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs—the largest integrated healthcare system in the United States. The dataset consisted of information that was available from hospital electronic health records in digital format. The total number of independent entries in the dataset was approximately 6 billion, including 620,000 features. Patients were randomized across training (80%), validation (5%), calibration (5%) and test (10%) sets. A ground-truth label for the presence of AKI at any given point in time was added using the internationally accepted ‘Kidney Disease: Improving Global Outcomes’ (KDIGO) criteria18; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are provided in the Methods and Extended Data Figs. 1–3.

Figure 1 shows the use of our model.
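The core mechanics described here — a recurrent model that carries an internal memory across time steps, emits a risk probability at each step, and raises an alert when that probability crosses an operating-point threshold — can be caricatured in a few lines. The weights, the single input feature, and the threshold below are all invented; this is a toy illustration of the mechanism, not DeepMind's architecture:

```python
import math

def toy_recurrent_risk(timesteps, w_in=0.8, w_rec=0.5, threshold=0.6):
    """Toy recurrent risk model: update an internal memory over time-ordered
    inputs and emit P(event within the next window) at every step.
    Weights and threshold are arbitrary illustrative values."""
    h = 0.0                                   # internal memory (hidden state)
    probs, alerts = [], []
    for x in timesteps:                       # x: one normalized feature per step
        h = math.tanh(w_in * x + w_rec * h)   # fold new input into the memory
        p = 1 / (1 + math.exp(-3 * h))        # map memory to a probability
        probs.append(p)
        alerts.append(p > threshold)          # operating-point threshold
    return probs, alerts

# A rising creatinine-like signal: the risk estimate climbs step by step
# and eventually crosses the alerting threshold.
probs, alerts = toy_recurrent_risk([0.0, 0.1, 0.4, 0.9, 1.2])
```

Moving the threshold trades sensitivity against false alerts, which is exactly the operating-point choice the paper describes.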
At every point throughout an admission, the model provides updated estimates of future AKI risk along with an associated degree of uncertainty. Providing the uncertainty associated with a prediction may help clinicians to distinguish ambiguous cases from those predictions that are fully supported by the available data. Identifying an increased risk of future AKI sufficiently far in advance is critical, as longer lead times may enable preventative action to be taken. This is possible even when clinicians may not be actively intervening with, or monitoring, a patient. Supplementary Information section A provides more examples of the use of the model.

With our approach, 55.8% of inpatient AKI events of any severity were predicted early, within a window of up to 48 h in advance and with a ratio of 2 false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve of 92.1%, and an area under the precision–recall curve of 29.7%. When set at this threshold, our predictive model would—if operationalized—trigger a
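A ratio of 2 false alerts for every true alert is another way of saying the positive predictive value is 1/3. A quick check with invented counts:

```python
# If every true alert is accompanied by 2 false alerts, then
# PPV = TP / (TP + FP) = 1 / (1 + 2) = 1/3.
true_alerts = 100          # hypothetical count of true alerts
false_per_true = 2         # the ratio reported in the paper
ppv = true_alerts / (true_alerts + false_per_true * true_alerts)
print(round(ppv, 3))  # → 0.333
```

This is why a high AUROC (92.1%) can coexist with a modest area under the precision–recall curve (29.7%): with a low event prevalence, even a good ranking model pays for sensitivity with many false positives.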
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD; Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB; Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD

IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation.

OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs.

DESIGN AND SETTING A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128,175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.

EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity.

RESULTS The EyePACS-1 data set consisted of 9,963 images from 4,997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8,878 fully gradable images [7.8%]); the Messidor-2 data set had 1,748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1,745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4%, and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.

CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment. JAMA. doi:10.1001/jama.2016.17216. Published online November 29, 2016.
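The two operating points reported here are just two thresholds on the same score, each yielding its own sensitivity/specificity pair from a confusion matrix. A minimal helper (the counts below are invented, not the study's data):

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts at a high-specificity operating point:
# 100 diseased eyes (90 caught), 1,000 healthy eyes (19 false positives).
sens, spec = sens_spec(tp=90, fn=10, tn=981, fp=19)
print(sens, spec)  # → 0.9 0.981
```

Raising the threshold moves counts from FP to TN (specificity up) but also from TP to FN (sensitivity down), which is the trade-off the two published operating points make explicit.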
Ophthalmology

LETTERS https://doi.org/10.1038/s41591-018-0335-9
Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.

Medical information has become increasingly complex over time. The range of disease entities, diagnostic testing and biomarkers, and treatment modalities has increased exponentially in recent years. Subsequently, clinical decision-making has also become more complex and demands the synthesis of decisions from assessment of large volumes of data representing clinical information.
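The paper goes on to liken the physician to a classifier that acquires one informative feature at a time and stops once a diagnosis is probable enough. That loop can be sketched as a toy Bayesian update; the diseases, features, priors, and likelihoods below are all invented for illustration and have no clinical meaning:

```python
def sequential_diagnosis(evidence, priors, likelihoods, stop_at=0.9):
    """Update disease probabilities one feature at a time (naive-Bayes style)
    and stop as soon as one diagnosis exceeds the acceptance threshold."""
    post = dict(priors)
    for feature in evidence:
        for d in post:                         # fold in the new feature
            post[d] *= likelihoods[d].get(feature, 0.01)
        total = sum(post.values())
        post = {d: p / total for d, p in post.items()}   # renormalize
        if max(post.values()) >= stop_at:
            break                              # enough certainty: stop asking
    best = max(post, key=post.get)
    return best, post[best]

# Invented toy example: two diagnoses, two observed features.
priors = {"flu": 0.5, "strep": 0.5}
likelihoods = {
    "flu":   {"fever": 0.9, "cough": 0.8,  "tonsillar_exudate": 0.05},
    "strep": {"fever": 0.8, "cough": 0.05, "tonsillar_exudate": 0.7},
}
dx, p = sequential_diagnosis(["fever", "cough"], priors, likelihoods)
```

After "fever" neither diagnosis is acceptable, but "cough" is informative enough to push one past the threshold — mirroring the idea that a few well-chosen features can suffice without processing the entire feature set.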
In the current digital age, the electronic health record (EHR) represents a massive repository of electronic data points representing a diverse array of clinical information1–3. Artificial intelligence (AI) methods have emerged as potentially powerful tools to mine EHR data to aid in disease diagnosis and management, mimicking and perhaps even augmenting the clinical decision-making of human physicians1.

To formulate a diagnosis for any given patient, physicians frequently use hypothetico-deductive reasoning. Starting with the chief complaint, the physician then asks appropriately targeted questions relating to that complaint. From this initial small feature set, the physician forms a differential diagnosis and decides what features (historical questions, physical exam findings, laboratory testing, and/or imaging studies) to obtain next in order to rule in or rule out the diagnoses in the differential diagnosis set. The most useful features are identified, such that when the probability of one of the diagnoses reaches a predetermined level of acceptability, the process is stopped, and the diagnosis is accepted. It may be possible to achieve an acceptable level of certainty of the diagnosis with only a few features without having to process the entire feature set. Therefore, the physician can be considered a classifier of sorts.

In this study, we designed an AI-based system using machine learning to extract clinically relevant features from EHR notes to mimic the clinical reasoning of human physicians. In medicine, machine learning methods have already demonstrated strong performance in image-based diagnoses, notably in radiology2, dermatology4, and ophthalmology5–8, but analysis of EHR data presents a number of difficult challenges. These challenges include the vast quantity of data, high dimensionality, data sparsity, and deviations

Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Huiying Liang, Brian Y.
Tsui, Hao Ni, Carolina C. S. Valentim, Sally L. Baxter, Guangjian Liu et al.

Pediatrics

ARTICLES https://doi.org/10.1038/s41591-018-0177-5
According to the American Cancer Society and the Cancer Statistics Center (see URLs), over 150,000 patients with lung cancer succumb to the disease each year (154,050 expected for 2018), while another 200,000 new cases are diagnosed on a yearly basis (234,030 expected for 2018). It is one of the most widely spread cancers in the world because of not only smoking, but also exposure to toxic chemicals like radon, asbestos and arsenic. LUAD and LUSC are the two most prevalent types of non–small cell lung cancer1, and each is associated with discrete treatment guidelines. In the absence of definitive histologic features, this important distinction can be challenging and time-consuming, and requires confirmatory immunohistochemical stains.

Classification of lung cancer type is a key diagnostic process because the available treatment options, including conventional chemotherapy and, more recently, targeted therapies, differ for LUAD and LUSC2. Also, a LUAD diagnosis will prompt the search for molecular biomarkers and sensitizing mutations and thus has a great impact on treatment options3,4. For example, epidermal growth factor receptor (EGFR) mutations, present in about 20% of LUAD, and anaplastic lymphoma receptor tyrosine kinase (ALK) rearrangements, present in <5% of LUAD5, currently have targeted therapies approved by the Food and Drug Administration (FDA)6,7. Mutations in other genes, such as KRAS and tumor protein P53 (TP53), are very common (about 25% and 50%, respectively) but have proven to be particularly challenging drug targets so far5,8. Lung biopsies are typically used to diagnose lung cancer type and stage.
Virtual microscopy of stained images of tissues is typically acquired at magnifications of 20× to 40×, generating very large two-dimensional images (10,000 to >100,000 pixels in each dimension) that are oftentimes challenging to visually inspect in an exhaustive manner. Furthermore, accurate interpretation can be difficult, and the distinction between LUAD and LUSC is not always clear, particularly in poorly differentiated tumors; in this case, ancillary studies are recommended for accurate classification9,10. To assist experts, automatic analysis of lung cancer whole-slide images has been recently studied to predict survival outcomes11 and classification12. For the latter, Yu et al.12 combined conventional thresholding and image processing techniques with machine-learning methods, such as random forest classifiers, support vector machines (SVM) or Naive Bayes classifiers, achieving an AUC of ~0.85 in distinguishing normal from tumor slides, and ~0.75 in distinguishing LUAD from LUSC slides. More recently, deep learning was used for the classification of breast, bladder and lung tumors, achieving an AUC of 0.83 in classification of lung tumor types on tumor slides from The Cancer Genome Atlas (TCGA)13. Analysis of plasma DNA values was also shown to be a good predictor of the presence of non–small cell cancer, with an AUC of ~0.94 (ref. 14) in distinguishing LUAD from LUSC, whereas the use of immunochemical markers yields an AUC of ~0.94115. Here, we demonstrate how the field can further benefit from deep learning by presenting a strategy based on convolutional neural networks (CNNs) that not only outperforms methods in previously

Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nicolas Coudray, Paolo Santiago Ocampo, Theodore Sakellaropoulos, Navneet Narula, Matija Snuderl, David Fenyö, Andre L.
Moreira, Narges Razavian and Aristotelis Tsirigos

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them—STK11, EGFR, FAT1, SETBP1, KRAS and TP53—can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.

Pathology

ARTICLES https://doi.org/10.1038/s41551-018-0301-3

Colonoscopy is the gold-standard screening test for colorectal cancer1–3, one of the leading causes of cancer death in both the United States4,5 and China6.
Colonoscopy can reduce the risk of death from colorectal cancer through the detection of tumours at an earlier, more treatable stage as well as through the removal of precancerous adenomas3,7. Conversely, failure to detect adenomas may lead to the development of interval cancer. Evidence has shown that each 1.0% increase in adenoma detection rate (ADR) leads to a 3.0% decrease in the risk of interval colorectal cancer8. Although more than 14 million colonoscopies are performed in the United States annually2, the adenoma miss rate (AMR) is estimated to be 6–27%9. Certain polyps may be missed more frequently, including smaller polyps10,11, flat polyps12 and polyps in the left colon13.

There are two independent reasons why a polyp may be missed during colonoscopy: (i) it was never in the visual field or (ii) it was in the visual field but not recognized. Several hardware innovations have sought to address the first problem by improving visualization of the colonic lumen, for instance by providing a larger, panoramic camera view, or by flattening colonic folds using a distal-cap attachment. The problem of unrecognized polyps within the visual field has been more difficult to address14. Several studies have shown that observation of the video monitor by either nurses or gastroenterology trainees may increase polyp detection by up to 30%15–17. Ideally, a real-time automatic polyp-detection system could serve as a similarly effective second observer that could draw the endoscopist’s eye, in real time, to concerning lesions, effectively creating an ‘extra set of eyes’ on all aspects of the video data with fidelity. Although automatic polyp detection in colonoscopy videos has been an active research topic for the past 20 years, performance levels close to that of the expert endoscopist18–20 have not been achieved.
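The cited dose–response estimate (each 1.0% absolute gain in ADR corresponds to roughly a 3.0% relative reduction in interval-cancer risk8) can be applied as simple arithmetic. The baseline risk and the size of the ADR improvement below are invented for illustration:

```python
def interval_cancer_risk(baseline_risk, adr_gain_pct, reduction_per_pct=0.03):
    """Apply the cited estimate: each 1.0% absolute ADR gain gives about a
    3.0% relative reduction in interval colorectal cancer risk, compounded."""
    return baseline_risk * (1 - reduction_per_pct) ** adr_gain_pct

# Hypothetical: a 5-point ADR improvement against a 0.1% baseline risk
risk = interval_cancer_risk(0.001, 5)
print(round(risk / 0.001, 4))  # → 0.8587, i.e. ~14% relative risk reduction
```

This is only a back-of-the-envelope reading of an epidemiological association, but it conveys why even modest detection-rate gains from an automated "second observer" could matter at screening scale.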
Early work in automatic polyp detection has focused on applying deep-learning techniques to polyp detection, but most published works are small in scale, with small development and/or training validation sets19,20. Here, we report the development and validation of a deep-learning algorithm, integrated with a multi-threaded processing system, for the automatic detection of polyps during colonoscopy. We validated the system in two image studies and two video studies. Each study contained two independent validation datasets.

Results
We developed a deep-learning algorithm using 5,545 colonoscopy images from colonoscopy reports of 1,290 patients that underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People’s Hospital between January 2007 and December 2015. Out of the 5,545 images used, 3,634 images contained polyps (65.54%) and 1,911 images did not contain polyps (34.46%). For algorithm training, experienced endoscopists annotated the presence of each polyp in all of the images in the development dataset. We validated the algorithm on four independent datasets. Datasets A and B were used for image analysis, and datasets C and D were used for video analysis.

Dataset A contained 27,113 colonoscopy images from colonoscopy reports of 1,138 consecutive patients who underwent a colonoscopy examination in the Endoscopy Center of Sichuan Provincial People’s Hospital between January and December 2016 and who were found to have at least one polyp. Out of the 27,113 images, 5,541 images contained polyps (20.44%) and 21,572 images did not contain polyps (79.56%). All polyps were confirmed histologically after biopsy. Dataset B is a public database (CVC-ClinicDB;

Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Pu Wang, Xiao Xiao, Jeremy R. Glissen Brown, Tyler M.
Berzin 3 , Mengtian Tu1 , Fei Xiong1 , Xiao Hu1 , Peixi Liu1 , Yan Song1 , Di Zhang1 , Xue Yang1 , Liangping Li1 , Jiong He2 , Xin Yi2 , Jingjia Liu2 and Xiaogang Liu 1 * The detection and removal of precancerous polyps via colonoscopy is the gold standard for the prevention of colon cancer. However, the detection rate of adenomatous polyps can vary significantly among endoscopists. Here, we show that a machine- learningalgorithmcandetectpolypsinclinicalcolonoscopies,inrealtimeandwithhighsensitivityandspecificity.Wedeveloped the deep-learning algorithm by using data from 1,290 patients, and validated it on newly collected 27,113 colonoscopy images from 1,138 patients with at least one detected polyp (per-image-sensitivity, 94.38%; per-image-specificity, 95.92%; area under the receiver operating characteristic curve, 0.984), on a public database of 612 polyp-containing images (per-image-sensitiv- ity, 88.24%), on 138 colonoscopy videos with histologically confirmed polyps (per-image-sensitivity of 91.64%; per-polyp-sen- sitivity, 100%), and on 54 unaltered full-range colonoscopy videos without polyps (per-image-specificity, 95.40%). By using a multi-threaded processing system, the algorithm can process at least 25 frames per second with a latency of 76.80±5.60ms in real-time video analysis. The software may aid endoscopists while performing colonoscopies, and help assess differences in polyp and adenoma detection performance among endoscopists. NATURE BIOMEDICA L ENGINEERING | VOL 2 | OCTOBER 2018 | 741–748 | www.nature.com/natbiomedeng 741 소화기내과 1Wang P, et al. Gut 2019;0:1–7. 
doi:10.1136/gutjnl-2018-317500 Endoscopy ORIGINAL ARTICLE
Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study
Pu Wang,1 Tyler M Berzin,2 Jeremy Romek Glissen Brown,2 Shishira Bharadwaj,2 Aymeric Becq,2 Xun Xiao,1 Peixi Liu,1 Liangping Li,1 Yan Song,1 Di Zhang,1 Yi Li,1 Guangre Xu,1 Mengtian Tu,1 Xiaogang Liu1
To cite: Wang P, Berzin TM, Glissen Brown JR, et al. Gut Epub ahead of print: [please include Day Month Year]. doi:10.1136/gutjnl-2018-317500
► Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/gutjnl-2018-317500).
1 Department of Gastroenterology, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
2 Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
Correspondence to Xiaogang Liu, Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China; Gary.samsph@gmail.com
Received 30 August 2018; Revised 4 February 2019; Accepted 13 February 2019
© Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
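ADR, the trial's primary outcome, is simply the fraction of patients in an arm with at least one adenoma detected. A sketch of how the arms compare, with patient counts back-calculated from the reported rates and therefore illustrative rather than source data:

```python
def adr(patients_with_adenoma, total_patients):
    """Adenoma detection rate: fraction of patients in an arm
    with at least one adenoma found."""
    return patients_with_adenoma / total_patients

# Counts back-calculated from the reported rates (29.1% of 522 AI-arm
# patients, 20.3% of 536 standard-arm patients); illustrative only.
ai_arm, std_arm = adr(152, 522), adr(109, 536)
relative_gain = (ai_arm - std_arm) / std_arm
print(f"ADR {ai_arm:.1%} vs {std_arm:.1%}; relative gain {relative_gain:.0%}")
```

The roughly 40–50% relative gain computed this way is the same effect the authors describe as an increase of ADR "by 50%, from 20% to 30%" after rounding.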
Results Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions In a low-prevalence ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221; Results.
INTRODUCTION
Colorectal cancer (CRC) is the second- and third-leading cause of cancer-related deaths in men and women, respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12 Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that
might correspond to adenomas in a more consistent and reliable way.
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised controlled trial examining automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.
How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.

Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,* Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,† Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.
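The efficiency result above is a comparison of mean per-image review times between the two reader modes. A toy sketch of that arithmetic, with illustrative times rather than the study's data:

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def review_speedup(unassisted_s, assisted_s):
    """Mean per-image review time without vs with algorithm assistance,
    plus the relative reduction (cf. the reported 116 s -> 61 s for
    micrometastases)."""
    u, a = mean(unassisted_s), mean(assisted_s)
    return u, a, (u - a) / u

# Toy per-image review times in seconds, illustrative only:
u, a, cut = review_speedup([120, 110, 118], [60, 64, 59])
print(f"{u:.0f}s -> {a:.0f}s ({cut:.0%} faster)")
```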
Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist's primary workflow. In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist's sensitivity for detecting individual cancer foci in digital images. However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain View, CA. D.F.S., R.M., and Y.L. are co-first authors (equal contribution). Work done as part of the Google Brain Healthcare Technology Fellowship (D.F.S. and P.T.). Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T., J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have Alphabet stock. Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare, 1600 Amphitheatre Way, Mountain View, CA 94043 (e-mail: davesteiner@google.com).
ORIGINAL ARTICLE | Am J Surg Pathol, Volume 00, Number 00, 2018 | www.ajsp.com
Pathology

SEPSIS
A targeted real-time early warning score (TREWScore) for septic shock
Katharine E. Henry,1 David N. Hager,2 Peter J. Pronovost,3,4,5 Suchi Saria1,3,5,6*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock.
We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed "TREWScore," a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8).
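Reporting "sensitivity 0.85 at specificity 0.67" corresponds to fixing an operating point on the score's ROC curve: choose the alarm threshold whose specificity meets the target, then read off the sensitivity. A self-contained sketch of that threshold scan, with toy data rather than TREWScore's features or weights:

```python
def sensitivity_at_specificity(scores, labels, target_spec):
    """Scan thresholds on a risk score and return (threshold,
    sensitivity, specificity) at the lowest threshold whose
    specificity meets the target -- the kind of operating point
    reported for an early warning score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores)):
        spec = sum(s < t for s in neg) / len(neg)   # negatives below alarm
        sens = sum(s >= t for s in pos) / len(pos)  # positives at/above alarm
        if spec >= target_spec:
            return (t, sens, spec)  # lowest threshold meeting the target
    return None

# Toy risk scores and outcomes, purely illustrative:
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0,   0,   0,   1,   0,   1,   1,   0,   1,   1]
print(sensitivity_at_specificity(scores, labels, target_spec=0.6))
```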
In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9). More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the best approach to managing patients at high risk of developing septic shock before the onset of severe sepsis or shock has not been studied. Methods that can identify ahead of time which patients will later experience septic shock are needed to further understand, study, and improve outcomes in this population. General-purpose illness severity scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE II), Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition. The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20).
A number of "early warning systems," "track and trigger" initiatives, "listening applications," and "sniffers" have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock. The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health data in a variety of applications (24), including discharge planning (25), risk stratification (26, 27), and identification of acute adverse events (28, 29). For septic shock in particular, promising work includes that of predicting septic shock using high-fidelity physiological signals collected directly from bedside monitors (30, 31), inferring relationships between predictors of septic shock using Bayesian networks (32), and using routine measurements for septic shock prediction (33–35). No current prediction models that use only data routinely stored in the EHR predict septic shock with high sensitivity and specificity many hours before onset. Moreover, when learning predictive risk scores, current methods (34, 36, 37) often have not accounted for the censoring effects of clinical interventions on patient outcomes (38). For instance, a patient with severe sepsis who received fluids and never developed septic shock would be treated as a negative case, despite the possibility that he or she might have developed septic shock in the absence of such treatment and therefore could be considered a positive case up until the
1 Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. 2 Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.
3 Armstrong Institute for Patient Safety and Quality, Johns Hopkins University, Baltimore, MD 21202, USA. 4 Department of Anesthesiology and Critical Care Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21202, USA. 5 Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA. 6 Department of Applied Math and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA. *Corresponding author. E-mail: ssaria@cs.jhu.edu
RESEARCH ARTICLE | www.ScienceTranslationalMedicine.org | 5 August 2015 | Vol 7 Issue 299 | 299ra122
Infectious Diseases

BRIEF COMMUNICATION OPEN
Digital biomarkers of cognitive function
Paul Dagum1
To identify digital biomarkers associated with cognitive function, we analyzed human–computer interaction from 7 days of smartphone use in 27 subjects (ages 18–34) who received a gold standard neuropsychological assessment. For several neuropsychological constructs (working memory, memory, executive function, language, and intelligence), we found a family of digital biomarkers that predicted test scores with high correlations (p < 10−4). These preliminary results suggest that passive measures from smartphone use could be a continuous ecological surrogate for laboratory-based neuropsychological assessment.
npj Digital Medicine (2018)1:10; doi:10.1038/s41746-018-0018-4
INTRODUCTION
By comparison to the functional metrics available in other disciplines, conventional measures of neuropsychiatric disorders have several challenges. First, they are obtrusive, requiring a subject to break from their normal routine, dedicating time and often travel. Second, they are not ecological and require subjects to perform a task outside of the context of everyday behavior. Third, they are episodic and provide sparse snapshots of a patient only at the time of the assessment.
Lastly, they are poorly scalable, taxing limited resources including space and trained staff. In seeking objective and ecological measures of cognition, we attempted to develop a method to measure memory and executive function not in the laboratory but in the moment, day-to-day. We used human–computer interaction on smartphones to identify digital biomarkers that were correlated with neuropsychological performance.
RESULTS
In 2014, 27 participants (ages 27.1 ± 4.4 years, education 14.1 ± 2.3 years, M:F 8:19) volunteered for neuropsychological assessment and a test of the smartphone app. Smartphone human–computer interaction data from the 7 days following the neuropsychological assessment showed a range of correlations with the cognitive scores. Table 1 shows the correlation between each neurocognitive test and the cross-validated predictions of the supervised kernel PCA constructed from the biomarkers for that test. Figure 1 shows each participant test score and the digital biomarker prediction for (a) digits backward, (b) symbol digit modality, (c) animal fluency, (d) Wechsler Memory Scale-3rd Edition (WMS-III) logical memory (delayed free recall), (e) brief visuospatial memory test (delayed free recall), and (f) Wechsler Adult Intelligence Scale-4th Edition (WAIS-IV) block design. Construct validity of the predictions was determined using pattern matching that computed a correlation of 0.87 with p < 10−59 between the covariance matrix of the predictions and the covariance matrix of the tests.

Table 1. Fourteen neurocognitive assessments covering five cognitive domains and dexterity were performed by a neuropsychologist. Shown are the group mean and standard deviation, range of score, and the correlation between each test and the cross-validated prediction constructed from the digital biomarkers for that test.
Working memory
- Digits forward: 10.9 (2.7), range 7–15, R = 0.71 ± 0.10, p-value 10−4
- Digits backward: 8.3 (2.7), range 4–14, R = 0.75 ± 0.08, p-value 10−5
Executive function
- Trail A: 23.0 (7.6), range 12–39, R = 0.70 ± 0.10, p-value 10−4
- Trail B: 53.3 (13.1), range 37–88, R = 0.82 ± 0.06, p-value 10−6
- Symbol digit modality: 55.8 (7.7), range 43–67, R = 0.70 ± 0.10, p-value 10−4
Language
- Animal fluency: 22.5 (3.8), range 15–30, R = 0.67 ± 0.11, p-value 10−4
- FAS phonemic fluency: 42 (7.1), range 27–52, R = 0.63 ± 0.12, p-value 10−3
Dexterity
- Grooved pegboard test (dominant hand): 62.7 (6.7), range 51–75, R = 0.73 ± 0.09, p-value 10−4
Memory
- California verbal learning test (delayed free recall): 14.1 (1.9), range 9–16, R = 0.62 ± 0.12, p-value 10−3
- WMS-III logical memory (delayed free recall): 29.4 (6.2), range 18–42, R = 0.81 ± 0.07, p-value 10−6
- Brief visuospatial memory test (delayed free recall): 10.2 (1.8), range 5–12, R = 0.77 ± 0.08, p-value 10−5
Intelligence scale
- WAIS-IV block design: 46.1 (12.8), range 12–61, R = 0.83 ± 0.06, p-value 10−6
- WAIS-IV matrix reasoning: 22.1 (3.3), range 12–26, R = 0.80 ± 0.07, p-value 10−6
- WAIS-IV vocabulary: 40.6 (4.0), range 31–50, R = 0.67 ± 0.11, p-value 10−4

Received: 5 October 2017; Revised: 3 February 2018; Accepted: 7 February 2018
1 Mindstrong Health, 248 Homer Street, Palo Alto, CA 94301, USA. Correspondence: Paul Dagum (paul@mindstronghealth.com)
www.nature.com/npjdigitalmed
Psychiatry

PRECISION MEDICINE
Identification of type 2 diabetes subgroups through topological analysis of patient similarity
Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. Dudley1,4*
Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications.
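The R column in Table 1 above is a Pearson correlation between the observed test scores and the cross-validated digital-biomarker predictions. A minimal sketch of that statistic, with toy numbers rather than the study's data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences, e.g. observed test scores vs model predictions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy observed scores vs predictions, illustrative only:
observed  = [7, 9, 10, 12, 14]
predicted = [8, 9, 11, 11, 15]
print(round(pearson_r(observed, predicted), 2))
```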
Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related complications. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was characterized by T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer malignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic m…
Endocrinology
LETTER: Dermatologist-level classification of skin cancer with deep neural networks (Dermatology)
LETTER: Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network (Cardiology)
Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization (Obstetrics and Gynecology)
ORIGINAL ARTICLE: Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board (Oncology)
Nephrology
Supervised autonomous robotic soft tissue surgery (Surgery)
  • 20.
NATURE MEDICINE … and the algorithm led to the best accuracy, and the algorithm markedly sped up the review of slides35. This study is particularly notable,

Table 2 | FDA AI approvals are accelerating
Company | FDA Approval | Indication
Apple | September 2018 | Atrial fibrillation detection
Aidoc | August 2018 | CT brain bleed diagnosis
iCAD | August 2018 | Breast density via mammography
Zebra Medical | July 2018 | Coronary calcium scoring
Bay Labs | June 2018 | Echocardiogram EF determination
Neural Analytics | May 2018 | Device for paramedic stroke diagnosis
IDx | April 2018 | Diabetic retinopathy diagnosis
Icometrix | April 2018 | MRI brain interpretation
Imagen | March 2018 | X-ray wrist fracture diagnosis
Viz.ai | February 2018 | CT stroke diagnosis
Arterys | February 2018 | Liver and lung cancer (MRI, CT) diagnosis
MaxQ-AI | January 2018 | CT brain bleed diagnosis
Alivecor | November 2017 | Atrial fibrillation detection via Apple Watch
Arterys | January 2017 | MRI heart interpretation
NATURE MEDICINE

AI-based medical devices: FDA approval status (Nature Medicine, 2019)
• Zebra Medical Vision
  • May 2019: pneumothorax triage on chest X-ray
  • June 2019: intracranial haemorrhage detection on head CT
• Aidoc
  • May 2019: pulmonary embolism detection on CT
  • June 2019: cervical spine fracture detection on CT
• GE Healthcare
  • September 2019: pneumothorax triage on chest X-ray devices
+
  • 21.
AI-based medical devices:
approval status in Korea
• 1. VUNO BoneAge (Class 2, approval)
• 2. Lunit INSIGHT lung nodule (Class 2, approval)
• 3. JLK Inspection cerebral infarction (Class 3, approval)
• 4. Infomeditech NeuroI (Class 2, certification): MRI-based dementia diagnosis support
• 5. Samsung Electronics lung nodule (Class 2, approval)
• 6. VUNO DeepBrain (Class 2, certification)
• 7. Lunit INSIGHT MMG (Class 3, approval)
• 8. JLK Inspection ATROSCAN (Class 2, certification): brain-aging measurement for health screening
• 9. VUNO Chest X-ray (Class 2, approval)
• 10. DEEPNOID DEEP:SPINE (Class 2, approval): detection support for lumbar compression fractures on X-ray
• 11. JLK Inspection lung CT (JLD-01A) (Class 2, certification)
• 12. JLK Inspection colonoscopy (JFD-01A) (Class 2, certification)
• 13. JLK Inspection gastroscopy (JFD-02A) (Class 2, certification)
• 14. Lunit INSIGHT CXR (Class 2, approval): detection support for abnormal regions on chest X-ray
• 15. VUNO Fundus AI (Class 3, approval): fundus image analysis, presence of 12 abnormal findings
• 16. Deep Bio DeepDx-Prostate: cancer diagnosis support from prostate core biopsy
• 17. VUNO LungCT (Class 2, approval): AI for lung nodule detection on CT images
2018 | 2019 | 2020
  • 22.
JLK Inspection lists on KOSDAQ
• July 2019: passed the technology evaluation
• 6 September 2019: filed for preliminary listing review
• 11 December 2019: listed on KOSDAQ
• Raised 18 billion KRW in the public offering
  • 23.
VUNO plans to list within the year
"VUNO was valued at 150 billion KRW when it received a 9-billion-KRW investment from the Korea Development Bank last April. The industry expects VUNO's post-listing value to exceed 200 billion KRW."
"VUNO received grade A from both rating agencies, NICE D&B and Korea Enterprise Data, in its technology evaluation, demonstrating its strong artificial intelligence (AI) technology. Based on this result, VUNO plans to file a preliminary review application for a KOSDAQ listing in the near future."
  • 24.
Artificial intelligence in medicine is not the future. It is already here.
  • 25.
Artificial intelligence in medicine is not the future. It is already here.
  • 26.
Wrong Question: Who does it better? (x) Will it replace doctors? (x)
  • 27.
Right Question: How can we make medicine better? (O) How can we better achieve the goals of medicine? (O)
  • 28.
The American Medical Association House of Delegates has adopted policies to keep the focus on advancing the role of augmented intelligence (AI) in enhancing patient care, improving population health, reducing overall costs, increasing value and supporting professional satisfaction for physicians.
Foundational policy, Annual 2018
As a leader in American medicine, our AMA has a unique opportunity to ensure that the evolution of AI in medicine benefits patients, physicians and the health care community. To that end our AMA seeks to:
Leverage ongoing engagement in digital health and other priority areas for improving patient outcomes and physician professional satisfaction to help set priorities for health care AI
Identify opportunities to integrate practicing physicians' perspectives into the development, design, validation and implementation of health care AI
Promote development of thoughtfully designed, high-quality, clinically validated health care AI that:
• Is designed and evaluated in keeping with best practices in user-centered design, particularly for physicians and other members of the health care team
• Is transparent
• Conforms to leading standards for reproducibility
• Identifies and takes steps to address bias and avoids introducing or exacerbating health care disparities, including when testing or deploying new AI tools on vulnerable populations
• Safeguards patients' and other individuals' privacy interests and preserves the security and integrity of personal information
Encourage education for patients, physicians, medical students, other health care professionals and health administrators to promote greater understanding of the promise and limitations of health care AI
Explore the legal implications of health care AI, such as issues of liability or intellectual property, and advocate for appropriate professional and governmental oversight for safe, effective, and equitable use of and access to health care AI
"Medical experts are working to determine the clinical applications of AI—work that will guide health care in the future. These experts, along with physicians, state and federal officials, must find the path that ends with better outcomes for patients. We have to make sure the technology does not get ahead of our humanity and creativity as physicians." —Gerald E. Harmon, MD, AMA Board of Trustees
Policy: Augmented intelligence in health care
https://www.ama-assn.org/system/files/2019-08/ai-2018-board-policy-summary.pdf
Augmented Intelligence, rather than Artificial Intelligence
  • 29.
Martin Duggan, "IBM Watson Health - Integrated Care: the Evolution to Cognitive Computing". Which aspects of the human physician can be augmented?
  • 30.
Medical Artificial Intelligence
• Part 1: The Second Machine Age and medical AI
• Part 2: The past and present of medical AI
• Part 3: How should we greet the future?
  • 31.
Medical Artificial Intelligence
• Part 1: The Second Machine Age and medical AI
• Part 2: The past and present of medical AI
• Part 3: How should we greet the future?
  • 32.
Three types of medical AI
• Analysis of complex medical data to derive insight
• Analysis and interpretation of medical images and pathology data
• Monitoring of continuous data for prevention and prediction
  • 33.
Three types of medical AI
• Analysis of complex medical data to derive insight
• Analysis and interpretation of medical images and pathology data
• Monitoring of continuous data for prevention and prediction
  • 35.
Jeopardy! In 2011, IBM Watson competed against two human champions in a quiz showdown and won decisively.
  • 36.
IBM Watson on Medicine. Watson learned:
600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
+ 1,500 lung cancer cases: physician notes, lab results and clinical research
+ 14,700 hours of hands-on training
  • 40.
  • 41.
WFO in ASCO 2017
• Early experience with the IBM WFO cognitive computing system for lung and colorectal cancer treatment (Manipal Hospitals)
• Over the past 3 years: lung cancer (112), colon cancer (126), rectal cancer (124)
• Concordance with the multidisciplinary tumour board:
  • lung cancer: localized 88.9%, metastatic 97.9%
  • colon cancer: localized 85.5%, metastatic 76.6%
  • rectal cancer: localized 96.8%, metastatic 80.6%
Performance of WFO in India, 2017 ASCO Annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
  • 42.
    WFO in ASCO 2017 • Results of applying Watson to colorectal and gastric cancer patients at Gachon University Gil Medical Center
    • Colorectal cancer patients (stage II-IV): 340 • advanced gastric cancer patients: 185 (retrospective)
    • Concordance with physicians:
    • Colorectal cancer: 73% • the 250 patients who received adjuvant chemotherapy: 85% • the 90 metastatic patients: 40%
    • Gastric cancer: 49% • Trastuzumab/FOLFOX is not reimbursed by the Korean national health insurance • S-1 (tegafur, gimeracil and oteracil) + cisplatin is very routine in Korea but not used in the US
  • 43.
    •"Will the patient have a first cardiovascular event within the next 10 years?" •Prospective cohort study of 378,256 patients in the UK •The first large-scale study to predict disease with machine learning from routine clinical data •Compared the accuracy of the existing ACC/AHA guideline against four machine-learning algorithms: •random forest; logistic regression; gradient boosting; neural network
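The four algorithms named above can be compared side by side with scikit-learn. This is only an illustrative sketch: the synthetic data, features, and hyperparameters below are placeholders, not the study's cohort or models.

```python
# Compare the four model families from the slide by AUROC on synthetic,
# imbalanced data standing in for a routine clinical dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ~10% positives, mimicking an event-prediction task (illustrative only)
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Neural network": MLPClassifier(hidden_layer_sizes=(32,),
                                    max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.3f}")
```

The study's actual comparison additionally used a guideline-based risk score as the baseline; the scaffold above only shows how such an AUROC comparison is set up.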
  • 44.
    ARTICLE (OPEN): Scalable and accurate deep learning with electronic health records
    Alvin Rajkomar et al., npj Digital Medicine (2018) 1:18; doi:10.1038/s41746-018-0029-1
    Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
    •In January 2018, Google announced an AI that analyzes electronic medical records (EMR) to predict patient outcomes: •whether the patient will die during the hospitalization •whether the hospital stay will be prolonged •whether the patient will be readmitted within 30 days of discharge •the diagnoses at discharge
    •The key feature of this study is scalability: •unlike previous studies, parts of the EMR were not selected and pre-processed; •instead, the entire EMR was analyzed as a whole, at two centers: UCSF and UCM (University of Chicago Medicine) •notably, unstructured data, including physicians' free-text notes, were also analyzed
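The scalability idea above, feeding the whole record as one time-ordered sequence rather than hand-curated variables, can be illustrated with a toy sequence builder. This is a hypothetical sketch, not Google's actual FHIR pipeline; the `Event` type and event kinds are invented for illustration.

```python
# Toy version of the paper's key idea: unroll every data point in a
# patient's record, including note tokens, into one time-ordered sequence.
from dataclasses import dataclass

@dataclass
class Event:
    t: float          # hours since admission (illustrative timestamp)
    kind: str         # e.g. "lab", "med", "note_token"
    value: str

def to_sequence(events):
    """Return all events as (kind, value) tokens sorted by time."""
    return [(e.kind, e.value) for e in sorted(events, key=lambda e: e.t)]

record = [
    Event(2.0, "lab", "creatinine=1.9"),
    Event(0.5, "note_token", "shortness"),
    Event(0.5, "note_token", "of_breath"),
    Event(4.0, "med", "furosemide"),
]
print(to_sequence(record))
```

In the actual study, a sequence like this (built from FHIR resources) is consumed by deep learning models, which removes the need for site-specific variable curation.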
  • 45.
    LETTERS: Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence
    Huiying Liang et al., Nature Medicine, February 2019; https://doi.org/10.1038/s41591-018-0335-9
    Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.
    From the paper's introduction: to formulate a diagnosis, physicians frequently use hypothetico-deductive reasoning. Starting with the chief complaint, the physician asks targeted questions, forms a differential diagnosis from this initial small feature set, and decides which features (historical questions, physical exam findings, laboratory testing, and/or imaging studies) to obtain next in order to rule diagnoses in or out. When the probability of one diagnosis reaches a predetermined level of acceptability, the process stops and the diagnosis is accepted; the physician can thus be considered a classifier of sorts. This study designed an AI system using machine learning to extract clinically relevant features from EHR notes to mimic that reasoning.
    •Analyzed 101.6 million EMR data points from 1.3 million pediatric patients •Deep-learning-based natural language processing •Mimics the physician's hypothetico-deductive reasoning •An AI that diagnoses common diseases in pediatric patients. Nat Med 2019 Feb
  • 46.
    The three types of medical AI: •analysis of complex medical data to derive insights •analysis and interpretation of medical imaging and pathology data •monitoring of continuous data for prevention and prediction
  • 47.
  • 48.
    The relationship between AI and deep learning (each field is a subset of the one above it):
    • Artificial intelligence: expert systems, cybernetics, machine learning, …
    • Machine learning: artificial neural networks, decision trees, support vector machines, Bayesian networks, …
    • Deep learning: convolutional neural networks (CNN), recurrent neural networks (RNN), …
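The innermost family on this slide, CNNs, is built on one core primitive: the discrete convolution. A minimal NumPy sketch of a 2D convolution (the edge filter and image below are illustrative, not a trained network):

```python
# Minimal 2D convolution, the building block of a CNN layer.
import numpy as np

def conv2d(image, kernel):
    """Valid-mode convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1., 0., -1.]] * 3)      # simple vertical-edge filter
img = np.zeros((5, 5))
img[:, 2:] = 1.0                          # left half dark, right half bright
print(conv2d(img, edge))                  # strongest response at the edge
```

In a real CNN, many such kernels are learned from labeled images and stacked in layers; this sketch only shows the operation itself.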
  • 49.
  • 50.
    "We will no longer accept papers showing that artificial intelligence analyzes medical images as well as humans do. That has already been sufficiently proven."
  • 51.
    Clinical Impact! • How can we demonstrate the medical utility of AI? • from 'high accuracy' ➔ to improved patient outcomes • from 'high accuracy' ➔ to synergy with physicians (accuracy, efficiency, cost, etc.) • from 'a single disease' ➔ to 'all diseases'
    • from retrospective studies / internal validation ➔ to prospective RCTs ➔ to use in real-world care • doing what human perception cannot
  • 52.
    FDA regulatory status of AI-based medical devices (Nature Medicine 2019)
    Table 2 | FDA AI approvals are accelerating (Company | FDA approval | Indication)
    • Apple | September 2018 | Atrial fibrillation detection
    • Aidoc | August 2018 | CT brain bleed diagnosis
    • iCAD | August 2018 | Breast density via mammography
    • Zebra Medical | July 2018 | Coronary calcium scoring
    • Bay Labs | June 2018 | Echocardiogram EF determination
    • Neural Analytics | May 2018 | Device for paramedic stroke diagnosis
    • IDx | April 2018 | Diabetic retinopathy diagnosis
    • Icometrix | April 2018 | MRI brain interpretation
    • Imagen | March 2018 | X-ray wrist fracture diagnosis
    • Viz.ai | February 2018 | CT stroke diagnosis
    • Arterys | February 2018 | Liver and lung cancer (MRI, CT) diagnosis
    • MaxQ-AI | January 2018 | CT brain bleed diagnosis
    • Alivecor | November 2017 | Atrial fibrillation detection via Apple Watch
    • Arterys | January 2017 | MRI heart interpretation
    Later additions (+):
    • Zebra Medical Vision: May 2019, pneumothorax triage on chest X-ray; June 2019, brain hemorrhage detection on head CT
    • Aidoc: May 2019, pulmonary embolism detection on CT; June 2019, cervical spine fracture detection on CT
    • GE Healthcare: September 2019, pneumothorax triage built into a chest X-ray device
  • 53.
    AI-based medical devices: regulatory status in Korea, 2018-2020
    • 1. VUNO BoneAge (Class 2, approval)
    • 2. Lunit INSIGHT lung nodule (Class 2, approval)
    • 3. JLK Inspection cerebral infarction (Class 3, approval)
    • 4. Infomeditech NeuroI (Class 2, certification): MRI-based dementia diagnosis support
    • 5. Samsung Electronics lung nodule (Class 2, approval)
    • 6. VUNO DeepBrain (Class 2, certification)
    • 7. Lunit INSIGHT MMG (Class 3, approval)
    • 8. JLK Inspection ATROSCAN (Class 2, certification): brain-aging measurement for health checkups
    • 9. VUNO Chest X-ray (Class 2, approval)
    • 10. Deepnoid DeepSpine (Class 2, approval): detection support for lumbar compression fractures on X-ray
    • 11. JLK Inspection lung CT (JLD-01A) (Class 2, certification)
    • 12. JLK Inspection colonoscopy (JFD-01A) (Class 2, certification)
    • 13. JLK Inspection gastroscopy (JFD-02A) (Class 2, certification)
    • 14. Lunit INSIGHT CXR (Class 2, approval): detection support for abnormal regions on chest X-ray
    • 15. VUNO Fundus AI (Class 3, approval): analyzes fundus photographs for the presence of 12 abnormal findings
    • 16. Deep Bio DeepDx-Prostate: cancer diagnosis support on prostate core biopsy
    • 17. VUNO LungCT (Class 2, approval): AI for lung nodule detection on CT images
  • 54.
  • 55.
    •An AI that reads hand X-ray images and calculates the patient's bone age •Conventionally, physicians read bone age by comparing the X-ray against standard images, e.g. with the Greulich-Pyle method •The AI finds sex- and age-specific patterns in reference-standard images, reports the similarity as probabilities, and retrieves the matching standard images •It can help physicians diagnose precocious puberty or delayed growth
  • 56.
    Press release: First approval of a domestically developed artificial intelligence (AI)-based medical device - reading bone age with AI technology.
    The Ministry of Food and Drug Safety (MFDS, Commissioner Ryu Young-jin) announced that on [month] [day] it approved VUNOmed BoneAge, medical image analysis software incorporating AI technology developed by the Korean medical device company VUNO Inc.
    VUNOmed BoneAge is software in which an AI analyzes X-ray images and suggests the patient's bone age, so that the physician can use the suggested information to help diagnose precocious puberty or delayed growth. It automates what physicians previously did manually, comparing the patient's left-hand X-ray against reference-standard images, and thereby shortens reading time.
    The product had been selected under the guideline on approval and review of medical devices using big data and AI technology, and received tailored support from clinical trial design through approval.
    VUNOmed BoneAge was approved for the purpose of helping medical professionals determine a patient's bone age by analyzing the left-hand X-ray image. The AI recognizes patterns in the captured X-ray and expresses, as probabilities, the similarity to sex-specific bone age models (male [N], female [N]) drawn from the reference-standard images; the physician then combines the probability values with other information, such as hormone levels, to diagnose precocious puberty or delayed growth.
    In the clinical trial evaluating the product's accuracy, its estimates differed from the bone age judged by physicians by [N] months on average, and the manufacturer designed the product so that the AI continues to learn from periodically updated image data, narrowing the gap with physicians.
    Including the newly approved VUNOmed BoneAge, [N] clinical trial plans for AI-based medical devices have been approved to date; the others are software that classifies cerebral infarction types from MRI and software that assists lung nodule diagnosis from X-ray images.
    The MFDS added that, to speed up the development of medical devices related to fourth-industrial-revolution technologies such as AI, VR, and 3D printing, it operates programs such as the next-generation project and the new-medical-device approval helper, which provide tailored support across the entire process from R&D through clinical trials to approval. It said this approval should help analyze and determine individual patients' bone age quickly, and that it will continue to actively support the development of advanced medical devices.
  • 57.
    Disclosure: I serve as an advisor to VUNO and hold equity in the company.
  • 58.
    Computerized Bone Age Estimation Using Deep Learning-Based Program: Evaluation of the Accuracy and Efficiency
    Jeong Rye Kim, Woo Hyun Shim, Hee Mang Yoon, Sang Hyup Hong, Jin Seong Lee, Young Ah Cho, Sangki Kim. AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380. doi:10.2214/AJR.17.18224
    S. Kim is employed by Vuno, Inc., which created the deep learning-based automatic software system for bone age determination; the other authors are employed by Asan Medical Center, which holds patent rights for the system.
    Background: Since 1992, concerns regarding interobserver variability in manual bone age estimation have led to several automatic computerized methods, including computer-assisted skeletal age scores, computer-aided skeletal maturation assessment systems, and BoneXpert (Visiana). BoneXpert was developed with traditional machine-learning techniques and performs well across ethnicities and clinical settings. Unlike traditional machine-learning techniques, deep-learning techniques allow an algorithm to program itself by learning from a large dataset of labeled images, removing the need to specify rules, and permit higher levels of abstraction and improved predictions from data.
    OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice.
    MATERIALS AND METHODS. A Greulich-Pyle method-based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3-17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas-assisted bone age (two radiologists with Greulich-Pyle atlas assistance only). The reference bone age was determined by the consensus of two experienced radiologists.
    RESULTS. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas-assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas-assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively.
    CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising diagnostic accuracy.
    • Total number of patients: 200
    • Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
    • Physician A: board-certified radiologist subspecialized in pediatric imaging (over 500 readings)
    • Physician B: second-year radiology resident (one day of training in the reading method + 20 readings)
    • AI: VUNO's deep learning for bone age reading
  • 59.
    Synergy between human physicians and AI in bone age reading (AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380)
    Concordance with the reference bone age, AI vs. physician and AI + physician:
    • AI alone: 69.5%
    • Physician A (radiology fellow, subspecialized in pediatric imaging): 63%
    • Physician B (second-year radiology resident): 49.5%
    • Physician A + AI: 72.5%
    • Physician B + AI: 57.5%
    • Total number of patients: 200
    • Physician A: board-certified radiologist subspecialized in pediatric imaging (over 500 readings)
    • Physician B: second-year radiology resident (one day of training in the reading method + 20 readings)
    • Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
    • AI: VUNO's deep learning for bone age reading
    Digital Healthcare Institute Director, Yoon Sup Choi, PhD yoonsup.choi@gmail.com
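Concordance rates like those above are computed mechanically: a reading counts as concordant when it matches the reference bone age exactly (the paper's first-rank concordance). A toy sketch with invented readings, not the study's data:

```python
# Concordance rate: percentage of readings that match the reference exactly.
def concordance(readings, reference):
    agree = sum(r == ref for r, ref in zip(readings, reference))
    return 100.0 * agree / len(reference)

# Hypothetical bone ages (years) for 8 patients; 6 of 8 agree.
reference = [10, 11, 12, 13, 14, 15, 16, 17]
ai_alone  = [10, 11, 12, 13, 14, 15, 17, 18]
print(f"{concordance(ai_alone, reference):.1f}%")  # 75.0%
```

Running the same function over AI-alone, physician-alone, and physician-plus-AI readings yields the five percentages compared on this slide.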
  • 60.
    Using AI in bone age reading can also reduce reading time (AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380)
    Total reading time for the 200 patients:
    • Physician A: 188 min without AI ➔ 154 min with AI (saving 18% of time)
    • Physician B: 180 min without AI ➔ 108 min with AI (saving 40% of time)
    • Physician A: board-certified radiologist subspecialized in pediatric imaging (over 500 readings)
    • Physician B: second-year radiology resident (one day of training in the reading method + 20 readings)
    • Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
    • AI: VUNO's deep learning for bone age reading
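The savings percentages on this slide follow directly from (before - after) / before:

```python
# Percentage of reading time saved when the physician works with AI.
def pct_saved(before_min, after_min):
    return round(100 * (before_min - after_min) / before_min)

print(pct_saved(188, 154))  # physician A: 18
print(pct_saved(180, 108))  # physician B: 40
```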
  • 61.
    ORIGINAL RESEARCH • THORACIC IMAGING: Development and Validation of Deep Learning-based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
    Ju Gang Nam, Sunggyun Park, Eui Jin Hwang, Jong Hyuk Lee, Kwang-Nam Jin, Kun Young Lim, Thienkai Huy Vu, Jae Ho Sohn, Sangheum Hwang, Jin Mo Goo, Chang Min Park. Radiology 2018; 00:1-11. https://doi.org/10.1148/radiol.2018180237
    Study supported by SNUH Research Fund and Lunit (06-2016-3000) and by Seoul Research and Business Development Program (FI170002). J.G.N. and S.P. contributed equally to this work.
    Purpose: To develop and validate a deep learning-based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists.
    Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal radiograph-to-nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18-99 years]; 15,446 women [mean age, 52.3 years; age range, 18-98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
    Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92-0.99 (AUROC) and 0.831-0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006-0.190; P < .05).
    Conclusion: This deep learning-based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader. ©RSNA, 2018
  • 62.
    Study design of Nam et al. (Radiology 2018):
    • 43,292 chest PA radiographs (normal:nodule = 34,067:9,225)
    • Labeled and annotated by 13 board-certified radiologists
    • DLAD validated on 1 internal + 4 external datasets
    • Seoul National University Hospital / Boramae Medical Center / National Cancer Center / UCSF
    • Classification / lesion localization
    • AI vs. physicians vs. AI + physicians
    • Compared against physicians at various levels of experience:
    • Non-radiology physicians / radiology residents
    • Board-certified radiologists / thoracic radiologists
  • 63.
    Nam et al. Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adenocarcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an elevation in its confidence by eight radiologists. Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two radiologists and an elevated confidence level of the nodule by two radiologists.
  • 64.
    Deep Learning AutomaticDetection Algorithm for Malignant Pulmonary Nodules Table 3: Patient Classification and Nodule Detection at the Observer Performance Test Observer Test 1 DLAD versus Test 1 (P Value) Test 2 Test 1 versus Test 2 (P Value) Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Radiograph Classification (AUROC) Nodule Detection (JAFROC FOM) Radiograph Classification Nodule Detection Nonradiology physicians Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001 Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001 Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001 Group 0.691 ,.001* 0.828 ,.001* Radiology residents Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03 Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001 Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54 Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02 Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001 Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03 Group 0.790 ,.001* 0.867 ,.001* Board-certified radiologists Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002 Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04 Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01 Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24 Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23 Group 0.821 .02* 0.840 .01* Thoracic radiologists Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03 Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02 Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12 Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02 Group 0.833 .08* 0.854 ,.001* Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. 
Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine. Slide labels: physicians alone; AI vs. physicians alone (p value); physicians + AI; physicians alone vs. physicians + AI (p value). Reader groups: radiology residents (years 1–3); 4th-year residents in obstetrics and gynecology, orthopedics, and internal medicine (the non-radiology physicians); board-certified radiologists (7 and 8 years of experience); and thoracic radiologists (26, 13, and 9 years of experience).
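The radiograph classification column in Table 3 reports AUROC, the area under the ROC curve. As a minimal illustration (the confidence scores and labels below are hypothetical, not the study's data), AUROC can be computed directly from its rank definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, counting ties as one half.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic: P(score_pos > score_neg),
    with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical reader confidence scores (0-4) for 8 radiographs;
# label 1 means a nodule was present on the reference standard.
scores = [0, 1, 1, 2, 2, 3, 4, 4]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
print(auroc(scores, labels))
```

The same rank-based definition underlies library implementations such as scikit-learn's `roc_auc_score`.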
• Using the AI as a second reader improved reader accuracy.
• Radiograph classification: improved for 17 of 18 readers (15 of 18 with P < .05).
• Nodule detection: improved for 18 of 18 readers (14 of 18 with P < .05).
[Chart: reader performance shown as "human only" vs. "human + algorithm" for each of five reader groups] Clinical Study Results (CXR Nodule) - Radiology. Courtesy of Lunit, Inc.
Pathology
1. CAD: Computer-Aided Detection
2. Triage: prioritization of critical cases
3. Image-driven biomarker
[Figure panels A–D] Benign without atypia / Atypia / DCIS (ductal carcinoma in situ) / Invasive carcinoma: which interpretation? Elmore et al., JAMA 2015, Diagnostic Concordance Among Pathologists. Reading breast cancer pathology slides.
Figure 4. Participating Pathologists' Interpretations of Each of the 240 Breast Biopsy Test Cases. (A) Benign without atypia: 72 cases, 2,070 total interpretations. (B) Atypia: 72 cases, 2,070 total interpretations. (C) DCIS: 73 cases, 2,097 total interpretations. (D) Invasive carcinoma: 23 cases, 663 total interpretations. DCIS indicates ductal carcinoma in situ. Elmore et al., JAMA 2015 (Diagnostic Concordance in Interpreting Breast Biopsies). Discordance among board-certified pathologists in breast biopsy interpretation.
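Concordance in this figure is simply the share of individual interpretations that match the consensus diagnosis, broken down by category. A minimal sketch with hypothetical reads (the four category names follow the figure; the case data below are invented for illustration):

```python
# The four diagnostic categories used in the Elmore et al. study design.
CATEGORIES = ["benign", "atypia", "dcis", "invasive"]

# Hypothetical cases: each has a consensus diagnosis and several
# independent pathologist reads (not the study's raw data).
cases = [
    {"consensus": "benign",   "reads": ["benign", "benign", "atypia"]},
    {"consensus": "atypia",   "reads": ["benign", "atypia", "dcis"]},
    {"consensus": "dcis",     "reads": ["dcis", "dcis", "atypia"]},
    {"consensus": "invasive", "reads": ["invasive", "invasive", "invasive"]},
]

def concordance_by_category(cases):
    """Fraction of individual reads matching the consensus, per category."""
    out = {}
    for cat in CATEGORIES:
        reads = [r for c in cases if c["consensus"] == cat for r in c["reads"]]
        if reads:
            out[cat] = sum(r == cat for r in reads) / len(reads)
    return out

print(concordance_by_category(cases))
```

In this toy data, as in the study, concordance is highest for invasive carcinoma and lowest for the intermediate category (atypia).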
Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer. David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,* Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,† Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow. Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection (Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist's primary workflow. In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5 Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist's sensitivity for detecting individual cancer foci in digital images.
However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8 (From Google AI Healthcare and Verily Life Sciences, Mountain View, CA.)
• LYNA (LYmph Node Assistant): a pathology AI developed by Google
• Targets lymph node metastasis of breast cancer
• A study demonstrating the synergy of board-certified pathologists + AI
• In accuracy (sensitivity), review time, and (for micrometastases) perceived review difficulty
• Some polyps were detected with only partial appearance.
• Polyps were detected in both normal and insufficient light conditions.
• Polyps were detected under both qualified and suboptimal bowel preparations.
(Nature Biomedical Engineering) The algorithm was also validated on data from patients who underwent colonoscopy examinations up to 2 years later, and demonstrated high per-image sensitivity (94.38% and 91.64%) in both the image (dataset A) and video (dataset C) analyses; datasets A and C included large variations of polyp morphology and image quality. Fig. 3: Examples of polyp detection for datasets A and C. Polyps of different morphology, including flat isochromatic polyps, dome-shaped polyps, pedunculated polyps and sessile serrated adenomatous polyps, were detected by the algorithm in both normal and insufficient light conditions, under both qualified and suboptimal bowel preparations. Some polyps were detected with only partial appearance.
Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500. Endoscopy, original article: Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, Yi Li, Guangre Xu, Mengtian Tu, Xiaogang Liu. (Department of Gastroenterology, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, China; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA)
ABSTRACT
Objective: The effect of colonoscopy on colorectal cancer mortality is limited by several factors, among them a certain miss rate, leading to limited adenoma detection rates (ADRs). We investigated the effect of an automatic polyp detection system based on deep learning on polyp detection rate and ADR.
Design: In an open, non-blinded trial, consecutive patients were prospectively randomised to undergo diagnostic colonoscopy with or without assistance of a real-time automatic polyp detection system providing a simultaneous visual notice and sound alarm on polyp detection. The primary outcome was ADR.
Results: Of 1058 patients included, 536 were randomised to standard colonoscopy, and 522 were randomised to colonoscopy with computer-aided diagnosis. The artificial intelligence (AI) system significantly increased ADR (29.1% vs 20.3%, p<0.001) and the mean number of adenomas per patient (0.53 vs 0.31, p<0.001). This was due to a higher number of diminutive adenomas found (185 vs 102; p<0.001), while there was no statistical difference in larger adenomas (77 vs 58, p=0.075). In addition, the number of hyperplastic polyps was also significantly increased (114 vs 52, p<0.001).
Conclusions: In a low prevalent ADR population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected, as well as an increase in the rate of hyperplastic polyps. The cost–benefit ratio of such effects has to be determined further. Trial registration number: ChiCTR-DDD-17012221.
INTRODUCTION
Colorectal cancer (CRC) is the second and third-leading causes of cancer-related deaths in men and women respectively.1 Colonoscopy is the gold standard for screening CRC.2 3 Screening colonoscopy has allowed for a reduction in the incidence and mortality of CRC via the detection and removal of adenomatous polyps.4–8 Additionally, there is evidence that with each 1.0% increase in adenoma detection rate (ADR), there is an associated 3.0% decrease in the risk of interval CRC.9 10 However, polyps can be missed, with reported miss rates of up to 27% due to both polyp and operator characteristics.11 12 Unrecognised polyps within the visual field is an important problem to address.11 Several studies have shown that assistance by a second observer increases the polyp detection rate (PDR), but such a strategy remains controversial in terms of increasing the ADR.13–15 Ideally, a real-time automatic polyp detection system, with performance close to that of expert endoscopists, could assist the endoscopist in detecting lesions that might
correspond to adenomas in a more consistent and reliable way.
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR) is regarded as a main quality indicator of (screening) colonoscopy and has been shown to correlate with interval cancers. Reducing adenoma miss rates by increasing ADR has been a goal of many studies focused on imaging techniques and mechanical methods.
► Artificial intelligence has been recently introduced for polyp and adenoma detection as well as differentiation and has shown promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised controlled trial examining an automatic polyp detection during colonoscopy and shows an increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of small adenomas found.
► The detection rate of hyperplastic polyps was also significantly increased.
How might it impact on clinical practice in the foreseeable future?
► Automatic polyp and adenoma detection could be the future of diagnostic colonoscopy in order to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is still unclear, and further improvements such as polyp differentiation have to be implemented.
• This AI measurably helped with both polyp detection and adenoma detection.
• Demonstrated in a prospective RCT (n=1058; standard=536, CAD=522).
• With AI assistance:
  • the adenoma detection rate increased: 29.1% vs 20.3% (p<0.001)
  • the number of adenomas detected per patient increased: 0.53 vs 0.31 (p<0.001)
  • mainly because more diminutive adenomas were found: 185 vs 102 (p<0.001)
  • the number of hyperplastic polyps also increased: 114 vs 52 (p<0.001)
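The headline comparison (ADR 29.1% vs 20.3%) can be checked with a standard two-proportion z-test. The sketch below reconstructs approximate counts of adenoma-positive patients from the reported rates and arm sizes (roughly 152 of 522 in the CADe arm and 109 of 536 in the standard arm); these counts are an approximation, and the paper's own statistical analysis may differ.

```python
from math import sqrt, erf

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Approximate counts reconstructed from the reported rates.
z, p = two_proportion_z(152, 522, 109, 536)
print(round(z, 2), p < 0.001)
```

With these counts, the z statistic is about 3.3, consistent with the reported p<0.001.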
Figure 1: Deep learning architecture. The detection algorithm is a deep convolutional neural network (CNN) based on the SegNet architecture. Data flow is from left to right: a colonoscopy image is sequentially warped into a binary image, with 1 representing polyp pixels and 0 representing no polyp in a probability map. This is then displayed, as shown in the output, with a hollow tracing box on the CADe monitor. (Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500)
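The caption's data flow (per-pixel probability map, then binary mask, then tracing box) can be sketched in a few lines. This is an illustrative reimplementation of the post-processing idea, not Wang et al.'s code, and the toy probability map below is invented.

```python
def mask_bbox(prob_map, threshold=0.5):
    """Threshold a per-pixel polyp probability map into a binary mask and
    return the bounding box (row_min, col_min, row_max, col_max) of the
    detected region, or None if no pixel exceeds the threshold."""
    hits = [(r, c)
            for r, row in enumerate(prob_map)
            for c, p in enumerate(row) if p >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 3x4 probability map with one high-probability blob.
prob_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.8, 0.9, 0.1],
    [0.0, 0.7, 0.6, 0.2],
]
print(mask_bbox(prob_map))
```

In a real system the box coordinates would be drawn as the hollow tracing overlay on the CADe monitor, frame by frame.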
Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. PLOS Medicine 2016
The Three Types of Medical AI
• Analysis of complex medical data and derivation of insights
• Analysis and interpretation of medical imaging and pathology data
• Continuous monitoring of data for preventive and predictive medicine
SEPSIS. A targeted real-time early warning score (TREWScore) for septic shock. Katharine E. Henry,1 David N. Hager,2 Peter J. Pronovost,3,4,5 Suchi Saria1,3,5,6 *
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and developed "TREWScore," a targeted real-time early warning score that predicts which patients will develop septic shock. TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide earlier interventions that would prevent or mitigate the associated morbidity and mortality.
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic shock in the United States each year. More than half of them are admitted to an intensive care unit (ICU), accounting for 10% of all ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in annual health care costs (1–3). Several studies have demonstrated that morbidity, mortality, and length of stay are decreased when severe sepsis and septic shock are identified and treated early (4–8). In particular, one study showed that mortality from septic shock increased by 7.6% with every hour that treatment was delayed after the onset of hypotension (9). More recent studies comparing protocolized care, usual care, and early goal-directed therapy (EGDT) for patients with septic shock suggest that usual care is as effective as EGDT (10–12). Some have interpreted this to mean that usual care has improved over time and reflects important aspects of EGDT, such as early antibiotics and early aggressive fluid resuscitation (13). It is likely that continued early identification and treatment will further improve outcomes. However, the Simplified Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment (SOFA) scores, Modified Early Warning Score (MEWS), and Simple Clinical Score (SCS) have been validated to assess illness severity and risk of death among septic patients (14–17). Although these scores are useful for predicting general deterioration or mortality, they typically cannot distinguish with high sensitivity and specificity which patients are at highest risk of developing a specific acute condition. The increased use of electronic health records (EHRs), which can be queried in real time, has generated interest in automating tools that identify patients at risk for septic shock (18–20).
A number of "early warning systems," "track and trigger" initiatives, "listening applications," and "sniffers" have been implemented to improve detection and timeliness of therapy for patients with severe sepsis and septic shock (18, 20–23). Although these tools have been successful at detecting patients currently experiencing severe sepsis or septic shock, none predict which patients are at highest risk of developing septic shock. The adoption of the Affordable Care Act has added to the growing excitement around predictive models derived from electronic health records.
Fig. 2. ROC for detection of septic shock before onset in the validation set. The ROC curve for TREWScore is shown in blue, with the ROC curve for MEWS in red. The sensitivity and specificity performance of the routine screening criteria is indicated by the purple dot. Normal 95% CIs are shown for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
A targeted real-time early warning score (TREWScore) for septic shock: AUC = 0.83. At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset.
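The reported operating point (sensitivity 0.85 at specificity 0.67) corresponds to choosing one threshold on the continuous TREWScore and reading both rates off the ROC curve. A toy sketch of how such an operating point is selected (the scores and labels below are invented, not the study's features or data):

```python
def sens_at_spec(scores, labels, target_spec):
    """Scan thresholds from low to high; return (threshold, specificity,
    sensitivity) for the loosest threshold whose specificity meets the
    target. Higher score means higher predicted risk."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    for thr in sorted(set(scores)):
        spec = sum(s < thr for s in neg) / len(neg)   # negatives not alarmed
        if spec >= target_spec:
            sens = sum(s >= thr for s in pos) / len(pos)  # positives alarmed
            return thr, spec, sens
    return None

# Toy risk scores: higher means more likely to develop septic shock.
scores = [1, 2, 3, 4, 5, 6, 4, 6, 7, 8]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(sens_at_spec(scores, labels, 0.67))
```

Sweeping `target_spec` over [0, 1] and collecting the resulting (1 - specificity, sensitivity) pairs traces out the ROC curve whose area is the reported AUC.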
• Launched in the United States as an iPhone app
• A key question is how cumbersome it will be to use
• For how long must it be used to be effective: two weeks? for life?
• How will food logging and similar input be handled?
• The pricing model also does not appear to have been disclosed yet
An Algorithm Based on Deep Learning for Predicting In-Hospital Cardiac Arrest. Joon-myoung Kwon, MD;* Youngnam Lee, MS;* Yeha Lee, PhD; Seungwoo Lee, BS; Jinsik Park, MD, PhD
Background: In-hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional track-and-trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false-alarm rates. We propose a deep learning–based early warning system that shows higher performance than the existing track-and-trigger systems.
Methods and Results: This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July 2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index. Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC: 0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the deep learning–based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning system, random forest, and logistic regression, respectively, at the same sensitivity.
Conclusions: An algorithm based on deep learning had high sensitivity and a low false-alarm rate for detection of patients with cardiac arrest in the multicenter study. (J Am Heart Assoc. 2018;7:e008678.
DOI: 10.1161/JAHA.118.008678.)
Key Words: artificial intelligence • cardiac arrest • deep learning • machine learning • rapid response system • resuscitation
In-hospital cardiac arrest is a major burden to public health, which affects patient safety.1–3 More than half of cardiac arrests result from respiratory failure or hypovolemic shock, and 80% of patients with cardiac arrest show signs of deterioration in the 8 hours before cardiac arrest.4–9 However, 209,000 in-hospital cardiac arrests occur in the United States each year, and the survival discharge rate for patients with cardiac arrest is 20% worldwide.10,11 Rapid response systems (RRSs) have been introduced in many hospitals to detect cardiac arrest using the track-and-trigger system (TTS).12,13 Two types of TTS are used in RRSs. For the single-parameter TTS (SPTTS), cardiac arrest is predicted if any single vital sign (eg, heart rate [HR], blood pressure) is out of the normal range.14 The aggregated weighted TTS calculates a weighted score for each vital sign and then finds patients with cardiac arrest based on the sum of these scores.15 The modified early warning score (MEWS) is one of the most widely used approaches among all aggregated weighted TTSs (Table 1);16 however, traditional TTSs including MEWS have limitations, with low sensitivity or high false-alarm rates.14,15,17 Sensitivity and false-alarm rate interact: increased sensitivity creates higher false-alarm rates and vice versa. Current RRSs suffer from low sensitivity or a high false-alarm rate. An RRS was used for only 30% of patients before unplanned intensive care unit admission and was not used for 22.8% of patients, even if they met the criteria.18,19
From the Departments of Emergency Medicine (J.-m.K.) and Cardiology (J.P.), Mediplex Sejong Hospital, Incheon, Korea; VUNO, Seoul, Korea (Youngnam L., Yeha L., S.L.). *Dr Kwon and Mr Youngnam Lee contributed equally to this study.
Correspondence to: Joon-myoung Kwon, MD, Department of Emergency Medicine, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon 21080, Korea. E-mail: kwonjm@sejongh.co.kr Received January 18, 2018; accepted May 31, 2018. © 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
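The two track-and-trigger styles the abstract describes, a single-parameter TTS and an aggregated weighted score like MEWS, can be sketched in a few lines. The scoring bands below are simplified placeholders, not the actual MEWS chart (real MEWS also scores respiratory rate, temperature, and consciousness level):

```python
# Illustrative sketch of the two track-and-trigger styles. The bands are
# simplified stand-ins, NOT the clinical MEWS chart.

def sptts_alarm(hr, sbp):
    """Single-parameter TTS: alarm if any one vital sign is out of range."""
    return not (40 <= hr <= 130) or not (90 <= sbp <= 200)

def mews_component(value, bands):
    """Weighted score for one vital sign; bands are (low, high, score)."""
    for low, high, score in bands:
        if low <= value <= high:
            return score
    return 3  # outside all listed bands -> maximum weight

# Simplified bands (score 0 = normal); hypothetical values for illustration.
HR_BANDS  = [(51, 100, 0), (41, 50, 1), (101, 110, 1), (111, 129, 2)]
SBP_BANDS = [(101, 199, 0), (81, 100, 1), (71, 80, 2)]

def mews(hr, sbp):
    """Aggregated weighted TTS: the alarm fires on the SUM of the scores."""
    return mews_component(hr, HR_BANDS) + mews_component(sbp, SBP_BANDS)

# A patient with HR 115 and SBP 85 scores 2 + 1 = 3 under these bands.
```

The contrast is the point of the abstract: the SPTTS triggers on any one deranged vital, while the aggregated score can catch several mildly abnormal vitals whose sum crosses a threshold.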
  • 94.
Cardiac Arrest Prediction Accuracy
• Number of patients: 86,290
• Cardiac arrests: 633
• Input: heart rate, respiratory rate, body temperature, systolic blood pressure (source: VUNO)
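The model the slide describes is a recurrent network over sequences of the four listed vital signs. A minimal sketch of that shape follows; the weights are random and the dimensions (`n_hidden`, the single-layer tanh RNN, the function name `risk_score`) are illustrative stand-ins, since the paper specifies an RNN but not this exact architecture:

```python
import numpy as np

# Untrained, minimal RNN risk scorer over [HR, RR, temp, SBP] sequences.
# Shapes only; this is not the actual DEWS model.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 16
W_xh = rng.normal(0, 0.1, (n_hidden, n_in))
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))
W_hy = rng.normal(0, 0.1, (1, n_hidden))

def risk_score(vitals_seq):
    """vitals_seq: (T, 4) array, one row of vitals per time step.
    Returns a probability-like score in (0, 1) from the last hidden state."""
    h = np.zeros(n_hidden)
    for x in vitals_seq:                      # unroll over time
        h = np.tanh(W_xh @ x + W_hh @ h)
    logit = (W_hy @ h).item()
    return 1.0 / (1.0 + np.exp(-logit))       # sigmoid

# A deteriorating trajectory: rising HR/RR/temp, falling systolic BP.
seq = np.array([[88, 18, 36.8, 118],
                [112, 26, 38.1, 92],
                [128, 30, 38.6, 84]], dtype=float)
score = risk_score(seq)
```

The key design point over MEWS-style scores is that the hidden state carries the trend across time steps, so the same vitals can score differently depending on how the patient arrived at them.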
  • 95.
Fewer False Alarms
• At the alarm volumes that a university hospital rapid response team can actually handle (points A and B), the accuracy gap is even larger
• A: DEWS 33.0% vs. MEWS 0.3%
• B: DEWS 42.7% vs. MEWS 4.0%
APPH (Alarms Per Patient Per Hour) (source: VUNO)
  • 96.
  • 97.
Copyright © 2020 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved. Critical Care Medicine www.ccmjournal.org e285
Objectives: As the performance of a conventional track and trigger system in a rapid response system has been unsatisfactory, we developed and implemented an artificial intelligence for predicting in-hospital cardiac arrest, denoted the deep learning-based early warning system. The purpose of this study was to compare the performance of an artificial intelligence-based early warning system with that of conventional methods in a real hospital situation.
Design: Retrospective cohort study.
Setting: This study was conducted at a hospital in which the deep learning-based early warning system was implemented.
Patients: We reviewed the records of adult patients who were admitted to the general ward of our hospital from April 2018 to March 2019.
Interventions: The study population included 8,039 adult patients. A total of 83 events of deterioration occurred during the study period. The outcome was events of deterioration, defined as cardiac arrest and unexpected ICU admission. We defined a true alarm as an alarm occurring within 0.5–24 hours before a deteriorating event.
Measurements and Main Results: We used the area under the receiver operating characteristic curve, area under the precision-recall curve, number needed to examine, and mean alarm count per day as comparative measures.
The deep learning-based early warning system (area under the receiver operating characteristic curve, 0.865; area under the precision-recall curve, 0.066) outperformed the modified early warning score (area under the receiver operating characteristic curve, 0.682; area under the precision-recall curve, 0.010) and reduced the number needed to examine and the mean alarm count per day by 69.2% and 59.6%, respectively. At the same specificity, the deep learning-based early warning system had up to 257% higher sensitivity than conventional methods.
Conclusions: The developed artificial intelligence based on deep learning, the deep learning-based early warning system, accurately predicted deterioration of patients in a general ward and outperformed conventional methods. This study showed the potential and effectiveness of artificial intelligence in a rapid response system, which can be applied together with electronic health records. This will be a useful method to identify patients with deterioration and help with precise decision-making in daily practice. (Crit Care Med 2020; 48:e285–e289)
Key Words: artificial intelligence; cardiology; critical care; deep learning
In-hospital cardiac arrest is a major healthcare burden, and rapid response systems (RRSs) are used worldwide to identify deteriorating hospitalized patients and to prevent cardiac arrest (1). Most patients with cardiac arrest show signs of deterioration. However, 209,000 cardiac arrests occur and the survival to discharge rate is less than 20% in the United States each year (2). One challenge with RRSs is the failure to detect the deteriorating signs of patients; thus, several track and trigger systems (TTSs) have been developed (3). However, conventional methods, such as the single parameter TTS (SPTTS) and the modified early warning score (MEWS), have been disappointing owing to their limited ability to work together with electronic health records (EHRs) (4).
We previously developed and validated an artificial intelligence (AI) for predicting in-hospital cardiac arrest, denoted the deep learning-based early warning system (DEWS) (5). After fine-tuning and setup, we implemented DEWS with the EHR to monitor the risk of deterioration among patients in general wards; we have actively used DEWS in our RRS since April 2018. The purpose of this study was to compare the performance of our developed AI with that of conventional methods. To our best knowledge, this is the first study to apply a deep learning-based AI algorithm in an RRS, verified in an external validation study in an actual hospital setting. DOI: 10.1097/CCM.0000000000004236
1 VUNO, Seoul, Korea. 2 Department of Critical Care and Emergency Medicine, Mediplex Sejong Hospital, Incheon, Korea. 3 Division of Cardiology, Cardiovascular Center, Mediplex Sejong Hospital, Incheon, Korea.
Detecting Patient Deterioration Using Artificial Intelligence in a Rapid Response System
Kyung-Jae Cho, MS1; Oyeon Kwon, MS1; Joon-myoung Kwon, MD, MS2; Yeha Lee, PhD1; Hyunho Park, MD1; Ki-Hyun Jeon, MD, MS3; Kyung-Hee Kim, MD, PhD3; Jinsik Park, MD, PhD3; Byung-Hee Oh, MD, PhD3
• DEWS compared with existing prediction systems in a real clinical setting
• 8,039 adult general-ward patients at Sejong Hospital
• Predicts cardiac arrest / unexpected ICU admission 0.5–24 hours in advance
• Retrospective cohort study
Critical Care Medicine 2020
  • 98.
Cho et al., Critical Care Medicine 2020
• Predicts more accurately than existing systems (AUC = 0.865)
• Predicts earlier than existing systems (at the same specificity)
• Fewer patients need to be examined
• Fewer false alarms (at the same sensitivity)
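Both DEWS papers report AUPRC alongside AUROC because deterioration events are rare (83 events among 8,039 patients); at such event rates a model can post a respectable AUROC while precision stays near zero. A hand-rolled sketch of the two metrics on toy data (function names are mine, not from the papers):

```python
# AUROC and AUPRC from first principles, on a tiny toy example.

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(labels, scores):
    """Average precision: precision at each positive, averaged by rank."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / rank            # precision at this recall point
    return ap / tp

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2]
# auroc = 3/4 (three of four pos/neg pairs correctly ordered)
# auprc = (1/1 + 2/3) / 2 = 5/6
```

Unlike AUROC, the precision terms in AUPRC are dragged down directly by false alarms among the top-ranked patients, which is exactly the clinical cost the papers are trying to minimize.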
  • 99.
Three Types of Medical Artificial Intelligence
• Analysis of complex medical data and derivation of insights
• Analysis/reading of medical imaging and pathology data
• Monitoring of continuous data for prevention/prediction
  • 100.
Digital Healthcare in Drug Development: Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
  • 101.
Digital Healthcare in Drug Development: Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
• Personal genome analysis
• Blockchain-based genome analysis
• Deep learning–based candidate compounds
• AI + pharmaceutical companies
• Patient recruitment
• Data measurement: wearables
• Digital phenotyping
• Medication adherence
• SNS-based PMS
• Blockchain-based PMS
+ Digital Therapeutics
  • 102.
Digital Healthcare in Drug Development: Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
• Deep learning–based lead discovery
• AI + pharmaceutical companies
  • 103.
604 VOLUME 35 NUMBER 7 JULY 2017 NATURE BIOTECHNOLOGY
AI-powered drug discovery captures pharma interest
A drug-hunting deal inked last month, between Numerate, of San Bruno, California, and Takeda Pharmaceutical to use Numerate's artificial intelligence (AI) suite to discover small-molecule therapies for oncology, gastroenterology and central nervous system disorders, is the latest in a growing number of research alliances involving AI-powered computational drug development firms. Also last month, GNS Healthcare of Cambridge, Massachusetts announced a deal with Roche subsidiary Genentech of South San Francisco, California to use GNS's AI platform to better understand what affects the efficacy of known therapies in oncology. In May, Exscientia of Dundee, Scotland, signed a deal with Paris-based Sanofi that includes up to €250 ($280) million in milestone payments. Exscientia will provide the compound design and Sanofi the chemical synthesis of new drugs for diabetes and cardiovascular disease. The trend indicates that the pharma industry's long-running skepticism about AI is softening into genuine interest, driven by AI's promise to address the industry's principal pain point: clinical failure rates. The industry's willingness to consider AI approaches reflects the reality that drug discovery is laborious, time consuming and not particularly effective. A two-decade-long downward trend in clinical success rates has only recently improved (Nat. Rev. Drug Disc. 15, 379–380, 2016). Still, today, only about one in ten drugs that enter phase 1 clinical trials reaches patients. Half those failures are due to a lack of efficacy, says Jackie Hunter, CEO of BenevolentBio, a division of BenevolentAI of London. "That tells you we're not picking the right targets," she says. "Even a 5 or 10% reduction in efficacy failure would be amazing." Hunter's views on AI in drug discovery are featured in Ernst & Young's Biotechnology Report 2017, released last month.
Companies that have been watching AI from the sidelines are now jumping in. The best-known machine-learning model for drug discovery is perhaps IBM's Watson. IBM signed a deal in December 2016 with Pfizer to aid the pharma giant's immuno-oncology drug discovery efforts, adding to a string of previous deals in the biopharma space (Nat. Biotechnol. 33, 1219–1220, 2015). IBM's Watson hunts for drugs by sorting through vast amounts of textual data to provide quick analyses, and tests hypotheses by sorting through massive amounts of laboratory data, clinical reports and scientific publications. BenevolentAI takes a similar approach with algorithms that mine the research literature and proprietary research databases. The explosion of biomedical data has driven much of industry's interest in AI (Table 1). The confluence of ever-increasing computational horsepower and the proliferation of large data sets has prompted scientists to seek learning algorithms that can help them navigate such massive volumes of information. A lot of the excitement about AI in drug discovery has spilled over from other fields. Machine vision, which allows, among other things, self-driving cars, and language processing have given rise to sophisticated multilevel artificial neural networks known as deep-learning algorithms that can be used to model biological processes from assay data as well as textual data. In the past people didn't have enough data to properly train deep-learning algorithms, says Mark Gerstein, a biomedical informatics professor at Yale University in New Haven, Connecticut. Now researchers have been able to build massive databases and harness them with these algorithms, he says. "I think that excitement is justified." Numerate is one of a growing number of AI companies founded to take advantage of that data onslaught as applied to drug discovery. "We apply AI to chemical design at every stage," says Guido Lanza, Numerate's CEO.
It will provide Tokyo-based Takeda with candidates for clinical trials by virtual compound screenings against targets, designing and optimizing compounds, and modeling absorption, distribution, metabolism and excretion, and toxicity. The agreement includes undisclosed milestone payments and royalties. Academic laboratories are also embracing AI tools. In April, Atomwise of San Francisco launched its Artificial Intelligence Molecular Screen awards program, which will deliver 72 potentially therapeutic compounds to as many as 100 university research labs at no charge. Atomwise is a University of Toronto spinout that in 2015 secured an alliance with Merck of Kenilworth, New Jersey. For this new endeavor, it will screen 10 million molecules using its AtomNet platform to provide each lab with 72 compounds aimed at a specific target of the laboratory's choosing. The Japanese government launched in 2016 a research consortium centered on using Japan's K supercomputer to ramp up drug discovery efficiency across dozens of local companies and institutions. Among those involved are Takeda and tech giants Fujitsu of Tokyo, Japan, and NEC, also of Tokyo, as well as Kyoto University Hospital and Riken, Japan's National Research and Development Institute, which will provide clinical data. Deep learning is starting to gain acolytes in the drug discovery space. Genomics data analytics startup WuXi NextCode Genomics of Shanghai; Cambridge, Massachusetts; and Reykjavík, Iceland, collaborated with researchers at Yale University on a study that used the company's deep-learning algorithm to identify a key mechanism in blood vessel growth. The result could aid drug discovery efforts aimed at inhibiting blood vessel growth in tumors (Nature doi:10.1038/nature22322, 2017).
In the US, during the Obama administration, industry and academia joined forces to apply AI to accelerate drug discovery as part of the Cancer Moonshot initiative (Nat. Biotechnol. 34, 119, 2016). The Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, launched in January 2016, marries computational and experimental approaches, with Brentford, UK-based GlaxoSmithKline participating with Lawrence Livermore National Laboratory in Livermore, California, and the US National Cancer Institute. The computational portion of the process, which includes deep-learning and other AI algorithms, will be tested in the first two years. In the third year, "we hope to start on day one with a disease hypothesis and on day 365 to deliver a drug candidate," says Martha Head, GlaxoSmithKline's head, insights from data.
Table 1. Selected collaborations in the AI-drug discovery space
AI company/location | Technology | Announced partner/location | Indication(s) | Deal date
Atomwise | Deep-learning screening from molecular structure data | Merck | Malaria | 2015
BenevolentAI | Deep-learning and natural language processing of research literature | Janssen Pharmaceutica (Johnson & Johnson), Beerse, Belgium | Multiple | November 8, 2016
Berg, Framingham, Massachusetts | Deep-learning screening of biomarkers from patient data | None | Multiple | N/A
Exscientia | Bispecific compounds via Bayesian models of ligand activity from drug discovery data | Sanofi | Metabolic diseases | May 9, 2017
GNS Healthcare | Bayesian probabilistic inference for investigating efficacy | Genentech | Oncology | June 19, 2017
Insilico Medicine | Deep-learning screening from drug and disease databases | None | Age-related diseases | N/A
Numerate | Deep learning from phenotypic data | Takeda | Oncology, gastroenterology and central nervous system disorders | June 12, 2017
Recursion, Salt Lake City, Utah | Cellular phenotyping via image analysis | Sanofi | Rare genetic diseases | April 25, 2016
twoXAR, Palo Alto, California | Deep-learning screening from literature and assay data | Santen Pharmaceuticals, Osaka, Japan | Glaucoma | February 23, 2017
N/A, none announced. Source: companies' websites.
  • 104.
WSJ, June 2017
• Multinational pharmaceutical companies are trying a variety of approaches to apply AI to drug development
• Recent AI methods differ from older techniques such as virtual screening and docking
  • 105.
https://research.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html
DeepVariant: Highly Accurate Genomes With Deep Neural Networks
• In 2016, Verily won the SNP performance category of the PrecisionFDA Truth Challenge
• The improved algorithm was later released as DeepVariant
• Aligned reads are themselves treated as an 'image' and learned with a CNN
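The "reads as an image" idea can be sketched as follows. This is an illustrative encoding of my own, not DeepVariant's actual pileup format, which uses more channels (e.g. base quality and strand) and fixed window sizes:

```python
import numpy as np

# Toy pileup-as-image encoding: rows = reads, columns = genome positions.
BASES = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def pileup_image(ref, reads):
    """Channel 0 encodes the base identity; channel 1 marks positions
    where the read disagrees with the reference."""
    img = np.zeros((len(reads), len(ref), 2), dtype=np.float32)
    for r, read in enumerate(reads):
        for c, (base, ref_base) in enumerate(zip(read, ref)):
            img[r, c, 0] = BASES[base]
            img[r, c, 1] = float(base != ref_base)
    return img

ref = "ACGTAC"
reads = ["ACGTAC", "ACGAAC", "ACGAAC"]   # two of three reads support T>A
img = pileup_image(ref, reads)
```

Once the evidence is laid out this way, a candidate variant site looks like a vertical stripe in the mismatch channel, which is exactly the kind of local spatial pattern a CNN is built to recognize.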
  • 106.
Table 1. Evaluation of several bioinformatics methods on the high-coverage, whole-genome sample NA24385
Method | Type | F1 | Recall | Precision | TP | FN | FP | FP.gt | FP.al | Version
DeepVariant (live GitHub) | Indel | 0.99507 | 0.99347 | 0.99666 | 357,641 | 2,350 | 1,198 | 217 | 840 | Latest GitHub v0.4.1-b4e8d37d
GATK (raw) | Indel | 0.99366 | 0.99219 | 0.99512 | 357,181 | 2,810 | 1,752 | 377 | 995 | 3.8-0-ge9d806836
Strelka | Indel | 0.99227 | 0.98829 | 0.99628 | 355,777 | 4,214 | 1,329 | 221 | 855 | 2.8.4-3-gbe58942
DeepVariant (pFDA) | Indel | 0.99112 | 0.98776 | 0.99450 | 355,586 | 4,405 | 1,968 | 846 | 1,027 | pFDA submission May 2016
GATK (VQSR) | Indel | 0.99010 | 0.98454 | 0.99573 | 354,425 | 5,566 | 1,522 | 343 | 909 | 3.8-0-ge9d806836
GATK (flt) | Indel | 0.98229 | 0.96881 | 0.99615 | 348,764 | 11,227 | 1,349 | 370 | 916 | 3.8-0-ge9d806836
FreeBayes | Indel | 0.94091 | 0.91917 | 0.96372 | 330,891 | 29,100 | 12,569 | 9,149 | 3,347 | v1.1.0-54-g49413aa
16GT | Indel | 0.92732 | 0.91102 | 0.94422 | 327,960 | 32,031 | 19,364 | 10,700 | 7,745 | v1.0-34e8f934
SAMtools | Indel | 0.87951 | 0.83369 | 0.93066 | 300,120 | 59,871 | 22,682 | 2,302 | 20,282 | 1.6
DeepVariant (live GitHub) | SNP | 0.99982 | 0.99975 | 0.99989 | 3,054,552 | 754 | 350 | 157 | 38 | Latest GitHub v0.4.1-b4e8d37d
DeepVariant (pFDA) | SNP | 0.99958 | 0.99944 | 0.99973 | 3,053,579 | 1,727 | 837 | 409 | 78 | pFDA submission May 2016
Strelka | SNP | 0.99935 | 0.99893 | 0.99976 | 3,052,050 | 3,256 | 732 | 87 | 136 | 2.8.4-3-gbe58942
GATK (raw) | SNP | 0.99914 | 0.99973 | 0.99854 | 3,054,494 | 812 | 4,469 | 176 | 257 | 3.8-0-ge9d806836
16GT | SNP | 0.99583 | 0.99850 | 0.99318 | 3,050,725 | 4,581 | 20,947 | 3,476 | 3,899 | v1.0-34e8f934
GATK (VQSR) | SNP | 0.99436 | 0.98940 | 0.99937 | 3,022,917 | 32,389 | 1,920 | 80 | 170 | 3.8-0-ge9d806836
FreeBayes | SNP | 0.99124 | 0.98342 | 0.99919 | 3,004,641 | 50,665 | 2,434 | 351 | 1,232 | v1.1.0-54-g49413aa
SAMtools | SNP | 0.99021 | 0.98114 | 0.99945 | 2,997,677 | 57,629 | 1,651 | 1,040 | 200 | 1.6
GATK (flt) | SNP | 0.98958 | 0.97953 | 0.99983 | 2,992,764 | 62,542 | 509 | 168 | 26 | 3.8-0-ge9d806836
The dataset used in this evaluation is the same as in the precisionFDA Truth Challenge (pFDA). Several methods are compared, including the DeepVariant callset as submitted to the contest and the most recent DeepVariant version from GitHub. Each method was run according to the individual authors' best-practice recommendations and represents a good-faith effort to achieve best results. Comparisons to the Genome in a Bottle truth set for this sample were performed using the hap.py software, available on GitHub at http://github.com/Illumina/hap.py, using the same version of the GIAB truth set (v3.2.2) used by pFDA. The overall accuracy (F1, sort order within each variant type), recall, precision, and numbers of true positives (TP), false negatives (FN) and false positives (FP) are shown over the whole genome. False positives are further divided by those caused by genotype mismatches (FP.gt) and those caused by allele mismatches (FP.al). Finally, the version of the software used for each method is provided. We present three GATK callsets: GATK (raw), the unfiltered calls emitted by the HaplotypeCaller; GATK (VQSR), the callset filtered with variant quality score recalibration (VQSR); and GATK (flt), the raw GATK callset filtered with run-flt in CHM-eval. See Supplementary Note 7 for more details.
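The table's F1, recall, and precision columns follow directly from the raw TP/FN/FP counts, as a quick check shows for the DeepVariant (live GitHub) indel row:

```python
# Recover precision, recall and F1 from the table's raw counts.
# DeepVariant (live GitHub), indel row: TP=357,641, FN=2,350, FP=1,198.

def prf1(tp, fn, fp):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # = 2TP/(2TP+FP+FN)
    return precision, recall, f1

p, r, f1 = prf1(357_641, 2_350, 1_198)
# These agree with the table's 0.99666 / 0.99347 / 0.99507 to within ~1e-5.
```

The algebraic shortcut in the comment, F1 = 2TP/(2TP+FP+FN), also makes clear why F1 sits between precision and recall in every row.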
  • 107.
Protein-Compound Complex Structure: Binding, or Non-binding?
To overcome these limitations we take an indirect approach. Instead of directly visualizing filters in order to understand their specialization, we apply filters to input data and examine the location where they maximally fire. Using this technique we were able to map filters to chemical functions. For example, Figure 5 illustrates the 3D locations at which a particular filter from our first convolutional layer fires. Visual inspection of the locations at which that filter is active reveals that this filter specializes as a sulfonyl/sulfonamide detector. This demonstrates the ability of the model to learn complex chemical features from simpler ones. In this case, the filter has inferred a meaningful spatial arrangement of input atom types without any chemical prior knowledge. (Figure 5: Sulfonyl/sulfonamide detection with autonomously trained convolutional filters.)
  • 109.
AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery
Izhar Wallach, Atomwise, Inc. (izhar@atomwise.com); Michael Dzamba, Atomwise, Inc. (misko@atomwise.com); Abraham Heifets, Atomwise, Inc. (abe@atomwise.com)
Abstract: Deep convolutional neural networks comprise a subclass of deep neural networks (DNN) with a constrained architecture that leverages the spatial and temporal structure of the domain they model. Convolutional networks achieve the best predictive performance in areas such as speech and image recognition by hierarchically composing simple local features into complex models. Although DNNs have been used in drug discovery for QSAR and ligand-based bioactivity predictions, none of these models have benefited from this powerful convolutional architecture. This paper introduces AtomNet, the first structure-based, deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. We demonstrate how to apply the convolutional concepts of feature locality and hierarchical composition to the modeling of bioactivity and chemical interactions. In further contrast to existing DNN techniques, we show that AtomNet's application of local convolutional filters to structural target information successfully predicts new active molecules for targets with no previously known modulators. Finally, we show that AtomNet outperforms previous docking approaches on a diverse set of benchmarks by a large margin, achieving an AUC greater than 0.9 on 57.8% of the targets in the DUDE benchmark.
1 Introduction
Fundamentally, biological systems operate through the physical interaction of molecules. The ability to determine when molecular binding occurs is therefore critical for the discovery of new medicines and for furthering of our understanding of biology. Unfortunately, despite thirty years of computational efforts, computer tools remain too inaccurate for routine binding prediction, and physical experiments remain the state of the art for binding determination. The ability to accurately predict molecular binding would reduce the time-to-discovery of new treatments, help eliminate toxic molecules early in development, and guide medicinal chemistry efforts [1, 2]. In this paper, we introduce a new predictive architecture, AtomNet, to help address these challenges. AtomNet is novel in two regards: AtomNet is the first deep convolutional neural network for molecular binding affinity prediction. It is also the first deep learning system that incorporates structural information about the target to make its predictions. Deep convolutional neural networks (DCNN) are currently the best performing predictive models for speech and vision [3, 4, 5, 6]. DCNN is a class of deep neural network that constrains its model architecture to leverage the spatial and temporal structure of its domain. For example, a low-level image feature, such as an edge, can be described within a small spatially-proximate patch of pixels. Such a feature detector can share evidence across the entire receptive field by "tying the weights" of the detector neurons, as the recognition of the edge does not depend on where it is found. (arXiv:1510.02855v1 [cs.LG] 10 Oct 2015)
  • 110.
AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery (Wallach, Dzamba, Heifets; Atomwise, Inc.; arXiv:1510.02855)
Smina: 123 / 35 / 5 / 0 / 0. Table 3: The number of targets on which AtomNet and Smina exceed given adjusted-logAUC thresholds. For example, on the ChEMBL-20 PMD set, AtomNet achieves an adjusted-logAUC of 0.3 or better for 27 targets (out of 50 possible targets). ChEMBL-20 PMD contains 50 targets, DUDE-30 contains 30 targets, DUDE-102 contains 102 targets, and ChEMBL-20 inactives contains 149 targets.
• Known protein–ligand 3D binding structures are learned with deep learning (CNN)
• Predicts whether a protein and ligand bind, without explicit computation of chemical interactions
• Predicted more accurately than existing structure-based approaches
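AtomNet feeds the 3D structure of a protein-ligand complex to 3D convolutions, which requires rasterizing atoms onto a grid first. A minimal sketch of that voxelization step follows; the grid size, cell size, channel set, and function name are illustrative assumptions, not AtomNet's actual parameters:

```python
import numpy as np

# Illustrative voxelization of a protein-ligand complex: one occupancy
# channel per atom type, on a 20x20x20 grid of 1 Angstrom cells.
GRID, CELL = 20, 1.0
CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}

def voxelize(atoms):
    """atoms: list of (element, x, y, z) with coordinates in [0, 20).
    Returns a (channels, GRID, GRID, GRID) occupancy tensor suitable
    as input to a 3D convolutional network."""
    grid = np.zeros((len(CHANNELS), GRID, GRID, GRID), dtype=np.float32)
    for elem, x, y, z in atoms:
        i, j, k = int(x // CELL), int(y // CELL), int(z // CELL)
        grid[CHANNELS[elem], i, j, k] += 1.0
    return grid

# A hypothetical three-atom fragment of a complex.
complex_atoms = [("C", 3.2, 4.1, 5.9), ("N", 3.9, 4.4, 6.2), ("O", 10.5, 2.0, 7.7)]
g = voxelize(complex_atoms)
```

On such a tensor, a 3D convolutional filter sees a small spatial neighborhood of atom types at a time, which is how the paper's filters can come to act as detectors for local chemical groups such as sulfonyl/sulfonamide.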
  • 111.
AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery (Wallach, Dzamba, Heifets; Atomwise, Inc.; arXiv:1510.02855)
• Known protein–ligand 3D binding structures are learned with deep learning (CNN)
• Predicts whether a protein and ligand bind, without explicit computation of chemical interactions
• Predicted more accurately than existing structure-based approaches
  • 113.
    604 VOLUME 35 NUMBER 7 JULY 2017 NATURE BIOTECHNOLOGY. AI-powered drug discovery captures pharma interest. A drug-hunting deal inked last month, between Numerate, of San Bruno, California, and Takeda Pharmaceutical to use Numerate’s artificial intelligence (AI) suite to discover small-molecule therapies for oncology, gastroenterology and central nervous system disorders, is the latest in a growing number of research alliances involving AI-powered computational drug development firms. Also last month, GNS Healthcare of Cambridge, Massachusetts announced a deal with Roche subsidiary Genentech of South San Francisco, California to use GNS’s AI platform to better understand what affects the efficacy of known therapies in oncology. In May, Exscientia of Dundee, Scotland, signed a deal with Paris-based Sanofi that includes up to €250 ($280) million in milestone payments. Exscientia will provide the compound design and Sanofi the chemical synthesis of new drugs for diabetes and cardiovascular disease. The trend indicates that the pharma industry’s long-running skepticism about AI is softening into genuine interest, driven by AI’s promise to address the industry’s principal pain point: clinical failure rates. The industry’s willingness to consider AI approaches reflects the reality that drug discovery is laborious, time consuming and not particularly effective. A two-decade-long downward trend in clinical success rates has only recently improved (Nat. Rev. Drug Disc. 15, 379–380, 2016). Still, today, only about one in ten drugs that enter phase 1 clinical trials reaches patients. Half those failures are due to a lack of efficacy, says Jackie Hunter, CEO of BenevolentBio, a division of BenevolentAI of London. “That tells you we’re not picking the right targets,” she says. “Even a 5 or 10% reduction in efficacy failure would be amazing.” Hunter’s views on AI in drug discovery are featured in Ernst & Young’s Biotechnology Report 2017, released last month.
Companies that have been watching AI from the sidelines are now jumping in. The best-known machine-learning model for drug discovery is perhaps IBM’s Watson. IBM signed a deal in December 2016 with Pfizer to aid the pharma giant’s immuno-oncology drug discovery efforts, adding to a string of previous deals in the biopharma space (Nat. Biotechnol. 33, 1219–1220, 2015). IBM’s Watson hunts for drugs by sorting through vast amounts of textual data to provide quick analyses, and tests hypotheses by sorting through massive amounts of laboratory data, clinical reports and scientific publications. BenevolentAI takes a similar approach with algorithms that mine the research literature and proprietary research databases. The explosion of biomedical data has driven much of industry’s interest in AI (Table 1). The confluence of ever-increasing computational horsepower and the proliferation of large data sets has prompted scientists to seek learning algorithms that can help them navigate such massive volumes of information. A lot of the excitement about AI in drug discovery has spilled over from other fields. Machine vision, which allows, among other things, self-driving cars, and language processing have given rise to sophisticated multilevel artificial neural networks known as deep-learning algorithms that can be used to model biological processes from assay data as well as textual data. In the past people didn’t have enough data to properly train deep-learning algorithms, says Mark Gerstein, a biomedical informatics professor at Yale University in New Haven, Connecticut. Now researchers have been able to build massive databases and harness them with these algorithms, he says. “I think that excitement is justified.” Numerate is one of a growing number of AI companies founded to take advantage of that data onslaught as applied to drug discovery. “We apply AI to chemical design at every stage,” says Guido Lanza, Numerate’s CEO.
It will provide Tokyo-based Takeda with candidates for clinical trials by virtual compound screenings against targets, designing and optimizing compounds, and modeling absorption, distribution, metabolism and excretion, and toxicity. The agreement includes undisclosed milestone payments and royalties. Academic laboratories are also embracing AI tools. In April, Atomwise of San Francisco launched its Artificial Intelligence Molecular Screen awards program, which will deliver 72 potentially therapeutic compounds to as many as 100 university research labs at no charge. Atomwise is a University of Toronto spinout that in 2015 secured an alliance with Merck of Kenilworth, New Jersey. For this new endeavor, it will screen 10 million molecules using its AtomNet platform to provide each lab with 72 compounds aimed at a specific target of the laboratory’s choosing. The Japanese government launched in 2016 a research consortium centered on using Japan’s K supercomputer to ramp up drug discovery efficiency across dozens of local companies and institutions. Among those involved are Takeda and tech giants Fujitsu of Tokyo, Japan, and NEC, also of Tokyo, as well as Kyoto University Hospital and Riken, Japan’s National Research and Development Institute, which will provide clinical data. [Photo caption: Deep learning is starting to gain acolytes in the drug discovery space. Credit: KTSDESIGN/Science Photo Library]
  • 114.
Genomics data analytics startup WuXi NextCode Genomics, of Shanghai; Cambridge, Massachusetts; and Reykjavík, Iceland, collaborated with researchers at Yale University on a study that used the company’s deep-learning algorithm to identify a key mechanism in blood vessel growth. The result could aid drug discovery efforts aimed at inhibiting blood vessel growth in tumors (Nature doi:10.1038/nature22322, 2017).
In the US, during the Obama administration, industry and academia joined forces to apply AI to accelerate drug discovery as part of the Cancer Moonshot initiative (Nat. Biotechnol. 34, 119, 2016). The Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium, launched in January 2016, marries computational and experimental approaches, with Brentford, UK-based GlaxoSmithKline participating with Lawrence Livermore National Laboratory in Livermore, California, and the US National Cancer Institute. The computational portion of the process, which includes deep-learning and other AI algorithms, will be tested in the first two years. In the third year, “we hope to start on day one with a disease hypothesis and on day 365 to deliver a drug candidate,” says Martha Head, GlaxoSmithKline’s head, insights from data.
Table 1. Selected collaborations in the AI-drug discovery space (source: companies’ websites):
• Atomwise: deep-learning screening from molecular structure data; partner: Merck; indication: malaria; announced 2015
• BenevolentAI: deep learning and natural language processing of research literature; partner: Janssen Pharmaceutica (Johnson & Johnson), Beerse, Belgium; multiple indications; November 8, 2016
• Berg (Framingham, Massachusetts): deep-learning screening of biomarkers from patient data; no announced partner; multiple indications
• Exscientia: bispecific compounds via Bayesian models of ligand activity from drug discovery data; partner: Sanofi; metabolic diseases; May 9, 2017
• GNS Healthcare: Bayesian probabilistic inference for investigating efficacy; partner: Genentech; oncology; June 19, 2017
• Insilico Medicine: deep-learning screening from drug and disease databases; no announced partner; age-related diseases
• Numerate: deep learning from phenotypic data; partner: Takeda; oncology, gastroenterology and central nervous system disorders; June 12, 2017
• Recursion (Salt Lake City, Utah): cellular phenotyping via image analysis; partner: Sanofi; rare genetic diseases; April 25, 2016
• twoXAR (Palo Alto, California): deep-learning screening from literature and assay data; partner: Santen Pharmaceuticals (Osaka, Japan); glaucoma; February 23, 2017
  • 115.
    • Can currently screen 10 million compounds per day
• 10,000× faster than physical experiments, and 100× faster than ultra-high-throughput screening (ultra HTS)
• Also used to characterize toxicity, side effects, mechanism of action, and efficacy
• Projects under way with 10 pharmaceutical companies, including Merck, and 40 research institutions, including Harvard
• Target diseases: Alzheimer’s disease, bacterial infections, antibiotics, nephrology, ophthalmology, immuno-oncology, metabolic and childhood liver diseases, and others
  • 116.
    BRIEF COMMUNICATION https://doi.org/10.1038/s41587-019-0224-x. Affiliations: 1 Insilico Medicine Hong Kong Ltd, Pak Shek Kok, New Territories, Hong Kong. 2 WuXi AppTec Co., Ltd, Shanghai, China. 3 Department of Chemistry, University of Toronto, Toronto, Ontario, Canada. 4 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 5 Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada. 6 Canadian Institute for Advanced Research, Toronto, Ontario, Canada. *e-mail: alex@insilico.com. We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice. Drug discovery is resource intensive, and involves typical timelines of 10–20 years and costs that range from US$0.5 billion to US$2.6 billion [1,2]. Artificial intelligence promises to accelerate this process and reduce costs by facilitating the rapid identification of compounds [3,4]. Deep generative models are machine learning techniques that use neural networks to produce new data objects. These techniques can generate objects with certain properties, such as activity against a given target, that make them well suited for the discovery of drug candidates. However, few examples of generative drug design have achieved experimental validation involving synthesis of novel compounds for in vitro and in vivo investigation [5–16].
Discoidin domain receptor 1 (DDR1) is a collagen-activated pro-inflammatory receptor tyrosine kinase that is expressed in epithelial cells and involved in fibrosis [17]. However, it is not clear whether DDR1 directly regulates fibrotic processes, such as myofibroblast activation and collagen deposition, or earlier inflammatory events that are associated with reduced macrophage infiltration. Since 2013, at least eight chemotypes have been published as selective DDR1 (or DDR1 and DDR2) small-molecule inhibitors (Supplementary Table 1). Recently, a series of highly selective, spiro-indoline-based DDR1 inhibitors were shown to have potential therapeutic efficacy against renal fibrosis in a Col4a3–/– mice model of Alport syndrome [18]. A wider diversity of DDR1 inhibitors would therefore enable further basic understanding and therapeutic intervention. We developed generative tensorial reinforcement learning (GENTRL), a machine learning approach for de novo drug design. GENTRL prioritizes the synthetic feasibility of a compound, its effectiveness against a given biological target, and how distinct it is from other molecules in the literature and patent space. In this work, GENTRL was used to rapidly design novel compounds that are active against DDR1 kinase. Six of these compounds, each complying with Lipinski’s rules [1], were designed, synthesized, and experimentally tested in 46 days, which demonstrates the potential of this approach to provide rapid and effective molecular design (Fig. 1a). To create GENTRL, we combined reinforcement learning, variational inference, and tensor decompositions into a generative two-step machine learning algorithm (Supplementary Fig. 1) [19]. First, we learned a mapping of chemical space, a set of discrete molecular graphs, to a continuous space of 50 dimensions. We parameterized the structure of the learned manifold in the tensor train format to use partially known properties.
Our auto-encoder-based model compresses the space of structures onto a distribution that parameterizes the latent space in a high-dimensional lattice with an exponentially large number of multidimensional Gaussians in its nodes. This parameterization ties latent codes and properties, and works with missing values without their explicit input. In the second step, we explored this space with reinforcement learning to discover new compounds. GENTRL uses three distinct self-organizing maps (SOMs) as reward functions: the trending SOM, the general kinase SOM, and the specific kinase SOM. The trending SOM is a Kohonen-based reward function that scores compound novelty using the application priority date of structures that have been disclosed in patents. Neurons that are abundantly populated with novel chemical entities reward the generative model. The general kinase SOM is a Kohonen map that distinguishes kinase inhibitors from other classes of molecules. The specific kinase SOM isolates DDR1 inhibitors from the total pool of kinase-targeted molecules. GENTRL prioritizes the structures it generates by using these three SOMs in sequence. We used six data sets to build the model: (1) a large set of molecules derived from a ZINC data set, (2) known DDR1 kinase inhibitors, (3) common kinase inhibitors (positive set), (4) molecules that act on non-kinase targets (negative set), (5) patent data for biologically active molecules that have been claimed by pharmaceutical companies, and (6) three-dimensional (3D) structures for DDR1 inhibitors (Supplementary Table 1). Data sets were preprocessed to exclude gross outliers and to reduce the number of compounds that contained similar structures (see Methods). We started to train GENTRL (pretraining) on a filtered ZINC database (data set 1, described earlier), and then continued training using the DDR1 and common kinase inhibitors (data set 2 and data set 3).
We then launched the reinforcement learning stage with the reward described earlier. We obtained an initial output of 30,000 structures (Supplementary Data Set), which were then […] (Alex Zhavoronkov et al., “Deep learning enables rapid identification of potent DDR1 kinase inhibitors,” Nature Biotechnology, Vol. 37, 1038–1040, September 2019, www.nature.com/naturebiotechnology)
• Designing drug candidate molecules in just 46 days with deep learning
• A study from Insilico Medicine, a Hong Kong-based AI drug discovery company (Nat Biotechnol, Sep 2019)
• Identified drug candidates against DDR1, a target implicated in fibrosis
• Designed a deep generative model based on reinforcement learning
• Training data: a database of known small molecules, known DDR1 inhibitors, 3D structures of DDR1 inhibitors, etc.
• Uses three SOMs as reward functions: the trending, general kinase, and specific kinase SOMs
• Open question: can this method be applied generally to other targets?
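The Kohonen-map reward mechanism described above can be sketched with a tiny self-organizing map. Everything here (the grid size, learning schedule, toy "fingerprints", and the per-unit scores) is an illustrative assumption, not the paper's actual trending/kinase SOMs:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(5, 5), epochs=200, lr0=0.5, sigma0=2.0):
    """Train a tiny Kohonen self-organizing map on fingerprint-like vectors."""
    n_units = grid[0] * grid[1]
    weights = rng.random((n_units, data.shape[1]))
    # 2D coordinates of each unit, used by the neighborhood function.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                 # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3    # shrinking neighborhood
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        # Pull every unit toward x, weighted by its grid distance to the BMU.
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def som_reward(x, weights, unit_scores):
    """Reward = score attached to the best-matching unit (e.g. the fraction
    of kinase inhibitors, or of recent patent structures, mapped there)."""
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    return unit_scores[bmu]

# Toy 'fingerprints': two clusters standing in for kinase vs. non-kinase sets.
data = np.vstack([rng.normal(0.2, 0.05, (50, 8)), rng.normal(0.8, 0.05, (50, 8))])
weights = train_som(data)
unit_scores = rng.random(len(weights))  # placeholder per-unit scores
r = som_reward(data[0], weights, unit_scores)
```

In GENTRL the generator is rewarded by chaining three such maps in sequence; this sketch shows only the mechanics of one map being trained and queried.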
  • 117.
[Fig. 1 of Zhavoronkov et al.: the GENTRL workflow and timeline, from target selection (DDR1 kinase, chosen by WuXi AppTec), through model training and generation of 30,000 structures, prioritization to 40 structures, synthesis-route analysis, and biological evaluation of six compounds by Day 46; the lead compounds reached IC50 (DDR1) of 10 nM and 21 nM, versus 15.5 nM for the parent structure.]
• Designing drug candidate molecules in just 46 days with deep learning
• Designed some 30,000 compounds with the database-driven, reinforcement learning-based model (Day 21)
• Selected 40 compounds that cover the whole chemical space, based on several criteria
• Of these, six were successfully synthesized (Day 35)
• Among the six candidates, compounds 1 and 2 showed strong inhibition in vitro (IC50 = 10 nM and 21 nM)
• Compounds 1 and 2 were confirmed to effectively inhibit fibrotic markers in cell-based assays
• Compound 1 was further characterized in a rodent model, including its half-life (Day 46)
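The down-selection from 30,000 generated structures to 40 diverse ones can be illustrated with a standard MaxMin diversity picker. This is a generic sketch over toy descriptor vectors, under our own assumptions, not Insilico's actual prioritization pipeline (which also applies descriptors, medicinal-chemistry filters, and clustering):

```python
import numpy as np

def maxmin_select(fps, k):
    """Greedy MaxMin picking: choose k items that keep the minimum pairwise
    distance large, a common way to cover chemical space with few picks."""
    fps = np.asarray(fps, dtype=float)
    chosen = [0]  # seed with the first structure
    # Each candidate's distance to its nearest already-chosen item.
    min_d = np.linalg.norm(fps - fps[0], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_d))  # farthest from everything chosen so far
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(fps - fps[nxt], axis=1))
    return chosen

rng = np.random.default_rng(1)
fps = rng.random((300, 16))      # toy descriptor vectors for 300 structures
picks = maxmin_select(fps, 40)   # down-select to 40 diverse structures
```

With binary fingerprints one would normally swap the Euclidean distance for Tanimoto dissimilarity; the greedy structure stays the same.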
  • 118.
    Digital Healthcare in Drug Development: Target Discovery, Lead Discovery, Clinical Trial, Post-Market Surveillance
• Patient recruitment
• Data measurement: sensors and wearables
• Digital phenotyping
• Medication adherence
  • 120.
    Empowering the Oncology Community for Cancer Care: Genomics, Oncology, Clinical Trial Matching. Watson Health’s oncology clients span more than 35 hospital systems. Andrew Norden, “Empowering the Oncology Community for Cancer Care,” KOTRA Conference, March 2017, “The Future of Health is Cognitive”
  • 121.
    IBM Watson Health: Watson for Clinical Trial Matching (CTM)
Current challenges:
• Searching across eligibility criteria of clinical trials is time consuming and labor intensive
• Fewer than 5% of adult cancer patients participate in clinical trials (1. According to the National Comprehensive Cancer Network (NCCN))
• 37% of sites fail to meet minimum enrollment targets, and 11% of sites fail to enroll a single patient (2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf)
The Watson solution:
• Uses structured and unstructured patient data to quickly check eligibility across relevant clinical trials
• Provides eligible trial considerations ranked by relevance
• Increases speed to qualify patients
Clinical investigators (opportunity), trials to patient: perform feasibility analysis for a trial, identify sites with the most potential for patient enrollment, and optimize inclusion/exclusion criteria in protocols; faster, more efficient recruitment strategies and better designed protocols.
Point of care (offering), patient to trials: quickly find the right trial that a patient might be eligible for among hundreds of open trials available; improve patient care quality and consistency, with increased efficiency.
(© 2015 International Business Machines Corporation)
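The structured half of such trial matching can be sketched as rule evaluation over patient fields. The trial names, criteria, and relevance ranking below are hypothetical, and Watson CTM's actual handling of unstructured clinical notes is not modeled here:

```python
# Hypothetical trials with structured inclusion/exclusion criteria.
trials = {
    "NCT-A": {"inclusion": [("diagnosis", "==", "NSCLC"), ("age", ">=", 18)],
              "exclusion": [("ecog", ">", 2)]},
    "NCT-B": {"inclusion": [("diagnosis", "==", "NSCLC"),
                            ("egfr_mutation", "==", True)],
              "exclusion": []},
}

OPS = {"==": lambda a, b: a == b, ">=": lambda a, b: a >= b,
       ">": lambda a, b: a > b}

def check(patient, criterion):
    """A criterion counts as met only if the field is present and satisfied."""
    field, op, value = criterion
    return field in patient and OPS[op](patient[field], value)

def match_trials(patient, trials):
    """Return trials the patient is eligible for, ranked by how many
    inclusion criteria are explicitly satisfied (a crude relevance proxy)."""
    results = []
    for name, t in trials.items():
        if any(check(patient, c) for c in t["exclusion"]):
            continue  # a single met exclusion criterion disqualifies
        met = sum(check(patient, c) for c in t["inclusion"])
        if met == len(t["inclusion"]):
            results.append((name, met))
    return sorted(results, key=lambda r: -r[1])

patient = {"diagnosis": "NSCLC", "age": 64, "ecog": 1, "egfr_mutation": True}
matches = match_trials(patient, trials)
```

The hard part in practice is populating fields like `ecog` from free-text records, which is exactly where the natural language processing component comes in.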
  • 122.
    Digital Healthcare in Drug Development: Target Discovery, Lead Discovery, Clinical Trial, Post-Market Surveillance
• Personal genomic data analysis
• Blockchain-based genome analysis
• Deep learning-based candidate compounds
• AI partnerships with pharmaceutical companies
• Patient recruitment
• Data measurement: wearables
• Digital phenotyping
• Medication adherence
• Social media-based post-market surveillance (PMS)
• Blockchain-based PMS
  • 123.
  • 124.
  • 125.
    Digital Healthcare in Drug Development: Target Discovery, Lead Discovery, Clinical Trial, Post-Market Surveillance
• Personal genomic data analysis
• Blockchain-based genome analysis
• Deep learning-based candidate compounds
• AI partnerships with pharmaceutical companies
• Patient recruitment
• Data measurement: wearables
• Digital phenotyping
• Medication adherence
• Social media-based post-market surveillance (PMS)
• Blockchain-based PMS
+ Digital Therapeutics
  • 126.
  • 127.
    “The Birth of Prescription Digital Therapeutics,” Pear Therapeutics and InCrowd, IIeX 2018
  • 128.
  • 129.
    “It is not a ‘game’ with therapeutic effects; it is a ‘therapeutic’ that (as it happens) takes the form of a game.” by Eddie Martucci, CEO of Akili Interactive, at DTxDM East 2018
  • 130.
    www.dtxalliance.org, “Digital Therapeutics: Combining Technology and Evidence-based Medicine to Transform Personalized Patient Care,” Nov 2018
  • 131.
    www.dtxalliance.org, “Defining Digital Therapeutics.” Thought leaders across the digital therapeutics industry, supported by the Digital Therapeutics Alliance, collaborated to develop the following comprehensive definition: Digital therapeutics (DTx) deliver evidence-based therapeutic interventions to patients that are driven by high quality software programs to prevent, manage, or treat a medical disorder or disease. They are used independently or in concert with medications, devices, or other therapies to optimize patient care and health outcomes. DTx products incorporate advanced technology best practices relating to design, clinical validation, usability, and data security. They are reviewed and cleared or approved by regulatory bodies as required to support product claims regarding risk, efficacy, and intended use. Digital therapeutics empower patients, healthcare providers, and payers with intelligent and accessible tools for addressing a wide range of conditions through high quality, safe, and effective data-driven interventions. Digital therapeutics present the market with evidence-based technologies that have the ability to elevate medical best practices, address unmet medical needs, expand healthcare access, and improve clinical and health economic outcomes.
• Sophisticated software programs that prevent, manage, or treat a disease
• Can be used on their own, or together with medications, devices, or other therapies
• Claims about efficacy, intended use, and risk are subject to regulatory review and clearance or approval
  • 132.
    www.dtxalliance.org, "Developing Industry Standards": The direct delivery of personalized treatment interventions to patients places digital therapeutics in a unique position, one full of additional responsibility and promise. Given the diversity of interventions being delivered by digital therapeutics and the types of disease states addressed, it is important for all products to adhere to industry-adopted core principles and best practices. Core principles all digital therapeutics must adhere to:
    • Prevent, manage, or treat a medical disorder or disease
    • Produce a medical intervention that is driven by software, and delivered via software or complementary hardware, medical device, service, or medication
    • Incorporate design, manufacture, and quality best practices
    • Engage end users in product development and usability processes
    • Incorporate patient privacy and security protections
    • Apply product deployment, management, and maintenance best practices
    • Publish trial results inclusive of clinically meaningful outcomes in peer-reviewed journals
    • Be reviewed and cleared or approved by regulatory bodies as required to support product claims of risk, efficacy, and intended use
    • Make claims appropriate to clinical validation and regulatory status
    • Collect, analyze, and apply real world evidence and product performance data
    Digital therapeutics are designed to integrate into patient lifestyles and provider workflows to deliver a fully integrated healthcare experience with improved outcomes.
    • Core principles that every digital therapeutic must follow:
    • the medical intervention is driven by software, and
    • delivered via software, or via complementary hardware, medical devices, or medication
  • 133.
    Digital healthcare (*SaMD: Software as a Medical Device)
    • Wellness apps and wearables: Fitbit (activity-tracking wearable), Runkeeper (running-monitoring app), Sleep Cycle (sleep-monitoring app)
    • Hardware-based medical devices: X-ray machines, blood pressure monitors, thermometers, Empatica (wearable for detecting epileptic seizures), AliveCor (ECG-measuring gadget), Proteus (ingestible sensor for tracking medication)
    • Medical AI (software, SaMD*): Watson (assists cancer care), VUNO (bone age from X-rays, etc.), Lunit (lung nodules on X-rays, etc.), Zebra Medical (pneumothorax on X-rays, etc.), IDx (diabetic retinopathy from fundus photographs)
    • Digital therapeutics: Calm (meditation app), Pear (addiction-treatment app), Akili (ADHD-treatment game), AppliedVR (VR for pain relief)
  • 134.
    Types of digital therapeutics, by product purpose:
    1. Health management: claims about efficacy, risk, and intended use are at the regulator's discretion (not always regulated); efficacy claims related to disease are not allowed; purchased directly by patients (DTC, no prescription required); used independently or as indirect support for other medications.
    2. Disease management/prevention: claims require third-party validation and regulatory oversight; low-to-moderate risk claims (e.g., slowing disease progression); sold over-the-counter or by prescription; administered alone or in combination with other medications.
    3. Optimization of other medications: regulated claims; moderate-to-high risk (e.g., increasing the efficacy of an existing drug); prescription required; administered in combination.
    4. Disease treatment: regulated claims; moderate-to-high risk (e.g., medical efficacy such as treating the disease); prescription required; administered alone or in combination.
    Clinical evidence: clinical trials are required, along with continuous generation of evidence.
  • 135.
    Case studies of representative digital therapeutics: • Pear Therapeutics • Akili Interactive • Click Therapeutics • Dthera Science • Noom, Omada Health • Hurray Positive, SK Health Connect • Virtual Vietnam • AppliedVR • Woebot • Cognoa • Propeller Health • Neofect
  • 136.
    Case studies of representative digital therapeutics: • Pear Therapeutics • Akili Interactive • Click Therapeutics • Dthera Science • Noom, Omada Health • Hurray Positive, SK Health Connect • Virtual Vietnam • AppliedVR • Woebot • Cognoa • Propeller Health • Neofect
  • 137.
  • 138.
    Pear Therapeutics
    • Pear Therapeutics' reSET
    • Prescribed by a physician; treats addiction to and dependence on alcohol, cocaine, cannabis, and other substances over 12 weeks
    • The first smartphone app alone to receive FDA clearance (De Novo) as a treatment (September 2017)
    • The industry regards Pear Therapeutics as the origin of digital therapeutics
  • 139.
    • reSET's Indication for Use
    • For outpatients aged 18 and older with Substance Use Disorder (SUD)
    • Under clinician supervision, adjunctive to the existing contingency management system
    • Provides 12 weeks of CBT (Cognitive Behavioral Therapy)
    • Aims to increase abstinence from substances and retention in the treatment program
  • 140.
    RCT of reSET (De Novo classification request for reSET): logistic Generalized Estimating Equations (GEE) model with factors for treatment, time, and the treatment-by-time interaction. Missing data were treated as failures. The abstinence analyses for cohorts 1 and 2 are presented below, additionally compared by abstinence at baseline. The abstinence analyses were completed in the context of a GEE model that incorporates within-subject variability across the observation window and estimates abstinence at specified time points; based on the model, the analyses yield percentages rather than absolute numbers. The number of patients reported in the table below represents the number of patients in that entire group (e.g., N=252 patients in Cohort 1 were in the TAU group overall; N=139 patients were abstinent at baseline in the Cohort 1 TAU group). Table 3: Abstinence rates in Cohorts 1 (N=507) and 2 (N=399). Patients who received rTAU + reSET had statistically significantly increased odds of remaining abstinent at the end of treatment: Cohort 1: odds ratio=2.22, 95% CI (1.24, 3.99), p=0.0076; Cohort 2: odds ratio=3.17, 95% CI (1.68, 5.99), p=0.0004. Cohort 3 (all opioids excluded; N=153 TAU, N=152 rTAU+reSET) had similar abstinence to cohorts 1 and 2, with abstinence rates of 38.5% in the rTAU + reSET arm compared to 17.5% in the TAU arm (odds ratio=2.95, 95% CI (1.43, 6.09), p=0.0034). Patients who were abstinent at baseline were significantly more likely to remain abstinent throughout the study than patients who were not abstinent at baseline, in both the TAU and the rTAU + reSET groups.
    • RCT comparing TAU (Treatment As Usual) against rTAU (reduced TAU) + reSET
    • Analyzed separately with primary-opioid users included and excluded
    • Patients abstinent and non-abstinent at baseline analyzed separately
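The reported odds ratios come from the GEE model, but the basic arithmetic of an odds ratio and its Wald confidence interval can be sketched on a plain 2x2 table. The counts below are illustrative, not the trial's actual cell counts:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table:
    a = treated & abstinent, b = treated & not abstinent,
    c = control & abstinent, d = control & not abstinent."""
    or_ = (a / b) / (c / d)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: 30/100 abstinent with treatment vs 15/100 with control
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.43 1.21 4.87
```

A confidence interval whose lower bound stays above 1.0, as in all three reSET cohorts, is what makes the treatment effect statistically significant.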
  • 141.
    RCT of reSET (De Novo classification request for reSET; same abstinence analysis as above)
    • In Cohort 2 (primary-opioid users excluded),
    • statistically significant differences both overall and in the non-abstinent-at-baseline group
  • 142.
    RCT of reSET (De Novo classification request for reSET): Figure 2 shows the Kaplan-Meier curve for Cohort 1 (all comers). Adverse events: in the entire clinical study, 13% of patients (n=66) had at least one adverse event: 29 patients (11.5%) in the TAU arm and 37 (14.5%) in the reSET + rTAU arm (p=0.3563). None of the adverse events in the reSET arm were adjudicated by the study investigators to be device-related; the events were typical of patients with SUD, including cardiovascular disease, gastrointestinal events, depression, mania, and suicidal behavior.
    • Compared 12-week retention between TAU and rTAU + reSET in Cohort 1
    • Retention also showed a statistically significant difference
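The Kaplan-Meier retention curve referenced above is built with the product-limit estimator. A minimal stdlib-only sketch, using hypothetical dropout data rather than the trial's, looks like this:

```python
def kaplan_meier(data):
    """Product-limit (Kaplan-Meier) estimator.
    data: list of (time, event) pairs, where event=1 means dropout/relapse
    at that time and event=0 means the observation was censored.
    Returns [(time, survival_probability)] at each event time."""
    s, curve = 1.0, []
    for t in sorted({ti for ti, e in data if e == 1}):
        at_risk = sum(1 for ti, _ in data if ti >= t)            # still in study just before t
        events = sum(1 for ti, e in data if ti == t and e == 1)  # dropouts exactly at t
        s *= 1 - events / at_risk
        curve.append((t, s))
    return curve

# Hypothetical retention data in weeks: (time, 1=dropped out, 0=censored)
data = [(1, 1), (2, 1), (2, 1), (3, 0), (4, 1)]
print(kaplan_meier(data))
```

Censored patients (event=0) still count toward the at-risk denominator until their censoring time, which is why the estimator handles incomplete follow-up correctly.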
  • 144.
    • Somryst: a DTx for insomnia, cleared by the FDA in March 2020
    • Pursued 510(k) clearance and Pre-Cert review simultaneously
    • CBT-i (cognitive behavioral therapy for insomnia) software, available only by prescription
    • Efficacy demonstrated through two RCTs (randomized controlled trials)
    • More than 1,400 patients with insomnia, or with depression plus insomnia, enrolled in total
    • In one large trial, 9 weeks of treatment in over 1,000 adults produced a significant improvement in insomnia
    • The clinical benefit persisted for 18 months
  • 145.
    Case studies of representative digital therapeutics: • Pear Therapeutics • Akili Interactive • Click Therapeutics • Dthera Science • Noom, Omada Health • Hurray Positive, SK Health Connect • Virtual Vietnam • AppliedVR • Woebot • Cognoa • Propeller Health • Neofect
  • 147.
  • 148.
    Anguera JA, Boccanfuso J, Rintoul JL, et al. "Video game training enhances cognitive control in older adults." Nature 501, 97 (5 September 2013). doi:10.1038/nature12486
    Key findings:
    • Multitasking performance, assessed with a custom-designed three-dimensional video game (NeuroRacer), exhibits a linear age-related decline from 20 to 79 years of age; the only significant difference between adjacent decades was the increase in multitasking cost from the twenties (-26.7%) to the thirties (-38.6%).
    • By playing an adaptive version of NeuroRacer in multitasking training mode (1 hour, 3 times per week, for 1 month), older adults (60 to 85 years old) reduced multitasking costs compared with both an active control group and a no-contact control group, attaining levels beyond those achieved by untrained 20-year-old participants, with gains persisting for 6 months.
    • Age-related deficits in EEG signatures of cognitive control were remediated by the training (enhanced midline frontal theta power and frontal-posterior theta coherence).
    • Training benefits extended to untrained cognitive control abilities (sustained attention and working memory), with the increase in midline frontal theta power predicting the training-induced boost in sustained attention and the preservation of multitasking improvement 6 months later.
    • Multitasking 'cost' was the percentage change in perceptual discriminability (d') from the 'sign only' condition to the 'sign and drive' condition; an adaptive staircase algorithm first set each component task so that every participant performed it in isolation at about 80% accuracy, ensuring that group comparisons reflected the ability to multitask rather than disparities in component skills.
    (Figure 1: NeuroRacer experimental conditions and training design.)
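The adaptive staircase mechanic described in the paper, which keeps each player at roughly 80% accuracy, can be sketched generically. The update rule below is a simple weighted up-down staircase for illustration, not Akili's or the paper's actual implementation:

```python
def staircase_update(difficulty, correct, step=0.05, target=0.8):
    """One step of a weighted up-down staircase that converges to `target` accuracy.
    On a correct response, raise difficulty by step * (1 - target);
    on an error, lower it by step * target.
    The drift is zero exactly when p_correct * up_size == (1 - p_correct) * down_size,
    i.e. when p_correct == target."""
    if correct:
        return difficulty + step * (1 - target)
    return difficulty - step * target
```

At the 0.8 target, four correct responses and one error cancel out (4 × 0.05 × 0.2 = 1 × 0.05 × 0.8), so a player answering 80% correctly stays at a stable difficulty.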
  • 149.
    Anguera JA, Brandes-Aitken AN, Rolle CE, et al. "Characterizing cognitive control abilities in children with 16p11.2 deletion using adaptive 'video game' technology: a pilot study." Translational Psychiatry (2016) 6, e893. doi:10.1038/tp.2016.178
    Key findings:
    • 20 children carrying the 16p11.2 deletion, a genetic variation implicated in attention deficit/hyperactivity disorder and autism, were assessed with an engaging, adaptive digital cognitive platform built to look and feel like a video game, alongside 16 siblings without the deletion and 75 neurotypical age-matched children.
    • Deletion carriers showed significantly slower response times and greater response variability than all non-carriers; by comparison, traditional non-adaptive selective-attention assessments were unable to discriminate group differences.
    (Figure 2: Project: EVO selective attention performance, comparing single- and multitasking response times across groups.)
    • Through the game Project: EVO,
    • carriers of a specific genotype linked to attention disorders in children could be identified
    • Significant carrier vs. non-carrier differences based on in-game response time
  • 150.
    Anguera JA, Brandes-Aitken AN, Antovich AD, Rolle CE, Desai SS, Marco EJ. "A pilot study to determine the feasibility of enhancing cognitive abilities in children with sensory processing dysfunction." PLoS ONE 12(4): e0172616 (April 5, 2017). https://doi.org/10.1371/journal.pone.0172616
    Key findings:
    • 38 children with Sensory Processing Dysfunction (SPD) and 25 typically developing children were tested on behavioral, neural, and parental measures of attention before and after a 4-week, iPad-based, at-home cognitive remediation program.
    • At baseline, 54% of the children with SPD met or exceeded criteria for inattention/hyperactivity on a parent-report measure; significant deficits in sustained attention, selective attention, and goal management, along with reduced midline frontal theta activity (an EEG measure of attention), were observed only in this subset.
    • Following the intervention, only the SPD children with inattention/hyperactivity improved on both midline frontal theta activity and parent-reported inattention; notably, 33% of them no longer met the clinical cut-off for inattention, with the parent-reported improvements persisting for 9 months.
    (Fig 4: Transfer effect on behavioral and parent-report measures.)
    • An experiment on 20 pediatric patients with sensory processing dysfunction (SPD) who also had ADHD
    • After playing the Project: EVO game for 4 weeks (5 days per week, 25 minutes per day),
    • 7 of the 20 improved so much that they no longer fell within the ADHD range
    • The effect persisted for at least 9 months after use
  • 151.
    • For ADHD, a large-scale phase III RCT is under way, aiming for FDA medical-device clearance
    • Patients aged 8-12 (n=330), with a video game that has no therapeutic effect as the control group
    • Primary endpoint: TOVA (Test of Variables of Attention)
    • Goal: an ADHD-treatment game prescribed by physicians and covered by insurers
  • 152.
    www.thelancet.com/digital-health Published onlineFebruary 24, 2020 https://doi.org/10.1016/S2589-7500(20)30017-0 1 Articles Lancet Digital Health 2020 Published Online February 24, 2020 https://doi.org/10.1016/ S2589-7500(20)30017-0 See Online/Comment https://doi.org/10.1016/ S2589-7500(20)30058-3 Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC, USA (Prof S H Kollins PhD, Prof R S E Keefe PhD); Duke Clinical Research Institute, Durham, NC, USA (Prof S H Kollins); Akili Interactive Labs, Boston, MA, USA (D J DeLoss PhD, E Cañadas PhD, J Lutz PhD); Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA (Prof R L Findling MD); VeraSci, Durham, NC, USA (Prof R S E Keefe); Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA (Prof J N Epstein PhD); Meridien Research Lake Erie College of Osteopathic Medicine, Bradenton, FL, USA (A J Cutler MD); and Psychiatry and Neuroscience and Physiology, SUNY Upstate Medical University, Syracuse, NY, USA (Prof S V Faraone PhD) Correspondence to: Dr Scott Kollins, Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC 27710, USA scott.kollins@duke.edu A novel digital intervention for actively reducing severity of paediatricADHD (STARS-ADHD): a randomised controlledtrial Scott H Kollins, Denton J DeLoss, Elena Cañadas, Jacqueline Lutz, Robert L Findling, Richard S E Keefe, Jeffery N Epstein, Andrew J Cutler, StephenV Faraone Summary Background Attention-deficit hyperactivity disorder (ADHD) is a common paediatric neurodevelopmental disorder with substantial effect on families and society. Alternatives to traditional care, including novel digital therapeutics, have shown promise to remediate cognitive deficits associated with this disorder and may address barriers to standard therapies, such as pharmacological interventions and behavioural therapy. 
AKL-T01 is an investigational digital therapeutic designed to target attention and cognitive control delivered through a video game-like interface via at-home play for 25 min per day, 5 days per week for 4 weeks. This study aimed to assess whether AKL-T01 improved attentional performance in paediatric patients with ADHD. Methods The Software Treatment for Actively Reducing Severity of ADHD (STARS-ADHD) was a randomised, double- blind, parallel-group, controlled trial of paediatric patients (aged 8–12 years, without disorder-related medications) with confirmed ADHD and Test of Variables of Attention (TOVA) Attention Performance Index (API) scores of −1·8 and below done by 20 research institutions in the USA. Patients were randomly assigned 1:1 to AKL-T01 or a digital control intervention. The primary outcome was mean change in TOVA API from pre-intervention to post-intervention. Safety, tolerability, and compliance were also assessed. Analyses were done in the intention-to-treat population. This trial is registered with ClinicalTrials.gov, NCT02674633 and is completed. Findings Between July 15, 2016, and Nov 30, 2017, 857 patients were evaluated and 348 were randomly assigned to receive AKL-T01 or control. Among patients who received AKL-T01 (n=180 [52%]; mean [SD] age, 9·7 [1·3] years) or control (n=168 [48%]; mean [SD] age, 9·6 [1·3] years), the non-parametric estimate of the population median change from baseline TOVA API was 0·88 (95% CI 0·24–1·49; p=0·0060). The mean (SD) change from baseline on the TOVA API was 0·93 (3·15) in the AKL-T01 group and 0·03 (3·16) in the control group. There were no serious adverse events or discontinuations. Treatment-related adverse events were mild and included frustration (5 [3%] of 180) and headache (3 [2%] of 180). Patient compliance was a mean of 83 (83%) of 100 expected sessions played (SD, 29·2 sessions). 
Interpretation Although future research is needed for this digital intervention, this study provides evidence that AKL-T01 might be used to improve objectively measured inattention in paediatric patients with ADHD, while presenting minimal adverse events. Funding Sponsored by Akili Interactive Labs. Copyright © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Introduction Attention-deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder of persistent impaired attention, hyperactivity, and impulsivity that negatively affects daily functioning and quality of life. ADHD is one of the most commonly diagnosed paediatric mental health disorders, with a prevalence estimated to be 5% worldwide,1 and exerts a substantial burden on families and society.2 Front-line intervention for ADHD includes pharmaco- logical and non-pharmacological interventions, which have shown short-term efficacy.3–5 Existing treatments have side-effects that limit their acceptability,6 are only effective when administered, and may not be as effective for reducing daily impairments versus ADHD symptoms.7 Pharmacotherapy may not be suitable for some patients due to caregiver preferences or concerns about abuse, misuse, and diversion. Barriers to access also limit the use of behavioural interventions, given a shortage of properly trained paediatric mental health specialists8 and variability in insurance coverage for such services.9,10 Indeed, studies in both the USA and the UK have found that most children with paediatric mental health needs do not have proper access to services.11,12 Digital therapeutics for ADHD may address these limitations with improved access, minimal side-effects, and low potential for abuse. Numerous studies and meta-analyses on digital interventions targeting specific cognitive functions have attempted to assess the magnitude of efficacy for children and adolescents with ADHD. 
In general, the quality of the studies is low, and many do not include a control group.3 Reported effect Lancet Digital Health 2020 6 www.thelancet.com/digital-health Published Figure 2: Primary endpoint:TOVA API mean (SE) change pre-intervention to post-intervention in the intention-to-treat population *Adjusted p0·050; prespecifiedWilcoxon rank-sum test.Triangle represents median change, pre-intervention to post-intervention. AKL-T01 (n=169) Active control (n=160) –0·25 0 0·25 0·50 0·75 Improveme Mean(SE)changein AKL-T01 Control χ² test p Test ofVariables of Attention—Attention Performance Index (type A: improvement 1·4 points) 79/169 (47%) 51/160 (32%) 7·60 0·0058 Attention Performance Index (type B: post-intervention score ≥0) 18/170 (11%) 7/160 (4%) 4·54 0·033 ADHD-Rating Scale (improvement ≥2 points from pre-intervention to post-intervention) 128/173 (74%) 119/164 (73%) 0·088 0·77 ADHD-Rating Scale (≥30% reduction)* 42/173 (24%) 31/164 (19%) 1·43 0·23 Impairment Rating Scale 82/171 (48%) 60/161 (37%) 3·87 0·049 Clinical Global Impressions (≤2 at post- intervention) 29/175 (17%) 26/164 (16%) 0·032 0·86 Clinical Global Impressions (1 at post-intervention) 1/175 (1%) 1/164 (1%) 0·0021 0·96 Data are n/N (%) unless otherwise indicated. AKL-T01=an investigational digital therapeutic. *Post-hoc analysis. ADHD=Attention-deficit hyperactivity disorder. AKL-T01=an investigational digital therapeutic. Table 2: Clinical responder analysis intention-to-treat population and between measures. This trial is registered with ClinicalTrials.gov, NCT02674633. Role of the funding source The funder had a role in study conception and design, confirming data and statistical analyses, and conducting the study. All authors had full access to all the data in the study and were involved in data interpretation and writing of the report. The corresponding author had final responsibility for the decision to submit for publication. 
Results: Of 857 children screened for eligibility, 348 patients were randomly assigned to receive AKL-T01 (n=180) or control (n=168) between July 15, 2016, and Nov 30, 2017 (figure 1 and appendix p 3). Demographic and clinical characteristics at baseline are shown in table 1. The mean number of sessions completed by patients in the AKL-T01 group was 83·2 out of 100 sessions (83% instructed use; SD=29·2 sessions). Patients in the control group used their intervention 480·7 min of 500 min (96% instructed use). There was a significant difference between intervention groups on the primary efficacy endpoint (adjusted p=0·0060); the non-parametric estimate of the population median change (Hodges-Lehmann estimate) was 0·88 (95% CI 0·24–1·49). The mean (SD) change from baseline on the TOVA API was 0·93 (3·15) in the AKL-T01 group and 0·03 (3·16) in the control group (figure 2). There were no intervention-group differences for the secondary measures: IRS, ADHD-RS, ADHD-RS-I, ADHD-RS-H, and BRIEF-Parent Inhibit, Working Memory, and Metacognition.

Table 1: Baseline characteristics (excerpt; mean (SD), AKL-T01 vs control): ADHD-Rating Scale—Inattentive 21·9 (3·5) vs 21·6 (3·7); ADHD-Rating Scale—Hyperactivity 17·1 (6·0) vs 16·7 (5·4); Clinical Global Impressions—Severity† 4·5 (0·7) vs 4·6 (0·6). AKL-T01=an investigational digital therapeutic. *n=179 for AKL-T01. †Assessed only at baseline.

• On the primary outcome, the TOVA API, AKL-T01 showed a significant improvement over the control group.
• None of the secondary outcomes showed a significant improvement.
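The primary analysis pairs a Wilcoxon rank-sum test with a Hodges-Lehmann estimate of the median shift. The sketch below is illustrative only: the individual-level trial data are not public, so the change scores are simulated from the reported group means and SDs.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(42)

# Simulated TOVA API change scores using the reported group summaries:
# AKL-T01 mean 0.93 (SD 3.15), n=169; control mean 0.03 (SD 3.16), n=160.
akl_t01 = rng.normal(0.93, 3.15, 169)
control = rng.normal(0.03, 3.16, 160)

# Wilcoxon rank-sum test for a location difference between groups
stat, p = ranksums(akl_t01, control)

# Hodges-Lehmann estimate of the shift:
# the median of all pairwise between-group differences
hl = np.median(akl_t01[:, None] - control[None, :])
print(f"rank-sum p={p:.4f}, Hodges-Lehmann shift={hl:.2f}")
```

With real data this yields the trial's reported estimate of 0·88; with simulated draws the exact numbers will vary around it.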
  • 153.
  • 154.
  • 155.
  • 156.
    • Woebot, a mental-health counseling chatbot startup • A chatbot for treating depression (via cognitive behavioral therapy), started by Stanford mental health experts • Professor Andrew Ng serves as chairman of the board
  • 157.
    • Woebot, a mental-health counseling chatbot • Explains things conversationally and checks the user's mental-health status, the way a human counselor would • Mostly no different from a questionnaire (the user picks one of several preset answers), but it can be seen as a UI innovation • It does not yet use very sophisticated NLP (only about once per session)
  • 158.
  • 159.
  • 160.
    Original Paper: Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
    Kathleen Kara Fitzpatrick1*, PhD; Alison Darcy2*, PhD; Molly Vierhile1, BA
    1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
    2 Woebot Labs Inc., San Francisco, CA, United States
    * These authors contributed equally
    Corresponding author: Alison Darcy, PhD, Woebot Labs Inc., 55 Fair Avenue, San Francisco, CA 94110, United States. Email: alison@woebot.io

    Abstract
    Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
    Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and depression.
    Methods: In an unblinded trial, 70 individuals aged 18-28 years were recruited online from a university community social media site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental Health ebook, "Depression in College Students," as an information-only control group (n=36). All participants completed Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7), and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
    Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and Caucasian (79%, 46/58).
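Both trials in this section measure depression with the PHQ-9, which is simply the sum of nine items each rated 0-3 (total 0-27), mapped onto standard severity bands. A small illustrative scorer (the function name and example item ratings are invented for illustration):

```python
# PHQ-9 scoring: nine items, each rated 0-3; total score 0-27.
# Standard severity bands: 0-4 minimal, 5-9 mild, 10-14 moderate,
# 15-19 moderately severe, 20-27 severe.
def phq9_score(items):
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("PHQ-9 needs nine item ratings, each 0-3")
    total = sum(items)
    for upper, label in [(4, "minimal"), (9, "mild"), (14, "moderate"),
                         (19, "moderately severe"), (27, "severe")]:
        if total <= upper:
            return total, label

print(phq9_score([2, 2, 1, 1, 2, 1, 2, 1, 2]))  # (14, 'moderate')
```

The GAD-7 works the same way (seven items rated 0-3, total 0-21), which is why both scales can be administered and scored entirely on the web, as in this trial.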
Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23) times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers, participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54=9.24; P=.004). Participants' comments suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional therapy.

Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT. (JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785

Keywords: conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health

Introduction: Up to 74% of mental health diagnoses have their first onset [...]; such symptoms are particularly common among college students, with more than half reporting symptoms of anxiety and depression in the previous year that were so severe they had difficulty functioning. (Fitzpatrick et al, JMIR Mental Health)
  • 161.
    ... depression at baseline as measured by the PHQ-9, while three-quarters (74%, 52/70) were in the severe range for anxiety as measured by the GAD-7.

    Figure 1. Participant recruitment flow.

    Table 1. Demographic and clinical variables of participants at baseline (Woebot vs information control):
    - Depression (PHQ-9), mean (SD): 14.30 (6.65) vs 13.25 (5.17)
    - Anxiety (GAD-7), mean (SD): 18.05 (5.89) vs 19.02 (4.27)
    - Positive affect, mean (SD): 25.54 (9.58) vs 26.19 (8.37)
    - Negative affect, mean (SD): 24.87 (8.13) vs 28.74 (8.92)
    - Age, mean (SD): 22.58 (2.38) vs 21.83 (2.24)
    - Gender, n (%): male 7 (21) vs 4 (7); female 27 (79) vs 20 (55)
    - Ethnicity, n (%): Latino/Hispanic 2 (6) vs 2 (8); non-Latino/Hispanic 32 (94) vs 22 (92); Caucasian 28 (82) vs 18 (75)

    Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial
    • A self-help chatbot for college students who consider themselves to have symptoms of anxiety and depression
    • Purpose: to assess the chatbot's feasibility, acceptability, and preliminary efficacy
    • 70 college students in total, over 2 weeks
    • Intervention group (Woebot): 34
    • Control group (information-only): 36
    • Outcomes: PHQ-9, GAD-7
  • 162.
    Table 2. Primary intention-to-treat analyses at T2 (mean (SE) and 95% CI per group; F; P; Cohen d):
    - PHQ-9: Woebot 11.14 (0.71), 95% CI 9.74-12.32; control 13.67 (0.81), 95% CI 12.07-15.27; F=6.03; P=.017; d=0.44
    - GAD-7: Woebot 17.35 (0.60), 95% CI 16.16-18.13; control 16.84 (0.67), 95% CI 15.52-18.56; F=0.38; P=.581; d=0.14
    - PANAS positive affect: Woebot 26.88 (1.29), 95% CI 24.35-29.41; control 26.02 (1.45), 95% CI 23.17-28.86; F=0.17; P=.707; d=0.02
    - PANAS negative affect: Woebot 25.98 (1.24), 95% CI 23.54-28.42; control 27.53 (1.42), 95% CI 24.73-30.32; F=0.91; P=.912; d=0.34
    Notes: T2 values are mean (standard error); 95% CI = 95% confidence interval; Cohen d shown for between-subjects effects using means and standard errors at Time 2.

    Figure 2. Change in mean depression (PHQ-9) score by group over the study period. Error bars represent standard error.

    Preliminary Efficacy: Table 2 shows the results of the primary ITT analyses conducted on the entire sample. Univariate ANCOVA revealed a significant treatment effect on depression, showing that those in the Woebot group significantly reduced their PHQ-9 score while those in the information control group did not (F1,48=6.03; P=.017) (see Figure 2). This represented a moderate between-groups effect size (d=0.44). This effect is robust after Bonferroni correction for multiple comparisons (P=.04). No other significant between-group differences were observed on anxiety or affect.

    Completer Analysis: As a secondary analysis, to explore whether any main effects existed, 2×2 repeated measures ANOVAs were conducted on the primary outcome variables (with the exception of PHQ-9) among completers only. A significant main effect was observed on GAD-7 (F1,54=9.24; P=.004), suggesting that completers experienced a significant reduction in symptoms of anxiety between baseline and T2, regardless of the group to which they were assigned, with a within-subjects effect size of d=0.37. No main effects were observed for positive (F1,50=.001; P=.951; d=0.21) or negative affect (F1,50=.06; P=.80; d=0.003) as measured by the PANAS.
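The ITT analysis here is an ANCOVA: the post-treatment (T2) score is regressed on group with the baseline score as a covariate. A sketch with simulated data shaped like this trial (group sizes and baseline moments echo the paper; the simulated treatment effect itself is invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Simulated PHQ-9 data shaped like the trial: n=34 Woebot, n=36 control
n_w, n_c = 34, 36
baseline = np.concatenate([rng.normal(14.3, 6.6, n_w),
                           rng.normal(13.2, 5.2, n_c)])
group = np.array(["woebot"] * n_w + ["control"] * n_c)

# Invented effect: the Woebot group improves ~3 points more than control
post = (baseline
        - np.where(group == "woebot", 3.0, 0.3)
        + rng.normal(0, 2.0, n_w + n_c))

df = pd.DataFrame({"post": post, "base": baseline, "group": group})

# ANCOVA as a linear model: T2 score ~ group + baseline covariate
fit = smf.ols("post ~ C(group) + base", data=df).fit()
# A negative group coefficient means a larger reduction under Woebot
print(fit.params["C(group)[T.woebot]"])
```

Adjusting for baseline this way is what lets the trial compare T2 means fairly despite the groups starting at slightly different baseline severities.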
To further elucidate the source and magnitude of change in depression, repeated measures dependent t tests were conducted and Cohen d effect sizes were calculated on individual items of the PHQ-9 among those in the Woebot condition. The analysis revealed that baseline-T2 changes were observed on the following items, in order of decreasing magnitude: motoric symptoms (d=2.09), appetite (d=0.65), little interest or pleasure in things (d=0.44), feeling bad about self (d=0.40), concentration (d=0.39), suicidal thoughts (d=0.30), feeling down (d=0.14), sleep (d=0.12), and energy (d=0.06). Change in mean depression (PHQ-9) score by group over the study period:
• Results
• Participants used the chatbot an average of 12.14 times over the 2 weeks
• There was a significant group difference for depression
• The Woebot group showed a significant reduction in depression (PHQ-9)
• The control group showed no significant reduction
• For anxiety, both groups showed a significant reduction (on the GAD-7)
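The item-level effect sizes quoted above are within-subject Cohen's d values; one common convention for paired data is the mean pre-post difference divided by the standard deviation of the differences. A minimal sketch (the example scores are invented, e.g. one PHQ-9 item rated 0-3 for four participants):

```python
import numpy as np

def paired_cohens_d(pre, post):
    """Within-subject Cohen's d: mean of the paired differences
    divided by the sample standard deviation of the differences."""
    diff = np.asarray(pre, dtype=float) - np.asarray(post, dtype=float)
    return diff.mean() / diff.std(ddof=1)

# Invented item scores: pre- and post-intervention ratings for four people
pre = [3, 2, 3, 1]
post = [1, 1, 2, 0]
print(round(paired_cohens_d(pre, post), 2))  # 2.5
```

By the usual rule of thumb, d around 0.2 is small, 0.5 medium, and 0.8 large, which is why the motoric-symptoms change (d=2.09) stands out so sharply from sleep (d=0.12) and energy (d=0.06).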
  • 163.
    Digital Therapeutics and Digital Medicine Summit | February 2018. After the endpoint: how digital medicine can transcend traditional research standards. Peter Hames, CEO, Co-Founder
  • 164.
    Our starting point is sleep - a destigmatized "way in" to broader mental health. Digital Therapeutics and Digital Medicine Summit | February 2018. Reference: Luik, A. et al. (2017), Behavioural and Cognitive Psychotherapy.
  • 165.
    Our first product is Sleepio • A fully automated Cognitive Behavioral Therapy (CBT) program for insomnia • Accessible via app and web, it is an effective digital medicine for insomnia • Helps alleviate co-morbid anxiety and depression. For more info see bighealth.com/our-solution. Digital Therapeutics and Digital Medicine Summit | February 2018
    • Cognitive behavioral therapy (CBT) for insomnia
    • Cognitive therapy: correcting mistaken beliefs about sleep (education, stimulus control, cognitive restructuring, …)
    • Behavioral therapy: correcting behaviors/habits that interfere with sleep (sleep hygiene, relaxation/breathing training, …)
    • Sleepio: implements as an app what used to be delivered as face-to-face therapy (a course of at least 6 weeks)
  • 166.
    • Quite solid clinical evidence for its insomnia-improvement effect • 30 peer-reviewed journal articles, 8 RCTs • Operating without going through the FDA clearance process (i.e., without making treatment claims)
  • 167.
    nature: "About as effective as CBT delivered in person" • THE LANCET: "A proven intervention for sleep disorders". Digital Therapeutics and Digital Medicine Summit | February 2018. Reference: Espie et al. (2012), SLEEP (average change in CBT group: SOL 47 → 21 mins, WASO 76 → 28 mins); Lancee et al. (2016), SLEEP.
    [Chart: validation of Sleepio's effectiveness in improving sleep quality — % of insomnia sufferers achieving healthy sleep: Sleepio 76%, in-person CBT-I 70-75%, placebo 29%, treatment as usual 18%]
  • 168.
    The three steps of digital healthcare • Step 1. Measuring data • Step 2. Integrating data • Step 3. Analyzing data
  • 169.
    The three types of medical artificial intelligence • Analyzing complex medical data and deriving insights • Analyzing/reading medical imaging and pathology data • Monitoring continuous data for preventive/predictive medicine
  • 170.
    Digital Healthcare in Drug Development: Target Discovery → Lead Discovery → Clinical Trial → Post-Market Surveillance
    • Personal genome analysis • Blockchain-based genome analysis • Deep learning-based candidate compounds • AI + pharmaceutical companies • Patient recruitment • Data measurement: wearables • Digital phenotyping • Medication adherence • SNS-based PMS • Blockchain-based PMS • + Digital Therapeutics
  • 172.
    Feedback/Questions • Email: yoonsup.choi@gmail.com • Blog: http://www.yoonsupchoi.com • Facebook: Yoon Sup Choi • YouTube: 최윤섭의 디지털 헬스케어